{"id":4115,"date":"2014-07-15T09:00:20","date_gmt":"2014-07-15T12:00:20","guid":{"rendered":"http:\/\/blog.plataformatec.com.br\/?p=4115"},"modified":"2014-07-14T18:34:32","modified_gmt":"2014-07-14T21:34:32","slug":"rails-4-and-postgresql-arrays","status":"publish","type":"post","link":"https:\/\/blog.plataformatec.com.br\/2014\/07\/rails-4-and-postgresql-arrays\/","title":{"rendered":"Rails 4 and PostgreSQL Arrays"},"content":{"rendered":"

Rails 4 supports arrays fields for PostgreSQL in a nice way, although it is not a very known feature. In order to demonstrate its usage it’s useful to explain the context where this was used.<\/p>\n

PostgreSQL Arrays and Rails Migrations<\/h3>\n

Suppose we have a Product model<\/i> with the following fields: name<\/i>, category_id<\/i> and tags<\/i>. The name <\/i>field will be a simple string, category_id <\/i>will be the foreign key<\/i> of a record in the Category model <\/i>and tags <\/i>will be created by inputting a string of comma-separated words, so: “one, two, forty two”<\/i> will become the tags: “one”<\/i>, “two” <\/i>and “forty two” <\/i>respectively.<\/i><\/p>\n

Creating these tables via migrations is nothing new, except for the column tags <\/i>which will have the Array type in this case. To create this kind of column we use the following syntax in our migration:<\/p>\n

create_table :categories do |t|\r\n  t.string :name, null: false\r\nend\r\n\r\ncreate_table :products do |t|\r\n  t.string :name, null: false\r\n  t.references :category, null: false\r\n  t.text :tags, array: true, default: []\r\nend\r\n<\/pre>\n

Let’s explore what we can do with this kind of field using the postgres console:<\/p>\n

$ rails db\r\n> insert into products(name, category_id, tags) values('T-Shirt', 3, '{clothing, summer}');\r\n> insert into products(name, category_id, tags) values('Sweater', 3, ARRAY['clothing', 'winter']);\r\n> select * from products;\r\n1 \u00a0| \u00a0T-Shirt \u00a0| \u00a03 \u00a0| {clothing, summer}\r\n2 \u00a0| \u00a0Sweater \u00a0| \u00a03 \u00a0| {clothing, winter}\r\n<\/pre>\n

As we can see we need to specify each tag following this syntax:<\/p>\n

‘{ val1, val2, \u2026 }’ <\/i>or ARRAY[‘val1’, ‘val2’, …]<\/i><\/p>\n

Let’s play a little more to understand how this column behaves when queried:<\/p>\n

\r\n> select * from products where tags = '{clothing, summer}';\r\n1 \u00a0| \u00a0T-Shirt \u00a0| \u00a03 \u00a0| {clothing, summer}\r\n\r\n> select * from products where tags = '{summer, clothing}';\r\n(0 rows)\r\n\r\n> select * from products where 'winter' = ANY(tags);\r\n2 \u00a0| \u00a0Sweater \u00a0| \u00a03 \u00a0| \u00a0{clothing, winter}\r\n<\/pre>\n

As this example demonstrates, searching for records by an array with its values in the order they were inserted works, but with the same values in a different order does not. We were also able to find a record searching for a specific tag using the ANY function.<\/p>\n

There’s a lot more to talk about arrays in PostgreSQL, but for our example this is enough. You can find more information at the PostgreSQL official documentation about arrays<\/a> and its functions<\/a>.<\/p>\n

How Rails treats PostgreSQL arrays<\/h3>\n

It’s also valuable to see how to use the array field within Rails, let’s try:<\/p>\n

$ rails c\r\n\r\nProduct.create(name: 'Shoes', category: Category.first, tags: ['a', 'b', 'c'])\r\n#> \r\n\r\nProduct.find(26).tags\r\n#> [\"a\", \"b\", \"c\"]\r\n<\/pre>\n

So Rails treats an array column in PostgreSQL as an Array in Ruby, pretty reasonable!<\/p>\n

Validations<\/h3>\n

We want each product to be unique, let’s see some examples to clarify this concept.<\/p>\n

Given we have the following product:<\/p>\n

Product.create(name: 'Shoes', category: Category.first, tags: ['a', 'b', 'c'])\r\n<\/pre>\n

We can easily create another one if we change the name<\/i> attribute:<\/p>\n

Product.create(name: 'Slippers', category: Category.first, tags: ['a', 'b', 'c'])\r\n<\/pre>\n

We can also create another product with different tags:<\/p>\n

Product.create(name: 'Shoes', category: Category.first, tags: ['a', 'b'])\r\n<\/pre>\n

But we don’t want to create a product with the same attributes, even if the tags are in a different order:<\/p>\n

Product.create(name: 'Shoes', category: Category.first, tags: ['a', 'c', 'b'])\r\n#> false\r\n<\/pre>\n

As PostgreSQL only finds records by tags given the exact order in which they were inserted, then how can we ensure the uniqueness of a product with tags in an order-independent way?<\/p>\n

After much thought we decided that a good approach would involve creating an unique index with all the columns in the products table but with tags sorted when a row is inserted in the database. Something like:<\/p>\n

CREATE UNIQUE INDEX index_products_on_category_id_and_name_and_tags\r\nON products USING btree (category_id, name, sort_array(tags));\r\n<\/pre>\n

And sort_array <\/i>is our custom function responsible for sorting the array, since PostgreSQL does not have a built in function like this.<\/p>\n

Creating a custom function in PostgreSQL using PL\/pgSQL<\/h3>\n

To create a custom function we used the PL\/pgSQL<\/i> language, and since we are adding database specific code like this we can’t use the default schema.rb <\/i>anymore. Let’s change this in config\/application.rb<\/i>:<\/p>\n

# Use SQL instead of AR schema dumper when creating the database\r\nconfig.active_record.schema_format = :sql\r\n<\/pre>\n

With this configuration set, our schema.rb<\/i> file will be replaced by a structure.sql<\/i> file without side effects, our current migrations don’t need to be changed at all. Now we can create a migration with our sort_array <\/i>code:<\/p>\n

\r\ndef up\r\n  execute <<-SQL\r\n    CREATE FUNCTION sort_array(unsorted_array anyarray) RETURNS anyarray AS $$\r\n      BEGIN\r\n        RETURN (SELECT ARRAY_AGG(val) AS sorted_array\r\n        FROM (SELECT UNNEST(unsorted_array) AS val ORDER BY val) AS sorted_vals);\r\n      END;\r\n    $$ LANGUAGE plpgsql IMMUTABLE STRICT;\r\n\r\n    CREATE UNIQUE INDEX index_products_on_category_id_and_name_and_tags ON products USING btree (category_id, name, sort_array(tags));\r\n  SQL\r\nend\r\n\r\ndef down\r\n  execute <<-SQL\r\n    DROP INDEX IF EXISTS index_products_on_category_id_and_name_and_tags;\r\n    DROP FUNCTION IF EXISTS sort_array(unsorted_array anyarray);\r\n  SQL\r\nend\r\n<\/pre>\n

Now, let's take it slow and understand step by step<\/p>\n

CREATE FUNCTION sort_array(unsorted_array anyarray) RETURNS anyarray\r\n<\/pre>\n

The line above tells that we are creating a function named sort_array <\/i>and that it receives a parameter named unsorted_array <\/i>of type anyarray <\/i>and returns something of this same type. This anyarray,<\/i> in fact, is a pseudo-type <\/i>that indicates that a function accepts any array data type.<\/p>\n

RETURN (SELECT ARRAY_AGG(val) AS sorted_array\r\nFROM (SELECT UNNEST(unsorted_array) AS val ORDER BY val) AS sorted_vals);\r\n<\/pre>\n

The trick here is the use of the function unnest <\/i>that expands an Array to a set of rows. Now we can order these rows and after that we use another function called array_agg <\/i>that concatenates the input into a new Array.<\/p>\n

$$ LANGUAGE plpgsql IMMUTABLE STRICT;\r\n<\/pre>\n

The last trick is the use of the keywords IMMUTABLE<\/i> \u00a0and STRICT<\/i>. With the first one we guarantee that our function will always return the same output given the same input, we can't use it in our index if we don't specify so. The other one tells that our function will always return null <\/i>if some of the parameters are not specified.<\/p>\n

And that's it! With this we can check for uniqueness in a performant way with some method like:<\/p>\n

def duplicate_product_exists?\r\n  relation = self.class.\r\n    where(category_id: category_id).\r\n    where('lower(name) = lower(?)', name).\r\n    where('sort_array(tags) = sort_array(ARRAY[?])', tags)\r\n\r\n  relation = relation.where.not(id: id) if persisted?\r\n\r\n  relation.exists?\r\nend\r\n<\/pre>\n

Case insensitive arrays<\/h3>\n

There is still a problem with our code though, the index is not case insensitive! \u00a0What if a user inserts a product with tags ['a', 'b'] and another one inserts the same product but with tags ['A', 'b']? Now we have duplication in our database! We have to deal with this, but unfortunately this will increase the complexity of our sort_array <\/i>function a little bit. To fix this problem we only need to change one single line:<\/p>\n

From this:<\/p>\n

\r\nFROM (SELECT UNNEST(unsorted_array) AS val ORDER BY val) AS sorted_vals);\r\n<\/pre>\n

To:<\/p>\n

\r\nFROM\r\n(SELECT\r\n  UNNEST(string_to_array(lower(array_to_string(unsorted_array, ',')), ','))\r\n  AS val ORDER BY val)\r\nAS sorted_vals);\r\n<\/pre>\n

The difference is that instead of passing unsorted_array directly to the function unnest <\/i>we are transforming it in an String, calling lower on it and transforming it back to an Array before passing it on. With this change it doesn't matter if the user inserts ['a'] or ['A'], every tag will be saved in lowercase in the index. Problem solved!<\/p>\n

As we can see, it's not an easy task to deal with uniqueness and arrays in the database, but the overall result was great.<\/p>\n

Would you solve this problem in a different way? Share with us!<\/i><\/p>\n

\n

\"\"<\/a><\/span><\/span><\/p>\n","protected":false},"excerpt":{"rendered":"

Rails 4 supports arrays fields for PostgreSQL in a nice way, although it is not a very known feature. In order to demonstrate its usage it’s useful to explain the context where this was used. PostgreSQL Arrays and Rails Migrations Suppose we have a Product model with the following fields: name, category_id and tags. The … \u00bb<\/a><\/p>\n","protected":false},"author":30,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"ngg_post_thumbnail":0,"footnotes":""},"categories":[1],"tags":[219,176],"aioseo_notices":[],"jetpack_sharing_enabled":true,"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/blog.plataformatec.com.br\/wp-json\/wp\/v2\/posts\/4115"}],"collection":[{"href":"https:\/\/blog.plataformatec.com.br\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.plataformatec.com.br\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.plataformatec.com.br\/wp-json\/wp\/v2\/users\/30"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.plataformatec.com.br\/wp-json\/wp\/v2\/comments?post=4115"}],"version-history":[{"count":14,"href":"https:\/\/blog.plataformatec.com.br\/wp-json\/wp\/v2\/posts\/4115\/revisions"}],"predecessor-version":[{"id":4130,"href":"https:\/\/blog.plataformatec.com.br\/wp-json\/wp\/v2\/posts\/4115\/revisions\/4130"}],"wp:attachment":[{"href":"https:\/\/blog.plataformatec.com.br\/wp-json\/wp\/v2\/media?parent=4115"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.plataformatec.com.br\/wp-json\/wp\/v2\/categories?post=4115"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.plataformatec.com.br\/wp-json\/wp\/v2\/tags?post=4115"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}