{"id":4115,"date":"2014-07-15T09:00:20","date_gmt":"2014-07-15T12:00:20","guid":{"rendered":"http:\/\/blog.plataformatec.com.br\/?p=4115"},"modified":"2014-07-14T18:34:32","modified_gmt":"2014-07-14T21:34:32","slug":"rails-4-and-postgresql-arrays","status":"publish","type":"post","link":"https:\/\/blog.plataformatec.com.br\/2014\/07\/rails-4-and-postgresql-arrays\/","title":{"rendered":"Rails 4 and PostgreSQL Arrays"},"content":{"rendered":"<p>Rails 4 supports arrays fields for PostgreSQL in a nice way, although it is not a very known feature. In order to demonstrate its usage it&#8217;s useful to explain the context where this was used.<\/p>\n<h3>PostgreSQL Arrays and Rails Migrations<\/h3>\n<p>Suppose we have a <i>Product model<\/i> with the following fields: <i>name<\/i>, <i>category_id<\/i> and <i>tags<\/i>. The <i>name <\/i>field will be a simple string, <i>category_id <\/i>will be the <i>foreign key<\/i> of a record in the <i>Category model <\/i>and <i>tags <\/i>will be created by inputting a string of comma-separated words, so: <i>&#8220;one, two, forty two&#8221;<\/i> will become the tags: <i>&#8220;one&#8221;<\/i>,<i> &#8220;two&#8221; <\/i>and<i> &#8220;forty two&#8221; <\/i>respectively<i>.<\/i><\/p>\n<p>Creating these tables via migrations is nothing new, except for the column <i>tags <\/i>which will have the Array type in this case. To create this kind of column we use the following syntax in our migration:<\/p>\n<pre lang=\"ruby\">create_table :categories do |t|\r\n  t.string :name, null: false\r\nend\r\n\r\ncreate_table :products do |t|\r\n  t.string :name, null: false\r\n  t.references :category, null: false\r\n  t.text :tags, array: true, default: []\r\nend\r\n<\/pre>\n<p>Let&#8217;s explore what we can do with this kind of field using the postgres console:<\/p>\n<pre lang=\"plsql\">$ rails db\r\n> insert into products(name, category_id, tags) values('T-Shirt', 3, '{clothing, summer}');\r\n> insert into products(name, category_id, tags) values('Sweater', 3, ARRAY['clothing', 'winter']);\r\n> select * from products;\r\n1 \u00a0| \u00a0T-Shirt \u00a0| \u00a03 \u00a0| {clothing, summer}\r\n2 \u00a0| \u00a0Sweater \u00a0| \u00a03 \u00a0| {clothing, winter}\r\n<\/pre>\n<p>As we can see we need to specify each tag following this syntax:<\/p>\n<p><i>&#8216;{ val1, val2, \u2026 }&#8217; <\/i>or <i>ARRAY[&#8216;val1&#8217;, &#8216;val2&#8217;, &#8230;]<\/i><\/p>\n<p>Let&#8217;s play a little more to understand how this column behaves when queried:<\/p>\n<pre lang=\"plsql\">\r\n> select * from products where tags = '{clothing, summer}';\r\n1 \u00a0| \u00a0T-Shirt \u00a0| \u00a03 \u00a0| {clothing, summer}\r\n\r\n> select * from products where tags = '{summer, clothing}';\r\n(0 rows)\r\n\r\n> select * from products where 'winter' = ANY(tags);\r\n2 \u00a0| \u00a0Sweater \u00a0| \u00a03 \u00a0| \u00a0{clothing, winter}\r\n<\/pre>\n<p>As this example demonstrates, searching for records by an array with its values in the order they were inserted works, but with the same values in a different order does not. We were also able to find a record searching for a specific tag using the ANY function.<\/p>\n<p>There&#8217;s a lot more to talk about arrays in PostgreSQL, but for our example this is enough. You can find more information at the PostgreSQL official documentation about <a href=\"http:\/\/www.postgresql.org\/docs\/current\/static\/arrays.html\">arrays<\/a> and its <a href=\"http:\/\/www.postgresql.org\/docs\/current\/static\/functions-array.html\">functions<\/a>.<\/p>\n<h3>How Rails treats PostgreSQL arrays<\/h3>\n<p>It&#8217;s also valuable to see how to use the array field within Rails, let&#8217;s try:<\/p>\n<pre lang=\"rails\">$ rails c\r\n\r\nProduct.create(name: 'Shoes', category: Category.first, tags: ['a', 'b', 'c'])\r\n#> \r\n\r\nProduct.find(26).tags\r\n#> [\"a\", \"b\", \"c\"]\r\n<\/pre>\n<p>So Rails treats an array column in PostgreSQL as an Array in Ruby, pretty reasonable!<\/p>\n<h3>Validations<\/h3>\n<p>We want each product to be unique, let&#8217;s see some examples to clarify this concept.<\/p>\n<p>Given we have the following product:<\/p>\n<pre lang=\"rails\">Product.create(name: 'Shoes', category: Category.first, tags: ['a', 'b', 'c'])\r\n<\/pre>\n<p>We can easily create another one if we change the <i>name<\/i> attribute:<\/p>\n<pre lang=\"rails\">Product.create(name: 'Slippers', category: Category.first, tags: ['a', 'b', 'c'])\r\n<\/pre>\n<p>We can also create another product with different tags:<\/p>\n<pre lang=\"rails\">Product.create(name: 'Shoes', category: Category.first, tags: ['a', 'b'])\r\n<\/pre>\n<p>But we don&#8217;t want to create a product with the same attributes, even if the tags are in a different order:<\/p>\n<pre lang=\"rails\">Product.create(name: 'Shoes', category: Category.first, tags: ['a', 'c', 'b'])\r\n#> false\r\n<\/pre>\n<p>As PostgreSQL only finds records by tags given the exact order in which they were inserted, then how can we ensure the uniqueness of a product with tags in an order-independent way?<\/p>\n<p>After much thought we decided that a good approach would involve creating an unique index with all the columns in the products table but with tags sorted when a row is inserted in the database. Something like:<\/p>\n<pre lang=\"plsql\">CREATE UNIQUE INDEX index_products_on_category_id_and_name_and_tags\r\nON products USING btree (category_id, name, sort_array(tags));\r\n<\/pre>\n<p>And <i>sort_array <\/i>is our custom function responsible for sorting the array, since PostgreSQL does not have a built in function like this.<\/p>\n<h3>Creating a custom function in PostgreSQL using PL\/pgSQL<\/h3>\n<p>To create a custom function we used the <i>PL\/pgSQL<\/i> language, and since we are adding database specific code like this we can&#8217;t use the default <i>schema.rb <\/i>anymore. Let&#8217;s change this in <i>config\/application.rb<\/i>:<\/p>\n<pre lang=\"rails\"># Use SQL instead of AR schema dumper when creating the database\r\nconfig.active_record.schema_format = :sql\r\n<\/pre>\n<p>With this configuration set, our <i>schema.rb<\/i> file will be replaced by a <i>structure.sql<\/i> file without side effects, our current migrations don&#8217;t need to be changed at all. Now we can create a migration with our <i>sort_array <\/i>code:<\/p>\n<pre lang=\"rails\">\r\ndef up\r\n  execute <<-SQL\r\n    CREATE FUNCTION sort_array(unsorted_array anyarray) RETURNS anyarray AS $$\r\n      BEGIN\r\n        RETURN (SELECT ARRAY_AGG(val) AS sorted_array\r\n        FROM (SELECT UNNEST(unsorted_array) AS val ORDER BY val) AS sorted_vals);\r\n      END;\r\n    $$ LANGUAGE plpgsql IMMUTABLE STRICT;\r\n\r\n    CREATE UNIQUE INDEX index_products_on_category_id_and_name_and_tags ON products USING btree (category_id, name, sort_array(tags));\r\n  SQL\r\nend\r\n\r\ndef down\r\n  execute <<-SQL\r\n    DROP INDEX IF EXISTS index_products_on_category_id_and_name_and_tags;\r\n    DROP FUNCTION IF EXISTS sort_array(unsorted_array anyarray);\r\n  SQL\r\nend\r\n<\/pre>\n<p>Now, let's take it slow and understand step by step<\/p>\n<pre lang=\"plsql\">CREATE FUNCTION sort_array(unsorted_array anyarray) RETURNS anyarray\r\n<\/pre>\n<p>The line above tells that we are creating a function named <i>sort_array <\/i>and that it receives a parameter named <i>unsorted_array <\/i>of type <i>anyarray <\/i>and returns something of this same type. This <i>anyarray,<\/i> in fact, is a <i>pseudo-type <\/i>that indicates that a function accepts any array data type.<\/p>\n<pre lang=\"plsql\">RETURN (SELECT ARRAY_AGG(val) AS sorted_array\r\nFROM (SELECT UNNEST(unsorted_array) AS val ORDER BY val) AS sorted_vals);\r\n<\/pre>\n<p>The trick here is the use of the function <i>unnest <\/i>that expands an Array to a set of rows. Now we can order these rows and after that we use another function called <i>array_agg <\/i>that concatenates the input into a new Array.<\/p>\n<pre lang=\"plsql\">$$ LANGUAGE plpgsql IMMUTABLE STRICT;\r\n<\/pre>\n<p>The last trick is the use of the keywords <i>IMMUTABLE<\/i> \u00a0and <i>STRICT<\/i>. With the first one we guarantee that our function will always return the same output given the same input, we can't use it in our index if we don't specify so. The other one tells that our function will always return <i>null <\/i>if some of the parameters are not specified.<\/p>\n<p>And that's it! With this we can check for uniqueness in a performant way with some method like:<\/p>\n<pre lang=\"rails\">def duplicate_product_exists?\r\n  relation = self.class.\r\n    where(category_id: category_id).\r\n    where('lower(name) = lower(?)', name).\r\n    where('sort_array(tags) = sort_array(ARRAY[?])', tags)\r\n\r\n  relation = relation.where.not(id: id) if persisted?\r\n\r\n  relation.exists?\r\nend\r\n<\/pre>\n<h3>Case insensitive arrays<\/h3>\n<p>There is still a problem with our code though, the index is not case insensitive! \u00a0What if a user inserts a product with tags ['a', 'b'] and another one inserts the same product but with tags ['A', 'b']? Now we have duplication in our database! We have to deal with this, but unfortunately this will increase the complexity of our <i>sort_array <\/i>function a little bit. To fix this problem we only need to change one single line:<\/p>\n<p>From this:<\/p>\n<pre lang=\"plsql\">\r\nFROM (SELECT UNNEST(unsorted_array) AS val ORDER BY val) AS sorted_vals);\r\n<\/pre>\n<p>To:<\/p>\n<pre lang=\"plsql\">\r\nFROM\r\n(SELECT\r\n  UNNEST(string_to_array(lower(array_to_string(unsorted_array, ',')), ','))\r\n  AS val ORDER BY val)\r\nAS sorted_vals);\r\n<\/pre>\n<p>The difference is that instead of passing unsorted_array directly to the function <i>unnest <\/i>we are transforming it in an String, calling lower on it and transforming it back to an Array before passing it on. With this change it doesn't matter if the user inserts ['a'] or ['A'], every tag will be saved in lowercase in the index. Problem solved!<\/p>\n<p>As we can see, it's not an easy task to deal with uniqueness and arrays in the database, but the overall result was great.<\/p>\n<p><i>Would you solve this problem in a different way? Share with us!<\/i><\/p>\n<p style=\"text-align: center;\">\n<p><span id=\"hs-cta-wrapper-2aeae558-5b72-4df3-bf32-e1119f34d85e\" class=\"hs-cta-wrapper\"><span id=\"hs-cta-2aeae558-5b72-4df3-bf32-e1119f34d85e\" class=\"hs-cta-node hs-cta-2aeae558-5b72-4df3-bf32-e1119f34d85e\"> <a href=\"http:\/\/cta-redirect.hubspot.com\/cta\/redirect\/378213\/2aeae558-5b72-4df3-bf32-e1119f34d85e\"><img decoding=\"async\" id=\"hs-cta-img-2aeae558-5b72-4df3-bf32-e1119f34d85e\" class=\"hs-cta-img aligncenter\" style=\"border-width: 0px;\" src=\"https:\/\/no-cache.hubspot.com\/cta\/default\/378213\/2aeae558-5b72-4df3-bf32-e1119f34d85e.png\" alt=\"\" \/><\/a><\/span><\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Rails 4 supports arrays fields for PostgreSQL in a nice way, although it is not a very known feature. In order to demonstrate its usage it&#8217;s useful to explain the context where this was used. PostgreSQL Arrays and Rails Migrations Suppose we have a Product model with the following fields: name, category_id and tags. The &#8230; <a class=\"read-more-link\" href=\"https:\/\/blog.plataformatec.com.br\/2014\/07\/rails-4-and-postgresql-arrays\/\">\u00bb<\/a><\/p>\n","protected":false},"author":30,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"ngg_post_thumbnail":0,"footnotes":""},"categories":[1],"tags":[219,176],"aioseo_notices":[],"jetpack_sharing_enabled":true,"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/blog.plataformatec.com.br\/wp-json\/wp\/v2\/posts\/4115"}],"collection":[{"href":"https:\/\/blog.plataformatec.com.br\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.plataformatec.com.br\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.plataformatec.com.br\/wp-json\/wp\/v2\/users\/30"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.plataformatec.com.br\/wp-json\/wp\/v2\/comments?post=4115"}],"version-history":[{"count":14,"href":"https:\/\/blog.plataformatec.com.br\/wp-json\/wp\/v2\/posts\/4115\/revisions"}],"predecessor-version":[{"id":4130,"href":"https:\/\/blog.plataformatec.com.br\/wp-json\/wp\/v2\/posts\/4115\/revisions\/4130"}],"wp:attachment":[{"href":"https:\/\/blog.plataformatec.com.br\/wp-json\/wp\/v2\/media?parent=4115"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.plataformatec.com.br\/wp-json\/wp\/v2\/categories?post=4115"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.plataformatec.com.br\/wp-json\/wp\/v2\/tags?post=4115"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}