Posts by Bernardo Chaves

Rails 4 supports arrays fields for PostgreSQL in a nice way, although it is not a very known feature. In order to demonstrate its usage it’s useful to explain the context where this was used.

PostgreSQL Arrays and Rails Migrations

Suppose we have a Product model with the following fields: name, category_id and tags. The name field will be a simple string, category_id will be the foreign key of a record in the Category model and tags will be created by inputting a string of comma-separated words, so: “one, two, forty two” will become the tags: “one”, “two” and “forty two” respectively.

Creating these tables via migrations is nothing new, except for the column tags which will have the Array type in this case. To create this kind of column we use the following syntax in our migration:

create_table :categories do |t|
  t.string :name, null: false
end
 
create_table :products do |t|
  t.string :name, null: false
  t.references :category, null: false
  t.text :tags, array: true, default: []
end

Let’s explore what we can do with this kind of field using the postgres console:

$ rails db
> INSERT INTO products(name, category_id, tags) VALUES('T-Shirt', 3, '{clothing, summer}');
> INSERT INTO products(name, category_id, tags) VALUES('Sweater', 3, ARRAY['clothing', 'winter']);
> SELECT * FROM products;
1  |  T-Shirt  |  3  | {clothing, summer}
2  |  Sweater  |  3  | {clothing, winter}

As we can see we need to specify each tag following this syntax:

‘{ val1, val2, … }’ or ARRAY['val1', 'val2', ...]

Let’s play a little more to understand how this column behaves when queried:

> SELECT * FROM products WHERE tags = '{clothing, summer}';
1  |  T-Shirt  |  3  | {clothing, summer}
 
> SELECT * FROM products WHERE tags = '{summer, clothing}';
(0 rows)
 
> SELECT * FROM products WHERE 'winter' = ANY(tags);
2  |  Sweater  |  3  |  {clothing, winter}

As this example demonstrates, searching for records by an array with its values in the order they were inserted works, but with the same values in a different order does not. We were also able to find a record searching for a specific tag using the ANY function.

There’s a lot more to talk about arrays in PostgreSQL, but for our example this is enough. You can find more information at the PostgreSQL official documentation about arrays and its functions.

How Rails treats PostgreSQL arrays

It’s also valuable to see how to use the array field within Rails, let’s try:

$ rails c
 
Product.create(name: 'Shoes', category: Category.first, tags: ['a', 'b', 'c'])
#> 
 
Product.find(26).tags
#> ["a", "b", "c"]

So Rails treats an array column in PostgreSQL as an Array in Ruby, pretty reasonable!

Validations

We want each product to be unique, let’s see some examples to clarify this concept.

Given we have the following product:

Product.create(name: 'Shoes', category: Category.first, tags: ['a', 'b', 'c'])

We can easily create another one if we change the name attribute:

Product.create(name: 'Slippers', category: Category.first, tags: ['a', 'b', 'c'])

We can also create another product with different tags:

Product.create(name: 'Shoes', category: Category.first, tags: ['a', 'b'])

But we don’t want to create a product with the same attributes, even if the tags are in a different order:

Product.create(name: 'Shoes', category: Category.first, tags: ['a', 'c', 'b'])
#> false

As PostgreSQL only finds records by tags given the exact order in which they were inserted, then how can we ensure the uniqueness of a product with tags in an order-independent way?

After much thought we decided that a good approach would involve creating an unique index with all the columns in the products table but with tags sorted when a row is inserted in the database. Something like:

CREATE UNIQUE INDEX index_products_on_category_id_and_name_and_tags
ON products USING btree (category_id, name, sort_array(tags));

And sort_array is our custom function responsible for sorting the array, since PostgreSQL does not have a built in function like this.

Creating a custom function in PostgreSQL using PL/pgSQL

To create a custom function we used the PL/pgSQL language, and since we are adding database specific code like this we can’t use the default schema.rb anymore. Let’s change this in config/application.rb:

# Use SQL instead of AR schema dumper when creating the database
config.active_record.schema_format = :sql

With this configuration set, our schema.rb file will be replaced by a structure.sql file without side effects, our current migrations don’t need to be changed at all. Now we can create a migration with our sort_array code:

def up
  execute <<-SQL
    CREATE FUNCTION sort_array(unsorted_array anyarray) RETURNS anyarray AS $$
      BEGIN
        RETURN (SELECT ARRAY_AGG(val) AS sorted_array
        FROM (SELECT UNNEST(unsorted_array) AS val ORDER BY val) AS sorted_vals);
      END;
    $$ LANGUAGE plpgsql IMMUTABLE STRICT;
 
    CREATE UNIQUE INDEX index_products_on_category_id_and_name_and_tags ON products USING btree (category_id, name, sort_array(tags));
  SQL
end
 
def down
  execute <<-SQL
    DROP INDEX IF EXISTS index_products_on_category_id_and_name_and_tags;
    DROP FUNCTION IF EXISTS sort_array(unsorted_array anyarray);
  SQL
end

Now, let’s take it slow and understand step by step

CREATE FUNCTION sort_array(unsorted_array anyarray) RETURNS anyarray

The line above tells that we are creating a function named sort_array and that it receives a parameter named unsorted_array of type anyarray and returns something of this same type. This anyarray, in fact, is a pseudo-type that indicates that a function accepts any array data type.

RETURN (SELECT ARRAY_AGG(val) AS sorted_array
FROM (SELECT UNNEST(unsorted_array) AS val ORDER BY val) AS sorted_vals);

The trick here is the use of the function unnest that expands an Array to a set of rows. Now we can order these rows and after that we use another function called array_agg that concatenates the input into a new Array.

$$ LANGUAGE plpgsql IMMUTABLE STRICT;

The last trick is the use of the keywords IMMUTABLE  and STRICT. With the first one we guarantee that our function will always return the same output given the same input, we can’t use it in our index if we don’t specify so. The other one tells that our function will always return null if some of the parameters are not specified.

And that’s it! With this we can check for uniqueness in a performant way with some method like:

def duplicate_product_exists?
  relation = self.class.
    where(category_id: category_id).
    where('lower(name) = lower(?)', name).
    where('sort_array(tags) = sort_array(ARRAY[?])', tags)
 
  relation = relation.where.not(id: id) if persisted?
 
  relation.exists?
end

Case insensitive arrays

There is still a problem with our code though, the index is not case insensitive!  What if a user inserts a product with tags ['a', 'b'] and another one inserts the same product but with tags ['A', 'b']? Now we have duplication in our database! We have to deal with this, but unfortunately this will increase the complexity of our sort_array function a little bit. To fix this problem we only need to change one single line:

From this:

FROM (SELECT UNNEST(unsorted_array) AS val ORDER BY val) AS sorted_vals);

To:

FROM
(SELECT
  UNNEST(string_to_array(LOWER(array_to_string(unsorted_array, ',')), ','))
  AS val ORDER BY val)
AS sorted_vals);

The difference is that instead of passing unsorted_array directly to the function unnest we are transforming it in an String, calling lower on it and transforming it back to an Array before passing it on. With this change it doesn’t matter if the user inserts ['a'] or ['A'], every tag will be saved in lowercase in the index. Problem solved!

As we can see, it’s not an easy task to deal with uniqueness and arrays in the database, but the overall result was great.

Would you solve this problem in a different way? Share with us!