“Do not use floating point for currency”

This simple statement is useful for both novice and experienced developers alike. It avoids problems that might otherwise compromise the correctness of the software you’re building.

But behind every piece of wisdom, every history of success, there is hard work from lots of people. Exploring this background is an enriching experience, the kind of thing that elevates the level of knowledge and makes you more confident about what you are doing.

Do not expect to find here a complete guide to floating point or currency handling. There are enough references on each matter separately out there already. The idea is to provide a brief exploration in a top-down fashion, gradually digging each layer of the stack, from the perspective of someone that do not master floating point.

Let’s start with a simple example in irb (Ruby’s Interactive Shell), which happens to be a common source of questions:

0.1 + 0.02 == 0.12# => false
 
(0.1 + 0.02) + 0.3 == 0.1 + (0.02 + 0.3)
# => false

Wasn’t it supposed to return true? What’s going on behind the scenes?

Debugging the results from each side of the first expression.

0.1 + 0.02# => 0.12000000000000001
 


0.12# => 0.12

It looks like there is something wrong with the operation 0.1 + 0.02. Lets take a closer look, using a more accurate format.

sprintf("%0.50f", 0.12)# => "0.11999999999999999555910790149937383830547332763672"
 


sprintf("%0.50f", 0.10 + 0.02)# => "0.12000000000000000943689570931383059360086917877197"


 
sprintf("%0.50f", 0.02)# => "0.02000000000000000041633363423443370265886187553406"


 
sprintf("%0.50f", 0.10)# => "0.10000000000000000555111512312578270211815834045410"

So the error happens even before the arithmetic operations. It’s time to dive deeper into the stack.

Conversions

Once the interpreter parses the decimal string 0.1, for example, it converts that to an internal representation, which we’ll review briefly. This is achieved with the help of C strtod function, in MRI.

Curiously, this conversion step had already lead to security issues before.

Another conversion takes place when printing a float back to the decimal string. These transformations are not as trivial as one might think. There had been many studies in the development of the algorithms used in order to make it accurately and efficiently.

Since all of this happens transparently, some might expect that all languages work this way by default. It does, but not always for free. For example, in the infant days of the Elixir language, a developer had the following question after typing a decimal string into IEx (Elixir’s Interactive Shell). This had been solved by borrowing functionality already present in Erlang.

A counter intuitive fact is that even if a decimal number is not exactly representable as float, say 0.12, we are able to convert it back and forth without ever seeing the rounding error due to such carefully designed algorithms.

This automatic rounding behavior hides complexity but may encourage some to operate on floats expecting that the final conversion is going to provide a desired decimal result. But that is not always true due to rounding errors accumulation.

But what’s this internal representation and why does it impose such limitations?

Thinking in terms of binaries

We are not going to dive into the details of the IEEE754 nor explain how to do such conversions. That’s an interesting topic but it has been covered before and would end up being too extensive to fit in this post.

Instead we are going to show some examples that provides a glimpse of the real problem.

For instance, take the decimal number 0.625. This number represents the sum of fractions 6/10^1 + 2/10^2 + 5/10^3. The same reasoning can be used to find the binary representation replacing the denominator by powers of two 1/2^1 + 0/2^2 + 1/2^3 = 0.5 + 0.125 = 0.625, which leads to the binary 0.101 when we place the numerators in a row.

But if we had chosen another number, let’s say the decimal 0.1, we would find that it’s not possible to write it as a finite sum of fractions whose denominator is a power of 2. E.g. it does not have a finite binary representation.

Since the space to store a double precision float is limited to 64 bits, some sort of rounding would be required. And here we have one of the causes of precision loss.

What type should I use for representing currency?

The two most frequent answers include Integers or BigDecimals as the way to go. I’ve personally used both and it worked fine, but it would be too pretentious to say that one or another is the best for your problem.

Most Rails applications use databases. If that’s your case, it’s important to understand the limitations offered by the types available and the conversions done by the adapter during information storage and retrieval.

For example, Postgres recommends using its numeric type for currency, which Rails maps to BigDecimals. This type also supports the aggregation functions we are used to.

Other databases may require other approaches, like mongodb that does not have a type compatible with BigDecimal. But it doesn’t mean that it’s impossible to use BigDecimals with mongodb either.

Currency in cents, like $19.90 could be represented as the integer 1990. It avoids precision problems and works well with most databases too. One inconvenience is formatting numbers as currency, but there are gems to make this task easier and coding your own helper methods is also an alternative.

Some may experience performance issues when using BigDecimals, but I’ve never faced such issues personally. One thing to be aware of is that BigDecimals are not as simple as Integers and I like to keep things as simple as possible.

Even floating point may be required under some contexts if numerical methods get involved. The most important thing is to be aware of the limitations and make informed decisions.

So have you ever had problems with currency handling? I’m really willing to hear about how you solve this problem. If you have an experience to share, please, leave a comment below or find me on twitter @rcillo.

References


Yeah, it might seem a little bit lame at first, but the whole idea is to talk about agile teams, so do not nitpick about the metaphors and examples using the movie. This post has spoilers, so if you have not watched Guardians of the Galaxy yet, go there and watch it, but come back to read the post =).

It’s been some time since I started writing down some ideas about agile teams, culture and how these things reflect in actions. After watching the Guardians of the Galaxy movie, my ideas and thoughts became this blog post. Each section is about skills and practices that agile teams should master. To better understand them, examples from the Guardians of the Galaxy will be used.

The sense of team

The main character of the movie, Quill – also known as Star-Lord -, acts as a scrum master. During the first minutes of the movie he managed to group different people with different purposes and abilities to work together in a very efficient way. He did it in the same way agile teams work: making everyone to have a common goal.

Quill wanted to sell the Orb for money; Rocket and Groot were hunting Quill for money; Gamora wanted to sell the Orb too, not for the money, but because she did not want it fall into the wrong hands; Drax wanted revenge, and hurt Ronan the way Ronan hurted him. Quill managed to set their goals to save the galaxy. Now, everyone works for the same purpose. All the personal problems were vanished, the goal of the team was now bigger than the personal goals.

Selfishness hurts

Have you ever not called for help just because you wanted to show to someone (or even to yourself) that you could accomplish it by yourself? Well, Drax did it. And It was selfish. He called Ronan to Knowhere saying the Orb was there and so he could kill Ronan. He thought of it as an opportunity. But he lost the fight and the Orb as well.

In software development, trying to decide a software architecture, pushing code to master without code review or ignoring known bugs are some selfish acts that someone could perform.

Although it should be avoided, sometimes we still see things like these happening in agile teams. When it happens, we should reinforce that it’s a team. And a team works like a team, there is no space for solo stars. Drax realized he made a mistake doing that. He asked for apologies and the team understood and moved on.

Solving conflicts among members

Drax joined the team, firstly, because of Gamora. He wanted to kill her in order to hurt Ronan. But when he figured out it wouldn’t help, he kept in the team, for some reason. It could be a problem and may put the goal in risk if not managed well. But Star-Lord argued well and made people collaborate with each other.

Solving conflicts in the team comes from listening both sides. Drax explained why he wanted to kill her. She said she was against Ronan as well. Star-Lord helped both to get in a deal, since it probably was a communication issue. On a daily basis, we can face ourselves criticizing someone, not happy with something or two members that do not go along. Having an health environment where everyone is able to tell their issues is half the way. How to conduce the conflict resolution is the second one.

That’s a leader role, but it does not require only the scrum master to perform it. When things goes chaotic, the team must figure out a way to cool down and make things go well again. In an agile team, we can think in chaotic moments to be, for instance, when pressure from the external world come into to the team, for instance. Rushing the development, cutting edges, letting quality goes down, everything could be taken as the “chaotic” moment. Someone must notice it and figure out a way to solve it.

Behaving like a team member

Star-Lord never said “I am the leader”. This role came to him naturally when everyone trusted him enough. That’s the final scene where he was asked where to go to, and so Gamora says: “You lead, Star-Lord”. No roles are set, they must come naturally. When a member is comfortable performing some role, he must naturally say “let me handle this”.

As a leader, he is not supposed to solve all the problems. He needs the team to be part of the solution, so they would be more engaged and committed with the results. You can see it while they plan how to escape from the prison. It was not Star-Lord who came up with a plan. Rocket made it up and everyone was part of its execution, which, by the way, was one of the funniest parts.

Everyone had a task to perform and the plan’s success was depending on their execution. Rocket let everyone perform it without monitoring or over controlling. He just trusted people would do their best to succeed. And they did. Rocket did his part as well. Everyone was part of it.

Backing the team up

When Gamora was in the vacuum, after her ship explosion, Star-Lord put himself in risk in order to save her. Ok, it might have been for love. But, in real life, members of a good agile team will always do small sacrifices for other members. For love? Maybe not. I don’t think there is a word to describe the feeling of being united and friend with the whole team to the point of doing small sacrifices.

Have you ever exchanged your chair because someone else’s was not comfortable or kind of broken? Have you worked on someone’s user story in order to finish it because the owner had personal issues to deal during the sprint? If you had answered yes for any of these questions or you can think of a similar experiences, so you’ve done small sacrifices.

Do you remember the “We are Groot” scene? That was a sacrifice. A very nice scene indeed. Very touchy. Well, you might think you will not save someone’s life at this point. But you can totally save someone’s day. On daily basis, these small acts count a lot. They are an important detail that makes the team united and raises the trust sense.

Also, it is very interesting how those sacrifices are taken by examples and encourage other members to sacrifice themselves too. It is very nice when it happens because you feel safe. You feel safe to fail, to expose your ideas and to perform better.

Communication

The members should always be communicating with each other. It can be through Campfire, Skype, HipChat, IRC or even in person. But the communication must happen, its lack can cause huge problems like when Groot, Rocket and Drax were going to save Gamora and Quill from Yondu.

Rocket tells to Yondu’s ship crew that he would blow the ship away in case they do not set Gamora and Quill free. Actually, they already had an agreement, but it was not informed. The situation went weird. Sometimes, specially in consulting, communication fails and either we feel or the customer feels threatened like Yondu felt. If people take it personally, you will have a situation.

A lot of software projects fail due to the lack of communication. People do not clearly understand what’s going on, expectations are not met and fights among members can happen. It might be a change in the requirements that nobody knew except for the guy developing the feature or it could be a phrase without any context, it does not matter what it is, the lack of communication will lead to low trust sense, lower collaboration and also contributes to a bad work environment.

Having fun together

There is another moment that I need to talk about. While Gamora and Star-Lord were talking with the Collector, the rest of the guys were in the bar drinking and having fun. That’s important for agile teams. The more they spend time outside the work hours, the more they know each other and the sense of trust raises.

The sense of trust is the primary factor that unite people. They become happy to work with each other, they trust in the work and that they will cover them back. They will know each other as better as Rocket knows Groot! Yeah, they know each other pretty well. When Groot says “I am Groot”, Rocket knows exactly what he meant, just like when you see a disapproval look from your team member and knows he is shy enough to not say anything at first, so you gently say “I am not sure about this, does someone have any thoughts?” and then he answers.

The final scene is what best describes the journey of agile teams. They finally reached their purpose. And how it happened? Well, Star-Lord starts this very interesting dancing act for Ronan. In the mean time, Rocket was able to build the weapon again and they could strike Ronan with it. That’s teamwork. Unfortunately, it was not enough to defeat Ronan. Drax and Rocket shoot the hammer and the Infinity Stone broke apart. Star-Lord grabs the Stone and then everyone helps him. Everyone was needed, otherwise Star-Lord would blow up!

When everyone is holding each other and Star-Lord is capable of controlling it, everyone – the whole team – feels part of their victory. Groot in pieces was as important as the others.

The same logic is applied for project teams. When one fail, the team fails, the team learns, and the team improves itself. No guilty, no finger pointing. It is about not making the same mistakes again. So, who wins when the project succeeds? The team. The team that in that moment can feel like they have saved the Galaxy.


Today we reach the end of a trip. Carlos Antonio, our first employee, is moving forward after 5 years at Plataformatec.

During this time, we have learned a lot, made some mistakes, grew together as people and as a company. In this blog post, we would like to share a bit of this story and what Carlos is leaving as legacy to us.

A story about Open Source

Carlos’ relationship with Plataformatec started with Open Source. When Carlos joined our team, back in August 2009, Plataformatec existed as a company only for 6 months and we were very excited to have him aboard. After all, he was the #1 employee.

At the time, the choice was really clear: Carlos had done many contributions to our only Open Source project at that point, Inherited Resources, and from his contributions we observed many of the traits that he exhibited throughout his stay at Plataformatec: tidy code, excellent communication skills and a great ability to evaluate trade-offs.

Being our first employee, Carlos helped us perpetuate those qualities as part of Plataformatec’s culture.

Focused

We often promote feedback between our team members and one word that was frequently used to describe Carlos is: focused.

Focus, on its own, can be a dangerous trait. Some people can focus and, as a consequence, lose the ability to understand how all components fit together. However, that was not Carlos’ case, as he aimed at having a holistic view of his environment and tasks, their causes and consequences. For example:

  • As the company grew, Carlos was always interested in understanding and participating in its diverse activities

  • At every project he worked, he was attentive to what makes the project important to the client. This often showed up when prioritizing features as Carlos would ask: “which one brings more value to the client?”

  • At the code level, Carlos sees how every piece of the application fits the whole design

Then, as a consequence of knowing how the involved components work, Carlos could always focus and reach a particular component to improve, fix or discuss it with precision.

Discussing trade-offs

My favorite quality in Carlos is his ability to understand and evaluate trade-offs.

When working together on a project, if I was unsure which solution would be the best to take, I could always call Carlos and have a quick discussion. I would expose the options I had evaluated so far, Carlos would help me expand them, maybe add some options of his own, and we would debate which one would suit best the current project.

In those discussions, I never had to worry about proposing something silly nor have I heard Carlos saying “we don’t do X”. If something was a bad fit, Carlos often guided me to understand why that was a bad option on my own.

Of course, this always worked both ways. When Carlos had to call someone to discuss a possible solution, he was always open to the team feedback, suggestions and so on.

Farewell?

Most of all, Carlos is a great Open Source partner.

After Carlos joined us, we started a great partnership towards Open Source. Carlos and I were the ones behind the conception of both Devise and Simple Form which would become the biggest Open Source projects from Plataformatec inside the Rails community.

Carlos is also a member of the Rails Core Team which means there are still plenty of opportunity for us to work together on Devise, Simple Form, and many other Open Source projects to come.

This is the end of a trip but definitely not the end of the journey.

We’ll see you around, Carlos!

Last month some amazing developers gave a sneak peek on how they write CSS in their companies – Mark Otto has written about CSS at GitHub, followed by Ian Feather’s post about Lonely Planet’s CSS and Chris Coyier write up about CodePen’s code – so I thought about sharing a bit of how we have been doing CSS in our projects here at Plataformatec over the last years.

After working on different projects with different styles of CSS, we wrote some Guidelines of what kind of code we would like to work with. These guidelines and some other practices have proven to be successful so far, and I want to tell you a bit about how we are doing this here.

Quick Facts

  • We use SCSS.
  • We usually just have Normalize.css (and sometimes Bootstrap) as third party dependencies.
  • Every developer in our team can jump through the codebase and work on our front end code.

Preprocessors and the Pipeline

We have always used Sass (with the SCSS syntax), using most of the Sass features wherever seemed fit and without making the code too complex to grasp so developers outside the project – or without a long experience with preprocessors – could get things done right after jumping in the code.

We do our best to use the most of the Rails Asset Pipeline. I know that it isn’t the most beloved Rails feature out there, but we are pretty happy with it. The Sprockets + Rails integration in Rails 4 is way better than it was before (you can read a bit about what was done on this post), thanks to the work of Guillermo, Rafael and Richard (the Sprocket heroes) and everybody else who contributed to sprockets-rails, and things will only get better on future releases. If you had a hard time with a Rails 3 app, I recommend that you try it out the latest releases and see what have improved.

Architecture

We don’t have strict guidelines about how we should organize and architect our CSS code, but we have some general rules and conventions. We organize most of our code into isolated partial stylesheets under something like modules or components. And we also break functions, mixins and generic placeholders into specific files and @import everything on the application application.css.scss file.

We do our best to keep our selectors small and using only classes, somewhat based on Wealthfront post on Functional CSS and OOCSS-ish. With this setup we can avoid complex nested blocks and keep things quite readable for everybody.

Linting

We don’t have a specific guideline on linting, but I have been experimenting with SCSS Lint on the project that I’m current working on, and I want to evolve this into a default configuration for future projects.

Frameworks

We have some different setup across our projects, but we usually just have Normalize.css as our “reset” stylesheet and everything else is custom made – buttons, grids, typography rules, etc. And of all the existing CSS frameworks out there, we had some encounters with Bootstrap and Foundation once.

Documentation

We love documentation – not obvious code comments, but real documentation that makes easier to understand how to use a specific piece of code.

We started writing docs for our CSS to make more sense out of a SCSS partial, so through a single file we can understand how the application can use those styles. We have adopted some loose form of the KSS syntax, because we currently don’t care about generating pretty styleguides with live examples of the styles in use. But the KSS format is human-readable enough and does the job of explaining the purpose of a set of classes to someone.

Sprites

Last year I created a gem called Spriteful to help us manage the image sprites in some projects without having to bring Compass and compass-rails as dependencies to our pipeline. Spriteful has proven useful to me and my coworkers, and now we can generate sprites and SCSS partials out of icons and SVG images with just a single command.

Some close friends from outside the company have used it on their projects and it’s awesome that someone else has found our little tool handy for this task.

Who’s in charge?

One important aspect of how we work is that every developer in the team is capable to work with the front end code of the application and build new things or fix existing bugs, not just turn some static markup into ERB blocks. This provides a higher sense of collective ownership of the code, instead of enlisting one or two programmers who are in charge of half of the application while everybody else works elsewhere without caring about that layer.


The article below was originally written by Kasper Timm Hansen (@kaspth on github & twitter) about his work during the Google Summer of Code 2013.

Kasper and I worked a lot changing the underlying implementation of the sanitize helper to give Rails developers a more robust, faster and secure solution to sanitize user input.

This new implementation should be fully backward compatible, with no changes to the API, which should make the update easier.

You can see more information about the previous and the new implementation on this talk I presented in a Brazillian conference this year (the slides are in English).

Now, I’ll let Kasper share his words with you.

Scrubbing Rails Free of HTML-scanner

Everyone, at least one time, has already needed to use the sanitize method to scrub some pesky HTML away.

<%= sanitize @article.body %>

If you were to run this on Rails 4.1 (and before) this would take advantage of the html-scanner, a vendored library inside Rails, for the sanitization. Since the summer of 2013 I have been working to destroy that notion by wiping the traces of html-scanner throughout Rails. Before you become concerned of my mental health, I didn’t do this unwarranted. I’m one of the Google Summer of Code students working on Ruby on Rails. My project proposal was to kick html-scanner to the curb (technical term) and grab a hold of Loofah instead. Why did the old library need replacing, though?

The out washed HTML-scanner

html-scanner has been with us for a long time now. The copyright notice in the library clocks it in at 2006, when Assaf Arkin created it. This library relies on Regular Expressions to recognize HTML (and XML) elements. This made the code more brittle. It was easier to introduce errors via complex Regular Expressions, which also gave it a higher potential for security issues.

The Rails Team wanted something more robust and faster, so we picked Loofah. Loofah uses Nokogiri for parsing, which provides a Ruby interface to either a C or Java parser depending on the Ruby implementation you use. This means Loofah is fast. It’s up to 60 to 100% faster than html-scanner on larger documents and fragments.

I started by taking a look at the SanitizeHelper in Action View, which consists of four methods and some settings. The four methods of the are sanitize, sanitize_css, strip_tags and strip_links.

Let’s take a look at the sanitize method.

Comparing with the old implementation, sanitize still uses the WhiteListSanitizer class to do it’s HTML stripping. However, since Action View was pulled out of Action Pack and both needed to use this functionality, we’ve extracted this to it’s own gem.

Developers meet Rails::Html::WhiteListSanitizer

When you use sanitize, you’re really using WhiteListSanitizer‘s sanitize method. Let me show you the new version.

def sanitize(html, options = {})
  return nil unless html
  return html if html.empty?

No surprises here.

  loofah_fragment = Loofah.fragment(html)

The first trace of Loofah. A fragment is a part of a document, but without a DOCTYPE declaration and html and body tags. A piece of a document essentially. Internally Nokogiri creates a document and pulls the parsed html out of the body tag, leaving us with a fragment.

  if scrubber = options[:scrubber]
    # No duck typing, Loofah ensures subclass of Loofah::Scrubber
    loofah_fragment.scrub!(scrubber)

You can pass your own Scrubber to sanitize! Giving you the power to choose if and how elements are sanitized. As the comment alludes, any scrubber has to be either a subclass of Loofah::Scrubber or it can wrap a block. I’ll show an example later.

  elsif allowed_tags(options) || allowed_attributes(options)
    @permit_scrubber.tags = allowed_tags(options)
    @permit_scrubber.attributes = allowed_attributes(options)
    loofah_fragment.scrub!(@permit_scrubber)

We have been very keen on maintaining backwards compatibility throughout this project, so you can still supply Enumerables of tags and attributes to sanitize. That’s what the PermitScrubber used here handles. It manages these options and makes them work independently. If you pass one it’ll use the standard behavior for the other. See the documentation on what the standard behavior is.
You can also set the allowed tags and attributes on the class level. Like this:

Rails::Html::Sanitizer.allowed_tags = Set.new %w(for your health)

That’s simply what allowed_tags and allowed_attributes methods are there for. They’ll return the tags or attributes from the options and fallback to the class level setting if any.

  else
    remove_xpaths(loofah_fragment, XPATHS_TO_REMOVE)
    loofah_fragment.scrub!(:strip)
  end

The StripScrubber built in to Loofah will strip the tags but leave the contents of elements. Which is usually what we want. We use remove_xpaths to remove elements along with their subtrees in the few instances where we don’t. If you have trouble with the syntax above, they’re XPath Selectors.

  loofah_fragment.to_s
end

Lastly we’ll take the elements and extract the remaining markup with to_s. Internally Nokogiri will call either to_xml or to_html depending on the kind of document or fragment you have.

Rub, buff or clean it off, however you like

So there you have it. I could go through how the other sanitizers work, but they’re not that complex. So go code spelunking in the source.

If this was the first time you’ve seen a Loofah::Scrubber, be sure to check out the source for PermitScrubber and see an example of how to implement one. You can also subclass PermitScrubber and get the sanitization you need without worrying about the implementation details of stripping elements and scrubbing attributes. Take a look at TargetScrubber – the weird PermitScrubber – and how it uses that to get scrubbing fast.

Before I scrub off though, I promised you an example of a custom scrubber. I’ll use the option that wraps a block here, but you could easily create a subclass of Loofah::Scrubber (in a helper maybe?) and override scrub(node). So here goes:

<%= sanitize @article.body,
  scrubber: Loofah::Scrubber.new { |node| node.name = "script" } %>

The code above changes all the HTML tags included in the article body to be a tag <script>.

<sarcasm>
If you’re going to introduce bugs, why not make everything a potential risk of running arbitrary code?
</sarcasm>


Rails 4 supports arrays fields for PostgreSQL in a nice way, although it is not a very known feature. In order to demonstrate its usage it’s useful to explain the context where this was used.

PostgreSQL Arrays and Rails Migrations

Suppose we have a Product model with the following fields: name, category_id and tags. The name field will be a simple string, category_id will be the foreign key of a record in the Category model and tags will be created by inputting a string of comma-separated words, so: “one, two, forty two” will become the tags: “one”, “two” and “forty two” respectively.

Creating these tables via migrations is nothing new, except for the column tags which will have the Array type in this case. To create this kind of column we use the following syntax in our migration:

create_table :categories do |t|
  t.string :name, null: false
end
 
create_table :products do |t|
  t.string :name, null: false
  t.references :category, null: false
  t.text :tags, array: true, default: []
end

Let’s explore what we can do with this kind of field using the postgres console:

$ rails db
> INSERT INTO products(name, category_id, tags) VALUES('T-Shirt', 3, '{clothing, summer}');
> INSERT INTO products(name, category_id, tags) VALUES('Sweater', 3, ARRAY['clothing', 'winter']);
> SELECT * FROM products;
1  |  T-Shirt  |  3  | {clothing, summer}
2  |  Sweater  |  3  | {clothing, winter}

As we can see we need to specify each tag following this syntax:

‘{ val1, val2, … }’ or ARRAY['val1', 'val2', ...]

Let’s play a little more to understand how this column behaves when queried:

> SELECT * FROM products WHERE tags = '{clothing, summer}';
1  |  T-Shirt  |  3  | {clothing, summer}
 
> SELECT * FROM products WHERE tags = '{summer, clothing}';
(0 rows)
 
> SELECT * FROM products WHERE 'winter' = ANY(tags);
2  |  Sweater  |  3  |  {clothing, winter}

As this example demonstrates, searching for records by an array with its values in the order they were inserted works, but with the same values in a different order does not. We were also able to find a record searching for a specific tag using the ANY function.

There’s a lot more to talk about arrays in PostgreSQL, but for our example this is enough. You can find more information at the PostgreSQL official documentation about arrays and its functions.

How Rails treats PostgreSQL arrays

It’s also valuable to see how to use the array field within Rails, let’s try:

$ rails c
 
Product.create(name: 'Shoes', category: Category.first, tags: ['a', 'b', 'c'])
#> 
 
Product.find(26).tags
#> ["a", "b", "c"]

So Rails treats an array column in PostgreSQL as an Array in Ruby, pretty reasonable!

Validations

We want each product to be unique, let’s see some examples to clarify this concept.

Given we have the following product:

Product.create(name: 'Shoes', category: Category.first, tags: ['a', 'b', 'c'])

We can easily create another one if we change the name attribute:

Product.create(name: 'Slippers', category: Category.first, tags: ['a', 'b', 'c'])

We can also create another product with different tags:

Product.create(name: 'Shoes', category: Category.first, tags: ['a', 'b'])

But we don’t want to create a product with the same attributes, even if the tags are in a different order:

Product.create(name: 'Shoes', category: Category.first, tags: ['a', 'c', 'b'])
#> false

As PostgreSQL only finds records by tags given the exact order in which they were inserted, then how can we ensure the uniqueness of a product with tags in an order-independent way?

After much thought we decided that a good approach would involve creating an unique index with all the columns in the products table but with tags sorted when a row is inserted in the database. Something like:

CREATE UNIQUE INDEX index_products_on_category_id_and_name_and_tags
ON products USING btree (category_id, name, sort_array(tags));

And sort_array is our custom function responsible for sorting the array, since PostgreSQL does not have a built in function like this.

Creating a custom function in PostgreSQL using PL/pgSQL

To create a custom function we used the PL/pgSQL language, and since we are adding database specific code like this we can’t use the default schema.rb anymore. Let’s change this in config/application.rb:

# Use SQL instead of AR schema dumper when creating the database
config.active_record.schema_format = :sql

With this configuration set, our schema.rb file will be replaced by a structure.sql file without side effects, our current migrations don’t need to be changed at all. Now we can create a migration with our sort_array code:

def up
  execute <<-SQL
    CREATE FUNCTION sort_array(unsorted_array anyarray) RETURNS anyarray AS $$
      BEGIN
        RETURN (SELECT ARRAY_AGG(val) AS sorted_array
        FROM (SELECT UNNEST(unsorted_array) AS val ORDER BY val) AS sorted_vals);
      END;
    $$ LANGUAGE plpgsql IMMUTABLE STRICT;
 
    CREATE UNIQUE INDEX index_products_on_category_id_and_name_and_tags ON products USING btree (category_id, name, sort_array(tags));
  SQL
end
 
def down
  execute <<-SQL
    DROP INDEX IF EXISTS index_products_on_category_id_and_name_and_tags;
    DROP FUNCTION IF EXISTS sort_array(unsorted_array anyarray);
  SQL
end

Now, let’s take it slow and understand step by step

CREATE FUNCTION sort_array(unsorted_array anyarray) RETURNS anyarray

The line above tells that we are creating a function named sort_array and that it receives a parameter named unsorted_array of type anyarray and returns something of this same type. This anyarray, in fact, is a pseudo-type that indicates that a function accepts any array data type.

RETURN (SELECT ARRAY_AGG(val) AS sorted_array
FROM (SELECT UNNEST(unsorted_array) AS val ORDER BY val) AS sorted_vals);

The trick here is the use of the function unnest that expands an Array to a set of rows. Now we can order these rows and after that we use another function called array_agg that concatenates the input into a new Array.

$$ LANGUAGE plpgsql IMMUTABLE STRICT;

The last trick is the use of the keywords IMMUTABLE  and STRICT. With the first one we guarantee that our function will always return the same output given the same input, we can’t use it in our index if we don’t specify so. The other one tells that our function will always return null if some of the parameters are not specified.

And that’s it! With this we can check for uniqueness in a performant way with some method like:

def duplicate_product_exists?
  relation = self.class.
    where(category_id: category_id).
    where('lower(name) = lower(?)', name).
    where('sort_array(tags) = sort_array(ARRAY[?])', tags)
 
  relation = relation.where.not(id: id) if persisted?
 
  relation.exists?
end

Case insensitive arrays

There is still a problem with our code though, the index is not case insensitive!  What if a user inserts a product with tags ['a', 'b'] and another one inserts the same product but with tags ['A', 'b']? Now we have duplication in our database! We have to deal with this, but unfortunately this will increase the complexity of our sort_array function a little bit. To fix this problem we only need to change one single line:

From this:

FROM (SELECT UNNEST(unsorted_array) AS val ORDER BY val) AS sorted_vals);

To:

FROM
(SELECT
  UNNEST(string_to_array(LOWER(array_to_string(unsorted_array, ',')), ','))
  AS val ORDER BY val)
AS sorted_vals);

The difference is that instead of passing unsorted_array directly to the function unnest we are transforming it in an String, calling lower on it and transforming it back to an Array before passing it on. With this change it doesn’t matter if the user inserts ['a'] or ['A'], every tag will be saved in lowercase in the index. Problem solved!

As we can see, it’s not an easy task to deal with uniqueness and arrays in the database, but the overall result was great.

Would you solve this problem in a different way? Share with us!