{"id":5939,"date":"2016-12-07T17:02:29","date_gmt":"2016-12-07T19:02:29","guid":{"rendered":"http:\/\/blog.plataformatec.com.br\/?p=5939"},"modified":"2017-01-16T19:06:54","modified_gmt":"2017-01-16T21:06:54","slug":"many-to-many-and-upserts","status":"publish","type":"post","link":"https:\/\/blog.plataformatec.com.br\/2016\/12\/many-to-many-and-upserts\/","title":{"rendered":"Many to many and upserts"},"content":{"rendered":"<div style=\"border:solid 1px #FFE041; background-color:#FFF9CA; padding:20px; font:italic 13px Gotham, Helvetica Neue, Helvetica, Arial,' sans-serif';\">Note: This is a sample chapter <del datetime=\"2017-01-16T21:04:33+00:00\">of the upcoming beta version<\/del> of our &#8220;What&#8217;s new in Ecto 2.0&#8221; free book. <a href=\"http:\/\/pages.plataformatec.com.br\/ebook-whats-new-in-ecto-2-0\"><del datetime=\"2017-01-16T21:05:26+00:00\">Reserve<\/del> Download your copy now<\/a> <del datetime=\"2017-01-16T21:05:26+00:00\">if you want to receive the next beta and be notified of future versions<\/del>.<\/div>\n<p>In the previous chapter we have learned about <code>many_to_many<\/code> associations and how to map external data to associated entries with the help of <code>Ecto.Changeset.cast_assoc\/3<\/code>. While in the previous chapter we were able to follow the rules imposed by <code>cast_assoc\/3<\/code>, doing so is not always possible nor desired.<\/p>\n<p>In this chapter we are going to look at <code>Ecto.Changeset.put_assoc\/4<\/code> in contrast to <code>cast_assoc\/3<\/code> and explore some examples. We will also peek at the upsert features coming in Ecto 2.1.<\/p>\n<h2>put_assoc vs cast_assoc<\/h2>\n<p>Imagine we are building an application that has blog posts and such posts may have many tags. Not only that, a given tag may also belong to many posts. This is a classic scenario where we would use <code>many_to_many<\/code> associations. Our migrations would look like:<\/p>\n<pre><code class=\"elixir\">create table(:posts) do\n  add :title\n  add :body\n  timestamps()\nend\n\ncreate table(:tags) do\n  add :name\n  timestamps()\nend\n\ncreate unique_index(:tags, [:name])\n\ncreate table(:posts_tags, primary_key: false) do\n  add :post_id, references(:posts)\n  add :tag_id, references(:tags)\nend\n<\/code><\/pre>\n<p>Note we added a unique index to the tag name because we don&#8217;t want to have duplicated tags in our database. It is important to add an index at the database level instead of using a validation since there is always a chance two tags with the same name would be validated and inserted simultaneously, passing the validation and leading to duplicated entries.<\/p>\n<p>Now let&#8217;s also imagine we want the user input such tags as a list of words split by comma, such as: &#8220;elixir, erlang, ecto&#8221;. Once this data is received in the server, we will break it apart into multiple tags and associate them to the post, creating any tag that does not yet exist in the database.<\/p>\n<p>While the constraints above sound reasonable, that&#8217;s exactly what put us in trouble with <code>cast_assoc\/3<\/code>. Remember the <code>cast_assoc\/3<\/code> changeset function was designed to receive external parameters and compare them with the associated data in our structs. To do so correctly, Ecto requires tags to be sent as a list of maps. However here we expect tags to be sent in a string separated by comma.<\/p>\n<p>Furthermore <code>cast_assoc\/3<\/code> relies on the primary key field for each tag sent in order to decide if it should be inserted, updated or deleted. Again, because the user is simply passing a string, we don&#8217;t have the ID information at hand.<\/p>\n<p>When we can&#8217;t cope with <code>cast_assoc\/3<\/code>, it is time to use <code>put_assoc\/4<\/code>. In <code>put_assoc\/4<\/code>, we give Ecto structs or changesets instead of parameters, giving us the ability to manipulate the data as we want. Let&#8217;s define the schema and the changeset function for a post which may receive tags as a string:<\/p>\n<pre><code class=\"elixir\">defmodule MyApp.Post do\n  use Ecto.Schema\n\n  schema \"posts\" do\n    field :title\n    field :body\n    many_to_many :tags, MyApp.Tag, join_through: \"posts_tags\"\n    timestamps()\n  end\n\n  def changeset(struct, params \\\\ %{}) do\n    struct\n    |&gt; Ecto.Changeset.cast(params, [:title, :body])\n    |&gt; Ecto.Changeset.put_assoc(:tags, parse_tags(params))\n  end\n\n  defp parse_tags(params)  do\n    (params[\"tags\"] || \"\")\n    |&gt; String.split(\",\")\n    |&gt; Enum.map(&amp;String.trim\/1)\n    |&gt; Enum.reject(&amp; &amp;1 == \"\")\n    |&gt; Enum.map(&amp;get_or_insert_tag\/1)\n  end\n\n  defp get_or_insert_tag(name) do\n    Repo.get_by(MyApp.Tag, name: name) ||\n      Repo.insert!(MyApp.Tag, %Tag{name: name})\n  end\nend\n<\/code><\/pre>\n<p>In the changeset function above, we moved all the handling of tags to a separate function, called <code>parse_tags\/1<\/code>, which checks for the parameter, breaks its entries apart via <code>String.split\/2<\/code>, then removes any left over whitespace with <code>String.trim\/1<\/code>, rejects any empty string and finally checks if the tag exists in the database or not, creating one in case none exists.<\/p>\n<p>The <code>parse_tags\/1<\/code> function is going to return a list of <code>MyApp.Tag<\/code> structs which are then passed to <code>put_assoc\/3<\/code>. By calling <code>put_assoc\/3<\/code>, we are telling Ecto those should be the tags associated to the post from now on. In case a previous tag was associated to the post and not given in <code>put_assoc\/3<\/code>, Ecto will also take care of removing the association between the post and the removed tag from the database.<\/p>\n<p>And that&#8217;s all we need to use <code>many_to_many<\/code> associations with <code>put_assoc\/3<\/code>. <code>put_assoc\/3<\/code> works with <code>has_many<\/code>, <code>belongs_to<\/code> and all others association types. However, our code is not yet ready for production. Let&#8217;s see why.<\/p>\n<h2>Constraints and race conditions<\/h2>\n<p>Remember we added a unique index to the tag <code>:name<\/code> column when creating the tags table. We did so to protect us from having duplicate tags in the database.<\/p>\n<p>By adding the unique index and then using <code>get_by<\/code>  with a <code>insert!<\/code> to get or insert a tag, we introduced a potential error in our application. If two posts are submitted at the same time with a similar tag, there is a chance we will check if the tag exists at the same time, leading both submissions to believe there is no such tag in the database. When that happens, only one of the submissions will succeed while the other one will fail. That&#8217;s a race condition: your code will error from time to time, only when certain conditions are met. And those conditions are time sensitive.<\/p>\n<p>Many developers have a tendency to think such errors won&#8217;t happen in practice or, if they happened, they would be irrelevant. But in practice they often lead to very frustrating user experiences. I have heard a first-hand example coming from a mobile game company. In the game, a player is able to play quests and on every quest you have to choose a guest character from another player out of a short list to go on the quest with you. At the end of the quest, you have the option to add the guest character as a friend.<\/p>\n<p>Originally the whole guest list was random but, as time passed, players started to complain sometimes old accounts, often inactive, were being shown in the guests options list. To improve the situation, the game developers started to sort the guest list by most recently active. This means that, if you have just played recently, there is a higher chance of you to be on someone guest lists.<\/p>\n<p>However, when they did such change, many errors started to show up and users were suddenly furious in the game forum. That&#8217;s because when they sorted players by activity, as soon two players logged in, their characters would likely appear on each others guest list. If those players picked each others characters, the first to add the other as friend at the end of a quest would be able to succeed but an error would appear when the second player tried to add that character as a friend since the relationship already existed in the database! Not only that, all the progress done in the quest would be lost, because the server was unable to properly persist the quest results to the database. Understandably, players started to file complaints.<\/p>\n<p>Long story short: we must address the race condition.<\/p>\n<p>Luckily Ecto gives us a mechanism to handle constraint errors from the database.<\/p>\n<h2>Checking for constraint errors<\/h2>\n<p>Since our <code>get_or_insert_tag(name)<\/code> function fails when a tag already exists in the database, we need to handle such scenarios accordingly. Let&#8217;s rewrite it taking race conditions into account in mind:<\/p>\n<pre><code class=\"elixir\">defp get_or_insert_tag(name) do\n  %Tag{}\n  |&gt; Ecto.Changeset.change(name: name)\n  |&gt; Ecto.Changeset.unique_constraint(:name)\n  |&gt; Repo.insert\n  |&gt; case do\n    {:ok, tag} -&gt; tag\n    {:error, _} -&gt; Repo.get_by!(MyApp.Tag, name: name)\n  end\nend\n<\/code><\/pre>\n<p>Instead of inserting the tag directly, we know build a changeset, which allows us to use the <code>unique_constraint<\/code> annotation. Now if the <code>Repo.insert<\/code> operation fails because the unique index for <code>:name<\/code> is violated, Ecto won&#8217;t raise, but return an <code>{:error, changeset}<\/code> tuple. Therefore, if the <code>Repo.insert<\/code> succeeds, it is because the tag was saved, otherwise the tag already exists, which we then fetch with <code>Repo.get_by!<\/code>.<\/p>\n<p>While the mechanism above fixes the race condition, it is a quite expensive one: we need to perform two queries for every tag that already exists in the database: the (failed) insert and then the repository lookup. Given that&#8217;s the most common scenario, we may want to rewrite it to the following:<\/p>\n<pre><code class=\"elixir\">defp get_or_insert_tag(name) do\n  Repo.get_by(MyApp.Tag, name: name) || maybe_insert_tag(name)\nend\n\ndefp maybe_insert_tag(name) do\n  %Tag{}\n  |&gt; Ecto.Changeset.change(name: name)\n  |&gt; Ecto.Changeset.unique_constraint(:name)\n  |&gt; Repo.insert\n  |&gt; case do\n    {:ok, tag} -&gt; tag\n    {:error, _} -&gt; Repo.get_by!(MyApp.Tag, name: name)\n  end\nend\n<\/code><\/pre>\n<p>The above performs 1 query for every tag that already exists, 2 queries for every new tag and possibly 3 queries in the case of race conditions. While the above would perform slightly better on average, Ecto 2.1 has a better option in stock.<\/p>\n<h2>Upserts<\/h2>\n<p>Ecto 2.1 supports the so-called &#8220;upsert&#8221; command which is an abbreviation for &#8220;update or insert&#8221;. The idea is that we try to insert a record and in case it conflicts with an existing entry, for example due to a unique index, we can choose how we want the database to act by either raising an error (the default behaviour), ignoring the insert (no error) or by updating the conflicting database entries.<\/p>\n<p>&#8220;upsert&#8221; in Ecto 2.1 is done with the <code>:on_conflict<\/code> option. Let&#8217;s rewrite <code>get_or_insert_tag(name)<\/code> once more but this time using the <code>:on_conflict<\/code> option. Remember that &#8220;upsert&#8221; is a new feature in PostgreSQL 9.5, so make sure you are up to date.<\/p>\n<p>Your first try in using <code>:on_conflict<\/code> may be by setting it to <code>:nothing<\/code>, as below:<\/p>\n<pre><code class=\"elixir\">defp get_or_insert_tag(name) do\n  Repo.insert!(%MyApp.Tag{name: name}, on_conflict: :nothing)\nend\n<\/code><\/pre>\n<p>While the above won&#8217;t raise an error in case of conflicts, it also won&#8217;t update the struct given, so it will return a tag without ID. One solution is to force an update to happen in case of conflicts, even if the update is about setting the tag name to its current name. In such cases, PostgreSQL also requires the <code>:conflict_target<\/code> option to be given, which is the column (or a list of columns) we are expecting the conflict to happen:<\/p>\n<pre><code class=\"elixir\">defp get_or_insert_tag(name) do\n  Repo.insert!(%MyApp.Tag{name: name},\n               on_conflict: [set: [name: name]], conflict_target: :name)\nend\n<\/code><\/pre>\n<p>And that&#8217;s it! We try to insert a tag with the given name and if such tag already exists, we tell Ecto to update its name to the current value, updating the tag and fetching its id. While the above is certainly a step up from all solutions so far, it still performs one query per tag. If 10 tags are sent, we will perform 10 queries. Can we further improve this?<\/p>\n<h2>Upserts and insert_all<\/h2>\n<p>Ecto 2.1 did not only add the <code>:on_conflict<\/code> option to <code>Repo.insert\/2<\/code> but also to the <code>Repo.insert_all\/3<\/code> function introduced in Ecto 2.0. This means we can build one query that attempts to insert all missing tags and then another query that fetches all of them at once. Let&#8217;s see how our <code>Post<\/code> schema will look like after those changes:<\/p>\n<pre><code class=\"elixir\">defmodule MyApp.Post do\n  use Ecto.Schema\n\n  # Schema is the same\n  schema \"posts\" do\n    field :title\n    field :body\n    many_to_many :tags, MyApp.Tag, join_through: \"posts_tags\"\n    timestamps()\n  end\n\n  # Changeset is the same\n  def changeset(struct, params \\\\ %{}) do\n    struct\n    |&gt; Ecto.Changeset.cast(params, [:title, :body])\n    |&gt; Ecto.Changeset.put_assoc(:tags, parse_tags(params))\n  end\n\n  # Parse tags has slightly changed\n  defp parse_tags(params)  do\n    (params[\"tags\"] || \"\")\n    |&gt; String.split(\",\")\n    |&gt; Enum.map(&amp;String.trim\/1)\n    |&gt; Enum.reject(&amp; &amp;1 == \"\")\n    |&gt; insert_and_get_all()\n  end\n\n  defp insert_and_get_all([]) do\n    []\n  end\n  defp insert_and_get_all(names) do\n    maps = Enum.map(names, &amp;%{name: &amp;1})\n    Repo.insert_all MyApp.Tag, maps, on_conflict: :nothing\n    Repo.all(from t in MyApp.Tag, where: t.name in ^names)\n  end\nend\n<\/code><\/pre>\n<p>Instead of attempting to get and insert each tag individually, the code above work on all tags at once, first by building a list of maps which is given to <code>insert_all<\/code> and then by looking up all tags with the existing names. Therefore, regardless of how many tags are sent, we will perform only 2 queries (unless no tag is sent, in which we return an empty list back promptly). This solution is only possible in Ecto 2.1 thanks to the <code>:on_conflict<\/code> option, which guarantees <code>insert_all<\/code> won&#8217;t fail in case a given tag name already exists.<\/p>\n<p>Finally, keep in mind that we haven&#8217;t used transactions in any of the examples so far. Such decision was deliberate. Since getting or inserting tags is an idempotent operation, i.e. we can repeat it many times and it will always give us the same result back. Therefore, even if we fail to introduce the post to the database due to a validation error, the user will be free to resubmit the form and we will just attempt to get or insert the same tags once again. The downside of this approach is that tags will be created even if creating the post fails, which means some tags may not have posts associated to them. In case that&#8217;s not desired, the whole operation could be wrapped in a transaction or modeled with the <a href=\"https:\/\/hexdocs.pm\/ecto\/Ecto.Multi.html\"><code>Ecto.Multi<\/code><\/a> abstraction we will learn about in future chapters.<\/p>\n<hr style=\"margin-top:30px\" \/>\n<p><a href=\"http:\/\/pages.plataformatec.com.br\/ebook-whats-new-in-ecto-2-0?utm_source=our-blog&#038;utm_medium=referral&#038;utm_campaign=ebook-ecto-2-0&#038;utm_content=cta-blog-post-bottom\" target=\"_blank\"><br \/>\n<img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/blog.plataformatec.com.br\/wp-content\/uploads\/2016\/12\/CTA-blog-ebook-ecto-2-0.jpg\" alt=\"What&#039;s new in Ecto 2.0 -- Download your copy\" width=\"831\" height=\"147\" class=\"aligncenter size-full wp-image-5997\" srcset=\"https:\/\/blog.plataformatec.com.br\/wp-content\/uploads\/2016\/12\/CTA-blog-ebook-ecto-2-0.jpg 831w, https:\/\/blog.plataformatec.com.br\/wp-content\/uploads\/2016\/12\/CTA-blog-ebook-ecto-2-0-300x53.jpg 300w, https:\/\/blog.plataformatec.com.br\/wp-content\/uploads\/2016\/12\/CTA-blog-ebook-ecto-2-0-768x136.jpg 768w\" sizes=\"(max-width: 831px) 100vw, 831px\" \/><br \/>\n<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Note: This is a sample chapter of the upcoming beta version of our &#8220;What&#8217;s new in Ecto 2.0&#8221; free book. Reserve Download your copy now if you want to receive the next beta and be notified of future versions. In the previous chapter we have learned about many_to_many associations and how to map external data &#8230; <a class=\"read-more-link\" href=\"https:\/\/blog.plataformatec.com.br\/2016\/12\/many-to-many-and-upserts\/\">\u00bb<\/a><\/p>\n","protected":false},"author":4,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"ngg_post_thumbnail":0,"footnotes":""},"categories":[1],"tags":[238,143],"aioseo_notices":[],"jetpack_sharing_enabled":true,"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/blog.plataformatec.com.br\/wp-json\/wp\/v2\/posts\/5939"}],"collection":[{"href":"https:\/\/blog.plataformatec.com.br\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.plataformatec.com.br\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.plataformatec.com.br\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.plataformatec.com.br\/wp-json\/wp\/v2\/comments?post=5939"}],"version-history":[{"count":16,"href":"https:\/\/blog.plataformatec.com.br\/wp-json\/wp\/v2\/posts\/5939\/revisions"}],"predecessor-version":[{"id":6025,"href":"https:\/\/blog.plataformatec.com.br\/wp-json\/wp\/v2\/posts\/5939\/revisions\/6025"}],"wp:attachment":[{"href":"https:\/\/blog.plataformatec.com.br\/wp-json\/wp\/v2\/media?parent=5939"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.plataformatec.com.br\/wp-json\/wp\/v2\/categories?post=5939"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.plataformatec.com.br\/wp-json\/wp\/v2\/tags?post=5939"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}