Ecto’s insert_all and schemaless queries

One of the functions added to Ecto 2.0 is Ecto.Repo.insert_all/3. insert_all allows developers to insert multiple entries at once into a repository:

MyApp.Repo.insert_all(Post, [[title: "hello", body: "world"],
                             [title: "another", body: "post"]])

Although insert_all is just a regular Elixir function, it plays an important role in Ecto 2.0 goals. To understand more about these goals, let’s talk about Ecto schemas.

The trouble with schemas

Conceptually speaking, Ecto 2.0 is quite different from Ecto 1.0 as it moves to a more data-oriented approach. We want developers to think of Ecto as a tool instead of their domain layer. One important decision in this direction was the removal of Ecto.Model in favor of Ecto.Schema.

At this point, it is worth asking: what are schemas?

Ecto schemas are used to map any data source into an Elixir struct. Schemas are useful because they give shape to external data and enforce its types:

defmodule Post do
  use Ecto.Schema

  schema "posts" do
    field :title
    field :body
    field :votes, :integer, default: 0
    timestamps
  end
end

One possible benefit of using schemas is that you define the shape of the data once and you can use this shape to retrieve data from the database as well as coordinate changes happening on the data. For example:

params = [title: "new title", votes: "0"]

Post
|> MyApp.Repo.get!(13)
|> Ecto.Changeset.cast(params, [:title, :votes])
|> MyApp.Repo.update!

By relying on the schema information, Ecto knows the shape of the data when it reads from the database and know how to manage changes. In the example above, the “votes” field was automatically cast from string to an integer based on its schema type.

While the benefits of schemas are known, we don’t talk as frequently about the downsides of schemas: which is exactly the coupling of the database representation to your application, leading developers to represent both reads and writes operations on top of the same structure.

With schemaless queries, we get direct access to all underlying database operations, allowing us to perform both reads and writes operations without being coupled to a schema.

Schemaless queries

Ecto 2.0 allows read, create, update and delete operations to be done without a schema. insert_all was the last piece of the puzzle. Let’s see some examples.

If you are writing a reporting view, it may be counter-productive to think how your existing application schemas relate to the report being generated. It is often simpler to write a query that returns only the data you need, without taking schemas into account:

import Ecto.Query

def running_activities(start_at, end_at)
  MyApp.Repo.all(
    from u in "users",
      join: a in "activities",
      on: a.user_id == u.id,
      where: a.start_at > type(^start_at, Ecto.DateTime) and
             a.end_at < type(^end_at, Ecto.DateTime),
      group_by: a.user_id,
      select: %{user_id: a.user_id, interval: a.start_at - a.end_at, count: count(u.id)}
  )
end

The function above does not care about your schemas. It returns only the data that matters for building the report. Notice how we use the type/2 function to specify what is the expected type of the argument we are interpolating, allowing us to benefit from the same type casting guarantees a schema would give.

Inserts, updates and deletes can also be done without schemas via insert_all, update_all and delete_all respectively:

# Insert data into posts and return its ID
[%{id: id}] =
  MyApp.Repo.insert_all "posts", [[title: "hello"]], returning: [:id]

# Use the ID to trigger updates
post = from p in "posts", where: p.id == ^id
{1, _} = MyApp.Repo.update_all post, set: [title: "new title"]

# As well as for deletes
{1, _} = MyApp.Repo.delete_all post

It is not hard to see how these operations directly map to their SQL variants, keeping the database at your fingertips without the need to intermediate all operations through schemas.

Schemas are mappers

When we defined schemas above, we said:

Ecto schemas are used to map any data source into an Elixir struct.

We put emphasis on any because it is a common misconception to think Ecto schemas map only to your database tables.

For instance, when you write a web application using Phoenix and you use Ecto to receive external changes and apply such changes to your database, we are actually mapping the schema to two different sources:

Database <-> Ecto schema <-> Forms / API

It is important to understand how the schema is sitting between your database and your API because in many situations it is better to break this mapping in two. Let’s see some practical examples.

Imagine you are working with a client that wants the “Sign Up” form to contain the fields “First name”, “Last name” along side “E-mail” and other information. You know there are a couple problems with this approach.

First of all, not everyone has a first and last name. Although your client is decided on presenting both fields, they are a UI concern, and you don’t want the UI to dictate the shape of your data. Furthermore, you know it would be useful to break the “Sign Up” information across two tables, the “accounts” and “profiles” tables.

Given the requirements above, how would we implement the Sign Up feature in the backend?

One approach would be to have two schemas, Account and Profile, with virtual fields such as first_name and last_name, and use associations along side nested forms to tie the schemas to your UI. One of such schemas would be:

defmodule Profile do
  use Ecto.Schema

  schema "profiles" do
    field :name
    field :first_name, :string, virtual: true
    field :last_name, :string, virtual: true
    ...
  end
end

It is not hard to see how we are polluting our Profile schema with UI requirements by adding fields such first_name and last_name. If the Profile schema is used for both reading and writing data, it may end-up in an awkward place where it is not useful for any, as it contains fields that map just to one or the other operation.

One alternative solution is to break the “Database Ecto schema Forms / API” mapping in two parts. The first will cast and validate the external data with its own structure which you then transform and write to the database. For such, let’s define a schema named Registration that will take care of casting and validating the form data exclusively, mapping directly to the UI fields:

defmodule Registration do
  use Ecto.Schema

  embedded_schema do
    field :first_name
    field :last_name
    field :email
  end
end

We used embedded_schema because it is not our intent to persist it anywhere. With the schema in hand, we can use Ecto changesets and validations to process the data:

fields = [:first_name, :last_name, :email]

changeset =
  %Registration{}
  |> Ecto.Changeset.cast(params["sign_up"], fields)
  |> validate_required(...)
  |> validate_length(...)

Now that the registration changes are mapped and validated, we can check if the resulting changeset is valid and act accordingly:

if changeset.valid? do
  # Get the modified registration struct out of the changeset
  registration = Ecto.Changeset.apply_changes(changeset)

  MyApp.Repo.transaction fn ->
    MyApp.Repo.insert_all "accounts", Registration.to_account(registration)
    MyApp.Repo.insert_all "profiles", Registration.to_profile(registration)
  end

  {:ok, registration}
else
  # Annotate the action we tried to perform so the UI shows errors
  changeset = %{changeset | action: :registration}
  {:error, changeset}
end

The to_account/1 and to_profile/1 functions in Registration would receive the registration struct and split the attributes apart accordingly:

def to_account(registration) do
  Map.take(registration, [:email])
end

def to_profile(%{first_name: first, last_name: last}) do
  %{name: "#{first} #{last}"}
end

In the example above, by breaking apart the mapping between the database and Elixir and between Elixir and the UI, our code becomes clearer and our data-structures simpler.

Note we have used MyApp.Repo.insert_all/2 to add data to both “accounts” and “profiles” tables directly. We have chosen to bypass schemas altogether. However, there is nothing stopping you from also defining both Account and Profile schemas and changing to_account/1 and to_profile/1 to respectively return %Account{} and %Profile{} structs. Once structs are returned, they could be inserted through the usual Repo.insert/2 operation.

Similarly, we chose to define a Registration schema to use in the changeset but Ecto 2.0 also allows developers to use changesets without schemas. We can dynamically define the data and their types. Let’s rewrite the registration changeset above to bypass schemas:

data  = %{}
types = %{first_name: :string, last_name: :string, email: :string}

changeset =
  {data, types} # The data+types tuple is equivalent to %Registration{}
  |> Ecto.Changeset.cast(params["sign_up"], Map.keys(types))
  |> validate_required(...)
  |> validate_length(...)

You can use this technique to validate API endpoints, search forms, and other sources of data. The choice of using schemas depends mostly if you want to use the same mapping in different places and/or if you desire the compile-time guarantees Elixir structs gives you. Otherwise, you can bypass schemas altogether, be it when using changesets or interacting with the repository.

Summary

Ecto 2.0 introduces insert_all that directly inserts data into a given table. insert_all, alongside all, update_all and delete_all, allows developers to work closer to the database without the need for using schemas.

Such possibilities make Ecto 2.0 a substantial departure from earlier versions, as developers can focus more on how their data map to different domains, like the database and the UI, relying on Ecto as a tool to interact with each of those domains in the best possible way.

This article was extracted from an ebook ~~we’re writing~~ we’ve written on What’s new in Ecto 2.0. Click here to receive a copy of the ebook now ~~when it’s ready~~.

12 responses to “Ecto’s insert_all and schemaless queries”

solnic says:

May 5, 2016 at 11:25 am

Congratulations on the amazing progress with Ecto 2.0. Seeing this changes makes me very happy.
josevalim says:

May 5, 2016 at 12:32 pm

Your input played a big part in this, so thank you for the great feedback. 🙂
Andrea Rossi says:

May 5, 2016 at 1:29 pm

I’m amazed at how fantastically decoupled Ecto feels in all its aspect, especially in regards to databases. The whole “use it to map stuff to whatever you want” concept is truly exceptional. Fantastic work, and excited to follow its development! 👏🏻
solnic says:

May 6, 2016 at 4:52 am

Awesome to hear 🙂 We should collaborate more, there are *lots* of similarities between Ecto 2.0 and rom-rb projects. Right now I’m interested in introducing Changeset API similar to what you have, just not sure where it will fit in the stack yet (closer to persistence, or closer to domain layer, or maybe actually both?). I’m also curious to look deeper into how you implemented validations, esp. wrt ditching classic uniqueness checks in favor of constraints-only approach (which is what I did as well *initially* in rom-rb and then we decided to remove it and use classic approach, but maybe we should revisit).

Keep up the good work, can’t wait to use Ecto 2.0 in the real world now.
josevalim says:

May 6, 2016 at 6:20 am

> Right now I’m interested in introducing Changeset API similar to what you have, just not sure where it will fit in the stack yet

I believe changesets will actually be simpler to implement in Ruby in a way it can map to both parts of the stack. Today changesets in Ecto has both, it has all of the functionality for working on the domain layer (casting, validations, etc) as well as functions for working on the persistence layer (for example, constraints).

Ideally, those two should be kept apart but this would mean having two modules in Ecto: Ecto.Changeset and, let’s say, “Ecto.Persistenceset”. I decided to not do such to avoid splitting the functions in two different modules since it has its own costs in maintenance, discoverability, etc.

In Ruby, because of duck typing, you can easily implement both. The persistence one may inherit all of the functionality from the regular changeset.

Other than that, the changeset contract in Ecto is very simple: it just calls a `__changeset__` function in the struct module which returns its types. The data plus the types are all you need.

> I’m also curious to look deeper into how you implemented validations, esp. wrt ditching classic uniqueness checks in favor of constraints-only approach

In Ecto it works by storing constraints in the changeset. I think it helps a lot that the same module that provides validations in Ecto is the one that provides constraints (see comments above).

Implementation wise, we store in the changeset the constraint name you expect to fail, the type of the constraint (unique, check, etc) and the message we’d add as error in case the constraint is violated. In case the database operation fails due to a constraint, the adapter converts the error into data:

https://github.com/elixir-lang/ecto/blob/37f9a05c3cb5e48d10d1a5c55bcaeaaa18b1c077/lib/ecto/adapters/postgres/connection.ex#L28-L35

Then Ecto checks if the violated constraint was declared in the changeset and if so, add its error message to `changeset.errors`. In case the violated constraint does not belong to the changeset, we raise a nice error message:

https://github.com/elixir-lang/ecto/blob/37f9a05c3cb5e48d10d1a5c55bcaeaaa18b1c077/lib/ecto/adapters/postgres/connection.ex#L37-L61

There are a couple things to consider. First, not all databases make it easy to parse the reason the constraint failed. For example, in Postgres 9.2 and earlier, you need to parse error messages:

https://github.com/elixir-lang/ecto/blob/37f9a05c3cb5e48d10d1a5c55bcaeaaa18b1c077/lib/ecto/adapters/postgres/connection.ex#L37-L61

Databases like SQLite3 do not provide this information at all. You basically can’t implement this functionality there (afaik). Finally, you need to be careful with transactions, if an operation like insert or update fails inside a transaction due to constraints, all other operations will error. Ecto has a quite explicit approach to transactions, so that’s less of an issue, but depending on how developers expect transactions to work it may be surprising.

I hope this give some initial direction on how those features are implemented in Ecto but you are welcome to ping me any time!
solnic says:

May 16, 2016 at 7:25 am

Thanks for reply! I feel like we should revisit db constraint error handling. Our conclusion was that swallowing low level db errors and presenting them as high-level errors will cause more damage than give benefits, so it’s gonna be interesting to see how it’s gonna work for you with Ecto. Looking forward to hearing more about real world usage of this feature.
Ernesto Cambustón says:

May 21, 2016 at 8:52 pm

I was playing with schemaless queries. I have a requirement in which a customer needs to define a table, and 2 fields at runtime. For example. I was trying to do a query that looks like:
“`
from r in “#{table}”, limit: 1, select: %{ r.”#{field_1}”, r.”#{field2}” }
“`
…. I guess the metaprogramming behind the DSL makes the compiler cry, but it would be a nice feature to have. Is there any workaround for this? I would love to avoid implementing a query for each adapter I want to support.
Ernesto Cambustón says:

May 21, 2016 at 10:05 pm

solved it by modifying my select to:
“`
select: map(r, [:field1, :field2])
“`
Timur says:

May 23, 2016 at 3:26 am

Glad to see Ecto is going to the next level! I’d like to share some thoughts from my pythonic point of view.

When I need to deserialize some HTTP form data into a Python data structure, I use colander, a library which doesn’t have anything to do with database, to define a schema, declaratively or imperatively, which takes care of type conversion and validation. The same colander schema may be used to serialize a data structure back into HTTP form.

When I need something like Ecto’s insert_all/3, I use SQLAlchemy core API to define a table object, build an insert query with it and then execute the query providing a list of data structures. SQLAlchemy takes care of type conversion and I don’t have to use anything like type/2 because table definition already contains column types and I’ll just use the same standard Python datetime objects that I got after deserialization.

If I want an object-oriented domain model, I use SQLAlchemy ORM which is build on top of SQLAlchemy core and allows me to map my objects to tables.

Now I see that Ecto tries to combine input deserialization/validation with data source access layer which goes against my instincts.

Also, type/2 seems like duplicating type specification by spreading it all over the code instead of defining in one place, the schema. After all the data source already knows the type and even if it’s schemaless we may still enforce it at the data access library level if we really want.

Where am I missing the point?
josevalim says:

May 23, 2016 at 5:32 am

Hi Timur!

> When I need to deserialize some HTTP form data into a Python data structure, I use colander

This would be equivalent to Ecto Changeset. You would define the initial data, the types, and the changeset takes care of validating the input, casting parameters, and give you the parsed and validated data afterwards.

> When I need something like Ecto’s insert_all/3, I use SQLAlchemy core API to define a table object, build an insert query with it and then execute the query providing a list of data structures.

For talking to the database, you use the Ecto repository and queries. Our insert expects a key-value data-structure in contrast to building an insert query and providing a list of parameters. You are free to map, filter and modify the data in any way before giving it to “insert_all”.

> Also, type/2 seems like duplicating type specification by spreading it all over the code instead of defining in one place, the schema.

You only need to use type/2 if you are building a query without a schema (i.e. without specifying the types). If you provide a schema then you don’t need to repeat this information.

> Now I see that Ecto tries to combine input deserialization/validation with data source access layer which goes against my instincts.

The idea of this blog post is to show exactly this is not the case. If you define an Ecto schema, as mentioned above, then you can use the same schema (i.e. the same type information) for both input deserialization and data source layers but there is nothing in Ecto forcing you to do it. If you want to break it apart and first deserialize the data using certain schema and then convert the data into something else that you insert into the data store, using a different schema or using just pure Elixir data, you are free to do so.

Now is my turn: where did I miss the point? 🙂
Timur says:

May 24, 2016 at 5:37 am

Jose, thanks for discarding my concerns! 🙂 Now I see it was just my ignorance of Ecto.
Lance Johnson says:

July 19, 2016 at 9:26 pm

Thank you for these changes and for the example of using changeset with {data, types}. It was extremely helpful and exactly what I needed for our current project.