ecto « Plataformatec Blog

Building a new MySQL adapter for Ecto Part IV: Ecto Integration

Wojtek Mach — Fri, 04 Jan 2019 13:46:17 +0000

Welcome to the “Building a new MySQL adapter for Ecto” series:

Part I: Hello World
Part II: Encoding/Decoding
Part III: DBConnection Integration
Part IV: Ecto Integration (you’re here!)

After DBConnection integration we have a driver that should be usable on its own. The next step is to integrate it with Ecto so that we can:

leverage Ecto (doh!) meaning, among other things, using changesets to cast and validate data before inserting it into the DB, composing queries instead of concatenating SQL strings, defining schemas that map DB data into Elixir structs, being able to run Mix tasks like mix ecto.create and mix ecto.migrate, and finally using Ecto SQL Sandbox to manage clean slate between tests
tap into the greater Ecto ecosystem: integration with the Phoenix web framework, various pagination libraries, custom types, admin builders, etc.

Ecto Adapter

If you ever worked with Ecto, you’ve seen code like:

defmodule MyApp.Repo do
  use Ecto.Repo,
    adapter: Ecto.Adapters.MySQL,
    otp_app: :my_app
end

The adapter is a module that implements Ecto Adapter specifications:

Ecto.Adapter – minimal API required from adapters
Ecto.Adapter.Queryable – plan, prepare, and execute queries leveraging query cache
Ecto.Adapter.Schema – insert, update, and delete structs as well as autogenerate IDs
Ecto.Adapter.Storage – storage API used by e.g. mix ecto.create and mix ecto.drop
Ecto.Adapter.Transaction – transactions API

Adapters are required to implement at least the Ecto.Adapter behaviour. The remaining behaviours are optional, as some data stores don’t support transactions or creating/dropping the storage (e.g. some cloud services).

There’s also a separate Ecto SQL project which ships with its own set of adapter specifications on top of the ones from Ecto. Conveniently, it also includes an Ecto.Adapters.SQL module that we can use, which implements most of the callbacks and lets us worry mostly about generating appropriate SQL.

Ecto SQL Adapter

Let’s try using the Ecto.Adapters.SQL module:

defmodule MyXQL.EctoAdapter do
  use Ecto.Adapters.SQL,
    driver: :myxql,
    migration_lock: "FOR UPDATE"
end

When we compile it, we’ll get a bunch of warnings as we haven’t implemented any of the callbacks yet.

warning: function supports_ddl_transaction?/0 required by behaviour Ecto.Adapter.Migration is not implemented (in module MyXQL.EctoAdapter)
  lib/a.ex:1

warning: function MyXQL.EctoAdapter.Connection.all/1 is undefined (module MyXQL.EctoAdapter.Connection is not available)
  lib/a.ex:2

warning: function MyXQL.EctoAdapter.Connection.delete/4 is undefined (module MyXQL.EctoAdapter.Connection is not available)
  lib/a.ex:2

(...)

Notably, we get a module MyXQL.EctoAdapter.Connection is not available warning. The SQL adapter specification requires us to implement a separate connection module (see Ecto.Adapters.SQL.Connection behaviour) which will leverage, you guessed it, DBConnection. Let’s try that now and implement a couple of callbacks:

defmodule MyXQL.EctoAdapter.Connection do
  @moduledoc false
  @behaviour Ecto.Adapters.SQL.Connection

  @impl true
  def child_spec(opts) do
    MyXQL.child_spec(opts)
  end

  @impl true
  def prepare_execute(conn, name, sql, params, opts) do
    MyXQL.prepare_execute(conn, name, sql, params, opts)
  end
end

Since we’ve leveraged DBConnection in the MyXQL driver, these functions simply delegate to the driver. Let’s implement something a little bit more interesting.

Did you ever wonder how Ecto.Changeset.unique_constraint/3 is able to transform a SQL constraint violation failure into a changeset error? It turns out that unique_constraint/3 keeps a mapping between the unique key constraint name and the fields these errors should be reported on. The code that makes it work is executed in the repo and the adapter when the structs are persisted. In particular, the adapter should implement the Ecto.Adapters.SQL.Connection.to_constraints/1 callback. Let’s take a look:

iex> b Ecto.Adapters.SQL.Connection.to_constraints
@callback to_constraints(exception :: Exception.t()) :: Keyword.t()

Receives the exception returned by c:query/4.

The constraints are in the keyword list and must return the constraint type,
like :unique, and the constraint name as a string, for example:

    [unique: "posts_title_index"]

Must return an empty list if the error does not come from any constraint.

Let’s see how the constraint violation error looks exactly:

$ mysql -u root myxql_test
mysql> CREATE TABLE uniques (x INTEGER UNIQUE);
Query OK, 0 rows affected (0.17 sec)

mysql> INSERT INTO uniques VALUES (1);
Query OK, 1 row affected (0.08 sec)

mysql> INSERT INTO uniques VALUES (1);
ERROR 1062 (23000): Duplicate entry '1' for key 'x'

MySQL responds with error code 1062. We can further look into the error by using the perror command-line utility that ships with the MySQL installation:

% perror 1062
MySQL error code 1062 (ER_DUP_ENTRY): Duplicate entry '%-.192s' for key %d

Ok, let’s finally implement the callback:

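defmodule MyXQL.EctoAdapter.Connection do
  # ...

  @impl true
  def to_constraints(%MyXQL.Error{mysql: %{code: 1062}, message: message}) do
    # The message looks like: Duplicate entry '1' for key 'x'
    case :binary.split(message, " for key ") do
      [_, quoted] -> [unique: strip_quotes(quoted)]
      _ -> []
    end
  end

  # Any other error does not come from a constraint.
  def to_constraints(_), do: []

  # Drops the surrounding quote characters: "'x'" becomes "x".
  defp strip_quotes(quoted) do
    size = byte_size(quoted) - 2
    <<_, unquoted::binary-size(size), _>> = quoted
    unquoted
  end
end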

Let’s break this down. We expect the driver to produce an exception struct on a constraint violation. We then match on the particular error code, extract the constraint (key) name from the error message, and return it as a keyword list.

(To make this more understandable, in the MyXQL project we’ve added error code/name mapping so we pattern match like this instead: mysql: %{code: :ER_DUP_ENTRY}.)
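To make the other half of the mechanism more concrete, here is a minimal sketch of how the keyword list returned by to_constraints/1 could be matched against the constraints registered on a changeset. The module name, the function name and the shape of the registered constraints are all assumptions made for illustration; Ecto’s real data structures and code differ.

```elixir
defmodule ConstraintMappingSketch do
  # Hypothetical, simplified model; this is NOT Ecto's actual implementation.
  #
  # adapter_constraints: what to_constraints/1 returned,
  #   e.g. [unique: "posts_title_index"]
  # registered: constraints declared up front (think unique_constraint/3),
  #   modelled here as plain maps.
  def to_errors(adapter_constraints, registered) do
    for {type, name} <- adapter_constraints,
        # Non-matching registered constraints are filtered out by the pattern.
        %{type: ^type, constraint: ^name} = constraint <- registered do
      {constraint.field, constraint.message}
    end
  end
end

registered = [
  %{type: :unique, constraint: "posts_title_index", field: :title, message: "has already been taken"}
]

ConstraintMappingSketch.to_errors([unique: "posts_title_index"], registered)
```

A violation that was not registered on the changeset yields an empty list in this sketch, which is the point where a real repo would have to surface the original database error instead.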

To get a feeling of what other subtle changes we may have between data stores, let’s implement one more callback, back in the MyXQL.EctoAdapter module.

While MySQL has a BOOLEAN type, it turns out that it’s simply an alias for TINYINT and its possible values are 1 and 0. These sorts of discrepancies are handled by the dumpers/2 and loaders/2 callbacks. Let’s implement the latter:

defmodule MyXQL.EctoAdapter do
  # ...

  @impl true
  def loaders(:boolean, type), do: [&bool_decode/1, type]
  # ...
  def loaders(_, type),        do: [type]

  defp bool_decode(<<0>>), do: {:ok, false}
  defp bool_decode(<<1>>), do: {:ok, true}
  defp bool_decode(0), do: {:ok, false}
  defp bool_decode(1), do: {:ok, true}
  defp bool_decode(other), do: {:ok, other}
end
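To get an intuition for what a list of loaders does, here is a simplified, self-contained model of a loader chain. It assumes every element of the list is a function returning {:ok, value} or :error, and that the elements are applied in order. LoaderSketch, load/2 and strict_boolean/1 are hypothetical names; the real loaders list can also contain Ecto types, which this sketch leaves out.

```elixir
defmodule LoaderSketch do
  # Hypothetical model of a loader chain; not Ecto's real implementation.
  # Each loader receives the previous value and returns {:ok, value} or :error.
  def load(value, loaders) do
    Enum.reduce_while(loaders, {:ok, value}, fn loader, {:ok, acc} ->
      case loader.(acc) do
        {:ok, _} = ok -> {:cont, ok}
        :error -> {:halt, :error}
      end
    end)
  end

  # Same idea as bool_decode/1 above: turn the TINYINT into a boolean.
  def bool_decode(0), do: {:ok, false}
  def bool_decode(1), do: {:ok, true}
  def bool_decode(other), do: {:ok, other}

  # Stand-in for the final type check.
  def strict_boolean(value) when is_boolean(value), do: {:ok, value}
  def strict_boolean(_), do: :error
end

loaders = [&LoaderSketch.bool_decode/1, &LoaderSketch.strict_boolean/1]
LoaderSketch.load(1, loaders)
```

The adapter-specific step only has to normalise the raw value; anything it lets through unchanged is still validated by the step that follows it.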

Integration Tests

As you can see, there can be quite a few discrepancies between adapters and data stores. For this reason, besides providing adapter specifications, Ecto ships with integration tests that can be re-used by adapter libraries.

Ecto provides a set of basic integration test cases and support files; see the ./integration_test/ directory.

And here’s an example of how a separate package might leverage these. It turns out that ecto_sql uses the ecto integration tests:

# ecto_sql/integration_test/mysql/all_test.exs
ecto = Mix.Project.deps_paths[:ecto]
Code.require_file "#{ecto}/integration_test/cases/assoc.exs", __DIR__
Code.require_file "#{ecto}/integration_test/cases/interval.exs", __DIR__
# ...

and has a few of its own.

When implementing a 3rd-party SQL adapter for Ecto we already have a lot of integration tests to run against!

Conclusion

In this article we have briefly looked at integrating our driver with Ecto and Ecto SQL.

Ecto helps with the integration by providing:

adapter specifications
an Ecto.Adapters.SQL module that we can use to build adapters for relational databases even faster
integration tests

We’re also concluding our adapter series. Some of the overarching themes were:

separation of concerns: we’ve built our protocol packet encoding/decoding layer stateless and separate from the process model, which in turn made the DBConnection integration more straightforward and the resulting codebase easier to understand. Ecto also exhibits a separation of concerns: not only do we have separate changeset, repo, adapter etc., but within the adapter we also have different aspects of talking to data stores, like storage, transactions, connection etc.
behaviours, behaviours, behaviours! Not only do behaviours provide a thought-through way of organizing the code as contracts; as long as we adhere to those contracts, features like DBConnection resilience and access to Ecto tooling and the greater ecosystem become available.

As this article is being published, we’re getting closer to shipping MyXQL’s first release as well as making it the default MySQL adapter in upcoming Ecto v3.1. You can see the progress on elixir-ecto/ecto_sql#66.

Happy coding!

The post Building a new MySQL adapter for Ecto Part IV: Ecto Integration first appeared on Plataformatec Blog.

Building a new MySQL adapter for Ecto, Part III: DBConnection Integration

Wojtek Mach — Fri, 21 Dec 2018 16:36:56 +0000

Welcome to the “Building a new MySQL adapter for Ecto” series:

Part I: Hello World
Part II: Encoding/Decoding
Part III: DBConnection Integration (you’re here!)
Part IV: Ecto Integration

In the first two articles of the series we have learned the basic building blocks for interacting with a MySQL server using its binary protocol over TCP.

To have a production-quality driver, however, there’s more work to do. Namely, we need to think about:

maintaining a connection pool to talk to the DB efficiently from multiple processes
not overloading the DB
attempting to re-connect to the DB if connection is lost
supporting common DB features like prepared statements, transactions, and streaming

In short, we need: reliability, performance, and first-class support for common DB features. This is where DBConnection comes in.

DBConnection

DBConnection is a behaviour module for implementing efficient database connection client processes, pools and transactions. It was created by Elixir and Ecto Core Team member James Fish and was introduced in Ecto v2.0.

The DBConnection documentation describes how it addresses the concerns mentioned above:

DBConnection handles callbacks differently to most behaviours. Some callbacks will be called in the calling process, with the state copied to and from the calling process. This is useful when the data for a request is large and means that a calling process can interact with a socket directly.

A side effect of this is that query handling can be written in a simple blocking fashion, while the connection process itself will remain responsive to OTP messages and can enqueue and cancel queued requests.

If a request or series of requests takes too long to handle in the client process a timeout will trigger and the socket can be cleanly disconnected by the connection process.

If a calling process waits too long to start its request it will timeout and its request will be cancelled. This prevents requests building up when the database cannot keep up.

If no requests are received for a period of time the connection will trigger an idle timeout and the database can be pinged to keep the connection alive.

Should the connection be lost, attempts will be made to reconnect with (configurable) exponential random backoff to reconnect. All state is lost when a connection disconnects but the process is reused.

The DBConnection.Query protocol provide utility functions so that queries can be prepared or encoded and results decoding without blocking the connection or pool.
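To illustrate what “exponential random backoff” means in the quote above, here is a small sketch. It is not DBConnection’s actual algorithm; BackoffSketch, delay/3 and the default bounds are assumptions chosen for the example.

```elixir
defmodule BackoffSketch do
  import Bitwise

  # Hypothetical helper, not DBConnection's real implementation.
  # The upper bound doubles with every attempt until it reaches max_ms;
  # the returned delay is picked at random between min_ms and that bound.
  def delay(attempt, min_ms \\ 1_000, max_ms \\ 30_000) when attempt >= 0 do
    upper = min(max_ms, min_ms * (1 <<< attempt))
    # :rand.uniform(n) returns an integer between 1 and n.
    min_ms + :rand.uniform(upper - min_ms + 1) - 1
  end
end

# Delays for the first few reconnection attempts, in milliseconds.
Enum.map(0..5, &BackoffSketch.delay/1)
```

The randomness keeps many connections that were dropped at the same moment from all retrying at the same moment, while the growing bound keeps a struggling database from being hammered.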

Let’s see how we can use it!

DBConnection Integration

We will first create a module responsible for implementing DBConnection callbacks:

defmodule MyXQL.Protocol do
  use DBConnection
end

When we compile it, we’ll get a bunch of warnings about callbacks that we haven’t implemented yet.

Let’s start with the connect/1 callback and while at it, add some supporting code:

defmodule MyXQL.Error do
  defexception [:message]
end

defmodule MyXQL.Protocol do
  @moduledoc false
  use DBConnection
  import MyXQL.Messages
  defstruct [:sock]

  @impl true
  def connect(opts) do
    hostname = Keyword.get(opts, :hostname, "localhost")
    port = Keyword.get(opts, :port, 3306)
    timeout = Keyword.get(opts, :timeout, 5000)
    username = Keyword.get(opts, :username, System.get_env("USER")) || raise "username is missing"
    sock_opts = [:binary, active: false]

    case :gen_tcp.connect(String.to_charlist(hostname), port, sock_opts) do
      {:ok, sock} ->
        handshake(username, timeout, %__MODULE__{sock: sock})

      {:error, reason} ->
        {:error, %MyXQL.Error{message: "error when connecting: #{inspect(reason)}"}}
    end
  end

  @impl true
  def checkin(state) do
    {:ok, state}
  end

  @impl true
  def checkout(state) do
    {:ok, state}
  end

  @impl true
  def ping(state) do
    {:ok, state}
  end

  defp handshake(username, timeout, state) do
    with {:ok, data} <- :gen_tcp.recv(state.sock, 0, timeout),
         initial_handshake_packet() = decode_initial_handshake_packet(data),
         data = encode_handshake_response_packet(username),
         :ok <- :gen_tcp.send(state.sock, data),
         {:ok, data} <- :gen_tcp.recv(state.sock, 0, timeout),
         ok_packet() <- decode_handshake_response_packet(data) do
      {:ok, state}
    else
      # the server rejected the handshake
      err_packet(message: message) ->
        {:error, %MyXQL.Error{message: "error when performing handshake: #{message}"}}

      # socket errors are passed through unchanged for now
      {:error, _reason} = error ->
        error
    end
  end
end

defmodule MyXQL do
  @moduledoc "..."

  @doc "..."
  def start_link(opts) do 
    DBConnection.start_link(MyXQL.Protocol, opts)
  end
end

That’s a lot to unpack so let’s break this down:

per the documentation, connect/1 must return {:ok, state} on success and {:error, exception} on failure. Our connection state for now will be just the socket. (In a complete driver we’d use the state to manage prepared statement references, the status of a transaction etc.) On error, we return an exception.
we extract configuration from the keyword list opts and provide sane defaults
we try to connect to the TCP server and, if successful, perform the handshake
as we’ve learned in Part I, the handshake goes like this: after connecting to the socket, we receive the “Initial Handshake Packet”. Then, we send the “Handshake Response” packet. At the end, we receive the response and decode the result, which can be an “OK Packet” or an “ERR Packet”. If we receive any socket errors, we ignore them for now. We’ll talk about handling them better later on.
finally, we introduce a public MyXQL.start_link/1 that is an entry point to the driver
we also provide minimal implementations for the checkin/1, checkout/1 and ping/1 callbacks

It’s worth taking a step back and looking at our overall design:

MyXQL module exposes a small public API and calls into an internal module
MyXQL.Protocol implements DBConnection behaviour and is the place where all side-effects are being handled
MyXQL.Messages implements pure functions for encoding and decoding packets

This separation is really important. By keeping protocol data separate from protocol interactions code we have a codebase that’s much easier to understand and maintain.

Prepared Statements

Let’s take a look at handle_prepare/3 and handle_execute/4 callbacks that are used to
handle prepared statements:

iex> b DBConnection.handle_prepare
@callback handle_prepare(query(), opts :: Keyword.t(), state :: any()) ::
            {:ok, query(), new_state :: any()}
            | {:error | :disconnect, Exception.t(), new_state :: any()}

Prepare a query with the database. Return {:ok, query, state} where query is a
query to pass to execute/4 or close/3, {:error, exception, state} to return an
error and continue or {:disconnect, exception, state} to return an error and
disconnect.

This callback is intended for cases where the state of a connection is needed
to prepare a query and/or the query can be saved in the database to call later.

This callback is called in the client process.

iex> b DBConnection.handle_execute
@callback handle_execute(query(), params(), opts :: Keyword.t(), state :: any()) ::
            {:ok, query(), result(), new_state :: any()}
            | {:error | :disconnect, Exception.t(), new_state :: any()}

Execute a query prepared by c:handle_prepare/3. Return {:ok, query, result,
state} to return altered query query and result result and continue, {:error,
exception, state} to return an error and continue or {:disconnect, exception,
state} to return an error and disconnect.

This callback is called in the client process.

Notice the callbacks reference types like: query(), result() and params().
Let’s take a look at them too:

iex> t DBConnection.result
@type result() :: any()

iex> t DBConnection.params
@type params() :: any()

iex> t DBConnection.query
@type query() :: DBConnection.Query.t()

As far as DBConnection is concerned, result() and params() can be any term (it’s up to us to define these) and the query() must implement the DBConnection.Query protocol.

DBConnection.Query is used for preparing queries, encoding their params, and decoding their
results. Let’s define query and result structs as well as minimal protocol implementation.

defmodule MyXQL.Result do
  defstruct [:columns, :rows]
end

defmodule MyXQL.Query do
  defstruct [:statement, :statement_id]

  defimpl DBConnection.Query do
    def parse(query, _opts), do: query

    def describe(query, _opts), do: query

    def encode(_query, params, _opts), do: params

    def decode(_query, result, _opts), do: result
  end
end

Let’s define the first callback, handle_prepare/3:

defmodule MyXQL.Protocol do
  # ...

  @impl true
  def handle_prepare(%MyXQL.Query{statement: statement} = query, _opts, state) do
    data = encode_com_stmt_prepare(statement)

    with :ok <- sock_send(state, data),
         {:ok, data} <- sock_recv(state),
         com_stmt_prepare_ok(statement_id: statement_id) <- decode_com_stmt_prepare_response(data) do
      query = %{query | statement_id: statement_id}
      {:ok, query, state}
    else
      err_packet(message: message) ->
        {:error, %MyXQL.Error{message: "error when preparing query: #{message}"}, state}

      {:error, reason} ->
        {:disconnect, %MyXQL.Error{message: "error when preparing query: #{inspect(reason)}"}, state}
    end
  end

  # :gen_tcp.send/2 takes no timeout argument
  defp sock_send(state, data), do: :gen_tcp.send(state.sock, data)

  # a length of 0 returns whatever bytes are available
  defp sock_recv(state), do: :gen_tcp.recv(state.sock, 0, :infinity)
end

The callback receives query, opts (which we ignore), and state. We encode the query statement into COM_STMT_PREPARE packet, send it, receive response, decode the COM_STMT_PREPARE Response, and put the retrieved statement_id into our query struct.

If we receive an ERR Packet, we put the error message into our MyXQL.Error exception and return that.

The only place we could get an {:error, reason} tuple from is the :gen_tcp send/recv calls. If we get an error there, it means there might be something wrong with the socket. By returning {:disconnect, _, _}, we let DBConnection take care of closing the socket and attempting to re-connect with a new one.

Note that our socket calls never time out on their own: we pass :infinity wherever a timeout is accepted. That’s because DBConnection is managing the process these calls will be executed in and it maintains its own timeouts. (And if we hit these timeouts, it cleans up the socket automatically.)
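The distinction between the two kinds of failure can be sketched in isolation. The module below is a hypothetical stand-in, not part of MyXQL: it defines its own one-field err_packet record and its own Error exception, and maps each failure to the tuple a DBConnection callback is expected to return.

```elixir
defmodule DisconnectSketch do
  require Record

  # Hypothetical, stripped-down ERR packet record: {:err_packet, message}.
  Record.defrecord(:err_packet, [:message])

  defmodule Error do
    defexception [:message]
  end

  # The server answered with an error: the connection itself is fine,
  # so report the error and keep using the socket.
  def to_callback_result(err_packet(message: message), state) do
    {:error, %Error{message: message}, state}
  end

  # The socket call failed: the connection can no longer be trusted,
  # so ask DBConnection to disconnect (and later reconnect).
  def to_callback_result({:error, reason}, state) do
    {:disconnect, %Error{message: inspect(reason)}, state}
  end
end

DisconnectSketch.to_callback_result({:error, :closed}, %{sock: nil})
```

Keeping this decision in one place makes it harder to accidentally return {:error, _, _} for a broken socket, which would leave a dead connection in the pool.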

Let’s now define the handle_execute/4 callback:

defmodule MyXQL.Protocol do
  # ...

  @impl true
  def handle_execute(%{statement_id: statement_id} = query, params, _opts, state)
      when is_integer(statement_id) do
    data = encode_com_stmt_execute(statement_id, params)

    with :ok <- sock_send(state, data),
         {:ok, data} <- sock_recv(state),
         resultset(columns: columns, rows: rows) <- decode_com_stmt_execute_response(data) do
      columns = Enum.map(columns, &column_definition(&1, :name))
      result = %MyXQL.Result{columns: columns, rows: rows}
      {:ok, query, result, state}
    else
      err_packet(message: message) ->
        {:error, %MyXQL.Error{message: "error when executing query: #{message}"}, state}

      {:error, reason} ->
        {:disconnect, %MyXQL.Error{message: "error when executing query: #{inspect(reason)}"}, state}
    end
  end
end

defmodule MyXQL.Messages do
  # ...

  # https://dev.mysql.com/doc/internals/en/com-query-response.html#packet-ProtocolText::Resultset
  defrecord :resultset, [:column_count, :columns, :row_count, :rows, :warning_count, :status_flags]

  def decode_com_stmt_execute_response(data) do
    # ...
    resultset(...)
  end

  # https://dev.mysql.com/doc/internals/en/com-query-response.html#packet-Protocol::ColumnDefinition41
  defrecord :column_definition, [:name, :type]
end

Let’s break this down.

handle_execute/4 receives an already prepared query, params to encode, opts, and the state.

Similarly to handle_prepare/3, we encode the COM_STMT_EXECUTE packet, send it and receive a response, decode COM_STMT_EXECUTE Response, into a resultset record, and finally build the result struct.

Same as last time, if we get an ERR Packet we return an {:error, _, _} response; on socket problems, we simply disconnect and let DBConnection handle re-connecting at a later time.

We’ve mentioned that the DBConnection.Query protocol is used to prepare queries, and in fact we could perform encoding of params and decoding the result in implementation functions. We’ve left that part out for brevity.

Finally, let’s add a public function that users of the driver will use:

defmodule MyXQL do
  # ...

  def prepare_execute(conn, statement, params, opts) do
    query = %MyXQL.Query{statement: statement}
    DBConnection.prepare_execute(conn, query, params, opts)
  end
end

and see it all working.

iex> {:ok, pid} = MyXQL.start_link([])
iex> MyXQL.prepare_execute(pid, "SELECT ? + ?", [2, 3], [])
{:ok, %MyXQL.Query{statement: "SELECT ? + ?", statement_id: 1},
%MyXQL.Result{columns: ["? + ?"], rows: [[5]]}}

Arguments to MyXQL.start_link are passed down to
DBConnection.start_link/2,
so starting a pool of 2 connections is as simple as:

iex> {:ok, pid} = MyXQL.start_link(pool_size: 2)

Conclusion

In this article, we’ve had a sneak peek at integrating with the DBConnection library. It gave us many benefits:

a battle-tested connection pool without writing a single line of pooling code
we can use blocking :gen_tcp functions without worrying about OTP messages and timeouts; DBConnection will handle these
automatic re-connection, backoff etc
a way to structure our code

With this, we’re almost done with our adapter series. In the final article we’ll use our driver as an Ecto adapter. Stay tuned!

Building a new MySQL adapter for Ecto, Part II: Encoding/Decoding

Wojtek Mach — Mon, 03 Dec 2018 16:07:11 +0000

Welcome to the “Building a new MySQL adapter for Ecto” series:

Part I: Hello World
Part II: Encoding/Decoding (you’re here!)
Part III: DBConnection Integration
Part IV: Ecto Integration

Last time we briefly looked at encoding and decoding data over MySQL wire protocol. In this article we’ll dive deeper into that topic, let’s get started!

Basic Types

MySQL protocol has two “Basic Data Types“: integers and strings. Within integers we have fixed-length and length-encoded integers.
The simplest type is int<1> which is an integer stored in 1 byte.

To recap, MySQL uses little-endian byte order when encoding/decoding integers as binaries. Let’s define a function that takes an int<1> from the given binary and returns the rest of the binary:

defmodule MyXQL.Types do
  def take_int1(data) do
    <<value::8-little-integer, rest::binary>> = data
    {value, rest}
  end
end

iex> MyXQL.Types.take_int1(<<1, 2, 3>>)
{1, <<2, 3>>}

We can generalize this function to accept any fixed-length integer:

def take_fixed_length_integer(data, size) do
  <<value::little-integer-size(size)-unit(8), rest::binary>> = data
  {value, rest}
end

iex> MyXQL.Types.take_fixed_length_integer(<<1, 2, 3>>, 2)
{513, <<3>>}

(See <<>>/1 for more information on bitstrings.)

Decoding a length-encoded integer is slightly more complicated.
Basically, if the first byte value is less than 251, then it’s a 1-byte integer; if the first byte is 0xFC, then it’s a 2-byte integer, and so on up to an 8-byte integer:

def take_length_encoded_int1(<<int::8, rest::binary>>) when int < 251, do: {int, rest}

def take_length_encoded_int2(<<0xFC, int::16-little-integer, rest::binary>>), do: {int, rest}

def take_length_encoded_int3(<<0xFD, int::24-little-integer, rest::binary>>), do: {int, rest}

def take_length_encoded_int8(<<0xFE, int::64-little-integer, rest::binary>>), do: {int, rest}

iex> MyXQL.Types.take_length_encoded_int1(<<1, 2, 3>>)
{1, <<2, 3>>}

iex> MyXQL.Types.take_length_encoded_int2(<<0xFC, 1, 2, 3>>)
{513, <<3>>}

Can we generalize this function to a single binary pattern match, the same way we did with take_fixed_length_integer/2? Unfortunately we can’t. Our logic is essentially a case with 4 clauses and such cannot be used in pattern matches.
For this reason, the way we decode data is by reading some bytes, decoding them, and returning the rest of the binary.

It’s a shame that MySQL doesn’t encode the size of the binary in the first byte because otherwise our decode function could be easily implemented in a single binary pattern match, e.g.:

iex> <<size::8, value::little-integer-size(size)-unit(8), rest::binary>> = <<2, 1, 2, 3>>
iex> {value, rest}
{513, <<3>>}

In fact, it’s common for protocols to encode data as Type-Length-Value (TLV) which, as you can see above, is very easy to implement with Elixir.
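As an illustration of the TLV idea (and not of anything in the MySQL protocol), here is a minimal decoder for a made-up format where every record is a 1-byte type, a 1-byte length and then that many bytes of value. TLVSketch and the format itself are assumptions for the example.

```elixir
defmodule TLVSketch do
  # Decodes a binary made of <<type, length, value::binary-size(length)>> records.
  # The format is hypothetical; it only illustrates Type-Length-Value decoding.
  def decode(data, acc \\ [])

  def decode(<<>>, acc), do: Enum.reverse(acc)

  # `size` is bound earlier in the same pattern and used to slice the value.
  def decode(<<type::8, size::8, value::binary-size(size), rest::binary>>, acc) do
    decode(rest, [{type, value} | acc])
  end
end

TLVSketch.decode(<<1, 2, "hi", 2, 0, 3, 1, 255>>)
```

Because the length always sits in a fixed position, the whole record fits in a single function head and no intermediate “take some bytes, return the rest” helpers are needed.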

In any case, we can still leverage binary pattern matching in the function head. Here’s our final take_length_encoded_integer/1 function:

def take_length_encoded_integer(<<int::8, rest::binary>>) when int < 251, do: {int, rest}
def take_length_encoded_integer(<<0xFC, int::int(2), rest::binary>>), do: {int, rest}
def take_length_encoded_integer(<<0xFD, int::int(3), rest::binary>>), do: {int, rest}
def take_length_encoded_integer(<<0xFE, int::int(8), rest::binary>>), do: {int, rest}

There’s one last thing that we can do. Because take_fixed_length_integer/2 is so simple and basically uses a single binary pattern match (in particular, it does not have a case statement), we can replace it with a macro instead. All we need to do is to emit little-integer-size(size)-unit(8) AST so that we can use it in a bitstring; that’s easy:

defmacro int(size) do
  quote do
    little-integer-size(unquote(size))-unit(8)
  end
end

Because it’s a macro we need to require or import it to use it:

iex> import MyXQL.Types

iex> <<value::int(1), rest::binary>> = <<1, 2, 3>>
iex> {value, rest}
{1, <<2, 3>>}

iex> <<value::int(2), rest::binary>> = <<1, 2, 3>>
iex> {value, rest}
{513, <<3>>}

A really nice thing about using a macro here is we get encoding for free:

iex> <<513::int(2)>>
<<1, 2>>

We could write a macro for encoding length-encoded integers (we could even invoke it as 513::int(lenenc) to mimic the spec, by adjusting int/1 macro) but I decided against it as it won’t be usable in a binary pattern match.
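A plain function is enough for the encoding side. The sketch below mirrors the decoding clauses above; LenencSketch and encode_length_encoded_integer/1 are names picked for this example and are not taken from the MyXQL codebase.

```elixir
defmodule LenencSketch do
  # Encodes a non-negative integer as a MySQL length-encoded integer:
  # values below 251 fit in a single byte, larger ones get a marker byte
  # (0xFC, 0xFD or 0xFE) followed by 2, 3 or 8 little-endian bytes.
  def encode_length_encoded_integer(int) when int in 0..250,
    do: <<int>>

  def encode_length_encoded_integer(int) when int in 251..0xFFFF,
    do: <<0xFC, int::little-integer-size(16)>>

  def encode_length_encoded_integer(int) when int in 0x10000..0xFFFFFF,
    do: <<0xFD, int::little-integer-size(24)>>

  def encode_length_encoded_integer(int) when int in 0x1000000..0xFFFFFFFFFFFFFFFF,
    do: <<0xFE, int::little-integer-size(64)>>
end

LenencSketch.encode_length_encoded_integer(513)
```

Encoding 513 this way produces the same <<0xFC, 1, 2>> bytes that the decoding example earlier in this section starts from.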

Encoding/decoding MySQL strings is very similar so we will not be going over that and we’ll jump into the next section on bit flags. (Sure enough, working with strings would be easy, even in binary pattern matches, if not for an EOF-terminated string and string types.)
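For a rough idea of what the string helpers could look like, here is a sketch of two of the string types: NUL-terminated and length-encoded. StringSketch and its function names are assumptions, and the length-encoded variant only handles the common case of a 1-byte length prefix.

```elixir
defmodule StringSketch do
  # string<NUL>: everything up to the first zero byte.
  def take_string_nul(data) do
    [string, rest] = :binary.split(data, <<0>>)
    {string, rest}
  end

  # string<lenenc>: a length-encoded integer followed by that many bytes.
  # Simplification: only lengths below 251 (1-byte prefix) are handled here.
  def take_string_lenenc(<<size::8, string::binary-size(size), rest::binary>>)
      when size < 251 do
    {string, rest}
  end
end

StringSketch.take_string_nul(<<"root", 0, 1, 2>>)
```

The NUL-terminated variant needs a search through the data, which is exactly the kind of thing that cannot be expressed as a single binary pattern.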

Bit Flags

MySQL provides “Capability Flags” like:

CLIENT_PROTOCOL_41 0x00000200
CLIENT_SECURE_CONNECTION 0x00008000
CLIENT_PLUGIN_AUTH 0x00080000

The idea is we represent a set of capabilities as a single integer on which we can use Bitwise operations like: 0x00000200 ||| 0x00008000, flags &&& 0x00080000 etc.

We definitely don’t want to pass these “magic” bytes around so we should encapsulate them somehow.
We could store them as module attributes, e.g.: @client_protocol_41 0x00000200; if we mistype the name of the flag, we’ll get a helpful compiler warning. Using functions, however, gives us a bit more flexibility as we can generate great error messages as well as “hide” usage of bitwise operations underneath.
Let’s implement a function that checks whether given flags has a given capability:

defmodule MyXQL.Messages do
  import Bitwise

  def has_capability_flag?(flags, :client_protocol_41), do: (flags &&& 0x00000200) == 0x00000200
  def has_capability_flag?(flags, :client_secure_connection), do: (flags &&& 0x00008000) == 0x00008000
  def has_capability_flag?(flags, :client_plugin_auth), do: (flags &&& 0x00080000) == 0x00080000
  # ...
end

iex> MyXQL.Messages.has_capability_flag?(0, :client_protocol_41)
false
iex> MyXQL.Messages.has_capability_flag?(0x00000200, :client_protocol_41)
true

iex> MyXQL.Messages.has_capability_flag?(0x00000200, :bad)
** (FunctionClauseError) no function clause matching in MyXQL.Messages.has_capability_flag?/2

    The following arguments were given to MyXQL.Messages.has_capability_flag?/2:

        # 1
        512

        # 2
        :bad

    Attempted function clauses (showing 3 out of 3):

        def has_capability_flag?(flags, :client_protocol_41)
        def has_capability_flag?(flags, :client_secure_connection)
        def has_capability_flag?(flags, :client_plugin_auth)

This is a very useful error message: we can see all of the available capabilities. If we want something more customized, all we need to do is define an additional catch-all clause at the end:

def has_capability_flag?(flags, other) do
  raise ...
end

and raise an error there. That way we could, for example, implement a “Did you mean?” hint.
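A “Did you mean?” hint could be sketched as follows. The choice of String.jaro_distance/2 for finding the closest name, the ArgumentError and the wording of the message are all assumptions made for this example; FlagHintSketch is a hypothetical module.

```elixir
defmodule FlagHintSketch do
  import Bitwise

  @known [:client_protocol_41, :client_secure_connection, :client_plugin_auth]

  def has_capability_flag?(flags, :client_protocol_41), do: (flags &&& 0x00000200) == 0x00000200
  def has_capability_flag?(flags, :client_secure_connection), do: (flags &&& 0x00008000) == 0x00008000
  def has_capability_flag?(flags, :client_plugin_auth), do: (flags &&& 0x00080000) == 0x00080000

  # Catch-all clause: suggest the known flag whose name is most similar.
  def has_capability_flag?(_flags, other) when is_atom(other) do
    suggestion =
      Enum.max_by(@known, fn known ->
        String.jaro_distance(Atom.to_string(known), Atom.to_string(other))
      end)

    raise ArgumentError,
          "unknown capability flag #{inspect(other)}, did you mean #{inspect(suggestion)}?"
  end
end

FlagHintSketch.has_capability_flag?(0x00000200, :client_protocol_41)
```

Calling it with a mistyped name such as :client_protocol_14 raises an ArgumentError that points at :client_protocol_41.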

Last but not least, instead of manually defining each function head by hand, we can use Elixir meta-programming capabilities to define them at compile time:

capability_flags = [
  client_protocol_41: 0x00000200,
  client_secure_connection: 0x00008000,
  client_plugin_auth: 0x00080000,
]

for {name, value} <- capability_flags do
  def has_capability_flag?(flags, unquote(name)), do: (flags &&& unquote(value)) == unquote(value)
end
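Put together as a complete module, the compile-time approach might look like the sketch below. It also adds a helper going in the other direction, building a flags integer from a list of names. CapabilitySketch, put_capability_flags/2 and capability_value/1 are hypothetical names used only for illustration.

```elixir
defmodule CapabilitySketch do
  import Bitwise

  capability_flags = [
    client_protocol_41: 0x00000200,
    client_secure_connection: 0x00008000,
    client_plugin_auth: 0x00080000
  ]

  # One has_capability_flag?/2 clause per known flag, generated at compile time.
  for {name, value} <- capability_flags do
    def has_capability_flag?(flags, unquote(name)),
      do: (flags &&& unquote(value)) == unquote(value)
  end

  # A second loop keeps clauses of the same function grouped together.
  for {name, value} <- capability_flags do
    defp capability_value(unquote(name)), do: unquote(value)
  end

  # Sets every named flag on top of the given flags (0 by default).
  def put_capability_flags(flags \\ 0, names) do
    Enum.reduce(names, flags, fn name, acc -> acc ||| capability_value(name) end)
  end
end

flags = CapabilitySketch.put_capability_flags([:client_protocol_41, :client_plugin_auth])
CapabilitySketch.has_capability_flag?(flags, :client_plugin_auth)
```

With both directions hidden behind functions, the rest of the code only ever deals with flag names and never with the raw numbers or bitwise operators.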

Packets

Finally, let’s bring this all together to handle packets. We need a data structure that’s going to store packet fields and we basically have two options: structs and records. Structs are great when data has to be sent between modules, especially because they are polymorphic. However, when the data belongs to a single module, or separate modules that are considered private API, using records may make more sense as they are more space efficient. Let’s verify that using :erts_debug module and instead of comparing structs and records let’s just compare their internal representations: maps and tuples, respectively:

iex> :erts_debug.size(%{x: 1})
6
iex> :erts_debug.size(%{x: 1, y: 2})
8
iex> :erts_debug.size(%{x: 1, y: 2, z: 3})
10

iex> :erts_debug.size({:Point, 1})
3
iex> :erts_debug.size({:Point, 1, 2})
4
iex> :erts_debug.size({:Point, 1, 2, 3})
5

As you can see, as we add more keys to the map our data structure grows twice as fast and the reason is we store both keys and values whereas tuple stores the size of the tuple once and then just values.
Since we may be processing thousands of packets per second, this difference may add up, so we’re going to use records here.
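If you haven’t used records before, here is a short sketch of the Record API from Elixir’s standard library: defining a record, creating it, pattern matching on it, and updating it. The :point record and RecordSketch module are made up for the example.

```elixir
defmodule RecordSketch do
  require Record

  # Defines the point/0, point/1 and point/2 macros in this module.
  Record.defrecord(:point, x: 0, y: 0)

  def demo do
    # Create a record; under the hood it is the tuple {:point, 1, 2}.
    p = point(x: 1, y: 2)

    # Pattern match on a single field.
    point(x: x) = p

    # Update a field, which builds a new tuple.
    moved = point(p, y: 10)

    # Read a single field.
    {p, x, point(moved, :y)}
  end
end

RecordSketch.demo()
```

The field names exist only at compile time; at runtime a record is just a tagged tuple, which is where the space savings measured above come from.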

The final packet we discussed in the last article was the OK Packet. Let’s now write a function to decode it (it’s not fully following the spec for brevity):

# https://dev.mysql.com/doc/internals/en/packet-OK_Packet.html
defrecord :ok_packet, [:affected_rows, :last_insert_id, :status_flags, :warning_count]

def decode_ok_packet(data, capability_flags) do
  <<0x00, rest::binary>> = data

  {affected_rows, rest} = take_length_encoded_integer(rest)
  {last_insert_id, rest} = take_length_encoded_integer(rest)

  packet = ok_packet(
    affected_rows: affected_rows,
    last_insert_id: last_insert_id
  )

  if has_capability_flag?(capability_flags, :client_protocol_41) do
    <<
      status_flags::int(2),
      warning_count::int(2)
    >> = rest

    ok_packet(packet,
      status_flags: status_flags,
      warning_count: warning_count
    )
  else
    packet
  end
end

And let’s test this with the OK packet we got at the end of the last article (00 00 00 02 00 00 00):

iex> ok_packet(affected_rows: affected_rows) = decode_ok_packet(<<0x00, 0x00, 0x00, 0x02, 0x00, 0x00, 0x00>>, 0x00000200)
iex> affected_rows
0

It works!
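
The function above relies on helpers defined earlier in the article. If you want to try it in isolation, the following is a self-contained sketch of the same idea using only plain bitstring syntax. The module name OKPacketSketch and the private helper take_lenenc_int/1 are assumptions made for this example, not part of the driver, and like the original it skips parts of the spec.

```elixir
defmodule OKPacketSketch do
  # Hypothetical, simplified decoder; not the driver's actual code.
  import Bitwise
  require Record

  Record.defrecord(:ok_packet, [:affected_rows, :last_insert_id, :status_flags, :warning_count])

  @client_protocol_41 0x00000200

  def decode(<<0x00, rest::binary>>, capability_flags) do
    {affected_rows, rest} = take_lenenc_int(rest)
    {last_insert_id, rest} = take_lenenc_int(rest)
    packet = ok_packet(affected_rows: affected_rows, last_insert_id: last_insert_id)

    if (capability_flags &&& @client_protocol_41) != 0 do
      # Both fields are 2-byte little-endian integers; anything after is ignored.
      <<status_flags::16-little, warning_count::16-little, _ignored::binary>> = rest
      ok_packet(packet, status_flags: status_flags, warning_count: warning_count)
    else
      packet
    end
  end

  # Length-encoded integer: values below 251 fit in one byte, otherwise the
  # first byte says how many bytes follow.
  defp take_lenenc_int(<<int::8, rest::binary>>) when int < 251, do: {int, rest}
  defp take_lenenc_int(<<0xFC, int::16-little, rest::binary>>), do: {int, rest}
  defp take_lenenc_int(<<0xFD, int::24-little, rest::binary>>), do: {int, rest}
  defp take_lenenc_int(<<0xFE, int::64-little, rest::binary>>), do: {int, rest}
end
```

Since a record is a tagged tuple, callers outside the module can match on the result directly, for example {:ok_packet, affected_rows, _, _, _}.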

Conclusion

In this article, we discussed encoding and decoding basic data types, handling bit flags, and finally using both of these ideas to decode packets. Using these tools we should be able to fully implement the MySQL protocol specification and, with the examples of :gen_tcp.send/2 and :gen_tcp.recv/2 calls from Part I, we could interact with the server. However, that’s not enough to build a resilient and production-quality driver. For that, we’ll look into DBConnection integration in Part III. Stay tuned!

The post Building a new MySQL adapter for Ecto, Part II: Encoding/Decoding first appeared on Plataformatec Blog.

Building a new MySQL adapter for Ecto, Part I: Hello World

Wojtek Mach — Wed, 14 Nov 2018 13:16:32 +0000

As you may have seen in the announcement, Plataformatec is working on a new MySQL driver called MyXQL.

Writing a complete driver involves quite a bit of work. To name just a few things, we need to support: all protocol messages and data types, authentication schemes, connection options (TCP/SSL/UNIX domain socket), transactions and more. Rather than going through all of these in detail, I plan to distill this knowledge into 4 parts, each with a quick overview of a given area:

Part I: Hello World
Part II: Encoding/Decoding
Part III: DBConnection Integration
Part IV: Ecto Integration

This also mimics how I approached the development of the library: my end goal was to integrate with Ecto and I wanted to be integrating end-to-end as soon and as often as possible. Rather than implementing each part fully, I implemented just enough to move forward, knowing I could later go back and fill in the remaining details. Without further ado, let’s get started!

Hello World

Our “Hello World” will involve performing a “handshake”: connecting to a running MySQL server and authenticating a user. To avoid getting bogged down in authentication details, the simplest possible thing to do is to log in as a user without a password. Let’s create one:

$ mysql --user=root -e "CREATE USER myxql_test"

We can check if everything went well by trying to log in as that user:

$ mysql --user=myxql_test -e "SELECT NOW()"
+---------------------+
| NOW()               |
+---------------------+
| 2018-10-04 18:35:11 |
+---------------------+

If you don’t have MySQL installed, I recommend setting it up via Homebrew, if you’re on macOS, or Docker. I ended up using Docker because I knew I needed to test on multiple server versions. Here’s how I set it up:

$ docker run --publish=3306:3306 --name myxql_test -e MYSQL_ROOT_PASSWORD=secret -d mysql:8.0.12
# note we connect via TCP, instead of the default UNIX domain socket:
$ mysql --protocol=tcp --user=root --password=secret -e "CREATE USER myxql_test;"

$ mysql --protocol=tcp --user=myxql_test -e "SELECT NOW()"
+---------------------+
| NOW()               |
+---------------------+
| 2018-10-04 18:40:04 |
+---------------------+

We can now connect to the server from an IEx session:

iex> {:ok, sock} = :gen_tcp.connect('127.0.0.1', 3306, [:binary, active: false], 5000)
{:ok, #Port<0.6>}

Let’s break this down. :gen_tcp.connect/4 accepts:

Hostname (as charlist)
Port
Options (as proplist); by default, data from the socket is returned as a list of bytes (a charlist), however for us a binary will be more convenient to work with, so we pass the :binary option.
active: false means we’ll work with the socket in “passive mode”, meaning we’ll read data using blocking :gen_tcp.recv/3 calls.
Timeout (in milliseconds)
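
If you don’t have a MySQL server at hand, the same options can be tried against a throwaway local listener. The sketch below is only an illustration of :binary and passive mode; the module name PassiveSocketSketch and the round_trip/1 function are made up for this example.

```elixir
defmodule PassiveSocketSketch do
  # Hypothetical helper: sends `message` over a loopback connection and
  # returns what the client read from its passive-mode socket.
  def round_trip(message) when is_binary(message) and byte_size(message) > 0 do
    # Port 0 asks the operating system for any free port.
    {:ok, listener} = :gen_tcp.listen(0, [:binary, active: false])
    {:ok, port} = :inet.port(listener)

    {:ok, client} = :gen_tcp.connect(~c"127.0.0.1", port, [:binary, active: false], 5000)
    {:ok, server} = :gen_tcp.accept(listener, 5000)

    :ok = :gen_tcp.send(server, message)

    # In passive mode nothing is delivered until we explicitly ask for it.
    {:ok, data} = :gen_tcp.recv(client, byte_size(message), 5000)

    Enum.each([client, server, listener], &:gen_tcp.close/1)
    data
  end
end
```

With the :binary option the value returned by round_trip/1 is a binary, so it can be fed straight into binary pattern matching.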

Let’s now read data from the socket: (0 means we read all available bytes, 5000 is the timeout in milliseconds)

iex> {:ok, data} = :gen_tcp.recv(sock, 0, 5000)
iex> data
<<74, 0, 0, 0, 10, 56, 46, 48, 46, 49, 50, 0, 12, 0, 0, 0, 11, 9, 19, 27, 96, 108, 77, 116, 0, 255, 255, 255, 2, 0, 255, 195, 21, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 37, 62, 29, 59, 1, ...>>

To make sense of this, we’re gonna need to look into the MySQL manual.
Each MySQL packet has 3 elements: length of the payload (3-byte integer), sequence id (1-byte integer), and payload.
In this case, the actual payload is the “Initial Handshake Packet”. Let’s extract the payload part using binary matching (see <<>>/1 for more information on binary matching):

iex> <<payload_length::24, sequence_id::8, payload::binary>> = data
iex> payload_length
4849664
iex> byte_size(payload)
74

Wait, the size of the payload is 74, so why is payload_length 4849664?! Multi-byte integers stored in a binary have “endianness”, which determines whether the bytes are ordered starting from the “little end” (least significant byte first) or the “big end” (most significant byte first).
Thus, the 3-byte integer <<74, 0, 0>> read as “big-endian” is indeed 4849664, but read as “little-endian” it’s 74. Fortunately, the bitstring syntax has great support for endianness and it’s as easy as adding the little modifier (“big-endian” is the default):

iex> <<payload_length::24-little, sequence_id::8, payload::binary>> = data
iex> payload_length
74
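
The packet layout described above (3-byte little-endian length, 1-byte sequence id, payload) can be wrapped in a small function. This is a minimal sketch; the module name PacketHeaderSketch and both function names are assumptions made for illustration.

```elixir
defmodule PacketHeaderSketch do
  # Reads the same 3 bytes with both byte orders, to show the difference.
  def read_uint24(<<big::24>> = bytes) do
    <<little::24-little>> = bytes
    %{big: big, little: little}
  end

  # Splits one packet off the front of a buffer. The length read from the
  # header is used right away as the size of the payload in the same pattern.
  def split(<<len::24-little, seq::8, payload::binary-size(len), rest::binary>>) do
    {:ok, seq, payload, rest}
  end

  # Fewer bytes than the header promises: the caller has to read more data.
  def split(_buffer), do: :incomplete
end
```

Returning the leftover bytes makes it possible to call split/1 repeatedly when several packets arrive in a single read.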

To make sense of the remaining payload we’re gonna use the binpp package:

iex> :binpp.pprint(payload)
0000 0A 38 2E 30 2E 31 32 00 0F 00 00 00 27 73 79 59  .8.0.12.....'syY
0001 7A 34 26 3B 00 FF FF FF 02 00 FF C3 15 00 00 00  z4&;.ÿÿÿ..ÿÃ....
0002 00 00 00 00 00 00 00 43 55 6B 60 74 5A 71 08 75  .......CUk`tZq.u
0003 6F 08 2F 00 63 61 63 68 69 6E 67 5F 73 68 61 32  o./.caching_sha2
0004 5F 70 61 73 73 77 6F 72 64 00                    _password.

We can see up to 16 bytes in each row and at the far right we have ASCII interpretation of each byte. Per “Initial Handshake Packet” the first byte is the protocol version, always 10 (0x0A), and what follows is a null-terminated server version string. Let’s extract that:

iex> <<10, rest::binary>> = payload
iex> [server_version, rest] = :binary.split(rest, <<0x00>>)
iex> server_version
"8.0.12"

We can parse the server version, that’s a good start! There are other fields in this packet that in a complete adapter we’d have to handle, but for now we’ll simply ignore them. We’ll just take note of the authentication method at the end of the packet, a null-terminated string "caching_sha2_password".
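
Null-terminated strings show up repeatedly in the protocol, so it may be worth pulling the split into a helper. The following is a small sketch under that assumption; NullTerminatedSketch, take/1 and server_version/1 are hypothetical names.

```elixir
defmodule NullTerminatedSketch do
  # Takes everything up to the first 0x00 byte and returns it together
  # with the bytes that follow the terminator.
  def take(binary) do
    case :binary.split(binary, <<0x00>>) do
      [string, rest] -> {:ok, string, rest}
      [_no_terminator] -> :error
    end
  end

  # The payload starts with protocol version 10, then the server version.
  def server_version(<<10, rest::binary>>) do
    with {:ok, version, _rest} <- take(rest), do: {:ok, version}
  end
end
```

Returning :error instead of raising lets the caller decide whether a missing terminator means corrupted data or simply an incomplete read.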

After receiving “Initial Handshake Packet” the client is supposed to send “Handshake Response”. We’ll again just gloss over the details:

iex> import Bitwise
iex> capability_flags = 0x00000200 ||| 0x00008000 ||| 0x00080000
iex> max_packet_size = 65535
iex> charset = 0x21
iex> username = "myxql_test"
iex> auth_response = <<0x00>>
iex> client_auth_plugin = "caching_sha2_password"
iex> payload = <<
       capability_flags::32-little,
       max_packet_size::32-little,
       charset, 0::8*23,
       username::binary, 0x00,
       auth_response::binary,
       client_auth_plugin::binary, 0x00
>>
iex> sequence_id = 1
iex> data = <<byte_size(payload)::24-little, sequence_id::8, payload::binary>>

Let’s break this down:

First, we combine the CLIENT_PROTOCOL_41, CLIENT_SECURE_CONNECTION, and CLIENT_PLUGIN_AUTH capability flags using “bitwise OR”. Secondly, we set the max packet size, charset (0x21 is utf8_general_ci), filler (0s repeated 23 times), username, auth response (empty password is a null byte), and auth plugin name. Note that we encode username and client_auth_plugin as null-terminated strings. Finally, we wrap the payload in a packet with its payload length and sequence id (it’s the 2nd packet, so the sequence id is 1). Let’s now send this and receive the response from the server:
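
The same steps can be collected into a module, which makes the resulting bytes easy to inspect without a server. This is only a sketch of the simplified, passwordless response described above; the module name HandshakeResponseSketch and its function names are made up for illustration.

```elixir
defmodule HandshakeResponseSketch do
  # Hypothetical, simplified encoder for a passwordless login only.
  import Bitwise

  @client_protocol_41 0x00000200
  @client_secure_connection 0x00008000
  @client_plugin_auth 0x00080000

  def capability_flags do
    @client_protocol_41 ||| @client_secure_connection ||| @client_plugin_auth
  end

  def payload(username, auth_plugin \\ "caching_sha2_password") do
    <<
      capability_flags()::32-little,
      # max packet size
      65_535::32-little,
      # charset: 0x21 is utf8_general_ci
      0x21,
      # filler: 23 zero bytes
      0::8*23,
      username::binary, 0x00,
      # auth response: an empty password is a single null byte
      0x00,
      auth_plugin::binary, 0x00
    >>
  end

  # Prefixes the payload with its 3-byte little-endian length and sequence id.
  def packet(payload, sequence_id) do
    <<byte_size(payload)::24-little, sequence_id::8, payload::binary>>
  end
end
```

For the user myxql_test this produces a 66-byte payload, so the packet on the wire starts with the header bytes 66, 0, 0, 1.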

iex> :ok = :gen_tcp.send(sock, data)
iex> {:ok, data} = :gen_tcp.recv(sock, 0)
iex> <<payload_length::24-little, sequence_id::8, payload::binary>> = data
iex> :binpp.pprint(payload)
0000 00 00 00 02 00 00 00

The first byte of the response is 0x00, which corresponds to the OK_Packet: authentication succeeded! Even though we’ve glossed over many details, we’ve shown that we can integrate with the server end-to-end and that’s going to be the foundation we’ll build upon. There are many more packets that we’ll need to encode or decode and we’re gonna need a more structured approach, which we will discuss in Part II.

The post Building a new MySQL adapter for Ecto, Part I: Hello World first appeared on Plataformatec Blog.

Updating Hex.pm to Ecto 3.0

Wojtek Mach — Thu, 25 Oct 2018 18:50:01 +0000

Ecto 3.0 is just around the corner and, as you may already know, it has reached a stable API. To make sure everything works properly, I thought I’d try updating one of the first projects that ever used Ecto: Hex.pm.

The whole upgrade was done in a single pull request, which we will break down below.

First, the required steps:

Update mix.exs to depend on ecto_sql and bump the postgrex dependency. Note: SQL handling has been extracted out into a separate ecto_sql project, so we need to add that new dependency. (6b3b78cf)
DBConnection 2.0 no longer ships with Sojourn and Poolboy pools, so we can remove the pool configuration and use the default pool implementation. (760026f3)
Speaking of pools, we need to make sure pool_size is at least 2 when running migrations.
JSON library is now set on the adapter and not on Ecto (e16ebd8f) and because we were already using the recommended package, Jason, we don’t need that configuration anymore. (66f9cbdf)
Ecto 3.0 makes date/time handling stricter with regard to precision. So we need to either update the types of our schema fields or make sure we truncate date/time values. For example, when we define a field as time we can’t put in a value with microsecond precision, and similarly we can’t put a value without microsecond precision into a time_usec field. (2e34b833)
Constraint handling functions like Ecto.Changeset.unique_constraint/3 now include the type and the name of the constraint in the error metadata, which broke a test of ours that was overly specific. (3d19f903)
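
For the precision point above, truncation can be done with the standard library before the value reaches Ecto. The sketch below is not taken from the Hex.pm pull request; PrecisionSketch and its function names are made up, while Time.truncate/2 and DateTime.truncate/2 are regular Elixir functions.

```elixir
defmodule PrecisionSketch do
  # Drops the microseconds so the value fits a second-precision field
  # such as one declared as :time.
  def for_time_field(%Time{} = time), do: Time.truncate(time, :second)

  # Same idea for a second-precision datetime field.
  def for_datetime_field(%DateTime{} = datetime), do: DateTime.truncate(datetime, :second)
end
```

A value such as ~T[12:34:56.789012] becomes ~T[12:34:56], which no longer carries microsecond precision.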

Secondly, we got a couple of deprecation warnings, so here are the fixes:

Adapter is now set when defining Repo module and not in the configuration. (d3911953)
Ecto.Multi.run/3 now accepts a 2-arity function (the first argument is now the Repo) instead of the 1-arity one it accepted before. (95d11cc2)

Finally, there were a few minor glitches (or redundancies!) specific to Hex.pm: c4168977, 21eb0bf8, and 0929cd9e.

Overall the update process was pretty straightforward. There were a few minor bugs along the way which were promptly fixed upstream. Having previously updated Hex.pm to Ecto 2.0, which took a few months (we started it early on, which made it a fast moving target back then), I can really appreciate the level of maturity that Ecto achieved and how easy it was to update this time around.

Update: Add note about pool_size when running migrations.

The post Updating Hex.pm to Ecto 3.0 first appeared on Plataformatec Blog.

A sneak peek at Ecto 3.0: performance, migrations and more

José Valim — Mon, 22 Oct 2018 17:41:40 +0000

Welcome to the “A sneak peek at Ecto 3.0” series:

Breaking changes
Query improvements part 1
Query improvements part 2
Performance, migrations and more (you are here!)

We are back for one last round! This time we are going to cover improvements on three main areas: performance, upserts and migrations. If you would like to give Ecto a try right now, note Ecto v3.0.0-rc.0 has been released and we are looking forward to your feedback.

Better memory usage

One of the most notable performance improvements in Ecto 3.0 is that schemas loaded from an Ecto repository now use less memory.

A big part of the memory improvements seen in Ecto 3.0 comes from better management of schema metadata. Every instance you have of an Ecto.Schema, such as a %User{}, has a metadata field with life-cycle information about that entry, such as the database prefix or its state (was it just built or was it loaded from the database?). This metadata field takes exactly 16 words:

iex> :erts_debug.size %Ecto.Schema.Metadata{}
16

On a 64-bit machine, 16 words is equivalent to 128 bytes. This means that, if you were using Ecto 2.0 and you loaded 1000 entries, 128 kbytes of memory would be used only for storing this metadata. The good news is that all of those 1000 entries could use the exact same metadata! That’s what we did in this commit. This means that, whether you load 1000 or 1000000 entries, the cost is always the same, only 128 bytes!
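
The effect of sharing can be observed without Ecto. The sketch below uses a plain map as a stand-in for the metadata (the module name SharedMetadataSketch and the map’s contents are made up) and compares :erts_debug.size/1, which counts shared subterms once, with :erts_debug.flat_size/1, which ignores sharing.

```elixir
defmodule SharedMetadataSketch do
  # Stand-in for schema metadata; built at runtime by a function call.
  def metadata, do: Map.new(state: :loaded, source: "posts", prefix: nil)

  # Builds n entries that all point to the same metadata term and
  # measures the list with and without taking sharing into account.
  def sizes(n) do
    meta = metadata()
    entries = for i <- 1..n, do: {i, meta}

    %{
      shared_words: :erts_debug.size(entries),
      flat_words: :erts_debug.flat_size(entries)
    }
  end

  # Converts words to bytes for the machine the code runs on.
  def words_to_bytes(words), do: words * :erlang.system_info(:wordsize)
end
```

The gap between the two numbers grows with the number of entries, which is the saving described above: the metadata is paid for once instead of once per entry.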

After we announced Ecto 3.0-rc, we started to hear that some teams had already upgraded to it. Some of those repos are quite big and it took them less than a day to upgrade, which is exactly how upgrading to major software versions should be.

Ben Wilson, Principal Engineer at CargoSense, upgraded one of their apps to Ecto 3.0-rc and pushed it to production. Here is the result:

You can see the drop in memory usage from Ecto 2 to Ecto 3 at the moment of the deployment. This particular app loads a bunch of data during boot and we can clearly see the impact those improvements have on memory usage. Once the system stabilized, the average memory use was 15% lower altogether.

But that’s not all!

We also changed Ecto 3.0 to make use of the Erlang VM literal pool, which allows us to share the metadata across queries. For example, if you have two queries, each returning 1000 posts, all 2000 posts will point to the same metadata. These improvements alongside other changes to reduce struct allocation should reduce Ecto’s memory usage as a whole.

Statement cache for INSERT/UPDATE/DELETE

Another notable performance improvement in Ecto 3.0 comes from the fact Ecto now automatically caches statements emitted by Ecto.Repo.insert/update/delete.

Consider this code:

for i <- 1..1000 do
  Repo.insert!(%Post{visits: i})
end

where Post is a schema with 13 fields. When running this code on my machine against a Postgres database with a pool of 10 connections, it takes 900ms to insert all 1000 posts. While Ecto has always cached select queries, once we also added the statement cache to Ecto.Repo.insert/update/delete, the total operation time is reduced to 610ms!

But that’s not all!

Part of the issue here is that every time we call Repo.insert!, Ecto needs to get a new connection out of the connection pool, perform the insert, and give the connection back. For a pool with 10 connections, there is a chance the next connection we pick up is not “warm” and we may not hit the statement cache. While it is important to not hold connections for long, so we can best utilize the database resources, in this scenario we know we will perform many operations in a row.

For this reason, Ecto 3.0 includes a Repo.checkout operation, which allows you to tell the Ecto repository you want to use the same connection, skipping the connection pool and always using a “warm” connection:

Repo.checkout(fn ->
  for i <- 1..1000 do
    Repo.insert!(%Post{visits: i})
  end
end)

With the change above, all of the inserts take 420ms on average.

There is one final trick we could use. Since we are performing multiple inserts, we could simply replace Repo.checkout with Repo.transaction. The transaction also checks out a single connection but it also allows the database itself to be more efficient. With this final change, the total time falls to 320ms. And if you really need to go faster, you can always use Ecto.Repo.insert_all. Hooray!

More options around upserts

Ecto 2 added support for upserts. Ecto 3 brings many improvements to the upsert API, such as the ability to tell Ecto to :replace_all_except_primary_key in case of conflicts or to replace only certain fields by passing on_conflict: {:replace, [:foo, :bar, :baz]}. This new version of Ecto also allows custom expressions to be given as :conflict_target by passing {:unsafe_fragment, "be careful with what goes here"} as a value.

There are many other improvements to the Ecto.Repo API, such as Ecto.Repo.checkout, introduced in the previous section, and the new Ecto.Repo.exists?.

Migrations

Another area in Ecto (or to be more precise, Ecto.SQL) that saw major improvements is migrations.

The most important change was a contribution by Allen Madsen that locks the migration table, allowing multiple machines to run migrations at the same time. In previous Ecto versions, if you had multiple machines attempting to run migrations, they could race each other, leading to failures, but now this is guaranteed not to happen. The type of lock can be configured via the :migration_lock repository configuration; it defaults to “FOR UPDATE” and is disabled if set to nil.
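
As a configuration sketch, disabling the lock as described above could look like the fragment below. The application name :my_app and the repo MyApp.Repo are placeholders, and import Config assumes a recent Elixir version (older projects use Mix.Config instead). Check the documentation of your Ecto SQL version before relying on the exact value.

```elixir
# config/config.exs (hypothetical application and repo names)
import Config

config :my_app, MyApp.Repo,
  # nil disables the migration lock, per the description above;
  # leave the option unset to keep the default locking behaviour.
  migration_lock: nil
```

Disabling the lock only makes sense when you are sure migrations never run from more than one machine at a time.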

Another improvement is that Ecto is now capable of logging notices/alerts/warnings emitted by the database when running migrations. In previous Ecto versions, if you had a long index name, the database would truncate and emit an alert through the TCP connection, but this alert was never extracted and printed in the terminal. This is no longer the case in Ecto 3.0.

Similarly, Ecto will now warn if you attempt to run a migration and there is a higher version number already migrated in the database. Imagine you have been working on a feature for a long period of time and you were finally able to merge it to master. Since you started working on this feature, other features and migrations were already shipped to production. This may create an issue on deployment: in case something goes wrong when deploying this new feature and you have to roll back the database, the latest migrations by timestamp do not match the migrations that have just been executed.

By emitting warnings, we help developers and production teams alike to be aware of such pitfalls.

Summing up

We are very excited about the many improvements in Ecto 3.0. This short series of articles shares the most notable changes but there is much more. But perhaps the most important feature is that we have announced Ecto to have a stable API, and this is only possible due to the work that Plataformatec and the community have put into Ecto throughout the years. Enjoy!

The post A sneak peek at Ecto 3.0: performance, migrations and more first appeared on Plataformatec Blog.