{"id":8072,"date":"2018-12-03T14:07:11","date_gmt":"2018-12-03T16:07:11","guid":{"rendered":"http:\/\/blog.plataformatec.com.br\/?p=8072"},"modified":"2019-01-15T15:41:55","modified_gmt":"2019-01-15T17:41:55","slug":"building-a-new-mysql-adapter-for-ecto-part-ii-encoding-decoding","status":"publish","type":"post","link":"https:\/\/blog.plataformatec.com.br\/2018\/12\/building-a-new-mysql-adapter-for-ecto-part-ii-encoding-decoding\/","title":{"rendered":"Building a new MySQL adapter for Ecto, Part II: Encoding\/Decoding"},"content":{"rendered":"
Welcome to the “Building a new MySQL adapter for Ecto” series:<\/p>\n
Last time we briefly looked at encoding and decoding data over MySQL wire protocol. In this article we’ll dive deeper into that topic, let’s get started!<\/p>\n
MySQL protocol has two “Basic Data Types<\/a>“: integers and strings. Within integers we have fixed-length and length-encoded integers. To recap, MySQL is using little endianess when encoding\/decoding integers as binaries. Let’s define a function that takes an We can generalize this function to accept any fixed-length integer:<\/p>\n (See Decoding a length-encoded integer<\/a> is slightly more complicated. Can we generalize this function to a single binary pattern match, the same way we did with It’s a shame that MySQL doesn’t encode the size<\/em> of the binary in the first byte because otherwise our decode function could be easily implemented in a single binary pattern match, e.g.:<\/p>\n In fact, it’s common for protocols to encode data as Type-Length-Value (TLV)<\/a> which as you can see above, it’s very easy to implement with Elixir.<\/p>\n In any case, we can still leverage binary pattern matching in the function head. Here’s our final There’s one last thing that we can do. Because Because it’s a macro we need to A really nice thing about using a macro here is we get encoding for free:<\/p>\n We could write a macro for encoding length-encoded integers (we could even invoke it as Encoding\/decoding MySQL strings<\/a> is very similar so we will not be going over that and we’ll jump into the next section on bit flags. (Sure enough, working with strings would be easy, even in binary pattern matches, if not for an EOF-terminated MySQL provides “Capability Flags”<\/a> like:<\/p>\n The idea is we represent a set of capabilities as a single integer on which we can use We definitely don’t want to pass these “magic” bytes around so we should encapsulate them somehow. This is a very useful error message, we can see what are all available capabilities. If we want something more customized, all we need to do is define an additional catch-all clause at the end:<\/p>\n and raise an error there. That way we could, for example, implement a “Did you mean?” hint.<\/p>\n Last but not least, instead of manually defining each function head by hand, we can use Elixir meta-programming capabilities to define them at compile time:<\/p>\n Finally, let’s bring this all together to handle packets. We need a data structure that’s going to store packet fields and we basically have two options: structs<\/a> and records<\/a>. Structs are great when data has to be sent between modules, especially because they are polymorphic. However, when the data belongs to a single module, or separate modules that are considered private API, using records may make more sense as they are more space efficient. Let’s verify that using As you can see, as we add more keys to the map our data structure grows twice as fast and the reason is we store both keys and values whereas tuple stores the size of the tuple once and then just values. The final packet we discussed in the last article was the And let’s test this with the OK packet we got at the end of the last article ( It works!<\/p>\n In this article, we discussed encoding and decoding basic data types, handling bit flags, and finally using both of these ideas to decode packets. Using these tools we should be able to fully implement MySQL protocol specification and with examples of <\/a><\/p>\n","protected":false},"excerpt":{"rendered":" Welcome to the “Building a new MySQL adapter for Ecto” series: Part I: Hello World Part II: Encoding\/Decoding (you’re here!) Part III: DBConnection Integration Part IV: Ecto Integration Last time we briefly looked at encoding and decoding data over MySQL wire protocol. In this article we’ll dive deeper into that topic, let’s get started! Basic … \u00bb<\/a><\/p>\n","protected":false},"author":70,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"ngg_post_thumbnail":0,"footnotes":""},"categories":[1],"tags":[238,143],"aioseo_notices":[],"jetpack_sharing_enabled":true,"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/blog.plataformatec.com.br\/wp-json\/wp\/v2\/posts\/8072"}],"collection":[{"href":"https:\/\/blog.plataformatec.com.br\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.plataformatec.com.br\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.plataformatec.com.br\/wp-json\/wp\/v2\/users\/70"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.plataformatec.com.br\/wp-json\/wp\/v2\/comments?post=8072"}],"version-history":[{"count":31,"href":"https:\/\/blog.plataformatec.com.br\/wp-json\/wp\/v2\/posts\/8072\/revisions"}],"predecessor-version":[{"id":8334,"href":"https:\/\/blog.plataformatec.com.br\/wp-json\/wp\/v2\/posts\/8072\/revisions\/8334"}],"wp:attachment":[{"href":"https:\/\/blog.plataformatec.com.br\/wp-json\/wp\/v2\/media?parent=8072"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.plataformatec.com.br\/wp-json\/wp\/v2\/categories?post=8072"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.plataformatec.com.br\/wp-json\/wp\/v2\/tags?post=8072"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}
\nThe simplest type is int<1><\/code> which is an integer stored in 1 byte.<\/p>\n
int<1><\/code> from the given binary and returns the rest of the binary:<\/p>\n
defmodule MyXQL.Types do\n def take_int1(data) do\n <<value::8-little-integer, rest::binary>> = data\n {value, rest}\n end\nend\n<\/code><\/pre>\n
iex> MyXQL.Types.take_int1(<<1, 2, 3>>)\n{1, <<2, 3>>}\n<\/code><\/pre>\n
def take_fixed_length_integer(data, size) do\n <<value::little-integer-size(size)-unit(8), rest::binary>> = data\n {value, rest}\nend\n<\/code><\/pre>\n
iex> MyXQL.Types.take_fixed_length_integer(<<1, 2, 3>>, 2)\n{513, <<3>>}\n<\/code><\/pre>\n
<<>>\/1<\/code><\/a> for more information on bitstrings.)<\/p>\n
\nBasically, if the first byte value is less than 251<\/code>, then it’s a 1-byte integer; if the first-byte is
0xFC<\/code>, then it’s a 2-byte integer and so on up to a 8-byte integer:<\/p>\n
def take_length_encoded_int1(<<int::8-little-integer, rest::binary>>) when int < 251, do: {int, rest}\n\ndef take_length_encoded_int2(<<0xFC, int::16-little-integer, rest::binary>>), do: {int, rest}\n\ndef take_length_encoded_int3(<<0xFD, int::24-little-integer, rest::binary>>), do: {int, rest}\n\ndef take_length_encoded_int8(<<0xFE, int::64-little-integer, rest::binary>>), do: {int, rest}\n<\/code><\/pre>\n
iex> MyXQL.Types.take_length_encoded_int1(<<1, 2, 3>>)\n{1, <<2, 3>>}\n\niex> MyXQL.Types.take_length_encoded_int2(<<0xFC, 1, 2, 3>>)\n{513, <<3>>}\n<\/code><\/pre>\n
take_fixed_length_integer\/2<\/code>? Unfortunately we can’t. Our logic is essentially a
case<\/code> with 4 clauses and such cannot be used in pattern matches.
\nFor this reason, the way we decode data is by reading some bytes, decoding them, and returning the rest of the binary.<\/p>\niex> <<size::8, value::little-integer-size(size)-unit(8), rest::binary>> = <<2, 1, 2, 3>>\niex> {value, rest}\n{513, <<3>>}\n<\/code><\/pre>\n
take_length_encoded_integer\/1<\/code> function:<\/p>\n
def take_length_encoded_integer(<<int::8, rest::binary>>) when int < 251, do: {int, rest}\ndef take_length_encoded_integer(<<0xFC, int::int(2), rest::binary>>), do: {int, rest}\ndef take_length_encoded_integer(<<0xFD, int::int(3), rest::binary>>), do: {int, rest}\ndef take_length_encoded_integer(<<0xFE, int::int(8), rest::binary>>), do: {int, rest}\n<\/code><\/pre>\n
take_fixed_length_integer\/2<\/code> is so simple and basically uses a single binary pattern match (in particular, it does not<\/em> have a
case<\/code> statement), we can replace it with a macro instead. All we need to do is to emit
little-integer-size(size)-unit(8)<\/code> AST so that we can use it in a bitstring; that’s easy:<\/p>\n
defmacro int(size) do\n quote do\n little-integer-size(unquote(size))-unit(8)\n end\nend\n<\/code><\/pre>\n
require<\/code> or
import<\/code> it to use it:<\/p>\n
iex> import MyXQL.Types\n\niex> <<value::int(1), rest::binary>> = <<1, 2, 3>>\niex> {value, rest}\n{1, <<2, 3>>}\n\niex> <<value::int(2), rest::binary>> = <<1, 2, 3>>\niex> {value, rest}\n{513, <<3>>}\n<\/code><\/pre>\n
iex> <<513::int(2)>>\n<<1, 2>>\n<\/code><\/pre>\n
513::int(lenenc)<\/code> to mimic the spec, by adjusting
int\/1<\/code> macro) but I decided against it as it won’t be usable in a binary pattern match.<\/p>\n
string<eof><\/code> and
string<lenenc><\/code> types.)<\/p>\n
Bit Flags<\/h2>\n
CLIENT_PROTOCOL_41 0x00000200\nCLIENT_SECURE_CONNECTION 0x00008000\nCLIENT_PLUGIN_AUTH 0x00080000\n<\/code><\/pre>\n
Bitwise<\/code> operations like:
0x00000200 ||| 0x00008000<\/code>,
flags &&& 0x00080000<\/code> etc.<\/p>\n
\nWe could store them as module attributes, e.g.: @client_protocol_41 0x00000200<\/code>; if we mistype the name of the flag, we’ll get a helpful compiler warning. Using functions, however, gives us a bit more flexibility as we can generate great error messages as well as “hide” usage of bitwise operations underneath.
\nLet’s implement a function that checks whether given flags<\/code> has a given capability:<\/p>\n
defmodule MyXQL.Messages do\n use Bitwise\n\n def has_capability_flag?(flags, :client_protocol_41), do: (flags &&& 0x00000200) == 0x00000200\n def has_capability_flag?(flags, :client_secure_connection), do: (flags &&& 0x00008000) == 0x00008000\n def has_capability_flag?(flags, :client_plugin_auth), do: (flags &&& 0x00080000) == 0x00080000\n # ...\nend\n<\/code><\/pre>\n
iex> MyXQL.Messages.has_capability_flag?(0, :client_protocol_41)\nfalse\niex> MyXQL.Messages.has_capability_flag?(0x00000200, :client_protocol_41)\ntrue\n\niex> MyXQL.Messages.has_capability_flag?(0x00000200, :bad)\n** (FunctionClauseError) no function clause matching in MyXQL.Messages.has_capability_flag?\/2\n\n The following arguments were given to MyXQL.Messages.has_capability_flag?\/2:\n\n # 1\n 512\n\n # 2\n :bad\n\n Attempted function clauses (showing 3 out of 3):\n\n def has_capability_flag?(flags, :client_protocol_41)\n def has_capability_flag?(flags, :client_secure_connection)\n def has_capability_flag?(flags, :client_plugin_auth)\n<\/code><\/pre>\n
def has_capability_flag?(flags, other) do\n raise ...\nend\n<\/code><\/pre>\n
capability_flags = [\n client_protocol_41: 0x00000200,\n client_secure_connection: 0x00008000,\n client_plugin_auth: 0x00080000,\n]\n\nfor {name, value} <- capability_flags do\n def has_capability_flag?(flags, unquote(name)), do: (flags &&& unquote(value)) == unquote(value)\nend\n<\/code><\/pre>\n
Packets<\/h2>\n
:erts_debug<\/code> module and instead of comparing structs and records let’s just compare their internal representations: maps and tuples, respectively:<\/p>\n
iex> :erts_debug.size(%{x: 1})\n6\niex> :erts_debug.size(%{x: 1, y: 2})\n8\niex> :erts_debug.size(%{x: 1, y: 2, z: 3})\n10\n<\/code><\/pre>\n
iex> :erts_debug.size({:Point, 1})\n3\niex> :erts_debug.size({:Point, 1, 2})\n4\niex> :erts_debug.size({:Point, 1, 2, 3})\n5\n<\/code><\/pre>\n
\nSince we may be processing thousands of packets per second, this difference may add up, so we’re going to use records here.<\/p>\nOK Packet<\/code>. Let’s now write a function to decode it (it’s not fully following the spec for brevity):<\/p>\n
# https:\/\/dev.mysql.com\/doc\/internals\/en\/packet-OK_Packet.html\ndefrecord :ok_packet, [:affected_rows, :last_insert_id, :status_flags, :warning_count]\n\ndef decode_ok_packet(data, capability_flags) do\n <<0x00, rest::binary>> = data\n\n {affected_rows, rest} = take_length_encoded_integer(rest)\n {last_insert_id, rest} = take_length_encoded_integer(rest)\n\n packet = ok_packet(\n affected_rows: affected_rows,\n last_insert_id: last_insert_id\n )\n\n if has_capability_flag?(capability_flags, :client_protocol_41) do\n <<\n status_flags::int(2),\n warning_count::int(2)\n >> = rest\n\n ok_packet(packet,\n status_flags: status_flags,\n warning_count: warning_count\n )\n else\n packet\n end\nend\n<\/code><\/pre>\n
00 00 00 02 00 00 00<\/code>):<\/p>\n
iex> ok_packet(affected_rows: affected_rows) = decode_ok_packet(<<0x00, 0x00, 0x00, 0x02, 0x00, 0x00, 0x00>>, 0x00000200)\niex> affected_rows\n0\n<\/code><\/pre>\n
Conclusion<\/h2>\n
:gen_tcp.send\/2<\/code><\/a> and
:gen_tcp.recv\/2<\/code><\/a> calls from Part I, we could interact with the server. However, that’s not enough to build a resilient and production-quality driver. For that, we’ll look into
DBConnection<\/code><\/a> integration in Part III. Stay tuned!<\/p>\n