{"id":4628,"date":"2015-05-21T09:00:34","date_gmt":"2015-05-21T12:00:34","guid":{"rendered":"http:\/\/blog.plataformatec.com.br\/?p=4628"},"modified":"2016-07-13T12:26:29","modified_gmt":"2016-07-13T15:26:29","slug":"introducing-reducees","status":"publish","type":"post","link":"http:\/\/blog.plataformatec.com.br\/2015\/05\/introducing-reducees\/","title":{"rendered":"Introducing reducees"},"content":{"rendered":"

Elixir provides the concept of collections, which may be in-memory data structures, as well as events, I\/O resources and more. Those collections are supported by the Enumerable protocol, which is an implementation of an abstraction we call “reducees”.<\/p>\n

In this article, we will outline the design decisions behind such abstraction, often exploring ideas from Haskell, Clojure and Scala that eventually led us to develop this new abstraction called reducees, focusing specially on the constraints and performance characteristics of the Erlang Virtual Machine.<\/p>\n

Recursion and Elixir<\/h2>\n

Elixir is a functional programming language<\/a> that runs on the Erlang VM. All the examples on this article will be written in Elixir although we will introduce the concepts bit by bit.<\/p>\n

Elixir provides linked-lists. Lists can hold many items and, with pattern matching, it is easy to extract the head (the first item) and the tail (the rest) of a list:<\/p>\n

iex> [h|t] = [1, 2, 3]\niex> h\n1\niex> t\n[2, 3]\n<\/code><\/pre>\n

An empty list won’t match the pattern [h|t]<\/code>:<\/p>\n

[h|t] = []\n** (MatchError) no match of right hand side value: []\n<\/code><\/pre>\n

Suppose we want to recurse every element in the list, multiplying each element by 2. Let’s write a double function:<\/p>\n

defmodule Recursion do\n  def double([h|t]) do\n    [h*2|double(t)]\n  end\n\n  def double([]) do\n    []\n  end\nend\n<\/code><\/pre>\n

The function above recursively traverses the list, doubling the head at each step and invoking itself with the tail. We could define a similar function if we wanted to triple every element in the list but it makes more sense to abstract our current implementation. Let’s define a function called map<\/code> that applies a given function to each element in the list:<\/p>\n

defmodule Recursion do\n  def map([h|t], fun) do\n    [fun.(h)|map(t, fun)]\n  end\n\n  def map([], _fun) do\n    []\n  end\nend\n<\/code><\/pre>\n

double<\/code> could now be defined in terms of map<\/code> as follows:<\/p>\n

def double(list) do\n  map(list, fn x -> x * 2 end)\nend\n<\/code><\/pre>\n

Manually recursing the list is straight-forward but it doesn’t really compose. Imagine we would like to implement other functional operations like filter<\/code>, reduce<\/code>, take<\/code> and so on for lists. Then we introduce sets, dictionaries, and queues into the language and we would like to provide the same operations for all of them.<\/p>\n

Instead of manually implementing all of those operations for each data structure, it is better to provide an abstraction that allows us to define those operations only once, and they will work with different data structures.<\/p>\n

That’s our next step.<\/p>\n

Introducing Iterators<\/h2>\n

The idea behind iterators is that we ask the data structure what is the next item until the data structure no longer has items to emit.<\/p>\n

Let’s implement iterators for lists. This time, we will be using Elixir documentation and doctests to detail how we expect iterators to work:<\/p>\n

defmodule Iterator do\n  @doc \"\"\"\n  Each step needs to return a tuple containing\n  the next element and a payload that will be\n  invoked the next time around.\n\n      iex> next([1, 2, 3])\n      {1, [2, 3]}\n      iex> next([2, 3])\n      {2, [3]}\n      iex> next([3])\n      {3, []}\n      iex> next([])\n      :done\n  \"\"\"\n  def next([h|t]) do\n    {h, t}\n  end\n\n  def next([]) do\n    :done\n  end\nend\n<\/code><\/pre>\n

We can implement map<\/code> on top of next<\/code>:<\/p>\n

def map(collection, fun) do\n  map_next(next(collection), fun)\nend\n\ndefp map_next({h, t}, fun) do\n  [fun.(h)|map_next(next(t), fun)]\nend\n\ndefp map_next(:done, _fun) do\n  []\nend\n<\/code><\/pre>\n

Since map<\/code> uses the next<\/code> function, as long as we implement next<\/code> for a new data structure, map<\/code> (and all future functions) should work out of the box. This brings the polymorphism we desired but it has some downsides.<\/p>\n

Besides not having ideal performance, it is quite hard to make iterators work with resources (events, I\/O, etc), leading to messy and error-prone code.<\/p>\n

The trouble with resources is that, if something goes wrong, we need to tell the resource that it should be closed. After all, we don’t want to leave file descriptors or database connections open. This means we need to extend our next<\/code> contract to introduce at least one other function called halt<\/code>.<\/p>\n

halt<\/code> should be called if the iteration is interrupted suddenly, either because we are no longer interested in the next items (for example, if someone calls take(collection, 5)<\/code> to retrieve only the first five items) or because an error happened. Let’s start with take:<\/p>\n

def take(collection, n) do\n  take_next(next(collection), n)\nend\n\n# Invoked on every step\ndefp take_next({h, t}, n) when n > 0 do\n  [h|take_next(next(t), n - 1)]\nend\n\n# If we reach this, the collection finished\ndefp take_next(:done, _n) do\n  []\nend\n\n# If we reach this, we took all we cared about before finishing\ndefp take_next(value, 0) do\n  halt(value) # Invoke halt as a \"side-effect\" for resources\n  []\nend\n<\/code><\/pre>\n

Implementing take<\/code> is somewhat straight-forward. However we also need to modify map<\/code> since every step in the user supplied function can fail. Therefore we need to make sure we call halt<\/code> on every possible step in case of failures:<\/p>\n

def map(collection, fun) do\n  map_next(next(collection), fun)\nend\n\ndefp map_next({h, t}, fun) do\n  [try do\n     fun.(h)\n   rescue\n     e ->\n       # Invoke halt as a \"side-effect\" for resources\n       # in case of failures and then re-raise\n       halt(t)\n       raise(e)\n   end|map_next(next(t), fun)]\nend\n\ndefp map_next(:done, _fun) do\n  []\nend\n<\/code><\/pre>\n

This is not elegant nor performant. Furthermore, it is very error prone. If we forget to call halt<\/code> at some particular point, we can end-up with a dangling resource that may never be closed.<\/p>\n

Introducing reducers<\/h2>\n

Not long ago, Clojure introduced the concept of reducers<\/a>.<\/p>\n

Since Elixir protocols were heavily inspired on Clojure protocols, I was very excited to see their take on collection processing. Instead of imposing a particular mechanism for traversing collections as in iterators, reducers are about sending computations to the collection so the collection applies the computation on itself. From the announcement: “the only thing that knows how to apply a function to a collection is the collection itself”.<\/p>\n

Instead of using a next<\/code> function, reducers expect a reduce<\/code> implementation. Let’s implement this reduce<\/code> function for lists:<\/p>\n

defmodule Reducer do\n  def reduce([h|t], acc, fun) do\n    reduce(t, fun.(h, acc), fun)\n  end\n\n  def reduce([], acc, _fun) do\n    acc\n  end\nend\n<\/code><\/pre>\n

With reduce, we can easily calculate the sum of a collection:<\/p>\n

def sum(collection) do\n  reduce(collection, 0, fn x, acc -> x + acc end)\nend\n<\/code><\/pre>\n

We can also implement map in terms of reduce. The list, however, will be reversed at the end, requiring us to reverse<\/code> it back:<\/p>\n

def map(collection, fun) do\n  reversed = reduce(collection, [], fn x, acc -> [fun.(x)|acc] end)\n  # Call Erlang reverse (implemented in C for performance)\n  :lists.reverse(reversed)\nend\n<\/code><\/pre>\n

Reducers provide many advantages:<\/p>\n