Plataformatec Blog
http://blog.plataformatec.com.br
Plataformatec's place to talk about Ruby, Ruby on Rails and software engineering

Coding can make you a better project manager
http://blog.plataformatec.com.br/2015/07/coding-can-make-you-a-better-project-manager/
Wed, 22 Jul 2015 12:00:46 +0000

As a Scrum Master, Project Manager or in any other project management role, removing impediments is part of your daily routine. They come in many forms and sizes, ranging from organizational to human ones.

But there are also technical impediments, the ones that send you off to bother your beloved developer teammates if you don’t usually code.

The truth is that a big part of those roadblocks can be tackled with a single line of code. Yes, you can work on the low-hanging fruit and have a lot of fun!

Which technical impediments can you work on?

Recently I had the opportunity to code a little bit to help the team. I can suggest the following topics, where you can achieve positive outcomes with very little effort:

Outdated project readme files

It’s just text, but it’s a good opportunity to hone your GitHub and markdown skills without breaking stuff.

Besides helping new members with their onboarding, it also spreads the message that keeping the readme up to date is important.

Outdated Boxen manifests

Some of our folks use Boxen to set up their environments. We often forget to update the project’s manifest. One more chance to practice. Go for it: your teammates will be grateful when they perform another clean OS X install!

Front-end fixes (color, text, positioning, size, UI behavior)

They’re small, they’re easy to fix and may bore a developer to death. But to you it’s all new and rewarding. Those represent the majority of the commits I’ve done.

As you become more experienced and confident you can expand into other areas, it’s just a matter of curiosity and opportunity.

What are the benefits?

Between one commit and another I noticed some advantages. Below are some that might also work for you:

Codebase knowledge

As you perform fixes, you get in touch with different parts of the code. It will naturally increase your awareness of where things come from and where they belong. From now on when you hear about a bug you will probably track its origin with ease.

This also makes grooming sessions more productive, since you are more familiar with the effects a new feature will bring to the application, improving your negotiation superpowers.

Better communication and relationship

Good communication saves a tremendous amount of time and energy. When you deal with code, you learn the technical vocabulary. In your next conversation with the team, people won’t need to translate jargon so you can understand. The same applies in the opposite direction.

In my opinion, a manager that can code also increases the sense of fellowship, improving the overall relationship among team members.

The feeling of shipping code

My first commit made some checkboxes selected by default. Nothing big, I know, but when I saw the customer using it I realized that shipping code feels amazing!

This brings the sense of belonging and increases shared ownership. You should give it a try!

Working with Pull Requests

On my very first day at Plataformatec, I noticed that all teams were using Pull Requests. Shortly after, I figured out why we were using this technique. But I only really understood it when I first submitted one of my own. Reading code and discussions added value for me, because I could see what was going on without interrupting people. Discussions also acted as a thermometer, revealing some insights about team relationships.

Learning how to write tests

Your code will break some tests (believe me, it will). Panic? Never! Go on and learn how to write and run automated tests. It’s not that hard. Ask your friends, they will be happy to teach you.
Learning about tests will help your communication with developers and reinforce quality culture as well.

Waste reduction

Waste is a fierce enemy and every chance to beat it must be taken. Coding can give you some opportunities to do so.

Every little fix you deal with is one less interruption and context switch for a developer. Every conversation you join in a Pull Request is a shorter feedback cycle being nurtured and less time being spent in queues.

These and other factors combined can positively impact the team’s performance.

Where to start?

I recommend following these steps:

Learning git and Pull Request basics

Here at Plataformatec we use GitHub, so finding documentation about Pull Requests wasn’t difficult. If you use it as well, you can browse GitHub’s Help and even use our Guidelines as a reference. If you don’t, I’m sure you can find plenty of content about your version control tool with a simple Google search.

Practicing in a controlled environment

Look for a repository that you can commit to without spreading panic if you mess things up. For instance, I have practiced in our Campfire bot. If you don’t find any appropriate repository, you can fork a public project on GitHub and use it as your sandbox.

In this phase my colleagues helped me a lot in crafting my first Pull Request, and I bet yours will do the same. Don’t be shy: ask them for help!

Looking for and working on impediments

You have just developed a new skill, now it’s time to use it! Look for things similar to those examples I suggested and invest a little time in them. A couple of fixes later you will be able to work more independently, and the results will start to come.

Are there any pitfalls in this practice?

If you do things in a careful and balanced way, they will happen just right. But there are some traps you should avoid:

Creating dependencies

Remember that you are coding to help remove obstacles. If you turn yourself into the person responsible for maintenance, you have just become a bottleneck for the system (or an irresponsible developer, as you prefer).

Forcing people to code

Despite the several advantages mentioned, not every manager likes to code. Don’t force people into doing it.

Micromanaging

Joining Pull Request discussions brings great value. Micromanaging doesn’t. Don’t even think of using Pull Request discussions as a control tool. Trust your team!

I hope this blog post can help you and your team to improve. Have you ever tried something like this? What are your thoughts about it? Share them with us in the comments below!

Elixir in times of microservices
http://blog.plataformatec.com.br/2015/06/elixir-in-times-of-microservices/
Tue, 30 Jun 2015 12:00:52 +0000

Since microservices have been a common topic lately, there are a lot of questions about how Elixir fits in microservice architectures.

In this post, I won’t focus on the merits of microservices, as many have already discussed them to exhaustion. In particular, Martin Fowler’s entry on the topic aligns well with my thoughts and is definitely worth reading.

It is also worth mentioning that I have seen many companies pursuing microservices because they fail to organize large applications, as their tooling (languages and frameworks) does not provide good abstractions to manage them. So microservices are often seen as a solution for structuring code by imposing the separation of concerns from top to bottom. Unfortunately, prematurely adopting microservices often negatively impacts the team’s productivity. Therefore, it is also important for languages and frameworks to provide proper abstractions for handling code complexity as the codebase grows.

Background

Elixir is a concurrent and distributed programming language that runs on the Erlang Virtual Machine (Erlang VM), focusing on productivity and maintainability. Before we go into microservices, I would first like to argue why Elixir/Erlang may be the best platform out there for developing distributed systems (regardless of whether you have a microservices architecture or not).

The Erlang VM and its standard library were designed by Ericsson in the 80’s for building distributed telecommunication systems. The decisions they made back then continue to be relevant to this day and we will explore why. As far as I know, Erlang is the only runtime and Virtual Machine widely used in production that was designed upfront for distributed systems.

Applications

The Elixir runtime has the notion of applications. In Elixir, code is packaged inside applications which:

  1. are started and stopped as a unit. Any Elixir node (Virtual Machine instance) runs a series of applications, with Elixir itself being one of them, and starting (and stopping) your system is a matter of starting (and stopping) all applications in it

  2. provide a unified directory structure and configuration API. If you have worked with one application, you know the structure and how to configure any other one

  3. contain your application supervision tree, with all processes and their state

Throughout this post, a process means a lightweight thread of execution managed by the Erlang VM. Processes are cheap to create, isolated, and exchange information via messages.
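To make the idea of processes concrete, here is a minimal sketch of one process spawning another and exchanging a message with it. The message shapes (`{:ping, from}` and `{:pong, pid}`) are made up for this illustration.

```elixir
# The current process; the child will reply to it.
parent = self()

# spawn/1 starts a new lightweight process that runs the given function.
# It waits for a single {:ping, from} message and answers with {:pong, pid}.
child =
  spawn(fn ->
    receive do
      {:ping, from} -> send(from, {:pong, self()})
    end
  end)

# Processes share no memory; they only exchange messages.
send(child, {:ping, parent})

reply =
  receive do
    {:pong, ^child} -> :pong
  after
    1_000 -> :timeout
  end

IO.inspect(reply)
```

The child holds no reference to any data in the parent. Everything it needs arrives in the message, which is what keeps processes isolated.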

The impact of applications in a project is highly beneficial. It means that Elixir developers, when writing applications, are given a more explicit approach to:

  1. how their code is started and stopped, as it is contained and isolated inside each application

  2. which processes are part of an application and therefore what the application state is. If you can introspect your application tree, you can introspect any process and, therefore, all the components that make up your application

  3. how the application processes will react and be affected in case of crashes or when something goes wrong
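As a rough sketch of these three points, the snippet below starts a tiny supervision tree and then introspects it. It assumes a recent Elixir release (1.5 or later, for `use Agent` and tuple child specs), which is newer than the version this post was written against. `Demo.Counter` is a hypothetical module invented for the example.

```elixir
defmodule Demo.Counter do
  # A hypothetical worker that keeps a single integer as its state.
  use Agent

  def start_link(initial) do
    Agent.start_link(fn -> initial end, name: __MODULE__)
  end

  def increment, do: Agent.update(__MODULE__, &(&1 + 1))
  def value, do: Agent.get(__MODULE__, & &1)
end

# The supervisor states explicitly how its children are started and
# what happens when one of them crashes (:one_for_one restarts only
# the crashed child).
{:ok, sup} = Supervisor.start_link([{Demo.Counter, 0}], strategy: :one_for_one)

Demo.Counter.increment()

# The tree can be introspected at runtime: which processes exist,
# and what state they hold.
children = Supervisor.which_children(sup)
IO.inspect(children)
IO.inspect(Demo.Counter.value())
```

In a real project the supervisor would be started from the application callback module rather than from a script, but the tree it builds is the same one tools like Observer display.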

Not only that, the tooling around applications is great. If you have Elixir installed, open up “iex” and type: :observer.start(). Besides showing information and graphs about your live node, you can kill random processes, see their memory usage, state and more. Here is an example of running this in a Phoenix application:

Observer running within a Phoenix application

You can see all applications that are part of this node on the left side. Clicking an application shows the process tree of that application. Double-clicking any process opens up a window with information about the process, which function it is executing, its particular state and more. Furthermore, you can kill any process and see the impact it will have on your system.

When compared to other languages, the difference here is that Applications and Processes give you an abstraction to reason about your system in production. Many languages provide packages, objects and modules mostly for code organization with no reflection on the runtime system. If you have a class attribute or a singleton object: how can you reason about the code that manipulates it? If you have a memory leak or a bottleneck, how can you find the entity responsible for it?

This visibility is one of the major benefits of building systems in Elixir. If you ask anyone running a distributed system, that’s the kind of insight they want, and with Elixir you have that as the building block.

Communication

When building a distributed system, you need to choose a communication protocol and how the data will be serialized. Although there are many options out there, unfortunately a lot of developers choose HTTP and JSON, which is a very verbose and expensive combination for performing what ends up becoming RPC calls.

With Elixir, you already have a communication protocol and a serialization mechanism out of the box via Distributed Erlang. If you want to have two nodes communicating with each other, you only need to give them names, ensure they share the same secret cookie, and you are done.

Not only that, because all Elixir processes communicate with each other via message passing, the runtime provides a feature called location transparency. This means it doesn’t really matter if two processes are in the same node or in different ones, they are still able to exchange messages.
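Here is a small sketch of what location transparency looks like in code. `Demo.Remote.run_on/2` is a hypothetical helper written for this example. It is exercised against `Node.self()` so that it runs on a single node, but the same call accepts the name of any connected node.

```elixir
defmodule Demo.Remote do
  # Runs `fun` on `node` and waits for the result to be sent back.
  # Nothing here depends on whether `node` is local or remote.
  def run_on(node, fun) do
    caller = self()
    ref = make_ref()

    Node.spawn(node, fn -> send(caller, {ref, fun.()}) end)

    receive do
      {^ref, result} -> {:ok, result}
    after
      5_000 -> {:error, :timeout}
    end
  end
end

# Using the current node keeps the example self-contained.
result = Demo.Remote.run_on(Node.self(), fn -> Enum.sum([1, 2, 3]) end)
IO.inspect(result)
```

To try it across machines, one would start two named nodes that share a cookie (for example with `iex --sname a --cookie secret` and `iex --sname b --cookie secret`), connect them with `Node.connect/1`, and pass the other node's name instead of `Node.self()`. Note that an anonymous function can only run remotely if the module that defines it is also loaded on the remote node.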

I wrote a quick introduction to Elixir that covers how to get started with Elixir from creating a brand new project up to node communication on How I Start. Check it out for more information.

The Distributed Erlang protocol and serialization mechanism are also documented and therefore it can be used to communicate with other languages. The Erlang VM ships with a binding for Java and others can be found for Python, Go and more.

Breaking monolithic applications

Earlier I mentioned I have seen many companies pursuing microservices because they fail to organize code at the project level. So often they prematurely split their architecture in microservices which affects productivity in the short and long run. From Martin Fowler’s article:

Complexity and Productivity of monoliths and microservices

In Elixir, breaking a large application into smaller ones is simpler than anywhere else, as the process tree already outlines dependencies and the communication between those dependencies always happens explicitly via message passing. For example, imagine you have an application called abc that has grown larger with time. You can break it apart into applications a, b and c by extracting its supervision tree into different applications.

This is such a common aspect of working with Elixir projects that its build tool, called Mix, provides a feature called umbrella projects where you have a project composed of many applications that may depend on each other in any fashion.

Umbrella projects allow you to compile, test and run each application as a unit but also perform all the tasks at once if required. Here is a quick example:

$ mix new abc --umbrella
$ cd abc/apps
$ mix new a
$ mix new b --sup
$ mix new c --sup

The snippet above creates a new umbrella project, enters its apps directory and creates three applications: a, b and c, where the last two contain a supervision tree. If you run mix test at the abc project root, it will compile and test all applications, but you can still go inside each application and work with it in isolation.
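Applications inside an umbrella can depend on each other through Mix's `in_umbrella: true` dependency option. The following is a trimmed sketch of what `apps/b/mix.exs` might look like if b depended on a. The file Mix actually generates contains more entries and differs between Elixir versions, so the module name and version here are assumptions.

```elixir
# apps/b/mix.exs (trimmed sketch, not the exact generated file)
defmodule B.MixProject do
  use Mix.Project

  def project do
    [
      app: :b,
      version: "0.1.0",
      # Share build artifacts and dependencies with the umbrella root.
      build_path: "../../_build",
      deps_path: "../../deps",
      lockfile: "../../mix.lock",
      deps: deps()
    ]
  end

  defp deps do
    # Depend on the sibling application `a` living in the same umbrella.
    [{:a, in_umbrella: true}]
  end
end
```

With this in place, Mix compiles and starts a before b, both when working inside apps/b and when running tasks from the umbrella root.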

Once the main application abc is broken apart, you may also move each part to a separate repository if desired, or you may not. The benefit is that developers are able to handle growing code complexity in small, granular steps, without making large decisions upfront. We cover this in more detail in our Mix and OTP guide.

Microservices

So far I haven’t talked about microservices. That’s because, up to this point, they don’t really matter. You are already designing your system around tiny processes that are isolated and distributed. Call them nanoservices if you’d like to!

Not only that, those processes are packaged into applications, which group them as entities that can be started and stopped as a unit. Combined with Distributed Erlang, if you want to deploy your a, b and c applications as [a, b] + [c] or [a] + [b] + [c], you will have very little trouble in doing so due to their inherent design and built-in communication.

In other words, you can focus on how to deploy your applications based on what is driving you to break them apart. Is it code complexity? You can work on them separately but still deploy them as a unit. Is it for scalability or multi-tenancy reasons? If c requires more instances, or is the application that carries the user-specific concerns, then it is reasonable to isolate it and deploy multiple instances of c.

Is Elixir good only for building distributed systems?

If you are not familiar with Elixir, after reading this far, you may be wondering: is Elixir good only for building distributed systems?

Elixir is excellent for building any kind of long running system exactly because of all the insights you have on your application, even if it is deployed to a single node. The language is also expressive and pleasant to learn and work with (I am certainly biased though), with a getting started guide and many learning resources already available.

While there is a learning curve, the abstractions outlined here are elegant and simple, and the tooling does an excellent job of guiding you to build your first application. A command like mix new my_app --sup, similar to the ones we executed above, will generate an application with its own process tree, which you can directly use and explore to learn more.

Wrapping up

I hope I have illustrated how the design decisions made by Elixir and the Erlang VM provide a great foundation for building distributed systems.

It is also very exciting to see companies starting to enjoy and explore those characteristics through the Elixir programming language. In particular, it is worth watching Jamie Windsor’s talk at Erlang Factory 2015 and how they were able to leverage this to build a game platform.

Finally, a lot of this post focuses on building systems through Distributed Erlang. Although Distributed Erlang will definitely be the most productive approach, there is no reason why you can’t leverage the benefits outlined here by using Elixir with another protocol like Apache Thrift.

And, if at the end of the day, all you want is to use HTTP and JSON, that is fine too and libraries like Plug and frameworks like Phoenix will guarantee you are as productive as anywhere else while enjoying the performance characteristics and robustness of the abstractions outlined here.

Happy coding!

Note I have not covered techniques like blue/green and canary deployments because they depend on the system and communication protocol you are running. Elixir provides conveniences for process grouping and global processes (shared between nodes) but you can still use external libraries like Consul or Zookeeper for service discovery or rely on HAProxy for load balancing for the HTTP based frontends.


Adding MySQL support to Ecto
http://blog.plataformatec.com.br/2015/06/adding-mysql-support-to-ecto/
Wed, 10 Jun 2015 12:00:46 +0000

A few months ago, I started learning Elixir by reading and doing the exercises from Dave Thomas’ book ‘Programming Elixir’. I was having a lot of fun with it, but I was missing a real project to get my hands on.

At that time, José Valim was in Brazil and I had the chance to work with him for a whole day. In the first hour, he told me that there was some low-hanging fruit in Elixir’s codebase and then he gave me an introduction to where I should focus. I was able to send two Pull Requests to Elixir’s codebase to add timeouts to tests and set the timeout to infinity in trace mode. The result was great and I learned a lot. By the end of that day, he mentioned that Ecto needed a MySQL adapter because it only had support for PostgreSQL. I didn’t think twice and told him I was going to work on it.

In this blog post, I will share what I learned while making the MySQL adapter for Ecto. Let’s start from the beginning.

Ecto is a database wrapper and language integrated query for Elixir. It supports many features, like migrations, associations and more.

Migrations on Ecto look like this:

defmodule MyApp.Repo.Migrations.CreateLists do
  use Ecto.Migration

  def change do
    create table(:lists) do
      add :name, :string

      timestamps
    end
  end
end

and associations:

defmodule MyApp.List do
  use Ecto.Model

  schema "lists" do
    has_many :tasks, Task

    field :name, :string
    timestamps
  end
end

defmodule MyApp.Task do
  use Ecto.Model

  schema "tasks" do
    belongs_to :list, List

    field :description, :string
    timestamps
  end
end

this is how queries work:

# `from` is a macro defined in Ecto.Query
import Ecto.Query

select = from l in MyApp.List,
          where: l.id == 10,
          select: l.name

update = from(l in MyApp.List, where: l.id == 1)

After you define your models and your queries, you can run them against a repository:

# The examples here are using the query defined above

MyApp.Repo.all(select)
MyApp.Repo.update_all(update)

Cool, right? But how to make all these features run on MySQL too?

Well, I didn’t start from nothing. I started from an open Pull Request that had some initial work. It was setting up a database connection using the MySQL driver and creating/dropping MySQL databases from the command line.

We had to make the connection work and support the basic DDL (Data Definition Language) statements, such as CREATE tables and indexes, DROP and so on. It was quite easy since Ecto has a well-defined interface that must be implemented by the adapters.

After the initial setup, I got the integration tests to run, but I got stuck with prepared statements. It took some time until I found that Mariaex (the MySQL driver) didn’t have support for them, so I focused on porting the queries to the MySQL module and its unit tests until prepared statements were supported.

The unit tests were all passing, but they didn’t cover complex situations, which were the responsibility of integration tests. When the basic support for prepared statements was added, I went back to the integration tests and that’s where things started getting complicated.

The first problem I found was that MySQL doesn’t support transactions on DDL operations, and since Ecto runs the migrations inside a transaction, it was not working. To solve this particular problem, we needed to check whether the current adapter supports DDL transactions and only run migrations inside a transaction if it does.
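The shape of that check can be sketched as follows. This is not Ecto's actual migrator code: the callback name `supports_ddl_transaction?/0` is modelled on the one Ecto adapters expose, while the `Demo.*` modules and the `transaction/1` stand-ins are hypothetical and exist only to make the example self-contained.

```elixir
defmodule Demo.PostgresLike do
  # Hypothetical adapter whose database can roll back DDL statements.
  def supports_ddl_transaction?, do: true
  def transaction(fun), do: {:ok, fun.()}
end

defmodule Demo.MySQLLike do
  # Hypothetical adapter whose database cannot run DDL in a transaction.
  def supports_ddl_transaction?, do: false
  def transaction(_fun), do: raise("DDL cannot run inside a transaction")
end

defmodule Demo.Migrator do
  # Wraps the migration in a transaction only when the adapter allows it.
  def run(adapter, migration) do
    if adapter.supports_ddl_transaction?() do
      adapter.transaction(migration)
    else
      {:ok, migration.()}
    end
  end
end

IO.inspect(Demo.Migrator.run(Demo.PostgresLike, fn -> :migrated end))
IO.inspect(Demo.Migrator.run(Demo.MySQLLike, fn -> :migrated end))
```

The migrator never needs to know which database it is talking to; it only asks the adapter what it is capable of.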

With the DDL check done, it was time to get back to the other failing cases, and there I got stuck on another hard problem. The migration tests were still failing and I couldn’t find where the problem was. After some time looking into it (pairing with José), we discovered that the problem was that the driver was not encoding the parameters.

The prepared statements only had support for basic types (Integer, String and Binary), so we had to add support for Date, Datetime, Timestamp, Decimal, Float, Double and Boolean. It was really hard to make it work, since I had never worked with binary protocols before and the MySQL documentation for the binary protocol isn’t complete.
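To give a feel for what encoding a parameter involves, here is a sketch of how date and datetime values are laid out in the MySQL binary protocol: a length byte, the year as a two-byte little-endian integer, then one byte each for month, day, hour, minute and second, and four little-endian bytes for microseconds. `Demo.BinaryParams` is a hypothetical module for illustration and is not Mariaex's actual implementation.

```elixir
defmodule Demo.BinaryParams do
  # DATE: length byte (4), year (2 bytes, little-endian), month, day.
  def encode_date({year, month, day}) do
    <<4, year::little-16, month::8, day::8>>
  end

  # DATETIME with microseconds: length byte (11), the date fields,
  # hour, minute, second, then microseconds (4 bytes, little-endian).
  def encode_datetime({{year, month, day}, {hour, min, sec, usec}}) do
    <<11, year::little-16, month::8, day::8, hour::8, min::8, sec::8, usec::little-32>>
  end
end

IO.inspect(Demo.BinaryParams.encode_date({2015, 6, 10}))
IO.inspect(Demo.BinaryParams.encode_datetime({{2015, 6, 10}, {12, 0, 46, 0}}))
```

Elixir's binary syntax makes the layout readable, but every type needs its own rule like these, which is why adding the missing types took a while.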

I was making progress but the Ecto development didn’t stop. I had to port some features while trying to make the integration tests pass and one of those changes required more work on Mariaex, which was the change to add support for microseconds.

Today the MySQL adapter is working and you can use it with Ecto but, as you can see, I spent more time adding features to the MySQL driver than working on Ecto itself. It was a great experience and I learned a lot about how MySQL actually works.

If you want more detail about the implementation itself, you can check my Pull Request and you will see a detailed todo list with all the steps that were required. I would also like to thank all the people involved in this work: José Valim, Joe Quadrino and Dmitry Aleksandrov.


Elixir in production interview: Garth Hitchens
http://blog.plataformatec.com.br/2015/06/elixir-in-production-interview-garth-hitches/
Wed, 03 Jun 2015 14:04:30 +0000

Elixir running in an embedded device

A few weeks ago we had the opportunity to interview Garth Hitchens about his experience with shipping Elixir software.

Garth manages a development team at Rose Point Navigation Systems. They’re using Elixir to develop embedded software for navigation devices for marine markets.

Watch the video below or read the interview to get to know his experiences with Elixir.

José Valim: I’m here with Garth at Erlang Factory. Garth, can you tell us a little bit about what you do?

Garth Hitchens: I manage a development team at a company called Rose Point Navigation Systems and we build navigation systems for both recreational and commercial marine markets, and we are also building embedded devices. We’ve been using Elixir for our embedded device projects and absolutely loving it.

José Valim: Oh, great! So, are you already shipping Elixir software?

Garth Hitchens: We are shipping devices that actually run Elixir code. I have one of them here. This is an interface device that basically converts ethernet to other protocols that are common on boats (…). This runs Elixir, it boots in a few seconds and we’ve been absolutely in love with the platform, it’s been great for us.

José Valim: Awesome! It’s very exciting to see Elixir in there and the Erlang virtual machine and everything else, that’s really cool! And, what is the thing that you are most excited about in Elixir?

Garth Hitchens: I think the language is particularly elegant, I’ve got to give you credit there. I came to Elixir from a long history of programming languages including C, C++, Ruby, Python, Lisp, so I love the work that you have done with macros and homoiconicity. And I like the way that it gives me powerful abstractions to rapidly develop things. Plus, the Erlang VM has been rock solid. We’re using Frank Hunleth’s Nerves package to embed Erlang and Elixir on these products and it works really well, and I highly recommend it to anybody.

José Valim: Awesome. Yeah, I was just at Frank’s talk, it was an exciting talk. And I’m planning to get some SumoBots and start to play with them.

Garth Hitchens: Yeah! It’s good stuff.

Subscribe to our blog to receive the next “Elixir in production” interviews.


Companies using Elixir in production
http://blog.plataformatec.com.br/2015/06/companies-using-elixir-in-production/
Wed, 03 Jun 2015 14:03:17 +0000

One of the common questions we usually hear from people starting their journey in Elixir is: “who else is using it?”. We ourselves know that more and more companies are already using Elixir in production, but that information is not easily accessible… yet.

A few weeks ago we tweeted:

That tweet got more than 100 interactions between RTs and favorites. The teaser video linked by that tweet got more than 1700 views and it is already the most viewed video on our YouTube channel. That’s a clear hint that the community is interested in learning more about such cases. So, we’re continuing with our project of documenting cases of Elixir in production.

The first thing we did was to interview people at Erlang Factory that are using Elixir in production. We got some awesome cases there, ranging from web development to API backends to embedded devices. We already have one of those interviews published and we plan to publish the other ones in the following weeks. Subscribe to our blog to receive the next interviews.

Besides those interviews, we’re planning to do more in-depth case studies with companies that are using Elixir in an interesting way and share them with the community. We’re already reaching out to some of them, but if you have a nice use case of Elixir or know someone who does, please reach us at “hugo.barauna@plataformatec.com.br”.

Stay tuned for more Elixir news!


Introducing reducees
http://blog.plataformatec.com.br/2015/05/introducing-reducees/
Thu, 21 May 2015 12:00:34 +0000

Elixir provides the concept of collections, which may be in-memory data structures, as well as events, I/O resources and more. Those collections are supported by the Enumerable protocol, which is an implementation of an abstraction we call “reducees”.

In this article, we will outline the design decisions behind this abstraction, often exploring ideas from Haskell, Clojure and Scala that eventually led us to develop reducees, focusing especially on the constraints and performance characteristics of the Erlang Virtual Machine.

At the end, there is a link to a talk I have recently given at ElixirConf EU 2015, the first Elixir conference in Europe, that explores the next steps and how we plan to introduce asynchrony into the standard library.

Recursion and Elixir

Elixir is a functional programming language that runs on the Erlang VM. All the examples in this article will be written in Elixir, and we will introduce the concepts bit by bit.

Elixir provides linked-lists. Lists can hold many items and, with pattern matching, it is easy to extract the head (the first item) and the tail (the rest) of a list:

iex> [h|t] = [1, 2, 3]
iex> h
1
iex> t
[2, 3]

An empty list won’t match the pattern [h|t]:

iex> [h|t] = []
** (MatchError) no match of right hand side value: []

Suppose we want to recurse over every element in the list, multiplying each one by 2. Let’s write a double function:

defmodule Recursion do
  def double([h|t]) do
    [h*2|double(t)]
  end

  def double([]) do
    []
  end
end

The function above recursively traverses the list, doubling the head at each step and invoking itself with the tail. We could define a similar function if we wanted to triple every element in the list but it makes more sense to abstract our current implementation. Let’s define a function called map that applies a given function to each element in the list:

defmodule Recursion do
  def map([h|t], fun) do
    [fun.(h)|map(t, fun)]
  end

  def map([], _fun) do
    []
  end
end

double could now be defined in terms of map as follows:

def double(list) do
  map(list, fn x -> x * 2 end)
end

Manually recursing the list is straightforward but it doesn’t really compose. Imagine we would like to implement other functional operations like filter, reduce, take and so on for lists. Then we introduce sets, dictionaries, and queues into the language and we would like to provide the same operations for all of them.
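To see why this does not scale, here is a sketch of two more operations written by manual recursion, in the same style as map above. `ListOnly` is a name made up for this example. Every function repeats the same head/tail traversal and works only on lists.

```elixir
defmodule ListOnly do
  # Keeps only the elements for which `fun` returns a truthy value.
  def filter([h | t], fun) do
    if fun.(h), do: [h | filter(t, fun)], else: filter(t, fun)
  end

  def filter([], _fun), do: []

  # Folds the list into a single value, threading an accumulator through.
  def reduce([h | t], acc, fun), do: reduce(t, fun.(h, acc), fun)
  def reduce([], acc, _fun), do: acc
end

IO.inspect(ListOnly.filter([1, 2, 3, 4], fn x -> rem(x, 2) == 0 end))
IO.inspect(ListOnly.reduce([1, 2, 3, 4], 0, fn x, acc -> x + acc end))
```

Supporting a set or a queue would mean writing every one of these functions again for that data structure.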

Instead of manually implementing all of those operations for each data structure, it is better to provide an abstraction that allows us to define those operations only once, and they will work with different data structures.

That’s our next step.

Introducing Iterators

The idea behind iterators is that we ask the data structure what is the next item until the data structure no longer has items to emit.

Let’s implement iterators for lists. This time, we will be using Elixir documentation and doctests to detail how we expect iterators to work:

defmodule Iterator do
  @doc """
  Each step needs to return a tuple containing
  the next element and a payload that will be
  invoked the next time around.

      iex> next([1, 2, 3])
      {1, [2, 3]}
      iex> next([2, 3])
      {2, [3]}
      iex> next([3])
      {3, []}
      iex> next([])
      :done
  """
  def next([h|t]) do
    {h, t}
  end

  def next([]) do
    :done
  end
end

We can implement map on top of next:

def map(collection, fun) do
  map_next(next(collection), fun)
end

defp map_next({h, t}, fun) do
  [fun.(h)|map_next(next(t), fun)]
end

defp map_next(:done, _fun) do
  []
end

Since map uses the next function, as long as we implement next for a new data structure, map (and all future functions) should work out of the box. This brings the polymorphism we desired but it has some downsides.
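As a sketch of that polymorphism, the module below repeats next and map and adds a second, made-up data structure: a `{:countdown, n}` tuple that emits n, n - 1, and so on down to 1. Both the tuple shape and the `IteratorSketch` name are assumptions for this example. Note that map itself is unchanged.

```elixir
defmodule IteratorSketch do
  # next/1 for lists, as before.
  def next([h | t]), do: {h, t}
  def next([]), do: :done

  # next/1 for a hypothetical countdown structure.
  def next({:countdown, 0}), do: :done

  def next({:countdown, n}) when is_integer(n) and n > 0 do
    {n, {:countdown, n - 1}}
  end

  # map/2 only talks to next/1, so it works for both structures.
  def map(collection, fun), do: map_next(next(collection), fun)

  defp map_next({h, t}, fun), do: [fun.(h) | map_next(next(t), fun)]
  defp map_next(:done, _fun), do: []
end

IO.inspect(IteratorSketch.map([1, 2, 3], fn x -> x * 2 end))
IO.inspect(IteratorSketch.map({:countdown, 3}, fn x -> x * 2 end))
```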

Besides not having ideal performance, it is quite hard to make iterators work with resources (events, I/O, etc), leading to messy and error-prone code.

The trouble with resources is that, if something goes wrong, we need to tell the resource that it should be closed. After all, we don’t want to leave file descriptors or database connections open. This means we need to extend our next contract to introduce at least one other function called halt.

halt should be called if the iteration is interrupted suddenly, either because we are no longer interested in the next items (for example, if someone calls take(collection, 5) to retrieve only the first five items) or because an error happened. Let’s start with take:

def take(collection, n) do
  take_next(next(collection), n)
end

# Invoked on every step
defp take_next({h, t}, n) when n > 0 do
  [h|take_next(next(t), n - 1)]
end

# If we reach this, the collection finished
defp take_next(:done, _n) do
  []
end

# If we reach this, we took all we cared about before finishing
defp take_next(value, 0) do
  halt(value) # Invoke halt as a "side-effect" for resources
  []
end

Implementing take is somewhat straightforward. However, we also need to modify map, since every call to the user-supplied function can fail. Therefore we need to make sure we call halt on every possible step in case of failures:

def map(collection, fun) do
  map_next(next(collection), fun)
end

defp map_next({h, t}, fun) do
  [try do
     fun.(h)
   rescue
     e ->
       # Invoke halt as a "side-effect" for resources
       # in case of failures and then re-raise
       halt(t)
       raise(e)
   end|map_next(next(t), fun)]
end

defp map_next(:done, _fun) do
  []
end

This is neither elegant nor performant. Furthermore, it is very error-prone. If we forget to call halt at some particular point, we can end up with a dangling resource that may never be closed.

Introducing reducers

Not long ago, Clojure introduced the concept of reducers.

Since Elixir protocols were heavily inspired by Clojure protocols, I was very excited to see their take on collection processing. Instead of imposing a particular mechanism for traversing collections, as iterators do, reducers are about sending computations to the collection so the collection applies the computation to itself. From the announcement: “the only thing that knows how to apply a function to a collection is the collection itself”.

Instead of using a next function, reducers expect a reduce implementation. Let’s implement this reduce function for lists:

defmodule Reducer do
  def reduce([h|t], acc, fun) do
    reduce(t, fun.(h, acc), fun)
  end

  def reduce([], acc, _fun) do
    acc
  end
end

With reduce, we can easily calculate the sum of a collection:

def sum(collection) do
  reduce(collection, 0, fn x, acc -> x + acc end)
end

We can also implement map in terms of reduce. The list, however, will be reversed at the end, requiring us to reverse it back:

def map(collection, fun) do
  reversed = reduce(collection, [], fn x, acc -> [fun.(x)|acc] end)
  # Call Erlang reverse (implemented in C for performance)
  :lists.reverse(reversed)
end
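
Other operations follow the same pattern. The sketch below builds filter and count on top of the list reduce shown above; the module name ReducerSketch and both helper functions are illustrative additions, not code from the original post.

```elixir
defmodule ReducerSketch do
  # reduce/3 for lists, as in the Reducer module above
  def reduce([h | t], acc, fun), do: reduce(t, fun.(h, acc), fun)
  def reduce([], acc, _fun), do: acc

  # Keep only the items for which pred returns a truthy value
  def filter(collection, pred) do
    reversed =
      reduce(collection, [], fn x, acc ->
        if pred.(x), do: [x | acc], else: acc
      end)

    :lists.reverse(reversed)
  end

  # Count the items by ignoring each one and bumping the accumulator
  def count(collection), do: reduce(collection, 0, fn _x, acc -> acc + 1 end)
end

IO.inspect(ReducerSketch.filter([1, 2, 3, 4], fn x -> rem(x, 2) == 0 end)) # prints [2, 4]
IO.inspect(ReducerSketch.count([:a, :b, :c]))                              # prints 3
```

Neither function contains any recursion of its own: the traversal lives entirely in reduce.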

Reducers provide many advantages:

  • They are conceptually simpler and faster
  • Operations like map, filter, etc. are easier to implement than their iterator counterparts, since the recursion is pushed to the collection instead of being part of every operation
  • They open the door to parallelism, as their operations are no longer serial (in contrast to iterators)
  • No conceptual changes are required to support resources as collections

The last bullet is the most important for us. Because the collection is the one applying the function, we don’t need to change map to support resources; all we need to do is implement reduce itself. Here is a pseudo-implementation of reducing a file line by line:

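def reduce(file, acc, fun) do
  # File.open!/1 raises if the file cannot be opened
  descriptor = File.open!(file)

  try do
    reduce_next(IO.read(descriptor, :line), descriptor, acc, fun)
  after
    # Runs whether the reduction succeeds or raises
    File.close(descriptor)
  end
end

# IO.read/2 returns :eof once there are no more lines
defp reduce_next(:eof, _descriptor, acc, _fun) do
  acc
end

defp reduce_next(line, descriptor, acc, fun) when is_binary(line) do
  reduce_next(IO.read(descriptor, :line), descriptor, fun.(line, acc), fun)
end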

Even though our file reducer uses something that looks like an iterator, because that’s the best way to traverse the file, from the map function perspective we don’t care which operation is used internally. Furthermore, it is guaranteed the file is closed after reducing, regardless of success or failure.
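
For comparison, the standard library offers this behaviour through File.stream!/1, whose result can be passed to Enum.reduce/3. The example below is only a sketch: the temporary file name and its contents are made up for the demonstration.

```elixir
# Write a small throwaway file (hypothetical name) to reduce over
path = Path.join(System.tmp_dir!(), "reducee_sketch.txt")
File.write!(path, "one\ntwo\nthree\n")

# The file is opened when the reduction starts and closed when it ends
line_count = Enum.reduce(File.stream!(path), 0, fn _line, acc -> acc + 1 end)
IO.inspect(line_count) # prints 3

# Clean up the throwaway file
File.rm!(path)
```

The calling code never opens or closes anything itself; that responsibility stays with the collection.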

There are, however, two issues when implementing reducers as proposed in Clojure in Elixir.

First of all, some operations like take cannot be implemented in a purely functional way. For example, Clojure relies on reference types in its take implementation. This may not be an issue depending on the language/platform (it certainly isn’t in Clojure), but it is an issue in Elixir, as side-effects would require us to spawn new processes every time take is invoked.

Another drawback of reducers is that, because the collection is the one controlling the reduction, we cannot implement operations like zip, which require taking one item from a collection, suspending the reduction, taking an item from another collection, suspending it, and then resuming the first one, and so on. Again, at least not in a purely functional way.

With reducers, we achieve the goal of a single abstraction that works efficiently with in-memory data structures and resources. However, it is limited in the number of operations it can support efficiently in a purely functional way, so we had to continue looking.

Introducing iteratees

It was at Code Mesh 2013 that I first heard about iteratees. I attended a talk by Jessica Kerr and, in the first minutes, she described exactly where my mind was at the moment: iterators and reducers indeed have their limitations, but they have been solved in scalaz-stream.

After the talk, Jessica and I started to explore how scalaz-stream solves those problems, eventually leading us to the Monad.Reader issue that introduces iteratees. After some experiments, we had a prototype of iteratees working in Elixir.

With iteratees, we have “instructions” going “up and down” between the source and the reducing function, telling each side what the next step in the collection processing is:

defmodule Iteratee do
  @doc """
  Enumerates the collection with the given instruction.

  If the instruction is a `{:cont, fun}` tuple, the given
  function will be invoked with `{:some, h}` if there is
  an entry in the collection, otherwise `:done` will be
  given.

  If the instruction is `{:halt, acc}`, it means there is
  nothing to process and the collection should halt.
  """
  def enumerate([h|t], {:cont, fun}) do
    enumerate(t, fun.({:some, h}))
  end

  def enumerate([], {:cont, fun}) do
    fun.(:done)
  end

  def enumerate(_, {:halt, acc}) do
    {:halted, acc}
  end
end

With enumerate defined, we can write map:

def map(collection, fun) do
  {:done, acc} = enumerate(collection, {:cont, mapper([], fun)})
  :lists.reverse(acc)
end

defp mapper(acc, fun) do
  fn
    {:some, h} -> {:cont, mapper([fun.(h)|acc], fun)}
    :done      -> {:done, acc}
  end
end

enumerate is called with {:cont, mapper}, where mapper will receive {:some, h} or :done, as defined by enumerate. The mapper function then returns either {:cont, mapper}, with a new mapper function, or {:done, acc} once the collection has signaled that no more items will be emitted.

The Monad.Reader publication defines iteratees as teaching fold (reduce) new tricks. This is precisely what we have done here. For example, while map only returns {:cont, mapper}, it could have returned {:halt, acc}, and that would have told the collection to halt. That’s how take could be implemented with iteratees: we would send cont instructions until we are no longer interested in new elements, finally returning halt.
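
A minimal sketch of such a take, assuming the enumerate function defined above, could look as follows. The module name IterateeSketch and the taker helper are hypothetical names chosen for this example.

```elixir
defmodule IterateeSketch do
  # enumerate/2 as in the Iteratee module above
  def enumerate([h | t], {:cont, fun}), do: enumerate(t, fun.({:some, h}))
  def enumerate([], {:cont, fun}), do: fun.(:done)
  def enumerate(_, {:halt, acc}), do: {:halted, acc}

  def take(collection, n) when n > 0 do
    # The result is tagged :halted when we stop early
    # and :done when the collection runs out first
    {_, acc} = enumerate(collection, {:cont, taker([], n)})
    :lists.reverse(acc)
  end

  # Last item we care about: collect it and tell the collection to halt
  defp taker(acc, 1) do
    fn
      {:some, h} -> {:halt, [h | acc]}
      :done -> {:done, acc}
    end
  end

  # Otherwise keep going with a fresh closure and a smaller counter
  defp taker(acc, n) do
    fn
      {:some, h} -> {:cont, taker([h | acc], n - 1)}
      :done -> {:done, acc}
    end
  end
end

IO.inspect(IterateeSketch.take([1, 2, 3, 4], 2)) # prints [1, 2]
IO.inspect(IterateeSketch.take([1], 3))          # prints [1]
```

Note that every step allocates a new closure through taker.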

So while iteratees allow us to teach reduce new tricks, they are much harder to grasp conceptually. Not only that, functions implemented with iteratees were 6 to 8 times slower in Elixir than their reducer counterparts.

In fact, it is even harder to see how iteratees are actually based on reduce since it hides the accumulator inside a closure (the mapper function, in this case). This is also the cause of the performance issues in Elixir: for each mapped element in the collection, we need to generate a new closure, which becomes very expensive when mapping, filtering or taking items multiple times.

That’s when we asked: what if we could keep what we have learned with iteratees while maintaining the simplicity and performance characteristics of reduce?

Introducing reducees

Reducees are similar to iteratees. The difference is that they clearly map to a reduce operation and do not create closures as we traverse the collection. Let’s implement reducee for a list:

defmodule Reducee do
  @doc """
  Reduces the collection with the given instruction,
  accumulator and function.

  If the instruction is a `{:cont, acc}` tuple, the given
  function will be invoked with the next item and the
  accumulator.

  If the instruction is `{:halt, acc}`, it means there is
  nothing to process and the collection should halt.
  """
  def reduce([h|t], {:cont, acc}, fun) do
    reduce(t, fun.(h, acc), fun)
  end

  def reduce([], {:cont, acc}, _fun) do
    {:done, acc}
  end

  def reduce(_, {:halt, acc}, _fun) do
    {:halted, acc}
  end
end

Our reducee implementation maps cleanly to the original reduce implementation. The only differences are that the accumulator is always wrapped in a tuple containing the next instruction, and that there is an additional clause that checks for halt.

Implementing map only requires us to send those instructions as we reduce:

def map(collection, fun) do
  {:done, acc} =
    reduce(collection, {:cont, []}, fn x, acc ->
      {:cont, [fun.(x)|acc]}
    end)
  :lists.reverse(acc)
end

Compared to the original reduce implementation:

def map(collection, fun) do
  reversed = reduce(collection, [], fn x, acc -> [fun.(x)|acc] end)
  :lists.reverse(reversed)
end

The only difference between the two implementations is that the accumulator is wrapped in tuples. We have effectively replaced the closures in iteratees with two-item tuples in reducees, which provides a considerable speedup.

The tuple approach allows us to teach new tricks to reducees too. For example, our initial implementation already supports passing {:halt, acc} instead of {:cont, acc}, which we can use to implement take on top of reducees:

def take(collection, n) when n > 0 do
  {_, {acc, _}} =
    reduce(collection, {:cont, {[], n}}, fn
      # Decrement the running count, not the original n
      x, {acc, count} -> {take_instruction(count), {[x|acc], count - 1}}
    end)
  :lists.reverse(acc)
end

defp take_instruction(1), do: :halt
defp take_instruction(_), do: :cont

The accumulator given to reduce now holds a list, to collect results, as well as the number of elements we still need to take from the collection. Once we have taken the last item (count == 1), we halt the collection.
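
The same halt instruction covers other early-exit operations. As an illustration, here is a sketch of take_while built on the list reducee; the module name ReduceeSketch and the take_while function are assumptions of this example rather than code from the post.

```elixir
defmodule ReduceeSketch do
  # reduce/3 for lists, as in the Reducee module above
  def reduce([h | t], {:cont, acc}, fun), do: reduce(t, fun.(h, acc), fun)
  def reduce([], {:cont, acc}, _fun), do: {:done, acc}
  def reduce(_, {:halt, acc}, _fun), do: {:halted, acc}

  # Collect items while pred holds, then halt at the first failure
  def take_while(collection, pred) do
    {_, acc} =
      reduce(collection, {:cont, []}, fn x, acc ->
        if pred.(x), do: {:cont, [x | acc]}, else: {:halt, acc}
      end)

    :lists.reverse(acc)
  end
end

IO.inspect(ReduceeSketch.take_while([1, 2, 3, 1], fn x -> x < 3 end)) # prints [1, 2]
IO.inspect(ReduceeSketch.take_while([1, 2], fn x -> x < 3 end))       # prints [1, 2]
```

No counter is needed here, so the accumulator is just the list of results.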

At the end of the day, this is the abstraction that ships with Elixir. It solves all requirements outlined so far: it is simple, fast, works with both in-memory data structures and resources as collections, and it supports both take and zip operations in a purely functional way.
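
The shipped abstraction can be exercised directly through the Enumerable protocol. The sketch below assumes the Enumerable.reduce/3 contract as documented in the standard library, which accepts a :suspend instruction in addition to :cont and :halt; suspension is what makes zip-like operations possible. The data and variable names are made up for the example.

```elixir
# Halting early: stop before the running sum would exceed 3
halted =
  Enumerable.reduce([1, 2, 3, 4], {:cont, 0}, fn x, acc ->
    if acc + x > 3, do: {:halt, acc}, else: {:cont, acc + x}
  end)

IO.inspect(halted) # prints {:halted, 3}

# Suspending: take one item, pause, and resume later through the continuation
{:suspended, first, continuation} =
  Enumerable.reduce([1, 2, 3], {:cont, nil}, fn x, _acc -> {:suspend, x} end)

{:suspended, second, _continuation} = continuation.({:cont, nil})

IO.inspect({first, second}) # prints {1, 2}
```

Application code rarely calls Enumerable.reduce/3 itself, but seeing the instructions travel in both directions makes the design easier to follow.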

The path forward

Elixir developers mostly do not need to worry about the underlying reducees abstraction. Developers work directly with the module Enum which provides a series of operations that work with any collection. For example:

iex> Enum.map([1, 2, 3], fn x -> x * 2 end)
[2, 4, 6]

All functions in Enum are eager. The map operation above receives a list and immediately returns a list. Nonetheless, it didn’t take long for us to add lazy variants of those operations:

iex> Stream.map([1, 2, 3], fn x -> x * 2 end)
#Stream<...>

All the functions in Stream are lazy: they only store the computation to be performed, traversing the collection just once after all desired computations have been expressed.

In addition, the Stream module provides a series of functions for abstracting resources, generating infinite collections and more.
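
As a small illustration of both points, the pipeline below composes lazy operations over an infinite collection and only computes the three items that are actually requested. The particular functions passed in are arbitrary choices for the example.

```elixir
result =
  Stream.iterate(1, fn x -> x * 2 end) # 1, 2, 4, 8, ... (infinite, lazy)
  |> Stream.map(fn x -> x + 1 end)     # still lazy, nothing computed yet
  |> Enum.take(3)                      # eager: pulls only three items

IO.inspect(result) # prints [2, 3, 5]
```

Replacing Stream.map with Enum.map here would never return, since the eager version would try to traverse the whole infinite collection.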

In other words, in Elixir we use the same abstraction to provide both eager and lazy operations that accept both in-memory data structures and resources as collections, all conveniently encapsulated in the Enum and Stream modules.

While reducees are an important milestone, they are not our end goal. After all, the operations in Enum and Stream we have implemented so far are still purely functional, and the Rx developers have shown us there is a long way to go once we decide to tackle asynchrony.

That’s exactly what we want to explore next for Elixir. For those interested in learning more, I have explored those topics at Elixirconf EU 2015 (the content related to this post starts at 30:39):

We hope you are as excited as we are about our foundations and what is coming next!

P.S.: An enormous thank you to Jessica Kerr for introducing me to iteratees and pairing with me at Code Mesh. Also, thanks to Jafar Husain for the conversations at Code Mesh and the team behind Rx, which we are exploring next. Finally, thank you to James Fish, Peter Hamilton, Eric Meadows-Jönsson and Alexei Sholik for the countless reviews, feedback and prototypes regarding Elixir’s future.



]]>
http://blog.plataformatec.com.br/2015/05/introducing-reducees/feed/ 2