Edit: We have got some great feedback that highlighted inaccuracies and unfairness in our post. We have amended the post accordingly but you can read the errata at the bottom.
With Rails 5 soon to be released, many developers are planning to further explore Action Cable and add stateful features to their web applications via WebSockets. In this article we will highlight some points worth discussing when deploying such features.
When we use HTTP, scaling horizontally and vertically is cheaper and easier as the server is stateless. Every request contains all the information for it to be fulfilled, like the current user id stored in a cookie, which is then fetched and processed. From this perspective, once you access a given page, it doesn’t matter much which server or operating system process is going to fulfill it.
With WebSockets, instead of isolated requests, you have a long-running conversation. In this setup, clients connect to a single machine and they will stay exchanging messages with that particular machine as long as they are online.
Before moving forward, let’s try to put some numbers on how your application is affected once you go stateful.
Imagine you run a newspaper application and you render 100 articles per second. Assuming a uniform load, your infrastructure needs only to handle 100 connections per second. Now imagine you want to use WebSockets so readers can know right away if there is a new comment to the article they are reading. If the average read time per article is of 1 minute, your server now needs to effectively handle 6000 open connections per second (100 articles/s * 60s/article). As a rough estimate, you can expect the number of open connections to be multiplied by the time users spend on the application.
The first requirement of stateful applications is to handle long-running connections. Your infrastructure must also be able to do so concurrently. From the proxy to webservers, you must be able to hold multiple long-running connections at the same time. Not only that, you want a single webserver to serve as many connections as possible, in the cheapest way as possible (since every single connection costs memory).
Let’s continue studying the scenario above. Imagine a new article is published and is receiving 100 requests per second. The article also takes 1 minute to read on average (same numbers as above for simplicity). When someone publishes a new comment, we now need to broadcast this information to all 6000 clients.
In order to quantify this, let’s imagine the worst case scenario which you would never want to run in production: where you have a single operating process per WebSocket connection. Once a new comment is published, it would have to be broadcast 6000 times, one for each process, which will then push this information to the client.
However, if you can hold 6000 connections on a single machine, in the same OS process, the data will be broadcast only once. In other words, you want a single machine to hold as many connections as possible, reducing the latency across your events. The end result will be an increased user experience and reduced infrastructure cost.
To hold as many connections as possible, your runtime must use your machine resources, like IO and CPU, as efficiently as possible. While the huge majority of languages provide threads, which won’t block on IO and will provide CPU-based concurrency, not all of them can leverage multi-core efficiently.
One of the concerns when writing stateful apps is how your web server will behave when multiple clients are connected. Because multiple clients may be sending or receiving events at the same time, your runtime needs to be efficient when multiplexing those connections. If your runtime cannot effectively handle incoming CPU activity, different actions can block the connection (or your channels) causing latency to increase considerably, really fast.
To see how this can impact your clients, imagine you have 1000 channel events from multiple clients to handle, each taking on average 10ms due to CPU. By the time you need to process the 1000th event, that client has already waited 10 seconds (1000 * 10ms). Those problems are much easier to solve in a stateless world because we can easily load balance and send requests to other machines. With WebSockets, the machine you are connected to will be the one doing the work.
It is extremely important to clarify that almost everything you do in your programming language uses the CPU: calling a method, rendering a template, parsing some data. Because the main Ruby implementation has a Global Virtual Machine Lock, there is a good amount of actions that will block you from executing more than one action at once even when multiple cores are present.
To work around this limitation in Rails, you typically queue a job that would perform the rendering and publishing of events in the background. Then Rails implements a worker that is started by the job queue and broadcasts the event. This workflow adds a whole amount of indirection which should not really be needed. You need to be careful so workers that are CPU intensive are not running on the same process as your channels as they would be competing for CPU.
Today we live in a multi-core world. We need to rely on languages that can multiplex both CPU and IO events across multiple cores without locking. And common platforms like Node.js/EventMachine/Twisted are not a solution to this problem exactly because they only cover the IO side, which is not an issue in the majority of threaded languages (including Ruby), while still forcing you to write code in a convoluted callback style way.
Comparing infrastructure in Rails and Phoenix
To exemplify how proper concurrency support leads to simpler solutions, let’s compare examples of workflows between channels in Rails and Phoenix and how it affects our infrastructure.
In Rails we typically move the CPU-intensive tasks to job queues. Therefore the flow for receiving an event from a client and broadcasting it to everyone can be done as follows:
- The client pushes an event to the channel
- The channel puts a job into a job queue
- The job library will instantiate a worker
- The worker transmits the event to the pubsub adapter (Redis or PostgreSQL)
- The pubsub system pushes the broadcast to the server
- The server pushes the broadcast to all clients
On the other hand, let’s see how that would work in Phoenix. Phoenix runs on the Erlang VM which provides multi-core and distributed support out of the box. Receiving an event from a client and broadcasting it to everyone in Phoenix works as follows:
- The client pushes an event to the channel
- The channel transmits the event to the pubsub adapter
- The pubsub adapter pushes the broadcast to all clients
Phoenix does not impose a job queue because Phoenix channels run on the Erlang VM which can leverage all of your machine cores efficiently. If you have 2 or 40 cores, the machine will multiplex CPU-heavy requests, workers and channels across all cores.
Furthermore, Phoenix does not require external PubSub adapters. For a broadcast that was started on the current machine, the data is broadcast to all connected clients directly, without round-trips to Redis. When deploying to multiple machines, Phoenix runs on distributed mode and automatically broadcasts to other nodes without relying on Redis or Postgres. You get a distributed multi-server abstraction that looks like a single channel.
When running stateful applications, leveraging multi-core concurrency is preferred as it leads to simpler applications and better user experience due to reduced latency.
When such is not available, developers may need to work around such limitations. This applies to any platform without a proper concurrency model. For example, when using Socket.IO for Node.js, you need to avoid long computations from blocking Node.js’ event loop. When running on cluster mode (for multi-core usage) or in multiple nodes, broadcasts must first be sent to Redis.
On the other hand, Phoenix channels use all cores, which means developers no longer need to worry about low level details when writing channel code. Phoenix channels are as joyful and productive as any other part of the Phoenix web stack. Phoenix is able to support 2 million connections on a single node or run in distributed mode without Redis or any other adapter, giving engineers the option of scaling horizontally or vertically (or both).
The fact Phoenix PubSub does not require external tools paired with the Erlang VM fantastic support for concurrency is what allowed Phoenix to broadcast a wikipedia article to 2 million clients in about 5 seconds. Of course many developers won’t push the framework to such limits. Rather they are the guarantee you won’t have to sacrifice your productivity and code maintainability. You get beautiful code and great user experience without compromises.
These are some of the many reasons why we are excited about Phoenix. It brings back the simplicity and joy in writing modern web applications by mixing tried and true technologies with a fresh breeze of functional ideas.
We have removed some inaccuracies and unfairness in the blog post. We would like to highlight a couple of them below.
In its initial version, we said: “Currently in Rails, it is advised to deploy at least on two machines, one for stateless and another for stateful clients, and to avoid any blocking action in channels”. This is no longer true in Rails master, you can deploy on the same machine as long as you use a proper web server (like Puma – the new default – or Thin). There is no need to split stateful and stateless load apart, yay!
We originally wrote: “It is extremely important to clarify that almost everything you do in your Ruby code uses the CPU”. That’s of course true for all programming languages and we didn’t want to imply it was Ruby specific. We have amended the blog post to avoid any confusion.
We also mentioned “The most problematic aspect of Action Cable is that we should not perform any action (IO or CPU) in the channel because it would block the channel.” Action Cable doesn’t block on IO. It does not block on CPU per-se, but the Global Virtual Machine Lock will.
We want to thank Rafael França for helping us with these errata.