Posts by José Valim

We are very glad to announce the logos for two of our favorite Rails open source projects…

Simple Form:

Simple Form Logo

And Devise:

Devise Logo

We would like to congratulate our designer, Bruna Kochi, who was able to capture the essence of each project in their logos. We will write about their design process soon!

These projects have been part of the Rails community for almost 4 years, and it was about time for them to have their own visual identity! We would like to thank all contributors and users who have helped make these projects more robust, flexible and popular!

Hi everybody.

I’d like to announce that Devise v2.2.3, v2.1.3, v2.0.5 and v1.5.4 have been released with a security patch. If you use any database other than PostgreSQL or SQLite3 (including NoSQL ones), upgrade immediately.

Using a specially crafted request, an attacker could trick the database type conversion code into returning incorrect records. For some token values, this could allow an attacker to bypass the proper checks and gain control of other accounts.

If you are using a Devise series older than the ones listed above, recommendations for each series back to v1.2 are provided below. Regardless, upgrading to a more recent version is advised.

Versions affected

We checked all Devise versions released in the past two years; our recommendations follow.

v1.5, v2.0, v2.1 and v2.2 series

You can upgrade to any of v2.2.3, v2.1.3, v2.0.5 and v1.5.4. If an upgrade is not feasible, please add the following patch to config/initializers/devise_patch.rb inside your Rails application:

Devise::ParamFilter.class_eval do
  def param_requires_string_conversion?(_value); true; end
end
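For readers wondering what forcing string conversion buys here, the sketch below shows the idea in plain Ruby. `stringify_auth_params` is a hypothetical helper, not Devise's ParamFilter: it converts every authentication parameter to a String, so a value that a request parser turned into an Integer, boolean or nil never reaches the database's own type-conversion rules.

```ruby
# Hypothetical sketch, NOT Devise's actual ParamFilter implementation.
# Every authentication parameter is converted to a String, mirroring the
# patch above, which makes param_requires_string_conversion? always true.
def stringify_auth_params(params)
  params.each_with_object({}) do |(key, value), filtered|
    filtered[key] = value.to_s
  end
end

# A crafted request might deliver a token as an Integer instead of a String.
crafted = { "reset_password_token" => 0, "email" => "user@example.com" }
filtered = stringify_auth_params(crafted)
puts filtered["reset_password_token"].inspect # => "0"
```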

v1.4 series

Please add the following patch to config/initializers/devise_patch.rb inside your Rails application:

Devise::Models::Authenticatable::ClassMethods.class_eval do
  def auth_param_requires_string_conversion?(value); true; end
end

Please upgrade to more recent versions.

v1.2 and v1.3 series

Not affected by this vulnerability. Please upgrade to more recent versions.

Upgrade notice

When upgrading to any of v2.2.3, v2.1.3, v2.0.5 and v1.5.4, note that some applications may be relying on incorrect behaviour to filter the data retrieved during authentication. For example, one may have written in a model:

def self.find_for_authentication(conditions)
  conditions[:active] = true
  super
end

The code above may no longer work and needs to be rewritten as:

def self.find_for_authentication(conditions)
  find_first_by_auth_conditions(conditions, active: true)
end

Thank you notes

We would like to thank joernchen of Phenoelit for disclosing this vulnerability and working with us on a patch.

Last May we happily announced that Rafael França and Carlos Antonio had earned commit access to the Ruby on Rails repository; it was a great accomplishment that deserved its own blog post. Today, we have more great news to share with our readers. Just a few days ago, our teammate Rafael França (@rafaelfranca) received an invitation to join the Rails Core Team. And, of course, he accepted!

Rafael contributed to several features in the upcoming Rails 4 release and worked extensively with other Rails contributors to smash bugs and provide many improvements to the framework. I am personally happy to have a friend joining me on the Rails Core Team.

Furthermore, Rafael França’s contributions go beyond Rails. Lately he has also contributed to many other open source projects, like Mongoid, Janky and Dalli, as well as Plataformatec’s own projects like Devise, Simple Form and Elixir.

Rafael, congratulations! :D

From all your friends at Plataformatec.

It is common for web applications to interface with external services. Since depending on an external service makes tests very fragile, we end up mocking the interaction with such services when testing. However, once in a while, it is a good idea to check that the contract between your application and the service is still valid.

For example, this week we had to interact with a SOAP service; let’s call it KittenInfo (why anyone would provide kitten information via a SOAP service is beyond the scope of this blog post). We only need one KittenInfo endpoint, called get_details, which receives a kitten identifier and returns kitten information:

KittenInfo::Client.new.get_details("gorbypuff")

Since this API is simple, it is very easy to mock the client whenever it is required by our application. On the other hand, we still need to verify that the integration between KittenInfo SOAP service and our application works correctly, so we write some tests for it:

describe KittenInfo::Client do
  it "retrieves kitten details" do
    client  = KittenInfo::Client.new
    details = client.get_details("gorbypuff")
    details[:owner].should == "tenderlove"
  end
end

However, since this test actually contacts the SOAP service, it may make your test suite more fragile and slower, especially in this case, where the SOAP service responses take as long as a kitten’s staring contest.

One possible solution to this problem is to use filter tags to exclude the SOAP integration tests from running, except when explicitly desired. We can do this by tagging the example group:

describe KittenInfo::Client, external: true do
  # ...
end

Then, in your spec_helper.rb, just set:

RSpec.configure do |config|
  config.filter_run_excluding external: true
end

Now, running your specs will by default skip all groups that have :external set to true. Whenever you tweak the client, or in your builds, you can run those specific tests with:

$ rspec --tag external

Notice that this filter mechanism is similar to how we enable JavaScript tests when using Capybara. This means that, when using Capybara, you could also run all JavaScript tests in your app via $ rspec --tag js or all non-JavaScript tests with $ rspec --tag ~js.
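As a mental model of the include/exclude behaviour described above, here is a simplified sketch in plain Ruby. It is not RSpec's implementation: `Group`, `select_groups` and the `cli_tag` strings are hypothetical stand-ins for example groups, the runner and the `--tag` option. In this model, an explicit `--tag` for a key overrides the configured exclusion for that key, which is what lets `rspec --tag external` run the groups skipped by default.

```ruby
# Simplified model of RSpec tag filtering; RSpec's real rules are richer.
Group = Struct.new(:name, :metadata)

# exclusions mimics `config.filter_run_excluding external: true`;
# cli_tag mimics `--tag external` ("external") or `--tag ~js` ("~js").
def select_groups(groups, exclusions: {}, cli_tag: nil)
  exclusions = exclusions.dup
  inclusions = {}
  if cli_tag
    key = cli_tag.delete_prefix("~").to_sym
    if cli_tag.start_with?("~")
      exclusions[key] = true # --tag ~js adds an exclusion
    else
      inclusions[key] = true # --tag external adds an inclusion...
      exclusions.delete(key) # ...overriding the configured exclusion
    end
  end

  groups.select do |group|
    included = inclusions.empty? || inclusions.any? { |k, v| group.metadata[k] == v }
    excluded = exclusions.any? { |k, v| group.metadata[k] == v }
    included && !excluded
  end
end

groups = [
  Group.new("KittenInfo::Client", { external: true }),
  Group.new("Kitten model", {}),
  Group.new("Signup feature", { js: true })
]
exclusions = { external: true }

p select_groups(groups, exclusions: exclusions).map(&:name)
# => ["Kitten model", "Signup feature"]
```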

What about you? What is your favorite RSpec trick?

A couple weeks ago, Aaron Patterson (aka @tenderlove) wrote about getting rid of config.threadsafe! on Rails 4. When discussing multi-process and multi-threaded servers in production, one important aspect of the discussion that came up in the blog post was code loading.

This blog post is about which code loading strategies exist in a Rails application, how they affect multi-process and multi-threaded servers, and how to build libraries that will behave well in both environments.

Require

The most common form of loading code in Ruby is via requires.

require "some/library"

Some libraries work great by simply using requires, but as they grow, some of them tend to rely on autoload techniques to avoid loading everything up front. autoload is particularly important in Rails applications because it helps keep boot time low in development and test environments, since modules are loaded only as we need them.

Ruby autoload

One of the strategies used by Rails and libraries is Ruby autoload:

module Foo
  autoload :Bar, "path/to/bar"
end

Now, the first time Foo::Bar is accessed, it will be automatically loaded. The issue with this approach is that it is not thread-safe, except on recent JRuby versions (since 1.7) and Ruby master (2.0).
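The lazy behaviour can be observed with Module#autoload? in a self-contained script. In this sketch a temporary file stands in for "path/to/bar"; the file contents and the `hello` method are made up for illustration.

```ruby
require "tmpdir"

# Write a small file defining Foo::Bar, standing in for "path/to/bar".
dir = Dir.mktmpdir
bar_path = File.join(dir, "bar.rb")
File.write(bar_path, <<~RUBY)
  module Foo
    class Bar
      def self.hello
        "loaded lazily"
      end
    end
  end
RUBY

module Foo; end
# Register the constant; nothing is loaded yet.
Foo.autoload(:Bar, bar_path)

before = Foo.autoload?(:Bar) # the registered path: not loaded yet
puts Foo::Bar.hello          # first access triggers the require
after = Foo.autoload?(:Bar)  # nil: the constant is now loaded
```

Each constant is loaded on first access, which is exactly the moment that may fall inside a request if nothing loads it earlier.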

This means that eager loading these modules on boot (i.e. loading them all up front) is essential for multi-threaded production servers; otherwise, right after the application boots, a request may try to access a module that is not fully loaded yet.

In multi-process servers, we may or may not want to eager load them. For instance, if your web server forks worker processes from a master process (like Unicorn and Passenger), loading them up front will become more important as we move towards Ruby 2.0, which will provide a copy-on-write friendly garbage collector. In such cases, if we load Foo::Bar on boot, one copy of Foo::Bar is shared between all processes; otherwise, each process ends up loading its own copy of Foo::Bar, possibly leading to higher memory usage.

The downside of eager loading is that we may eventually load code that your application never uses, increasing memory usage. This is not a problem if your server uses fork or threads, since that is exactly what we want there (load and share as much as we can); and if your server uses neither (for example, Thin), nothing bad is going to happen.
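One way to picture “eager loading on boot” in plain Ruby is to walk a module's pending autoloads and touch each constant before the server forks or starts threads. `eager_load_autoloads!` and the `Shop` namespace are hypothetical names for illustration; Rails' own mechanisms work differently.

```ruby
require "tmpdir"

# Hypothetical helper: resolve every constant that `mod` registered via
# autoload but has not loaded yet, so nothing is left to load lazily.
def eager_load_autoloads!(mod)
  mod.constants(false).each do |name|
    mod.const_get(name) if mod.autoload?(name)
  end
end

# Set up two autoloaded constants backed by temporary files.
dir = Dir.mktmpdir
module Shop; end
%w[cart checkout].each do |file|
  const = file.capitalize # "Cart", "Checkout"
  path = File.join(dir, "#{file}.rb")
  File.write(path, "module Shop; class #{const}; end; end\n")
  Shop.autoload(const.to_sym, path)
end

pending_before = Shop.constants(false).select { |c| Shop.autoload?(c) }
eager_load_autoloads!(Shop) # e.g. run once on boot, before forking workers
pending_after = Shop.constants(false).select { |c| Shop.autoload?(c) }
```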

Rails autoload

Another strategy used to load code is the one provided by ActiveSupport::Dependencies. In this mode, you don’t need to specify which module to load and its file; instead, Rails tracks a series of directories and tries to load missing constants from them.

For instance, when you first access the constant User, Rails tries to find it in many directories, including app/models in your application. If a user.rb exists in it, it is loaded.
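The constant-to-file lookup can be sketched with Ruby's `const_missing` hook, the hook Rails' classic autoloader builds on. The sketch below is a heavily simplified, hypothetical `MiniAutoloader`: it ignores nesting, reloading and thread safety, a temporary directory stands in for app/models, and the hook is installed on a hypothetical `App` namespace to avoid patching Object globally.

```ruby
require "tmpdir"

# Hypothetical, heavily simplified stand-in for ActiveSupport::Dependencies.
module MiniAutoloader
  # Directories searched for missing constants (like app/models).
  def self.autoload_paths
    @autoload_paths ||= []
  end

  # "UserProfile" -> "user_profile"
  def self.underscore(name)
    name.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
  end

  # Look for <underscored name>.rb in each tracked directory and load it.
  def self.load_missing_constant(namespace, const_name)
    file = "#{underscore(const_name.to_s)}.rb"
    autoload_paths.each do |dir|
      path = File.join(dir, file)
      next unless File.file?(path)
      load path
      return namespace.const_get(const_name, false) if namespace.const_defined?(const_name, false)
    end
    raise NameError, "uninitialized constant #{namespace}::#{const_name}"
  end
end

module App
  # Called by Ruby whenever App::Something is not defined yet.
  def self.const_missing(const_name)
    MiniAutoloader.load_missing_constant(self, const_name)
  end
end

models_dir = Dir.mktmpdir
File.write(File.join(models_dir, "user_profile.rb"),
           "module App; class UserProfile; def self.table; 'user_profiles'; end; end; end\n")
MiniAutoloader.autoload_paths << models_dir

puts App::UserProfile.table # first access: const_missing loads user_profile.rb
```

Because the file is loaded from inside whichever thread first touches the constant, two threads can race on it, which is why this style of loading must be eager loaded on threaded servers.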

This approach is not thread-safe, so regardless of whether you use Ruby master or JRuby for a threaded server, the code needs to be eager loaded. For multi-process servers, the trade-offs are the same as in the previous section.

So, eager loading

In other words, if you are using Unicorn, Puma, Passenger or any JRuby server, you probably want to eager load. Geez, so it basically means that most of us probably do want to eager load.

Although Rails has been taking care of eager loading its frameworks (defined using Ruby autoload) and its application and engines (defined using Rails autoload), the Rails ecosystem has not taken care of eager loading its libraries. Part of the blame lies with Rails, which could have provided better mechanisms to do so.

In order to better understand the problem, let’s take a template engine like HAML as an example. If the template engine uses Ruby autoload and it is not eager loaded, one of these things will happen:

  1. On multi-threaded servers, it won’t be thread-safe depending on the Ruby version, leading to crashes on the first requests;

  2. On multi-process servers with fork, HAML will be loaded separately by each process on the first request it serves, unnecessarily adding response time and memory to each process instance;

  3. On multi-process servers without fork, it will be loaded the first time it’s needed, without extra hassle.

To avoid problems with 1) and 2), Rails 4 will provide tools and mechanisms to make it easier for libraries to eager load code.

First, Rails 4 will ship with a config.eager_load option. Today, eager loading an application is coupled with the config.cache_classes configuration. This means that every time we cache classes, we eager load the app. This is not necessarily the best configuration. For example, in the test environment, an application could benefit from lazily loading the application when running a single test file.

Second, Rails will include a config.eager_load_namespaces option to allow libraries to register namespaces that Rails should eager load. For instance, for Rails 4, Simple Form will probably execute:

config.eager_load_namespaces << SimpleForm

Rails will invoke eager_load! on all objects in the config.eager_load_namespaces list whenever config.eager_load is set to true. For Simple Form, eager loading will load inputs, form builder and others on boot, instead of loading when they are first used in production.

The idea of registering namespaces (and not blocks) is that a user should be able to remove a namespace from the list if it is causing problems or if they don’t really need to eager load it.
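The boot-time behaviour described in the last few paragraphs can be modelled in a few lines of plain Ruby. `BootConfig`, `boot!`, `FakeLibrary` and `TroublesomeLibrary` are hypothetical names; this sketches the idea, not Rails' initializer code.

```ruby
# Hypothetical model of config.eager_load / config.eager_load_namespaces.
class BootConfig
  attr_accessor :eager_load
  attr_reader :eager_load_namespaces

  def initialize
    @eager_load = false
    @eager_load_namespaces = []
  end
end

# Call eager_load! on every registered namespace, but only when enabled.
def boot!(config)
  return unless config.eager_load
  config.eager_load_namespaces.each(&:eager_load!)
end

# A stand-in library namespace that records whether it was eager loaded.
module FakeLibrary
  @loaded = false

  def self.eager_load!
    @loaded = true # a real library would require its files here
  end

  def self.loaded?
    @loaded
  end
end

module TroublesomeLibrary
  def self.eager_load!
    raise "should not be eager loaded"
  end
end

config = BootConfig.new
config.eager_load = true # e.g. in production
config.eager_load_namespaces << FakeLibrary << TroublesomeLibrary
# Because namespaces (not blocks) are registered, a user can opt one out:
config.eager_load_namespaces.delete(TroublesomeLibrary)
boot!(config)
puts FakeLibrary.loaded? # => true
```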

Rails engines and applications will be automatically added to config.eager_load_namespaces. This is because engines and applications rely on Rails autoload for everything inside the app directory, which is not thread-safe and should therefore always be eager loaded in production.

As an extra, Rails will also provide a convenience module called ActiveSupport::Autoload to make it easier to define which modules are auto and eager loaded. We’ll see how to use it next.

The recipe

In order to make your libraries eager load ready, here is an easy recipe:

1) Don’t worry about app. Everything in app is automatically taken care of by Rails applications and engines, since they are always added to config.eager_load_namespaces;

2) If you only use require inside lib, you are good to go! This is the recommended approach for Rails applications and small libraries. However, if your library is considerably big, you may want to use Ruby autoload (next step) to avoid loading your whole library up front on boot (which affects rake tasks, development, etc).

3) If you are using autoload in lib, you need to eager load your code. In order to do that, you can use ActiveSupport::Autoload to annotate which modules to eager load. For example, here is the SimpleForm module before being eager load ready:

module SimpleForm
  autoload :Components,        'simple_form/components'
  autoload :ErrorNotification, 'simple_form/error_notification'
  autoload :FormBuilder,       'simple_form/form_builder'
  autoload :Helpers,           'simple_form/helpers'
  autoload :I18nCache,         'simple_form/i18n_cache'
  autoload :Inputs,            'simple_form/inputs'
  autoload :MapType,           'simple_form/map_type'
  autoload :Wrappers,          'simple_form/wrappers'
end

And now eager load ready:

module SimpleForm
  extend ActiveSupport::Autoload
 
  # Helpers are modules that are included on inputs.
  # Usually we don't need to worry about eager loading
  # modules because we will usually eager load a class
  # that uses the module.
  autoload :Helpers
 
  # Wrappers are loaded on boot since they are part of the
  # Simple Form configuration API, so no need to worry about them.
  autoload :Wrappers
 
  # We need to eager load these, otherwise they will
  # be loaded the first time a form is rendered,
  # in the middle of a request.
  eager_autoload do
    # Notice ActiveSupport::Autoload allows us to skip
    # the file name as it guesses it from the module.
    autoload :Components
    autoload :ErrorNotification
    autoload :FormBuilder
    autoload :I18nCache
    autoload :Inputs
    autoload :MapType
  end
 
  # ActiveSupport::Autoload automatically defines
  # an eager_load! method. In this case, we are
  # extending the method to also eager load the
  # inputs and components inside Simple Form.
  def self.eager_load!
    super
    SimpleForm::Inputs.eager_load!
    SimpleForm::Components.eager_load!
  end
end
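To see roughly what `extend ActiveSupport::Autoload` does with the module above, here is a small imitation in plain Ruby. `MiniAutoload`, `MyLib` and the temporary files are hypothetical, and ActiveSupport's real module has more features; the sketch only shows the core idea: guess the path from the namespace, remember which autoloads were declared inside `eager_autoload`, and `require` them in `eager_load!`.

```ruby
require "tmpdir"

# Simplified, hypothetical imitation of ActiveSupport::Autoload.
module MiniAutoload
  # "MyLib" -> "my_lib"
  def self.underscore(name)
    name.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
  end

  # Wraps Module#autoload, guessing the path when it is omitted.
  def autoload(const_name, path = nil)
    path ||= "#{MiniAutoload.underscore(name)}/#{MiniAutoload.underscore(const_name.to_s)}"
    eager_autoloads << path if @_eager_autoload
    super(const_name, path)
  end

  # Autoloads declared inside this block are also eager loaded.
  def eager_autoload
    @_eager_autoload = true
    yield
  ensure
    @_eager_autoload = false
  end

  def eager_autoloads
    @_eager_autoloads ||= []
  end

  # Require every eager autoload up front (e.g. on boot in production).
  def eager_load!
    eager_autoloads.each { |path| require path }
  end
end

# Create lib/my_lib/{form_builder,helpers}.rb in a temporary directory.
lib_dir = Dir.mktmpdir
Dir.mkdir(File.join(lib_dir, "my_lib"))
{ "form_builder" => "FormBuilder", "helpers" => "Helpers" }.each do |file, const|
  File.write(File.join(lib_dir, "my_lib", "#{file}.rb"), "module MyLib; module #{const}; end; end\n")
end
$LOAD_PATH.unshift(lib_dir)

module MyLib
  extend MiniAutoload

  autoload :Helpers # lazy: loaded on first access only

  eager_autoload do
    autoload :FormBuilder # also loaded by eager_load!
  end
end

MyLib.eager_load!
p MyLib.autoload?(:FormBuilder) # nil, already required
p MyLib.autoload?(:Helpers)     # still pending
```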

The comments above explain part of the decision process for choosing whether a module should be eager loaded. To be clear, the question you should be asking is: “If I don’t eager load this module, when will it be autoloaded?” If the answer is “during a request”, you have to eager load it.

This is why we don’t need to eager load something like ActionController::Base: Rails already eager loads your models, controllers, etc., and if you are actually using Action Controller, you will have a controller inheriting from ActionController::Base, which forces it to be loaded on boot.

Similar reasoning applies to most Active Model modules. There is no need to eager load ActiveModel::Validations because, if an application is using it, it will load a framework or a model that actually requires it on boot. You will probably find that this reasoning applies to most modules in your library.

After defining your eager loads, all you need to do is to define a Railtie and include your eager load namespace in it:

module SimpleForm
  class Railtie < Rails::Railtie
    config.eager_load_namespaces << SimpleForm
  end
end

Summing up

Although Ruby is moving in a direction where autoload is thread-safe, which makes eager loading no longer a requirement, improvements to the language, garbage collectors and web servers make eager loading a “nice to have” feature.

Today, it is very likely that you want to eager load the majority of your code when deploying applications to production. Although Rails has always been eager load friendly, the majority of the tools in the ecosystem are not. Rails 4 will change this picture by giving not only flexible configuration options to app developers but also convenient abstractions to library developers.

The general recommendation is to use require to load code inside the lib directory if the project is small. Engines and applications do not need to worry about the app directory, since it is automatically taken care of by Rails.

More complex libraries probably already use autoload in the lib directory to avoid loading an unnecessary amount of code in development. In such cases, they need to provide custom eager_load! instructions for production environments, which can be done with the help of the recipe above and the Rails modules.

This blog post ended up too long but the subject is important and should be considered with care by the authors who maintain the most used libraries out there. Cheers to them!

Or, even better, why your web framework should not adopt a CGI-based API.

For the past few years I have been closely studying and observing the development of different emerging languages, with a special focus on web frameworks/servers. Unfortunately, most of the new web frameworks are following the Rack/WSGI specification, which may be a mistake depending on the platform you are targeting (particularly for Erlang and Node.js, which have very strong streaming foundations, with streaming part of their stack by default).

This blog post is an attempt to detail the limitations in the Rack/CGI-based APIs that the Rails Core Team has found while working with the streaming feature that shipped with Rails 3.1 and why we need better abstractions in the long term.

Case study

The use case we have in mind here is streaming. In Rails, we have focused on streaming as a way to quickly return the head of the HTML page to the browser, so the browser can start downloading assets (like JavaScript and stylesheets) while the server is generating the rest of the page. There is a great entry on the Rails weblog about streaming in general and a Railscast if you want to focus on how to use it in your Rails applications. However, streaming is not limited to HTML responses and can also be useful in API endpoints, for example, to stream search results as they come in or to synchronize with mobile devices.

The Rack specification in a nutshell

In Rack, the response is an array with three elements: [status, headers, body]

The body can be any object that responds to the method each. This means streaming can be done by passing an object that will, for example, lazily read a file and stream chunks when each is called.

A Rack application is any object that implements the method call, which receives an environment hash/dictionary with the request information. When I said above that most new web frameworks are following the Rack specification, I meant that they are introducing an API similar to the one just described.
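Since the contract is just `call(env)` returning a triple whose body responds to `each`, a streaming Rack application can be written without any gem. `LazyBody` and the toy `serve` loop below are hypothetical stand-ins for what a real application and server would do.

```ruby
# A body object that produces chunks lazily, only when `each` is called.
class LazyBody
  def initialize(count)
    @count = count
  end

  def each
    @count.times { |i| yield "chunk #{i}\n" }
  end
end

# Any object responding to call(env) is a Rack application; a lambda works.
app = lambda do |env|
  [200, { "content-type" => "text/plain" }, LazyBody.new(3)]
end

# Toy server loop: send the status line and headers first, then stream
# the body chunk by chunk. A real server would also send a reason phrase
# and use chunked transfer encoding.
def serve(app, env, out)
  status, headers, body = app.call(env)
  out << "HTTP/1.1 #{status}\r\n"
  headers.each { |name, value| out << "#{name}: #{value}\r\n" }
  out << "\r\n"
  body.each { |chunk| out << chunk }
  body.close if body.respond_to?(:close)
  out
end

puts serve(app, { "PATH_INFO" => "/" }, +"")
```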

The issue

In order to understand the issue, we will consider three entities: the client, the server and the application. The client (for example a browser) sends a request to the server which forwards it to an application. In this case, the server and the application are communicating via the Rack API.

The issue in streaming cases is that sending a response back from the application does not mean the application has finished processing. For example, consider a middleware (an object that sits between the server and our application) that checks out a database connection for the duration of the request and checks it back in afterwards:

def call(env)
  connection = DB.checkout_connection
  env["db.connection"] = connection
  @app.call(env)
ensure
  DB.checkin_connection connection
end

Without streaming, it would work as follows:

  1. The server receives a request and passes it down the stack
  2. The request reaches the middleware
  3. The middleware checks out the connection
  4. The application is invoked, renders a view accessing the database using the connection and returns the rendered view as a string
  5. The middleware checks the connection back in
  6. The response is sent back to the client

With streaming, this would happen:

  1. The server receives a request and passes it down the stack
  2. The request reaches the middleware
  3. The middleware checks out the connection
  4. The app is called but does not render anything. Instead it returns a lazy object as response body that will stream the HTML page in chunks as the `each` method is called
  5. The middleware checks the connection back in
  6. Back in the server, we have received the lazy body and will start to stream it
  7. While streaming the body, since the body is lazily calculated, now is the time it must access the database. But, since the middleware already checked the connection back in, our code will fail with a “not connected” exception
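The failure in step 7 can be reproduced in plain Ruby. `FakeDB`, `DBMiddleware` and `StreamingApp` are hypothetical stand-ins for the connection pool, the middleware shown earlier and a streaming application.

```ruby
# Hypothetical connection pool that tracks which connections are checked out.
module FakeDB
  @checked_out = []

  def self.checkout_connection
    conn = Object.new
    @checked_out << conn
    conn
  end

  def self.checkin_connection(conn)
    @checked_out.delete(conn)
  end

  def self.query(conn)
    raise "not connected" unless @checked_out.include?(conn)
    "row"
  end
end

# The middleware from above: checks the connection back in via `ensure`.
class DBMiddleware
  def initialize(app)
    @app = app
  end

  def call(env)
    connection = FakeDB.checkout_connection
    env["db.connection"] = connection
    @app.call(env)
  ensure
    FakeDB.checkin_connection(connection)
  end
end

# A streaming app: the body only touches the database while being iterated.
StreamingApp = lambda do |env|
  body = Enumerator.new do |y|
    y << "<head>...</head>"
    y << FakeDB.query(env["db.connection"]) # runs during streaming
  end
  [200, {}, body]
end

status, _headers, body = DBMiddleware.new(StreamingApp).call({})
chunks = []
error = nil
begin
  # The server streams only after the middleware has already returned.
  body.each { |chunk| chunks << chunk }
rescue RuntimeError => e
  error = e.message
end
puts "streamed #{chunks.inspect}, then failed: #{error}"
```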

The first instinct to solve this issue is to ensure that all streaming happens inside the application, i.e. the application would have a mechanism to stream the response and would return the Rack response only when streaming is done. However, if the application does this, any middleware that wants to modify the headers or the response body won’t be able to do so, because the response was already streamed from inside the application.

Our work-around for Rails was to create proxies that wrap the response body:

def call(env)
  connection = DB.checkout_connection
  env["db.connection"] = connection
  response = @app.call(env)
  ResponseProxy.new(response).on_close do
    DB.checkin_connection connection
  end
end
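A self-contained version of the proxy work-around might look like the sketch below. `BodyProxy` here is a hypothetical minimal class (Rack itself ships a similar `Rack::BodyProxy`), and `FakeDB` is a stand-in pool; the key point is that the check-in moves from the middleware's `ensure` to the moment the server closes the body after streaming.

```ruby
# Hypothetical connection pool.
module FakeDB
  @checked_out = []
  def self.checkout_connection
    Object.new.tap { |conn| @checked_out << conn }
  end
  def self.checkin_connection(conn)
    @checked_out.delete(conn)
  end
  def self.query(conn)
    raise "not connected" unless @checked_out.include?(conn)
    "row"
  end
  def self.checked_out_count
    @checked_out.size
  end
end

# Wraps a body and runs a callback once, when the server calls #close.
class BodyProxy
  def initialize(body, &on_close)
    @body = body
    @on_close = on_close
    @closed = false
  end

  def each(&block)
    @body.each(&block)
  end

  def close
    return if @closed
    @closed = true
    begin
      @body.close if @body.respond_to?(:close)
    ensure
      @on_close.call
    end
  end
end

class DBMiddleware
  def initialize(app)
    @app = app
  end

  def call(env)
    connection = FakeDB.checkout_connection
    env["db.connection"] = connection
    status, headers, body = @app.call(env)
    [status, headers, BodyProxy.new(body) { FakeDB.checkin_connection(connection) }]
  rescue StandardError
    FakeDB.checkin_connection(connection) # the app failed before returning a body
    raise
  end
end

StreamingApp = lambda do |env|
  body = Enumerator.new do |y|
    y << "<head>...</head>"
    y << FakeDB.query(env["db.connection"])
  end
  [200, {}, body]
end

status, _headers, body = DBMiddleware.new(StreamingApp).call({})
chunks = []
begin
  body.each { |chunk| chunks << chunk } # connection still checked out here
ensure
  body.close # the server closes the body when done; this checks it back in
end
p chunks # => ["<head>...</head>", "row"]
```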

However, this is inefficient and extremely limited (not all middleware can be converted to such an approach). In order for streaming to be successful, the underlying server API needs to acknowledge that the headers and the response body can be sent at different times. Not only that, it needs to provide proper callbacks around the response lifecycle (before sending headers, when the response is closed, on each stream, etc).

The trade-off here is that this can no longer be achieved with an API as simple as Rack’s. In general, we would like to have a response object that provides several life-cycle hooks. For example, the middleware above could be rewritten as:

def call(request, response)
  connection = DB.checkout_connection
  request.env["db.connection"] = connection
  response.on_close { DB.checkin_connection(connection) }
  @app.call(request, response)
end

The Java Servlet specification is a good example of how request and response objects could be designed to provide such hooks.
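To make the hook-based design concrete, here is a toy response object with `on_close` callbacks and a server sequence that fires them only after streaming. All names (`Response`, `write`, `close`, `ConnectionMiddleware`, the deferred stream lambda) are hypothetical; this is neither the Servlet API nor an existing Ruby library.

```ruby
# Hypothetical response object with lifecycle hooks.
class Response
  attr_reader :chunks

  def initialize
    @chunks = []
    @close_callbacks = []
    @closed = false
  end

  # Register a callback to run once the response is fully sent.
  def on_close(&block)
    @close_callbacks << block
  end

  # Stream a chunk to the client (here it is just recorded).
  def write(chunk)
    raise IOError, "response already closed" if @closed
    @chunks << chunk
  end

  # Called by the server when streaming is finished.
  def close
    return if @closed
    @closed = true
    @close_callbacks.each(&:call)
  end
end

EVENTS = []

class ConnectionMiddleware
  def initialize(app)
    @app = app
  end

  def call(request, response)
    EVENTS << :checkout
    request[:connection] = :conn # stands in for request.env["db.connection"]
    response.on_close { EVENTS << :checkin }
    @app.call(request, response)
  end
end

# The app returns a deferred streaming job instead of rendering right away.
app = lambda do |request, response|
  lambda do
    response.write("<head>...</head>")
    response.write("data via #{request[:connection]}") # connection still valid
    EVENTS << :streamed
  end
end

request = {}
response = Response.new
stream = ConnectionMiddleware.new(app).call(request, response)
stream.call    # the server streams after the middleware has returned...
response.close # ...and only then fires the close hooks
p EVENTS       # => [:checkout, :streamed, :checkin]
```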

Other middleware

In the example above I focused on the database connection middleware, but this limitation exists, in one way or another, in the majority of middleware in a stack. For example, a middleware that rescues any exception inside the application to render a 500 page also needs to be adapted. Other middleware simply won’t work. For instance, Rails ships with a middleware that computes an ETag header from the body, which has to be disabled when streaming.
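The ETag case fails because the tag is a digest of the whole body, yet headers must be sent before the body. A minimal sketch, using a hypothetical `add_etag` helper built on Ruby's standard `Digest::MD5` (Rails' real middleware differs in its details), shows that computing the tag forces the lazy body to be fully generated before anything can be streamed.

```ruby
require "digest/md5"

# Hypothetical ETag helper: digest every chunk, then return a buffered body.
def add_etag(status, headers, body)
  digest = Digest::MD5.new
  buffered = []
  body.each do |chunk|
    digest << chunk   # needs every chunk before the header is known...
    buffered << chunk # ...so the whole body ends up in memory
  end
  [status, headers.merge("etag" => %("#{digest.hexdigest}")), buffered]
end

generated = []
lazy_body = Enumerator.new do |y|
  3.times do |i|
    generated << i # records when each chunk is actually produced
    y << "chunk #{i}\n"
  end
end

status, headers, body = add_etag(200, {}, lazy_body)
# Every chunk was generated before the first byte could be sent.
p generated # => [0, 1, 2]
p headers["etag"]
```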

Looking back

Does this mean moving to Rack was a mistake? Not at all. Rack appeared when the web development Ruby community was fragmented and the simplicity of the Rack API made it possible to unify the different web frameworks and web servers available. Looking back, I would take the standardization provided by Rack any day regardless of the limitations it brings. Now that we have a standard, we are already working on addressing such issues, which leads us to…

Looking forward

Streaming will become more and more important. While working with HTML streaming requires special attention both technically and also in terms of usability, as outlined in Rails’ documentation, API endpoints could benefit from it with basically no extra cost. Not only that, features in HTML5 like server-sent events could easily be built on top of streaming without requiring a specific end-point in your application to handle them.
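As an illustration of building server-sent events on top of a streaming body, here is a sketch of a body object that formats each event in the `text/event-stream` wire format (`event:` and `data:` lines, terminated by a blank line). `SSEBody` and the sample events are hypothetical, and a real endpoint would also need a server that flushes each chunk as soon as it is produced.

```ruby
# Hypothetical streaming body that emits server-sent events.
class SSEBody
  def initialize(events)
    @events = events # any Enumerable of [event_name, payload] pairs
  end

  def each
    @events.each do |name, payload|
      chunk = +"event: #{name}\n"
      # Each line of the payload becomes its own "data:" field.
      payload.to_s.each_line { |line| chunk << "data: #{line.chomp}\n" }
      chunk << "\n" # a blank line terminates the event
      yield chunk
    end
  end
end

headers = { "content-type" => "text/event-stream", "cache-control" => "no-cache" }
body = SSEBody.new([["result", "gorbypuff"], ["result", "line one\nline two"]])
response = [200, headers, body] # an ordinary Rack-style triple

body.each { |chunk| print chunk }
```

Because the events are just chunks of a streamed body, the same lazy-body techniques apply; no special endpoint machinery is needed beyond a server that streams.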

While CGI was originally streaming friendly, the abstractions we built on top of it (like middleware) are not. I believe web frameworks should be moving towards better connection/socket abstractions and away from the old CGI-based APIs, which served us well, but it is finally time for us to let them go.

PS: Thanks to Aaron Patterson (who has also written about this issue in his blog), Yehuda Katz, Konstantin Haase and James Tucker for early review and feedback.

F.A.Q.

This section was added after the blog post was released based on some common questions.

Q: Isn’t it a bad idea to mix both streaming and non-streaming behavior in the same stack?

That depends on the stack. This is definitely not an issue with Erlang and Node.js, since both stacks are streaming based. In Ruby, I believe a threaded JRuby server or Thin will let you get away with keeping a socket open while waiting for responses, but it will probably turn out to be a bad idea with other servers, since the process holding the socket won’t be able to respond to any other request.

Q: Is there a need to do everything streaming based when a request/response would be fine?

No, there is no need. The point of the blog post is not to advocate for streaming-only frameworks, but simply to state that a Rack API may severely limit your streaming capability if your platform supports it. Personally, I would like to be able to choose and mix both, if my stack allows me to do so.