Eager loading for greater good

A couple weeks ago, Aaron Patterson (aka @tenderlove) wrote about getting rid of config.threadsafe! on Rails 4. When discussing multi-process and multi-threaded servers in production, one important aspect of the discussion that came up in the blog post was code loading.

This blog post is about which code loading strategies exist in a Rails application, how they affect multi-process and multi-threaded servers, and how to build libraries that will behave well in both environments.

Require

The most common form of loading code in Ruby is via requires.

require "some/library"

Some libraries work great by simply using requires, but as they grow, some of them tend to rely on autoload techniques to avoid loading all up-front. autoload is particularly important on Rails applications because it helps boot time to stay low in development and test environments, since we will load modules as we need them.

Ruby autoload

One of the strategies used by Rails and libraries is Ruby autoload:

module Foo
  autoload :Bar, "path/to/bar"
end

Now, the first time Foo::Bar is accessed, it will be automatically loaded. The issue with this approach is that it is not thread-safe, except for latest JRuby versions (since 1.7) and Ruby master (2.0).

This means that eager loading these modules on boot (i.e. loading it all up-front) is essential for multi-threaded production servers, otherwise it could lead to scenarios where your application tries to access a module that does not exist yet when it just booted.

In multi-process servers, we may or may not want to eager load them. For instance, if your web server uses fork to handle each request (like Unicorn, Passenger), loading them up-front will become more important as we move towards Ruby 2.0 which will provide copy-on-write semantics. In such cases, if we load Foo::Bar on boot, we will have one copy of Foo::Bar shared between all processes otherwise each process will end up loading its own copy of Foo::Bar possibly leading to higher memory usage.

The downside of eager loading is that we may eventually load code that your application never uses, increasing memory usage. This not a problem if your server uses fork or threads, it is exactly what we want (share and load the most we can), but in case you server doesn’t (for example, Thin) nothing bad is going to happen.

Rails autoload

Another strategy used to load code is the one provided by ActiveSupport::Dependencies. In this mode, you don’t need to specify which module to load and its file, but instead Rails tracks a series of directories and tries to load missing constants based on it.

For instance, when you first access the constant User, Rails tries to find it in many directories, including app/models in your application. If a user.rb exists in it, it is loaded.

This approach is not thread safe, so regardless of using Ruby master or JRuby for a threaded server, it needs to be eager loaded. For multi-process servers, the trade-offs are the same as in the previous section.

So, eager loading

In other words, if you are using Unicorn, Puma, Passenger or any JRuby server, you probably want to eager load. Geez, so it basically means that most of us probably do want to eager load.

Although Rails has been taking care of eager loading its frameworks (defined using Ruby autoload) and its application and engines (defined using Rails autoload), the Rails ecosystem has not taken care of eager loading its libraries. This blame is shared with Rails, which could have provided better mechanisms to do so.

In order to better understand the problem, let’s take a template engine like HAML as an example. If the template engine uses Ruby autoload and it is not eager loaded, one of these things will happen:

  1. On multi-threaded servers, it won’t be thread-safe depending on the Ruby version, leading to crashes on the first requests;

  2. On multi-process servers with fork, HAML will have to be loaded on every request, unnecessarily taking response time and memory on each process instance;

  3. On multi-process servers without fork, it will be loaded when its needed the first time, without extra-hassle;

To avoid problems with 1) and 2), Rails 4 will provide tools and mechanisms to make it easier for libraries to eager load code.

First, Rails 4 will ship with a config.eager_load option. Today, eager loading an application is coupled with the config.cache_classes configuration. This means that every time we cache classes, we eager load the app. This is not necessarily the best configuration. For example, in the test environment, an application could benefit by lazily loading the application when running a single test file.

Second, Rails will include a config.eager_load_namespaces option to allow libraries to register namespaces that Rails should eager load. For instance, for Rails 4, Simple Form will probably execute:

config.eager_load_namespaces 

Rails will invoke eager_load! on all objects in the config.eager_load_namespaces list whenever config.eager_load is set to true. For Simple Form, eager loading will load inputs, form builder and others on boot, instead of loading when they are first used in production.

The idea of registering namespaces (and not blocks) is that a user should be able to remove a namespace from the list if it is causing problems or if they don't really need to eager load it.

Rails engines and applications will be automatically added to config.eager_load_namespaces. This is because engines and applications rely on the Rails autoload for everything inside the app directory, which is not thread-safe and should always run on production.

As an extra, Rails will also provide a convenience module called ActiveSupport::Autoload to make it easier to define which modules are auto and eager loaded. We'll see how to use it next.

The recipe

In order to make your libraries eager load ready, here is an easy recipe:

1) Don't worry about app. Everything in app is automatically taken care by Rails applications and engines since they are always added to config.eager_load_namespaces;

2) If you only use require inside lib, you are good to go! This is the recommended for Rails applications and small libraries. However, if your library is considerably big, you may want to use Ruby autoload (next step) to avoid loading your library up-front on boot (which affects rake tasks, development, etc).

3) If you are using autoload in lib, you need to eager load your code. In order to do that, you can use ActiveSupport::Autoload to annotate which modules to eager load. For example, here is the SimpleForm module before being eager load ready:

module SimpleForm
  autoload :Components,        'simple_form/components'
  autoload :ErrorNotification, 'simple_form/error_notification'
  autoload :FormBuilder,       'simple_form/form_builder'
  autoload :Helpers,           'simple_form/helpers'
  autoload :I18nCache,         'simple_form/i18n_cache'
  autoload :Inputs,            'simple_form/inputs'
  autoload :MapType,           'simple_form/map_type'
  autoload :Wrappers,          'simple_form/wrappers'
end

And now eager load ready:

module SimpleForm
  extend ActiveSupport::Autoload
  
  # Helpers are modules that are included on inputs.
  # Usually we don't need to worry about eager loading
  # modules because we will usually eager load a class
  # that uses the module.
  autoload :Helpers
  
  # Wrappers are loaded on boot, it is part of Simple Form
  # configuration API, so no need to worry about them.
  autoload :Wrappers
  
  # We need to eager load these otherwise they will
  # be rendered the first time just when a form is
  # first rendered.
  eager_autoload do
    # Notice ActiveSupport::Autoload allows us to skip
    # the file name as it guesses it from the module.
    autoload :Components
    autoload :ErrorNotification
    autoload :FormBuilder
    autoload :I18nCache
    autoload :Inputs
    autoload :MapType
  end
  
  # ActiveSupport::Autoload automatically defines
  # an eager_load! method. In this case, we are
  # extending the method to also eager load the
  # inputs and components inside Simple Form.
  def self.eager_load!
    super
    SimpleForm::Inputs.eager_load!
    SimpleForm::Components.eager_load!
  end
end

The comments above explain part of the decision process taken in choosing if a module should be eager loaded or not. To be clear, the question you should be asking is: "if I don't eager load this module, when will it be autoloaded"? If the answer is: "during a request", you have to eager load it.

This is the reason we don't need to eager load something like ActionController::Base. Because Rails already eager loads your models, controllers, etc and if you are actually using Action Controller, you will have a controller inheriting from ActionController::Base which then forces it to be loaded on boot.

Similar reasoning applies to most of Active Model modules. There is no need to eager load ActiveModel::Validations because if an application is using it, it will load a framework or a model that actually requires it on boot. You will find that this reasoning will probably apply to most modules in your library.

After defining your eager loads, all you need to do is to define a Railtie and include your eager load namespace in it:

module SimpleForm
  class Railtie 

Summing up

Although Ruby is moving to a direction where autoload is threadsafe, which makes eager load not a requirement, improvements to the language, garbage collectors and web-servers make eager load a "nice to have" feature.

Today, it is very likely that you want to eager load the majority of your code when deploying applications to production. Although Rails has always been eager load friendly, the majority of the tools in the ecosystem are not. Rails 4 will change this panorama by not only giving flexible configuration options to app developers but also convenient abstraction for library developers.

The general recommendation is to use require to load code inside the lib directory if the project is small. Engines and applications do not need to worry about the app directory, since it is automatically taken care by Rails.

The more complex libraries probably already use autoload in the lib directory to avoid loading unnecessary amount of code in development. In such cases they need to provide custom eager_load! instructions for productions environment, which can be done with the help of the recipe above and Rails modules.

This blog post ended up too long but the subject is important and should be considered with care by the authors who maintain the most used libraries out there. Cheers to them!

8 responses to “Eager loading for greater good”

  1. Nando Vieira says:

    Nice post, Pretty!

    Not all gems that use Kernel.autoload want to introduce the ActiveSupport dependency. So the only alternative is loading everything upfront with `require`?

  2. josevalim says:

    If you want to use Kernel.autooad and don’t want to depend on Active Support, you can simply define a eager_load! method and reference the constants you want to load:

    module MyLib do
    def self.eager_load!
    MyLib::Klass
    MyLib::Another
    end
    end

  3. Nando Vieira says:

    That’s nice! <3

  4. Christian Lallemand says:

    Many thanks for this post … i think this explains why sidekiq often fails to load my modules

  5. josevalim says:

    Yes, this is a good point I haven’t talked about in the blog post. It will improve Sidekiq (and any other solution depending on threads) too.

  6. trans says:

    Last I heard autoload was basically deprecated. But it won’t be removed for a while, its marked for Ruby 3.0.

  7. divoxx says:

    Yeah, as far as I know autoload is deprecated. Has that changed Valim?

  8. josevalim says:

    I think Matz wanted to deprecate it because it could not be made threadsafe. However, it seems the JRuby guys were able to. I don’t know where the proposal to deprecate it stands right now.