A couple weeks ago, Aaron Patterson (aka @tenderlove) wrote about getting rid of
config.threadsafe! on Rails 4. When discussing multi-process and multi-threaded servers in production, one important aspect of the discussion that came up in the blog post was code loading.
This blog post is about which code loading strategies exist in a Rails application, how they affect multi-process and multi-threaded servers, and how to build libraries that will behave well in both environments.
The most common way to load code in Ruby is via require.
Some libraries work great with plain requires, but as they grow, some of them rely on autoload techniques to avoid loading everything up-front.
Autoload is particularly important in Rails applications because it keeps boot time low in development and test environments: modules are loaded only as we need them.
One of the strategies used by Rails and libraries is Ruby autoload:
    module Foo
      autoload :Bar, "path/to/bar"
    end
Now, the first time
Foo::Bar is accessed, it will be automatically loaded. The issue with this approach is that it is not thread-safe, except on recent JRuby versions (since 1.7) and Ruby master (2.0).
This means that eager loading these modules on boot (i.e. loading everything up-front) is essential for multi-threaded production servers; otherwise, a freshly booted application may try to access a module that does not exist yet.
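To make the lazy versus eager distinction concrete, here is a minimal sketch in plain Ruby. The temporary file stands in for "path/to/bar", and eager_load_pending is a hypothetical helper written for this illustration, not something Ruby or Rails provides.

```ruby
require "tmpdir"

# Hypothetical file standing in for "path/to/bar".
BAR_PATH = File.join(Dir.mktmpdir, "bar.rb")
File.write(BAR_PATH, "module Foo; class Bar; end; end\n")

module Foo
  autoload :Bar, BAR_PATH
end

# Hypothetical helper: load every constant that is still registered for
# autoload under a module, so nothing is left to load lazily later on.
def eager_load_pending(mod)
  mod.constants.each do |name|
    mod.const_get(name) if mod.autoload?(name)
  end
end

Foo.autoload?(:Bar)      # returns BAR_PATH: registered but not loaded yet
eager_load_pending(Foo)  # what "loading it all up-front" amounts to
Foo.autoload?(:Bar)      # returns nil: the constant is now loaded
```

Running the helper once on boot, before any thread serves a request, avoids the race entirely because no autoload is left pending.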
In multi-process servers, we may or may not want to eager load them. For instance, if your web server uses fork to spawn the processes that handle requests (like Unicorn, Passenger), loading them up-front will become more important as we move towards Ruby 2.0, which will ship a copy-on-write friendly garbage collector. In such cases, if we load Foo::Bar on boot, one copy of Foo::Bar is shared between all processes; otherwise, each process ends up loading its own copy of Foo::Bar, possibly leading to higher memory usage.
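The sketch below illustrates only the loading side of this argument, not the memory savings themselves: a forked worker either loads its own copy of a file or inherits the copy the parent loaded on boot. The Heavy module, the temporary file and the loader_pid_seen_by_worker helper are all hypothetical, and fork is not available on Windows.

```ruby
require "tmpdir"

# Hypothetical library file that records which process loaded it.
HEAVY_FILE = File.join(Dir.mktmpdir, "heavy.rb")
File.write(HEAVY_FILE, "module Heavy; LOADED_BY = Process.pid; end\n")

# Forks a worker and returns the pid of the process that loaded Heavy,
# as seen from inside that worker.
def loader_pid_seen_by_worker
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    require HEAVY_FILE       # no-op if the parent already loaded the file
    writer.puts Heavy::LOADED_BY
    writer.close
    exit!(0)                 # skip at_exit hooks inherited from the parent
  end
  writer.close
  Process.wait(pid)
  reader.read.to_i
ensure
  reader&.close
end

lazy_pid  = loader_pid_seen_by_worker  # the worker had to load its own copy
require HEAVY_FILE                     # eager load in the parent, "on boot"
eager_pid = loader_pid_seen_by_worker  # the worker inherits the parent's copy

puts "lazy:  loaded by parent? #{lazy_pid == Process.pid}"
puts "eager: loaded by parent? #{eager_pid == Process.pid}"
```

In the lazy case every worker repeats the load, which is the "each process will end up loading its own copy" scenario; in the eager case the loaded code already exists before the fork and can be shared.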
The downside of eager loading is that we may end up loading code that the application never uses, increasing memory usage. This is not a problem if your server uses fork or threads; it is exactly what we want (share and load the most we can). If your server uses neither (for example, Thin), you can skip eager loading and nothing bad is going to happen.
Another strategy used to load code is the one provided by
ActiveSupport::Dependencies. In this mode, you don’t need to specify which module to load and its file, but instead Rails tracks a series of directories and tries to load missing constants based on it.
For instance, when you first access the constant
User, Rails tries to find it in many directories, including
app/models in your application. If a
user.rb exists in it, it is loaded.
This approach is not thread safe, so regardless of using Ruby master or JRuby for a threaded server, it needs to be eager loaded. For multi-process servers, the trade-offs are the same as in the previous section.
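For intuition, here is a heavily simplified imitation of this kind of constant-based loading in plain Ruby. It is not the ActiveSupport::Dependencies implementation: MiniDependencies, autoload_paths and the App namespace are hypothetical names, and the sketch only handles constants directly under one module.

```ruby
require "tmpdir"

# Simplified imitation of constant-based autoloading; NOT Rails' actual code.
module MiniDependencies
  # Directories searched when a constant is missing (hypothetical name).
  def autoload_paths
    @autoload_paths ||= []
  end

  # Ruby calls this hook when a constant lookup under the module fails.
  def const_missing(name)
    # Guess the file name from the constant name: BlogPost -> blog_post.rb
    file = name.to_s.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase + ".rb"
    path = autoload_paths
           .map { |dir| File.join(dir, file) }
           .find { |candidate| File.file?(candidate) }
    return super unless path

    require path
    const_defined?(name, false) ? const_get(name, false) : super
  end
end

module App
  extend MiniDependencies
end

# Hypothetical stand-in for app/models containing user.rb.
models_dir = Dir.mktmpdir
File.write(File.join(models_dir, "user.rb"), "module App; class User; end; end\n")
App.autoload_paths << models_dir

App.const_defined?(:User, false)  # false: nothing has been loaded yet
App::User                         # triggers const_missing, which loads user.rb
App.const_defined?(:User, false)  # true: the constant now exists
```

Nothing here is synchronized, so two threads hitting the same missing constant at once can interleave inside const_missing, which hints at why this style of loading needs to be eager loaded on threaded servers.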
So, eager loading
In other words, if you are using Unicorn, Puma, Passenger or any JRuby server, you probably want to eager load. Geez, so it basically means that most of us probably do want to eager load.
Although Rails has been taking care of eager loading its frameworks (defined using Ruby autoload) and its application and engines (defined using Rails autoload), the Rails ecosystem has not taken care of eager loading its libraries. This blame is shared with Rails, which could have provided better mechanisms to do so.
In order to better understand the problem, let’s take a template engine like HAML as an example. If the template engine uses Ruby autoload and it is not eager loaded, one of these things will happen:
1) On multi-threaded servers, it won’t be thread-safe depending on the Ruby version, leading to crashes on the first requests;
2) On multi-process servers with fork, HAML will have to be loaded by each process on its first request, unnecessarily taking response time and memory on each process instance;
3) On multi-process servers without fork, it will be loaded when it’s needed the first time, without extra hassle.
To avoid problems with 1) and 2), Rails 4 will provide tools and mechanisms to make it easier for libraries to eager load code.
First, Rails 4 will ship with a
config.eager_load option. Today, eager loading an application is coupled with the
config.cache_classes configuration. This means that every time we cache classes, we eager load the app. This is not necessarily the best configuration. For example, in the test environment, an application could benefit from lazy loading when running a single test file.
Second, Rails will include a
config.eager_load_namespaces option to allow libraries to register namespaces that Rails should eager load. For instance, for Rails 4, Simple Form will probably execute:
config.eager_load_namespaces << SimpleForm
Rails will invoke
eager_load! on all objects in the
config.eager_load_namespaces list whenever
config.eager_load is set to true. For Simple Form, eager loading will load inputs, form builder and others on boot, instead of loading when they are first used in production.
The idea of registering namespaces (and not blocks) is that a user should be able to remove a namespace from the list if it is causing problems or if they don’t really need to eager load it.
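The interplay between the two options can be sketched in plain Ruby. This is only an illustration of the behaviour described above, not Rails code: BootConfig, FancyForms and run_eager_load are hypothetical stand-ins.

```ruby
# Hypothetical stand-in for the two Rails configuration options.
BootConfig = Struct.new(:eager_load, :eager_load_namespaces)

# Hypothetical library namespace that knows how to eager load itself.
module FancyForms
  @loaded = false

  def self.eager_load!
    @loaded = true  # a real library would load its autoloaded constants here
  end

  def self.loaded?
    @loaded
  end
end

# Sketch of the boot step: call eager_load! on every registered namespace,
# but only when eager loading is switched on.
def run_eager_load(config)
  return unless config.eager_load

  config.eager_load_namespaces.each(&:eager_load!)
end

config = BootConfig.new(false, [FancyForms])
run_eager_load(config)   # eager_load is off, so nothing is loaded

config.eager_load = true
run_eager_load(config)   # now FancyForms.eager_load! is invoked
```

Because the list holds plain objects rather than blocks, opting out is as simple as calling config.eager_load_namespaces.delete(FancyForms) before the boot step runs.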
Rails engines and applications will be automatically added to
config.eager_load_namespaces. This is because engines and applications rely on the Rails autoload for everything inside the
app directory, which is not thread-safe and should always be eager loaded in production.
As an extra, Rails will also provide a convenience module called
ActiveSupport::Autoload to make it easier to define which modules are auto and eager loaded. We’ll see how to use it next.
In order to make your libraries eager load ready, here is an easy recipe:
1) Don’t worry about app. Everything in app is automatically taken care of by Rails applications and engines, since they are always added to the list of eager load namespaces.
2) If you only use require in lib, you are good to go! This is the recommended approach for Rails applications and small libraries. However, if your library is considerably big, you may want to use Ruby autoload (next step) to avoid loading your whole library up-front on boot (which affects rake tasks, development, etc).
3) If you are using autoload in
lib, you need to eager load your code. In order to do that, you can use
ActiveSupport::Autoload to annotate which modules to eager load. For example, here is the
SimpleForm module before being eager load ready:
    module SimpleForm
      autoload :Components,        'simple_form/components'
      autoload :ErrorNotification, 'simple_form/error_notification'
      autoload :FormBuilder,       'simple_form/form_builder'
      autoload :Helpers,           'simple_form/helpers'
      autoload :I18nCache,         'simple_form/i18n_cache'
      autoload :Inputs,            'simple_form/inputs'
      autoload :MapType,           'simple_form/map_type'
      autoload :Wrappers,          'simple_form/wrappers'
    end
And now eager load ready:
    module SimpleForm
      extend ActiveSupport::Autoload

      # Helpers are modules that are included on inputs.
      # Usually we don't need to worry about eager loading
      # modules because we will usually eager load a class
      # that uses the module.
      autoload :Helpers

      # Wrappers are loaded on boot, as they are part of the
      # Simple Form configuration API, so no need to worry about them.
      autoload :Wrappers

      # We need to eager load these, otherwise they will
      # be loaded only when a form is first rendered.
      eager_autoload do
        # Notice ActiveSupport::Autoload allows us to skip
        # the file name as it guesses it from the module.
        autoload :Components
        autoload :ErrorNotification
        autoload :FormBuilder
        autoload :I18nCache
        autoload :Inputs
        autoload :MapType
      end

      # ActiveSupport::Autoload automatically defines
      # an eager_load! method. In this case, we are
      # extending the method to also eager load the
      # inputs and components inside Simple Form.
      def self.eager_load!
        super
        SimpleForm::Inputs.eager_load!
        SimpleForm::Components.eager_load!
      end
    end
The comments above explain part of the decision process taken in choosing if a module should be eager loaded or not. To be clear, the question you should be asking is: “if I don’t eager load this module, when will it be autoloaded?” If the answer is: “during a request”, you have to eager load it.
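If the split between plain autoload and eager_autoload feels magical, the following plain-Ruby sketch imitates the idea. It is not ActiveSupport's implementation: MiniAutoload and Widgets are hypothetical names, and unlike ActiveSupport::Autoload this sketch does not guess file names, so paths are passed explicitly.

```ruby
require "tmpdir"

# Simplified imitation of the behaviour described above; NOT ActiveSupport's code.
module MiniAutoload
  # Registers a regular Ruby autoload and remembers the constant
  # when the call happens inside an eager_autoload block.
  def autoload(const_name, path)
    (@eager_constants ||= []) << const_name if @in_eager_block
    super
  end

  def eager_autoload
    @in_eager_block = true
    yield
  ensure
    @in_eager_block = false
  end

  # Touching each remembered constant forces Ruby to load its file.
  def eager_load!
    (@eager_constants || []).each { |const_name| const_get(const_name) }
  end
end

# Hypothetical library files written to a temporary directory.
LIB_DIR = Dir.mktmpdir
File.write(File.join(LIB_DIR, "helpers.rb"), "module Widgets; module Helpers; end; end\n")
File.write(File.join(LIB_DIR, "form_builder.rb"), "module Widgets; class FormBuilder; end; end\n")

module Widgets
  extend MiniAutoload

  autoload :Helpers, File.join(LIB_DIR, "helpers.rb")  # stays lazy

  eager_autoload do
    autoload :FormBuilder, File.join(LIB_DIR, "form_builder.rb")
  end
end

Widgets.eager_load!
Widgets.autoload?(:FormBuilder)  # returns nil: already loaded
Widgets.autoload?(:Helpers)      # returns the path: still waiting for first use
```

Only the constants declared inside the block are loaded by eager_load!; everything else keeps the usual lazy behaviour, which matches the decision process laid out in the comments of the Simple Form module.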
This is the reason we don’t need to eager load something like ActionController::Base: Rails already eager loads your models, controllers, etc., and if you are actually using Action Controller, you will have a controller inheriting from ActionController::Base, which forces it to be loaded on boot.
Similar reasoning applies to most Active Model modules. There is no need to eager load
ActiveModel::Validations because if an application is using it, it will load a framework or a model that actually requires it on boot. You will find that this reasoning will probably apply to most modules in your library.
After defining your eager loads, all you need to do is to define a Railtie and include your eager load namespace in it:
    module SimpleForm
      class Railtie < Rails::Railtie
        config.eager_load_namespaces << SimpleForm
      end
    end
Although Ruby is moving in a direction where autoload is thread-safe, which means eager loading is no longer a strict requirement, improvements to the language, garbage collectors and web servers still make eager loading a “nice to have” feature.
Today, it is very likely that you want to eager load the majority of your code when deploying applications to production. Although Rails has always been eager load friendly, the majority of the tools in the ecosystem are not. Rails 4 will change this panorama by giving not only flexible configuration options to app developers but also convenient abstractions to library developers.
The general recommendation is to use
require to load code inside the
lib directory if the project is small. Engines and applications do not need to worry about the
app directory, since it is automatically taken care of by Rails.
The more complex libraries probably already use autoload in the
lib directory to avoid loading an unnecessary amount of code in development. In such cases they need to provide custom
eager_load! instructions for production environments, which can be done with the help of the recipe above and Rails modules.
This blog post ended up too long but the subject is important and should be considered with care by the authors who maintain the most used libraries out there. Cheers to them!