{"id":3043,"date":"2012-08-28T18:01:56","date_gmt":"2012-08-28T21:01:56","guid":{"rendered":"http:\/\/blog.plataformatec.com.br\/?p=3043"},"modified":"2012-08-28T18:01:56","modified_gmt":"2012-08-28T21:01:56","slug":"eager-loading-for-greater-good","status":"publish","type":"post","link":"https:\/\/blog.plataformatec.com.br\/2012\/08\/eager-loading-for-greater-good\/","title":{"rendered":"Eager loading for greater good"},"content":{"rendered":"

A couple weeks ago, Aaron Patterson (aka @tenderlove<\/a>) wrote about getting rid of config.threadsafe!<\/code> on Rails 4<\/a>. When discussing multi-process and multi-threaded servers in production, one important aspect of the discussion that came up in the blog post was code loading.<\/p>\n

This blog post is about which code loading strategies exist in a Rails application, how they affect multi-process and multi-threaded servers, and how to build libraries that will behave well in both environments.<\/p>\n

Require<\/h3>\n

The most common form of loading code in Ruby is via requires.<\/p>\n

\nrequire \"some\/library\"\n<\/pre>\n

Some libraries work great by simply using requires, but as they grow, some of them tend to rely on autoload techniques to avoid loading everything up-front. autoload<\/code> is particularly important in Rails applications because it keeps boot time low in development and test environments, since modules are loaded only as they are needed.<\/p>\n

Ruby autoload<\/h3>\n

One of the strategies used by Rails and libraries is Ruby autoload:<\/p>\n

\nmodule Foo\n  autoload :Bar, \"path\/to\/bar\"\nend\n<\/pre>\n

Now, the first time Foo::Bar<\/code> is accessed, it will be automatically loaded. The issue with this approach is that it is not thread-safe, except on recent JRuby versions (since 1.7) and Ruby master (2.0).<\/p>\n

This means that eager loading these modules on boot (i.e. loading them all up-front) is essential for multi-threaded production servers. Otherwise, a freshly booted application may try to access a module that does not exist yet.<\/p>\n
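As a minimal sketch of the lazy behavior described above, the following plain Ruby script registers an autoload and checks when the file is actually loaded. The file name, the temporary directory and the contents of the file are made up for this illustration.

```ruby
require 'tmpdir'

# Hypothetical file that defines Foo::Bar, written to a temp directory
# so that the example is self-contained.
dir = File.realpath(Dir.mktmpdir)
path = File.join(dir, 'bar.rb')
File.write(path, 'module Foo; class Bar; end; end')

module Foo; end

# Register the constant; nothing is loaded yet.
Foo.autoload(:Bar, path)

# While the autoload is still pending, autoload? returns the registered path.
registered_before = Foo.autoload?(:Bar)

# The first access to the constant triggers the require.
Foo::Bar

# Once the file has been loaded, autoload? returns nil.
registered_after = Foo.autoload?(:Bar)
```

The window between registration and first access is where threads can race on Ruby versions without a thread-safe autoload, which is why touching the constant on boot removes the problem.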

In multi-process servers, we may or may not want to eager load them. For instance, if your web server forks worker processes to handle requests (like Unicorn, Passenger), loading them up-front will become more important as we move towards Ruby 2.0, which will ship a copy-on-write friendly garbage collector. In such cases, if we load Foo::Bar<\/code> on boot, we will have one copy of Foo::Bar<\/code> shared between all processes; otherwise, each process will end up loading its own copy of Foo::Bar<\/code>, possibly leading to higher memory usage.<\/p>\n

The downside of eager loading is that we may end up loading code that the application never uses, increasing memory usage. This is not a problem if your server uses fork or threads; it is exactly what we want (share and load as much as we can). If your server uses neither (for example, Thin), loading code lazily causes no harm.<\/p>\n

Rails autoload<\/h3>\n

Another strategy used to load code is the one provided by ActiveSupport::Dependencies<\/code>. In this mode, you don't need to specify which module to load or its file; instead, Rails tracks a series of directories and tries to load missing constants from them.<\/p>\n

For instance, when you first access the constant User<\/code>, Rails tries to find it in many directories, including app\/models<\/code> in your application. If a user.rb<\/code> exists in it, it is loaded.<\/p>\n
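The lookup described above can be sketched in plain Ruby with a const_missing hook. This is a simplified, hypothetical stand-in and not the actual ActiveSupport::Dependencies implementation: TinyDependencies, its autoload_paths list and the underscore helper are names invented for the example, and nested namespaces and reloading are ignored.

```ruby
require 'tmpdir'

# Hypothetical, heavily simplified imitation of directory based autoloading.
module TinyDependencies
  class << self
    attr_accessor :autoload_paths
  end
  self.autoload_paths = []

  # Turns a constant name such as :UserProfile into 'user_profile'.
  def self.underscore(name)
    name.to_s.gsub(/([a-z0-9])([A-Z])/) { $1 + '_' + $2 }.downcase
  end

  # Searches the tracked directories for a matching file and requires it.
  # Returns the constant, or nil when no file defines it.
  def self.load_missing_constant(name)
    file = underscore(name) + '.rb'
    autoload_paths.each do |dir|
      candidate = File.join(dir, file)
      next unless File.file?(candidate)
      require candidate
      return Object.const_get(name) if Object.const_defined?(name)
    end
    nil
  end
end

class Object
  # Called by Ruby whenever a constant cannot be resolved.
  def self.const_missing(name)
    TinyDependencies.load_missing_constant(name) || super
  end
end

# Stand-in for app/models with a user.rb inside it.
models_dir = File.realpath(Dir.mktmpdir)
File.write(File.join(models_dir, 'user.rb'), 'class User; end')
TinyDependencies.autoload_paths << models_dir

defined_before = Object.const_defined?(:User)
user_class = User # the first access goes through const_missing
defined_after = Object.const_defined?(:User)
```

Because the file is only found and required on first access, two threads hitting User at the same time can interleave inside this lookup, which is the kind of race eager loading avoids.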

This approach is not thread-safe, so even when running a threaded server on Ruby master or JRuby, this code needs to be eager loaded. For multi-process servers, the trade-offs are the same as in the previous section.<\/p>\n

So, eager loading<\/h3>\n

In other words, if you are using Unicorn, Puma, Passenger or any JRuby server, you probably want to eager load. Geez, so it basically means that most of us probably do want to eager load.<\/p>\n

Although Rails has been taking care of eager loading its frameworks (defined using Ruby autoload) and its application and engines (defined using Rails autoload), the Rails ecosystem has not taken care of eager loading its libraries. This blame is shared with Rails, which could have provided better mechanisms to do so.<\/p>\n

In order to better understand the problem, let's take a template engine like HAML as an example. If the template engine uses Ruby autoload and it is not eager loaded, one of these things will happen:<\/p>\n

    \n
  1. \n

    On multi-threaded servers, loading won't be thread-safe on some Ruby versions, which can lead to crashes on the first requests;<\/p>\n<\/li>\n

  2. \n

    On multi-process servers with fork, HAML will have to be loaded separately in every worker process instead of once before forking, unnecessarily costing response time and memory in each process instance;<\/p>\n<\/li>\n

  3. \n

    On multi-process servers without fork, it will be loaded when it is needed the first time, without extra hassle.<\/p>\n<\/li>\n<\/ol>\n

    To avoid problems with 1) and 2), Rails 4 will provide tools and mechanisms to make it easier for libraries to eager load code.<\/p>\n

    First, Rails 4 will ship with a config.eager_load<\/code> option. Today, eager loading an application is coupled with the config.cache_classes<\/code> configuration. This means that every time we cache classes, we eager load the app. This is not necessarily the best configuration. For example, in the test environment, an application could benefit from lazily loading the application when running a single test file.<\/p>\n

    Second, Rails will include a config.eager_load_namespaces<\/code> option to allow libraries to register namespaces that Rails should eager load. For instance, for Rails 4, Simple Form<\/a> will probably execute:<\/p>\n

    \nconfig.eager_load_namespaces << SimpleForm\n<\/pre>\n

    Rails will invoke eager_load!<\/code> on all objects in the config.eager_load_namespaces<\/code> list whenever config.eager_load<\/code> is set to true. For Simple Form, eager loading will load inputs, form builder and others on boot, instead of loading when they are first used in production.<\/p>\n

    The idea of registering namespaces (and not blocks) is that a user should be able to remove a namespace from the list if it is causing problems or if they don't really need to eager load it.<\/p>\n
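The registration mechanism described above can be sketched in plain Ruby. This is only an approximation under stated assumptions: SketchConfig, Trackable, FormLibrary, TemplateLibrary and run_eager_load are hypothetical names, and the real Rails configuration object and boot process are more involved.

```ruby
# Hypothetical stand-in for the relevant part of the Rails configuration.
SketchConfig = Struct.new(:eager_load, :eager_load_namespaces)

# Gives a namespace an eager_load! method that records its invocation.
module Trackable
  def eager_load!
    @eager_loaded = true
  end

  def eager_loaded?
    @eager_loaded == true
  end
end

# Two made-up libraries that register themselves for eager loading.
module FormLibrary
  extend Trackable
end

module TemplateLibrary
  extend Trackable
end

# Invokes eager_load! on every registered namespace, but only when
# eager loading is enabled.
def run_eager_load(config)
  return unless config.eager_load
  config.eager_load_namespaces.each(&:eager_load!)
end

config = SketchConfig.new(true, [])
config.eager_load_namespaces << FormLibrary
config.eager_load_namespaces << TemplateLibrary

# Because namespaces (and not blocks) are registered, a user can
# opt a single library out again.
config.eager_load_namespaces.delete(TemplateLibrary)

run_eager_load(config)
```

Here only FormLibrary ends up eager loaded, which illustrates why a list of objects is easier to manipulate than a list of anonymous blocks.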

    Rails engines and applications will be automatically added to config.eager_load_namespaces<\/code>. This is because engines and applications rely on the Rails autoload for everything inside the app<\/code> directory, which is not thread-safe and should therefore always be eager loaded in production.<\/p>\n

    As an extra, Rails will also provide a convenience module called ActiveSupport::Autoload<\/code> to make it easier to define which modules are auto and eager loaded. We'll see how to use it next.<\/p>\n

    The recipe<\/h3>\n

    In order to make your libraries eager load ready, here is an easy recipe:<\/p>\n

    1) Don't worry about app<\/code>. Everything in app<\/code> is automatically taken care of by Rails applications and engines since they are always added to config.eager_load_namespaces<\/code>;<\/p>\n

    2) If you only use require<\/code> inside lib<\/code>, you are good to go! This is the recommended approach for Rails applications and small libraries. However, if your library is considerably big, you may want to use Ruby autoload (next step) to avoid loading your library up-front on boot (which affects rake tasks, development, etc).<\/p>\n

    3) If you are using autoload in lib<\/code>, you need to eager load your code. In order to do that, you can use ActiveSupport::Autoload<\/code> to annotate which modules to eager load. For example, here is the SimpleForm<\/code> module before being eager load ready:<\/p>\n

    \nmodule SimpleForm\n  autoload :Components,        'simple_form\/components'\n  autoload :ErrorNotification, 'simple_form\/error_notification'\n  autoload :FormBuilder,       'simple_form\/form_builder'\n  autoload :Helpers,           'simple_form\/helpers'\n  autoload :I18nCache,         'simple_form\/i18n_cache'\n  autoload :Inputs,            'simple_form\/inputs'\n  autoload :MapType,           'simple_form\/map_type'\n  autoload :Wrappers,          'simple_form\/wrappers'\nend\n<\/pre>\n

    And now eager load ready:<\/p>\n

    \nmodule SimpleForm\n  extend ActiveSupport::Autoload\n  \n  # Helpers are modules that are included on inputs.\n  # Usually we don't need to worry about eager loading\n  # modules because we will usually eager load a class\n  # that uses the module.\n  autoload :Helpers\n  \n  # Wrappers are loaded on boot, as they are part of the Simple Form\n  # configuration API, so no need to worry about them.\n  autoload :Wrappers\n  \n  # We need to eager load these, otherwise they will\n  # only be loaded when a form is first rendered,\n  # i.e. during a request.\n  eager_autoload do\n    # Notice ActiveSupport::Autoload allows us to skip\n    # the file name as it guesses it from the module.\n    autoload :Components\n    autoload :ErrorNotification\n    autoload :FormBuilder\n    autoload :I18nCache\n    autoload :Inputs\n    autoload :MapType\n  end\n  \n  # ActiveSupport::Autoload automatically defines\n  # an eager_load! method. In this case, we are\n  # extending the method to also eager load the\n  # inputs and components inside Simple Form.\n  def self.eager_load!\n    super\n    SimpleForm::Inputs.eager_load!\n    SimpleForm::Components.eager_load!\n  end\nend\n<\/pre>\n

    The comments above explain part of the decision process for choosing whether a module should be eager loaded or not. To be clear, the question you should be asking is: \"if I don't eager load this module, when will it be autoloaded?\" If the answer is \"during a request\", you have to eager load it.<\/p>\n

    This is the reason we don't need to eager load something like ActionController::Base<\/code>. Rails already eager loads your models, controllers, etc., and if you are actually using Action Controller, you will have a controller inheriting from ActionController::Base<\/code>, which forces it to be loaded on boot.<\/p>\n

    Similar reasoning applies to most Active Model modules. There is no need to eager load ActiveModel::Validations<\/code> because if an application is using it, it will load a framework or a model that actually requires it on boot. You will find that this reasoning will probably apply to most modules in your library.<\/p>\n
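To make the split between lazily and eagerly loaded constants in the recipe above more concrete, here is a rough plain Ruby imitation of the idea. MiniAutoload is a hypothetical module written for this post, not the real ActiveSupport::Autoload: it takes explicit file paths instead of guessing them, and TinyForm with its two files exists only for the demonstration.

```ruby
require 'tmpdir'

# Hypothetical, simplified imitation of the eager_autoload idea.
module MiniAutoload
  def self.extended(base)
    base.instance_variable_set(:@eager_constants, [])
    base.instance_variable_set(:@eager_scope, false)
  end

  # Constants registered inside the block are remembered for eager_load!.
  def eager_autoload
    @eager_scope = true
    yield
  ensure
    @eager_scope = false
  end

  # Records the constant when called inside eager_autoload, then
  # delegates to the regular Ruby autoload.
  def autoload(const_name, path)
    @eager_constants << const_name if @eager_scope
    super
  end

  # Touches every recorded constant, which forces its file to load.
  def eager_load!
    @eager_constants.each { |const_name| const_get(const_name) }
  end
end

# Made-up library files, written to a temp directory.
SKETCH_DIR = File.realpath(Dir.mktmpdir)
File.write(File.join(SKETCH_DIR, 'helpers.rb'), 'module TinyForm; module Helpers; end; end')
File.write(File.join(SKETCH_DIR, 'inputs.rb'), 'module TinyForm; module Inputs; end; end')

module TinyForm
  extend MiniAutoload

  # Stays lazy: it is loaded whenever something first references it.
  autoload :Helpers, File.join(SKETCH_DIR, 'helpers.rb')

  # Would otherwise be loaded during a request, so it is marked eager.
  eager_autoload do
    autoload :Inputs, File.join(SKETCH_DIR, 'inputs.rb')
  end
end

inputs_before = TinyForm.autoload?(:Inputs)
TinyForm.eager_load!
inputs_after = TinyForm.autoload?(:Inputs)
helpers_after = TinyForm.autoload?(:Helpers)
```

After eager_load! runs, Inputs has been loaded while Helpers is still pending, mirroring the split between the plain autoload calls and the eager_autoload block.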

    After defining your eager loads, all you need to do is to define a Railtie and include your eager load namespace in it:<\/p>\n

    \nmodule SimpleForm\n  class Railtie < Rails::Railtie\n    config.eager_load_namespaces << SimpleForm\n  end\nend\n<\/pre>\n

    Summing up<\/h3>\n

    Although Ruby is moving in a direction where autoload is thread-safe, which means eager loading will no longer be a requirement, improvements to the language, garbage collectors and web servers still make eager loading a \"nice to have\" feature.<\/p>\n

    Today, it is very likely that you want to eager load the majority of your code when deploying applications to production. Although Rails has always been eager load friendly, the majority of the tools in the ecosystem are not. Rails 4 will change this panorama by not only giving flexible configuration options to app developers but also convenient abstraction for library developers.<\/p>\n

    The general recommendation is to use require<\/code> to load code inside the lib<\/code> directory if the project is small. Engines and applications do not need to worry about the app<\/code> directory, since it is automatically taken care of by Rails.<\/p>\n

    The more complex libraries probably already use autoload in the lib<\/code> directory to avoid loading an unnecessary amount of code in development. In such cases they need to provide custom eager_load!<\/code> instructions for the production environment, which can be done with the help of the recipe above and Rails modules.<\/p>\n

    This blog post ended up too long but the subject is important and should be considered with care by the authors who maintain the most used libraries out there. Cheers to them!<\/p>\n","protected":false},"excerpt":{"rendered":"

    A couple weeks ago, Aaron Patterson (aka @tenderlove) wrote about getting rid of config.threadsafe! on Rails 4. When discussing multi-process and multi-threaded servers in production, one important aspect of the discussion that came up in the blog post was code loading. This blog post is about which code loading strategies exist in a Rails application, … \u00bb<\/a><\/p>\n","protected":false},"author":4,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"ngg_post_thumbnail":0,"footnotes":""},"categories":[1],"tags":[190,7],"aioseo_notices":[],"jetpack_sharing_enabled":true,"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/blog.plataformatec.com.br\/wp-json\/wp\/v2\/posts\/3043"}],"collection":[{"href":"https:\/\/blog.plataformatec.com.br\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.plataformatec.com.br\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.plataformatec.com.br\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.plataformatec.com.br\/wp-json\/wp\/v2\/comments?post=3043"}],"version-history":[{"count":8,"href":"https:\/\/blog.plataformatec.com.br\/wp-json\/wp\/v2\/posts\/3043\/revisions"}],"predecessor-version":[{"id":3057,"href":"https:\/\/blog.plataformatec.com.br\/wp-json\/wp\/v2\/posts\/3043\/revisions\/3057"}],"wp:attachment":[{"href":"https:\/\/blog.plataformatec.com.br\/wp-json\/wp\/v2\/media?parent=3043"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.plataformatec.com.br\/wp-json\/wp\/v2\/categories?post=3043"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.plataformatec.com.br\/wp-json\/wp\/v2\/tags?post=3043"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}