A couple of weeks ago, Aaron Patterson (aka @tenderlove) wrote about getting rid of config.threadsafe! in Rails 4. One important aspect of multi-process and multi-threaded production servers that came up in that blog post was code loading.
This blog post is about which code loading strategies exist in a Rails application, how they affect multi-process and multi-threaded servers, and how to build libraries that will behave well in both environments.
Require
The most common form of loading code in Ruby is via requires.
    require "some/library"

Some libraries work great by simply using requires, but as they grow, some of them tend to rely on autoload techniques to avoid loading everything up-front. Autoload is particularly important in Rails applications because it keeps boot time low in development and test environments, since modules are loaded only as we need them.
Ruby autoload
One of the strategies used by Rails and libraries is Ruby autoload:
    module Foo
      autoload :Bar, "path/to/bar"
    end

Now, the first time Foo::Bar is accessed, it will be automatically loaded. The issue with this approach is that it is not thread-safe, except on recent JRuby versions (since 1.7) and Ruby master (2.0).
This means that eager loading these modules on boot (i.e. loading everything up-front) is essential for multi-threaded production servers. Otherwise, right after boot, a thread may try to access a module that has not finished loading yet.
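To make the lazy behavior concrete, here is a minimal sketch in plain Ruby. The temporary file, the `greeting` method and the hand-written `eager_load!` are assumptions made up for this illustration; they only show that the file is not required until the constant is touched, and that touching every constant on boot is one way to eager load.

```ruby
require "tmpdir"

# Hypothetical library file, written to a temporary directory so the
# example is self-contained.
LIB_DIR = Dir.mktmpdir
BAR_PATH = File.join(LIB_DIR, "bar.rb")
File.write(BAR_PATH, <<~RUBY)
  module Foo
    class Bar
      def self.greeting
        "loaded on demand"
      end
    end
  end
RUBY

module Foo
  # Register the constant without loading the file.
  autoload :Bar, BAR_PATH

  # Illustrative eager loader (an assumption, not a Ruby built-in):
  # touching every registered constant forces the requires to happen now.
  def self.eager_load!
    constants.each { |name| const_get(name) }
  end
end

REGISTERED_PATH = Foo.autoload?(:Bar) # the path is registered, nothing loaded yet
Foo.eager_load!                       # triggers the require of bar.rb
STILL_PENDING = Foo.autoload?(:Bar)   # nil once the constant is loaded
```

Calling something like `Foo.eager_load!` during boot, before any request thread starts, is the idea the rest of this post builds on.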
In multi-process servers, we may or may not want to eager load them. For instance, if your web server uses fork to spawn its worker processes (like Unicorn and Passenger), loading them up-front will become more important as we move towards Ruby 2.0, which will ship with a copy-on-write friendly garbage collector. In such cases, if we load Foo::Bar on boot, we will have one copy of Foo::Bar shared between all processes; otherwise, each process will end up loading its own copy of Foo::Bar, possibly leading to higher memory usage.
The downside of eager loading is that we may eventually load code that your application never uses, increasing memory usage. This is not a problem if your server uses fork or threads, since loading and sharing as much as possible is exactly what we want. If your server does neither (for example, Thin), nothing bad is going to happen if you skip eager loading.
Rails autoload
Another strategy used to load code is the one provided by ActiveSupport::Dependencies. In this mode, you don’t need to specify which module to load and its file, but instead Rails tracks a series of directories and tries to load missing constants based on it.
For instance, when you first access the constant User, Rails tries to find it in many directories, including app/models in your application. If a user.rb exists in it, it is loaded.
This approach is not thread-safe, so regardless of using Ruby master or JRuby for a threaded server, it needs to be eager loaded. For multi-process servers, the trade-offs are the same as in the previous section.
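The lookup described above can be sketched in plain Ruby with `const_missing`. `TinyAutoloader` is a hypothetical, heavily simplified stand-in written for this post; it is not how ActiveSupport::Dependencies is actually implemented, and it ignores namespaces and reloading.

```ruby
require "tmpdir"

# TinyAutoloader is a hypothetical stand-in for ActiveSupport::Dependencies.
module TinyAutoloader
  def self.paths
    @paths ||= []
  end

  # "User" -> "user", "BlogPost" -> "blog_post"
  def self.underscore(name)
    name.to_s.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
  end

  # Look for a file named after the constant in each tracked directory.
  def self.load_missing_constant(name)
    paths.each do |dir|
      candidate = File.join(dir, "#{underscore(name)}.rb")
      next unless File.exist?(candidate)

      require candidate
      return Object.const_get(name) if Object.const_defined?(name)
    end
    nil
  end
end

class Object
  # Ruby calls const_missing when a constant cannot be found.
  def self.const_missing(name)
    TinyAutoloader.load_missing_constant(name) || super
  end
end

# Simulate app/models with a temporary directory containing user.rb.
MODELS_DIR = Dir.mktmpdir
File.write(File.join(MODELS_DIR, "user.rb"), "class User; end\n")
TinyAutoloader.paths << MODELS_DIR

USER_CLASS = User # first access: const_missing fires and user.rb is required
```

Two threads hitting a missing constant at the same time would both go through this lookup, which hints at why this style of loading needs to be done up-front on threaded servers.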
So, eager loading
In other words, if you are using Unicorn, Puma, Passenger or any JRuby server, you probably want to eager load. Geez, so it basically means that most of us probably do want to eager load.
Although Rails has been taking care of eager loading its frameworks (defined using Ruby autoload) and its application and engines (defined using Rails autoload), the Rails ecosystem has not taken care of eager loading its libraries. This blame is shared with Rails, which could have provided better mechanisms to do so.
In order to better understand the problem, let’s take a template engine like HAML as an example. If the template engine uses Ruby autoload and it is not eager loaded, one of these things will happen:
1) On multi-threaded servers, it won’t be thread-safe depending on the Ruby version, leading to crashes on the first requests;
2) On multi-process servers with fork, HAML will have to be loaded in every worker process, unnecessarily taking response time and memory on each process instance;
3) On multi-process servers without fork, it will be loaded when it’s needed the first time, without extra hassle.
To avoid problems with 1) and 2), Rails 4 will provide tools and mechanisms to make it easier for libraries to eager load code.
First, Rails 4 will ship with a config.eager_load option. Today, eager loading an application is coupled with the config.cache_classes configuration. This means that every time we cache classes, we eager load the app. This is not necessarily the best configuration. For example, in the test environment, an application could benefit by lazily loading the application when running a single test file.
Second, Rails will include a config.eager_load_namespaces option to allow libraries to register namespaces that Rails should eager load. For instance, for Rails 4, Simple Form will probably execute:
    config.eager_load_namespaces << SimpleForm
Rails will invoke eager_load! on all objects in the config.eager_load_namespaces list whenever config.eager_load is set to true. For Simple Form, eager loading will load inputs, form builder and others on boot, instead of loading when they are first used in production.
The idea of registering namespaces (and not blocks) is that a user should be able to remove a namespace from the list if it is causing problems or if they don’t really need to eager load it.
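The mechanism can be pictured with a small plain-Ruby sketch. Everything here is a stand-in invented for illustration: `FakeConfig` plays the role of the Rails configuration object, `run_eager_load` plays the role of the boot step, and the two modules play the role of libraries.

```ruby
# Hypothetical libraries that expose an eager_load! method.
module FormLibrary
  def self.eager_load!
    @loaded = true
  end

  def self.loaded?
    @loaded == true
  end
end

module TroublesomeLibrary
  def self.eager_load!
    @loaded = true
  end

  def self.loaded?
    @loaded == true
  end
end

# Hypothetical stand-in for the Rails configuration object.
class FakeConfig
  attr_accessor :eager_load
  attr_reader :eager_load_namespaces

  def initialize
    @eager_load = false
    @eager_load_namespaces = []
  end
end

# Roughly what happens on boot: every registered namespace receives
# eager_load!, but only when config.eager_load is true.
def run_eager_load(config)
  return unless config.eager_load

  config.eager_load_namespaces.each(&:eager_load!)
end

CONFIG = FakeConfig.new
CONFIG.eager_load = true
CONFIG.eager_load_namespaces << FormLibrary << TroublesomeLibrary

# Because namespaces (not blocks) are registered, one can be removed again.
CONFIG.eager_load_namespaces.delete(TroublesomeLibrary)

run_eager_load(CONFIG)
```

After this runs, only `FormLibrary` has been eager loaded, which is the kind of opt-out that registering opaque blocks would not allow.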
Rails engines and applications will be automatically added to config.eager_load_namespaces. This is because engines and applications rely on the Rails autoload for everything inside the app directory, which is not thread-safe, so that code should always be eager loaded in production.
As an extra, Rails will also provide a convenience module called ActiveSupport::Autoload to make it easier to define which modules are auto and eager loaded. We’ll see how to use it next.
The recipe
In order to make your libraries eager load ready, here is an easy recipe:
1) Don’t worry about app. Everything in app is automatically taken care of by Rails applications and engines, since they are always added to config.eager_load_namespaces;
2) If you only use require inside lib, you are good to go! This is the recommended approach for Rails applications and small libraries. However, if your library is considerably big, you may want to use Ruby autoload (next step) to avoid loading your library up-front on boot (which affects rake tasks, development, etc).
3) If you are using autoload in lib, you need to eager load your code. In order to do that, you can use ActiveSupport::Autoload to annotate which modules to eager load. For example, here is the SimpleForm module before being eager load ready:
    module SimpleForm
      autoload :Components,        'simple_form/components'
      autoload :ErrorNotification, 'simple_form/error_notification'
      autoload :FormBuilder,       'simple_form/form_builder'
      autoload :Helpers,           'simple_form/helpers'
      autoload :I18nCache,         'simple_form/i18n_cache'
      autoload :Inputs,            'simple_form/inputs'
      autoload :MapType,           'simple_form/map_type'
      autoload :Wrappers,          'simple_form/wrappers'
    end
And now eager load ready:
    module SimpleForm
      extend ActiveSupport::Autoload

      # Helpers are modules that are included on inputs.
      # Usually we don't need to worry about eager loading
      # modules because we will usually eager load a class
      # that uses the module.
      autoload :Helpers

      # Wrappers are loaded on boot, it is part of Simple Form
      # configuration API, so no need to worry about them.
      autoload :Wrappers

      # We need to eager load these, otherwise they will
      # be loaded only when a form is first rendered.
      eager_autoload do
        # Notice ActiveSupport::Autoload allows us to skip
        # the file name as it guesses it from the module.
        autoload :Components
        autoload :ErrorNotification
        autoload :FormBuilder
        autoload :I18nCache
        autoload :Inputs
        autoload :MapType
      end

      # ActiveSupport::Autoload automatically defines
      # an eager_load! method. In this case, we are
      # extending the method to also eager load the
      # inputs and components inside Simple Form.
      def self.eager_load!
        super
        SimpleForm::Inputs.eager_load!
        SimpleForm::Components.eager_load!
      end
    end

The comments above explain part of the decision process taken in choosing if a module should be eager loaded or not. To be clear, the question you should be asking is: “if I don’t eager load this module, when will it be autoloaded?” If the answer is “during a request”, you have to eager load it.
This is the reason we don’t need to eager load something like ActionController::Base: Rails already eager loads your models, controllers, etc., and if you are actually using Action Controller, you will have a controller inheriting from ActionController::Base, which forces it to be loaded on boot.
Similar reasoning applies to most of Active Model modules. There is no need to eager load ActiveModel::Validations because if an application is using it, it will load a framework or a model that actually requires it on boot. You will find that this reasoning will probably apply to most modules in your library.
After defining your eager loads, all you need to do is to define a Railtie and include your eager load namespace in it:
    module SimpleForm
      class Railtie < Rails::Railtie
        config.eager_load_namespaces << SimpleForm
      end
    end
Summing up
Although Ruby is moving in a direction where autoload is thread-safe, which means eager loading is no longer a requirement, improvements to the language, garbage collectors and web servers still make eager loading a “nice to have” feature.
Today, it is very likely that you want to eager load the majority of your code when deploying applications to production. Although Rails has always been eager load friendly, the majority of the tools in the ecosystem are not. Rails 4 will change this panorama by not only giving flexible configuration options to app developers but also convenient abstractions for library developers.
The general recommendation is to use require to load code inside the lib directory if the project is small. Engines and applications do not need to worry about the app directory, since it is automatically taken care of by Rails.
The more complex libraries probably already use autoload in the lib directory to avoid loading an unnecessary amount of code in development. In such cases they need to provide custom eager_load! instructions for the production environment, which can be done with the help of the recipe above and Rails modules.
This blog post ended up too long but the subject is important and should be considered with care by the authors who maintain the most used libraries out there. Cheers to them!
Tags: eager load, rails
Here at Plataformatec I work side by side with developers (back-end and front-end), designers and project managers. Frankly, it is hard to tell where one person’s work ends and another’s begins. There is a lot of overlapping knowledge among the members of our teams, and that is a very good thing! We truly believe in the power of multidisciplinary teams, and our results convince us of it more and more.

The cover of the book “HTML5 e CSS3: Domine a web do futuro”
And the best evidence that I also believe in multidisciplinarity is my recently released book, “HTML5 e CSS3: Domine a web do futuro”, published by Casa do Código. I have been a developer for some time and have always been interested in front-end. At first I was unsure which subject to dig deeper into, but rather than choosing one path or the other I ended up going deeper into both. I confess that I do not regret my choice at all; I am very happy to have learned so much along the way and to now be able to share what I learned about HTML and CSS through this book.
I wrote this book with two audiences in mind: web developers who want to improve their front-end skills, and professionals who dedicate themselves exclusively to these subjects. I learned many things by “doing”, so I tried to replicate that kind of learning in the book with several tutorials and lessons on classic CSS and HTML techniques (using float, pseudo-elements and others) and combinations of CSS3 for visual components.
For those interested, the book is on sale online at Casa do Código (print, .mobi, .epub and .pdf versions).
Plataformatec at QCon SP
I’d also like to invite you to attend my lightning talk on CSS3 Transitions and José Valim’s talk “Uma abordagem moderna para programação na Erlang VM: Elixir” at QCon SP this weekend, on August 4th.
Besides my talk, I will be at the Casa do Código booth (there at QCon). Stop by for a chat. It will be a pleasure!
Tags: events, front-end, book, qcon
Besides the big and shiny features that Rails 4 holds, there are a lot of small improvements in several other sections of the Rails framework – helpers, core extensions, app configurations and more – that might not even hit the Changelogs but will somehow make our lives easier in the future. One of these hidden gems that I’ve found recently is an improvement to the content_for helper to flush and replace previous chunks of HTML with new ones.
The content_for that we are used to
The content_for method is an old friend of every Rails developer, and it’s a pretty simple and flexible helper. You can store a chunk of HTML from a String or a block, and grab it somewhere else in your views or yield it directly into your templates. It’s a pretty handy trick to move data from your views into your layouts, like page titles, custom meta tags or specific script tags that your page needs to include.
    <%# In your 'application.html.erb' layout, inside the '<head>' tag. %>
    <%= yield :metatags %>

    <%# Then, in a specific view %>
    <% content_for :metatags do %>
      <meta property="og:image" content="http://example.com/image.jpg" />
    <% end %>
Multiple calls of the content_for helper using the same identifier will concatenate them and output them together when you read it back on your views, as:
    <% content_for :example, "This will be rendered" %>
    <% content_for :example do %>
      <h1>This will be rendered too!</h1>
    <% end %>

In some scenarios this behavior might not be desired, and with Rails 4 you can flush out the stored pieces of an identifier and replace them instead of adding more content, using the flush: true option. The first implementation used an extra true argument, but we changed it to use a Hash instead, so the flush key can better express the behavior we’re expecting.

    <% content_for :example, "This will be rendered" %>
    <% content_for :example, flush: true do %>
      <h1>But this will override everything on the ':example' block.</h1>
    <% end %>
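The difference between appending and flushing can be modelled outside of a view with a few lines of plain Ruby. `TinyViewFlow` is a hypothetical, simplified model of the storage behind content_for, written only for this illustration; it is not the Action View implementation.

```ruby
# Hypothetical model of how chunks are stored per identifier.
class TinyViewFlow
  def initialize
    @content = Hash.new { |hash, key| hash[key] = String.new }
  end

  # Appends by default; with flush: true the previous chunks are dropped.
  def content_for(name, content, flush: false)
    @content[name] = String.new if flush
    @content[name] << content
    nil
  end

  def get(name)
    @content[name]
  end
end

flow = TinyViewFlow.new

# Default behavior: chunks are concatenated.
flow.content_for(:example, "first ")
flow.content_for(:example, "second")
puts flow.get(:example) # prints "first second"

# With flush: true, only the last chunk survives, however often it is stored.
3.times { flow.content_for(:scripts, "<script src=\"gallery.js\"></script>", flush: true) }
puts flow.get(:scripts) # prints the script tag once
```

Note that in this model, as with the real option, flushing discards everything stored under the identifier, not just duplicates of the new chunk.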
The gallery situation
I’ve stumbled upon this on a recent project, where we had a somewhat classic scenario: a partial named _gallery, responsible for rendering the piece of HTML to display a gallery of images that also supplies a content_for block with a script tag to include the required libraries to put the gallery to work.
    <section class="gallery">
      <!-- a truckload of HTML tags -->
    </section>
    <% content_for :scripts, javascript_include_tag('gallery') %>
It works like a charm. But with an updated requirement we had the case where multiple galleries could be present on the same page, rendering the _gallery partial several times. The required HTML would be present, but the gallery.js script would be included multiple times into the rendered page. Instead of working this out using instance variables to check that the partial was rendered at least once, we could let Rails do all the hard work for us, using the flush option when including the gallery.js script.
    <section class="gallery">
      <!-- a truckload of HTML tags -->
    </section>
    <%# We can render this partial several times and this script will be included just once %>
    <% content_for :scripts, javascript_include_tag('gallery'), flush: true %>
Back to the present: Rails 3.2
Well, while this seems to be a perfect solution to my problem, this feature isn’t available on Rails 3.2 or on the 3-2-stable branch – it’s only available on the master branch that will be released with Rails 4. But, backporting this feature into a 3.x application is pretty simple, using a helper of your own.
    def single_content_for(name, content = nil, &block)
      @view_flow.set(name, ActiveSupport::SafeBuffer.new)
      content_for(name, content, &block)
    end
After some source diving into the ActionPack source code we’re done – it just needs to replace any present content with a brand new SafeBuffer instance before storing the piece of HTML.
What do you think about this little addition to Rails 4? Can you think of a similar problem that could be solved with this instead of a custom hack?
Tags: actionpack, open source, rails 4
I’d like to start with a question: Have you ever seen code like this?
    class User < ActiveRecord::Base
    end

    User.new.tap do |user|
      user.name     = "John Doe"
      user.username = "john.doe"
      user.password = "john123"
    end
I have. But what few developers know is that many methods in Active Record already accept a block, so you don’t need to invoke tap in the first place. And that’s all because Active Record loves blocks! Let’s go through some examples.
Using blocks with Active Record
When creating an Active Record object, either by using new or create/create!, you can give a block straight to the method call instead of relying on tap:
    User.new do |user|
      user.name     = "John Doe"
      user.username = "john.doe"
      user.password = "john123"
    end

    User.create do |user|
      user.name     = "John Doe"
      user.username = "john.doe"
      user.password = "john123"
    end
And you can mix and match with hash initialization:
    User.new(name: "John Doe") do |user|
      user.username = "john.doe"
      user.password = "john123"
    end

All these methods, when receiving a block, yield the current object to the block so that you can do whatever you want with it. It’s basically the same effect as using tap. And it all happens after the attributes hash has been assigned and other internal Active Record code has been run during the object initialization, except for the after_initialize callbacks, which run after the block.
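The order of operations can be mimicked with a plain Ruby class. `TinyRecord` is a hypothetical stand-in, not Active Record itself; it only shows the pattern of assigning the attributes hash first and then yielding the object to the block.

```ruby
# Hypothetical plain-Ruby class that mimics the pattern described above.
class TinyRecord
  attr_accessor :name, :username, :password

  def initialize(attributes = {})
    # 1. Assign the attributes hash first...
    attributes.each { |key, value| public_send("#{key}=", value) }
    # 2. ...then hand the object to the caller's block, if one was given.
    yield self if block_given?
  end
end

RECORD = TinyRecord.new(name: "John Doe") do |user|
  user.username = "john.doe"
  user.password = "john123"
end
```

Since the block runs after the hash assignment, the block sees `name` already set and could override it if needed.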
That’s neat. That means we can stop using tap in a few places now. But wait, there’s more.
Active Record associations also love blocks
We talked about using blocks when building an Active Record object using new or create, but associations like belongs_to or has_many also work with that, when calling build or create on them:
    class User < ActiveRecord::Base
      has_many :posts
    end

    class Post < ActiveRecord::Base
      belongs_to :user
    end

    # has_many
    user = User.first
    user.posts.build do |post|
      post.title = "Active Record <3 Blocks"
      post.body  = "I can give tap a break! <3 <3 <3"
    end

    # belongs_to
    post = Post.first
    post.build_user do |user|
      user.name     = "John Doe <3 blocks"
      user.username = "john.doe"
      user.password = "john123"
    end
That’s even better. That means we can stop using tap in a few more places.
Wrapping up: Active Record <3 blocks
By getting to know what the framework gives us for free, we can avoid extra work: sometimes simple stuff, such as using tap with methods like new and create, and other times more complicated things.
There are other places inside Active Record that accept blocks, for instance first_or_initialize and friends will execute the given block when the record is not found, to initialize the new one.
In short, next time you need a block when creating records using Active Record, take a minute to see if you can avoid using tap by using an already existing feature. Remember: Active Record <3 blocks. And don’t do that with blocks only, the main idea here is that you can learn more about the framework, and let it do more work for you.
How about you, do you have any small trick in Ruby or Rails that makes your work easier? Take a minute to share it with others in the comments.
Tags: activerecord, blocks, open source, rails
Update 08/13/2012
Following the new deprecation policy, the composed_of removal was reverted. We are still discussing the future of this feature.
The reason
A few days ago we started a discussion about the removal of the composed_of method from Active Record. I started this discussion because when I was tagging every Rails issue according to its framework, I found some of them related to composed_of, that are not only hard to fix but would add more complexity for a feature that, in my opinion, is not adding any visible value to the framework.
In this presentation, Aaron Patterson talks about three types of features: Cosmetic, Refactoring, and Course Correction. Aaron defines cosmetic features as features that add dubious value and unknown debt (I highly recommend that you watch the whole presentation to see more about these types of features). This is exactly what I think about composed_of. At this time this feature is adding more debt than value to the Rails code base and applications, so the Rails team have decided to remove this method.
The plan
The removal plan is simple: deprecate in 3.2 and remove in 4.0. This means that you need to stop using this feature and implement it in another way.
The Rails team have chosen this path because this feature can be implemented using plain Ruby methods for getters and setters. You will see how in the next section.
Implementation
In the simplest case, when you have only one attribute and need to instantiate an object with the value of this attribute, you can use the serialize feature with a custom serializer:

    class MoneySerializer
      def dump(money)
        money.value
      end

      def load(value)
        Money.new(value)
      end
    end

    class Account < ActiveRecord::Base
      serialize :balance, MoneySerializer.new
    end
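To see what such a serializer does on its own, here is a small round trip in plain Ruby, without a database. The `Money` struct is an assumption made up for this sketch, since the post does not show that class.

```ruby
# Money is a hypothetical value object; the post does not define it.
Money = Struct.new(:value)

class MoneySerializer
  # Called when writing: value object -> raw column value.
  def dump(money)
    money.value
  end

  # Called when reading: raw column value -> value object.
  def load(value)
    Money.new(value)
  end
end

serializer = MoneySerializer.new
stored = serializer.dump(Money.new(100)) # what would go into the column
restored = serializer.load(stored)       # what the model would hand back
puts restored == Money.new(100)          # prints "true"
```

Because the serializer is a plain object with two methods, it can be tested in isolation like this, with no model involved.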
To use it with multiple attributes you can do the following:
    class Person < ActiveRecord::Base
      def address
        @address ||= Address.new(address_street, address_city)
      end

      def address=(address)
        self[:address_street] = address.street
        self[:address_city]   = address.city

        @address = address
      end
    end
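The same getter and setter pair can be exercised without Active Record. In the sketch below, `PersonSketch` and the `Address` struct are hypothetical stand-ins: the attributes live in a plain Hash and `[]`/`[]=` imitate the attribute accessors a model would provide.

```ruby
# Address is a hypothetical value object used only for this illustration.
Address = Struct.new(:street, :city)

# Stand-in for a model: attributes are kept in a plain Hash.
class PersonSketch
  def initialize(attributes = {})
    @attributes = attributes
  end

  def [](name)
    @attributes[name]
  end

  def []=(name, value)
    @attributes[name] = value
  end

  # Lazily builds the composite object from the two underlying attributes.
  def address
    @address ||= Address.new(self[:address_street], self[:address_city])
  end

  # Writes the composite back to the underlying attributes.
  def address=(address)
    self[:address_street] = address.street
    self[:address_city]   = address.city

    @address = address
  end
end

person = PersonSketch.new(address_street: "Main St", address_city: "Springfield")
puts person.address.city # prints "Springfield"

person.address = Address.new("Elm St", "Shelbyville")
puts person[:address_city] # prints "Shelbyville"
```

Everything here is ordinary Ruby, so customizing how the composite is built or stored means editing two short methods rather than reaching for options.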
Benefits for Rails developers
I already talked about what this removal can provide to Rails maintainers, but what benefits does it bring to Rails developers?
I think that the best advantages are:
- It is easier to test the composite objects;
- It is easier to understand the lazy methods;
- It is easier to customize it without resorting to options like :converter, :constructor and :allow_nil.
Wrapping up
I strongly recommend that you read the whole discussion in the pull request. You will find more examples and additional information there.
Also, I want to thank @steveklabnik for working on this feature and the awesome work that he has been doing on the Rails Issues Team.
Finally, I want to invite you to help the Rails team to fix, test, and track issues. About half of the issues are related to the Active Record framework and we need to work on them. As a regular Rails contributor and Rails developer, I think there is still a lot we can do to improve the Rails code base, so join us.
Tags: activerecord, composed_of, rails 4
Or, even better, why your web framework should not adopt a CGI-based API.
For the past few years I have been studying and observing the development of different emerging languages closely, with a special focus on web frameworks/servers. Unfortunately, most of the new web frameworks are following the Rack/WSGI specification, which may be a mistake depending on the platform you are targeting (this is particularly true for Erlang and Node.js, which have very strong streaming foundations as a default part of their stacks).
This blog post is an attempt to detail the limitations in the Rack/CGI-based APIs that the Rails Core Team has found while working with the streaming feature that shipped with Rails 3.1 and why we need better abstractions in the long term.
Case in study
The use case we have in mind here is streaming. In Rails, we have focused on streaming as a way to quickly return the head of the HTML page to the browser, so the browser can start downloading assets (like JavaScript and stylesheets) while the server is generating the rest of the page. There is a great entry on the Rails weblog about streaming in general and a Railscast if you want to focus on how to use it in your Rails applications. However, streaming is not limited to HTML responses and can also be useful in API endpoints, for example, to stream search results as they pop up or to synchronize with mobile devices.
The Rack specification in a nutshell
In Rack, the response is an array with three elements: [status, headers, body]
The body can be any object that responds to the method each. This means streaming can be done by passing an object that will, for example, lazily read a file and stream chunks when each is called.
A Rack application is any object that implements the method call and receives an environment hash/dictionary with the request information. When I said above that most new web frameworks are following the Rack specification, it is because they are introducing an API similar to the one just described.
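As a minimal sketch of the two definitions above, the following plain Ruby builds an application object and a lazy body without depending on the rack gem. `LazyBody`, the chunk contents and the tiny "server" loop at the end are assumptions for illustration only.

```ruby
# Hypothetical lazy body: chunks are only produced when `each` is called.
class LazyBody
  def initialize(chunks)
    @chunks = chunks
  end

  def each
    @chunks.each { |chunk| yield chunk }
  end
end

# A Rack application is any object that responds to `call`; a lambda works.
RACK_APP = lambda do |env|
  body = LazyBody.new(["<head></head>", "<body>Hello from #{env["PATH_INFO"]}</body>"])
  [200, { "Content-Type" => "text/html" }, body]
end

# What a server does, in miniature: call the app, then iterate the body.
status, headers, body = RACK_APP.call({ "PATH_INFO" => "/users" })
puts status
puts headers["Content-Type"]
body.each { |chunk| puts chunk }
```

The important detail for the rest of this post is that `call` has already returned by the time the server starts iterating over the body.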
The issue
In order to understand the issue, we will consider three entities: the client, the server and the application. The client (for example a browser) sends a request to the server which forwards it to an application. In this case, the server and the application are communicating via the Rack API.
The issue in streaming cases is that sending a response back from the application does not mean the application has finished processing. For example, consider a middleware (a middleware is an object that sits in between the server and our application) that opens up a connection to the database for the duration of the request and cleans it up afterwards:

    def call(env)
      connection = DB.checkout_connection
      env["db.connection"] = connection
      @app.call(env)
    ensure
      DB.checkin_connection connection
    end

Without streaming, it would work as follows:
- The server receives a request and passes it down the stack
- The request reaches the middleware
- The middleware checks out the connection
- The application is invoked, renders a view accessing the database using the connection and returns the rendered view as a string
- The middleware checks the connection back in
- The response is sent back to the client
With streaming, this would happen:
- The server receives a request and passes it down the stack
- The request reaches the middleware
- The middleware checks out the connection
- The app is called but does not render anything. Instead it returns a lazy object as response body that will stream the HTML page in chunks as the `each` method is called
- The middleware checks the connection back in
- Back in the server, we have received the lazy body and will start to stream it
- While streaming the body, since the body is lazily calculated, now is the time it must access the database. But, since the middleware already checked the connection back in, our code will fail with a “not connected” exception
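The failure described in the steps above can be reproduced with in-memory stand-ins. `FakeConnection`, `ConnectionMiddleware` and `StreamingBody` are hypothetical classes written for this sketch; they only imitate the ordering problem, not a real database or server.

```ruby
# Hypothetical connection that refuses queries once it is checked in.
class FakeConnection
  attr_accessor :open

  def initialize
    @open = true
  end

  def query(sql)
    raise "not connected" unless open

    "result of #{sql}"
  end
end

# Same shape as the middleware in the post: check in when `call` returns.
class ConnectionMiddleware
  def initialize(app)
    @app = app
  end

  def call(env)
    connection = FakeConnection.new
    env["db.connection"] = connection
    @app.call(env)
  ensure
    connection.open = false # the "check in" step
  end
end

# Lazy body: the database is only touched when the server iterates.
class StreamingBody
  def initialize(connection)
    @connection = connection
  end

  def each
    yield @connection.query("SELECT * FROM users")
  end
end

STREAMING_APP = lambda do |env|
  [200, {}, StreamingBody.new(env["db.connection"])]
end

stack = ConnectionMiddleware.new(STREAMING_APP)
_status, _headers, body = stack.call({})

begin
  body.each { |chunk| puts chunk }
rescue RuntimeError => error
  puts "streaming failed: #{error.message}" # prints "streaming failed: not connected"
end
```

The `ensure` clause runs as soon as `call` returns, which is before the body is iterated, so the query inside `each` always arrives too late.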
The first reaction to solve this issue is to ensure that all streaming happens inside the application, i.e. the application would have a mechanism to stream the response back and only when it is done it would return the Rack response back. However, if the application does this, any middleware that desires to modify the header or the response body won’t be able to do so because the response was already streamed from inside the application.
Our work-around for Rails was to create proxies that wrap the response body:
    def call(env)
      connection = DB.checkout_connection
      env["db.connection"] = connection
      response = @app.call(env)
      ResponseProxy.new(response).on_close do
        DB.checkin_connection connection
      end
    end
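A proxy of this kind can be sketched in a few lines. `BodyProxySketch` is a hypothetical class, not the proxy used inside Rails, and unlike the snippet above it wraps only the body and takes the callback in its constructor. It relies on one fact from the Rack specification: if the body responds to `close`, the server calls it after iterating.

```ruby
# Hypothetical proxy: delegates `each` and runs a callback on `close`.
class BodyProxySketch
  def initialize(body, &on_close)
    @body = body
    @on_close = on_close
  end

  def each(&block)
    @body.each(&block)
  end

  def close
    @body.close if @body.respond_to?(:close)
  ensure
    @on_close.call
  end
end

events = []
proxied = BodyProxySketch.new(["chunk 1", "chunk 2"]) { events << :checked_in }

# What the server does: stream every chunk, then close the body.
proxied.each { |chunk| events << chunk }
proxied.close

p events # prints ["chunk 1", "chunk 2", :checked_in]
```

With this wrapper the cleanup runs only after the last chunk has been streamed, at the cost of one extra object per middleware that needs it.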
However, this is inefficient and extremely limited (not all middleware can be converted to such an approach). In order for streaming to be successful, the underlying server API needs to acknowledge that the headers and the response body can be sent at different times. Not only that, it needs to provide proper callbacks around the response lifecycle (before sending headers, when the response is closed, on each stream, etc).
The trade-off here is that this can no longer be achieved with an API as simple as Rack’s. In general, we would like to have a response object that provides several life-cycle hooks. For example, the middleware above could be rewritten as:

    def call(request, response)
      connection = DB.checkout_connection
      request.env["db.connection"] = connection
      response.on_close { DB.checkin_connection(connection) }
      @app.call(request, response)
    end
The Java Servlet specification is a good example of how request and response objects could be designed to provide such hooks.
Other middleware
In the example above I focused on the database connection middleware but this limitation exists, in one way or the other, in the majority of middleware in a stack. For example, a middleware that rescues any exception inside the application to render a 500 page also needs to be adapted. Other middleware simply won’t work. For instance, Rails ships with a middleware that provides an ETag header based on the body which has to be disabled when streaming.
Looking back
Does this mean moving to Rack was a mistake? Not at all. Rack appeared when the web development Ruby community was fragmented and the simplicity of the Rack API made it possible to unify the different web frameworks and web servers available. Looking back, I would take the standardization provided by Rack any day regardless of the limitations it brings. Now that we have a standard, we are already working on addressing such issues, which leads us to…
Looking forward
Streaming will become more and more important. While working with HTML streaming requires special attention both technically and also in terms of usability, as outlined in Rails’ documentation, API endpoints could benefit from it with basically no extra cost. Not only that, features in HTML5 like server-sent events could easily be built on top of streaming without requiring a specific end-point in your application to handle them.
While CGI was originally streaming friendly, the abstractions we built on top of it (like middleware) are not. I believe web frameworks should be moving towards better connection/socket abstractions and away from the old CGI-based APIs, which served us well; it is finally time for us to let them go.
PS: Thanks to Aaron Patterson (who has also written about this issue in his blog), Yehuda Katz, Konstantin Haase and James Tucker for early review and feedback.
F.A.Q.
This section was added after the blog post was released based on some common questions.
Q: Isn’t it a bad idea to mix both streaming and non-streaming behavior in the same stack?
That depends on the stack. This is definitely not an issue with Erlang and Node.js since both stacks are streaming based. In Ruby, I believe a threaded JRuby or Thin will allow you to get away with keeping a socket open waiting for responses, but it will probably turn out to be a bad idea with other servers since the process holding the socket won’t be able to respond to any other request.
Q: Is there a need to do everything streaming based when a request/response would be fine?
No, there is no need. The point of the blog post is not to advocate for streaming-only frameworks, but simply to state that a Rack API may severely limit your streaming capability in case your platform supports it. Personally, I would like to be able to choose and mix both, if my stack allows me to do so.
Tags: cgi, rack, rails, streaming