Posts tagged "rails"

Last May we happily announced that Rafael França and Carlos Antonio earned commit access to the Ruby on Rails repository – it was a great accomplishment that deserved its own blog post. Today, we have some great news that we want to share with our readers. Just a few days ago, our teammate Rafael França (@rafaelfranca) received an invitation to join the Rails Core Team. And, of course, he accepted!

Rafael has contributed to several features coming in the next Rails 4 release and worked extensively with other Rails contributors to smash bugs and provide many improvements to the framework. I am personally happy to have a friend joining me on the Rails Core Team.

Furthermore, Rafael França’s contributions go beyond Rails. Lately he has also contributed to many other open source projects, like Mongoid, Janky and Dalli, besides Plataformatec’s own projects like Devise, Simple Form and Elixir.

Rafael, congratulations! :D

From all your friends at Plataformatec.

On August 30th and 31st, 2012, the biggest Ruby event in Latin America took place: RubyConf Brasil, and Plataformatec was there with talks and lightning talks. The event was a success, with more than 750 attendees during the conference and more than 500 people watching the event online through the Eventials website.

Plataformatec Team

Below you can find the topics, with links to the slides and videos:

Talks

Let's Talk About Concurrency

By José Valim. Check out the video.

Writing Better Applications with Active Model

By Carlos Antonio. Check out the video.

Getting to Know the Internals of Rails

By Rafael França. Check out the video.

Lightning Talks

Contributing to Rails

By Carlos Galdino.

I18nAlchemy

By Lucas Mazza.

Copyright, Open Source Licenses and You!

By George Guimarães.

Check out the video of the Lightning Talks.

Feel free to watch and re-watch the slides and videos of the talks, and to give us your feedback in the comments. Also make sure to check out the other talks available, and if you wrote a blog post about the event, we would love to see a comment with a link sharing your post.

We would also like to thank and congratulate Fábio Akita and Locaweb for the great organization and high quality of the event; everything worked perfectly so that everyone could make the most of the conference.

See you at RubyConf Brasil 2013!

On the upcoming August 30th and 31st, we will be at RubyConf Brasil with three speakers!

On the 31st, at 9:40 am, José Valim will speak in Room 1 about concurrency and about its role and importance in application development.

At 11 am, in Room 2, Carlos Antonio will talk about how to use Active Model to write better applications.

Finally, at 11:50 am, in Room 1, Rafael França will talk about the internals of Rails and the big new features coming in Rails 4.

Also make sure to vote for our Lightning Talks, which will happen on the first day, starting at 6:20 pm.

See you there!

A couple weeks ago, Aaron Patterson (aka @tenderlove) wrote about getting rid of config.threadsafe! on Rails 4. When discussing multi-process and multi-threaded servers in production, one important aspect of the discussion that came up in the blog post was code loading.

This blog post is about which code loading strategies exist in a Rails application, how they affect multi-process and multi-threaded servers, and how to build libraries that will behave well in both environments.

Require

The most common form of loading code in Ruby is via requires.

require "some/library"

Some libraries work great by simply using requires, but as they grow, some of them tend to rely on autoload techniques to avoid loading everything up-front. autoload is particularly important in Rails applications because it helps keep boot time low in development and test environments, since we load modules only as we need them.

Ruby autoload

One of the strategies used by Rails and libraries is Ruby autoload:

module Foo
  autoload :Bar, "path/to/bar"
end

Now, the first time Foo::Bar is accessed, it will be automatically loaded. The issue with this approach is that it is not thread-safe, except on the latest JRuby versions (since 1.7) and on Ruby master (2.0).

This means that eager loading these modules on boot (i.e. loading them all up-front) is essential for multi-threaded production servers; otherwise we could hit scenarios where the application tries to access a module that has not finished loading right after boot.
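To make the eager loading idea concrete, here is a self-contained sketch. The Foo::Bar module and the temporary file stand in for a real library; eager loading boils down to referencing the constant on boot, before any threads are spawned:

```ruby
require "tmpdir"

# Simulate a library file on disk (illustrative setup).
dir  = Dir.mktmpdir
path = File.join(dir, "bar.rb")
File.write(path, "module Foo; class Bar; end; end")

module Foo; end
Foo.autoload :Bar, path

Foo.autoload?(:Bar) # => the registered path, nothing loaded yet

# Eager loading is as simple as referencing the constant on boot,
# before any threads are spawned:
Foo::Bar

Foo.autoload?(:Bar) # => nil, the file has been required
```

Once the constant is forced on boot, no request will ever trigger the (potentially non-thread-safe) lazy load.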

In multi-process servers, we may or may not want to eager load. For instance, if your web server uses fork to handle each request (like Unicorn and Passenger), loading code up-front becomes more important as we move towards Ruby 2.0, which will provide copy-on-write semantics. In that case, if we load Foo::Bar on boot, we will have one copy of Foo::Bar shared between all processes; otherwise each process will end up loading its own copy of Foo::Bar, possibly leading to higher memory usage.

The downside of eager loading is that we may end up loading code that your application never uses, increasing memory usage. This is not a problem if your server uses fork or threads, as sharing and loading as much as we can is exactly what we want, and in case your server doesn’t (for example, Thin), nothing bad is going to happen either.

Rails autoload

Another strategy used to load code is the one provided by ActiveSupport::Dependencies. In this mode, you don’t need to specify which module to load and its file; instead, Rails tracks a series of directories and tries to load missing constants based on them.

For instance, when you first access the constant User, Rails tries to find it in many directories, including app/models in your application. If a user.rb file exists in one of them, it is loaded.
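As an illustration, the list of tracked directories is configurable. A sketch of adding an extra directory (assuming a standard Rails application; app/models and friends are tracked out of the box):

```ruby
# config/application.rb (sketch): also scan lib when resolving
# missing constants through ActiveSupport::Dependencies.
config.autoload_paths << Rails.root.join("lib")
```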

This approach is not thread-safe either, so regardless of using Ruby master or JRuby for a threaded server, this code needs to be eager loaded. For multi-process servers, the trade-offs are the same as in the previous section.

So, eager loading

In other words, if you are using Unicorn, Puma, Passenger or any JRuby server, you probably want to eager load. Geez, so it basically means that most of us probably do want to eager load.

Although Rails has been taking care of eager loading its frameworks (defined using Ruby autoload) and its application and engines (defined using Rails autoload), the Rails ecosystem has not taken care of eager loading its libraries. This blame is shared with Rails, which could have provided better mechanisms to do so.

In order to better understand the problem, let’s take a template engine like HAML as an example. If the template engine uses Ruby autoload and it is not eager loaded, one of these things will happen:

  1. On multi-threaded servers, it won’t be thread-safe depending on the Ruby version, leading to crashes on the first requests;

  2. On multi-process servers with fork, HAML will have to be loaded separately by every process when first used, unnecessarily taking response time and memory in each process instance;

  3. On multi-process servers without fork, it will be loaded when it is first needed, without extra hassle;

To avoid problems with 1) and 2), Rails 4 will provide tools and mechanisms to make it easier for libraries to eager load code.

First, Rails 4 will ship with a config.eager_load option. Today, eager loading an application is coupled with the config.cache_classes configuration. This means that every time we cache classes, we eager load the app. This is not necessarily the best configuration. For example, in the test environment, an application could benefit from lazily loading code when running a single test file.
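With the two options decoupled, each environment can choose its own combination. A sketch of what that configuration could look like (the values are illustrative, based on the behavior described above):

```ruby
# config/environments/production.rb: cache and eager load everything.
config.cache_classes = true
config.eager_load    = true

# config/environments/test.rb: keep classes cached, but load code
# lazily so running a single test file stays fast.
config.cache_classes = true
config.eager_load    = false
```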

Second, Rails will include a config.eager_load_namespaces option to allow libraries to register namespaces that Rails should eager load. For instance, for Rails 4, Simple Form will probably execute:

config.eager_load_namespaces << SimpleForm

Rails will invoke eager_load! on all objects in the config.eager_load_namespaces list whenever config.eager_load is set to true. For Simple Form, eager loading will load inputs, the form builder and other components on boot, instead of loading them when they are first used in production.

The idea of registering namespaces (and not blocks) is that a user should be able to remove a namespace from the list if it is causing problems or if they don’t really need to eager load it.

Rails engines and applications will be automatically added to config.eager_load_namespaces. This is because engines and applications rely on Rails autoloading for everything inside the app directory, which is not thread-safe and therefore must always be eager loaded in production.

As an extra, Rails will also provide a convenience module called ActiveSupport::Autoload to make it easier to define which modules are auto and eager loaded. We’ll see how to use it next.

The recipe

In order to make your libraries eager load ready, here is an easy recipe:

1) Don’t worry about app. Everything in app is automatically taken care of by Rails applications and engines, since they are always added to config.eager_load_namespaces;

2) If you only use require inside lib, you are good to go! This is the recommended approach for Rails applications and small libraries. However, if your library is considerably big, you may want to use Ruby autoload (next step) to avoid loading your library up-front on boot (which affects rake tasks, development, etc.);

3) If you are using autoload in lib, you need to eager load your code. In order to do that, you can use ActiveSupport::Autoload to annotate which modules to eager load. For example, here is the SimpleForm module before being eager load ready:

module SimpleForm
  autoload :Components,        'simple_form/components'
  autoload :ErrorNotification, 'simple_form/error_notification'
  autoload :FormBuilder,       'simple_form/form_builder'
  autoload :Helpers,           'simple_form/helpers'
  autoload :I18nCache,         'simple_form/i18n_cache'
  autoload :Inputs,            'simple_form/inputs'
  autoload :MapType,           'simple_form/map_type'
  autoload :Wrappers,          'simple_form/wrappers'
end

And now eager load ready:

module SimpleForm
  extend ActiveSupport::Autoload
 
  # Helpers are modules that are included on inputs.
  # Usually we don't need to worry about eager loading
  # modules because we will usually eager load a class
  # that uses the module.
  autoload :Helpers
 
  # Wrappers are loaded on boot; they are part of Simple Form's
  # configuration API, so no need to worry about them.
  autoload :Wrappers
 
  # We need to eager load these, otherwise they
  # will only be loaded during a request, when a
  # form is first rendered.
  eager_autoload do
    # Notice ActiveSupport::Autoload allows us to skip
    # the file name as it guesses it from the module.
    autoload :Components
    autoload :ErrorNotification
    autoload :FormBuilder
    autoload :I18nCache
    autoload :Inputs
    autoload :MapType
  end
 
  # ActiveSupport::Autoload automatically defines
  # an eager_load! method. In this case, we are
  # extending the method to also eager load the
  # inputs and components inside Simple Form.
  def self.eager_load!
    super
    SimpleForm::Inputs.eager_load!
    SimpleForm::Components.eager_load!
  end
end

The comments above explain part of the decision process involved in choosing whether a module should be eager loaded or not. To be clear, the question you should be asking is: “if I don’t eager load this module, when will it be autoloaded?” If the answer is “during a request”, you have to eager load it.

This is the reason we don’t need to eager load something like ActionController::Base: Rails already eager loads your models, controllers, etc., and if you are actually using Action Controller, you will have a controller inheriting from ActionController::Base, which then forces it to be loaded on boot.

Similar reasoning applies to most Active Model modules. There is no need to eager load ActiveModel::Validations because, if an application is using it, it will load on boot a framework or a model that actually requires it. You will find that this reasoning applies to most modules in your library.

After defining your eager loads, all you need to do is to define a Railtie and include your eager load namespace in it:

module SimpleForm
  class Railtie < Rails::Railtie
    config.eager_load_namespaces << SimpleForm
  end
end

Summing up

Although Ruby is moving in a direction where autoload is thread-safe, which makes eager loading no longer a strict requirement, improvements to the language, garbage collectors and web servers make eager loading a “nice to have” feature.

Today, it is very likely that you want to eager load the majority of your code when deploying applications to production. Although Rails has always been eager load friendly, the majority of the tools in the ecosystem are not. Rails 4 will change this panorama by not only giving flexible configuration options to app developers but also convenient abstractions to library developers.

The general recommendation is to use require to load code inside the lib directory if the project is small. Engines and applications do not need to worry about the app directory, since it is automatically taken care of by Rails.

More complex libraries probably already use autoload in the lib directory to avoid loading an unnecessary amount of code in development. In such cases, they need to provide custom eager_load! instructions for production environments, which can be done with the help of the recipe above and the Rails modules.

This blog post ended up a bit long, but the subject is important and should be considered with care by the authors who maintain the most used libraries out there. Cheers to them!

I’d like to start with a question: Have you ever seen code like this?

class User < ActiveRecord::Base
end
 
User.new.tap do |user|
  user.name     = "John Doe"
  user.username = "john.doe"
  user.password = "john123"
end

I have. But what few developers know is that many methods in Active Record already accept a block, so you don’t need to invoke tap in the first place. And that’s all because Active Record loves blocks! Let’s go through some examples.

Using blocks with Active Record

When creating an Active Record object, either by using new or create/create!, you can give a block straight to the method call instead of relying on tap:

User.new do |user|
  user.name     = "John Doe"
  user.username = "john.doe"
  user.password = "john123"
end
 
User.create do |user|
  user.name     = "John Doe"
  user.username = "john.doe"
  user.password = "john123"
end

And you can mix and match with hash initialization:

User.new(name: "John Doe") do |user|
  user.username = "john.doe"
  user.password = "john123"
end

All these methods, when receiving a block, yield the current object to the block so that you can do whatever you want with it. It’s basically the same effect as using tap. And it all happens after the attributes hash has been assigned and other internal Active Record code has run during the object initialization, except for the after_initialize callbacks.
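The underlying pattern is easy to see in plain Ruby. Below is a stripped-down sketch (not Active Record's actual implementation; the Record class is invented for illustration) of an initializer that assigns an attributes hash and then yields the object, just like new does:

```ruby
# A minimal sketch of the "yield after assignment" pattern.
class Record
  attr_accessor :name, :username, :password

  def initialize(attributes = {})
    # Assign the attributes hash first...
    attributes.each { |key, value| public_send("#{key}=", value) }
    # ...then yield the object, so the block sees the assigned state.
    yield self if block_given?
  end
end

record = Record.new(name: "John Doe") do |r|
  r.username = "john.doe"
end

record.name     # => "John Doe"
record.username # => "john.doe"
```

Because the yield happens after the hash assignment, mixing hash initialization with a block works exactly as shown with User.new above.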

That’s neat. That means we can stop using tap in a few places now. But wait, there’s more.

Active Record associations also love blocks

We talked about using blocks when building an Active Record object using new or create, but associations like belongs_to or has_many also work with that, when calling build or create on them:

class User < ActiveRecord::Base
  has_many :posts
end
 
class Post < ActiveRecord::Base
  belongs_to :user
end
 
# has_many
user = User.first
user.posts.build do |post|
  post.title = "Active Record <3 Blocks"
  post.body  = "I can give tap a break! <3 <3 <3"
end
 
# belongs_to
post = Post.first
post.build_user do |user|
  user.name     = "John Doe <3 blocks"
  user.username = "john.doe"
  user.password = "john123"
end

That’s even better. That means we can stop using tap in a few more places.

Wrapping up: Active Record <3 blocks

It is possible to avoid extra work by getting to know what the framework can give us for free, sometimes in simple cases such as replacing tap with the blocks that new and create already accept, other times with more involved features.

There are other places inside Active Record that accept blocks; for instance, first_or_initialize and friends will execute the given block when the record is not found, in order to initialize the new one.
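That contract, block runs only on a miss, can be sketched in plain Ruby (illustrative, not Active Record's implementation; the Collection class and hash-based records are invented for the example):

```ruby
# A sketch of the first_or_initialize contract: return the existing
# record if one matches the conditions, otherwise build a new one
# and yield it to the block.
class Collection
  def initialize(records)
    @records = records
  end

  def first_or_initialize(conditions)
    found = @records.find { |r| conditions.all? { |k, v| r[k] == v } }
    return found if found

    record = conditions.dup
    yield record if block_given? # block runs only when nothing was found
    record
  end
end

users = Collection.new([{ username: "jane" }])

users.first_or_initialize(username: "jane") { |u| u[:name] = "ignored" }
# => { username: "jane" }  (record existed, block skipped)

users.first_or_initialize(username: "john.doe") { |u| u[:name] = "John Doe" }
# => { username: "john.doe", name: "John Doe" }
```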

In short, next time you need a block when creating records with Active Record, take a minute to see if you can avoid using tap by relying on an already existing feature. Remember: Active Record <3 blocks. And don’t stop at blocks: the main idea here is that you can learn more about the framework and let it do more of the work for you.

How about you, do you have any small trick in Ruby or Rails that makes your work easier? Take a minute to share it with others in the comments. :)

Or, even better, why your web framework should not adopt a CGI-based API.

For the past few years I have been studying and observing the development of different emerging languages closely, with a special focus on web frameworks/servers. Unfortunately, most of the new web frameworks are following the Rack/WSGI specification, which may be a mistake depending on the platform you are targeting (particularly true for Erlang and Node.js, which have very strong streaming foundations that are part of their stacks by default).

This blog post is an attempt to detail the limitations in Rack/CGI-based APIs that the Rails Core Team found while working on the streaming feature that shipped with Rails 3.1, and why we need better abstractions in the long term.

Case study

The use case we have in mind here is streaming. In Rails, we have focused on streaming as a way to quickly return the head of the HTML page to the browser, so the browser can start downloading assets (like javascripts and stylesheets) while the server is generating the rest of the page. There is a great entry on the Rails weblog about streaming in general and a Railscast if you want to focus on how to use it in your Rails applications. However, streaming is not limited to HTML responses and can also be useful in API endpoints, for example, to stream search results as they pop up or to synchronize with mobile devices.

The Rack specification in a nutshell

In Rack, the response is made by an array with three elements: [status, headers, body]

The body can be any object that responds to the method each. This means streaming can be done by passing an object that will, for example, lazily read a file and stream chunks when each is called.
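A minimal lazy body can be sketched in a few lines (illustrative; the LazyBody class is invented here and does not come from any specific server):

```ruby
# A Rack-compatible streaming body: any object that responds to #each.
# The chunks are computed only when the server iterates over the body,
# after the [status, headers, body] triple has already been returned.
class LazyBody
  def each
    # Imagine lazily reading a file here; each chunk is produced on demand.
    3.times { |i| yield "chunk #{i}\n" }
  end
end

response = [200, { "Content-Type" => "text/plain" }, LazyBody.new]

# This is what the server does: iterate and write each chunk to the socket.
streamed = +""
response[2].each { |chunk| streamed << chunk }
streamed # => "chunk 0\nchunk 1\nchunk 2\n"
```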

A Rack application is any object that implements the method call and receives an environment hash/dictionary with the request information. When I said above that most new web frameworks are following the Rack specification, it is because they are introducing an API similar to the one just described.
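In code, a minimal Rack-style application is just a callable returning that triple (a sketch, independent of any real server; the env keys shown are the standard CGI-style ones):

```ruby
# The whole Rack-style contract in one object: respond to #call,
# take an env hash, return the [status, headers, body] triple.
app = lambda do |env|
  [200, { "Content-Type" => "text/plain" }, ["Hello from #{env["PATH_INFO"]}"]]
end

status, _headers, body = app.call("REQUEST_METHOD" => "GET", "PATH_INFO" => "/demo")
status    # => 200
body.join # => "Hello from /demo"
```

The simplicity of this contract is exactly what made Rack so easy to adopt, and, as the next section shows, it is also where its streaming limitations come from.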

The issue

In order to understand the issue, we will consider three entities: the client, the server and the application. The client (for example a browser) sends a request to the server which forwards it to an application. In this case, the server and the application are communicating via the Rack API.

The issue in streaming cases is that sending a response back from the application does not mean the application has finished processing. For example, consider a middleware (a middleware is an object that sits in between the server and our application) that checks out a connection to the database for the duration of the request and checks it back in afterwards:

def call(env)
  connection = DB.checkout_connection
  env["db.connection"] = connection
  @app.call(env)
ensure
  DB.checkin_connection connection
end

Without streaming, it would work as follow:

  1. The server receives a request and passes it down the stack
  2. The request reaches the middleware
  3. The middleware checks out the connection
  4. The application is invoked, renders a view accessing the database using the connection and returns the rendered view as a string
  5. The middleware checks the connection back in
  6. The response is sent back to the client

With streaming, this would happen:

  1. The server receives a request and passes it down the stack
  2. The request reaches the middleware
  3. The middleware checks out the connection
  4. The app is called but does not render anything. Instead it returns a lazy object as response body that will stream the HTML page in chunks as the `each` method is called
  5. The middleware checks the connection back in
  6. Back in the server, we have received the lazy body and will start to stream it
  7. While streaming the body, since the body is lazily calculated, now is the time it must access the database. But, since the middleware already checked the connection back in, our code will fail with a “not connected” exception

The first reaction to solve this issue is to ensure that all streaming happens inside the application, i.e. the application would have a mechanism to stream the response back and only when it is done it would return the Rack response back. However, if the application does this, any middleware that desires to modify the header or the response body won’t be able to do so because the response was already streamed from inside the application.

Our work-around for Rails was to create proxies that wrap the response body:

def call(env)
  connection = DB.checkout_connection
  env["db.connection"] = connection
  response = @app.call(env)
  # Return a proxied response whose on_close callback checks the
  # connection back in only after the body has been fully streamed.
  ResponseProxy.new(response).on_close do
    DB.checkin_connection connection
  end
end

However, this is inefficient and extremely limited (not all middleware can be converted to such an approach). In order for streaming to be successful, the underlying server API needs to acknowledge that the headers and the response body can be sent at different times. Not only that, it needs to provide proper callbacks around the response lifecycle (before sending headers, when the response is closed, on each streamed chunk, etc.).

The trade-off here is that this can no longer be achieved with an API as simple as Rack’s. In general, we would like to have a response object that provides several life-cycle hooks. For example, the middleware above could be rewritten as:

def call(request, response)
  connection = DB.checkout_connection
  request.env["db.connection"] = connection
  response.on_close { DB.checkin_connection(connection) }
  @app.call(request, response)
end

The Java Servlet specification is a good example of how request and response objects could be designed to provide such hooks.

Other middleware

In the example above I focused on the database connection middleware, but this limitation exists, in one way or another, in the majority of middleware in a stack. For example, a middleware that rescues any exception raised inside the application in order to render a 500 page also needs to be adapted. Other middleware simply won’t work. For instance, Rails ships with a middleware that provides an ETag header based on the body, which has to be disabled when streaming.

Looking back

Does this mean moving to Rack was a mistake? Not at all. Rack appeared when the web development Ruby community was fragmented and the simplicity of the Rack API made it possible to unify the different web frameworks and web servers available. Looking back, I would take the standardization provided by Rack any day regardless of the limitations it brings. Now that we have a standard, we are already working on addressing such issues, which leads us to…

Looking forward

Streaming will become more and more important. While working with HTML streaming requires special attention both technically and in terms of usability, as outlined in Rails’ documentation, API endpoints could benefit from it at basically no extra cost. Not only that, features in HTML5 like server-sent events could easily be built on top of streaming without requiring a specific endpoint in your application to handle them.

While CGI was originally streaming friendly, the abstractions we built on top of it (like middleware) are not. I believe web frameworks should be moving towards better connection/socket abstractions and away from the old CGI-based APIs, which served us well, but it is finally time for us to let them go.

PS: Thanks to Aaron Patterson (who has also written about this issue in his blog), Yehuda Katz, Konstantin Haase and James Tucker for early review and feedback.

F.A.Q.

This section was added after the blog post was released based on some common questions.

Q: Isn’t it a bad idea to mix both streaming and non-streaming behavior in the same stack?

That depends on the stack. This is definitely not an issue with Erlang and Node.js, since both stacks are streaming based. In Ruby, I believe a threaded JRuby server or Thin will allow you to get away with keeping a socket open waiting for responses, but it will probably turn out to be a bad idea with other servers, since the process holding the socket won’t be able to respond to any other request.

Q: Is there a need to do everything streaming based when a request/response would be fine?

No, there is no need. The point of this blog post is not to advocate for streaming-only frameworks, but simply to state that a Rack-like API may severely limit your streaming capabilities in case your platform supports them. Personally, I would like to be able to choose and mix both, if my stack allows me to do so.