Archive for June, 2012

Update 08/13/2012

Following the new deprecation policy, the composed_of removal was reverted. We are still discussing the future of this feature.

The reason

A few days ago we started a discussion about removing the composed_of method from Active Record. I started this discussion because, while tagging every Rails issue according to its framework, I found several issues related to composed_of that are not only hard to fix, but whose fixes would add more complexity to a feature that, in my opinion, adds no visible value to the framework.

In this presentation, Aaron Patterson talks about three types of features: Cosmetic, Refactoring, and Course Correction. Aaron defines cosmetic features as features that add dubious value and unknown debt (I highly recommend watching the whole presentation to learn more about these types of features). This is exactly what I think about composed_of. At this point, this feature adds more debt than value to the Rails code base and to applications, so the Rails team has decided to remove this method.

The plan

The removal plan is simple: deprecate in 3.2 and remove in 4.0. This means that you need to stop using this feature and implement it in another way.

The Rails team has chosen this path because this feature can be implemented using plain Ruby methods for getters and setters. You will see how in the next section.

Implementation

In the simplest case, when you have only one attribute and need to instantiate an object with the value of this attribute, you can use the serialize feature with a custom serializer:

class MoneySerializer
  def dump(money)
    money.value
  end
 
  def load(value)
    Money.new(value)
  end
end
 
class Account < ActiveRecord::Base
  serialize :balance, MoneySerializer.new
end
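Because a custom coder only has to respond to load and dump, it can be exercised without a database. The sketch below is a minimal illustration under two assumptions not in the post: a hypothetical Money class (the post never defines one) and nil guards so that a NULL column round-trips as nil rather than as Money.new(nil).

```ruby
# Hypothetical Money value object: the post assumes a class like this that
# wraps a single column value and exposes it through #value.
class Money
  attr_reader :value

  def initialize(value)
    @value = value
  end

  # Value objects compare by content, not by identity.
  def ==(other)
    other.is_a?(Money) && value == other.value
  end
end

class MoneySerializer
  # Writing: reduce the Money object to the raw column value.
  # The nil guard is an addition of this sketch, so a NULL balance stays NULL.
  def dump(money)
    money && money.value
  end

  # Reading: wrap the raw column value in a Money object (nil stays nil).
  def load(value)
    value.nil? ? nil : Money.new(value)
  end
end

serializer = MoneySerializer.new
raw = serializer.dump(Money.new(150))   # the value that would be stored
restored = serializer.load(raw)         # a Money again
puts restored == Money.new(150)         # prints true
```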

With multiple attributes, you can write the getter and setter yourself:

class Person < ActiveRecord::Base
  def address
    @address ||= Address.new(address_street, address_city)
  end
 
  def address=(address)
    self[:address_street] = address.street
    self[:address_city]   = address.city
 
    @address = address
  end
end
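The getter/setter pair above only relies on self[], self[]= and the generated column readers, so its behavior can be tried outside Rails. In this sketch FakeRecord is a hypothetical stand-in for ActiveRecord::Base and Address is a hypothetical Struct; neither comes from the post, and the Person methods are the ones shown above.

```ruby
# Hypothetical value object with the two fields the post uses.
Address = Struct.new(:street, :city)

# Hypothetical stand-in for ActiveRecord::Base: a hash-backed record that
# provides [] / []= and one reader per declared column.
class FakeRecord
  def self.columns(*names)
    names.each do |name|
      define_method(name) { self[name] }
    end
  end

  def initialize(attributes = {})
    @attributes = attributes
  end

  def [](name)
    @attributes[name]
  end

  def []=(name, value)
    @attributes[name] = value
  end
end

class Person < FakeRecord
  columns :address_street, :address_city

  # Builds the composite object lazily from the two columns and memoizes it.
  def address
    @address ||= Address.new(address_street, address_city)
  end

  # Writes each part back to its column and refreshes the memoized object.
  def address=(address)
    self[:address_street] = address.street
    self[:address_city]   = address.city

    @address = address
  end
end

person = Person.new({ address_street: "Rua Augusta", address_city: "Sao Paulo" })
puts person.address.city                 # prints Sao Paulo
person.address = Address.new("Main St", "Springfield")
puts person[:address_street]             # prints Main St
```

One caveat of the memoized getter: if some code writes address_street directly, @address keeps the old object until it is reset, so you may want to clear it in that case.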

Benefits for Rails developers

I already talked about what this removal can provide to Rails maintainers, but what benefits does it bring to Rails developers?

I think the main advantages are:

  • It is easier to test the composite objects;
  • It is easier to understand the lazy methods;
  • It is easier to customize it without resorting to options like :converter, :constructor and :allow_nil.

Wrapping up

I strongly recommend that you read the whole discussion in the pull request. You will find more examples and additional information there.

Also, I want to thank @steveklabnik for working on this removal and for the awesome work he has been doing on the Rails Issues Team.

Finally, I want to invite you to help the Rails team fix, test, and track issues. About half of the issues are related to the Active Record framework and we need to work on them. As a regular Rails contributor and Rails developer, I think there is still a lot we can do to improve the Rails code base, so join us.

Or, even better, why your web framework should not adopt a CGI-based API.

For the past few years I have been closely studying and observing the development of different emerging languages, with a special focus on web frameworks and servers. Unfortunately, most of the new web frameworks follow the Rack/WSGI specification, which may be a mistake depending on the platform you are targeting (this is particularly true for Erlang and Node.js, which have very strong streaming foundations built into their stacks by default).

This blog post is an attempt to detail the limitations in the Rack/CGI-based APIs that the Rails Core Team has found while working with the streaming feature that shipped with Rails 3.1 and why we need better abstractions in the long term.

Case study

The use case we have in mind here is streaming. In Rails, we have focused on streaming as a way to quickly return the head of the HTML page to the browser, so the browser can start downloading assets (like JavaScript files and stylesheets) while the server is generating the rest of the page. There is a great entry on the Rails weblog about streaming in general and a Railscast if you want to focus on how to use it in your Rails applications. However, streaming is not limited to HTML responses; it can also be useful in API endpoints, for example, to stream search results as they pop up or to synchronize with mobile devices.

The Rack specification in a nutshell

In Rack, the response is an array with three elements: [status, headers, body].

The body can be any object that responds to the method each. This means streaming can be done by passing an object that will, for example, lazily read a file and stream chunks when each is called.

A Rack application is any object that implements the method call and receives an environment hash/dictionary with the request information. When I said above that most new web frameworks follow the Rack specification, I meant that they introduce an API similar to the one just described.
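Since a Rack application is just an object that responds to call, the triple and a lazy body can be shown in plain Ruby without the rack gem. This is a minimal sketch: LazyBody and the lambda-based chunk producers are illustrative names, not part of Rack.

```ruby
# Illustrative lazy body: each chunk is computed only when the server
# iterates over the body with each.
class LazyBody
  def initialize(*producers)
    @producers = producers
  end

  def each
    @producers.each { |producer| yield producer.call }
  end
end

# A Rack application is any object that responds to call(env) and returns
# [status, headers, body]; a lambda is enough.
app = lambda do |env|
  body = LazyBody.new(
    -> { "<html><head></head>" },
    -> { "<body>Hello from #{env['PATH_INFO']}</body></html>" }
  )
  [200, { "Content-Type" => "text/html" }, body]
end

# What a server does, in miniature: call the app, then stream the body.
status, headers, body = app.call({ "PATH_INFO" => "/posts" })
puts status
body.each { |chunk| puts chunk }
```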

The issue

In order to understand the issue, we will consider three entities: the client, the server and the application. The client (for example a browser) sends a request to the server which forwards it to an application. In this case, the server and the application are communicating via the Rack API.

The issue with streaming is that sending a response back from the application does not mean the application has finished processing. For example, consider a middleware (an object that sits between the server and our application) that checks out a database connection for the duration of the request and checks it back in afterwards:

def call(env)
  connection = DB.checkout_connection
  env["db.connection"] = connection
  @app.call(env)
ensure
  DB.checkin_connection connection
end

Without streaming, it would work as follows:

  1. The server receives a request and passes it down the stack
  2. The request reaches the middleware
  3. The middleware checks out the connection
  4. The application is invoked, renders a view accessing the database using the connection and returns the rendered view as a string
  5. The middleware checks the connection back in
  6. The response is sent back to the client

With streaming, this would happen:

  1. The server receives a request and passes it down the stack
  2. The request reaches the middleware
  3. The middleware checks out the connection
  4. The app is called but does not render anything. Instead it returns a lazy object as response body that will stream the HTML page in chunks as the `each` method is called
  5. The middleware checks the connection back in
  6. Back in the server, we have received the lazy body and will start to stream it
  7. While streaming the body, since the body is lazily calculated, now is the time it must access the database. But, since the middleware already checked the connection back in, our code will fail with a “not connected” exception
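Both sequences can be reproduced with a small plain-Ruby simulation. Under the assumptions of this sketch, DB is a hypothetical single-connection pool that raises "not connected" once the connection has been checked back in, and serve plays the role of the server by calling the stack and then iterating the body.

```ruby
# Hypothetical single-connection pool used only for this simulation.
module DB
  @checked_out = false

  def self.checkout_connection
    @checked_out = true
    :connection
  end

  def self.checkin_connection(_connection)
    @checked_out = false
  end

  # Fails the way the post describes once the connection is checked back in.
  def self.query(_connection)
    raise "not connected" unless @checked_out
    "rows"
  end
end

# The middleware from the post, wrapped in a class.
class ConnectionMiddleware
  def initialize(app)
    @app = app
  end

  def call(env)
    connection = DB.checkout_connection
    env["db.connection"] = connection
    @app.call(env)
  ensure
    DB.checkin_connection connection
  end
end

# Non-streaming app: queries while rendering and returns a finished body.
eager_app = lambda do |env|
  [200, {}, ["<p>#{DB.query(env['db.connection'])}</p>"]]
end

# Streaming app: the query only runs when the server iterates the body.
streaming_app = lambda do |env|
  [200, {}, Enumerator.new { |y| y << "<p>#{DB.query(env['db.connection'])}</p>" }]
end

# Stand-in server: call the stack, then stream (iterate) the body.
def serve(app)
  _status, _headers, body = ConnectionMiddleware.new(app).call({})
  chunks = []
  body.each { |chunk| chunks << chunk }
  chunks.join
end

puts serve(eager_app) # prints <p>rows</p>
begin
  serve(streaming_app)
rescue RuntimeError => e
  puts "streaming failed: #{e.message}" # prints streaming failed: not connected
end
```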

The first instinct to solve this issue is to ensure that all streaming happens inside the application, i.e. the application would have a mechanism to stream the response back and would only return the Rack response once it is done. However, if the application does this, any middleware that wants to modify the headers or the response body won’t be able to do so, because the response was already streamed from inside the application.

Our work-around for Rails was to create proxies that wrap the response body:

def call(env)
  connection = DB.checkout_connection
  env["db.connection"] = connection
  response = @app.call(env)
  ResponseProxy.new(response).on_close do
    DB.checkin_connection connection
  end
end
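The post does not show ResponseProxy itself. One way to build it, sketched here as an assumption, relies on the Rack convention that the server calls close on the body after iterating it: the proxy delegates each and runs its on_close callbacks from close. Later Rack releases ship a class in this spirit, Rack::BodyProxy. In this sketch the proxy wraps only the body element of the triple.

```ruby
# Hypothetical ResponseProxy: wraps a Rack body and runs callbacks when the
# server calls close on it, which Rack servers do after iterating the body.
class ResponseProxy
  def initialize(body)
    @body = body
    @callbacks = []
    @closed = false
  end

  # Registers a callback and returns self so the proxy can be used as the body.
  def on_close(&block)
    @callbacks << block
    self
  end

  # Streaming is delegated untouched to the wrapped body.
  def each(&block)
    @body.each(&block)
  end

  # Closes the wrapped body (if it supports close), then runs the callbacks once.
  def close
    return if @closed
    @closed = true
    begin
      @body.close if @body.respond_to?(:close)
    ensure
      @callbacks.each(&:call)
    end
  end
end

# Simulate a server streaming the proxied body and then closing it.
log = []
body = Enumerator.new do |y|
  log << "render"
  y << "chunk"
end
proxy = ResponseProxy.new(body).on_close { log << "checkin" }
proxy.each { |chunk| log << "sent #{chunk}" }
proxy.close
p log # prints ["render", "sent chunk", "checkin"]
```

Inside the middleware, the proxy would then replace the body of the triple: status, headers, body = @app.call(env), followed by returning [status, headers, ResponseProxy.new(body).on_close { DB.checkin_connection connection }].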

However, this is inefficient and extremely limited (not all middleware can be converted to such an approach). For streaming to be successful, the underlying server API needs to acknowledge that the headers and the response body can be sent at different times. Not only that, it needs to provide proper callbacks around the response lifecycle (before sending headers, when the response is closed, on each streamed chunk, etc.).

The trade-off is that this can no longer be achieved with an API as simple as Rack’s. In general, we would like a response object that provides several lifecycle hooks. For example, the middleware above could be rewritten as:

def call(request, response)
  connection = DB.checkout_connection
  request.env["db.connection"] = connection
  response.on_close { DB.checkin_connection(connection) }
  @app.call(request, response)
end

The Java Servlet specification is a good example of how request and response objects could be designed to provide such hooks.
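To make the proposed API more concrete, here is a hypothetical Response class with two of the hooks mentioned above (before_headers and on_close), a Request Struct, and a stand-in DB that only logs calls. None of these names come from an existing library; the app writes directly to the response, and a stand-in server closes the response once streaming ends.

```ruby
# Hypothetical response object with lifecycle hooks (not an existing API).
class Response
  attr_accessor :status
  attr_reader :headers, :output

  def initialize
    @status = 200
    @headers = {}
    @output = []            # stands in for the client socket
    @headers_sent = false
    @closed = false
    @before_headers = []
    @on_close = []
  end

  # Hook that runs right before the status line and headers are sent.
  def before_headers(&block)
    @before_headers << block
  end

  # Hook that runs once the response has been fully sent.
  def on_close(&block)
    @on_close << block
  end

  # Streams a chunk, sending the headers first if needed.
  def write(chunk)
    send_headers
    @output << chunk
  end

  # Called by the server when streaming is done; callbacks run only once.
  def close
    return if @closed
    @closed = true
    send_headers
    @on_close.each(&:call)
  end

  private

  def send_headers
    return if @headers_sent
    @before_headers.each { |hook| hook.call(self) }
    @headers_sent = true
    @output << "HTTP/1.1 #{@status}"
    @headers.each { |name, value| @output << "#{name}: #{value}" }
  end
end

Request = Struct.new(:env)

# Hypothetical stand-in for a connection pool that only records calls.
module DB
  LOG = []

  def self.checkout_connection
    LOG << :checkout
    :connection
  end

  def self.checkin_connection(_connection)
    LOG << :checkin
  end
end

# The middleware from the post, wrapped in a class.
class ConnectionMiddleware
  def initialize(app)
    @app = app
  end

  def call(request, response)
    connection = DB.checkout_connection
    request.env["db.connection"] = connection
    response.on_close { DB.checkin_connection(connection) }
    @app.call(request, response)
  end
end

# Streams the page in two chunks; the query in between still has a connection.
app = lambda do |_request, response|
  response.write("<head></head>")
  DB::LOG << :query
  response.write("<body></body>")
end

# Stand-in server: run the stack, then close the response.
env = {}
response = Response.new
response.before_headers { |r| r.headers["X-Served-By"] = "sketch" }
ConnectionMiddleware.new(app).call(Request.new(env), response)
response.close
p DB::LOG # prints [:checkout, :query, :checkin]
```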

Other middleware

In the example above I focused on the database connection middleware, but this limitation exists, in one way or another, in the majority of middleware in a stack. For example, a middleware that rescues any exception inside the application to render a 500 page also needs to be adapted, since an exception raised while the body is being streamed happens after that middleware has already returned. Other middleware simply won’t work. For instance, Rails ships with a middleware that provides an ETag header based on the body, which has to be disabled when streaming.
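The ETag case shows why some middleware cannot coexist with streaming: a hash of the body is only known after every chunk has been produced, while the header has to go out before the first chunk. The sketch below is a simplified, hypothetical ETag middleware (not the one Rails or Rack ships) built on Ruby's standard Digest::MD5.

```ruby
require "digest/md5"

# Simplified, hypothetical ETag middleware used only to illustrate the problem.
class ETagMiddleware
  def initialize(app)
    @app = app
  end

  def call(env)
    status, headers, body = @app.call(env)
    # The digest needs the whole body, so the lazy body is fully rendered
    # here, before the server has sent a single byte to the client.
    chunks = []
    body.each { |chunk| chunks << chunk }
    body.close if body.respond_to?(:close)
    digest = Digest::MD5.hexdigest(chunks.join)
    [status, headers.merge("ETag" => "\"#{digest}\""), chunks]
  end
end

log = []
app = lambda do |_env|
  body = Enumerator.new do |y|
    log << :render_head
    y << "<head></head>"
    log << :render_body
    y << "<body></body>"
  end
  [200, {}, body]
end

status, headers, body = ETagMiddleware.new(app).call({})
# The whole page was rendered before the server could stream anything.
p log # prints [:render_head, :render_body]
```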

Looking back

Does this mean moving to Rack was a mistake? Not at all. Rack appeared when the Ruby web development community was fragmented, and the simplicity of the Rack API made it possible to unify the different web frameworks and web servers available. Looking back, I would take the standardization provided by Rack any day regardless of the limitations it brings. Now that we have a standard, we are already working on addressing such issues, which leads us to…

Looking forward

Streaming will become more and more important. While HTML streaming requires special attention, both technically and in terms of usability, as outlined in Rails’ documentation, API endpoints could benefit from it at basically no extra cost. Not only that, HTML5 features like server-sent events could easily be built on top of streaming without requiring a specific endpoint in your application to handle them.
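As an illustration of the server-sent events point, the text/event-stream wire format (an optional event: line, one data: line per line of payload, and a blank line ending each event) maps directly onto a lazy body. EventStream is a hypothetical class for this sketch; with a streaming-friendly server, each event would be flushed to the client as soon as it is yielded.

```ruby
# Hypothetical lazy body that formats each [name, data] pair as a
# server-sent event in the text/event-stream format.
class EventStream
  def initialize(events)
    @events = events
  end

  def each
    @events.each do |name, data|
      # Multi-line payloads need one "data:" line per line of data.
      payload = data.to_s.lines.map { |line| "data: #{line.chomp}\n" }.join
      # The trailing blank line terminates the event.
      yield "event: #{name}\n#{payload}\n"
    end
  end
end

app = lambda do |_env|
  results = [["result", "first match"], ["result", "second match"]]
  headers = { "Content-Type" => "text/event-stream", "Cache-Control" => "no-cache" }
  [200, headers, EventStream.new(results)]
end

_status, _headers, body = app.call({})
body.each { |event| print event }
```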

While CGI was originally streaming friendly, the abstractions we built on top of it (like middleware) are not. I believe web frameworks should move towards better connection/socket abstractions and away from the old CGI-based APIs, which served us well, but it is finally time to let them go.

PS: Thanks to Aaron Patterson (who has also written about this issue in his blog), Yehuda Katz, Konstantin Haase and James Tucker for early review and feedback.

F.A.Q.

This section was added after the blog post was released based on some common questions.

Q: Isn’t it a bad idea to mix both streaming and non-streaming behavior in the same stack?

That depends on the stack. This is definitely not an issue with Erlang and Node.js, since both stacks are streaming based. In Ruby, I believe a threaded JRuby or Thin will allow you to get away with keeping a socket open while waiting for responses, but it will probably turn out to be a bad idea with other servers, since the process holding the socket won’t be able to respond to any other request.

Q: Is there a need to do everything streaming based when a request/response would be fine?

No, there is no need. The point of the blog post is not to advocate for streaming-only frameworks, but simply to state that a Rack API may severely limit your streaming capabilities if your platform supports streaming. Personally, I would like to be able to choose and mix both, if my stack allows me to do so.

When David Chelimsky was visiting São Paulo last April, we invited him out for some coffee, beers and Brazilian appetizers. We had a great time and talked about different topics like OO, programming languages, authoring books and, as expected, testing.

One of the topics in our testing discussion was the current confusion in rspec-rails request specs when using Capybara. There is an open issue for this in the rspec-rails issue tracker, and discussing it in person allowed us to talk through some possible solutions, something that could take months in internet time. :)

rspec-rails is a gem that wraps Rails testing behaviors into RSpec’s example groups. For example, the controller example group is based on ActionController::TestCase::Behavior. There are also example groups for views, helpers and so forth, but for now we are interested in the request example group, which is a wrapper for ActionDispatch::Integration::Runner. The Rails integration runner is built on top of rack-test, a great little gem that adds support for methods like get, post, put and delete and handles the Rack request and response objects.

This setup with the request example group running on top of Rails’ Integration Runner works fine until you add Capybara to your application (which is always a good idea). The issue is that Capybara by default includes its DSL in the same request example group and that’s when the confusion starts.

Capybara, being an acceptance test framework, does not expose low-level details like a request or response object. In order to access a web page using Capybara, the developer needs to use the method visit (instead of get). To read the accessed page body, the developer must use page instead of manipulating the response.

However, since both Capybara DSL and Rails’ Integration Runner are included in the same example group, both methods visit and get are available! Not only that, even if I visit a web page using Capybara’s visit, I can still access the request and response object that comes from Rails, except that they will be blank since Capybara uses a completely different stack to access the application.

This confusion does not only happen inside each test; it also leads to a poor test suite. I have seen many, many files inside spec/requests that mix both syntaxes.

Talking to David, I described a possible solution to this problem based on how we have been building applications at Plataformatec. First of all, we start with two directories: spec/requests and spec/acceptance. Since both are supported by Capybara, this (mostly) works out of the box.

Everything you want to test from the user/browser perspective goes under spec/acceptance. So if you want to test that filling in the title and body fields and pressing the “Publish” button publishes a new blog post, you test that under acceptance (protip: we usually have subdirectories inside spec/acceptance based on the application roles, like spec/acceptance/guest, spec/acceptance/admin, etc).

Everything under spec/requests applies to the inner workings of your application. Is it returning the proper HTTP headers? Is this route streaming the correct JSON response? Also, since APIs are not part of the user/browser perspective, they are also tested under spec/requests and not under spec/acceptance.

This separation of concerns already helps solve the confusion above. Under spec/acceptance, you should use only Capybara helpers. Inside spec/requests, you use the tools provided by Rails. However, this does not solve the underlying problem that both sets of helpers are still included in spec/requests.

Therefore, while this blog post means to provide some guidance for those that run into such problems, we also would like to propose a solution that we discussed with David. The solution goes like this:

1) We change RSpec to no longer generate spec/requests, but both spec/api and spec/features (I have proposed spec/acceptance but David pointed out those are not strictly speaking acceptance tests). The Capybara DSL (visit, page and friends) should not be included in spec/api under any circumstance.

2) We change Capybara to include its DSL and RSpec matchers by default under spec/features, and change the feature method to rely on the type :features instead of :requests.

The proposal suggests adding two new directories instead of changing the behavior of existing ones, in order to stay backwards compatible while ensuring a safe and more semantic future for everyone else. David asked me to outline our conversation in a blog post, so we can raise awareness and get feedback before making such changes. So, what do you think?