{"id":288,"date":"2009-09-24T16:43:49","date_gmt":"2009-09-24T19:43:49","guid":{"rendered":"http:\/\/blog.plataformatec.com.br\/?p=288"},"modified":"2010-05-20T19:08:39","modified_gmt":"2010-05-20T22:08:39","slug":"exporting-data-to-csv-and-excel-in-your-rails-app","status":"publish","type":"post","link":"https:\/\/blog.plataformatec.com.br\/2009\/09\/exporting-data-to-csv-and-excel-in-your-rails-app\/","title":{"rendered":"Exporting data to CSV and Excel in your Rails apps"},"content":{"rendered":"

Sometimes users want to slice and dice data as they wish. In such scenarios, it’s usual to export the data in a tabular format so your users can use any spreadsheet editor and do whatever they want.<\/p>\n

Usually, we do that using CSV, right? OpenOffice and other editors can open CSV files flawlessly. Just double-click an CSV file and voil\u00e0.<\/p>\n

However, Excel has some gotchas. If you double-click a CSV file, Excel will open your data with everything in just one column. Of course you can go to Tools > Import, browse to the file, set up comma as separator and then after some clicks you get your data as you wish. This is not user-friendly at all and this is not something we want to explain to our clients. Our clients must be able to simply double-click the my_data.csv file and see the data well structured.<\/p>\n

So, here’s what we do at Plataforma<\/a> to deal with Excel formats.<\/p>\n

It’s TSV, not CSV, dude!<\/h3>\n

First of all, Excel expects your data with tabulations as fields separator. So, what you need is actually a TSV (tab-separated values).<\/p>\n

If you are using FasterCSV<\/a>, you just need to do:<\/p>\n

\r\ntsv_str = FasterCSV.generate(:col_sep => \"\\t\") do |tsv|\r\n  tsv << headers\r\n  # append your beautiful data here\r\nend\r\n<\/pre>\n

Keep this in mind. Excel demands tabulations, not commas! But there are worse things to come...<\/p>\n

No newline in fields<\/h3>\n

Excel doesnt' like when you put \"\\n\" inside fields. Although fields are separated by tabulations, it appears Excel can't cope with extra newlines.<\/p>\n

So, if you have some text fields in your model, beware that it may contain newlines and you need to strip those before exporting your data to Excel.<\/p>\n

Forget about UTF-8. Use UTF-16!<\/h3>\n

One of the most hidden specifications of Excel is that it expects our TSV files to be encoded using UTF-16 Little Endian. Did you know that? Well, we didn't!<\/p>\n

Some sources even say that this is the only Unicode format supported in Excel. <\/p>\n

What's the difference between UTF-8<\/a> and UTF-16<\/a>? UTF-8 is a variable byte encoding in which characters may use up to 4 bytes, however for western languages it usually uses one or two bytes. UTF-16 characters always uses at least 2 bytes (remember that characters and bytes are two different things in Unicode \"slang\"). Generally speaking, UTF-16 takes more space in the filesystem (most of the time).<\/p>\n

There's also the Little Endian part. UTF-16 always uses a pair of bytes to represent a character, however we need to know what's the correct order of those bytes. We won't get into details here, but the order is indicated by Byte-order Mark (BOM)<\/a>. In practice, the BOM in UTF-16 will add two bytes in the beginning of a file (you can see that in a hexeditor). <\/p>\n

One way to convert your TSV string (generated by FasterCSV) is with Iconv, a tool written for the GNU C Library. Luckly, Iconv is well wrapped in pure Ruby joy. It's inside Ruby's standard library and you just need to require it.<\/p>\n

However, if you convert an string to UTF-16 Little Endian, Iconv will NOT put the BOM in the beginning. This is compatible with the Unicode FAQ<\/a>. But since Excel is way out of the standards, you must manually insert the BOM.<\/p>\n

You can use Iconv just before sending the file to the user in your controller. It will look like this:<\/p>\n

\r\nrequire 'iconv'\r\n\r\nclass ProjectsController < ActionController::Base\r\n  BOM = \"\\377\\376\" #Byte Order Mark\r\n\r\n  def index\r\n    @projects = Project.all\r\n    respond_to do |format|\r\n      format.html\r\n      format.csv { export_csv(@projects) }\r\n    end\r\n  end\r\n\r\nprotected\r\n\r\n  def export_csv(projects)\r\n    filename = I18n.l(Time.now, :format => :short) + \"- Projects.csv\"\r\n    content = Project.to_csv(projects)\r\n    content = BOM + Iconv.conv(\"utf-16le\", \"utf-8\", content)\r\n    send_data content, :filename => filename\r\n  end\r\nend\r\n<\/pre>\n

You can see that we are still using CSV as an extension to the file. This is because TSV files are not usually associated with Excel. This excerpt uses a timestamp as the filename, which is generally a good practice.<\/p>\n

Wrapping up<\/h3>\n

So, these are the three rules for dealing with Excel-friendly-CSV:<\/p>\n

    \n
  1. Use tabulations, not commas.<\/li>\n
  2. Fields must NOT contain newlines.<\/li>\n
  3. Use UTF-16 Little Endian to send the file to the user. And include a Little Endian BOM manually.<\/li>\n<\/ol>\n

    Keep this in mind and you'll never have to explain to your clients how to open the data export in Excel.<\/p>\n

    There is one last issue: OpenOffice will NOT open files in the Excel \"specification\" easily. Google Analytics solves this by showing two links to the user: \"Export to CSV\" and \"Export to Excel\". The first is a regular CSV file and the second is the specially crafted Excel-friendly TSV file.<\/p>\n

    And blame Microsoft for this odd behaviour.<\/p>\n

    UPDATE:<\/strong> @danielvlopes<\/a> pointed us an already existing solution which encapsulates the process described in this post, called csv_builder<\/a>. You just need to set the @output_encoding to use the \"utf-16\" (beware of the BOM).<\/p>\n

    @jncoward<\/a> also provided a link to the spreadsheet<\/a> gem, which handles directly Excel formats. TSV is a much simpler and faster format, but the spreadsheet gem might be useful to you in more complex cases.<\/p>\n

    Finally, Kieran pointed in the comments that we can have XML spreadsheets, while Niko and Chris told us that we can you can also work with HTML tables in Excel, which is simple as well and even allows formatting.<\/p>\n","protected":false},"excerpt":{"rendered":"

    Sometimes users want to slice and dice data as they wish. In such scenarios, it’s usual to export the data in a tabular format so your users can use any spreadsheet editor and do whatever they want. Usually, we do that using CSV, right? OpenOffice and other editors can open CSV files flawlessly. Just double-click … \u00bb<\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"ngg_post_thumbnail":0,"footnotes":""},"categories":[1],"tags":[30,29,7,32,31],"aioseo_notices":[],"jetpack_sharing_enabled":true,"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/blog.plataformatec.com.br\/wp-json\/wp\/v2\/posts\/288"}],"collection":[{"href":"https:\/\/blog.plataformatec.com.br\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.plataformatec.com.br\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.plataformatec.com.br\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.plataformatec.com.br\/wp-json\/wp\/v2\/comments?post=288"}],"version-history":[{"count":25,"href":"https:\/\/blog.plataformatec.com.br\/wp-json\/wp\/v2\/posts\/288\/revisions"}],"predecessor-version":[{"id":309,"href":"https:\/\/blog.plataformatec.com.br\/wp-json\/wp\/v2\/posts\/288\/revisions\/309"}],"wp:attachment":[{"href":"https:\/\/blog.plataformatec.com.br\/wp-json\/wp\/v2\/media?parent=288"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.plataformatec.com.br\/wp-json\/wp\/v2\/categories?post=288"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.plataformatec.com.br\/wp-json\/wp\/v2\/tags?post=288"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}