The anatomy of code documentation

Writing code is the activity that turns our ideas into great products and tools.

Fingers over a keyboard and a text editor. This is how most people see our daily routine. Suddenly, no words emerge over the screen. The eyes gaze at the monitor and everything looks static: we’re reading code.

(And we read lots of code).

The reading process demands a huge cognitive effort from us. Once we read a piece of code for the very first time, we’re immersed in a long learning curve and a continuous cycle of questionings, doubts, and insights. Reading code requires us to dig through the mental model of a domain problem and to continuously analyze the input, processing and output flows of some business requirement.

Sure, good developers have been exposed to good writing practices, and there is nothing like reading a clean, robust and elegant piece of code. Our methods are small and expressive and intend to reveal interfaces.

Nevertheless, we still must read them line by line, and gradually lay down our understanding about it.

(“Only well written and tested code is not enough to connect the dots…”, some wise person may have already said this before.)

A large number of whiteboard discussions may have been lost, merely because there was no room for those. Commit messages were laconic and our memorization skills were overestimated. Yesterday’s obvious is now a mine of untangled knowledge.

Furthermore, the practice of comment about code is often discouraged. After all, we prioritize working software over overwhelming documentation. We’re agile developers and we cannot afford the maintenance burden. Code comments are mostly faced as the inability to express oneself in a clever way, and the activity of documenting code is usually associated with redundant, undesired and ungrateful comments.

But here it goes: the key point on documenting is maintaining information that resists as time goes by.

A practical example

Suppose our sample application deals with a concept of contracted plans on its domain, where each plan offers a storage limit in a remote storage service. One of the requirements is to present a summary of a contracted plan’s consumption, as shown in the example below:

'2.44% (500 MB of 20 GB)' # User has consumed 2.44% from a 20 gigabytes
                          # storage limit (i.e., 500 megabytes).

Our source data is a domain object ContractedPlan, which exposes two main pieces of information: the storage limit value (in the plan_storage_limit column) and the total consumed (in the total_consumed column) until the present moment, both persisted in numeric bytes in the database.

The consumption summary must present these two pieces of information, in addition to the percentage of how much has been consumed until now:

\> contracted_plan = ContractedPlan.last

\> contracted_plan.plan_storage_limit # Client has contracted a 20 GB plan.
# => 20_000_000_000

\> contracted_plan.total_consumed # User has consumed 500 megabytes until now.
# => 500_000_000

The percentage of how much has been consumed can be calculated simply applying the following statement:

\> total_consumed_in_percent =
  (total_consumed * 100.to_f / total_contracted).round(2)
# => 2.44

Knowing that such summary statement must be rendered in an HTML page, we’re gonna create a ContractedPlansHelper#plan_consumption_summary helper with the following implementation:

module ContractedPlansHelper
  def plan_consumption_summary(contracted_plan)
    total_contracted = contracted_plan.plan_storage_limit
    total_consumed = contracted_plan.total_consumed
    total_consumed_in_percent =
      (total_consumed * 100.to_f / total_contracted).round(2)

    format(
      '%s%% (%s de %s)',
      total_consumed_in_percent,
      number_to_human_size(total_consumed, precision: 4),
      number_to_human_size(total_contracted)
    )
  end
end

The #plan_consumption_summary method has a structure divided into two main parts: data preparation and data presentation (including the step for formatting the three expected pieces of information).
Even though it looks like a simple and straight logic, there are some points not so obvious for newcomers to that code. The data presentation part depends on a moderately complex structure, in a way that is not an easy piece to digest regarding the “shape” of the returned String (i.e., the syntax of the format method, besides the two calls for the #number_to_human_size method).

There are, of course, some ways to find out how the returned text will be printed out:

  • Reading the unit tests;
  • Opening the page that uses that helper on the web browser; or
  • Executing the helper on the application console (rails console).

All alternatives, however, requires the reader to make an additional effort. All of them encourage the reader to leave out its current context, which is reading the application code in the text editor. In that case, we can leverage code documentation as a technique to facilitate the expected understanding.

First of all, the plan_consumption_summary method can provide a simple usage example, with real data expected from business customers:

# plan_consumption_summary(contracted_plan)
# # => '2.44% (500 MB of 20 GB)'
def plan_consumption_summary(contracted_plan)
  total_contracted = contracted_plan.plan_storage_limit
  total_consumed = contracted_plan.total_consumed
  # ...

The first goal was accomplished. We have diminished the learning barrier to the method logic, without breaking the flow of reading code. Moreover, by being exposed to real data examples, we had a better understanding of some concepts present in the application domain, at a very low cost.

The next snippet that we may add is a title that describes briefly what is the method in that case. Let’s think for a moment. How would we describe the plan_consumption_summary to someone who has just joined the project?

A possible dialog could be:

Well, this method returns a summary of how much a user has consumed in a certain plan.

Bringing the same information to the code, we have this:

# A summary of how much some user has consumed in a certain plan.
#
# plan_consumption_summary(contracted_plan)
# # => '2.44% (500 MB of 20 GB)'
def plan_consumption_summary(contracted_plan)
  total_contracted = contracted_plan.plan_storage_limit
  total_consumed = contracted_plan.total_consumed
  # ...

By describing the title of a method, we have the chance to reflect and identify whatever noise is between a concept and its implementation. The mere act of writing documentation helps us to give remarkable names to our objects, methods, and variables. If the previously written code has been derived only from what was talked or listened, now we have a third communication channel.

At this moment, our documentation has two pieces of information with different semantics: a title and an example. We can apply to it a bit of hierarchy, (i.e., setting apart free text and code), we can make this structure a bit more evident:

# A summary of how much some user has consumed in a certain plan.
#
# Examples
#   plan_consumption_summary(contracted_plan)
#   # => '2.44% (500 MB of 20 GB)'
def plan_consumption_summary(contracted_plan)
  total_contracted = contracted_plan.plan_storage_limit
  total_consumed = contracted_plan.total_consumed
  # ...

We can also add up a piece of information that points out that the method has a public visibility and, therefore, can be used by outside application code (i.e., the view layer). In our example, we use the Public: keyword:

# Public: A summary of how much some user has consumed in a certain plan.
#
# Examples
#   plan_consumption_summary(contracted_plan)
#   # => '2.44% (500 MB of 20 GB)'
def plan_consumption_summary(contracted_plan)
  total_contracted = contracted_plan.plan_storage_limit
  total_consumed = contracted_plan.total_consumed
  # ...

The third point that we can cover is about the type/shape of data that will be returned from the method. Since we know that the return type will be a String, we can add such information with Returns a String:

# Public: A summary of how much some user has consumed in a certain plan.
#
# Examples
#   plan_consumption_summary(contracted_plan)
#   # => '2.44% (500 MB of 20 GB)'
#
# Returns a String.
def plan_consumption_summary(contracted_plan)
  total_contracted = contracted_plan.plan_storage_limit
  total_consumed = contracted_plan.total_consumed
  # ...

In order to complete the input-processing-output triad, we also describe the expected input data, along with a brief description of each snippet. Our case is simple, as we just want an instance of ContractedPlan model, and there is no other primitive values involved.

# Public: A summary of how much some user has consumed in a certain plan.
#
# contracted_plan - A ContractedPlan instance.
#
# Examples
#   plan_consumption_summary(contracted_plan)
#   # => '2.44% (500 MB of 20 GB)'
#
# Returns a String.
def plan_consumption_summary(contracted_plan)
  total_contracted = contracted_plan.plan_storage_limit
  total_consumed = contracted_plan.total_consumed
  # ...

We have now a basic backbone of code documentation. We know how to answer the “what’s” and “how’s” intrinsic to the #plan_consumption_summary method. We can finally include another significant piece of information, one that is mostly unfeasible to describe using just the programming language lexicon: the context.

We know context is everything. Unfortunately, there is information that might be lost in time, especially after some cycles of team rotation and turnovers. The context can help us out to surface the aspects of “when to use a method” and “the reason/intentions behind using this”.

In our example, we can describe something that conveys the following message:

The plan_consumption_summary method is important so that users have access to what has already been consumed by them (since they could run out of storage space), and it can be used on pages that present data on plan consumption.

Assuming that such information is relevant, we can include it on the code documentation:

# Public: A summary of how much some user has consumed in a certain plan.
#
# This can be used whenever the information about a contracted plan is
# presented to the its user. This way, they can get a quick overview
# regarding the current state of the plan usage.
#
# contracted_plan - A ContractedPlan instance.
#
# Examples
#
#   plan_consumption_summary(contracted_plan)
#   # => '2.44% (500 MB of 20 GB)'
#
# Returns a String.
def plan_consumption_summary(contracted_plan)
  total_contracted = contracted_plan.plan_storage_limit
  total_consumed = contracted_plan.total_consumed
  # ...

We can consider the job done now.

It is then a good time to notice that the format used in our example was not conceived by my free will. Attentive readers may have perceived that we used the TomDoc format, created by Tom Preston-Werner. The TomDoc spec aims for simplicity and clarity of communication and it is hugely used in GitHub projects (and here at Plataformatec as well).

Conclusion

An important goal of writing code documentation is to provide an initial context to facilitate and foster the understanding of a piece of code. It’s an empathetic tool that provides a rough, overall idea of the code, even before “one starts reading the code”. By using it wisely, the reader can get a glimpse of what will be executed by the language interpreter/compiler, with little effort.

Code documentation is, of course, only one of many other artifacts that are at our disposal, in order to keep the knowledge base of a software system alive. I consider that as important as having well-described commit messages, README files, project wikis, architectural decision records, BDD specification. etc. They all work together focused on the same essential goal: empower developers to write better software.

Share on FacebookShare on Google+Tweet about this on TwitterShare on LinkedInEmail this to someone