Power of the metrics: Don’t use average to forecast deadlines

Almost all project stakeholders are used to asking for deadlines when the subject is software development. Every time I face this type of question, I usually answer something like this:

— “Well, the only thing I know is that uncertainty is one of the few certainties about software development, so let’s look at the team’s metrics and try to forecast this big mystery with some analytical techniques.”

In this blog post, I’ll show why the average isn’t a good measure when you are attempting to predict a software project’s deadline.

The Beginning

Metrics are important because they drive improvements and help teams focus their people and resources on the process of developing good software. Take a look at a talk (in Portuguese) where I spoke about what you can learn from agile metrics when you are developing digital products.

One of the key metrics that we have been using in our projects is the throughput. Throughput is the number of things done in a given period.

In our projects, we measure throughput as the number of work items (e.g., user stories) that our teams deliver in a week. This metric helps us predict deadlines. It is also useful for identifying problems that may be occurring in the software development process.
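As a minimal sketch of how weekly throughput could be derived from delivery dates: the function name `weekly_throughput` and the sample dates are hypothetical, and the sketch assumes a week runs from Monday to Sunday. Weeks with no deliveries are kept as zeros, since they are part of the distribution too.

```python
from collections import Counter
from datetime import date, timedelta


def weekly_throughput(delivery_dates):
    """Return the number of work items delivered in each week.

    Weeks are assumed to start on Monday. Weeks without any delivery
    between the first and last delivery are reported as 0.
    """
    if not delivery_dates:
        return []
    # Map every delivery date to the Monday of its week.
    mondays = [d - timedelta(days=d.weekday()) for d in delivery_dates]
    counts = Counter(mondays)
    throughput = []
    week = min(mondays)
    last_week = max(mondays)
    while week <= last_week:
        throughput.append(counts[week])  # Counter yields 0 for missing weeks
        week += timedelta(weeks=1)
    return throughput


# Hypothetical delivery dates, one per finished work item.
deliveries = [
    date(2024, 1, 2), date(2024, 1, 4),
    date(2024, 1, 9), date(2024, 1, 10), date(2024, 1, 12),
    date(2024, 1, 23),
]
print(weekly_throughput(deliveries))
```

With this sample, the third week has no deliveries and shows up as a zero rather than disappearing from the series.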

A common concern about using throughput to predict delivery dates is that it requires the stories in the backlog to be of relatively similar size. Otherwise, the large standard deviation could directly affect the completion date predictions.

That concern is actually right when you’re using the average throughput to predict deadlines.

In the next section of this post, I will show you why using averages can be dangerous when you are trying to forecast software release dates.

Exploring some data

The next three histograms represent different projects that we worked on at Plataformatec in the recent past.

The histogram is a graphical display of data that uses bars of different heights to show the frequency of various data points within an overall dataset.

These charts have the throughput on the horizontal (x) axis and the frequency count on the vertical (y) axis. The benefit of this chart is that it gives us an overall idea of the shape of the distribution of the underlying data.
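To make the idea concrete, here is a small sketch that builds the frequency counts behind such a histogram and prints them as text bars. The `sample` values are hypothetical, not the data from the projects discussed here, and `throughput_histogram` is a name made up for this illustration.

```python
from collections import Counter


def throughput_histogram(weekly_throughput):
    """Map each throughput value to the number of weeks it occurred.

    Values that never occurred between the minimum and the maximum are
    included with a frequency of 0, so gaps in the distribution stay visible.
    """
    counts = Counter(weekly_throughput)
    lowest, highest = min(weekly_throughput), max(weekly_throughput)
    return {value: counts[value] for value in range(lowest, highest + 1)}


# Hypothetical weekly throughput for a ten-week project.
sample = [1, 2, 2, 2, 3, 3, 4, 5, 6, 8]
histogram = throughput_histogram(sample)

# Text rendering, one row per throughput value (the article's charts
# put throughput on the x axis; rows are used here for simplicity).
for value, frequency in histogram.items():
    print(f"{value:>2} | {'#' * frequency}")
```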

Analyzing the first case, we have a scenario where the average throughput of the project is 3.6 stories per week and a median (the number that is halfway into the set) of 3 stories per week.

If we ask the project team to estimate their throughput, they may well answer two, which is the mode (the value that appears most often in a dataset). So, in this case, why wouldn’t it be advisable to use the average?

Even though the average is not far from the median, 60% of the throughput values are below the average, which means that only the other 40% are above it. Consequently, the average is not representative of the team’s delivery capacity.

To deduce something from the data, it is important to understand the shape of its distribution. For this reason, we can’t say that the average, a single number, is a reliable measure: it doesn’t provide context for analysis.
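The numbers from the first case can be reproduced with a small sketch. The `sample` below is hypothetical: it was chosen only so that it matches the figures quoted above (average 3.6, median 3, mode 2, 60% of the weeks below the average) and is not the real project data.

```python
from statistics import mean, median, mode

# Hypothetical weekly throughput, picked to match the first case.
sample = [1, 2, 2, 2, 3, 3, 4, 5, 6, 8]

average = mean(sample)
middle = median(sample)
most_common = mode(sample)

# Share of weeks in which the team delivered less than the average.
share_below = sum(1 for value in sample if value < average) / len(sample)

print(f"average: {average}")
print(f"median:  {middle}")
print(f"mode:    {most_common}")
print(f"weeks below the average: {share_below:.0%}")
```

Three different "typical" values come out of the same ten weeks, which is why a single number without the distribution behind it says little.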

In the second case, the average is located below the median, which means that fewer than half of the weekly throughput values fall below the average. So the median would be a better measure of central tendency in this situation.

In the third case, we can observe yet another situation: a significant concentration of deliveries sits above the average throughput, so it would be too risky to use this value as a reference for the team’s delivery trend. Another argument that supports this conclusion: if you have a throughput mode of zero and you use the average as the trend, you are going against the typical team performance. That could lead to a wrong estimate of the project’s delivery date.
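A sketch of a distribution with this shape, using made-up numbers (not the third project's data): the most frequent week delivers nothing, yet most weeks land well above the average. `share_above_average` is a hypothetical helper name.

```python
from statistics import mean, median, mode


def share_above_average(weekly_throughput):
    """Fraction of weeks whose throughput is strictly above the mean."""
    average = mean(weekly_throughput)
    above = sum(1 for value in weekly_throughput if value > average)
    return above / len(weekly_throughput)


# Hypothetical data: several empty weeks, then bursts of deliveries.
bursty = [0, 0, 0, 0, 5, 6, 6, 7, 5, 6]

print(f"average: {mean(bursty)}")
print(f"median:  {median(bursty)}")
print(f"mode:    {mode(bursty)}")
print(f"weeks above the average: {share_above_average(bursty):.0%}")
```

Here the average describes almost no actual week: the team either delivers nothing or delivers clearly more than the average suggests.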

Another observation that is important to make about averages is that extreme values can bias them to values that would not represent the real central tendency in a dataset.
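A quick illustration with hypothetical numbers: one unusually productive week is enough to pull the average far from what the team typically delivers, while the median does not move.

```python
from statistics import mean, median

# Hypothetical weekly throughput for a stable team.
typical_weeks = [2, 3, 3, 2, 3, 2, 3]

# Same team, plus one extreme week (e.g. many small items closed at once).
with_outlier = typical_weeks + [20]

print(f"without outlier: mean={mean(typical_weeks):.2f}, median={median(typical_weeks)}")
print(f"with outlier:    mean={mean(with_outlier):.2f}, median={median(with_outlier)}")
```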

Summary

From what we’ve seen in the examples of this post, projects will rarely have a distribution where the average, median, and mode are identical.

As the data becomes skewed, the average loses its ability to provide the best central location for the data, because the skewed values drag it away from the typical value.

Because an average on its own tells us nothing about the data distribution, it’s risky to base a forecast on it. So, if you are a Product Owner or Product Manager, you shouldn’t use the average throughput when you are trying to establish a release date.

In future posts of this series, I will talk about how we can use statistical techniques such as linear regression or Monte Carlo simulation to forecast a range of deadlines.