{"id":5597,"date":"2016-08-18T23:35:05","date_gmt":"2016-08-19T02:35:05","guid":{"rendered":"http:\/\/blog.plataformatec.com.br\/?p=5597"},"modified":"2017-05-31T18:58:14","modified_gmt":"2017-05-31T21:58:14","slug":"forecasting-software-projects-completion-date-through-monte-carlo-simulation","status":"publish","type":"post","link":"https:\/\/blog.plataformatec.com.br\/2016\/08\/forecasting-software-projects-completion-date-through-monte-carlo-simulation\/","title":{"rendered":"Forecasting software project’s completion date through Monte Carlo Simulation"},"content":{"rendered":"
Nowadays we manage our processes with a probabilistic rather than a deterministic approach. That means we use statistical methods to predict the future instead of blind estimates. But wait\u2026 wasn\u2019t unpredictability one of the main reasons we moved from Waterfall to Agile?<\/p>\n
Yes, uncertainty is inherent to software development. For example, one could not possibly predict all the features of a system at the beginning of a project.<\/p>\n
But what if we could forecast how just the next few weeks<\/strong> would behave?<\/p>\n That is what we are trying to achieve by gathering metrics and running Monte Carlo simulations over them.<\/p>\n To understand how we collect our metrics, I recommend you read these posts:<\/p>\n Most people don\u2019t use any metrics to manage their projects, so you can imagine how few use complex statistical models to forecast a project\u2019s completion. However, having this knowledge about your current project can help you deliver faster and better.<\/p>\n In this post, you will see how we went from a naive forecasting approach to a more precise and robust one using Monte Carlo Simulation. We\u2019ll also explain how Monte Carlo simulation works through a simple example, and then draw an analogy between that example and our real-world approach. Knowing when your project is likely to end can help you manage its budget, give better feedback to stakeholders, and give you confidence that you are indeed in control of your project.<\/p>\n
Why not other methods?<\/h2>\n
We have been experimenting with other forecasting methods for a long time, and all of them had drawbacks. Here we list some of them and explain why we opted for Monte Carlo.<\/p>\n
Mean Throughput<\/h3>\n
Some of you might have thought of a simple method that uses the average throughput to forecast the end of the project. However, as explained in the blog post Power of the metrics: Don\u2019t use average to forecast deadlines<\/em><\/a>, that would be an extremely naive method, and it doesn\u2019t work.<\/p>\n
Linear Regression Approach<\/h3>\n
As a first approach, we used linear regression on our accumulated throughput to forecast when a project would end:<\/p>\n <\/p>\n However, we identified two problems that we were not considering in that analysis:<\/p>\n
Manual Setting Approach<\/h3>\n
First, we focused on avoiding problem number 2 and started using a more parsimonious model, built on methods that we knew were somewhat skewed but that we could control better.<\/p>\n <\/p>\n We manually set the throughput for each forecast line, and each of those throughputs determines the date on which that line reaches the backlog. The throughput values come from studying the percentiles of our throughput history.<\/p>\n We had better results with that, since we could change the forecast manually and, therefore, tune it to the current state of the project. But problem number 1, the fact that we were not considering changes in the backlog, made us go a step further and try Monte Carlo Simulation.<\/p>\n
Monte Carlo Simulation<\/h2>\n
Wikipedia defines the Monte Carlo<\/a> method as<\/p>\n Monte Carlo methods vary, but tend to follow a particular pattern:<\/p>\n It may seem difficult or complex at first, but it is actually much simpler than it looks: you can think of it as the brute-force method of forecasting. We\u2019ll present a simple example and then show how we applied it in the real world. There\u2019s also a very good simulation by Larry Maccherone<\/a> that illustrates it even further.<\/p>\n
Dice Game<\/h3>\n
Imagine you are playing a dice game in which the goal is to reach a sum of at least 12 points in as few rolls as possible. The best play would be 2 consecutive rolls of 6, and the worst would be 12 rolls of 1. What we want to calculate is the probability of ending the game after N rolls.<\/p>\n Given that each roll has 6 possible outcomes, one per face of the die, what is the probability of finishing the game in 1 roll?<\/p>\n Zero, since a single die goes up to only 6 points, so you cannot reach 12 in one roll.<\/p>\n What about on the second roll?<\/p>\n It is the probability of rolling two consecutive 6s, which is basic probability:<\/p>\n P(6, 6) = 1\/6 \u00d7 1\/6 = 1\/36 \u2248 2.8%<\/p>\n What about on the third roll?<\/p>\n Now you can reach 12 points in many ways, for example: (3,3,6), (3,4,5), (3,5,4), (3,4,6), (5,5,2), (4,4,4), and so on. The math behind that calculation is no longer so straightforward, right? That is when Monte Carlo simulation comes in handy.<\/p>\n What Monte Carlo does is simulate thousands of dice rolls and then analyze the outcomes. For example, to estimate the probability of finishing the game by the third roll, it would roll the die three times, sum the points, and store the result. It would then repeat those steps 5000 times and count how many times each sum came up:<\/p>\n
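To make the dice example concrete, here is a minimal Python sketch of the same brute-force idea. It assumes a fair die and that the game ends as soon as the running total reaches at least 12. The function names `rolls_to_reach` and `simulate` are hypothetical, chosen only for this illustration. Rather than summing exactly three rolls, each simulated game is played to the end and the number of rolls it took is recorded. That way, a single run estimates the probability of finishing on every possible roll count.

```python
import random
from collections import Counter


def rolls_to_reach(target: int, rng: random.Random) -> int:
    """Roll a fair six-sided die until the running total reaches `target`.

    Assumption (hypothetical rule for this sketch): the game ends as soon as
    the total is at least `target`. Returns how many rolls that took.
    """
    total = 0
    rolls = 0
    while total < target:
        total += rng.randint(1, 6)  # one roll of a fair die, 1..6 inclusive
        rolls += 1
    return rolls


def simulate(target: int = 12, games: int = 5000, seed: int = 42) -> dict[int, float]:
    """Play `games` simulated games and estimate P(game ends on roll N) for each N."""
    rng = random.Random(seed)  # seeded so repeated runs give the same estimates
    counts = Counter(rolls_to_reach(target, rng) for _ in range(games))
    return {n: counts[n] / games for n in sorted(counts)}


if __name__ == "__main__":
    probabilities = simulate()
    cumulative = 0.0
    for n, p in probabilities.items():
        cumulative += p  # running total gives P(game has ended by roll N)
        print(f"ends on roll {n:2d}: {p:6.2%}   ends by roll {n:2d}: {cumulative:6.2%}")
```

With enough simulated games, the estimate for finishing on the second roll should come out close to the exact value of 1/36 (about 2.8%), which makes a quick sanity check. More games give more stable estimates at the cost of more computation, which is the usual trade-off of this brute-force approach.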