Monte Carlo in Practice: Finding the ideal iteration value

One of the reasons to use any kind of project management methodology is to reduce costs.

A single week of delay in a project creates two different types of cost:

  • The first is the cost of the team, since they will need to work another week.
  • The second is the Cost of Delay, which is the income not collected because of the delay. For instance, suppose the new feature this team is working on would generate a daily income of $100; the Cost of Delay for that week would be $700 ($100 * 7 days), as sketched in the snippet below.
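
Putting hypothetical numbers on it in R (the weekly team cost below is an assumed figure; only the $100/day income comes from the example above):

weekly_team_cost <- 10000                    # assumed cost of keeping the team one more week
daily_income     <- 100                      # daily income the new feature would generate
cost_of_delay    <- daily_income * 7         # income not collected during one week of delay
total_cost_of_one_week_delay <- weekly_team_cost + cost_of_delay
total_cost_of_one_week_delay                 # 10700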

Since these costs can be high, using tools that provide more visibility on delivery dates is necessary. One tool that can help solve this problem is the Gantt Chart. With a Gantt Chart it’s possible to identify the weakest points of the project (the critical path), where more effort should be made to prevent delays, since a delay in any of those steps will impact the project as a whole.

Even though this tool is great for some types of projects (normally those with a low level of uncertainty), it’s not ideal for projects with low predictability due to the variance of the deliverables, as is the case in knowledge work (writing a book or developing software, for example).

Here at Plataformatec, we have always done our best to make our delivery predictions based on data and ensure they follow proven scientific methodology. The two methods we use the most are:

  • Linear Progression
  • Monte Carlo Simulation

LINEAR PROGRESSION

In linear progression, we analyze the work items delivered over some time period (in software development we normally work with weeks) and, by studying this information, we come up with the best values for the progression. For example, suppose we have a historical record of work items delivered per week like the one described below.

A value that could be used as a pessimistic projection would be a delivery of 1 work item per week, seeing as we’ve had 5 weeks where the throughput was 1 or 0. For the optimistic projection we could use either 3 or 4, or even 3.5, as we’ve had 4 weeks where the throughput was 3 or higher. Lastly, for the likely projection, a good value would be 2, as it is both the median and the mode of this dataset.
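
As a rough sketch in R, these values could be pulled from a throughput history like the hypothetical 14-week dataset below, built only to match the description above:

# Hypothetical weekly throughput history (work items delivered per week)
throughput <- c(0, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 4)

# Likely projection: the median (and mode) of the history
median(throughput)                               # 2
as.integer(names(which.max(table(throughput))))  # 2 (the mode)

# Pessimistic and optimistic projections taken from the tails of the history
quantile(throughput, probs = c(0.15, 0.85))      # 1 and 3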

The biggest problem with linear progression is that we choose the values for each scenario ourselves, which can be dangerous if the person operating it doesn’t have a good grasp of data analysis. Furthermore, linear progression ignores variance. Thus, in work systems with a high variance of deliveries, linear progression might not be the best approach.

MONTE CARLO

Another technique we can use is Monte Carlo simulation, which draws random samples from our historical throughput data to infer the likelihood of each delivery date.
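
As a minimal sketch of the idea (the backlog size, throughput history, and target percentile below are all hypothetical, used only for illustration):

# Hypothetical inputs: remaining backlog and weekly throughput history
throughput_history <- c(0, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 4)
backlog <- 30
iterations <- 10000

# Simulates one possible future: samples a weekly throughput from the history
# until the backlog is done and returns how many weeks it took
SimulateWeeks <- function(backlog, history) {
  done <- 0
  weeks <- 0
  while (done < backlog) {
    done <- done + sample(history, 1, replace = TRUE)
    weeks <- weeks + 1
  }
  return(weeks)
}

# Runs the simulation many times and summarizes the outcomes
weeks_needed <- replicate(iterations, SimulateWeeks(backlog, throughput_history))
mean(weeks_needed <= 18)      # probability of finishing within 18 weeks
quantile(weeks_needed, 0.85)  # an 85th percentile delivery forecast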

I’ve been having success with this method; however, a question kept coming to my mind: “How many iterations are necessary to make the method statistically valid and the results reliable?”. Looking for an answer to this question, I’ve researched some statistics books, and also the internet, but wasn’t able to find any information that would help.

So I decided to run a study on the subject. Monte Carlo’s objective is to infer the probability of an event by “forcing” it to happen many times, so that the law of large numbers works in our favor. For example, if you flip a coin a thousand times, count how many times it lands on “heads”, and divide by a thousand, you get an estimate of the probability of getting “heads”.

To check the quality of the method, I first ran tests on probabilities for which I already knew the expected results. After that, I started increasing the complexity of the problem, in order to see if there was any correlation between the difficulty of a problem and the number of iterations necessary for the value inferred by Monte Carlo to be as close as possible to the calculated one.

The objective of the tests is to see how many iterations are necessary for the inferred value to be close enough to the calculated value that the gain from running more iterations would be too small to justify the computing power needed to run them (from a project probability standpoint, an error in the first or second decimal place is acceptable). I’d like to highlight that this blog post is not a scientific study of the subject; the objective is to find, through inference, a good value for project predictions.

THE TESTS

The tests were performed using the R programming language, which can be downloaded from the R Project website. If you would rather use an IDE, I recommend RStudio.

In each test, I’ve run the algorithm 100 times for each of the following iteration values: 100 (one hundred), 1000 (one thousand), 10000 (ten thousand), 100000 (one hundred thousand), 1000000 (one million), 10000000 (ten million), 100000000 (one hundred million) and 200000000 (two hundred million). The results of these 100 “rounds” are then consolidated.
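
The consolidation of each batch of rounds can be done with a small helper like the sketch below, which assumes a vector of round results such as the result_vector produced by the code in the next sections:

# Consolidates a vector of round results into the statistics shown in the tables below
SummarizeRounds <- function(results, expected) {
  c(min_result  = min(results),
    max_result  = max(results),
    expected    = expected,
    average     = mean(results),
    average_err = mean(results) - expected,
    median      = median(results),
    median_err  = median(results) - expected,
    std_dev     = sd(results))
}

# Example usage, once result_vector has been filled by one of the tests below:
# SummarizeRounds(result_vector, expected = 50)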

THE COIN

The first test I ran was the flip of a single coin. In a coin flip there is a 50% chance of it landing on either side, and this is the value we want Monte Carlo to infer. For that, I’ve used the following R code:

FlipCoin <- function(iterations) {
  # Creates a vector with the possible values for a coin: 1 = Heads and 0 = Tails
  coin = c(0, 1)
  # Flips a number of coins equal to the iteration value and sums up all the times the coin landed on "Heads"
  result = sum(sample(coin, iterations, replace = T))
  # Turns the count into a percentage of the coin flips
  result = (result / iterations) * 100
  return(result)
}

# Initiates variables
result_vector = c()
time_vector = c()
control = 0

# Controls the iteration quantity
iterations = 100

# Allocates the percentage of "Heads" results in a 100-element vector and the execution times in another
while (control != 100) {
  start_time = Sys.time()
  result_vector = append(result_vector, FlipCoin(iterations))
  finish_time = Sys.time()
  time = finish_time - start_time
  time_vector = append(time_vector, time)
  control = control + 1
}

# Shows the percentage of "Heads" per round
result_vector

# Shows the execution times
time_vector

Changing the values of the iterations and compiling the results, I’ve got the following table:

Iterations | Min result | Max result | Expected result | Average result | Average result - Expected result | Result's median | Result's median - Expected result | Result's standard deviation
100 | 33.00000 | 67.00000 | 50.00000 | 50.02000 | 0.02000 | 50.00000 | 0.00000 | 5.09368
1000 | 43.60000 | 56.20000 | 50.00000 | 49.99440 | 0.00560 | 50.00000 | 0.00000 | 1.59105
10000 | 48.35000 | 51.78000 | 50.00000 | 50.01263 | 0.01263 | 50.02000 | 0.02000 | 0.51001
100000 | 49.58000 | 51.78000 | 50.00000 | 49.99110 | 0.00890 | 49.99350 | 0.00650 | 0.15805
1000000 | 49.85170 | 50.15090 | 50.00000 | 49.99883 | 0.00117 | 49.99785 | 0.00215 | 0.04811
10000000 | 49.95433 | 50.05807 | 50.00000 | 50.00013 | 0.00013 | 50.00010 | 0.00010 | 0.01564
100000000 | 49.98435 | 50.01637 | 50.00000 | 50.00004 | 0.00004 | 49.99989 | 0.00011 | 0.00516
200000000 | 49.98890 | 50.01195 | 50.00000 | 49.99981 | 0.00019 | 49.99987 | 0.00013 | 0.00345

In the instance of a simple problem, like the flipping of a single coin, we can see that a good iteration value could be:

  • between 10M and 100M if you are ok with an error in the third decimal place,
  • between 100k and 1M for an error in the second decimal place,
  • between 1k and 10k for an error in the first decimal place.

ONE DIE

With the coin test it was possible to infer the number of necessary iterations based on the desired degree of reliability. However, a question remains: does this behavior change for more complex problems? Because of that, I’ve run a similar test with a single six-sided die, which has a 16.67% chance of landing on any given side.

RollDie <- function(iterations) {
  # Creates a vector with the possible values of the die
  die = 1:6
  # Allocates the die roll results
  result = sample(die, iterations, replace = T)
  # Sums up every instance where the die landed on 6
  result = sum(result == 6)
  # Turns the count into a percentage of the die rolls
  result = (result / iterations) * 100
  return(result)
}

# Initiates variables
result_vector = c()
time_vector = c()
control = 0

# Controls the iteration quantity
iterations = 100

# Allocates the percentage of "6" results in a 100-element vector and the execution times in another
while (control != 100) {
  start_time = Sys.time()
  result_vector = append(result_vector, RollDie(iterations))
  finish_time = Sys.time()
  time = finish_time - start_time
  time_vector = append(time_vector, time)
  control = control + 1
}

# Shows the percentage of "6"
result_vector

# Shows the execution times
time_vector

Changing the values of the iterations and compiling the results, the error at each iteration count followed the same pattern as in the coin table above.

We can see that in this case the iteration values are the same as for the coin toss: a good value would be between 10M and 100M for an error in the third decimal place, between 100k and 1M for an error in the second decimal place, and between 1k and 10k for an error in the first decimal place.

This result shows that either the complexity of the problem has no influence on the number of iterations, or both problems (coin toss and single die roll) have similar complexity levels.

Analyzing the execution times of both problems, it’s possible to see that they took about the same time, which also corroborates the hypothesis that they have similar complexity levels.

TWO DICE

To test the hypothesis that the complexity levels of the above problems were too close, and that this was the reason the iteration values came out the same, I’ve decided to double the complexity of the die roll problem by rolling two dice instead of only one (it’s important to note that doubling the amount of work doesn’t always mean that the order of complexity will double as well, as that depends on the algorithm’s Big-O notation). The most likely result for two dice is 7, with a 16.67% chance of occurrence.
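
As a quick sanity check, that figure can be computed exactly by enumerating all 36 combinations of two dice:

# Exact distribution of the sum of two dice
sums  <- outer(1:6, 1:6, "+")       # all 36 possible combinations
probs <- table(sums) / length(sums)
probs["7"]                          # 0.1666667, i.e. 16.67%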

RollsTwoDice <- function(iterations) {
  # Creates a vector with the possible values of the dice
  die = 1:6
  # Allocates the dice roll results
  first_roll = sample(die, iterations, replace = T)
  second_roll = sample(die, iterations, replace = T)
  # Sums the vector values position-wise
  result = first_roll + second_roll
  # Sums up every instance where the dice landed on 7
  result = sum(result == 7)
  # Turns the count into a percentage of the dice rolls
  result = (result / iterations) * 100
  return(result)
}

# Initiates variables
result_vector = c()
time_vector = c()
control = 0

# Controls the iteration quantity
iterations = 100

# Allocates the percentage of "7" results in a 100-element vector and the execution times in another
while (control != 100) {
  start_time = Sys.time()
  result_vector = append(result_vector, RollsTwoDice(iterations))
  finish_time = Sys.time()
  time = finish_time - start_time
  time_vector = append(time_vector, time)
  control = control + 1
}

# Shows the percentage of "7"
result_vector

# Shows the execution times
time_vector

Changing the values of the iterations and compiling the results, I’ve got the following table:

Iterations | Min result | Max result | Expected result | Average result | Average result - Expected result | Result's median | Result's median - Expected result | Result's standard deviation
100 | 6.00000 | 28.00000 | 16.66667 | 16.58100 | -0.08567 | 16.00000 | -0.66667 | 3.64372
1000 | 13.20000 | 20.40000 | 16.66667 | 16.68500 | 0.01833 | 16.60000 | -0.06667 | 1.16949
10000 | 15.60000 | 17.94000 | 16.66667 | 16.65962 | -0.00705 | 16.67000 | 0.00333 | 0.38072
100000 | 16.22200 | 17.06200 | 16.66667 | 16.66109 | -0.00557 | 16.66500 | -0.00167 | 0.11356
1000000 | 16.54960 | 16.54960 | 16.66667 | 16.66616 | -0.00050 | 16.66790 | 0.00123 | 0.03650
10000000 | 16.63294 | 16.70266 | 16.66667 | 16.66607 | -0.00060 | 16.66577 | -0.00090 | 0.01220
100000000 | 16.65592 | 16.67948 | 16.66667 | 16.66670 | 0.00004 | 16.66679 | 0.00012 | 0.00372
200000000 | 16.65787 | 16.67476 | 16.66667 | 16.66671 | 0.00004 | 16.66671 | 0.00004 | 0.00267

Again we find the same iteration quantities for each error level. In this case it’s also possible to tell that the problem was more complex, since the execution for 200M iterations took on average 7.47 seconds longer than the single die roll.

FIVE DICE

The inference I’ve built up to this point is that the complexity of the problem has very little influence on the iteration quantities, but to be sure of that, I ran one last test with 5 dice. The most likely results when rolling five dice are the sums of 17 and 18, each with a 10.03% probability.
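
That figure can be checked exactly by enumerating all 7776 combinations of five dice:

# Exact distribution of the sum of five dice
rolls <- expand.grid(rep(list(1:6), 5))  # all 6^5 = 7776 possible combinations
sums  <- rowSums(rolls)
probs <- table(sums) / length(sums)
round(probs[c("17", "18")], 4)           # 0.1003 0.1003, i.e. 10.03% each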

Since this problem is a lot more complex than the others, I’ve decided to run each iteration value only 10 times instead of 100.

RollsFiveDice <- function(iterations) {
  # Creates a vector with the possible values of the dice
  die = 1:6
  # Allocates the dice roll results
  first_roll = sample(die, iterations, replace = T)
  second_roll = sample(die, iterations, replace = T)
  third_roll = sample(die, iterations, replace = T)
  fourth_roll = sample(die, iterations, replace = T)
  fifth_roll = sample(die, iterations, replace = T)
  # Sums the vector values position-wise
  result = first_roll + second_roll + third_roll + fourth_roll + fifth_roll
  # Sums up every instance where the dice landed on 18
  result = sum(result == 18)
  # Turns the count into a percentage of the dice rolls
  result = (result / iterations) * 100
  return(result)
}

# Initiates variables
result_vector = c()
time_vector = c()
control = 0

# Controls the iteration quantity
iterations = 100

# Allocates the percentage of "18" results in a 10-element vector and the execution times in another
while (control != 10) {
  start_time = Sys.time()
  result_vector = append(result_vector, RollsFiveDice(iterations))
  finish_time = Sys.time()
  time = finish_time - start_time
  time_vector = append(time_vector, time)
  control = control + 1
}

# Shows the percentage of "18"
result_vector

# Shows the execution times
time_vector

Changing the values of the iterations and compiling the results, I’ve got the following table:

Iterations | Min result | Max result | Expected result | Average result | Average result - Expected result | Result's median | Result's median - Expected result | Result's standard deviation
100 | 5.00000 | 31.00000 | 16.66667 | 16.37100 | -0.29567 | 16.00000 | -0.66667 | 3.77487
1000 | 13.40000 | 21.80000 | 16.66667 | 16.64710 | -0.01957 | 16.60000 | -0.06667 | 1.18550
10000 | 15.33000 | 17.77000 | 16.66667 | 16.68307 | 0.01640 | 16.68500 | 0.01833 | 0.38059
100000 | 16.23100 | 17.06600 | 16.66667 | 16.66546 | -0.00120 | 16.66800 | 0.00133 | 0.11597
1000000 | 16.54450 | 16.81170 | 16.66667 | 16.66657 | -0.00010 | 16.66860 | 0.00193 | 0.03798
10000000 | 16.62888 | 16.70282 | 16.66667 | 16.66683 | 0.00016 | 16.66705 | 0.00038 | 0.01155
100000000 | 16.65408 | 16.67899 | 16.66667 | 16.66669 | 0.00002 | 16.66673 | 0.00007 | 0.00382
200000000 | 16.65881 | 16.67455 | 16.66667 | 16.66675 | 0.00009 | 16.66682 | 0.00015 | 0.00258

Once again we can see that the iteration values are the same as above. It’s also possible to notice that this problem is clearly more complex, as it took 35 seconds longer than the roll of a single die.

CONCLUSION

After running these tests, it’s possible to conclude that for use in project management a good iteration value would be somewhere between 1k and 100M, depending on the acceptable error level, as shown below:

  • Error in the first decimal: 1k to 10k
  • Error in the second decimal: 100k to 1M
  • Error in the third decimal: 10M to 100M

For stakeholder reports, the value I’ve been using is 3M. However, when I’m running the method for my own analysis, I use a lower value, somewhere between 250k and 750k.

How about you? Have you tried to run Monte Carlo Simulations? How many iterations are you using? Tell us in the comment section below or on Twitter @plataformatec.
