Monte Carlo in Practice: Finding the ideal iteration value

One of the reasons to use any kind of project management methodology is to reduce costs.

A single week of delay in a project creates two different types of cost:

  • The first is the cost of the team, since they will need to work another week.
  • The second is the Cost of Delay, which is the income not collected because of the delay. For instance, suppose the new feature this team is working on would generate a daily income of $100; the Cost of Delay for that week would be $700 ($100 * 7 days), as sketched in the snippet below.
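
Putting hypothetical numbers on it in R (the weekly team cost below is an assumed figure; only the $100/day income comes from the example above):

weekly_team_cost <- 10000                    # assumed cost of keeping the team one more week
daily_income     <- 100                      # daily income the new feature would generate
cost_of_delay    <- daily_income * 7         # income not collected during one week of delay
total_cost_of_one_week_delay <- weekly_team_cost + cost_of_delay
total_cost_of_one_week_delay                 # 10700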

Since these costs can be high, using tools that provide more visibility on delivery dates is necessary. One tool that can help solve this problem is the Gantt Chart. With a Gantt Chart it’s possible to identify the weakest points of the project (the critical path), where more effort should be made to prevent delays, since a delay in any of those steps will impact the project as a whole.

Even though this tool is great for some types of projects (normally those with a low level of uncertainty), it’s not ideal for projects with low predictability due to the variance of the deliverables, as is the case in knowledge work (writing a book or developing software, for example).

Here at Plataformatec, we have always done our best to make our delivery predictions based on data and ensure they follow proven scientific methodology. The two methods we use the most are:

  • Linear Progression
  • Monte Carlo Simulation

LINEAR PROGRESSION

In linear progression, we analyze the work items delivered over some time period (in software development we normally work with weeks) and, by studying this information, we come up with the best values for the progression. For example, suppose we have a historical record of work items delivered per week like the one described below.

A value that could be used as a pessimistic projection would be a delivery of 1 work item per week, seeing as we’ve had 5 weeks where the throughput was 1 or 0. For the optimistic projection we could use either 3 or 4, or even 3.5, as we’ve had 4 weeks where the throughput was 3 or higher. Lastly, for the likely projection, a good value would be 2, as it is both the median and the mode of this dataset.
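
As a rough sketch in R, these values could be pulled from a throughput history like the hypothetical 14-week dataset below, built only to match the description above:

# Hypothetical weekly throughput history (work items delivered per week)
throughput <- c(0, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 4)

# Likely projection: the median (and mode) of the history
median(throughput)                               # 2
as.integer(names(which.max(table(throughput))))  # 2 (the mode)

# Pessimistic and optimistic projections taken from the tails of the history
quantile(throughput, probs = c(0.15, 0.85))      # 1 and 3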

The biggest problem with linear progression is that we choose the values for each scenario ourselves, which can be dangerous if the person operating it doesn’t have a good grasp of data analysis. Furthermore, linear progression ignores variance. Thus, in work systems with a high variance of deliveries, linear progression might not be the best approach.

MONTE CARLO

Another technique we can use is Monte Carlo simulation, which draws random samples from our historical throughput data to infer the likelihood of each delivery date.
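
As a minimal sketch of the idea (the backlog size, throughput history, and target percentile below are all hypothetical, used only for illustration):

# Hypothetical inputs: remaining backlog and weekly throughput history
throughput_history <- c(0, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 4)
backlog <- 30
iterations <- 10000

# Simulates one possible future: samples a weekly throughput from the history
# until the backlog is done and returns how many weeks it took
SimulateWeeks <- function(backlog, history) {
  done <- 0
  weeks <- 0
  while (done < backlog) {
    done <- done + sample(history, 1, replace = TRUE)
    weeks <- weeks + 1
  }
  return(weeks)
}

# Runs the simulation many times and summarizes the outcomes
weeks_needed <- replicate(iterations, SimulateWeeks(backlog, throughput_history))
mean(weeks_needed <= 18)      # probability of finishing within 18 weeks
quantile(weeks_needed, 0.85)  # an 85th percentile delivery forecast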

I’ve been having success with this method; however, a question kept coming to my mind: “How many iterations are necessary to make the method statistically valid and the results reliable?”. Looking for an answer to this question, I’ve researched some statistics books, and also the internet, but wasn’t able to find any information that would help.

So I decided to run a study on the subject. Monte Carlo’s objective is to infer the probability of an event by “forcing” it to happen many times, so that the law of large numbers works in our favor. For example, if you flip a coin a thousand times, count how many times it lands on “heads”, and divide by a thousand, you get an estimate of the probability of getting “heads”.

To check the quality of the method, I first ran tests on probabilities for which I already knew the expected results. After that, I started increasing the complexity of the problem, in order to see if there was any correlation between the difficulty of a problem and the number of iterations necessary for the value inferred by Monte Carlo to be as close as possible to the calculated one.

The objective of the tests is to see how many iterations are necessary for the inferred value to be close enough to the calculated value that the gain from running more iterations would be too small to justify the computing power needed to run them (from a project probability standpoint, an error in the first or second decimal place is acceptable). I’d like to highlight that this blog post is not a scientific study of the subject; the objective is to find, through inference, a good value for project predictions.

THE TESTS

The tests were performed using the R programming language, which can be downloaded from the R Project website. If you would rather use an IDE, I recommend RStudio.

In each test, I’ve run the algorithm 100 times for each of the following iteration values: 100 (one hundred), 1000 (one thousand), 10000 (ten thousand), 100000 (one hundred thousand), 1000000 (one million), 10000000 (ten million), 100000000 (one hundred million) and 200000000 (two hundred million). The results of these 100 “rounds” are then consolidated.
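
The consolidation of each batch of rounds can be done with a small helper like the sketch below, which assumes a vector of round results such as the result_vector produced by the code in the next sections:

# Consolidates a vector of round results into the statistics shown in the tables below
SummarizeRounds <- function(results, expected) {
  c(min_result  = min(results),
    max_result  = max(results),
    expected    = expected,
    average     = mean(results),
    average_err = mean(results) - expected,
    median      = median(results),
    median_err  = median(results) - expected,
    std_dev     = sd(results))
}

# Example usage, once result_vector has been filled by one of the tests below:
# SummarizeRounds(result_vector, expected = 50)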

THE COIN

The first test I ran was the flip of a single coin. In a coin flip there is a 50% chance of it landing on either side, and this is the value we want Monte Carlo to infer. For that, I’ve used the following R code:

FlipCoin <- function(iterations) {
  # Creates a vector with the possible values for a coin: 1 = Heads and 0 = Tails
  coin = c(0, 1)
  # Flips a number of coins equal to the iteration value and sums up all the times the coin landed on "Heads"
  result = sum(sample(coin, iterations, replace = T))
  # Turns the count into a percentage of the coin flips
  result = (result / iterations) * 100
  return(result)
}

# Initiates variables
result_vector = c()
time_vector = c()
control = 0

# Controls the iteration quantity
iterations = 100

# Allocates the percentage of "Heads" results in a 100-element vector and the execution times in another
while (control != 100) {
  start_time = Sys.time()
  result_vector = append(result_vector, FlipCoin(iterations))
  finish_time = Sys.time()
  time = finish_time - start_time
  time_vector = append(time_vector, time)
  control = control + 1
}

# Shows the percentage of "Heads" per round
result_vector

# Shows the execution times
time_vector

Changing the values of the iterations and compiling the results, I’ve got the following table:

Iterations | Min result | Max result | Expected result | Average result | Average result - Expected result | Result's median | Result's median - Expected result | Result's standard deviation
100 | 33.00000 | 67.00000 | 50.00000 | 50.02000 | 0.02000 | 50.00000 | 0.00000 | 5.09368
1000 | 43.60000 | 56.20000 | 50.00000 | 49.99440 | 0.00560 | 50.00000 | 0.00000 | 1.59105
10000 | 48.35000 | 51.78000 | 50.00000 | 50.01263 | 0.01263 | 50.02000 | 0.02000 | 0.51001
100000 | 49.58000 | 51.78000 | 50.00000 | 49.99110 | 0.00890 | 49.99350 | 0.00650 | 0.15805
1000000 | 49.85170 | 50.15090 | 50.00000 | 49.99883 | 0.00117 | 49.99785 | 0.00215 | 0.04811
10000000 | 49.95433 | 50.05807 | 50.00000 | 50.00013 | 0.00013 | 50.00010 | 0.00010 | 0.01564
100000000 | 49.98435 | 50.01637 | 50.00000 | 50.00004 | 0.00004 | 49.99989 | 0.00011 | 0.00516
200000000 | 49.98890 | 50.01195 | 50.00000 | 49.99981 | 0.00019 | 49.99987 | 0.00013 | 0.00345

In the instance of a simple problem, like the flipping of a single coin, we can see that a good iteration value could be:

  • between 10M and 100M if you are ok with an error in the third decimal place,
  • between 100k and 1M for an error in the second decimal place,
  • between 1k and 10k for an error in the first decimal place.

ONE DIE

With the coin test it was possible to infer the number of necessary iterations based on the desired degree of reliability. However, a question remains: does this behavior change for more complex problems? Because of that, I’ve run a similar test with a single six-sided die, which has a 16.67% chance of landing on any given side.

RollDie <- function(iterations) {
  # Creates a vector with the possible values of the die
  die = 1:6
  # Allocates the die roll results
  result = sample(die, iterations, replace = T)
  # Sums up every instance where the die landed on 6
  result = sum(result == 6)
  # Turns the count into a percentage of the die rolls
  result = (result / iterations) * 100
  return(result)
}

# Initiates variables
result_vector = c()
time_vector = c()
control = 0

# Controls the iteration quantity
iterations = 100

# Allocates the percentage of "6" results in a 100-element vector and the execution times in another
while (control != 100) {
  start_time = Sys.time()
  result_vector = append(result_vector, RollDie(iterations))
  finish_time = Sys.time()
  time = finish_time - start_time
  time_vector = append(time_vector, time)
  control = control + 1
}

# Shows the percentage of "6"
result_vector

# Shows the execution times
time_vector

Changing the values of the iterations and compiling the results, the error at each iteration count followed the same pattern as in the coin table above.

We can see that in this case the iteration values are the same as for the coin toss: a good value would be between 10M and 100M for an error in the third decimal place, between 100k and 1M for an error in the second decimal place, and between 1k and 10k for an error in the first decimal place.

This result shows that either the complexity of the problem has no influence on the number of iterations, or both problems (coin toss and single die roll) have similar complexity levels.

Analyzing the execution times of both problems, it’s possible to see that they took about the same time, which also corroborates the hypothesis that they have similar complexity levels.

TWO DICE

To test the hypothesis that the complexity levels of the above problems were too close, and that this was the reason the iteration values came out the same, I’ve decided to double the complexity of the die roll problem by rolling two dice instead of only one (it’s important to note that doubling the amount of work doesn’t always mean that the order of complexity will double as well, as that depends on the algorithm’s Big-O notation). The most likely result for two dice is 7, with a 16.67% chance of occurrence.
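
As a quick sanity check, that figure can be computed exactly by enumerating all 36 combinations of two dice:

# Exact distribution of the sum of two dice
sums  <- outer(1:6, 1:6, "+")       # all 36 possible combinations
probs <- table(sums) / length(sums)
probs["7"]                          # 0.1666667, i.e. 16.67%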

RollsTwoDice <- function(iterations) {
  # Creates a vector with the possible values of the dice
  die = 1:6
  # Allocates the dice roll results
  first_roll = sample(die, iterations, replace = T)
  second_roll = sample(die, iterations, replace = T)
  # Sums the vector values position-wise
  result = first_roll + second_roll
  # Sums up every instance where the dice landed on 7
  result = sum(result == 7)
  # Turns the count into a percentage of the dice rolls
  result = (result / iterations) * 100
  return(result)
}

# Initiates variables
result_vector = c()
time_vector = c()
control = 0

# Controls the iteration quantity
iterations = 100

# Allocates the percentage of "7" results in a 100-element vector and the execution times in another
while (control != 100) {
  start_time = Sys.time()
  result_vector = append(result_vector, RollsTwoDice(iterations))
  finish_time = Sys.time()
  time = finish_time - start_time
  time_vector = append(time_vector, time)
  control = control + 1
}

# Shows the percentage of "7"
result_vector

# Shows the execution times
time_vector

Changing the values of the iterations and compiling the results, I’ve got the following table:

Iterations | Min result | Max result | Expected result | Average result | Average result - Expected result | Result's median | Result's median - Expected result | Result's standard deviation
100 | 6.00000 | 28.00000 | 16.66667 | 16.58100 | -0.08567 | 16.00000 | -0.66667 | 3.64372
1000 | 13.20000 | 20.40000 | 16.66667 | 16.68500 | 0.01833 | 16.60000 | -0.06667 | 1.16949
10000 | 15.60000 | 17.94000 | 16.66667 | 16.65962 | -0.00705 | 16.67000 | 0.00333 | 0.38072
100000 | 16.22200 | 17.06200 | 16.66667 | 16.66109 | -0.00557 | 16.66500 | -0.00167 | 0.11356
1000000 | 16.54960 | 16.54960 | 16.66667 | 16.66616 | -0.00050 | 16.66790 | 0.00123 | 0.03650
10000000 | 16.63294 | 16.70266 | 16.66667 | 16.66607 | -0.00060 | 16.66577 | -0.00090 | 0.01220
100000000 | 16.65592 | 16.67948 | 16.66667 | 16.66670 | 0.00004 | 16.66679 | 0.00012 | 0.00372
200000000 | 16.65787 | 16.67476 | 16.66667 | 16.66671 | 0.00004 | 16.66671 | 0.00004 | 0.00267

Again we find the same iteration quantities for each error level. In this case it’s also possible to tell that the problem was more complex, since the execution for 200M iterations took on average 7.47 seconds longer than the single die roll.

FIVE DICE

The inference I’ve built up to this point is that the complexity of the problem has very little influence on the iteration quantities, but to be sure of that, I ran one last test with 5 dice. The most likely results when rolling five dice are the sums of 17 and 18, each with a 10.03% probability.
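
That figure can be checked exactly by enumerating all 7776 combinations of five dice:

# Exact distribution of the sum of five dice
rolls <- expand.grid(rep(list(1:6), 5))  # all 6^5 = 7776 possible combinations
sums  <- rowSums(rolls)
probs <- table(sums) / length(sums)
round(probs[c("17", "18")], 4)           # 0.1003 0.1003, i.e. 10.03% each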

Since this problem is a lot more complex than the others, I’ve decided to run each iteration value only 10 times instead of 100.

RollsFiveDice <- function(iterations) {
  # Creates a vector with the possible values of the dice
  die = 1:6
  # Allocates the dice roll results
  first_roll = sample(die, iterations, replace = T)
  second_roll = sample(die, iterations, replace = T)
  third_roll = sample(die, iterations, replace = T)
  fourth_roll = sample(die, iterations, replace = T)
  fifth_roll = sample(die, iterations, replace = T)
  # Sums the vector values position-wise
  result = first_roll + second_roll + third_roll + fourth_roll + fifth_roll
  # Sums up every instance where the dice landed on 18
  result = sum(result == 18)
  # Turns the count into a percentage of the dice rolls
  result = (result / iterations) * 100
  return(result)
}

# Initiates variables
result_vector = c()
time_vector = c()
control = 0

# Controls the iteration quantity
iterations = 100

# Allocates the percentage of "18" results in a 10-element vector and the execution times in another
while (control != 10) {
  start_time = Sys.time()
  result_vector = append(result_vector, RollsFiveDice(iterations))
  finish_time = Sys.time()
  time = finish_time - start_time
  time_vector = append(time_vector, time)
  control = control + 1
}

# Shows the percentage of "18"
result_vector

# Shows the execution times
time_vector

Changing the values of the iterations and compiling the results, I’ve got the following table:

Iterations | Min result | Max result | Expected result | Average result | Average result - Expected result | Result's median | Result's median - Expected result | Result's standard deviation
100 | 5.00000 | 31.00000 | 16.66667 | 16.37100 | -0.29567 | 16.00000 | -0.66667 | 3.77487
1000 | 13.40000 | 21.80000 | 16.66667 | 16.64710 | -0.01957 | 16.60000 | -0.06667 | 1.18550
10000 | 15.33000 | 17.77000 | 16.66667 | 16.68307 | 0.01640 | 16.68500 | 0.01833 | 0.38059
100000 | 16.23100 | 17.06600 | 16.66667 | 16.66546 | -0.00120 | 16.66800 | 0.00133 | 0.11597
1000000 | 16.54450 | 16.81170 | 16.66667 | 16.66657 | -0.00010 | 16.66860 | 0.00193 | 0.03798
10000000 | 16.62888 | 16.70282 | 16.66667 | 16.66683 | 0.00016 | 16.66705 | 0.00038 | 0.01155
100000000 | 16.65408 | 16.67899 | 16.66667 | 16.66669 | 0.00002 | 16.66673 | 0.00007 | 0.00382
200000000 | 16.65881 | 16.67455 | 16.66667 | 16.66675 | 0.00009 | 16.66682 | 0.00015 | 0.00258

Once again we can see that the iteration values are the same as above. It’s also possible to notice that this problem is clearly more complex, as it took 35 seconds longer than the roll of a single die.

CONCLUSION

After running these tests, it’s possible to conclude that for use in project management a good iteration value would be somewhere between 1k and 100M, depending on the acceptable error level, as shown below:

  • Error in the first decimal: 1k to 10k
  • Error in the second decimal: 100k to 1M
  • Error in the third decimal: 10M to 100M

For stakeholder reports, the value I’ve been using is 3M. However, when I’m running the method for my own analysis, I use a lower value, somewhere between 250k and 750k.

How about you? Have you tried to run Monte Carlo Simulations? How many iterations are you using? Tell us in the comment section below or on Twitter @plataformatec.
