Wednesday, December 17, 2008

Simulating Mamma Mia under the xmas tree (with Python)

Norway has a population of ~4.7 million, and "Mamma Mia!" has during a few weeks been sold in ~600 thousand DVD/Blueray copies (> 12% of the population, i.e. breaking every sales record, perhaps with the exception of Sissel Kyrkjebø's Christmas Carol album which has sold a total of ~900 thousand copies in the same population, but that was over a period of 21 years).  In the UK it has sold more than 5 million (in a population of 59 million, i.e. >8% of the population).

Well, to the point. Guesstimating that there will be ~2 million xmas trees in Norway, one can assume that many of the trees will have (much) more than one "Mamma Mia!" dvd/blueray underneath it*,  the question is how many?

Simulating Mamma Mia under the xmas tree
.. Before you start recapping probability theory, combinatorics, birthday paradox, multiplication of improbabilities and whatnot, how about finding an alternative solution using simulation

A simple simulation model could be to assume that for every Mamma Mia dvd/blueray copy there is a lottery where all trees participate,  and then finally (or incrementally) count how many copies each tree won.

A friend wrote a nice Python script to simulate this:

import random

NUM_XMAS_TREES = 2000000
NUM_MAMMA_MIAS = 600000

tree_supplies = {}
for mamma_mia in range(NUM_MAMMA_MIAS):
    winner_tree = random.randint(0, NUM_XMAS_TREES)
    tree_supplies[winner_tree] = tree_supplies.get(winner_tree,0) + 1

tree_stats = {}
for tree in tree_supplies:
    tree_stats[tree_supplies[tree]] = tree_stats.get(tree_supplies[tree], 0) + 1

print tree_stats

Results:

$ for k in `seq 1 10`; do echo -n "$k " ; python mammamia.py; done
1 {1: 443564, 2: 67181, 3: 6618, 4: 510, 5: 36}
2 {1: 444497, 2: 66811, 3: 6543, 4: 520, 5: 32, 6: 2}
3 {1: 444796, 2: 66376, 3: 6738, 4: 510, 5: 36, 6: 3}
4 {1: 444499, 2: 66750, 3: 6652, 4: 469, 5: 30, 6: 2, 7: 1}
5 {1: 444347, 2: 66717, 3: 6697, 4: 494, 5: 28, 6: 2}
6 {1: 444511, 2: 66389, 3: 6763, 4: 551, 5: 40, 6: 3}
7 {1: 443914, 2: 66755, 3: 6785, 4: 511, 5: 33, 6: 2}
8 {1: 444747, 2: 66558, 3: 6667, 4: 484, 5: 40}
9 {1: 444553, 2: 66703, 3: 6631, 4: 497, 5: 32}
10 {1: 443903, 2: 66853, 3: 6774, 4: 487, 5: 23, 6: 1}

Conclusion

So we see that in run 4 there was one xmas tree that according to the simulation model got 7(!) Mamma Mia DVD/Bluerays underneath it, but the overall simulation shows that 5 or 6 (at most) is probably more likely (assuming the model is right).

Regarding the simulation model, it is probably way too simplistic, i.e. not taking into account people buying mamma mia for themselves (or not as xmas gifts), a likely skewness in terms of number of gifts per christmas tree, interaction between buyers, etc. But it can with relatively simple manners be extended with code to make it more realistic. Check out Markov Chain Monte Carlo simulation for more info on how to create more realistic simulation models.