Monte Carlo Simulation and Variants with Python
November 18, 2024

Your Guide to Monte Carlo Simulation and Must-Know Statistical Sampling Techniques With Python Implementation


Monte Carlo Simulation is based on repeated random sampling. The underlying concept is to use randomness to solve problems that might be deterministic in principle. It is one of the most popular techniques for drawing inferences about a population without knowing the true underlying population distribution, and it comes in handy especially when one does not have the luxury of repeatedly sampling from the original population. Applications of Monte Carlo Simulation range from solving problems in theoretical physics to predicting trends in financial investments.

Monte Carlo has three main uses: estimating parameters or statistical measures, examining the properties of estimates, and approximating integrals.

This article covers these three uses of Monte Carlo procedures as well as three Monte Carlo variants: statistical sampling techniques that can be used to generate independent random samples. The article will cover the following topics:

- Introduction to Monte Carlo Simulation
- MC Parameter Estimation
- MC Examining the Estimate Properties
- MC Integrals Approximation
- Importance Sampling
- Rejection Sampling
- Inverse Transform Sampling

This article is suited for readers who have prior statistical knowledge, since it covers medium-level statistical concepts and examples. If you want to learn essential statistical concepts from scratch, you can check my previous article about Fundamentals Of Statistics here.

Monte Carlo Simulation Introduction

One of the earliest examples of what we now call Monte Carlo simulation is [Buffon's needle problem](en.wikipedia.org/wiki/Buffon%27s_needle_pro.. "Buffon's needle problem"), in which π (pi) can be estimated by dropping needles on a floor made of parallel, equidistant strips. The modern version of Monte Carlo Simulation was invented by Stanislaw Ulam, inventor of the modern version of the Markov Chain Monte Carlo technique, during his work on nuclear weapons projects, together with John von Neumann, who programmed a special computer to perform Monte Carlo calculations. Von Neumann is also known for his famous trick for simulating a fair coin with an unfair one: toss the unfair coin twice, keep the differing {Head, Tail} and {Tail, Head} outcomes, and discard the matching {Head, Head} and {Tail, Tail} ones.
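The trick is easy to check numerically. A minimal sketch, where the bias p = 0.3 is an arbitrary value chosen for illustration:

```python
import random

def fair_flip(p=0.3):
    """Simulate a fair coin using a biased coin with P(heads) = p."""
    while True:
        a = random.random() < p  # first biased toss
        b = random.random() < p  # second biased toss
        if a != b:               # keep only {H,T} and {T,H} pairs
            return a             # P(True) = P(False) = 1/2

# Roughly half the outputs should be True, regardless of p
print(sum(fair_flip() for _ in range(10000)) / 10000)
```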

Ulam’s proposed approach, which was based on a list of truly random numbers, was extremely slow, so Von Neumann developed a method to calculate pseudorandom numbers, which was much faster. The name of the approach comes from the [Monte Carlo Casino](en.wikipedia.org/wiki/Monte_Carlo_Casino "Monte Carlo Casino") in Monaco, where Ulam’s uncle would borrow money from relatives to gamble.


The idea behind Monte Carlo is that, as we use more samples, our answers should get more and more accurate.

Monte Carlo Estimation

Monte Carlo uses the Law of Large Numbers (LLN) to come up with an estimate for a population parameter using simulated values. The LLN states: suppose X1, X2, . . . , Xn are independent random variables with the same underlying distribution, also called independent identically-distributed or i.i.d., where all X's have the same mean μ and standard deviation σ. Then, as the sample size n grows, the sample average converges to the mean μ with probability 1. The LLN can be summarized as follows:

(X1 + X2 + ... + Xn) / n → μ as n → ∞, with probability 1

Consequently, the idea behind Monte Carlo estimation is that if we obtain an estimate for a parameter a large number of times, say M = 10000 times, then the mean of these M estimates forms an unbiased Monte Carlo estimate for that parameter.

Monte Carlo Estimation: Python Implementation

Let's say we want to estimate a causal effect between two variables, a pair of an independent and a dependent variable, and we have some idea about the possible values of the intercept parameter alpha and the slope parameter beta.

What we can do is randomly sample from a normal distribution to generate the error terms and the independent variable, construct the dependent variable from them, and then estimate the slope coefficient, beta_hat. We repeat this process M = 10000 times. Then, by the LLN, the sample mean of these 10000 beta_hats is an unbiased estimate for the true beta. That is:

beta_hat_MC = (1/M) · Σ beta_hat_m ≈ beta
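Here is a minimal sketch of this procedure. The true values alpha = 2 and beta = 0.5, the per-simulation sample size n = 100, and the standard-normal errors are illustrative assumptions, not values from a real dataset:

```python
import numpy as np

rng = np.random.default_rng(42)

alpha, beta = 2.0, 0.5   # assumed true intercept and slope
n, M = 100, 10000        # sample size per simulation, number of MC iterations

beta_hats = np.empty(M)
for m in range(M):
    X = rng.normal(0, 1, n)       # independent variable
    eps = rng.normal(0, 1, n)     # error terms
    Y = alpha + beta * X + eps    # dependent variable
    # OLS slope estimate: sample cov(X, Y) / sample var(X)
    beta_hats[m] = np.cov(X, Y, ddof=1)[0, 1] / np.var(X, ddof=1)

print(beta_hats.mean())  # should be close to the true beta of 0.5
```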

Examining Estimate Properties with Monte Carlo

Monte Carlo Simulation is a powerful tool for examining the properties of estimators. This might not be very useful for Linear Regression estimates, since most statistical software packages already provide measures for examining them. For other estimators, however, Monte Carlo estimation might be the only way to find out whether an estimator is unbiased, efficient, and consistent.

Is the Estimate Unbiased?

The bias of an estimator is the difference between its expected value and the true value of the parameter being estimated, and can be expressed as follows:

Bias(beta_hat) = E[beta_hat] − beta

When we state that an estimator is unbiased, we mean that the bias is equal to zero, which implies that the expected value of the estimator is equal to the true parameter value, that is:

E[beta_hat] = beta

To check whether the estimator is unbiased, we can take the Monte Carlo sample of beta_hats obtained in the previous step and plot its sampling distribution. If this sampling distribution is centered around the true parameter value, the estimator is unbiased.
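A minimal sketch of this check, reusing the `beta_hats` array and the assumed true `beta` from the previous snippet:

```python
import matplotlib.pyplot as plt

# Histogram of the M simulated estimates, with the true beta marked
plt.hist(beta_hats, bins=50, edgecolor="white")
plt.axvline(beta, color="red", linestyle="--", label="true beta")
plt.xlabel("beta_hat")
plt.ylabel("frequency")
plt.title("Sampling distribution of beta_hat (M = 10000)")
plt.legend()
plt.show()
```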

From the figure, we can see that the sampling distribution of the beta estimates, obtained using Monte Carlo simulation with 10000 iterations, is centered around the true parameter beta. So, the Monte Carlo estimate beta_hat is unbiased.


Is the Estimate Consistent and Efficient?

If the estimator converges to the true parameter as the sample size becomes very large, then the estimator is said to be consistent, that is:

beta_hat → beta (in probability) as n → ∞

To check whether the estimator is consistent, we can take the Monte Carlo samples of beta_hats obtained in the previous step and draw the sampling distribution for a small and a large number M of simulations. If the sampling distribution gets narrower and more centered around the true parameter value as the number of Monte Carlo iterations M increases, then the estimate is likely to be consistent.

A parameter can have multiple estimators, but the one with the lowest variance is called efficient. To check whether the estimator is efficient, we can use the same Monte Carlo samples of beta_hats and compare the sampling distributions for a small and a large number M of simulations. If the width of the sampling distribution gets smaller as M increases, the estimate is likely to be efficient.

Let's say we run a Monte Carlo simulation with M = 1000 and obtain a Monte Carlo estimate for beta (left histogram), and then rerun the simulation with M = 10000 (right histogram). As M increases from 1000 to 10000, the sampling distribution of the beta_hats gets more centered around the true parameter value, so the Monte Carlo estimate beta_hat is consistent. The width of the sampling distribution also decreases as M grows, so the Monte Carlo estimate beta_hat is efficient.
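A sketch of this comparison, wrapping the earlier simulation (same illustrative alpha, beta, and n) in a helper so we can vary M:

```python
import numpy as np
import matplotlib.pyplot as plt

def simulate_beta_hats(M, n=100, alpha=2.0, beta=0.5, seed=0):
    """Run M Monte Carlo iterations and return the M slope estimates."""
    rng = np.random.default_rng(seed)
    beta_hats = np.empty(M)
    for m in range(M):
        X = rng.normal(0, 1, n)
        Y = alpha + beta * X + rng.normal(0, 1, n)
        beta_hats[m] = np.cov(X, Y, ddof=1)[0, 1] / np.var(X, ddof=1)
    return beta_hats

# Compare the sampling distributions for M = 1000 vs. M = 10000
fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharex=True)
for ax, M in zip(axes, (1000, 10000)):
    ax.hist(simulate_beta_hats(M), bins=50)
    ax.axvline(0.5, color="red", linestyle="--")  # true beta
    ax.set_title(f"M = {M}")
plt.show()
```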


Approximating Integrals with Monte Carlo

For known functions, such as the Normal distribution function, calculating an integral might be simple and would not require MC. For more complicated functions, however, computing the integral might be very hard, and in those situations MC may be the only way to calculate it.

MC integral approximation is based on the LLN, and the idea behind it is that if we can generate random samples x_i from a given distribution P(x), then we can estimate the expected value of a function under this distribution by summation rather than integration. Stated differently, we can find the value of an integral by determining the average value of its integrand, h(x):

E[h(X)] = ∫ h(x) P(x) dx ≈ (1/n) · Σ h(x_i)

The MC estimation we saw earlier is based on this same principle.

MC Integral Approximation: Python Implementation

Let's say we want to obtain the probability Pr[X ≥ 3], where X ~ Norm(10, 2). This can be expressed by the following integral, where f(x) is the pdf of the Normal distribution:

Pr[X ≥ 3] = ∫ from 3 to ∞ of f(x) dx

This integral can be approximated with Monte Carlo simulation by drawing 10000 samples from the distribution, computing the indicator 1{x ≥ 3} for each, and taking the mean of these values.
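A minimal sketch, assuming the 2 in Norm(10, 2) denotes the standard deviation:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)
M = 10000

# Draw M samples from N(10, 2) and average the indicator 1{x >= 3}
samples = rng.normal(loc=10, scale=2, size=M)
mc_estimate = np.mean(samples >= 3)

print(mc_estimate)                       # Monte Carlo approximation
print(1 - norm.cdf(3, loc=10, scale=2))  # closed-form value for comparison
```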

Importance Sampling

Importance sampling is one way to make Monte Carlo simulations converge much faster, and it can also result in lower variance than the naive Monte Carlo approach. It is used for estimating the expected value of a function h(x) under a target distribution g(x) that we cannot sample from directly. The idea is to sample from a proposal distribution f(x) instead, and to use importance weights w(x) = g(x)/f(x), which correct for the fact that the proposal over- or under-represents regions of the target distribution g(x).

Importance Sampling: Python Implementation

Let's say we want to estimate the expected value of the function h(x) = exp(−x + cos(x)), where the random variable X follows the Exponential distribution; so, our target distribution is g(x) = Exp(1).

For importance sampling we need to choose a proposal distribution that is as close as possible to the target distribution, hence we choose f(x) = Exp(2). Given that the pdf of the Exponential distribution is lambda * exp(−lambda * x), we have:

g(x) = exp(−x),   f(x) = 2 * exp(−2x),   w(x) = g(x)/f(x) = exp(x)/2

where f(x) is the proposal distribution we can sample from, h(x) is the function whose expected value needs to be estimated, g(x) is the target distribution we can't sample from, and w(x) are the importance weights. The expectation of h(x) can then be expressed as follows:

E_g[h(X)] = ∫ h(x) g(x) dx = ∫ h(x) (g(x)/f(x)) f(x) dx = E_f[h(X) w(X)]

Then, using the LLN, we can approximate this expectation as follows:

E_g[h(X)] ≈ (1/n) · Σ h(x_i) w(x_i),   where the x_i are sampled from f(x)

The following graph visualizes h(x), the target distribution g(x), and the proposal distribution f(x). As we can see, the proposal distribution is very close to the target distribution. We use f(x) to randomly sample 1000 observations and each time calculate g(x)/f(x) * h(x), that is, the importance weight multiplied by h(x). Taking the mean of these values gives us 1.52, which is the expected value we were looking for.
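A minimal sketch of this estimator; note that NumPy parameterizes the Exponential distribution by its scale (1/lambda), so the Exp(2) proposal corresponds to scale = 0.5:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000

def h(x):
    return np.exp(-x + np.cos(x))

def g(x):   # target pdf: Exp(1)
    return np.exp(-x)

def f(x):   # proposal pdf: Exp(2)
    return 2 * np.exp(-2 * x)

# Sample from the proposal and apply the importance weights w = g/f
x = rng.exponential(scale=0.5, size=n)
estimate = np.mean(h(x) * g(x) / f(x))
print(estimate)  # Monte Carlo estimate of E_g[h(X)]
```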

Image Source: The Author

Rejection Sampling

Rejection Sampling is usually used to generate independent samples from an unnormalized target distribution. The idea behind this Monte Carlo sampling variant is that if we want to generate random samples from a target distribution P(x) that we only know through an unnormalized auxiliary distribution P*(x), where P(x) = P*(x)/c, then we can use a proposal distribution Q(x) and a constant c chosen such that cQ(x) is an upper bound for P*(x). The accepted values from the following procedure form independent samples from the target distribution P(x).

1: Choose a proposal function Q(x) that is close to the target distribution P(x)

2: Choose a normalization constant c such that P*(x) ≤ cQ(x)

3: Choose auxiliary distribution P*(X) s.t. P(x) = P*(x)/c

4: Generate random samples, x, from Q(X)

5: Generate random samples, u, from Unif(0 , cQ(x))

6: Repeat steps 4 and 5 M times (e.g. 10000 times)

7: Accept x if u ≤ P*(x) and reject otherwise


Important Requirements for Rejection Sampling:

  • Proposal distribution Q(x) to sample from (Uniform or Gaussian)
  • Normalization constant c such that c*Q(x) ≥ P*(x) for all x
  • Auxiliary distribution P*(x)
  • Access to sampling from the proposal distribution

Rejection Sampling: Python Implementation

Let's say we want to generate independent samples from a mixture of Normal distributions, with unnormalized target p*(x) = N(30,10) + N(80,20) (the sum of the two Normal pdfs). We have access to Normal and Uniform random generators to produce these target samples. We can use a proposal function Q(x) = N(50,30) with the normalization constant c = max(P*(x)/Q(x)).

We then follow the steps described earlier to generate samples, of which a very large share gets rejected and some get accepted. The histogram below visualizes the set of accepted samples, which are independent samples from the mixture of Normal distributions, obtained while having access only to Normal and Uniform random generators.
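A minimal sketch of this procedure; maximizing P*(x)/Q(x) over a fixed grid to find c is an implementation choice made for this example:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

rng = np.random.default_rng(42)
M = 10000

def p_star(x):   # unnormalized target: sum of two Normal pdfs
    return norm.pdf(x, 30, 10) + norm.pdf(x, 80, 20)

def q(x):        # proposal pdf: N(50, 30)
    return norm.pdf(x, 50, 30)

# Constant c chosen so that c * Q(x) >= P*(x) everywhere (via a grid search)
grid = np.linspace(-50, 200, 10000)
c = np.max(p_star(grid) / q(grid))

x = rng.normal(50, 30, size=M)     # step 4: sample x from Q(x)
u = rng.uniform(0, c * q(x))       # step 5: sample u from Unif(0, cQ(x))
accepted = x[u <= p_star(x)]       # step 7: accept x if u <= P*(x)

plt.hist(accepted, bins=50)
plt.title(f"Accepted samples: {len(accepted)} of {M}")
plt.show()
```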


Rejection Sampling is highly inefficient, given that it rejects a very large number of sample points, which results in long computation times.

Inverse Transform Sampling

Like Rejection Sampling, Inverse Transform Sampling is a way to generate independent samples, but unlike Rejection Sampling it is 100% efficient. The idea behind Inverse Transform Sampling is to use the inverse Cumulative Distribution Function of a target distribution that we cannot sample from directly, together with a random generator we can easily sample from, to generate independent random samples from the target population.

1: Sample value u from Unif(0,1)

2: Use the inverse CDF of the target distribution to get the x value corresponding to u, that is, x = F⁻¹(u)

3: Repeat steps 1 and 2 M times (e.g. 10000 times)

4: Collect these x values which follow the desired distribution

Important Requirements for ITS:

  • Access to sampling from Unif(0,1)
  • Know the target distribution PDF/CDF
  • Able to determine the inverse CDF of the target distribution

Inverse Transform Sampling: Python Implementation

Let's say we want to generate independent samples from the Exponential distribution with lambda equal to 1, while we can only sample from the Uniform distribution. The CDF of Exp(1) is F(x) = 1 − exp(−x), hence we can determine the inverse CDF of the Exponential distribution as follows:

F⁻¹(u) = −ln(1 − u)

Then, we randomly sample u from Unif(0,1) and use the inverse CDF of the target distribution to determine x = −log(1 − u). Once this process is repeated M = 10000 times, the stored samples are independent samples from the target distribution. The following histogram visualizes these samples.
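A minimal sketch of the full loop, vectorized over the M draws:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
M = 10000

u = rng.uniform(0, 1, size=M)   # step 1: u ~ Unif(0, 1)
x = -np.log(1 - u)              # step 2: inverse CDF of Exp(1)

# Every draw is kept: 100% of the samples are used
plt.hist(x, bins=50, density=True, label="ITS samples")
xs = np.linspace(0, 8, 200)
plt.plot(xs, np.exp(-xs), "r--", label="Exp(1) pdf")
plt.legend()
plt.show()
```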


Unlike Rejection Sampling, Inverse Transform Sampling is 100% efficient, since every generated sample is kept.

Notes on the Choice of Proposal Distribution

  • The computational efficiency of these Monte Carlo variants is best when the proposal distribution looks a lot like the target distribution.
  • Importance Sampling and Rejection Sampling can fail badly when the proposal distribution has zero density in a region where the target distribution has non-negligible density. For this reason, the proposal distribution should have heavy tails.

Additional Resources

[Fundamentals Of Statistics For Data Scientists and Data Analysts: Key statistical concepts for your data science or data analytics journey](https://towardsdatascience.com/fundamentals-of-statistics-for-data-scientists-and-data-analysts-69d93a05aae7)

https://www.youtube.com/watch?v=kYWHfgkRc9s&ab_channel=BenLambert

https://www.youtube.com/watch?v=V8f8ueBc9sY&t=1s&ab_channel=BenLambert

https://www.youtube.com/watch?v=rnBbYsysPaU&t=566s&ab_channel=BenLambert


[TatevKaren — Overview: I'm Tatev Karen Aslanyan, Data Scientist and Quantitative Analyst with strong background in Mathematics, Statistics…](https://github.com/TatevKaren)

If you liked this article, here are some other articles you may enjoy:

[How To Crack Spotify Data Science Technical Screen Interview: List of exact Python/SQL commands and experimentation topics you should know to nail Spotify Tech Screen](https://towardsdatascience.com/how-to-crack-spotify-data-science-technical-screen-interview-23f0f7205928)


[Simple and Complete Guide to A/B Testing: End-to-end A/B testing for your Data Science experiments for non-technical and technical specialists with examples and…](https://towardsdatascience.com/simple-and-complet-guide-to-a-b-testing-c34154d0ce5a)


[Bias-Variance Trade-off in Machine Learning: Introduction to bias-variance trade-off in Machine Learning and Statistical models](https://tatev-aslanyan.medium.com/bias-variance-trade-off-in-machine-learning-7f885355e847)

[Data Sampling Methods in Python: A ready-to-run code with different data sampling techniques to create a random sample in Python](https://tatev-aslanyan.medium.com/data-sampling-methods-in-python-a4400628ea1b)

[PySpark Cheat Sheet: Big Data Analytics. Here is a cheat sheet for the essential PySpark commands and functions.](https://medium.com/analytics-vidhya/pyspark-cheat-sheet-big-data-analytics-161a8e1f6185)

Thanks for the read

I encourage you to join Medium today to have complete access to all of the great locked content published across Medium and on my feed where I publish about various Data Science, Machine Learning, and Deep Learning topics.

Follow me on Medium to read more articles about various Data Science and Data Analytics topics. For more hands-on applications of Machine Learning and mathematical and statistical concepts, check out my GitHub account.
I welcome feedback and can be reached on LinkedIn.

Happy learning!
