Monte Carlo Studies¶

Stats 507, Fall 2021

James Henderson, PhD
October 5, 2021

Overview¶

  • Monte Carlo
  • Example - Estimating pi
  • LLN & CLT
  • Precision
  • Takeaways

Monte Carlo - Why¶

  • In statistics and data science we are often interested in computing expectations (means) of various types of random outcomes.
  • When analytic expectations are unavailable or cumbersome to compute, it can be useful to obtain Monte Carlo approximations.

Monte Carlo - What¶

  • We form Monte Carlo estimates by simulating (from) a random process and then directly averaging the values of interest.
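  • As a toy illustration (an added sketch, not from the original slides): approximate $E[X]$ for $X \sim \textrm{Exponential}(1)$, whose true mean is 1, by averaging simulated draws.
In [ ]:
import numpy as np

# average of simulated draws approximates E[X] = 1
rng_demo = np.random.default_rng(42)
x = rng_demo.exponential(scale=1, size=10_000)
np.mean(x)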

Example¶

  • In this example, we'll form a Monte Carlo estimate of $\pi.$
  • Our estimate is based on the fact that the unit circle has area $\pi$, so the quarter circle inside the unit square $[0, 1]^2$ has area $\pi / 4$.
  • Drawing points uniformly from the unit square, the proportion that land inside the quarter circle estimates $\pi / 4$.
In [ ]:
import numpy as np
from scipy.stats import norm

n_mc = 1000
rng = np.random.default_rng(10 * 5 * 2021)  # default_rng needs an integer seed
# draw 2 * n_mc uniform points on [0, 1] and arrange as (x, y) pairs
xy = rng.uniform(size=2 * n_mc)
xy = xy.reshape((2, n_mc))
# squared distance of each point from the origin
d = np.power(xy[0, :], 2) + np.power(xy[1, :], 2)
# the proportion of points inside the quarter circle estimates pi / 4
p_hat = np.mean(d < 1)
pi_hat = 4 * p_hat
pi_hat

Law of Large Numbers¶

  • Monte Carlo estimation works because of the law of large numbers (LLN): the sample average is generally a good estimate of the corresponding expectation, $$ \bar{\theta}_{n} := \frac{1}{n}\sum_{i=1}^n X_i \to_p \theta := E[X]. $$
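  • As an illustration (an added sketch, not from the original slides), the running average below reuses d and n_mc from the example above; scaled by 4, it drifts toward $\pi$ as the sample size grows.
In [ ]:
# running Monte Carlo estimate of pi after each successive draw
running_est = 4 * np.cumsum(d < 1) / np.arange(1, n_mc + 1)
# the estimate settles near pi as n grows
[(n, running_est[n - 1]) for n in (10, 100, 1000)]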

Central Limit Theorem¶

  • Assume our data are:
    • independent and identically distributed (iid),
    • from a distribution with finite variance.
  • Then, the rate of convergence of a sample average to its population counterpart is characterized by the central limit theorem (CLT): $$ \sqrt{n}(\bar{\theta}_n - \theta) \to_{d} N(0, \sigma^2). $$
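  • As a quick empirical check (an added sketch, not from the original slides), we can replicate the $n = 1000$ experiment many times and compare the spread of the estimates to the CLT prediction $4\sqrt{p(1 - p)/n}$ with $p = \pi/4$.
In [ ]:
# replicate the n = 1000 estimate of pi many times
n_rep, n = 1000, 1000
xy_rep = rng.uniform(size=(n_rep, 2, n))
d_rep = np.sum(xy_rep ** 2, axis=1)       # squared distances, shape (n_rep, n)
pi_hats = 4 * np.mean(d_rep < 1, axis=1)  # one estimate per replicate
# observed spread vs the CLT prediction
p = np.pi / 4
(np.std(pi_hats), 4 * np.sqrt(p * (1 - p) / n))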

Central Limit Theorem¶

  • On the previous slide, $\sigma^2$ is the variance of the underlying distribution from which $X$ is drawn.

  • Estimate the variance of a Monte Carlo estimate to construct confidence intervals around your estimates.

  • Choose the number of Monte Carlo replicates to attain the desired precision.

In [ ]:
# standard error via the Normal approximation to the Binomial
se = 4 * np.sqrt(p_hat * (1 - p_hat) / n_mc)
# equivalent estimate from the sample variance of the indicator d < 1
se2 = 4 * np.sqrt(np.var(d < 1) / n_mc)
z = norm.ppf(.975)
lwr, upr = pi_hat - z * se, pi_hat + z * se
'{0:5.3f} ({1:4.2f}, {2:4.2f})'.format(pi_hat, lwr, upr)

Confidence Level¶

  • We chose $z = \Phi^{-1}(0.975)$ on the previous slide to attain a 95% confidence level.
  • This means that we expect the population parameter ($\pi$) to be in our (random) interval 95% of the time.
In [ ]:
lwr < np.pi < upr # Expect this to be True ~95% of the time with z = 1.96
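
  • To see the 95% figure directly (an added sketch, not from the original slides), repeat the whole experiment many times and count how often the interval contains $\pi$.
In [ ]:
# empirical coverage of the nominal 95% interval
n_rep, n = 1000, 1000
covered = 0
for _ in range(n_rep):
    d_i = np.sum(rng.uniform(size=(2, n)) ** 2, axis=0)
    p_i = np.mean(d_i < 1)
    est, se_i = 4 * p_i, 4 * np.sqrt(p_i * (1 - p_i) / n)
    covered += (est - z * se_i) < np.pi < (est + z * se_i)
covered / n_rep  # expect a value near 0.95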

Desired Precision¶

  • Suppose we need to estimate $\pi$ to within $\pm 0.001$ with 99% confidence.
  • To determine the number of Monte Carlo replicates needed to attain the desired margin of error, solve the following inequality for $n$:
$$ \Phi^{-1}(0.995) \cdot 4 \sqrt{ \frac{\pi}{4} \left(1 - \frac{\pi}{4}\right) / n } < 0.001. $$
In [ ]:
z = norm.ppf(.995)
# rearranging the inequality: n > (1000 * z * 4 * sqrt(p * (1 - p))) ** 2
n_min = (1000 * z * 4 * np.sqrt(np.pi / 4 * (1 - np.pi / 4))) ** 2
n_min  # about 1.79e7, so we round up to 18 million on the next slide

Example¶

  • Here is our earlier example with 18 million Monte Carlo replicates.
  • This is feasible because we are using vectorized operations.
In [ ]:
n_mc = int(1.8e7)
xy = rng.uniform(size=2 * n_mc).reshape((2, n_mc))
d = np.power(xy[0, :], 2) + np.power(xy[1, :], 2)
# error of the estimate and whether it is within the 0.001 margin
eps = 4 * np.mean(d < 1) - np.pi
(eps, np.abs(eps) < 0.001)
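
  • To back up the vectorization point (an added sketch, not from the original slides; timings will vary by machine), compare a pure-Python loop to the vectorized version on a smaller problem.
In [ ]:
from timeit import timeit

n = int(1e5)
u = rng.uniform(size=2 * n).reshape((2, n))

def loop_version():
    # count points inside the quarter circle one at a time
    inside = 0
    for i in range(n):
        inside += (u[0, i] ** 2 + u[1, i] ** 2) < 1
    return 4 * inside / n

def vec_version():
    # the same computation with vectorized NumPy operations
    return 4 * np.mean(u[0, :] ** 2 + u[1, :] ** 2 < 1)

# seconds for 10 runs of each; the vectorized version is much faster
(timeit(loop_version, number=10), timeit(vec_version, number=10))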

Takeaways¶

  • Monte Carlo estimates are statistical estimates of population parameters using "data" simulated from a (pseudo-)random process.
  • Use vectorized operations for efficient Monte Carlo estimation.
  • Use the CLT or the Normal approximation to the Binomial to estimate the variance of a Monte Carlo estimate.
  • Choose the number of Monte Carlo replicates based on the required/desired precision.