Monte Carlo Studies¶

Stats 507, Fall 2021

James Henderson, PhD
October 5, 2021

Overview¶

  • Monte Carlo
  • Example - Estimating pi
  • LLN & CLT
  • Precision
  • Takeaways

Monte Carlo - Why¶

  • In statistics and data science we are often interested in computing expectations (means) of various types of random outcomes.
  • When analytic expectations are unavailable or cumbersome to compute, it can be useful to obtain Monte Carlo approximations.

Monte Carlo - What¶

  • We form Monte Carlo estimates by simulating (from) a random process and then directly averaging the values of interest.
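  • As a toy illustration (an added sketch, not from the original slides): approximate $E[X]$ for $X \sim \textrm{Exponential}(1)$, whose true mean is 1, by averaging simulated draws.
In [ ]:
import numpy as np

# average of simulated draws approximates E[X] = 1
rng_demo = np.random.default_rng(42)
x = rng_demo.exponential(scale=1, size=10_000)
np.mean(x)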

Example¶

  • In this example, we'll form a Monte Carlo estimate of $\pi.$
  • Our estimate is based on the fact that the unit circle has area $\pi$, so the quarter circle inside the unit square $[0, 1]^2$ has area $\pi / 4$.
  • Drawing points uniformly from the unit square, the proportion that land inside the quarter circle estimates $\pi / 4$.
In [ ]:
import numpy as np
from scipy.stats import norm

n_mc = 1000
rng = np.random.default_rng(10 * 5 * 2021)  # default_rng needs an integer seed
# draw 2 * n_mc uniform points on [0, 1] and arrange as (x, y) pairs
xy = rng.uniform(size=2 * n_mc)
xy = xy.reshape((2, n_mc))
# squared distance of each point from the origin
d = np.power(xy[0, :], 2) + np.power(xy[1, :], 2)
# the proportion of points inside the quarter circle estimates pi / 4
p_hat = np.mean(d < 1)
pi_hat = 4 * p_hat
pi_hat

Law of Large Numbers¶

  • Monte Carlo estimation works because of the law of large numbers (LLN): the sample average is generally a good estimate of the corresponding expectation, $$ \bar{\theta}_{n} := \frac{1}{n}\sum_{i=1}^n X_i \to_p \theta := E[X]. $$
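  • As an illustration (an added sketch, not from the original slides), the running average below reuses d and n_mc from the example above; scaled by 4, it drifts toward $\pi$ as the sample size grows.
In [ ]:
# running Monte Carlo estimate of pi after each successive draw
running_est = 4 * np.cumsum(d < 1) / np.arange(1, n_mc + 1)
# the estimate settles near pi as n grows
[(n, running_est[n - 1]) for n in (10, 100, 1000)]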

Central Limit Theorem¶

  • Assume our data are:
    • independent and identically distributed (iid),
    • from a distribution with finite variance.
  • Then, the rate of convergence of a sample average to its population counterpart is characterized by the central limit theorem (CLT): $$ \sqrt{n}(\bar{\theta}_n - \theta) \to_{d} N(0, \sigma^2). $$
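  • As a quick empirical check (an added sketch, not from the original slides), we can replicate the $n = 1000$ experiment many times and compare the spread of the estimates to the CLT prediction $4\sqrt{p(1 - p)/n}$ with $p = \pi/4$.
In [ ]:
# replicate the n = 1000 estimate of pi many times
n_rep, n = 1000, 1000
xy_rep = rng.uniform(size=(n_rep, 2, n))
d_rep = np.sum(xy_rep ** 2, axis=1)       # squared distances, shape (n_rep, n)
pi_hats = 4 * np.mean(d_rep < 1, axis=1)  # one estimate per replicate
# observed spread vs the CLT prediction
p = np.pi / 4
(np.std(pi_hats), 4 * np.sqrt(p * (1 - p) / n))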

Central Limit Theorem¶

  • On the previous slide, $\sigma^2$ is the variance of the underlying distribution from which $X$ is drawn.

  • Estimate the variance of a Monte Carlo estimate to construct confidence intervals around your estimates.

  • Choose the number of Monte Carlo replicates to attain the desired precision.

In [ ]:
# standard error via the Normal approximation to the Binomial
se = 4 * np.sqrt(p_hat * (1 - p_hat) / n_mc)
# equivalent estimate from the sample variance of the indicator d < 1
se2 = 4 * np.sqrt(np.var(d < 1) / n_mc)
z = norm.ppf(.975)
lwr, upr = pi_hat - z * se, pi_hat + z * se
'{0:5.3f} ({1:4.2f}, {2:4.2f})'.format(pi_hat, lwr, upr)

Confidence Level¶

  • We chose $z = \Phi^{-1}(0.975)$ on the previous slide to attain a 95% confidence level.
  • This means that we expect the population parameter ($\pi$) to be in our (random) interval 95% of the time.
In [ ]:
lwr < np.pi < upr # Expect this to be True ~95% of the time with z = 1.96
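
  • To see the 95% figure directly (an added sketch, not from the original slides), repeat the whole experiment many times and count how often the interval contains $\pi$.
In [ ]:
# empirical coverage of the nominal 95% interval
n_rep, n = 1000, 1000
covered = 0
for _ in range(n_rep):
    d_i = np.sum(rng.uniform(size=(2, n)) ** 2, axis=0)
    p_i = np.mean(d_i < 1)
    est, se_i = 4 * p_i, 4 * np.sqrt(p_i * (1 - p_i) / n)
    covered += (est - z * se_i) < np.pi < (est + z * se_i)
covered / n_rep  # expect a value near 0.95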

Desired Precision¶

  • Suppose we need to estimate $\pi$ to within $\pm 0.001$ with 99% confidence.
  • To determine the number of Monte Carlo replicates needed to attain the desired margin of error, solve the following inequality for $n$:
$$ \Phi^{-1}(0.995) \cdot 4 \sqrt{ \frac{\pi}{4} \left(1 - \frac{\pi}{4}\right) / n } < 0.001. $$
In [ ]:
z = norm.ppf(.995)
# rearranging the inequality: n > (1000 * z * 4 * sqrt(p * (1 - p))) ** 2
n_min = (1000 * z * 4 * np.sqrt(np.pi / 4 * (1 - np.pi / 4))) ** 2
n_min  # about 1.79e7, so we round up to 18 million on the next slide

Example¶

  • Here is our earlier example with 18 million Monte Carlo replicates.
  • This is feasible because we are using vectorized operations.
In [ ]:
n_mc = int(1.8e7)
xy = rng.uniform(size=2 * n_mc).reshape((2, n_mc))
d = np.power(xy[0, :], 2) + np.power(xy[1, :], 2)
# error of the estimate and whether it is within the 0.001 margin
eps = 4 * np.mean(d < 1) - np.pi
(eps, np.abs(eps) < 0.001)
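
  • To back up the vectorization point (an added sketch, not from the original slides; timings will vary by machine), compare a pure-Python loop to the vectorized version on a smaller problem.
In [ ]:
from timeit import timeit

n = int(1e5)
u = rng.uniform(size=2 * n).reshape((2, n))

def loop_version():
    # count points inside the quarter circle one at a time
    inside = 0
    for i in range(n):
        inside += (u[0, i] ** 2 + u[1, i] ** 2) < 1
    return 4 * inside / n

def vec_version():
    # the same computation with vectorized NumPy operations
    return 4 * np.mean(u[0, :] ** 2 + u[1, :] ** 2 < 1)

# seconds for 10 runs of each; the vectorized version is much faster
(timeit(loop_version, number=10), timeit(vec_version, number=10))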

Takeaways¶

  • Monte Carlo estimates are statistical estimates of population parameters using "data" simulated from a (pseudo-)random process.
  • Use vectorized operations for efficient Monte Carlo estimation.
  • Use the CLT or the Normal approximation to the Binomial to estimate the variance of a Monte Carlo estimate.
  • Choose the number of Monte Carlo replicates based on the required/desired precision.