
Monte Carlo Integration

Overview

Monte Carlo integration is a numerical approximation method used when calculating the integral of a given function is difficult. Consider the following scenario: for an integrable function $f$ on the interval $[0, 1]$, we know the formula of $f(x)$, but calculating its integral analytically is not straightforward. Nevertheless, we want to compute the integral $I[f]$ of $f$.

$$ \begin{equation} I[f] = \int_{[0,1]} f(x) dx \end{equation} $$

Definition

Monte Carlo integration is a method to estimate the integral of $f$ by drawing samples $\left\{ x_{i} \right\}$ from a distribution on $[0, 1]$ as follows:

$$ I[f] \approx I_{n}[f] := \dfrac{1}{n}\sum_{i=1}^{n} f(x_{i}) $$
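The estimator above is straightforward to implement. Below is a minimal sketch in Python; the integrand $f(x) = x^{2}$, whose exact integral on $[0, 1]$ is $1/3$, is an arbitrary choice for illustration.

```python
import numpy as np

def monte_carlo_integrate(f, n, rng=None):
    """Estimate the integral of f over [0, 1] by averaging f at uniform samples."""
    rng = rng or np.random.default_rng()
    x = rng.uniform(0.0, 1.0, size=n)  # x_i ~ U(0, 1)
    return f(x).mean()                 # I_n[f] = (1/n) * sum_i f(x_i)

f = lambda x: x ** 2                   # illustrative f; exact integral is 1/3
print(monte_carlo_integrate(f, n=100_000))  # ~0.333
```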

Difference from Riemann Sum

The idea of the Riemann sum is to divide the interval $[0, 1]$ into $n$ equal parts, take the points $\left\{ x_{i} = \frac{i-1}{n} \right\}_{i=1}^{n}$, and sum up the function values at these points.

$$ \text{Riemann sum}[f] = \dfrac{1}{n}\sum_{i=1}^{n} f(x_{i}) $$

At first glance, Monte Carlo integration and the Riemann sum may seem similar, but their meanings are completely different. In the Riemann sum, $\left\{ x_{i} \right\}$ are the points obtained by dividing the interval $[0, 1]$ into $n$ equal parts, whereas in Monte Carlo integration, $\left\{ x_{i} \right\}$ are $n$ samples drawn at random from a distribution on $[0, 1]$. Thus, the value obtained by the Riemann sum simply represents the area under the graph of $f$, while the value obtained by Monte Carlo integration represents the expectation of $f$.
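The distinction is easy to see numerically. The sketch below, using the same illustrative $f(x) = x^{2}$, computes both sums; the Riemann sum is deterministic, while the Monte Carlo estimate changes from run to run.

```python
import numpy as np

f = lambda x: x ** 2
n = 1_000

# Riemann sum: deterministic grid points x_i = (i - 1) / n
grid = np.arange(n) / n
riemann = f(grid).mean()

# Monte Carlo: random uniform samples; each run gives a different value
rng = np.random.default_rng()
monte_carlo = f(rng.uniform(size=n)).mean()

print(riemann, monte_carlo)  # both near 1/3, but only the first is fixed
```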

Properties

The statistical meaning of equation $(1)$ is that $I[f]$ is equal to the expectation of $f(X)$ when $X$ follows a uniform distribution.

$$ X \sim U(0,1) \implies I[f] = \int_{[0,1]} f(x) dx = E\left[ f(X) \right] $$

Expectation

Let the random variables $X_{i}$ be independent and follow a uniform distribution on $[0, 1]$. Then $I_{n}[f]$ is an unbiased estimator of $I[f]$.

$$ E\left[ I_{n}[f] \right] = I[f] $$

Proof

$$ \begin{align*} E\left[ I_{n}[f] \right] &= E\left[ \dfrac{1}{n} \sum\limits_{i=1}^{n} f(X_{i}) \right] \\ &= \dfrac{1}{n} \sum\limits_{i=1}^{n} E\left[ f(X_{i}) \right] \qquad \text{by linearity of } E \\ &= \dfrac{1}{n} \sum\limits_{i=1}^{n} I\left[ f \right] \\ &= I\left[ f \right] \end{align*} $$
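Unbiasedness can also be checked empirically: averaging many independent realizations of $I_{n}[f]$ recovers $I[f]$ regardless of how small $n$ is. A quick sketch, again with the illustrative $f(x) = x^{2}$:

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: x ** 2

n, repeats = 50, 100_000
# Each row is one draw of (X_1, ..., X_n); each row mean is one realization of I_n[f]
estimates = f(rng.uniform(size=(repeats, n))).mean(axis=1)

print(estimates.mean())  # ~1/3: the estimator is centered on I[f] even for small n
```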

Variance

The variance of $I_{n}[f]$ shrinks at the rate $1/n$. Writing $\sigma^{2} := \Var\left[ f(X) \right]$:

$$ \Var \left[ I_{n}[f] \right] = \dfrac{\sigma^{2}}{n} $$

Proof

Properties of Variance

[a] $\Var (aX) = a^{2} \Var (X)$

[b] If $X, Y$ are independent, $\Var (X + Y) = \Var(X) + \Var(Y)$

Recall that $\sigma^{2}$ denotes the variance of $f(X)$. Then, by the properties of variance:

$$ \begin{align*} \Var \left[ I_{n}[f] \right] &= \Var \left[ \dfrac{1}{n} \sum\limits_{i=1}^{n} f(X_{i}) \right] \\ &= \dfrac{1}{n^{2}} \Var \left[ \sum\limits_{i=1}^{n} f(X_{i}) \right] \\ &= \dfrac{1}{n^{2}} \sum\limits_{i=1}^{n} \Var \left[ f(X_{i}) \right] \\ &= \dfrac{1}{n^{2}} \sum\limits_{i=1}^{n} \sigma^{2} \\ &= \dfrac{\sigma^{2}}{n} \end{align*} $$
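The $1/n$ rate is easy to observe numerically. In the sketch below (same illustrative $f$, for which $\sigma^{2} = 1/5 - 1/9 = 4/45 \approx 0.089$), the sample variance of repeated estimates roughly quarters each time $n$ quadruples:

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: x ** 2
repeats = 5_000

# For f(x) = x^2 with X ~ U(0, 1): sigma^2 = 1/5 - 1/9 = 4/45
for n in (100, 400, 1600):
    estimates = f(rng.uniform(size=(repeats, n))).mean(axis=1)
    print(n, estimates.var())  # should track sigma^2 / n
```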

Generalization

Now consider a function $p(x) \ge 0$ with $\int_{[0,1]} p = 1$, and think about the integral $I[fp]$.

$$ I[fp] = \int_{[0, 1]}f(x)p(x) dx $$

This is the same as the expectation of $f(X)$ for a random variable $X$ with probability density function $p$. To approximate this value, we can consider the following two methods:

  1. Draw samples $\left\{ x_{i} \right\}_{i=1}^{n}$ uniformly and approximate $I[fp]$ as follows:

$$ X_{i} \sim U(0,1) \qquad I[fp] \approx \dfrac{1}{n}\sum\limits_{i}f(x_{i})p(x_{i}) $$

  2. Draw samples $\left\{ x_{i} \right\}_{i=1}^{n}$ from $p(x)$ and approximate $I[fp]$ as follows:

$$ X_{i} \sim p(x) \qquad I[fp] = I_{p}[f] \approx \dfrac{1}{n}\sum\limits_{i}f(x_{i}) $$

In other words, 1. averages $f(x)p(x)$ under uniform sampling, while 2. averages $f(x)$ under sampling from $p(x)$. Of the two, the one with the smaller variance is 1. To simplify notation, write $I = I[fp] = I_{p}[f]$.

  • In case of 1.

$$ \begin{align*} \sigma_{1}^{2} = \Var [fp] &= E \left[ (fp - I)^{2} \right] \\ &= \int (fp - I)^{2} dx \\ &= \int (fp)^{2} dx - 2I\int fp dx + I^{2}\int dx \\ &= \int (fp)^{2} dx - 2I^{2} + I^{2} \\ &= \int (fp)^{2} dx - I^{2} \end{align*} $$

  • In case of 2.

$$ \begin{align*} \sigma_{2}^{2} = \Var [f] &= E_{p} \left[ (f - I)^{2} \right] \\ &= \int (f - I)^{2}p dx \\ &= \int f^{2}p dx - 2I\int fp dx + I^{2}\int p dx \\ &= \int f^{2}p dx - 2I^{2} + I^{2} \\ &= \int f^{2}p dx - I^{2} \end{align*} $$

Now, if $p$ additionally satisfies $0 \le p \le 1$, then $f^{2}p \ge f^{2}p^{2} = (fp)^{2}$. Therefore,

$$ \sigma_{1}^{2} \le \sigma_{2}^{2} $$
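Whatever their variances, both estimators target the same value $I[fp]$. The sketch below checks this with assumed illustrative choices $f(x) = x^{2}$ and $p(x) = 2x$, a valid density on $[0, 1]$ that can be sampled by inverse transform as $X = \sqrt{U}$; note this $p$ is not bounded by $1$, so the variance ordering above need not apply to it.

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: x ** 2
p = lambda x: 2 * x      # density on [0, 1]; exact I[fp] = 1/2
n = 100_000

# Method 1: uniform samples, average f(x) * p(x)
u = rng.uniform(size=n)
estimate_1 = (f(u) * p(u)).mean()

# Method 2: samples from p via inverse transform (CDF is x^2, so X = sqrt(U))
x = np.sqrt(rng.uniform(size=n))
estimate_2 = f(x).mean()

print(estimate_1, estimate_2)  # both ~0.5
```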