Monte Carlo Integration
Overview
Monte Carlo integration is a numerical approximation method used when the integral of a given function is difficult to calculate. Consider the following scenario: for an integrable function $f$ on the interval $[0,1]$, we know the formula of $f(x)$, but calculating its integral is not straightforward. Nevertheless, we want to calculate the integral $I[f]$ of $f$.
$$
I[f] = \int_{[0,1]} f(x) dx
$$
Definition
Monte Carlo integration is a method to estimate the integral of $f$ by drawing samples $\left\{ x_{i} \right\}$ from the uniform distribution on $[0,1]$ as follows:
$$
I[f] \approx I_{n}[f] := \dfrac{1}{n} \sum\limits_{i=1}^{n} f(x_{i}) \tag{1}
$$
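As a concrete illustration, here is a minimal sketch in Python with NumPy; the integrand $f(x) = e^{-x^{2}}$ is a hypothetical choice, picked because its antiderivative has no elementary closed form:

```python
import numpy as np

rng = np.random.default_rng(seed=42)

def f(x):
    # Hypothetical integrand: no elementary antiderivative, easy to evaluate
    return np.exp(-x ** 2)

n = 100_000
x = rng.uniform(0.0, 1.0, size=n)  # n samples from U(0, 1)
I_n = f(x).mean()                  # I_n[f] = (1/n) * sum_i f(x_i)

print(I_n)  # ≈ 0.7468, the true value of the integral over [0, 1]
```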
Difference from Riemann Sum
The idea of the Riemann sum is to divide the interval $[0,1]$ into $n$ equal parts, take the points $\left\{ x_{i} = \frac{i-1}{n} \right\}_{i=1}^{n}$, and sum up the function values at these points.

$$
\text{Riemann sum}[f] = \dfrac{1}{n} \sum\limits_{i=1}^{n} f(x_{i})
$$
At first glance, Monte Carlo integration and the Riemann sum may look similar, but their meanings are completely different. In the Riemann sum, $\left\{ x_{i} \right\}$ are deterministic points obtained by dividing the interval $[0,1]$ into $n$ equal parts, whereas in Monte Carlo integration they are $n$ random samples drawn from a distribution $p(x)$ on $[0,1]$. Thus the value obtained by the Riemann sum simply represents the area under the graph of $f$, while the value obtained by Monte Carlo integration represents the expectation of $f$.
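To make the contrast concrete, the following sketch (continuing the hypothetical $f$ from above) computes both quantities side by side; the numbers come out close, but only the Monte Carlo one is random:

```python
import numpy as np

rng = np.random.default_rng(seed=42)
f = lambda x: np.exp(-x ** 2)
n = 100_000

# Riemann sum: deterministic grid points x_i = (i - 1) / n
grid = np.arange(n) / n
riemann = f(grid).mean()

# Monte Carlo: random samples x_i ~ U(0, 1)
samples = rng.uniform(0.0, 1.0, size=n)
monte_carlo = f(samples).mean()

print(riemann, monte_carlo)  # both ≈ 0.7468
```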
Properties
The statistical meaning of equation $(1)$ is that $I[f]$ equals the expectation of $f(X)$ when $X$ follows a uniform distribution:
$$
X \sim U(0,1) \implies I[f] = \int_{[0,1]} f(x) dx = E \left[ f(X) \right]
$$
Expectation
Let the random variables $X_{i} \sim U(0,1)$ be independent. Then $I_{n}[f]$ is an unbiased estimator of $I[f]$:
$$
E \left[ I_{n}[f] \right] = I[f]
$$
Proof
$$
\begin{align*}
E \left[ I_{n}[f] \right]
&= E \left[ \dfrac{1}{n} \sum\limits_{i=1}^{n} f(X_{i}) \right] \\
&= \dfrac{1}{n} \sum\limits_{i=1}^{n} E \left[ f(X_{i}) \right] & \text{by linearity of } E \\
&= \dfrac{1}{n} \sum\limits_{i=1}^{n} I[f] \\
&= I[f]
\end{align*}
$$
■
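Unbiasedness can also be checked numerically (same hypothetical $f$ as above): even for a tiny $n$, the average of many independent realizations of $I_{n}[f]$ should recover $I[f]$:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
f = lambda x: np.exp(-x ** 2)

n, trials = 10, 200_000  # tiny n, many independent estimates
# Each row is one realization of {X_1, ..., X_n}; each row mean is one I_n[f]
estimates = f(rng.uniform(size=(trials, n))).mean(axis=1)

print(estimates.mean())  # ≈ 0.7468 = I[f], despite n being only 10
```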
Variance

$I_{n}[f]$ has variance $\sigma^{2}/n$, where $\sigma^{2} := \Var \left[ f(X) \right]$; in particular, the estimator's variance vanishes as $n \to \infty$.

$$
\Var \left[ I_{n}[f] \right] = \dfrac{\sigma^{2}}{n}
$$

Proof
Properties of Variance
[a] $\Var (aX) = a^{2} \Var (X)$

[b] If $X, Y$ are independent, $\Var (X + Y) = \Var (X) + \Var (Y)$
Let's denote the variance of $f(X)$ as $\sigma^{2}$. Then, by the properties of variance:
$$
\begin{align*}
\Var \left[ I_{n}[f] \right]
&= \Var \left[ \dfrac{1}{n} \sum\limits_{i=1}^{n} f(X_{i}) \right] \\
&= \dfrac{1}{n^{2}} \Var \left[ \sum\limits_{i=1}^{n} f(X_{i}) \right] & \text{by [a]} \\
&= \dfrac{1}{n^{2}} \sum\limits_{i=1}^{n} \Var \left[ f(X_{i}) \right] & \text{by [b]} \\
&= \dfrac{1}{n^{2}} \sum\limits_{i=1}^{n} \sigma^{2} \\
&= \dfrac{\sigma^{2}}{n}
\end{align*}
$$
■
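The $1/n$ decay can be observed empirically as well; the sketch below estimates $\Var \left[ I_{n}[f] \right]$ from repeated trials and compares it to $\sigma^{2}/n$ (same hypothetical $f$ as before):

```python
import numpy as np

rng = np.random.default_rng(seed=0)
f = lambda x: np.exp(-x ** 2)
trials = 20_000

sigma2 = f(rng.uniform(size=1_000_000)).var()  # estimate sigma^2 = Var[f(X)]

for n in (10, 100, 1000):
    estimates = f(rng.uniform(size=(trials, n))).mean(axis=1)
    print(n, estimates.var(), sigma2 / n)  # empirical variance ≈ sigma^2 / n
```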
Generalization
Now consider a function $p(x) \ge 0$ with $\int_{[0,1]} p = 1$, and think about the integral $I[fp]$.
$$
I[fp] = \int_{[0, 1]} f(x) p(x) dx
$$
This is the same as the expectation of $f(X)$ for a random variable $X$ with probability density function $p$. To approximate this value, we can consider the following two methods, checked numerically in the sketch after this list:
- Draw samples $\left\{ x_{i} \right\}_{i=1}^{n}$ uniformly and approximate $I[fp]$ as follows:
  $$
  X_{i} \sim U(0,1) \qquad I[fp] \approx \dfrac{1}{n}\sum\limits_{i}f(x_{i})p(x_{i})
  $$
- Draw samples $\left\{ x_{i} \right\}_{i=1}^{n}$ from $p(x)$ and approximate $I[fp]$ as follows:
  $$
  X_{i} \sim p(x) \qquad I[fp] = I_{p}[f] \approx \dfrac{1}{n}\sum\limits_{i}f(x_{i})
  $$
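As a sanity check that both methods target the same value, here is a sketch with the hypothetical choices $f(x) = x$ and $p(x) = 2x$ (the Beta(2,1) density, which can be sampled by inverse CDF as $X = \sqrt{U}$); the true value is $I[fp] = \int_{0}^{1} 2x^{2} dx = 2/3$:

```python
import numpy as np

rng = np.random.default_rng(seed=1)
f = lambda x: x
p = lambda x: 2 * x  # a density on [0, 1]; here I[fp] = 2/3

n = 1_000_000
u = rng.uniform(size=n)

# Method 1: uniform samples, average f(x_i) * p(x_i)
est1 = (f(u) * p(u)).mean()

# Method 2: samples from p via inverse CDF (CDF is x^2, so X = sqrt(U))
x = np.sqrt(rng.uniform(size=n))
est2 = f(x).mean()

print(est1, est2)  # both ≈ 0.6667
```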
In other words, 1. averages $f(x)p(x)$ under uniform sampling, while 2. averages $f(x)$ under sampling from $p(x)$. Among these, the one with the smaller variance is 1. Let's simplify the notation as $I = I[fp] = I_{p}[f]$.
In the case of 1.:
$$
\begin{align*}
\sigma_{1}^{2} = \Var [fp]
&= E \left[ (fp - I)^{2} \right] \\
&= \int (fp - I)^{2} dx \\
&= \int (fp)^{2} dx - 2I\int fp dx + I^{2}\int dx \\
&= \int (fp)^{2} dx - 2I^{2} + I^{2} \\
&= \int (fp)^{2} dx - I^{2}
\end{align*}
$$
In the case of 2.:
$$
\begin{align*}
\sigma_{2}^{2} = \Var [f]
&= E_{p} \left[ (f - I)^{2} \right] \\
&= \int (f - I)^{2} p dx \\
&= \int f^{2} p dx - 2I\int fp dx + I^{2}\int p dx \\
&= \int f^{2} p dx - 2I^{2} + I^{2} \\
&= \int f^{2} p dx - I^{2}
\end{align*}
$$
Now, if additionally $0 \le p \le 1$, it follows that $f^{2}p \ge f^{2}p^{2} = (fp)^{2}$, and hence $\int f^{2}p dx \ge \int (fp)^{2} dx$. Therefore,
$$
\sigma_{1}^{2} \le \sigma_{2}^{2}
$$
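Both variance formulas can be checked empirically by taking the sample variance of the per-sample terms. Note that the hypothetical $p(x) = 2x$ used below exceeds $1$ on part of the interval, so the inequality above (which relied on $p \le 1$) is not guaranteed for it, and method 2. in fact comes out smaller here; the sketch only verifies the two formulas:

```python
import numpy as np

rng = np.random.default_rng(seed=2)
f = lambda x: x
p = lambda x: 2 * x  # same hypothetical density; I = I[fp] = 2/3

n = 1_000_000
u = rng.uniform(size=n)           # uniform samples (method 1)
x = np.sqrt(rng.uniform(size=n))  # samples from p (method 2)

# Sample variances of the per-sample terms estimate sigma_1^2 and sigma_2^2
print((f(u) * p(u)).var())  # ≈ 16/45 ≈ 0.356 = ∫(fp)^2 dx - I^2
print(f(x).var())           # ≈ 1/18  ≈ 0.056 = ∫ f^2 p dx - I^2
```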