Importance Sampling
Overview¹
Importance sampling is a Monte Carlo method: a sampling trick that can be used when approximating integrals (expectations) with finite sums.
Build-up
The probability that a value drawn from a standard normal distribution exceeds $4$ is about $3.167 \times 10^{-5}$.

$$ P(Z \ge 4) = \int_{4}^{\infty} \frac{1}{\sqrt{2\pi}} e^{-z^{2}/2} dz \approx 3.167 \times 10^{-5} $$
The Julia code to calculate this integral using Monte Carlo integration is as follows.
using Distributions
n = 50000 # sample size
z₀ = 4.0
for i in 1:5
z = rand(Normal(), n) # n-samples drawn from a standard normal distribution
est_MC = sum(z .> z₀)/n # estimate using Monte Carlo
println(est_MC)
end
4.0e-5
6.0e-5
0.0
4.0e-5
4.0e-5
If you try it about five times, the calculated value doesn't seem close to the true value $3.167 \times 10^{-5}$, and sometimes it even comes out as $0$. This inaccuracy arises because the tail of the standard normal distribution is too thin. Although integration on paper includes the tails regardless of how thin they are, when sampling with a computer you only deal with a finite number of samples, so an event with a very low occurrence probability might not be included in the calculation at all. Of course, increasing $n$ will give more accuracy, but that's too obvious to call a solution.
Now, consider a scenario where we calculate the expectation of $f(X)$ for a random variable $X$ with a probability density function (PDF) of $p$. It can be calculated as on the left side of

$$ \int f(x) p(x) dx = \int f(x) \frac{p(x)}{q(x)} q(x) dx $$

where multiplying and dividing by any PDF $q$ still gives the same value. While the right side bears the same value as the left, it signifies the 'expectation of $f(X) \dfrac{p(X)}{q(X)}$ for a random variable with a PDF of $q$.' If $q$ is chosen as a PDF with a thicker tail than $p$, inaccuracies due to very low occurrence probability events can be reduced.
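This identity can be checked numerically. The sketch below (an added illustration, assuming the Distributions.jl package) estimates $E[X^{2}] = 1$ for a standard normal $X$ twice: once directly, and once as an expectation under a Cauchy proposal; both estimates should agree.

```julia
using Distributions, Random, Statistics

Random.seed!(0)
n = 100_000
p, q = Normal(), Cauchy()

# Direct Monte Carlo: E_p[f(X)] with f(x) = x², whose true value is 1
x = rand(p, n)
est_direct = mean(x .^ 2)

# The same expectation rewritten as E_q[f(X) p(X)/q(X)], sampled from q
y = rand(q, n)
est_is = mean((y .^ 2) .* pdf.(p, y) ./ pdf.(q, y))

println((est_direct, est_is))  # both should be close to 1.0
```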
Definition
Let's say that $X$ is a random variable with a PDF of $p$. Assume $q$ to be another PDF that satisfies the following.

$$ \operatorname{supp} p \subset \operatorname{supp} q $$

Importance sampling refers to the method of computing the expectation of $f(X)$ by calculating

$$ E_{p} \left[ f(X) \right] = E_{q} \left[ f(X) \frac{p(X)}{q(X)} \right] = E_{q} \left[ f(X) w(X) \right], \qquad w(x) = \frac{p(x)}{q(x)}. $$

Here, $\operatorname{supp}$ denotes the support of a function.
In this context, $q$ is called the proposal distribution, and $w$ is termed the importance weight.
Explanation
In other words, importance sampling calculates Monte Carlo integration as follows.

$$ E_{p} \left[ f(X) \right] \approx \frac{1}{n} \sum_{i=1}^{n} f(x_{i}) \frac{p(x_{i})}{q(x_{i})}, \qquad x_{i} \sim q $$
It’s helpful when used in situations like the examples in the build-up:
- When the tail of $p$ is too thin, propose a distribution with a larger support as the proposal distribution $q$.
- When it is difficult to sample from the distribution of $X$, set a distribution that is easy to sample from as the proposal distribution $q$.
Since $q$ is multiplied in the denominator of $w = p/q$, $\operatorname{supp} q$ must include $\operatorname{supp} p$.
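If this support condition fails, the estimator is silently biased. A minimal sketch of the failure (a hypothetical example, assuming Distributions.jl): using $\mathrm{Uniform}(-2, 2)$ as the proposal for a standard normal target never sees the region $|x| > 2$, so the estimate of $E[X^{2}]$ converges to the truncated integral $\int_{-2}^{2} x^{2} p(x) dx \approx 0.74$ instead of the true value $1$.

```julia
using Distributions, Random, Statistics

Random.seed!(0)
p = Normal()
q = Uniform(-2.0, 2.0)     # supp q = [-2, 2] does NOT cover supp p = ℝ

x = rand(q, 100_000)
w = pdf.(p, x) ./ pdf.(q, x)
est = mean((x .^ 2) .* w)  # converges to ≈ 0.74, not E[X²] = 1
println(est)
```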
The result of recalculating the integral from the build-up using the Cauchy distribution as the proposal distribution and importance sampling is as follows.
# Julia code
using Distributions
n = 50000 # sample size
z₀ = 4.0
for i in 1:5
z = rand(Cauchy(), n) # n-samples drawn from a Cauchy distribution
w = pdf.(Normal(), z) ./ pdf.(Cauchy(), z)
est_IS = sum(w[z .> z₀])/n # estimate using Importance Sampling
println(est_IS)
end
3.0330744703037005e-5
3.1808448538011614e-5
3.049259883894785e-5
3.305423127141489e-5
3.1807695818207125e-5
Properties
Unbiased Estimator
$\displaystyle \frac{1}{n} \sum_{i=1}^{n} f(x_{i}) w(x_{i})$, with $x_{i} \sim q$, is an unbiased estimator of $E_{p}[f(X)]$.
Proof

$$ E_{q} \left[ \frac{1}{n} \sum_{i=1}^{n} f(x_{i}) w(x_{i}) \right] = E_{q} \left[ f(X) \frac{p(X)}{q(X)} \right] = \int f(x) \frac{p(x)}{q(x)} q(x) dx = \int f(x) p(x) dx = E_{p} \left[ f(X) \right] $$

■
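Unbiasedness can also be observed empirically (an added sketch, again assuming Distributions.jl): repeating the Cauchy-proposal estimator from the build-up many times, the average of the estimates should be close to the true tail probability $P(Z \ge 4) \approx 3.167 \times 10^{-5}$.

```julia
using Distributions, Random, Statistics

Random.seed!(0)
p, q = Normal(), Cauchy()
z₀ = 4.0
truth = ccdf(p, z₀)           # true tail probability ≈ 3.167e-5

reps, n = 200, 50_000
ests = map(1:reps) do _
    z = rand(q, n)
    w = pdf.(p, z) ./ pdf.(q, z)
    sum(w[z .> z₀]) / n       # one importance sampling estimate
end

println(mean(ests), " vs ", truth)  # average of estimates ≈ truth
```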
Variance
The variance of the importance sampling estimator is

$$ \operatorname{Var}_{q} \left[ f(X) w(X) \right] = E_{q} \left[ \left( f(X) \frac{p(X)}{q(X)} \right)^{2} \right] - \left( E_{p} \left[ f(X) \right] \right)^{2} $$

so, to have a finite variance, the following must hold.

$$ E_{q} \left[ \left( f(X) \frac{p(X)}{q(X)} \right)^{2} \right] = \int \frac{f(x)^{2} p(x)^{2}}{q(x)} dx < \infty $$

This implies that the variance is minimized when the proposal distribution is

$$ q^{\ast}(x) = \frac{\left| f(x) \right| p(x)}{\int \left| f(x') \right| p(x') dx'} $$
Proof
The second term of the variance, $\left( E_{p}[f(X)] \right)^{2}$, does not depend on $q$, so it suffices to minimize the first term $E_{q} \left[ \left( f(X) w(X) \right)^{2} \right]$.

For a function $\phi$ that is convex and twice differentiable on an open interval $I$, if a random variable $X$ has expectation $\mu$ and $X \subset I$, then

$$ \phi \left[ E(X) \right] \le E \left[ \phi (X) \right] $$

However, since the quadratic function $\phi(x) = x^{2}$ is convex,

$$ E_{q} \left[ \left( f(X) w(X) \right)^{2} \right] \ge \left( E_{q} \left[ \left| f(X) \right| w(X) \right] \right)^{2} = \left( \int \left| f(x) \right| p(x) dx \right)^{2} $$

and this lower bound does not depend on $q$. The equality holds if $\left| f(X) \right| w(X)$ is constant, that is, if

$$ q(x) = \frac{\left| f(x) \right| p(x)}{\int \left| f(x') \right| p(x') dx'} $$

■
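The equality condition can be seen in code. The following sketch (illustrative, assuming Distributions.jl, whose `truncated` builds a distribution restricted to an interval) uses the optimal proposal $q^{\ast} \propto |f| p$ for $f(x) = \mathbf{1}(x > z_{0})$ under a standard normal $p$: every weighted sample then equals $P(Z \ge z_{0})$, so the estimator has zero variance. The second half shows the opposite extreme, a proposal with thinner tails than the target, where the finite-variance condition fails and a few samples dominate the weights.

```julia
using Distributions, Random, Statistics

Random.seed!(0)

# Optimal proposal for f(x) = 1(x > z₀) under p = Normal():
# q* ∝ |f|·p is the normal distribution truncated to (z₀, ∞).
p = Normal(); z₀ = 4.0
qstar = truncated(p, z₀, Inf)
x = rand(qstar, 1_000)
w = pdf.(p, x) ./ pdf.(qstar, x)
est = mean((x .> z₀) .* w)       # every term equals P(Z ≥ z₀): zero variance
println(est, " vs ", ccdf(p, z₀))

# Opposite extreme: q has thinner tails than p, so ∫ p²/q dx = ∞
# and the finite-variance condition fails.
pc, qn = Cauchy(), Normal()
y = rand(qn, 100_000)
v = pdf.(pc, y) ./ pdf.(qn, y)
println(maximum(v) / mean(v))    # a single sample carries an outsized weight
```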
Christopher M. Bishop, Pattern Recognition and Machine Learning (2006), p. 532-534 ↩︎