Rejection Sampling
Overview¹
Rejection sampling is a Monte Carlo method in which a proposal distribution $q$ that is easy to sample from is used to obtain samples from a given target distribution $p$, particularly when sampling from $p$ directly is difficult.
Build-up
Let's assume we are given a random variable $X$ with probability density function $p$. We want to sample from the distribution $p$, but doing so is challenging. (For instance, if the inverse of the cumulative distribution function of $X$ is hard to obtain, inverse transform sampling is impractical.) Then choose another distribution $q$ and a constant $k$ such that $kq(x) \ge p(x)$ for all $x$. Here, $q$ should be a distribution that is easy to sample from, such as a uniform or normal distribution, and it should resemble the target distribution as closely as possible.
Now, suppose we obtain $x$ by sampling the random variable $X \sim q$ and $y$ by sampling $Y \sim U(0,kq(x))$, where $U$ denotes the uniform distribution. Comparing the two values $p(x)$ and $y$ (a minimal code sketch of this step follows the list below):
- If $p(x) \gt y$, then $x$ is accepted into the sample.
- If $p(x) \le y$, then $x$ is rejected from the sample.
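As a concrete illustration of this single accept/reject step, here is a minimal sketch in Julia. The particular target, proposal, and constant $k = 2.5$ are arbitrary choices made for this example; any pair with $kq(x) \ge p(x)$ for all $x$ works the same way.

```julia
using Distributions

p(x) = pdf(Normal(0, 1), x)   # target density, assumed to be computable pointwise
q    = Normal(0, 2)           # proposal distribution, easy to sample from
k    = 2.5                    # here k*pdf(q, x) ≥ p(x) holds for all x

x = rand(q)                           # sample x from the proposal q
y = rand(Uniform(0, k * pdf(q, x)))   # sample y ~ U(0, k q(x))

accept = p(x) > y   # x joins the sample only if y falls under the target curve
```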
Definition
Let $X$ be a random variable with probability density function $p$. Choose another probability density function $q$ and a constant $k$ that satisfy the following.
$$ p(x) \le k q(x),\quad \forall x $$
Rejection sampling is the method of extracting a sample $\left\{ x_{i} \right\}$ as follows.
$$ \left\{ x_{i} \,\Big|\, p(x_{i}) \gt y_{i},\quad X_{i} \sim q,\ Y_{i} \sim U(0,kq(x_{i})) \right\} $$
Here, $p$ is called the target distribution, and $q$ is referred to as the proposal distribution.
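Translating the definition directly into code gives the following minimal sketch of a reusable sampler, assuming the target density `p` can be evaluated pointwise and that `k * pdf(q, x) ≥ p(x)` holds for all `x`. The function name `rejection_sample` and the example target/proposal are chosen for illustration only.

```julia
using Distributions

# Draw n samples from the target density p using proposal q and constant k.
# Assumes k * pdf(q, x) ≥ p(x) for all x (not checked here).
function rejection_sample(p, q, k, n)
    samples = Float64[]
    while length(samples) < n
        x = rand(q)                          # Xᵢ ~ q
        y = rand(Uniform(0, k * pdf(q, x)))  # Yᵢ ~ U(0, k q(xᵢ))
        y < p(x) && push!(samples, x)        # accept xᵢ when p(xᵢ) > yᵢ
    end
    return samples
end

# Example: Gamma(2, 1) target with an Exponential(2) proposal; k = 3 bounds p/q.
p(x) = pdf(Gamma(2, 1), x)
s = rejection_sample(p, Exponential(2), 3.0, 10_000)
```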
Explanation
This method can be used when $p(x)$ can be evaluated explicitly but is difficult to sample from. In essence, rejection sampling means selecting a $q$ that is easy to sample from and satisfies $kq(x) \ge p(x)$, sampling $x_{i}$ from $q$, and probabilistically rejecting $x_{i}$ according to the ratio between $p(x_{i})$ and $kq(x_{i})$. As the figure above shows, the larger the gap between the target $p$ and the scaled proposal $kq$, the higher the probability of rejection. In other words, the value $p(x_{i})$, originally a density, is reinterpreted as an acceptance probability $\frac{p(x_{i})}{kq(x_{i})}$ for $x_{i}$. Equivalently, since sampling is done from $kq$ instead of $p$, each draw is penalized in proportion to how much $kq(x_{i})$ exceeds $p(x_{i})$.
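The acceptance step can be stated equivalently with a standard uniform variable: instead of drawing $y_{i} \sim U(0, kq(x_{i}))$ and checking $p(x_{i}) \gt y_{i}$, one may draw $u_{i} \sim U(0,1)$ and accept $x_{i}$ when

$$ u_{i} \lt \frac{p(x_{i})}{kq(x_{i})}, $$

since dividing $Y_{i} \sim U(0, kq(x_{i}))$ by $kq(x_{i})$ yields a $U(0,1)$ variable. This ratio form is the one used in the acceptance-rate calculation below.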
When the proposal distribution $q$ and the constant $k$ are chosen, the acceptance rate of the sample can be calculated as follows. $$ \text{acceptance rate} = \int \frac{p(x)}{kq(x)} q(x) dx = \dfrac{1}{k}\int p(x)dx = \dfrac{1}{k} $$ Therefore, the time it takes to collect a sufficient number of samples depends on $k$, and for efficiency, the smallest possible $k$ satisfying $p(x) \le k q(x)$ should be chosen.
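As a sanity check on this formula, the following sketch uses an arbitrarily chosen normalized target and proposal, picks $k$ as a grid-based upper bound on $p(x)/q(x)$ (a heuristic that is only safe if the grid covers the region where the ratio peaks), and compares the empirical acceptance rate with $1/k$.

```julia
using Distributions

p(x) = pdf(Beta(2, 5), x)   # normalized target on (0, 1)
q    = Uniform(0, 1)        # proposal

# Heuristic choice of k: bound max p/q on a fine grid, with a small safety margin.
grid = range(0, 1, length=10_001)
k = 1.01 * maximum(p.(grid) ./ pdf.(q, grid))   # ≈ 2.48 for this target

N  = 100_000
xs = rand(q, N)                          # proposal draws
us = rand(N)                             # standard uniforms
acc = us .< p.(xs) ./ (k .* pdf.(q, xs)) # acceptance indicators
samples = xs[acc]                        # the rejection sample itself

println("empirical acceptance rate ≈ ", count(acc) / N)   # ≈ 0.40
println("1/k = ", 1/k)                                     # ≈ 0.40
```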
Implementing rejection sampling in Julia and creating scatter plots and histograms yields the following.
Code
using Distributions
using Plots
using LaTeXStrings

N = 20000

# Target density (an unnormalized mixture of two normals) and proposal.
target(x) = 0.5*pdf.(Normal(-3, 0.8), x) + pdf.(Normal(3, 3), x)
proposal = Normal(0, 5)

samples = Float64[]
accepted = Vector{Float64}[]
rejected = Vector{Float64}[]

for i ∈ 1:N
    x = rand(proposal)                          # xᵢ ~ q
    y = rand(Uniform(0, 4.2pdf(proposal, x)))   # yᵢ ~ U(0, k q(xᵢ)) with k = 4.2
    if y < target(x)                            # accept if yᵢ falls below p(xᵢ)
        push!(samples, x)
        push!(accepted, [x, y])
    else
        push!(rejected, [x, y])
    end
end

accepted = hcat(accepted...)
rejected = hcat(rejected...)

scatter(accepted[1,:], accepted[2,:], label="accepted", color=:blue, markerstrokewidth=0, markersize=2, dpi=300, size=(728,300), legend=:outertopright)
scatter!(rejected[1,:], rejected[2,:], label="rejected", color=:red, markerstrokewidth=0, markersize=2)
savefig("RS3.png")

x = range(-15, 15, length=1000)
plot(x, target(x), label="target "*L"p(x)", framestyle=:none)
histogram!(samples, label="accepted samples", color=:blue, alpha=0.5, bins=100, normed=true, dpi=300, size=(728,300), legend=:outertopright)
savefig("RS4.png")
Environment
- OS: Windows 11
- Version: Julia 1.8.3, Plots v1.38.6, Distributions v0.25.80, LaTeXStrings v1.3.0
Christopher M. Bishop, Pattern Recognition and Machine Learning (2006), pp. 528-531