Mixture Distributions
Build-up
Suppose we want to approximate a probability distribution whose probability density function is shown in the figure below.

One of the most basic ways to approximate a probability distribution is to find a normal distribution that closely resembles it. However, as the figures show, the distribution we want to approximate has three peaks, which makes it difficult to approximate with a single normal distribution.

Here we can reason as follows: a normal distribution has one peak, while the distribution we want to approximate has three. Could we then approximate the desired distribution by adding three normal distributions together? This is precisely the idea behind mixture distributions. In fact, the probability density function above is the following.
$$ p(x) = \frac{1}{2} N(5, 1) + \frac{1}{4} N(-4, 1) + \frac{1}{4} N(0, 1.5) $$
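For a quick numerical check, here is a minimal sketch in Python (NumPy/SciPy; assuming the second argument of $N(\cdot, \cdot)$ denotes the variance, so SciPy's `scale` is its square root) that evaluates this density and verifies that it integrates to $1$:

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

# Example mixture p(x) = 1/2 N(5,1) + 1/4 N(-4,1) + 1/4 N(0,1.5).
# Assumption: the second parameter of N(., .) is the variance.
weights = np.array([1/2, 1/4, 1/4])
means = np.array([5.0, -4.0, 0.0])
stds = np.sqrt(np.array([1.0, 1.0, 1.5]))

def mixture_pdf(x):
    """p(x) = sum_i pi_i * N(x; mu_i, sigma_i^2)."""
    return sum(w * norm.pdf(x, loc=m, scale=s)
               for w, m, s in zip(weights, means, stds))

# A density must integrate to 1 over the real line.
total, _ = quad(mixture_pdf, -np.inf, np.inf)
print(total)  # ~ 1.0
```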

Definition
Simple Definition
A mixture distribution is a linear combination of probability distributions.
Let $f_1, \dots, f_k$ be the probability density functions of $k$ probability distributions. Given mixing coefficients $\pi_1, \dots, \pi_k$, the distribution with the following probability density function $f$ is called a mixture distribution.
$$ f(x) = \pi_1 f_1(x) + \cdots + \pi_k f_k(x) = \sum_{i=1}^{k} \pi_i f_i(x) \tag{1} $$
Explanation
For $f$ to be a probability density function, the $\pi_i$ must sum to $1$. To see this, integrate both sides of $(1)$:
$$ \int f(x)\, dx = \int \left( \sum_{i=1}^{k} \pi_i f_i(x) \right) dx = \sum_{i=1}^{k} \pi_i \int f_i(x)\, dx = \sum_{i=1}^{k} \pi_i = 1 $$
$$ \implies \sum_{i=1}^{k} \pi_i = 1 $$
Additionally, since $f_i \ge 0$ holds, each $\pi_i$ must satisfy $\pi_i \in [0, 1]$.
The $\pi_i$ can naturally be called probabilities, weights, and so on. For the probabilistic interpretation, see the Responsibility Values section below; the sampling sketch after the following table also makes this interpretation concrete.
| Mixing Coefficients | Mixing Probabilities | Weights |
| --- | --- | --- |
| $\pi_i$ | $p_i$ | $w_i$ |
| $\sum_{i=1}^{k} \pi_i = 1$ | $\sum_{i=1}^{k} p_i = 1$ | $\sum_{i=1}^{k} w_i = 1$ |
| $0 \le \pi_i \le 1$ | $0 \le p_i \le 1$ | $0 \le w_i \le 1$ |
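Reading the $\pi_i$ as probabilities suggests a direct way to sample from a mixture: first draw a component index $i$ with probability $\pi_i$, then draw from $f_i$. A minimal sketch in Python, reusing the example mixture from the build-up (the seed and sample size are arbitrary):

```python
import numpy as np

# Same example mixture as in the build-up.
rng = np.random.default_rng(42)
weights = np.array([1/2, 1/4, 1/4])
means = np.array([5.0, -4.0, 0.0])
stds = np.sqrt(np.array([1.0, 1.0, 1.5]))

def sample_mixture(n):
    # Step 1: pick a component index i with probability pi_i.
    idx = rng.choice(len(weights), size=n, p=weights)
    # Step 2: draw from the chosen component f_i.
    return rng.normal(loc=means[idx], scale=stds[idx])

samples = sample_mixture(100_000)
print(samples.mean())  # ~ 1/2*5 + 1/4*(-4) + 1/4*0 = 1.5
```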
Cumulative Distribution Function
If the random variable corresponding to the probability density function $f$ is denoted by $X$, the cumulative distribution function $F_X$ of $X$ is as follows.
$$ F_X(x) = \sum_{i=1}^{k} \pi_i F_i(x) $$
Here, $F_i$ is the cumulative distribution function corresponding to $f_i$.
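Since every component in the running example is normal, the mixture CDF is just the weighted sum of normal CDFs. A minimal check in Python (same example components; the evaluation points are arbitrary):

```python
import numpy as np
from scipy.stats import norm

# Same example mixture as in the build-up.
weights = np.array([1/2, 1/4, 1/4])
means = np.array([5.0, -4.0, 0.0])
stds = np.sqrt(np.array([1.0, 1.0, 1.5]))

def mixture_cdf(x):
    """F_X(x) = sum_i pi_i * F_i(x)."""
    return sum(w * norm.cdf(x, loc=m, scale=s)
               for w, m, s in zip(weights, means, stds))

print(mixture_cdf(-50.0), mixture_cdf(50.0))  # ~ 0.0 and ~ 1.0
print(mixture_cdf(0.0))                       # ~ 0.375 for this example
```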
Expectation
Let $X_i$ be the random variable corresponding to each $f_i$. If the mean of $X_i$ is $\mu_i = E[X_i]$, the expectation of $X$ is as follows.
$$ E[X] = \int x f(x)\, dx = \int x \sum_{i=1}^{k} \pi_i f_i(x)\, dx = \sum_{i=1}^{k} \pi_i \int x f_i(x)\, dx = \sum_{i=1}^{k} \pi_i \mu_i = \mu $$
In other words, $\mu = E[X]$ is the weighted average of the $\mu_i$, with weights $\pi_i$.
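As a sanity check on the running example, the weighted average gives $E[X] = \frac{1}{2} \cdot 5 + \frac{1}{4} \cdot (-4) + \frac{1}{4} \cdot 0 = 1.5$; a minimal Python sketch confirms this by numerical integration:

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

# Same example mixture as in the build-up.
weights = [1/2, 1/4, 1/4]
means = [5.0, -4.0, 0.0]
stds = [1.0, 1.0, np.sqrt(1.5)]

def mixture_pdf(x):
    return sum(w * norm.pdf(x, loc=m, scale=s)
               for w, m, s in zip(weights, means, stds))

# E[X] by numerical integration vs. the weighted average of the mu_i.
mean_numeric, _ = quad(lambda x: x * mixture_pdf(x), -np.inf, np.inf)
mean_formula = sum(w * m for w, m in zip(weights, means))
print(mean_numeric, mean_formula)  # both ~ 1.5
```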
Variance
The variance of $X$ is as follows.
$$
\begin{aligned}
\operatorname{Var}(X) &= \int (x - \mu)^2 f(x)\, dx = \int (x - \mu)^2 \sum_{i=1}^{k} \pi_i f_i(x)\, dx \\
&= \sum_{i=1}^{k} \pi_i \int (x - \mu)^2 f_i(x)\, dx \\
&= \sum_{i=1}^{k} \pi_i \int \left[ (x - \mu_i) + (\mu_i - \mu) \right]^2 f_i(x)\, dx \\
&= \sum_{i=1}^{k} \pi_i \left[ \int (x - \mu_i)^2 f_i(x)\, dx + 2(\mu_i - \mu) \int (x - \mu_i) f_i(x)\, dx + \int (\mu_i - \mu)^2 f_i(x)\, dx \right] \\
&= \sum_{i=1}^{k} \pi_i \int (x - \mu_i)^2 f_i(x)\, dx + \sum_{i=1}^{k} \pi_i (\mu_i - \mu)^2 \int f_i(x)\, dx
\end{aligned}
$$
The last equality holds because the cross terms vanish: $\int (x - \mu_i) f_i(x)\, dx = E[X_i] - \mu_i = 0$. Simplifying the expression gives the following.
$$ \operatorname{Var}(X) = \sum_{i=1}^{k} \pi_i \sigma_i^2 + \sum_{i=1}^{k} \pi_i (\mu_i - \mu)^2 $$
Here, $\sigma_i^2$ is the variance of $X_i$.
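For the running example this closed form gives $\operatorname{Var}(X) = 1.125 + 14.25 = 15.375$, which a direct numerical integration reproduces. A minimal sketch in Python (same example components):

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

# Same example mixture as in the build-up.
weights = [1/2, 1/4, 1/4]
means = [5.0, -4.0, 0.0]
variances = [1.0, 1.0, 1.5]

def mixture_pdf(x):
    return sum(w * norm.pdf(x, loc=m, scale=np.sqrt(v))
               for w, m, v in zip(weights, means, variances))

mu = sum(w * m for w, m in zip(weights, means))  # = 1.5

# Var(X) from the closed form: sum_i pi_i sigma_i^2 + sum_i pi_i (mu_i - mu)^2.
var_formula = (sum(w * v for w, v in zip(weights, variances))
               + sum(w * (m - mu) ** 2 for w, m in zip(weights, means)))

# Var(X) by direct numerical integration of (x - mu)^2 f(x).
var_numeric, _ = quad(lambda x: (x - mu) ** 2 * mixture_pdf(x),
                      -np.inf, np.inf)

print(var_formula, var_numeric)  # both ~ 15.375
```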
Responsibility Values
$$ \sum_{i=1}^{k} \pi_i = 1, \qquad \pi_i \ge 0 \quad \forall\, i $$
Since the $\pi_i$ satisfy the defining properties of a probability, they can be interpreted as such, in which case they are called mixing probabilities. If $f(i) = \pi_i$ is regarded as the prior probability of selecting the $i$th component, then $f_i(x)$ can be viewed as the conditional density of $x$ given $i$:
$$ f_i(x) = f(x \mid i) $$
Then, by Bayes' theorem, the posterior probability $f(i \mid x)$ is as follows; $\gamma_i$ is called the responsibility.
$$ \gamma_i(x) = f(i \mid x) = \frac{f(i) f(x \mid i)}{\sum_{j=1}^{k} f(j) f(x \mid j)} = \frac{f(i) f(x \mid i)}{f(x)} $$
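Concretely, the responsibilities are obtained by normalizing the weighted component densities at each point. A minimal sketch in Python for the example mixture (the evaluation points are arbitrary):

```python
import numpy as np
from scipy.stats import norm

# Same example mixture as in the build-up.
weights = np.array([1/2, 1/4, 1/4])
means = np.array([5.0, -4.0, 0.0])
stds = np.sqrt(np.array([1.0, 1.0, 1.5]))

def responsibilities(x):
    """gamma_i(x) = pi_i f_i(x) / sum_j pi_j f_j(x) for each component i."""
    # Rows: components i; columns: evaluation points x.
    joint = weights[:, None] * norm.pdf(np.atleast_1d(x)[None, :],
                                        loc=means[:, None],
                                        scale=stds[:, None])
    return joint / joint.sum(axis=0, keepdims=True)

# Near each component mean, almost all responsibility goes to that component;
# each column below sums to 1.
print(responsibilities(np.array([5.0, -4.0, 0.0])).round(3))
```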