Mixture Distributions

Build-up¹

Suppose we want to approximate a probability distribution whose probability density function is shown in the image below.

One basic way to approximate a probability distribution is to find a normal distribution that closely resembles it. However, as the following figures show, the distribution we want to approximate has three peaks, which makes it hard to approximate with a single normal distribution.

Here, we can think in the following way: a normal distribution has one peak, while the distribution we want to approximate has three peaks. Then, could we approximate the desired distribution by adding three normal distributions? This is precisely the idea of mixture distributions. In fact, the probability density function above is as follows.

p(x) = \dfrac{1}{2}N(5, 1) + \dfrac{1}{4}N(-4, 1) + \dfrac{1}{4}N(0, 1.5)
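The density above can be written down and plotted directly. The following is a minimal sketch, assuming NumPy, SciPy, and Matplotlib are available, and assuming the second parameter of each $N$ denotes the standard deviation.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

# p(x) = 1/2 N(5, 1) + 1/4 N(-4, 1) + 1/4 N(0, 1.5)
# Assumption: the second parameter of N is the standard deviation.
weights = [0.50, 0.25, 0.25]
means = [5.0, -4.0, 0.0]
sds = [1.0, 1.0, 1.5]

x = np.linspace(-10, 10, 1000)
p = sum(w * norm.pdf(x, loc=m, scale=s) for w, m, s in zip(weights, means, sds))

plt.plot(x, p)  # three peaks, one per component
plt.title("Mixture of three normal distributions")
plt.show()
```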

Definition

Simple Definition

A mixture distribution is a linear combination of probability distributions.

Formal Definition²

Let there be $k$ probability distributions with probability density functions $f_{1}, \dots, f_{k}$. Given mixing coefficients $\pi_{1}, \dots, \pi_{k}$, the distribution with the following probability density function $f$ is called a mixture distribution.

f(x) = \pi_{1} f_{1}(x) + \cdots + \pi_{k} f_{k}(x) = \sum_{i=1}^{k} \pi_{i} f_{i}(x) \tag{1}
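The definition also suggests how to draw samples from a mixture: first pick a component index with probability $\pi_{i}$, then draw from that component's distribution. Below is a minimal sketch of this two-stage sampling, assuming NumPy; `sample_mixture` is a hypothetical helper, not a library function.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_mixture(weights, samplers, size):
    """Draw from a mixture: choose component i with probability pi_i,
    then draw one value from the i-th component."""
    idx = rng.choice(len(weights), size=size, p=weights)  # component labels
    return np.array([samplers[i]() for i in idx])

# Example: the three-peaked mixture from the build-up.
samplers = [
    lambda: rng.normal(5.0, 1.0),
    lambda: rng.normal(-4.0, 1.0),
    lambda: rng.normal(0.0, 1.5),
]
x = sample_mixture([0.5, 0.25, 0.25], samplers, size=10_000)
```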

Explanation

Meanwhile, for $f$ to be a probability density function, the sum of the $\pi_{i}$ must equal $1$. To show this simply, integrating both sides of $(1)$ leads to the following.

\int f(x) dx = \int \left( \sum_{i=1}^{k} \pi_{i} f_{i}(x) \right) dx = \sum_{i=1}^{k} \pi_{i} \int f_{i}(x) dx = \sum_{i=1}^{k} \pi_{i} = 1 \implies \sum_{i=1}^{k} \pi_{i} = 1

Additionally, since $f_{i} \ge 0$ holds, each $\pi_{i}$ must satisfy $\pi_{i} \in [0, 1]$.
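These constraints can be sanity-checked numerically: integrating the mixture density from the build-up example over the real line should give approximately $1$. A rough sketch, assuming NumPy and SciPy:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

weights = [0.50, 0.25, 0.25]
means = [5.0, -4.0, 0.0]
sds = [1.0, 1.0, 1.5]

f = lambda x: sum(w * norm.pdf(x, m, s) for w, m, s in zip(weights, means, sds))

total, _ = quad(f, -np.inf, np.inf)
print(total)  # ~1.0, because the mixing coefficients sum to 1
```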

Meanwhile, $\pi_{i}$ can naturally be called a probability, a weight, and so on. For the probabilistic interpretation, refer to the Responsibility Values section below.

| Mixing coefficients | Mixing probabilities | Weights |
| --- | --- | --- |
| $\pi_{i}$ | $p_{i}$ | $w_{i}$ |
| $\sum\limits_{i=1}^{k} \pi_{i} = 1$ | $\sum\limits_{i=1}^{k} p_{i} = 1$ | $\sum\limits_{i=1}^{k} w_{i} = 1$ |
| $0 \le \pi_{i} \le 1$ | $0 \le p_{i} \le 1$ | $0 \le w_{i} \le 1$ |

Cumulative Distribution Function

If the random variable corresponding to the probability density function $f$ is denoted by $X$, the cumulative distribution function $F_{X}$ of $X$ is as follows.

F_{X}(x) = \sum\limits_{i=1}^{k} \pi_{i} F_{i}(x)

Here, $F_{i}$ is the cumulative distribution function corresponding to $f_{i}$.
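In code, the mixture CDF is simply the same weighted sum applied to the component CDFs. A minimal sketch, assuming SciPy and reusing the build-up parameters; `mixture_cdf` is a hypothetical helper:

```python
from scipy.stats import norm

def mixture_cdf(x, weights, means, sds):
    # F_X(x) = sum_i pi_i F_i(x)
    return sum(w * norm.cdf(x, m, s) for w, m, s in zip(weights, means, sds))

# P(X <= 0) for the build-up mixture
print(mixture_cdf(0.0, [0.5, 0.25, 0.25], [5.0, -4.0, 0.0], [1.0, 1.0, 1.5]))
```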

Expectation

Let the random variable corresponding to each $f_{i}$ be $X_{i}$. If the mean of $X_{i}$ is $\mu_{i} = E[X_{i}]$, the expectation of $X$ is as follows.

\begin{align*} E[X] &= \int x f(x) dx = \int x\sum\limits_{i=1}^{k} \pi_{i}f_{i}(x)dx \\ &= \sum\limits_{i=1}^{k}\pi_{i} \int x f_{i}(x)dx = \sum\limits_{i=1}^{k}\pi_{i}\mu_{i} \\ &= \mu \end{align*}

In other words, $\mu = E[X]$ is the weighted average of the $\mu_{i}$.
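Numerically, this is just a dot product of the mixing coefficients with the component means. A quick check for the build-up example, assuming NumPy:

```python
import numpy as np

weights = np.array([0.50, 0.25, 0.25])
means = np.array([5.0, -4.0, 0.0])

mu = np.dot(weights, means)  # E[X] = sum_i pi_i mu_i
print(mu)  # 1.5
```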

Variance

The variance of XX is as follows.

\begin{align*} &\Var(X) \\ &= \int (x - \mu)^{2} f(x) dx = \int (x - \mu)^{2} \sum\limits_{i=1}^{k} \pi_{i}f_{i}(x)dx \\ &= \sum\limits_{i=1}^{k}\pi_{i} \int (x - \mu)^{2} f_{i}(x)dx \\ &= \sum\limits_{i=1}^{k}\pi_{i} \int \big[(x - \mu_{i}) + (\mu_{i} - \mu) \big]^{2} f_{i}(x)dx \\ &= \sum\limits_{i=1}^{k}\pi_{i} \left[ \int (x - \mu_{i})^{2} f_{i}(x)dx + 2(\mu_{i} - \mu) \int (x - \mu_{i}) f_{i}(x)dx \right. \\ &\qquad \qquad + \left. \int (\mu_{i} - \mu)^{2} f_{i}(x)dx \right] \\ &= \sum\limits_{i=1}^{k}\pi_{i} \int (x - \mu_{i})^{2} f_{i}(x)dx + \sum\limits_{i=1}^{k}\pi_{i} (\mu_{i} - \mu)^{2} \int f_{i}(x)dx \end{align*}

The last equality holds because the cross terms vanish: $\int (x - \mu_{i}) f_{i}(x)dx = 0$. Since $\int f_{i}(x)dx = 1$, tidying up the expression gives the following.

\Var(X) = \sum\limits_{i=1}^{k}\pi_{i} \sigma_{i}^{2} + \sum\limits_{i=1}^{k}\pi_{i} (\mu_{i} - \mu)^{2}

Here, $\sigma_{i}^{2}$ is the variance of $X_{i}$.
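The formula can be evaluated directly for the build-up example, again assuming the component standard deviations are $1$, $1$, and $1.5$:

```python
import numpy as np

weights = np.array([0.50, 0.25, 0.25])
means = np.array([5.0, -4.0, 0.0])
sds = np.array([1.0, 1.0, 1.5])

mu = np.dot(weights, means)                                      # overall mean
var = np.dot(weights, sds**2) + np.dot(weights, (means - mu)**2)
print(var)  # 15.5625: within-component variance plus between-component spread
```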

Responsibility Values

\begin{array}{c} \sum\limits_{i=1}^{k} \pi_{i} = 1 \\[1em] \pi_{i} \ge 0 \quad \forall i \end{array}

Since the $\pi_{i}$ satisfy the same properties as probabilities, they can be interpreted as such, in which case they are called mixing probabilities. If $f(i) = \pi_{i}$ is regarded as the prior probability of selecting the $i$th component, then $f_{i}(x)$ can be seen as the conditional probability density of $x$ given $i$, as follows.

f_{i}(x) = f(x | i)

Then the posterior probability density $f(i | x)$, according to Bayes' theorem, is as follows, and $\gamma_{i}$ is called the responsibility.

\begin{align*} \gamma_{i}(x) &= f(i | x) \\ &= \dfrac{f(i) f(x | i)}{\sum\limits_{j=1}^{k} f(j) f(x | j)} \\ &= \dfrac{f(i) f(x | i)}{f(x)} \end{align*}
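Computing responsibilities is a direct transcription of the Bayes' theorem expression above. A minimal sketch, assuming SciPy and the build-up parameters; `responsibilities` is a hypothetical helper:

```python
import numpy as np
from scipy.stats import norm

weights = np.array([0.50, 0.25, 0.25])
means = np.array([5.0, -4.0, 0.0])
sds = np.array([1.0, 1.0, 1.5])

def responsibilities(x):
    # gamma_i(x) = f(i) f(x | i) / sum_j f(j) f(x | j)
    joint = weights * norm.pdf(x, means, sds)  # f(i) f(x | i) for each component
    return joint / joint.sum()

print(responsibilities(4.0))  # the component centered at 5 takes most of the responsibility
```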


  1. Christopher M. Bishop, Pattern Recognition and Machine Learning (2006), pp. 110-113.

  2. Hogg et al. (2018). Introduction to Mathematical Statistics (8th Edition), pp. 218-220.