Mixture Distributions
Build-up
Suppose we want to approximate a probability distribution whose probability density function is shown in the figure below.
One of the basic methods to approximate a probability distribution is to find a normal distribution that closely resembles the distribution we aim to approximate. However, as the following figures show, the distribution we want to approximate has three peaks, making it challenging to approximate using a normal distribution.
Here, we can think in the following way: a normal distribution has one peak, while the distribution we want to approximate has three peaks. Then, could we approximate the desired distribution by adding three normal distributions? This is precisely the idea of mixture distributions. In fact, the probability density function above is as follows.
$$ p(x) = \dfrac{1}{2}N(5, 1) + \dfrac{1}{4}N(-4, 1) + \dfrac{1}{4}N(0, 1.5) $$
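As an illustration, here is a minimal SciPy-based sketch that evaluates this density numerically. The variable names are ours, and the second argument of $N(\cdot, \cdot)$ is read as the variance, so the scale passed to `norm.pdf` is its square root.

```python
import numpy as np
from scipy.stats import norm

# p(x) = 1/2 N(5, 1) + 1/4 N(-4, 1) + 1/4 N(0, 1.5)
# Assumption: the second argument of N is the variance, so scale = sqrt(variance).
weights = np.array([0.50, 0.25, 0.25])
means = np.array([5.0, -4.0, 0.0])
scales = np.sqrt([1.0, 1.0, 1.5])

def p(x):
    """Mixture density: weighted sum of the three normal densities."""
    return sum(w * norm.pdf(x, loc=m, scale=s)
               for w, m, s in zip(weights, means, scales))

xs = np.linspace(-9, 10, 500)
ys = p(xs)                  # three peaks, near x = -4, 0, and 5
print(xs[np.argmax(ys)])    # the highest peak sits near x = 5
```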
Definition
Simple Definition
A mixture distribution is a linear combination of probability distributions; more precisely, it is a convex combination, since the coefficients are nonnegative and sum to $1$.
Formal Definition
Let $f_{1}, \dots, f_{k}$ denote the probability density functions of $k$ probability distributions. Given mixing coefficients $\pi_{1}, \dots, \pi_{k}$, the distribution with the following probability density function $f$ is called a mixture distribution.
$$ f(x) = \pi_{1} f_{1}(x) + \cdots + \pi_{k} f_{k}(x) = \sum_{i=1}^{k} \pi_{i} f_{i}(x) \tag{1} $$
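As a minimal sketch of definition $(1)$, the hypothetical helper `mixture_pdf` below assembles $f$ from arbitrary mixing coefficients and component densities; the names and structure are illustrative only, not a standard API.

```python
import math
from typing import Callable, Sequence

def mixture_pdf(pis: Sequence[float],
                pdfs: Sequence[Callable[[float], float]]) -> Callable[[float], float]:
    """Return f(x) = sum_i pi_i * f_i(x), as in (1)."""
    if len(pis) != len(pdfs):
        raise ValueError("need one mixing coefficient per component density")
    return lambda x: sum(pi * fi(x) for pi, fi in zip(pis, pdfs))

def normal_pdf(mu: float, sigma: float) -> Callable[[float], float]:
    """Density of N(mu, sigma^2), used here only to build an example."""
    return lambda x: math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# The three-peak example from the build-up (sigma = sqrt(variance)).
f = mixture_pdf([0.50, 0.25, 0.25],
                [normal_pdf(5, 1), normal_pdf(-4, 1), normal_pdf(0, math.sqrt(1.5))])
print(f(5.0), f(-4.0), f(0.0))
```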
Explanation
For $f$ to be a probability density function, the sum of the $\pi_{i}$ must equal $1$. To see this, integrate both sides of $(1)$:
$$ \int f(x) dx = \int \left( \sum_{i=1}^{k} \pi_{i} f_{i}(x) \right) dx = \sum_{i=1}^{k} \pi_{i} \int f_{i}(x) dx = \sum_{i=1}^{k} \pi_{i} = 1 $$ $$ \implies \sum_{i=1}^{k} \pi_{i} = 1 $$
Additionally, since each $f_{i} \ge 0$, requiring $f \ge 0$ forces $\pi_{i} \ge 0$; combined with the sum condition above, each $\pi_{i}$ must satisfy $\pi_{i} \in [0, 1]$.
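A quick numerical check of this normalization for the three-peak example above, assuming SciPy is available:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

weights = [0.50, 0.25, 0.25]                    # sums to 1
means, scales = [5, -4, 0], [1, 1, np.sqrt(1.5)]

f = lambda x: sum(w * norm.pdf(x, m, s) for w, m, s in zip(weights, means, scales))

total, _ = quad(f, -30, 30)   # effectively the whole real line for this example
print(total)                  # ~1.0, precisely because the mixing coefficients sum to 1
```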
The $\pi_{i}$ can naturally be called probabilities, weights, and so on. For the probabilistic interpretation, see the Responsibility Values section below.
| Mixing Coefficients | Mixing Probabilities | Weights |
| --- | --- | --- |
| $\pi_{i}$ | $p_{i}$ | $w_{i}$ |
| $\sum\limits_{i=1}^{k} \pi_{i} = 1$ | $\sum\limits_{i=1}^{k} p_{i} = 1$ | $\sum\limits_{i=1}^{k} w_{i} = 1$ |
| $0 \le \pi_{i} \le 1$ | $0 \le p_{i} \le 1$ | $0 \le w_{i} \le 1$ |
Cumulative Distribution Function
If the random variable corresponding to the probability density function $f$ is denoted by $X$, the cumulative distribution function $F_{X}$ of $X$ is as follows.
$$ F_{X}(x) = \sum\limits_{i=1}^{k} \pi_{i} F_{i}(x) $$
Here, $F_{i}$ is the cumulative distribution function corresponding to $f_{i}$.
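A short sketch of this formula for the example mixture from the build-up, again assuming SciPy:

```python
import numpy as np
from scipy.stats import norm

weights = [0.50, 0.25, 0.25]
means, scales = [5, -4, 0], [1, 1, np.sqrt(1.5)]

def F(x):
    """Mixture CDF: F_X(x) = sum_i pi_i * F_i(x)."""
    return sum(w * norm.cdf(x, m, s) for w, m, s in zip(weights, means, scales))

print(F(-10.0), F(0.0), F(10.0))   # rises from ~0 through the mixture to ~1
```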
Expectation
Let the random variables corresponding to each $f_{i}$ be $X_{i}$. If the mean of $X_{i}$ is $\mu_{i} = E[X_{i}]$, the expectation of $X$ is as follows.
$$ \begin{align*} E[X] &= \int x f(x) dx = \int x\sum\limits_{i=1}^{k} \pi_{i}f_{i}(x)dx \\ &= \sum\limits_{i=1}^{k}\pi_{i} \int x f_{i}(x)dx = \sum\limits_{i=1}^{k}\pi_{i}\mu_{i} \\ &= \mu \end{align*} $$
In other words, $\mu = E[X]$ is the weighted average of $\mu_{i}$.
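For the example mixture, this weighted average can be computed directly. The Monte Carlo check below also uses the standard fact that a mixture can be sampled by first choosing a component with probability $\pi_{i}$ and then drawing from it (a minimal NumPy sketch):

```python
import numpy as np

weights = np.array([0.50, 0.25, 0.25])
means = np.array([5.0, -4.0, 0.0])
scales = np.sqrt([1.0, 1.0, 1.5])

mu = weights @ means            # E[X] = sum_i pi_i mu_i = 1.5 for this example
print(mu)

# Monte Carlo check: draw a component index with probability pi_i, then sample from it.
rng = np.random.default_rng(0)
idx = rng.choice(len(weights), size=200_000, p=weights)
samples = rng.normal(means[idx], scales[idx])
print(samples.mean())           # close to 1.5
```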
Variance
The variance of $X$ is as follows.
$$ \begin{align*} &\Var(X) \\ &= \int (x - \mu)^{2} f(x) dx = \int (x - \mu)^{2} \sum\limits_{i=1}^{k} \pi_{i}f_{i}(x)dx \\ &= \sum\limits_{i=1}^{k}\pi_{i} \int (x - \mu)^{2} f_{i}(x)dx \\ &= \sum\limits_{i=1}^{k}\pi_{i} \int \big[(x - \mu_{i}) + (\mu_{i} - \mu) \big]^{2} f_{i}(x)dx \\ &= \sum\limits_{i=1}^{k}\pi_{i} \left[ \int (x - \mu_{i})^{2} f_{i}(x)dx + 2(\mu_{i} - \mu) \int (x - \mu_{i}) f_{i}(x)dx \right. \\ &\qquad \qquad + \left. \int (\mu_{i} - \mu)^{2} f_{i}(x)dx \right] \\ &= \sum\limits_{i=1}^{k}\pi_{i} \int (x - \mu_{i})^{2} f_{i}(x)dx + \sum\limits_{i=1}^{k}\pi_{i} (\mu_{i} - \mu)^{2} \int f_{i}(x)dx \end{align*} $$
The last equality holds because the cross terms vanish: $\int (x - \mu_{i}) f_{i}(x)dx = 0$. Simplifying the remaining terms gives the following.
$$ \Var(X) = \sum\limits_{i=1}^{k}\pi_{i} \sigma_{i}^{2} + \sum\limits_{i=1}^{k}\pi_{i} (\mu_{i} - \mu)^{2} $$
Here, $\sigma_{i}^{2}$ is the variance of $X_{i}$.
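A sketch of this decomposition (within-component variances plus the spread of the component means) for the same example:

```python
import numpy as np

weights = np.array([0.50, 0.25, 0.25])
means = np.array([5.0, -4.0, 0.0])
variances = np.array([1.0, 1.0, 1.5])

mu = weights @ means                                       # mixture mean
var = weights @ variances + weights @ (means - mu) ** 2    # sum pi_i sigma_i^2 + sum pi_i (mu_i - mu)^2
print(var)                                                 # 15.375 for this example
```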
Responsibility Values
$$ \begin{array}{c} \sum\limits_{i=1}^{k} \pi_{i} = 1 \\[1em] \pi_{i} \ge 0 \quad \forall i \end{array} $$
Since the $\pi_{i}$ satisfy the defining properties of a probability, they can be interpreted as such, in which case they are called mixing probabilities. If $f(i) = \pi_{i}$ is regarded as the prior probability of selecting the $i$th component, then $f_{i}(x)$ can be viewed as the conditional probability density of $x$ given $i$, as follows.
$$ f_{i}(x) = f(x | i) $$
Then, by Bayes’ theorem, the posterior probability density $f(i | x)$ is as follows, and $\gamma_{i}(x)$ is called the responsibility.
$$ \begin{align*} \gamma_{i}(x) &= f(i | x) \\ &= \dfrac{f(i) f(x | i)}{\sum\limits_{j=1}^{k} f(j) f(x | j)} \\ &= \dfrac{f(i) f(x | i)}{f(x)} \end{align*} $$
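A minimal sketch computing these responsibilities for the example mixture, assuming SciPy; the denominator is simply the mixture density $f(x)$.

```python
import numpy as np
from scipy.stats import norm

weights = np.array([0.50, 0.25, 0.25])
means = np.array([5.0, -4.0, 0.0])
scales = np.sqrt([1.0, 1.0, 1.5])

def responsibilities(x):
    """gamma_i(x) = f(i) f(x | i) / f(x), computed for each component i."""
    joint = weights * norm.pdf(x, means, scales)   # f(i) f(x | i)
    return joint / joint.sum()                     # dividing by f(x) = sum_j f(j) f(x | j)

print(responsibilities(4.0))    # almost all responsibility on the component centered at 5
print(responsibilities(-3.0))   # most of the responsibility on the component centered at -4
```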