
Mutual Information


Definition[1][2]

Let $P_{X}$ and $P_{Y}$ be the probability mass functions of discrete random variables $X$ and $Y$, and let $P_{X,Y}$ be their joint probability mass function. The mutual information of $X$ and $Y$ is defined as follows:

$$ \begin{align*} I(X, Y) &:= D(P_{X,Y} \| P_{X} P_{Y}) \\ &= \sum\limits_{x \in X, y \in Y} P_{X,Y}(x,y) \log_{2} \left( \dfrac{P_{X,Y}(x,y)}{P_{X}(x)P_{Y}(y)} \right) \end{align*} $$

Here, $D$ is the relative entropy.

Explanation

The following notations are used:

$$ I(X, Y) = I(X : Y) = I(X ; Y) = H(X : Y) $$

$D(p \| q)$ measures how poor an approximation $q$ is when $p$ is the actual distribution. Thus, $I(X, Y) = D(P_{X,Y} \| P_{X} P_{Y})$ measures how bad the assumption that $X$ and $Y$ are independent is when the actual joint distribution is $P_{X,Y}$.
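As a quick sanity check of the definition, the following Python sketch computes $I(X, Y)$ in bits directly from a joint probability table via $D(P_{X,Y} \| P_{X} P_{Y})$. The function name `mutual_information` and the example joint pmf are illustrative choices, not taken from the references.

```python
import numpy as np

def mutual_information(p_xy: np.ndarray) -> float:
    """Mutual information I(X, Y) in bits, where p_xy[i, j] = P(X = x_i, Y = y_j)."""
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginal P_X as a column
    p_y = p_xy.sum(axis=0, keepdims=True)   # marginal P_Y as a row
    mask = p_xy > 0                          # terms with P_{X,Y}(x, y) = 0 contribute 0
    return float(np.sum(p_xy[mask] * np.log2(p_xy[mask] / (p_x * p_y)[mask])))

# Hypothetical example: X and Y agree with probability 0.8
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])
print(mutual_information(p_xy))  # ≈ 0.278 bits
```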

$I(X, Y)$ becomes smaller the closer $X$ and $Y$ are to being independent, so it can be understood as a quantity that measures how strongly $X$ and $Y$ depend on each other; for a bivariate normal $(X, Y)$, it reduces to a function of the correlation coefficient alone. As a simple example, suppose $(X, Y)$ follows a bivariate normal distribution with mean $(0, 0)$ and covariance matrix $\Sigma = \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix}$. Then, by Property 3 below and the (differential) entropy formula for the normal distribution, the mutual information of $X$ and $Y$ is:

$$ \begin{align*} I(X, Y) &= H(X) + H(Y) - H(X, Y) \\ &= \dfrac{1}{2}\ln(2\pi e) + \dfrac{1}{2}\ln(2\pi e) - \dfrac{1}{2}\ln[(2\pi e)^{2}(1-\rho^{2})] \\ &= \dfrac{1}{2}\ln(2\pi e)^{2} - \dfrac{1}{2}\ln[(2\pi e)^{2}(1-\rho^{2})] \\ &= - \dfrac{1}{2}\ln (1-\rho^{2}) \\ \end{align*} $$

Therefore, if $X$ and $Y$ are independent, then $\rho = 0$ and $I(X, Y) = 0$. Conversely, as the correlation becomes perfect, that is, as $\rho \to \pm 1$, $I(X, Y) \to \infty$.
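The closed form $-\frac{1}{2}\ln(1-\rho^{2})$ can also be checked numerically. The sketch below (assuming NumPy and SciPy are available; the grid limits and the value $\rho = 0.8$ are arbitrary choices for illustration) integrates $p(x,y)\ln\left[p(x,y)/(p(x)p(y))\right]$ on a grid and compares the result, in nats to match the $\ln$ used above, against the closed form.

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

rho = 0.8
closed_form = -0.5 * np.log(1 - rho**2)   # -1/2 ln(1 - rho^2), in nats

# Numerical check: integrate p(x, y) ln[ p(x, y) / (p(x) p(y)) ] on a grid
xs = np.linspace(-6, 6, 801)
dx = xs[1] - xs[0]
X, Y = np.meshgrid(xs, xs, indexing="ij")
p_xy = multivariate_normal([0, 0], [[1, rho], [rho, 1]]).pdf(np.dstack([X, Y]))
p_x, p_y = norm.pdf(X), norm.pdf(Y)
numeric = np.sum(p_xy * np.log(p_xy / (p_x * p_y))) * dx * dx

print(closed_form, numeric)   # both ≈ 0.511 nats for rho = 0.8
```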

Properties

  1. Symmetry $$ I(X, Y) = I(Y, X) $$ This is trivial by definition.

  2. Non-negativity $$ I(X, Y) \ge 0 $$ This is trivial since $D(p \| q) \ge 0$. Equality holds if and only if $X$ and $Y$ are independent.

  3. Relationship with Joint and Conditional Entropy

    $$ \begin{align*} I(X, Y) &= H(X) + H(Y) - H(X, Y) \\ &= H(X) - H(X | Y) \\ &= H(Y) - H(Y | X) \\ &= H(X, Y) - H(X | Y) - H(Y | X) \end{align*} $$

    Here, $H(X)$ is the entropy, $H(X, Y)$ is the joint entropy, and $H(X | Y)$ is the conditional entropy.
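These identities are easy to verify numerically for a discrete joint distribution. The sketch below (reusing the illustrative joint pmf from the earlier example) computes $H(X)$, $H(Y)$, $H(X, Y)$, obtains the conditional entropies via the chain rule, and confirms that all four expressions give the same value as the definition.

```python
import numpy as np

def H(p: np.ndarray) -> float:
    """Shannon entropy in bits of a pmf given as an array (zero entries ignored)."""
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

p_xy = np.array([[0.4, 0.1],     # illustrative joint pmf, as before
                 [0.1, 0.4]])
p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)

H_X, H_Y, H_XY = H(p_x), H(p_y), H(p_xy.ravel())
H_X_given_Y = H_XY - H_Y          # chain rule: H(X, Y) = H(Y) + H(X | Y)
H_Y_given_X = H_XY - H_X

print(H_X + H_Y - H_XY)                   # ≈ 0.278 bits
print(H_X - H_X_given_Y)                  # same value
print(H_Y - H_Y_given_X)                  # same value
print(H_XY - H_X_given_Y - H_Y_given_X)   # same value
```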


  1. Stephen M. Barnett, Quantum Information (2009), p. 11-12 ↩︎

  2. 김영훈·허재성, 양자 정보 이론 (Quantum Information Theory) (2020), p. 248-250 ↩︎