Mutual Information
Definition
Let $P_{X}$ and $P_{Y}$ be the probability mass functions of the discrete random variables $X$ and $Y$, and let $P_{X,Y}$ be their joint probability mass function. The mutual information of $X$ and $Y$ is defined as follows:
$$ \begin{align*} I(X, Y) &:= D(P_{X,Y} \| P_{X} P_{Y}) \\ &= \sum\limits_{x \in X, y \in Y} P_{X,Y}(x,y) \log_{2} \left( \dfrac{P_{X,Y}(x,y)}{P_{X}(x)P_{Y}(y)} \right) \end{align*} $$
Here, $D$ is the relative entropy.
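The definition can be evaluated directly for a discrete joint distribution. Below is a minimal sketch in Python that computes $I(X, Y)$ in bits from a joint PMF given as a 2D array; the function name `mutual_information` and the example PMF values are assumptions for illustration, not part of the original text.

```python
import numpy as np

def mutual_information(joint):
    """Mutual information I(X, Y) in bits from a joint PMF given as a 2D array."""
    joint = np.asarray(joint, dtype=float)
    p_x = joint.sum(axis=1)          # marginal P_X
    p_y = joint.sum(axis=0)          # marginal P_Y
    mi = 0.0
    for i, px in enumerate(p_x):
        for j, py in enumerate(p_y):
            pxy = joint[i, j]
            if pxy > 0:              # 0 * log(0) is treated as 0
                mi += pxy * np.log2(pxy / (px * py))
    return mi

# Hypothetical joint PMF: X and Y are clearly not independent.
P_XY = [[0.4, 0.1],
        [0.1, 0.4]]
print(mutual_information(P_XY))      # about 0.278 bits
```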
Explanation
The following notations are used:
$$ I(X, Y) = I(X : Y) = I(X ; Y) = H(X : Y) $$
$D(p \| q)$ measures how poor an estimate $q$ is when $p$ is the actual distribution. Thus, $I(X, Y) = D(P_{X,Y} \| P_{X} P_{Y})$ measures how bad the assumption that $X$ and $Y$ are independent is when $P_{X,Y}$ is the actual joint distribution.
$I(X, Y)$ becomes smaller as $X$ and $Y$ get closer to being independent, so it can be understood as a quantity that evaluates the dependence between $X$ and $Y$. For a simple example, suppose $(X, Y)$ follows a bivariate normal distribution with mean $(0, 0)$ and covariance matrix $\Sigma = \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix}$. Then, by the property below and the entropy formula for the normal distribution, the mutual information of $X$ and $Y$ is:
$$ \begin{align*} I(X, Y) &= H(X) + H(Y) - H(X, Y) \\ &= \dfrac{1}{2}\ln(2\pi e) + \dfrac{1}{2}\ln(2\pi e) - \dfrac{1}{2}\ln[(2\pi e)^{2}(1-\rho^{2})] \\ &= \dfrac{1}{2}\ln(2\pi e)^{2} - \dfrac{1}{2}\ln[(2\pi e)^{2}(1-\rho^{2})] \\ &= - \dfrac{1}{2}\ln (1-\rho^{2}) \\ \end{align*} $$
Therefore, if $X$ and $Y$ are independent, then $\rho = 0$ and $I(X, Y) = 0$. Conversely, if $X$ and $Y$ are perfectly correlated, that is, if $\rho = \pm 1$, then $I(X, Y) = \infty$.
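The closed form $I(X, Y) = -\frac{1}{2}\ln(1-\rho^{2})$ can be checked by integrating the relative entropy $D(P_{X,Y} \| P_{X} P_{Y})$ numerically. The sketch below does this for a standard bivariate normal, assuming SciPy is available; the function name `gaussian_mi_numeric` and the integration cutoff `lim` are choices made for this illustration.

```python
import numpy as np
from scipy.stats import norm, multivariate_normal
from scipy.integrate import dblquad

def gaussian_mi_numeric(rho, lim=8.0):
    """Numerically integrate D(P_XY || P_X P_Y) for a standard bivariate normal (in nats)."""
    joint = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, rho], [rho, 1.0]])

    def integrand(y, x):
        pxy = joint.pdf([x, y])
        px, py = norm.pdf(x), norm.pdf(y)
        return pxy * np.log(pxy / (px * py)) if pxy > 0 else 0.0

    # Integrate over a large finite box; the Gaussian tails outside it are negligible.
    val, _ = dblquad(integrand, -lim, lim, lambda x: -lim, lambda x: lim)
    return val

rho = 0.8
print(gaussian_mi_numeric(rho))          # numerical estimate, about 0.5108 nats
print(-0.5 * np.log(1 - rho**2))         # closed form: 0.5108...
```

Note that the continuous example is stated in nats (natural log), while the discrete definition above uses $\log_{2}$ and is therefore in bits.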
Properties
Symmetry $$ I(X, Y) = I(Y, X) $$ This is trivial by definition.
Non-negativity $$ I(X, Y) \ge 0 $$ This is trivial since $D(p \| q) \ge 0$. Equality holds if and only if $X$ and $Y$ are independent.
Relationship with Joint and Conditional Entropy
$$ \begin{align*} I(X, Y) &= H(X) + H(Y) - H(X, Y) \\ &= H(X) - H(X | Y) \\ &= H(Y) - H(Y | X) \\ &= H(X, Y) - H(X | Y) - H(Y | X) \end{align*} $$
Here, $H(X)$ is the entropy, $H(X, Y)$ is the joint entropy, and $H(X | Y)$ is the conditional entropy.
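These identities can be verified numerically for a discrete joint PMF, using the chain rule $H(X, Y) = H(Y) + H(X | Y)$ to obtain the conditional entropies. The sketch below is a minimal check, assuming the same hypothetical joint PMF as in the earlier example; the helper name `entropy` is not from the original text.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a probability array, ignoring zero entries."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Hypothetical joint PMF of (X, Y)
P_XY = np.array([[0.4, 0.1],
                 [0.1, 0.4]])
P_X, P_Y = P_XY.sum(axis=1), P_XY.sum(axis=0)

H_X, H_Y, H_XY = entropy(P_X), entropy(P_Y), entropy(P_XY)
H_X_given_Y = H_XY - H_Y                 # chain rule: H(X | Y) = H(X, Y) - H(Y)
H_Y_given_X = H_XY - H_X

print(H_X + H_Y - H_XY)                  # I(X, Y), about 0.278 bits
print(H_X - H_X_given_Y)                 # same value
print(H_Y - H_Y_given_X)                 # same value
print(H_XY - H_X_given_Y - H_Y_given_X)  # same value
```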