
Mutual Information


Definition [1] [2]

Let $P_{X}$, $P_{Y}$, and $P_{X,Y}$ be the probability mass functions of the discrete random variables $X$ and $Y$ and their joint probability mass function, respectively. The mutual information of $X$ and $Y$ is defined as follows:

$$
\begin{align*}
I(X, Y) &:= D(P_{X,Y} \| P_{X} P_{Y}) \\
&= \sum\limits_{x \in X, y \in Y} P_{X,Y}(x,y) \log_{2} \left( \dfrac{P_{X,Y}(x,y)}{P_{X}(x)P_{Y}(y)} \right)
\end{align*}
$$

Here, $D$ is the relative entropy.
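As a quick sanity check of the definition, here is a minimal sketch (using a hypothetical $2 \times 2$ joint PMF, not taken from the references) that computes $I(X, Y) = D(P_{X,Y} \| P_{X} P_{Y})$ in bits.

```python
# Minimal sketch: mutual information of a hypothetical 2x2 joint PMF,
# computed directly from the definition D(P_XY || P_X P_Y) in bits.
import numpy as np

# Hypothetical joint PMF P_XY(x, y); rows index x, columns index y.
P_XY = np.array([[0.4, 0.1],
                 [0.1, 0.4]])

# Marginals.
P_X = P_XY.sum(axis=1)   # P_X(x) = sum_y P_XY(x, y)
P_Y = P_XY.sum(axis=0)   # P_Y(y) = sum_x P_XY(x, y)

# Sum over pairs with P_XY(x, y) > 0.
I = 0.0
for x in range(P_XY.shape[0]):
    for y in range(P_XY.shape[1]):
        if P_XY[x, y] > 0:
            I += P_XY[x, y] * np.log2(P_XY[x, y] / (P_X[x] * P_Y[y]))

print(f"I(X, Y) = {I:.4f} bits")   # ~0.2781 bits for this joint PMF
```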

Explanation

The following notations are used:

$$
I(X, Y) = I(X : Y) = I(X ; Y) = H(X : Y)
$$

$D(p \| q)$ indicates how poor an estimate $q$ is when $p$ is the actual distribution. Thus, $I(X, Y) = D(P_{X,Y} \| P_{X} P_{Y})$ measures how bad the assumption that $X$ and $Y$ are independent is when $P_{X,Y}$ is the actual distribution.

$I(X, Y)$ becomes smaller the closer $X$ and $Y$ are to being independent, so, if $(X, Y)$ follows a normal distribution, it can be understood as a function that evaluates the correlation between $X$ and $Y$. For a simple example, suppose $(X, Y)$ follows a normal distribution with mean $(0, 0)$ and covariance matrix $\Sigma = \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix}$. Then, by the property below and the entropy formula for the normal distribution, the mutual information of $X, Y$ is:

$$
\begin{align*}
I(X, Y) &= H(X) + H(Y) - H(X, Y) \\
&= \dfrac{1}{2}\ln(2\pi e) + \dfrac{1}{2}\ln(2\pi e) - \dfrac{1}{2}\ln[(2\pi e)^{2}(1-\rho^{2})] \\
&= \dfrac{1}{2}\ln(2\pi e)^{2} - \dfrac{1}{2}\ln[(2\pi e)^{2}(1-\rho^{2})] \\
&= - \dfrac{1}{2}\ln (1-\rho^{2})
\end{align*}
$$

Therefore, if $X$ and $Y$ are independent, then $\rho = 0$ and $I(X, Y) = 0$. Conversely, if $X$ and $Y$ are perfectly correlated, that is, if $\rho = \pm 1$, then $I(X, Y) = \infty$.
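The following is a minimal sketch of this example under assumed parameters ($\rho = 0.8$, 40 histogram bins, 200,000 samples, not taken from the references): it compares the closed form $-\frac{1}{2}\ln(1-\rho^{2})$ with a crude histogram-based estimate obtained by plugging the binned joint PMF into the definition, using the natural log so that the result is in nats. The binned value only approximates the mutual information of the continuous pair.

```python
# Minimal sketch (assumed parameters): closed form -(1/2) ln(1 - rho^2) vs.
# a histogram-based estimate for a bivariate normal with correlation rho.
import numpy as np

rng = np.random.default_rng(0)
rho = 0.8
n_samples = 200_000
bins = 40

# Sample (X, Y) ~ N(0, [[1, rho], [rho, 1]]).
cov = np.array([[1.0, rho], [rho, 1.0]])
samples = rng.multivariate_normal([0.0, 0.0], cov, size=n_samples)

# Empirical joint PMF from a 2D histogram, plus its marginals.
joint, _, _ = np.histogram2d(samples[:, 0], samples[:, 1], bins=bins)
P_XY = joint / joint.sum()
P_X = P_XY.sum(axis=1, keepdims=True)
P_Y = P_XY.sum(axis=0, keepdims=True)

# Plug the binned PMF into the definition (natural log -> nats).
mask = P_XY > 0
I_binned = np.sum(P_XY[mask] * np.log(P_XY[mask] / (P_X @ P_Y)[mask]))

I_exact = -0.5 * np.log(1 - rho**2)
# The two values should land in the same ballpark; the binned estimate
# carries discretization and finite-sample error.
print(f"binned estimate: {I_binned:.3f} nats, closed form: {I_exact:.3f} nats")
```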

Properties

  1. Symmetry: $I(X, Y) = I(Y, X)$. This is trivial by definition.

  2. Non-negativity: $I(X, Y) \ge 0$. This is trivial since $D(p \| q) \ge 0$. Equality holds if and only if $X$ and $Y$ are independent.

  3. Relationship with Joint and Conditional Entropy

    $$
    \begin{align*}
    I(X, Y) &= H(X) + H(Y) - H(X, Y) \\
    &= H(X) - H(X | Y) \\
    &= H(Y) - H(Y | X) \\
    &= H(X, Y) - H(X | Y) - H(Y | X)
    \end{align*}
    $$

    Here, $H(X)$ is the entropy, $H(X, Y)$ is the joint entropy, and $H(X | Y)$ is the conditional entropy.
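As a numerical check of these identities, the sketch below (with a hypothetical $2 \times 3$ joint PMF, not from the references) evaluates each of the four expressions alongside the defining relative entropy; the conditional entropies are obtained from the chain rules $H(X|Y) = H(X,Y) - H(Y)$ and $H(Y|X) = H(X,Y) - H(X)$.

```python
# Minimal sketch: check that the four entropy expressions above agree with
# the definition I(X, Y) = D(P_XY || P_X P_Y) for a hypothetical joint PMF.
import numpy as np

def H(p):
    """Shannon entropy in bits of a PMF given as an array of probabilities."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

P_XY = np.array([[0.30, 0.10, 0.05],
                 [0.05, 0.20, 0.30]])   # hypothetical joint PMF of (X, Y)
P_X = P_XY.sum(axis=1)
P_Y = P_XY.sum(axis=0)

H_X, H_Y, H_XY = H(P_X), H(P_Y), H(P_XY.ravel())
H_X_given_Y = H_XY - H_Y            # chain rule: H(X|Y) = H(X,Y) - H(Y)
H_Y_given_X = H_XY - H_X            # chain rule: H(Y|X) = H(X,Y) - H(X)

# Direct definition: I(X, Y) = D(P_XY || P_X P_Y), in bits.
mask = P_XY > 0
I_def = np.sum(P_XY[mask] * np.log2(P_XY[mask] / np.outer(P_X, P_Y)[mask]))

print(I_def)
print(H_X + H_Y - H_XY)                  # = I(X, Y)
print(H_X - H_X_given_Y)                 # = I(X, Y)
print(H_Y - H_Y_given_X)                 # = I(X, Y)
print(H_XY - H_X_given_Y - H_Y_given_X)  # = I(X, Y)
```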


  1. Stephen M. Barnett, Quantum Information (2009), p11-12 ↩︎

  2. 김영훈·허재성, 양자 정보 이론 [Quantum Information Theory] (2020), p248-250 ↩︎