Mutual Information
Definition
Let $P_X$, $P_Y$ be the probability mass functions of discrete random variables $X$, $Y$, and let $P_{X,Y}$ be their joint probability mass function. The mutual information of $X$ and $Y$ is defined as follows:
$$ I(X,Y) := D\left( P_{X,Y} \,\|\, P_X P_Y \right) = \sum_{x \in \mathcal{X},\, y \in \mathcal{Y}} P_{X,Y}(x,y) \log_2 \left( \frac{P_{X,Y}(x,y)}{P_X(x) P_Y(y)} \right) $$
Here, D is the relative entropy.
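As a minimal sketch of this definition, the Python snippet below computes $I(X,Y)$ in bits for a small, made-up joint PMF. The matrix values and variable names (e.g. `P_XY`) are illustrative assumptions, not taken from the source.

```python
import numpy as np

# A hypothetical 2x2 joint PMF P_{X,Y}(x, y); rows index x, columns index y.
P_XY = np.array([[0.4, 0.1],
                 [0.1, 0.4]])

# Marginals P_X and P_Y.
P_X = P_XY.sum(axis=1)
P_Y = P_XY.sum(axis=0)

# Mutual information in bits: sum of P(x,y) * log2( P(x,y) / (P(x) P(y)) ).
# Terms with P(x,y) = 0 contribute 0 by convention.
I = 0.0
for i in range(P_XY.shape[0]):
    for j in range(P_XY.shape[1]):
        p = P_XY[i, j]
        if p > 0:
            I += p * np.log2(p / (P_X[i] * P_Y[j]))

print(I)  # approximately 0.278 bits for this particular joint PMF
```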
Explanation
The following notations are used:
$$ I(X,Y) = I(X:Y) = I(X;Y) = H(X:Y) $$
$D(p \,\|\, q)$ measures how poor an estimate $q$ is when $p$ is the actual distribution. Thus, $I(X,Y) = D(P_{X,Y} \,\|\, P_X P_Y)$ measures how bad the assumption that $X$ and $Y$ are independent (i.e., that the joint distribution is $P_X P_Y$) is when $P_{X,Y}$ is the actual distribution.
$I(X,Y)$ becomes smaller as $X$ and $Y$ are closer to being independent, so if $(X,Y)$ follows a normal distribution, it can be understood as a function that evaluates the correlation between $X$ and $Y$. For a simple example, suppose $(X,Y)$ follows a bivariate normal distribution with mean $(0,0)$ and covariance matrix $\Sigma = \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix}$. Then, by the property below and the entropy formula for the normal distribution, the mutual information of $X, Y$ is:
$$
\begin{aligned}
I(X,Y) &= H(X) + H(Y) - H(X,Y) \\
&= \frac{1}{2}\ln(2\pi e) + \frac{1}{2}\ln(2\pi e) - \frac{1}{2}\ln\left[ (2\pi e)^2 (1-\rho^2) \right] \\
&= \frac{1}{2}\ln(2\pi e)^2 - \frac{1}{2}\ln\left[ (2\pi e)^2 (1-\rho^2) \right] \\
&= -\frac{1}{2}\ln(1-\rho^2)
\end{aligned}
$$
Therefore, if $X$ and $Y$ are independent, then $\rho = 0$ and $I(X,Y) = 0$. Conversely, if $X$ and $Y$ are perfectly correlated, that is, if $\rho = \pm 1$, then $I(X,Y) = \infty$.
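As a rough numerical check of this closed form, the sketch below draws samples from a bivariate normal with correlation $\rho = 0.8$ and compares a crude histogram-based (plug-in) estimate of $I(X,Y)$ against $-\frac{1}{2}\ln(1-\rho^2)$. The sample size, bin count, and seed are arbitrary illustrative choices, and binning makes the estimate only approximate.

```python
import numpy as np

rng = np.random.default_rng(0)
rho = 0.8

# Draw samples from a bivariate normal with unit variances and correlation rho.
cov = np.array([[1.0, rho], [rho, 1.0]])
samples = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=200_000)

# Crude plug-in estimate of I(X,Y) in nats from a 2-D histogram.
counts, _, _ = np.histogram2d(samples[:, 0], samples[:, 1], bins=60)
p_xy = counts / counts.sum()
p_x = p_xy.sum(axis=1, keepdims=True)
p_y = p_xy.sum(axis=0, keepdims=True)
mask = p_xy > 0
I_est = np.sum(p_xy[mask] * np.log(p_xy[mask] / (p_x @ p_y)[mask]))

I_exact = -0.5 * np.log(1 - rho**2)
print(I_est, I_exact)  # I_est should land near I_exact ≈ 0.511 nats for rho = 0.8
```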
Properties
Symmetry
$$ I(X,Y) = I(Y,X) $$
This is trivial by definition.
Non-negativity
$$ I(X,Y) \geq 0 $$
This is trivial since $D(p \,\|\, q) \geq 0$. Equality holds if and only if $X$ and $Y$ are independent.
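To see the equality case concretely: if $X$ and $Y$ are independent, then $P_{X,Y}(x,y) = P_X(x) P_Y(y)$, so every term of the defining sum vanishes.

$$ I(X,Y) = \sum_{x,y} P_X(x) P_Y(y) \log_2 \frac{P_X(x) P_Y(y)}{P_X(x) P_Y(y)} = \sum_{x,y} P_X(x) P_Y(y) \log_2 1 = 0 $$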
Relationship with Joint and Conditional Entropy
$$
\begin{aligned}
I(X,Y) &= H(X) + H(Y) - H(X,Y) \\
&= H(X) - H(X \mid Y) \\
&= H(Y) - H(Y \mid X) \\
&= H(X,Y) - H(X \mid Y) - H(Y \mid X)
\end{aligned}
$$
Here, $H(X)$ is the entropy, $H(X,Y)$ is the joint entropy, and $H(X \mid Y)$ is the conditional entropy.
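As an illustrative check of these identities, the sketch below evaluates each expression (in bits) for a small, made-up joint PMF. The helper name `H` and the matrix values are assumptions made for this example only.

```python
import numpy as np

def H(p):
    """Shannon entropy in bits of a probability array (zero entries contribute 0)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# A hypothetical joint PMF P_{X,Y}; rows index x, columns index y.
P_XY = np.array([[0.3, 0.2],
                 [0.1, 0.4]])
P_X, P_Y = P_XY.sum(axis=1), P_XY.sum(axis=0)

# Conditional entropies computed directly: H(X|Y) = sum_y P_Y(y) * H(P_{X|Y=y}).
H_X_given_Y = sum(P_Y[j] * H(P_XY[:, j] / P_Y[j]) for j in range(len(P_Y)))
H_Y_given_X = sum(P_X[i] * H(P_XY[i, :] / P_X[i]) for i in range(len(P_X)))

# All four expressions for I(X,Y) should agree.
I1 = H(P_X) + H(P_Y) - H(P_XY)
I2 = H(P_X) - H_X_given_Y
I3 = H(P_Y) - H_Y_given_X
I4 = H(P_XY) - H_X_given_Y - H_Y_given_X
print(I1, I2, I3, I4)  # all four agree (about 0.125 bits for this PMF)
```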