Relative Entropy (Kullback-Leibler Divergence) in Classical Information Theory
For a discrete random variable X with probability mass functions p and q, the relative entropy of p with respect to q is defined as follows:
$$
D(p \| q) := \sum_{x} p(x) \log_2 \frac{p(x)}{q(x)} \tag{1}
$$
In this case, when $q(x) = 0$ and $p(x) > 0$, the convention $p(x)\log_2 \frac{p(x)}{0} := \infty$ is used.
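As a minimal numerical sketch of definition (1), assuming NumPy (the helper name `kl_divergence` is chosen here only for illustration, and the standard convention that terms with p(x) = 0 contribute 0 is also applied):

```python
import numpy as np

def kl_divergence(p, q):
    """Relative entropy D(p || q) in bits for discrete pmfs p and q."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0                      # terms with p(x) = 0 contribute 0 (standard convention)
    if np.any(q[mask] == 0):          # p(x) > 0 but q(x) = 0: the divergence is infinite
        return np.inf
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

print(kl_divergence([0.5, 0.25, 0.25], [0.25, 0.25, 0.5]))  # 0.25 bits
```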
For continuous random variables, it is defined through integration:

$$
D(p \| q) := \int p(x) \ln \frac{p(x)}{q(x)} \, dx
$$
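For the continuous case, a small sketch (assuming SciPy is available; the two normal distributions below are chosen only for illustration) compares numerical integration of this definition against the known closed form for two Gaussians:

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

# Illustrative densities: p = N(0, 1), q = N(1, 2^2); divergence in nats (natural log).
p = lambda x: norm.pdf(x, loc=0.0, scale=1.0)
q = lambda x: norm.pdf(x, loc=1.0, scale=2.0)

numerical, _ = quad(lambda x: p(x) * np.log(p(x) / q(x)), -20, 20)

# Known closed form for two normals: ln(s2/s1) + (s1^2 + (m1 - m2)^2) / (2 s2^2) - 1/2
closed_form = np.log(2.0) + (1.0 + 1.0) / (2 * 4.0) - 0.5

print(numerical, closed_form)  # both ≈ 0.443 nats
```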
The expected value form is as follows:
$$
D(p \| q) = E_p \left[ \log \frac{p(X)}{q(X)} \right]
$$
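Because the divergence is an expectation under p, it can also be estimated by Monte Carlo sampling from p. The sketch below (assuming NumPy, reusing the same two normal distributions as in the integration example above, purely for illustration) averages log p(X)/q(X) over samples X ~ p:

```python
import numpy as np

rng = np.random.default_rng(0)

def log_pdf_normal(x, mu, sigma):
    """Log density of N(mu, sigma^2), written out to keep the sketch self-contained."""
    return -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)

# Samples X ~ p with p = N(0, 1); q = N(1, 2^2) as in the integration example.
x = rng.normal(0.0, 1.0, size=1_000_000)
estimate = np.mean(log_pdf_normal(x, 0.0, 1.0) - log_pdf_normal(x, 1.0, 2.0))

print(estimate)  # ≈ 0.443 nats, in agreement with the closed form above
```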
Explanation
Relative entropy is also referred to as Kullback-Leibler divergence and is denoted using the following notations:
$$
D(p \| q) = D_{\mathrm{KL}}(p \| q) = H(p \| q)
$$
When the actual distribution of X is p but we assume it to be q, D(p∥q) measures how poorly q represents p, in other words the degree of dissimilarity between q and p. Since −log q is the information content under q, definition (1) represents the average difference in information content between q and p:
$$
\begin{aligned}
\sum_x p(x) \log_2 \frac{p(x)}{q(x)} &= \sum_x p(x) \left[ -\log_2 q(x) - \bigl( -\log_2 p(x) \bigr) \right] \\
&= \sum_x p(x) \left[ I\bigl(q(x)\bigr) - I\bigl(p(x)\bigr) \right] \\
&= E\left[ I(q) - I(p) \right]
\end{aligned}
$$
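Numerically, this is the average information content under q minus the average information content under p. A small check, assuming NumPy, with pmfs chosen only for illustration:

```python
import numpy as np

p = np.array([0.5, 0.25, 0.25])
q = np.array([0.25, 0.25, 0.5])

def info(r):
    return -np.log2(r)               # information content I(r) = -log2(r)

avg_info_q = np.sum(p * info(q))     # E[I(q)] under p
avg_info_p = np.sum(p * info(p))     # E[I(p)] under p
kl = np.sum(p * np.log2(p / q))      # definition (1)

print(kl, avg_info_q - avg_info_p)   # both 0.25 bits
```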
Properties
Non-symmetry
$$
D(p \| q) \ne D(q \| p)
$$
Non-negativity
$$
D(p \| q) \ge 0
$$
The equality holds when p=q.
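Both properties can be checked numerically. A brief sketch assuming NumPy, with the helper name and the two pmfs below chosen only for illustration:

```python
import numpy as np

def kl_bits(p, q):
    """D(p || q) in bits for strictly positive pmfs (illustrative helper)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.sum(p * np.log2(p / q))

p = np.array([0.9, 0.1])
q = np.array([0.5, 0.5])

print(kl_bits(p, q), kl_bits(q, p))  # ≈ 0.531 vs ≈ 0.737 bits: not symmetric
print(kl_bits(p, q) >= 0)            # True: non-negative
print(kl_bits(p, p))                 # 0.0: vanishes when the two distributions coincide
```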
Proof
Non-negativity:
By definition, if p = q then D(p∥q) = 0, so let us consider the case p ≠ q.
$$
\begin{aligned}
-D(p \| q) = \sum_x p(x) \log_2 \frac{q(x)}{p(x)} &\le \log_2 \left( \sum_x p(x) \frac{q(x)}{p(x)} \right) \\
&= \log_2 \left( \sum_x q(x) \right) = \log_2 1 = 0
\end{aligned}
$$
The inequality holds because the logarithm is a concave function, so it follows from Jensen's inequality; a small numerical check of this step is given after the proof.
Jensen’s Inequality
If f is a concave function and the weights satisfy λ_k ≥ 0 with ∑_{k=1}^{n} λ_k = 1, then the following holds:

$$
f\left( \sum_{k=1}^{n} \lambda_k x_k \right) \ge \sum_{k=1}^{n} \lambda_k f(x_k)
$$
Therefore, by multiplying both sides by negative one, we have:
$$
0 \le D(p \| q)
$$
■
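As a quick sanity check of the Jensen step used in the proof, the following sketch (assuming NumPy; the weights and points are chosen arbitrarily) verifies the inequality for the concave function log2:

```python
import numpy as np

# Jensen's inequality for the concave function f = log2:
# f(sum of lam_k * x_k) >= sum of lam_k * f(x_k).
lam = np.array([0.2, 0.3, 0.5])   # non-negative weights summing to 1
x = np.array([1.0, 4.0, 9.0])     # arbitrary positive points

lhs = np.log2(np.sum(lam * x))    # ≈ 2.56
rhs = np.sum(lam * np.log2(x))    # ≈ 2.18

print(lhs >= rhs)                 # True
```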