Relative Entropy (Kullback-Leibler Divergence) in Classical Information Theory
Given probability mass functions p, q of a discrete random variable, the relative entropy of q with respect to p is defined as follows.
$$ D(p\|q) := \sum p(x) \log_{2} \frac{p(x)}{q(x)} \tag{1} $$
When q = 0, it is defined as $p \log_{2} \frac{p}{0} := \infty$.
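To make the definition concrete, here is a minimal Python sketch (the function name kl_divergence and the two example coins are ours, chosen only for illustration) that computes D(p‖q) for finite pmfs using the conventions above.

```python
import numpy as np

def kl_divergence(p, q, base=2):
    """Relative entropy D(p||q) = sum_x p(x) log(p(x)/q(x)) for finite pmfs."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue             # convention: 0 * log(0/q) := 0
        if qi == 0.0:
            return float("inf")  # convention: p * log(p/0) := infinity
        total += pi * np.log(pi / qi) / np.log(base)
    return total

# Example: fair coin p versus biased coin q (base 2, i.e. in bits)
p = [0.5, 0.5]
q = [0.9, 0.1]
print(kl_divergence(p, q))  # ≈ 0.737
print(kl_divergence(q, p))  # ≈ 0.531
```

Note that the two directions give different values, anticipating the non-symmetry property discussed below.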
For continuous random variables, it is defined by integration.
$$ D(p\|q) := \int p(x) \ln \frac{p(x)}{q(x)} \, dx $$
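For the continuous case, the following sketch (SciPy is assumed available; the two normal densities are arbitrary choices for illustration) integrates the definition numerically and compares the result with the well-known closed form for the KL divergence between two normal distributions.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# Illustrative densities: p = N(0, 1), q = N(1, 2^2)
p = norm(loc=0.0, scale=1.0)
q = norm(loc=1.0, scale=2.0)

# Numerical value of D(p||q) = ∫ p(x) ln(p(x)/q(x)) dx  (in nats)
integrand = lambda x: p.pdf(x) * np.log(p.pdf(x) / q.pdf(x))
numeric, _ = quad(integrand, -10, 10)

# Closed form for two normals:
# D(p||q) = ln(s_q/s_p) + (s_p^2 + (m_p - m_q)^2) / (2 s_q^2) - 1/2
closed = np.log(2.0) + (1.0 + (0.0 - 1.0) ** 2) / (2 * 2.0 ** 2) - 0.5

print(numeric, closed)  # both ≈ 0.443
```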
Description
Relative entropy is also known as the Kullback-Leibler divergence (KLD) and is denoted by the following notations.
$$ D(p\|q) = D_{KL}(p\|q) = H(p\|q) $$
D(p‖q) measures how inadequate it is to assume that the distribution of X is q when the actual distribution of X is p; in other words, it measures how much q differs from p. Since −log q represents the information content of q, definition (1) is the average difference in information content between q and p.
$$ \sum p(x) \log_{2} \frac{p(x)}{q(x)} = \sum p(x) \Big[ -\log_{2} q(x) - \big( -\log_{2} p(x) \big) \Big] = \sum p(x) \Big[ I\big(q(x)\big) - I\big(p(x)\big) \Big] = E\big[ I(q) - I(p) \big] $$
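This reading of D(p‖q) as an average difference of information contents can be checked numerically; the sketch below (with arbitrary three-point pmfs and variable names of our own choosing) compares definition (1) with E[I(q) − I(p)] computed under p.

```python
import numpy as np

# Arbitrary three-point pmfs, used only as an example
p = np.array([0.2, 0.5, 0.3])
q = np.array([0.1, 0.6, 0.3])

info_p = -np.log2(p)  # I(p(x)) = -log2 p(x)
info_q = -np.log2(q)  # I(q(x)) = -log2 q(x)

kl_direct = np.sum(p * np.log2(p / q))      # definition (1)
kl_as_info = np.sum(p * (info_q - info_p))  # E[I(q) - I(p)] under p

print(np.isclose(kl_direct, kl_as_info))    # True
```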
Properties
Non-symmetry
$$ D(p\|q) \ne D(q\|p) $$
Non-negativity
$$ D(p\|q) \ge 0 $$
Equality holds when p=q.
Proof
2.
If p = q, then by definition D(p‖q) = 0, so consider the case p ≠ q.
$$ -D(p\|q) = \sum p(x) \log_{2} \frac{q(x)}{p(x)} \le \log_{2} \left( \sum p(x) \frac{q(x)}{p(x)} \right) = \log_{2} \left( \sum q(x) \right) = \log_{2} 1 = 0 $$
The inequality holds by Jensen's inequality, since the logarithm is a concave function.
Jensen’s inequality
If f is a concave function, then for λ_k ≥ 0 with $\sum_{k=1}^{n} \lambda_{k} = 1$, the following holds.
$$ f\left( \sum_{k=1}^{n} \lambda_{k} x_{k} \right) \ge \sum_{k=1}^{n} \lambda_{k} f(x_{k}) $$
Therefore, multiplying both sides by −1,
$$ 0 \le D(p\|q) $$
■
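Non-negativity (and equality at p = q) can also be checked numerically; the sketch below (random Dirichlet-distributed pmfs, helper name kl of our own choosing) is illustrative rather than a proof.

```python
import numpy as np

rng = np.random.default_rng(0)

def kl(p, q):
    # D(p||q) in bits, for strictly positive pmfs
    return np.sum(p * np.log2(p / q))

# Non-negativity on randomly drawn pmfs (tolerance only for floating-point error)
for _ in range(10_000):
    p = rng.dirichlet(np.ones(4))
    q = rng.dirichlet(np.ones(4))
    assert kl(p, q) >= -1e-12

# Equality when p = q
p = rng.dirichlet(np.ones(4))
print(kl(p, p))  # 0.0
```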