
Relative Entropy (Kullback-Leibler Divergence) in Classical Information Theory

For a discrete random variable $X$ and probability mass functions $p$ and $q$, the relative entropy of $p$ with respect to $q$ is defined as follows:

$$ D(p \| q) := \sum p(x) \log_{2} \dfrac{p(x)}{q(x)} \tag{1} $$

Here, for $p \ne 0$, we adopt the convention $p \log_{2} \dfrac{p}{0} := \infty$.
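As a quick illustration of definition $(1)$ and the convention above, here is a minimal Python sketch; the function name `kl_divergence`, the handling of zero probabilities, and the coin example are our own illustration, not part of the source.

```python
import math

def kl_divergence(p, q):
    """Discrete relative entropy D(p || q) in bits, following definition (1).

    p, q: sequences of probabilities over the same outcomes.
    A term with p(x) = 0 contributes 0 (the usual convention), and
    p(x) > 0 with q(x) = 0 gives infinity, per p * log2(p / 0) := inf.
    """
    total = 0.0
    for px, qx in zip(p, q):
        if px == 0:
            continue                 # 0 * log2(0 / q) treated as 0
        if qx == 0:
            return math.inf          # p * log2(p / 0) := inf
        total += px * math.log2(px / qx)
    return total

# Example: a fair coin p versus a biased coin q
p = [0.5, 0.5]
q = [0.9, 0.1]
print(kl_divergence(p, q))   # ≈ 0.737 bits
```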

For continuous random variables, it is defined through integration:

$$ D(p \| q) := \int p(x) \ln \dfrac{p(x)}{q(x)} \, dx $$
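The integral can be checked numerically for a concrete case. The sketch below is our own example, not from the source: it estimates $D(p \| q)$ in nats for two Gaussian densities with a simple trapezoidal rule and compares the result with the well-known closed form for Gaussians.

```python
import numpy as np

# D(p || q) in nats for p = N(0, 1) and q = N(1, 2^2), estimated by
# integrating p(x) * ln(p(x)/q(x)) on a grid, then compared with the
# known closed form  ln(s2/s1) + (s1^2 + (m1 - m2)^2) / (2 s2^2) - 1/2.
m1, s1 = 0.0, 1.0   # parameters of p
m2, s2 = 1.0, 2.0   # parameters of q

def gauss(x, m, s):
    return np.exp(-(x - m) ** 2 / (2 * s ** 2)) / (s * np.sqrt(2 * np.pi))

x = np.linspace(-15, 15, 150_001)
p, q = gauss(x, m1, s1), gauss(x, m2, s2)

numeric = np.trapz(p * np.log(p / q), x)
closed = np.log(s2 / s1) + (s1 ** 2 + (m1 - m2) ** 2) / (2 * s2 ** 2) - 0.5
print(numeric, closed)   # both ≈ 0.443
```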

The expected value form is as follows:

$$ D(p \| q) = E_{p} \left[ \log \dfrac{p(X)}{q(X)} \right] $$
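Since the divergence is an expectation under $p$, it can also be estimated by Monte Carlo sampling: draw $X \sim p$ and average the log-ratio. The toy check below is our own example, reusing the coin distributions from the earlier sketch.

```python
import math
import random

# Monte Carlo estimate of D(p || q) = E_p[ log2 p(X)/q(X) ] (in bits):
# sample X from p and average the log-ratio of the two pmfs.
p = {"heads": 0.5, "tails": 0.5}
q = {"heads": 0.9, "tails": 0.1}

random.seed(0)
n = 200_000
samples = random.choices(list(p), weights=list(p.values()), k=n)
estimate = sum(math.log2(p[x] / q[x]) for x in samples) / n
print(estimate)   # close to the exact value ≈ 0.737 bits
```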

Explanation

Relative entropy is also referred to as Kullback-Leibler divergence and is denoted using the following notations:

$$ D(p \| q) = D_{\text{KL}}(p \| q) = H(p \| q) $$

$D(p \| q)$ measures, when we model the distribution of $X$ as $q$ while its actual distribution is $p$, how poorly $q$ represents $p$; in other words, the degree of dissimilarity between $q$ and $p$. Since $-\log q$ is the information content of $q$, definition $(1)$ is the average difference in information between $q$ and $p$:

$$
\begin{align*}
\sum p(x) \log_{2} \dfrac{p(x)}{q(x)} &= \sum p(x) \big[ -\log_{2} q(x) - (-\log_{2} p(x)) \big] \\
&= \sum p(x) \big[ I(q(x)) - I(p(x)) \big] \\
&= E \big[ I(q) - I(p) \big]
\end{align*}
$$
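This identity is easy to verify numerically. The short check below is our own example, with $I(r) = -\log_{2} r$, computing both sides for the coin distributions used above.

```python
import math

# Check that D(p || q) equals the average information difference
# E_p[ I(q(X)) - I(p(X)) ], where I(r) = -log2(r) is the information content.
p = [0.5, 0.5]
q = [0.9, 0.1]

def I(r):
    return -math.log2(r)   # information content in bits

kl = sum(px * math.log2(px / qx) for px, qx in zip(p, q))
diff = sum(px * (I(qx) - I(px)) for px, qx in zip(p, q))
print(kl, diff)   # both ≈ 0.737 bits
```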

Properties

  1. Non-symmetry: $D(p \| q) \ne D(q \| p)$

  2. Non-negativity: $D(p \| q) \ge 0$, with equality if and only if $p = q$ (a numerical illustration follows this list).
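The following sketch, again our own example, illustrates both properties numerically: the divergence changes when the arguments are swapped, and it vanishes when the two distributions coincide.

```python
import math

def kl(p, q):
    # Discrete D(p || q) in bits; assumes all probabilities are strictly positive.
    return sum(px * math.log2(px / qx) for px, qx in zip(p, q))

p = [0.5, 0.5]
q = [0.9, 0.1]

print(kl(p, q), kl(q, p))   # ≈ 0.737 vs ≈ 0.531, so not symmetric
print(kl(p, p))             # 0.0, the equality case of non-negativity
```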

Proof

2.

By definition, if $p = q$ then $D(p \| q) = 0$, so consider the case $p \ne q$.

$$
\begin{align*}
-D(p \| q) &= \sum p(x) \log_{2} \dfrac{q(x)}{p(x)} \\
&\le \log_{2} \left( \sum p(x) \dfrac{q(x)}{p(x)} \right) \\
&= \log_{2} \left( \sum q(x) \right) \\
&= \log_{2} 1 \\
&= 0
\end{align*}
$$

The inequality holds because the logarithm is concave, so it follows from Jensen's inequality.

Jensen’s Inequality

If $f$ is a concave function and $\lambda_{k} \ge 0$ with $\sum_{k=1}^{n} \lambda_{k} = 1$, then

$$ f\left( \sum_{k=1}^{n} \lambda_{k} x_{k} \right) \ge \sum_{k=1}^{n} \lambda_{k} f(x_{k}) $$

Therefore, by multiplying both sides by negative one, we have:

$$ 0 \le D(p \| q) $$
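As a concrete check of the Jensen step used above, the sketch below (our own example) verifies that $\sum p(x) \log_{2} \frac{q(x)}{p(x)} \le \log_{2} \left( \sum p(x) \frac{q(x)}{p(x)} \right) = \log_{2} 1 = 0$ for the coin distributions.

```python
import math

# Numerical check of the Jensen step: for the concave function log2,
# sum_x p(x) * log2(q(x)/p(x))  <=  log2( sum_x p(x) * q(x)/p(x) ) = log2(1) = 0.
p = [0.5, 0.5]
q = [0.9, 0.1]

lhs = sum(px * math.log2(qx / px) for px, qx in zip(p, q))
rhs = math.log2(sum(px * (qx / px) for px, qx in zip(p, q)))
print(lhs, rhs)   # lhs ≈ -0.737 <= rhs = 0.0, i.e. -D(p || q) <= 0
```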

See Also