Relative Entropy (Kullback-Leibler Divergence) in Classical Information Theory

Let $p$ and $q$ be probability mass functions of a discrete random variable $X$. The relative entropy of $p$ with respect to $q$ is defined as follows.

$$ D(p \| q) := \sum p(x) \log_{2} \dfrac{p(x)}{q(x)} \tag{1} $$

For $p \ne 0$, we set $p \log_{2}(\frac{p}{0}) := \infty$. Terms with $p(x) = 0$ contribute nothing, by the convention $0 \log_{2}(\frac{0}{q}) := 0$.
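Definition $(1)$ can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `kl_divergence` and the sample distributions are assumptions made for the example, and the PMFs are assumed to be given as equal-length lists over the same outcomes.

```python
import math

def kl_divergence(p, q):
    """D(p || q) in bits for two PMFs given as equal-length sequences."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for px, qx in zip(p, q):
        if px == 0:
            continue  # standard convention: 0 * log(0/q) = 0
        if qx == 0:
            return math.inf  # p * log(p/0) is defined as infinity for p != 0
        total += px * math.log2(px / qx)
    return total

# Hypothetical example distributions on two outcomes.
p = [0.5, 0.5]
q = [0.25, 0.75]
d_finite = kl_divergence(p, q)             # roughly 0.2075 bits
d_infinite = kl_divergence(p, [1.0, 0.0])  # q assigns 0 where p does not
print(d_finite, d_infinite)
```

The second call returns infinity because $q$ gives probability zero to an outcome that $p$ considers possible.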

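For continuous random variables with probability density functions $p$ and $q$, it is defined by an integral. Using the natural logarithm instead of $\log_{2}$ only changes the unit from bits to nats.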

$$ D(p \| q) := \int p(x) \ln \dfrac{p(x)}{q(x)} dx $$
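As a rough numerical illustration of the integral form, the sketch below approximates $D(p \| q)$ for two normal densities with a midpoint rule and compares it with the well-known closed form for normal distributions. The helper names, the integration range $[-10, 10]$, and the choice of $N(0,1)$ and $N(1,1)$ are all assumptions made for this example.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of the normal distribution N(mu, sigma^2)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def kl_continuous(p, q, lo, hi, steps=20000):
    """Midpoint-rule approximation of the integral of p ln(p/q) over [lo, hi], in nats."""
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        px, qx = p(x), q(x)
        if px == 0:
            continue  # a zero-density point contributes nothing
        if qx == 0:
            return math.inf
        total += px * math.log(px / qx) * dx
    return total

def kl_normal_closed_form(mu1, s1, mu2, s2):
    """Closed form of D(N(mu1, s1^2) || N(mu2, s2^2)) in nats."""
    return math.log(s2 / s1) + (s1 ** 2 + (mu1 - mu2) ** 2) / (2 * s2 ** 2) - 0.5

approx = kl_continuous(lambda x: normal_pdf(x, 0.0, 1.0),
                       lambda x: normal_pdf(x, 1.0, 1.0), -10.0, 10.0)
exact = kl_normal_closed_form(0.0, 1.0, 1.0, 1.0)
print(approx, exact)  # both close to 0.5 nats
```

Truncating the range to $[-10, 10]$ is harmless here because both densities are negligible outside it. Heavier-tailed densities would need a wider range or a proper quadrature routine.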

Description

Relative entropy is also known as the Kullback-Leibler divergence (KLd) and is denoted by any of the following notations.

$$ D(p \| q) = D_{\text{KL}}(p \| q) = H(p \| q) $$

$D(p \| q)$ measures how poorly $q$ describes $X$ when the actual distribution of $X$ is $p$; in other words, how much $q$ differs from $p$. Since $-\log_{2} q(x)$ is the information content of the outcome $x$ under $q$, definition $(1)$ is the average, taken with respect to $p$, of the difference in information content between $q$ and $p$.

$$ \begin{align*} \sum p(x) \log_{2} \dfrac{p(x)}{q(x)} &= \sum p(x) \big[ -\log_{2}q(x) - (-\log_{2}p(x)) \big] \\ &= \sum p(x) \big[ I(q(x)) - I(p(x)) \big] \\ &= E_{p} \big[ I(q) - I(p) \big] \end{align*} $$
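The identity above can be checked numerically. The sketch below computes the expected information gap $E_{p}[I(q) - I(p)]$ directly and compares it with the difference between the cross-entropy of $q$ under $p$ and the entropy of $p$. The function names and the sample distributions are assumptions for illustration, and $q(x) > 0$ is assumed wherever $p(x) > 0$.

```python
import math

def information(prob):
    """Information content I(prob) = -log2(prob), in bits."""
    return -math.log2(prob)

def expected_information_gap(p, q):
    """E_p[I(q) - I(p)]; assumes q(x) > 0 wherever p(x) > 0."""
    return sum(px * (information(qx) - information(px))
               for px, qx in zip(p, q) if px > 0)

def entropy(p):
    """Shannon entropy of p, i.e. E_p[I(p)]."""
    return sum(px * information(px) for px in p if px > 0)

def cross_entropy(p, q):
    """E_p[I(q)], the expected information content of q under p."""
    return sum(px * information(qx) for px, qx in zip(p, q) if px > 0)

# Hypothetical example distributions on three outcomes.
p = [0.5, 0.25, 0.25]
q = [0.25, 0.25, 0.5]
gap = expected_information_gap(p, q)
difference = cross_entropy(p, q) - entropy(p)
print(gap, difference)  # both equal 0.25 bits for these inputs
```

For these inputs the entropy of $p$ is 1.5 bits and the cross-entropy is 1.75 bits, so describing $X$ with $q$ costs 0.25 extra bits on average.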

Properties

  1. Non-symmetry: in general, $$ D(p \| q) \ne D(q \| p) $$

  2. Non-negativity $$ D(p \| q) \ge 0 $$ Equality holds if and only if $p = q$.
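Both properties can be observed on small examples. The following sketch evaluates the divergence in both directions for one pair of distributions, then spot-checks non-negativity on randomly generated PMFs. The helper names, the fixed seed, and the sample sizes are assumptions for the example, and a random check is of course not a proof.

```python
import math
import random

def kl_divergence(p, q):
    """D(p || q) in bits; assumes q(x) > 0 wherever p(x) > 0."""
    return sum(px * math.log2(px / qx) for px, qx in zip(p, q) if px > 0)

def random_pmf(rng, n):
    """Hypothetical helper: a strictly positive random PMF on n outcomes."""
    weights = [rng.random() + 1e-9 for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]

p = [0.5, 0.5]
q = [0.25, 0.75]
d_pq = kl_divergence(p, q)  # roughly 0.2075 bits
d_qp = kl_divergence(q, p)  # roughly 0.1887 bits, so the two directions differ
d_pp = kl_divergence(p, p)  # 0 when both arguments coincide

rng = random.Random(0)  # fixed seed so the run is reproducible
smallest = min(kl_divergence(random_pmf(rng, 4), random_pmf(rng, 4))
               for _ in range(1000))
print(d_pq, d_qp, d_pp, smallest)
```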

Proof

2.

If $p = q$, then $D(p \| q) = 0$ by definition, so suppose $p \ne q$.

$$ \begin{align*} -D(p \| q) &= \sum p(x) \log_{2} \dfrac{q(x)}{p(x)} \\ &\le \log_{2} \left( \sum p(x) \dfrac{q(x)}{p(x)} \right) \\ &= \log_{2} \left( \sum q(x) \right) \\ &= \log_{2} 1 \\ &= 0 \end{align*} $$

The inequality follows from Jensen’s inequality, since the logarithm is a concave function.

Jensen’s inequality

If $f$ is a concave function, then the following holds for any weights $\lambda_{k} \ge 0$ with $\sum_{k=1}^{n} \lambda_{k} = 1$.

$$ f\left( \sum\limits_{k=1}^{n}\lambda_{k}x_{k} \right) \ge \sum\limits_{k=1}^{n} \lambda_{k} f(x_{k}) $$

Therefore, multiplying both sides by $-1$ gives

$$ 0 \le D(p \| q) $$
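The Jensen step of the proof can be made concrete by taking the weights to be $p(x)$, the points to be the ratios $q(x)/p(x)$, and $f = \log_{2}$. The sketch below evaluates both sides of Jensen’s inequality for one example. The function name `jensen_sides` and the sample distributions are assumptions for illustration, and $p(x) > 0$ is assumed for every outcome.

```python
import math

def jensen_sides(weights, points, f):
    """Return (f(sum w_k x_k), sum w_k f(x_k)) for the given weights and points."""
    lhs = f(sum(w * x for w, x in zip(weights, points)))
    rhs = sum(w * f(x) for w, x in zip(weights, points))
    return lhs, rhs

# Hypothetical example distributions; p must be strictly positive here.
p = [0.5, 0.5]
q = [0.25, 0.75]
ratios = [qx / px for px, qx in zip(p, q)]  # q(x) / p(x)

lhs, rhs = jensen_sides(p, ratios, math.log2)
# lhs = log2(sum q(x)) = log2(1) = 0
# rhs = sum p(x) log2(q(x)/p(x)) = -D(p || q)
print(lhs, rhs)
```

Here the left side is $0$ and the right side is $-D(p \| q) \approx -0.2075$, matching the chain of inequalities in the proof.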

See also