What is Conditional Entropy in Classical Information Theory?

Buildup [1] [2]

For a discrete random variable $X$ and its probability mass function $p(x)$, the entropy $H(X)$ of $X$ is defined as follows.

$$ H(X) = - \sum_{i} p(x_{i}) \log_{2}p(x_{i}) $$

Similarly, for the joint probability mass function $p(x,y)$ of $X, Y$, the joint entropy $H(X, Y)$ is defined as follows.

$$ H(X, Y) = - \sum_{i, j} p(x_{i}, y_{j}) \log_{2}p(x_{i}, y_{j}) $$
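
As a quick numerical illustration (a minimal sketch with a made-up joint pmf; the numbers are purely for demonstration and not from the references), both $H(X)$ and $H(X, Y)$ can be computed directly from these definitions:

```python
import numpy as np

# Hypothetical joint pmf p(x, y) for illustration: rows index x, columns index y.
p_xy = np.array([[0.25, 0.25],
                 [0.40, 0.10]])

p_x = p_xy.sum(axis=1)  # marginal p(x)
p_y = p_xy.sum(axis=0)  # marginal p(y)

def entropy(p):
    """Shannon entropy in bits; terms with p = 0 are treated as 0."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

H_X  = entropy(p_x)            # H(X)
H_Y  = entropy(p_y)            # H(Y)
H_XY = entropy(p_xy.ravel())   # H(X, Y): sum over all (x, y) pairs

print(H_X, H_Y, H_XY)
```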

Then, using the conditional probability $p(x_{i} | y_{j})$ given $Y = y_{j}$, $H(X | Y=y_{j})$ can be defined as follows.

$$ \begin{equation} H(X | Y=y_{j}) = - \sum_{i} p(x_{i} | y_{j}) \log_{2}p(x_{i} | y_{j}) \end{equation} $$

Naturally, $H(X | Y)$ can then be defined as the expectation of $H(X | Y=y_{j})$ over all $y_{j}$. This is called the conditional entropy.

$$ \begin{equation} \begin{aligned} H(X | Y) &:= \sum_{j} p(y_{j})H(X | Y=y_{j}) \\ &= -\sum_{i, j} p(y_{j})p(x_{i} | y_{j}) \log_{2}p(x_{i} | y_{j}) \\ &= -\sum_{i, j} p(x_{i}, y_{j}) \log_{2}p(x_{i} | y_{j}) \\ \end{aligned} \end{equation} $$
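
Continuing the same made-up joint pmf as above, the two expressions in $(2)$ — the expectation of $H(X | Y=y_{j})$ over $y_{j}$ and the double sum weighted by the joint pmf — give the same number, as a sketch like the following shows:

```python
import numpy as np

# Same hypothetical joint pmf as before: rows index x, columns index y.
p_xy = np.array([[0.25, 0.25],
                 [0.40, 0.10]])
p_y = p_xy.sum(axis=0)          # marginal p(y_j)
p_x_given_y = p_xy / p_y        # p(x_i | y_j), each column divided by p(y_j)

# Expectation form of (2): sum_j p(y_j) * H(X | Y = y_j)
H_X_given_yj = -np.sum(p_x_given_y * np.log2(p_x_given_y), axis=0)
H_cond_expectation = np.sum(p_y * H_X_given_yj)

# Direct double sum of (3): weights are the joint pmf p(x_i, y_j)
H_cond_direct = -np.sum(p_xy * np.log2(p_x_given_y))

print(H_cond_expectation, H_cond_direct)  # identical values
```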

Definition

For discrete random variables $X$ and $Y$, the conditional entropy is defined as follows.

$$ \begin{equation} H(X | Y) = -\sum_{i, j} p(x_{i}, y_{j}) \log_{2}p(x_{i} | y_{j}) \end{equation} $$

For continuous random variables with joint density $p(x,y)$,

$$ H(X|Y) = - \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} p(x,y)\log_{2}p(x|y) \, dx dy $$
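
As a hedged sanity check of the continuous formula (assuming a standard bivariate normal with correlation $\rho$, for which $X | Y=y \sim N(\rho y, 1-\rho^{2})$ and the closed form is $\tfrac{1}{2}\log_{2}\!\big(2\pi e (1-\rho^{2})\big)$ bits), a simple grid approximation of the double integral reproduces the closed form:

```python
import numpy as np

rho = 0.8  # correlation of the assumed standard bivariate normal

x = np.linspace(-8, 8, 1201)
y = np.linspace(-8, 8, 1201)
X, Y = np.meshgrid(x, y)
dx = x[1] - x[0]

# joint density p(x, y) of the standard bivariate normal
p_xy = np.exp(-(X**2 - 2*rho*X*Y + Y**2) / (2*(1 - rho**2))) \
       / (2*np.pi*np.sqrt(1 - rho**2))
# conditional density p(x | y): normal with mean rho*y and variance 1 - rho^2
p_x_given_y = np.exp(-(X - rho*Y)**2 / (2*(1 - rho**2))) \
              / np.sqrt(2*np.pi*(1 - rho**2))

h_numeric = -np.sum(p_xy * np.log2(p_x_given_y)) * dx * dx
h_exact = 0.5 * np.log2(2*np.pi*np.e*(1 - rho**2))
print(h_numeric, h_exact)  # both about 1.31 bits
```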

Description

When you first encounter the definition of conditional entropy in $(3)$, it may be unclear why the sum is weighted by $p(x_{i}, y_{j})$ rather than $p(x_{i} | y_{j})$. The formula in $(3)$ is easier to accept if you view it as the expectation of $(1)$ over $Y$, as in $(2)$. In other words, conditional entropy is the expectation of the information $-\log_{2}p(x | y)$ of the conditional probabilities.

Properties

$$ H(X | Y) = H(X, Y) - H(Y) $$

It can be derived directly from $(2)$.

$$ \begin{align*} H(X | Y) &= - \sum_{i, j} p(y_{j})p(x_{i} | y_{j}) \log_{2}p(x_{i} | y_{j}) \\ &= - \sum_{i, j} p(x_{i}, y_{j}) \log_{2} \dfrac{p(x_{i}, y_{j})}{p(y_{j})} \\ &= - \sum_{i, j} p(x_{i}, y_{j}) \log_{2} p(x_{i}, y_{j}) + \sum_{i, j} p(x_{i}, y_{j}) \log_{2} p(y_{j}) \\ &= H(X, Y) + \sum_{j} \left( \sum_{i} p(x_{i}, y_{j}) \right) \log_{2} p(y_{j}) \\ &= H(X, Y) + \sum_{j} p(y_{j}) \log_{2} p(y_{j}) \\ &= H(X, Y) - H(Y) \end{align*} $$

Rearranging gives $H(X, Y) = H(X | Y) + H(Y)$. Since entropy is (the expectation of) the logarithm of a probability, this can be read as the product in $p(x, y) = p(x | y) p(y)$ turning into a sum.
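
Numerically, the identity can be checked with the same made-up joint pmf used above (a sketch for illustration only, not taken from the references):

```python
import numpy as np

# Same hypothetical joint pmf as in the earlier sketches.
p_xy = np.array([[0.25, 0.25],
                 [0.40, 0.10]])
p_y = p_xy.sum(axis=0)

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

H_XY = entropy(p_xy.ravel())
H_Y  = entropy(p_y)
H_X_given_Y = -np.sum(p_xy * np.log2(p_xy / p_y))

print(np.isclose(H_X_given_Y, H_XY - H_Y))  # True: H(X|Y) = H(X,Y) - H(Y)
```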


  1. Stephen M. Barnett, Quantum Information (2009), p. 12

  2. 김영훈·허재성, 양자 정보 이론 (2020), p. 248-250