
What is Conditional Entropy in Classical Information Theory?



Buildup [1] [2]

For a discrete random variable $X$ with probability mass function $p(x)$, the entropy $H(X)$ of $X$ is defined as follows.

$$ H(X) = - \sum_{i} p(x_{i}) \log_{2} p(x_{i}) $$

Similarly, for the joint probability mass function $p(x, y)$ of $X$ and $Y$, the joint entropy $H(X, Y)$ is defined as follows.

$$ H(X, Y) = - \sum_{i, j} p(x_{i}, y_{j}) \log_{2} p(x_{i}, y_{j}) $$

Then, for the conditional probability $p(x_{i} | y_{j})$ given $Y = y_{j}$, $H(X | Y = y_{j})$ can be defined as follows.

$$ H(X | Y = y_{j}) = - \sum_{i} p(x_{i} | y_{j}) \log_{2} p(x_{i} | y_{j}) \tag{1} $$

Naturally, $H(X | Y)$ can be defined as the expectation of $H(X | Y = y_{j})$ over all $y_{j}$. Let's call this the conditional entropy.

$$ \begin{aligned} H(X | Y) &:= \sum_{j} p(y_{j}) H(X | Y=y_{j}) \\ &= -\sum_{i, j} p(y_{j}) p(x_{i} | y_{j}) \log_{2} p(x_{i} | y_{j}) \\ &= -\sum_{i, j} p(x_{i}, y_{j}) \log_{2} p(x_{i} | y_{j}) \end{aligned} \tag{2} $$

Definition

For discrete random variables $X$ and $Y$, the conditional entropy is defined as follows.

$$ H(X | Y) = -\sum_{i, j} p(x_{i}, y_{j}) \log_{2} p(x_{i} | y_{j}) \tag{3} $$

For continuous random variables, where $p(x, y)$ is the joint probability density function, it is defined as follows.

$$ H(X|Y) = - \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} p(x,y) \log_{2} p(x|y) \, dx \, dy $$
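
To make the discrete definition concrete, here is a minimal Python sketch; the 2×2 joint distribution `p_xy` is made up purely for illustration. It computes $H(X | Y)$ both as the expectation $(2)$ of $H(X | Y = y_{j})$ and directly from $(3)$, and the two values agree.

```python
import numpy as np

# Hypothetical joint pmf p(x, y): rows index x, columns index y.
p_xy = np.array([[0.40, 0.10],
                 [0.15, 0.35]])

p_y = p_xy.sum(axis=0)          # marginal p(y_j)
p_x_given_y = p_xy / p_y        # conditional p(x_i | y_j), column-wise

# (2): expectation of H(X | Y = y_j) over p(y_j)
h_x_given_yj = -np.sum(p_x_given_y * np.log2(p_x_given_y), axis=0)
h_via_expectation = np.sum(p_y * h_x_given_yj)

# (3): -sum_{i,j} p(x_i, y_j) log2 p(x_i | y_j)
h_via_definition = -np.sum(p_xy * np.log2(p_x_given_y))

print(h_via_expectation, h_via_definition)  # both ≈ 0.809 bits
```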

Description

When you first encounter the definition of conditional entropy as $(3)$, it may be hard to see why the sum is weighted by $p(x, y)$ rather than $p(x | y)$. The formula $(3)$ is easier to accept if you view it as the expectation of $(1)$, as in $(2)$. In other words, the conditional entropy is the expectation of the information $-\log_{2} p(x | y)$ of the conditional probability.
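
To illustrate this reading, the following sketch (again with a made-up joint distribution) draws pairs $(x, y)$ from $p(x, y)$ and averages the information $-\log_{2} p(x | y)$; the sample mean approaches the conditional entropy from $(3)$ as the number of samples grows.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical joint pmf p(x, y): rows index x, columns index y.
p_xy = np.array([[0.40, 0.10],
                 [0.15, 0.35]])
p_x_given_y = p_xy / p_xy.sum(axis=0)   # conditional p(x_i | y_j)

# Exact conditional entropy from (3).
h_exact = -np.sum(p_xy * np.log2(p_x_given_y))

# Draw (x, y) pairs according to p(x, y) and average -log2 p(x | y).
flat = p_xy.ravel()
idx = rng.choice(flat.size, size=100_000, p=flat)
xs, ys = np.unravel_index(idx, p_xy.shape)
h_estimate = np.mean(-np.log2(p_x_given_y[xs, ys]))

print(h_exact, h_estimate)   # the estimate fluctuates around the exact value
```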

Properties

$$ H(X | Y) = H(X, Y) - H(Y) $$

It can be derived directly from $(2)$.

$$ \begin{aligned} H(X | Y) &= - \sum_{i, j} p(y_{j}) p(x_{i} | y_{j}) \log_{2} p(x_{i} | y_{j}) \\ &= - \sum_{i, j} p(x_{i}, y_{j}) \log_{2} \dfrac{p(x_{i}, y_{j})}{p(y_{j})} \\ &= - \sum_{i, j} p(x_{i}, y_{j}) \log_{2} p(x_{i}, y_{j}) + \sum_{i, j} p(x_{i}, y_{j}) \log_{2} p(y_{j}) \\ &= H(X, Y) + \sum_{j} \left( \sum_{i} p(x_{i}, y_{j}) \right) \log_{2} p(y_{j}) \\ &= H(X, Y) + \sum_{j} p(y_{j}) \log_{2} p(y_{j}) \\ &= H(X, Y) - H(Y) \end{aligned} $$

Rearranging gives $H(X, Y) = H(X | Y) + H(Y)$. Since entropy is built from the logarithm of probabilities, it is natural to read this as the product in $p(x, y) = p(x | y)\, p(y)$ turning into a sum.
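
The identity is also easy to check numerically. The sketch below builds a random joint distribution (purely illustrative), computes $H(X, Y)$, $H(Y)$, and $H(X | Y)$ separately, and confirms $H(X, Y) = H(X | Y) + H(Y)$ up to floating-point error.

```python
import numpy as np

rng = np.random.default_rng(1)

# Random joint pmf over a 3x4 alphabet, normalized to sum to 1.
p_xy = rng.random((3, 4))
p_xy /= p_xy.sum()

p_y = p_xy.sum(axis=0)                     # marginal p(y_j)
p_x_given_y = p_xy / p_y                   # conditional p(x_i | y_j)

h_joint = -np.sum(p_xy * np.log2(p_xy))            # H(X, Y)
h_y = -np.sum(p_y * np.log2(p_y))                  # H(Y)
h_cond = -np.sum(p_xy * np.log2(p_x_given_y))      # H(X | Y) from (3)

print(np.isclose(h_joint, h_cond + h_y))   # True: H(X, Y) = H(X | Y) + H(Y)
```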


  1. Stephen M. Barnett, Quantum Information (2009), p. 12

  2. 김영훈·허재성, 양자 정보 이론 (2020), pp. 248-250