What is Conditional Entropy in Classical Information Theory?
Buildup
For a discrete random variable X and its probability mass function p(x), the entropy H(X) of X is defined as follows.
$$ H(X) = - \sum_{i} p(x_{i}) \log_{2} p(x_{i}) $$
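As a quick illustration of this formula, here is a minimal Python sketch (the coin distributions are hypothetical examples, not from the text): a fair coin attains the maximum entropy of 1 bit, and a biased coin has less.

```python
import numpy as np

def entropy(p):
    """Shannon entropy H(X) in bits for a pmf given as an array of probabilities."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                      # the convention 0 * log 0 = 0
    return -(p * np.log2(p)).sum()

print(entropy([0.5, 0.5]))   # fair coin: 1.0 bit
print(entropy([0.9, 0.1]))   # biased coin: ≈ 0.469 bits
```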
Similarly, for the joint probability mass function p(x,y) of X and Y, the joint entropy H(X,Y) is defined as follows.
$$ H(X,Y) = - \sum_{i,j} p(x_{i}, y_{j}) \log_{2} p(x_{i}, y_{j}) $$
Then, for the conditional probability p(x_i | y_j) when Y = y_j, H(X | Y = y_j) can be defined as follows.
$$ H(X \mid Y = y_{j}) = - \sum_{i} p(x_{i} \mid y_{j}) \log_{2} p(x_{i} \mid y_{j}) \tag{1} $$
Naturally, H(X | Y) can be defined as the expectation of H(X | Y = y_j) over all y_j. Let's call this the conditional entropy.
$$ \begin{aligned} H(X \mid Y) &:= \sum_{j} p(y_{j}) H(X \mid Y = y_{j}) \\ &= - \sum_{i,j} p(y_{j}) p(x_{i} \mid y_{j}) \log_{2} p(x_{i} \mid y_{j}) \\ &= - \sum_{i,j} p(x_{i}, y_{j}) \log_{2} p(x_{i} \mid y_{j}) \end{aligned} \tag{2} $$
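To make this buildup concrete, the following Python sketch uses a small hypothetical joint pmf (not from the text) and checks that the expectation over y_j agrees with the joint-weighted sum in the last line of (2).

```python
import numpy as np

# Hypothetical joint pmf p(x_i, y_j): rows index x_i, columns index y_j.
p_xy = np.array([[0.30, 0.10],
                 [0.10, 0.50]])

p_y = p_xy.sum(axis=0)          # marginal p(y_j)
p_x_given_y = p_xy / p_y        # conditional p(x_i | y_j), column-wise

# H(X | Y = y_j) for each y_j, following (1)
H_X_given_yj = -(p_x_given_y * np.log2(p_x_given_y)).sum(axis=0)

# Expectation over y_j, following the first form of (2)
H_X_given_Y = (p_y * H_X_given_yj).sum()

# Joint-weighted form, following the last form of (2)
H_X_given_Y_joint = -(p_xy * np.log2(p_x_given_y)).sum()

print(H_X_given_yj)                     # entropy of X for each fixed y_j
print(H_X_given_Y, H_X_given_Y_joint)   # both ≈ 0.7145 bits, they agree
```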
Definition
For discrete random variables X and Y, the conditional entropy is defined as follows.
$$ H(X \mid Y) = - \sum_{i,j} p(x_{i}, y_{j}) \log_{2} p(x_{i} \mid y_{j}) \tag{3} $$
If X and Y are continuous random variables,
$$ H(X \mid Y) = - \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} p(x,y) \log_{2} p(x \mid y) \, dx \, dy $$
Description
When you first encounter the definition (3) of conditional entropy, it may be hard to see why the weight is p(x,y) rather than p(x|y). The formula (3) is easier to accept if you regard it as the expectation of (1) over all values of Y. In other words, conditional entropy is the expectation, taken with respect to the joint distribution, of the information −log2 p(x|y) of the conditional probability.
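To see why the weight must be p(x,y), the short sketch below (using the same kind of hypothetical joint pmf as above) compares the correct weighting with a naive one: weighting the information −log2 p(x|y) by p(x|y) alone just adds up H(X | Y = y_j) over all y_j with equal weight, because those weights do not form a single probability distribution.

```python
import numpy as np

# Hypothetical joint pmf p(x_i, y_j): rows index x_i, columns index y_j.
p_xy = np.array([[0.30, 0.10],
                 [0.10, 0.50]])
p_y = p_xy.sum(axis=0)
p_x_given_y = p_xy / p_y
surprisal = -np.log2(p_x_given_y)        # the information -log2 p(x_i | y_j)

correct = (p_xy * surprisal).sum()       # expectation under the joint: H(X | Y)
naive = (p_x_given_y * surprisal).sum()  # weights sum to 2, not 1 -> not an expectation

print(correct)  # ≈ 0.7145 bits
print(naive)    # = sum_j H(X | Y = y_j) ≈ 1.4613, overweights rare y_j
```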
Properties
$$ H(X \mid Y) = H(X,Y) - H(Y) $$
It can be derived directly from (2).
$$ \begin{aligned} H(X \mid Y) &= - \sum_{i,j} p(y_{j}) p(x_{i} \mid y_{j}) \log_{2} p(x_{i} \mid y_{j}) \\ &= - \sum_{i,j} p(x_{i}, y_{j}) \log_{2} \frac{p(x_{i}, y_{j})}{p(y_{j})} \\ &= - \sum_{i,j} p(x_{i}, y_{j}) \log_{2} p(x_{i}, y_{j}) + \sum_{i,j} p(x_{i}, y_{j}) \log_{2} p(y_{j}) \\ &= H(X,Y) + \sum_{j} \Big( \sum_{i} p(x_{i}, y_{j}) \Big) \log_{2} p(y_{j}) \\ &= H(X,Y) + \sum_{j} p(y_{j}) \log_{2} p(y_{j}) \\ &= H(X,Y) - H(Y) \end{aligned} $$
Rearranging gives H(X,Y) = H(X | Y) + H(Y). Since entropy is built from the logarithm of probabilities, it is natural to read this as the product in p(x,y) = p(x|y) p(y) turning into a sum.
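This identity can also be checked numerically. The sketch below, again with a small hypothetical joint pmf, computes H(X,Y), H(Y), and H(X | Y) separately and confirms that H(X | Y) = H(X,Y) − H(Y).

```python
import numpy as np

# Hypothetical joint pmf p(x_i, y_j): rows index x_i, columns index y_j.
p_xy = np.array([[0.30, 0.10],
                 [0.10, 0.50]])
p_y = p_xy.sum(axis=0)                             # marginal p(y_j)

H_XY = -(p_xy * np.log2(p_xy)).sum()               # joint entropy H(X, Y)
H_Y = -(p_y * np.log2(p_y)).sum()                  # marginal entropy H(Y)
H_X_given_Y = -(p_xy * np.log2(p_xy / p_y)).sum()  # conditional entropy H(X | Y), by (3)

print(H_XY - H_Y)     # ≈ 0.7145 bits
print(H_X_given_Y)    # ≈ 0.7145 bits, matching H(X,Y) - H(Y)
```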