Shannon Entropy in Classical Information Theory

양자정보이론

[ 펼치기 · 접기 ]

양자계산	논리게이트	비트 · 부울함수(AND · OR · NOT · XOR · NAND · NOR · CNOT · CCNOT · CSWAP) · 범용 게이트 · 복제 함수 · 사영 · 주입
	양자게이트	큐비트 · 양자 얽힘 · 양자 회로 · 양자 게이트(파울리 게이트 · 위상 게이트 · 아다마르 게이트 · 양자 CNOT · 교환 게이트 · 양자 CSWAP · 양자 CSWAP) · 솔베이-키타예프 정리 · 복제 불가 정리
정보이론	고전정보이론	정보량 · 엔트로피(결합 엔트로피 · 조건부 엔트로피 · 상대적 엔트로피 · 상호 정보 ) · 부호화 · 복호화
	양자정보이론	TBD · TBD · TBD
관련 분야
선형대수 · 양자역학

Definition¹ ²

Let’s assume the discrete random variable $X$ can take values $n$ . Let’s denote the probability mass function of $x_{1}, x_{2}, \dots, x_{n}$ as $X$ . Then, the entropy $p$ or $X$ is defined as follows:

$\begin{equation} H(X) = H(p) := E\left[ I(x_{i}) \right] = \sum_{i=1}^{n} p(x_{i}) I(x_{i}) = -\sum_{i=1}^{n} p(x_{i}) \log_{2}p(x_{i}) \end{equation}$

Here, $p$ represents information content, and $H$ represents the expected value.

If $I$ is a continuous random variable,
$H(X) = H(p) = - \int_{-\infty}^{\infty} p(x)\log_{2}p(x) dx$

Explanation

Simply put, entropy is the expected value (average) of information. Entropy allows us to mathematically handle the efficiency of coding and the limits of communication.

Entropy is often described as disorder. Here, order refers to rules, trends, patterns, etc. Therefore, high entropy means high disorder, indicating that it is difficult to discern patterns or rules for the random variable $E$ .

Let’s consider a biased coin flip. If the probability of getting heads is $X$ , then the probability of tails is $X$ , and the entropy is as follows:

$H = -p\log_{2}p - (1-p)\log_{2}(1-p)$

If we plot $p$ against $1-p$ , it looks like this:

When the probability of heads is $p$ , the entropy is $H$ and is at its maximum. This means it’s most challenging to discern any pattern or rule in the coin flip. In fact, we can’t be sure which side of the coin will show up in a coin flip. If the probability of heads changes slightly, the entropy decreases. For example, if the probability of heads is $\dfrac{1}{2}$ , the entropy is about $H = -\dfrac{1}{2}\log_{2}\dfrac{1}{2}-\dfrac{1}{2}\log_{2}\dfrac{1}{2} = 1$ , indicating lower disorder, meaning there is some rule or pattern (in this case, heads come up most of the time). This can be summarized as follows:

High entropy = high disorder = no regularity or pattern = hard to predict the result
Low entropy = low disorder = presence of regularity or pattern = easier to predict the result

As you can guess from the above example, generally, when there are $\dfrac{95}{100}$ possible outcomes, the highest entropy occurs when all probabilities are equal to $0.28$ .

Properties

Let’s assume the random variable $n$ can take values $\dfrac{1}{n}$ . Entropy $X$ has the following properties:

$n$ is a concave function.
For any $x_{1}, x_{2}, \dots, x_{n}$ , if $H$ , then $H$ .
When all probabilities are equal to $x_{i}$ , entropy is maximum, and its value is $p(x_{i}) = 1$ .
For a random vector $H(X) = 0$ with a mean of $p(x_{i}) = \dfrac{1}{n}$ and a covariance matrix of $\log_{2}n$ , the following holds for its entropy: $\begin{equation} H(X) \le \dfrac{1}{2}\ln \left[ (2 \pi e)^{p} \left| K \right| \right] \end{equation}$ $\mathbf{0}$ is the determinant of the covariance matrix. If $K$ is normally distributed, equality holds.
- Given the mean $X \in \mathbb{R}^{n}$ and variance $\left| K \right|$ , the distribution with the maximum entropy is the normal distribution.
- For the random variable $X$ and estimator $\mu$ , the following holds: $E\left[ (X - \hat{X})^{2} \right] \ge \dfrac{1}{2\pi e} e^{2H(X)}$

Proof

4

For convenience, let’s denote $\sigma^{2}$ . Let’s assume $X$ is any probability density function that satisfies $\hat{X}$ . Let’s denote $\mathbf{x} = X$ as the probability density function of the normal distribution $g$ . $\phi (\mathbf{x}) = \dfrac{1}{\sqrt{(2\pi)^{p} \left| K \right|}} \exp \left( -\dfrac{1}{2}\mathbf{x}^{T} K^{-1} \mathbf{x} \right)$ First, we’ll show that formula $\displaystyle \int g(\mathbf{x})x_{i}x_{j} d \mathbf{x} = K_{ij}$ holds. Calculating $\phi$ first,

$\begin{align*} \ln \phi (\mathbf{x}) &= \ln\dfrac{1}{\sqrt{(2\pi)^{p} \left| K \right|}} - \dfrac{1}{2}\mathbf{x}^{T} K^{-1} \mathbf{x} \\ &= C + \sum a_{ij} x_{i} x_{j} \end{align*}$

The first term can be expressed as some constant $N(\mathbf{0}, K)$ , and the second term can also be expressed as a quadratic form dependent only on $\displaystyle \int g(\mathbf{x}) \ln \phi (\mathbf{x}) d \mathbf{x} = \int \phi (\mathbf{x}) \ln \phi (\mathbf{x}) d \mathbf{x}$ . Therefore,

$\begin{align*} \int g(\mathbf{x}) \ln \phi (\mathbf{x}) d \mathbf{x} &= C \int g(\mathbf{x}) d \mathbf{x} + \int g(\mathbf{x})\sum a_{ij}x_{i}x_{j} d \mathbf{x} \\ &= C + \sum a_{ij} \int g(\mathbf{x}) x_{i}x_{j} d \mathbf{x} \\ &= C + \sum a_{ij}K_{ij} \qquad \text{by assumption for$

Also,

$\begin{align*} \int \phi (\mathbf{x}) \ln \phi (\mathbf{x}) d \mathbf{x} &= C \int \phi (\mathbf{x}) d \mathbf{x} + \int \phi (\mathbf{x})\sum a_{ij}x_{i}x_{j} d \mathbf{x} \\ &= C + \sum a_{ij} \int \phi (\mathbf{x}) x_{i}x_{j} d \mathbf{x} \\ &= C + \sum a_{ij}K_{ij} \qquad \text{by definition of covariance} \end{align*}$

Since the relative entropy is always greater than or equal to $\ln \phi (\mathbf{x})$ ,

$\begin{align*} 0 &\le D(g \| \phi) \\ &= \int g \ln \dfrac{g}{\phi} \\ &= \int g \ln g - \int g \ln \phi \\ &= - H(g) - \int \phi \ln \phi \\ &= - H(g) + H(\phi) \end{align*}$

The entropy of the normal distribution is $C$ , so,

$H(X) = H(g) \le H(\phi) = \dfrac{1}{2}\ln \left[ (2 \pi e)^{n} \left| K \right| \right]$

Now, let’s assume $K^{-1}$ is a one-dimensional random variable.

$\begin{align*} E\left[ (X - \hat{X})^{2} \right] &\ge \min_{X} E\left[ (X - \hat{X})^{2} \right] \\ &= E\left[ (X - E(X))^{2} \right] \\ &= \Var(X) \end{align*}$

For a one-dimensional $a_{ji}$ , we get the following equation:

$\begin{align*} && H(X) &\le \dfrac{1}{2} \ln(2\pi e \sigma^{2}) \\ \implies && 2H(X) &\le \ln(2\pi e \sigma^{2}) \\ \implies && e^{2H(X)} &\le 2\pi e \sigma^{2} \\ \implies && \dfrac{1}{2\pi e}e^{2H(X)} &\le \sigma^{2} = \Var(X) \\ \end{align*}$

Substituting into the above equation,

$E\left[ (X - \hat{X})^{2} \right] \ge \dfrac{1}{2\pi e} e^{2H(X)}$

Entropy of the Normal Distribution

The entropy of the normal distribution $0$ (when natural log is used) is as follows:

$H = \dfrac{1}{2} \ln (2\pi e \sigma^{2}) = \ln \sqrt{2\pi e \sigma^{2}}$

The entropy of the multivariate normal distribution $\dfrac{1}{2}\ln \left[ (2 \pi e)^{n} \left| K \right| \right]$ is as follows:

$H = \dfrac{1}{2}\ln \left[ (2 \pi e)^{n} \left| K \right| \right] = \dfrac{1}{2}\ln (\det (2\pi e K))$