Let $\Sigma$ be a set of alphabets. An encoding scheme with compression rate $0 < \alpha \le 1$ is defined as a map of the following form:
$$f_n : \Sigma^n \to \{0,1\}^{\lfloor \alpha n \rfloor}$$
Similarly, a decoding scheme with compression rate $\alpha$ is defined as:
$$g_n : \{0,1\}^{\lfloor \alpha n \rfloor} \to \Sigma^n$$
Explanation
Simply put, the compression rate is the number of bits ($0$s and $1$s) that the message is mapped to, divided by the number of alphabet symbols in the message. For example, if the mapping is $\mathrm{LESSERAFIM} \mapsto 101_{(2)}$, the compression rate is calculated as follows:
$$\frac{\left| \{1,0,1\} \right|}{\left| \{ L,E,S,S,E,R,A,F,I,M \} \right|} = \frac{3}{10} = 0.3$$
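To make the ratio concrete, here is a tiny Python sketch (the variable names are ours, not from the text) that computes the compression rate of the example above by dividing the number of code bits by the number of alphabet symbols.

```python
# Illustrative sketch: compression rate of the mapping LESSERAFIM -> 101_(2).
message = list("LESSERAFIM")   # the ten alphabet symbols of the message
code_bits = [1, 0, 1]          # the three bits the message is mapped to

# compression rate = (number of bits) / (number of alphabet symbols)
alpha = len(code_bits) / len(message)
print(alpha)  # 0.3
```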
When the amount of data is large, storing and transmitting it efficiently becomes difficult. Thus, information theory is interested in how efficiently a message $m \in \Sigma^n$ can be compressed. Here, efficiency includes not only compression but also restoration: if we map $m \mapsto 1$, the compression is extremely efficient, but restoring this $1$ back to the original message $m$ is impossible. Therefore, finding the minimal $\alpha$ for which there exist $f_n, g_n$ satisfying $m = g_n(f_n(m))$ amounts to compressing the message efficiently. This was formalized mathematically by the father of information theory, Claude Shannon.
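As a hedged illustration of why restoration matters, the sketch below (hypothetical function names, not from the text) maps every message to the single bit $1$, exactly as in the $m \mapsto 1$ example: distinct messages collide, so no decoder $g_n$ can satisfy $m = g_n(f_n(m))$ for every $m$.

```python
# Minimal sketch: extreme "compression" that cannot be restored.
def f(m: str) -> str:
    """Map every message to the single bit '1' (hypothetical encoder)."""
    return "1"

messages = ["0101", "1111"]          # two distinct messages
codes = [f(m) for m in messages]
print(codes)                          # ['1', '1'] -- both messages collide
# Since f is not injective, no decoder g can recover both originals from '1'.
print(len(set(codes)) < len(set(messages)))  # True
```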
Noiseless coding theorem
Noisy channel coding theorem
This article aims to give a simplified account of efficient message compression. Consider sending a message of length $n$ consisting of $0$s and $1$s, where the probability of $0$ appearing is $p$ and the probability of $1$ appearing is $1-p$.
Let $\{X_i\}_{i=1}^{n}$ be independent random variables with mean $\mu$. Then for any $\epsilon > 0$, the following holds:

$$\lim_{n \to \infty} P\left( \left| \frac{1}{n} \sum_{i=1}^{n} X_i - \mu \right| \ge \epsilon \right) = 0$$
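The following small simulation, a sketch under our own choice of $p = 0.7$ and sample sizes, illustrates the statement in the binary-message setting of this article: the fraction of $0$s in a random message approaches its expected value $p$ as $n$ grows.

```python
import random

# Simulation sketch of the law of large numbers for a random binary message:
# each symbol is 0 with probability p, and the sample mean of the indicator
# X_i = 1{symbol_i == 0} should approach its expectation p.
p = 0.7
for n in (100, 10_000, 1_000_000):
    msg = [0 if random.random() < p else 1 for _ in range(n)]
    zero_fraction = msg.count(0) / n
    print(n, zero_fraction)   # tends toward p = 0.7 as n grows
```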
The law of large numbers implies that, if the number of trials $n$ is sufficiently large, the sample mean $\frac{1}{n}\sum_{i=1}^{n} X_i$ will be sufficiently close to the population mean $\mu$. Thus, when $n$ is large, the number of $0$s in the message is about $np$ and the number of $1$s is about $n(1-p) = n - np$. The number of possible messages that can be formed with $0$ and $1$ is then the number of ways to choose, without regard to order, the $np$ slots for $0$ out of the $n$ slots:
$$\frac{n!}{(np)! \, (n-np)!}$$
The number of bits needed to distinguish these messages by numbering them in binary is the base-$2$ logarithm of this count. By Stirling's approximation $\log_2 n! \approx n \log_2 n - n \log_2 e$,

$$\log_2 \frac{n!}{(np)! \, (n-np)!} \approx n \bigl( -p \log_2 p - (1-p) \log_2 (1-p) \bigr)$$

The term in the parentheses is the entropy $H(p)$. Therefore, we obtain the following:
$$\log_2 \frac{n!}{(np)! \, (n-np)!} \approx n H(p)$$
Thus, when $n$ is sufficiently large, coding with $nH(p)$ bits is efficient, and the efficient compression rate is $\alpha = \dfrac{nH(p)}{n} = H(p)$. This result implies that the entropy of the random variable $X$ represents the compression efficiency of messages generated by independent trials of $X$.
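A quick numerical check, again only a sketch with parameters we picked ($p = 0.7$ and a few values of $n$), shows the per-symbol bit count $\frac{1}{n}\log_2 \binom{n}{np}$ approaching the entropy $H(p)$.

```python
from math import comb, log2

def H(p: float) -> float:
    """Binary entropy in bits: H(p) = -p log2 p - (1-p) log2 (1-p)."""
    return -p * log2(p) - (1 - p) * log2(1 - p)

p = 0.7
for n in (100, 1_000, 10_000):
    k = round(n * p)                    # expected number of 0s in the message
    bits = log2(comb(n, k))             # bits needed to number all such messages
    print(n, bits / n, H(p))            # per-symbol rate approaches H(0.7) ≈ 0.8813
```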
To generalize, if the probability that a certain alphabet $x_i$ appears is $p(x_i)$, then the average number of times $x_i$ appears in the message is $np(x_i)$. The number of messages of length $n$ made using $k \ge 2$ alphabets is as follows: