Encoding and Decoding in Information Theory


Definition [1]

Let a set of alphabet symbols be denoted by $\Sigma$. A map of the following form, with compression rate $0 \lt \alpha \le 1$, is called an encoding scheme:

$$ f_{n} : \Sigma^{n} \to \left\{ 0, 1 \right\}^{\lfloor \alpha n \rfloor} $$

Similarly, a decoding scheme with compression rate $\alpha$ is defined as follows:

$$ g_{n} : \left\{ 0, 1 \right\}^{\lfloor \alpha n \rfloor} \to \Sigma^{n} $$

Explanation

Simply put, the compression rate is the number of bits $0$ and $1$ that the message is mapped to, divided by the number of alphabet symbols in the message. For example, if the mapping is $\text{LESSERAFIM} \mapsto 101_{(2)}$, the compression rate is computed as follows:

$$ 0.3 = \dfrac{3}{10} = \dfrac{\left| \left\{ 1, 0, 1 \right\} \right|}{ \left| \left\{ L, E, S, S, E, R, A, F, I, M \right\} \right| } $$
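As a quick sanity check, the compression rate can be computed directly from the two lengths involved. A minimal sketch in Python; the helper name `compression_rate` is an illustrative choice, not from the text.

```python
def compression_rate(message: str, codeword: str) -> float:
    """alpha = (number of output bits) / (number of input alphabet symbols)."""
    return len(codeword) / len(message)

# The example from the text: a 10-symbol message mapped to 3 bits.
print(compression_rate("LESSERAFIM", "101"))   # 0.3
```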

With a large amount of data, efficient storage and transmission become difficult. Thus, information theory is concerned with how efficiently a message $m \in \Sigma^{n}$ can be compressed. Here, efficiency covers not only compression but also restoration. The mapping $m \mapsto 1$ would be extremely aggressive compression, but restoring this single bit $1$ back to the original message $m$ is impossible. Therefore, finding the minimal $\alpha$ for which there exist $f_{n}, g_{n}$ satisfying $m = g_{n}(f_{n}(m))$ amounts to compressing the message efficiently; a toy round-trip check is sketched after the list below. This concept was formalized mathematically by the father of information theory, Claude Shannon.

  • Noiseless coding theorem
  • Noisy channel coding theorem
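To make the invertibility requirement $m = g_{n}(f_{n}(m))$ concrete, here is a minimal round-trip sketch. The tiny binary alphabet, the block length $n = 4$, and the enumeration-based code are illustrative assumptions, not Shannon's construction; with $|\Sigma| = 2$ this code simply achieves $\alpha = 1$.

```python
from itertools import product

SIGMA = "01"   # toy alphabet; purely illustrative
n = 4          # block length

# Lossless pair (f, g): enumerate all messages of length n and index them in binary.
messages = ["".join(t) for t in product(SIGMA, repeat=n)]
width = (len(messages) - 1).bit_length()        # bits needed to index every message

def f(m: str) -> str:
    """Encoder: message -> binary index."""
    return format(messages.index(m), f"0{width}b")

def g(b: str) -> str:
    """Decoder: binary index -> message."""
    return messages[int(b, 2)]

# Every message survives the round trip, so the pair is lossless (here alpha = width/n = 1).
assert all(g(f(m)) == m for m in messages)

# By contrast, an encoder that maps every message to a single bit makes distinct
# messages collide, so no decoder g can satisfy g(f(m)) = m for all m.
```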

This article gives a simplified account of efficient message compression. Consider sending a message of length $n$ consisting of $0$s and $1$s. If the probability that a $0$ appears is $p$, then the probability that a $1$ appears is $1-p$.

Law of Large Numbers

Let $\left\{ X_{i} \right\}_{i=1}^{n}$ be independent, identically distributed random variables with mean $\mu$. Then for any $\epsilon \gt 0$, the following holds:

$$ \lim\limits_{n \to \infty} P\left( \left| \dfrac{1}{n}\sum\limits_{i=1}^{n}X_{i} - \mu \right| \ge \epsilon \right) = 0 $$
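This convergence is easy to observe numerically. A minimal simulation sketch, assuming Bernoulli trials where $X_{i}$ indicates that the $i$-th symbol is a $0$ (the value $p = 0.7$ and the sample sizes are arbitrary choices):

```python
import random

random.seed(0)
p = 0.7   # assumed probability that a symbol is 0

for n in (100, 10_000, 1_000_000):
    # X_i = 1 if the i-th symbol is a 0, so the population mean is mu = p.
    sample = [1 if random.random() < p else 0 for _ in range(n)]
    sample_mean = sum(sample) / n
    print(n, abs(sample_mean - p))   # deviation from mu typically shrinks as n grows
```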

The law of large numbers implies that, if the number of trials $n$ is sufficiently large, the sample mean $\dfrac{1}{n}\sum\limits_{i=1}^{n}X_{i}$ will be close to the population mean $\mu$ with high probability. Thus, when $n$ is large, the number of $0$s in the message is approximately $np$ and the number of $1$s is approximately $n(1-p) = n - np$. The number of such messages is the number of ways to choose, without regard to order, the $np$ slots for the $0$s out of $n$ slots:

$$ \dfrac{n!}{(np)! (n-np)!} $$

The number of bits needed to distinguish these messages by numbering them in binary is as follows:

$$ \begin{equation} \log_{2}\dfrac{n!}{(np)! (n-np)!} \end{equation} $$

Stirling’s Approximation

$$ \log_{2} n! \approx n \log_{2} n - n \log_{2}e $$
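A quick numerical check of this form of Stirling's approximation; `math.lgamma` evaluates $\ln n!$ without overflow, and the test values of $n$ are arbitrary:

```python
import math

def log2_factorial(n: int) -> float:
    """log2(n!) computed via the log-gamma function."""
    return math.lgamma(n + 1) / math.log(2)

for n in (10, 100, 1000):
    exact = log2_factorial(n)
    approx = n * math.log2(n) - n * math.log2(math.e)
    print(n, exact, approx)   # the relative error shrinks as n grows
```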

Approximating $(1)$ using Stirling’s formula yields the following:

$$ \begin{align*} & \log_{2} n! - \log_{2}(np)! - \log_{2}(n-np)! \\ &\approx \left( n \log_{2} n - n \log_{2}e \right) - \left( np \log_{2} (np) - np \log_{2}e \right) \\ &\qquad - \left( (n-np) \log_{2} (n-np) - (n-np) \log_{2}e \right) \\ &= n \log_{2} n - {\color{red}\cancel{\color{black}n \log_{2}e}} - np \log_{2} (np) + {\color{green}\cancel{\color{black}np \log_{2}e}} \\ &\qquad - (n-np) \log_{2} (n-np) + ({\color{red}\cancel{\color{black}n}}-{\color{green}\cancel{\color{black}np}}) \log_{2}e \\ &= n \log_{2} n - np \log_{2} n - np \log_{2} p - n\log_{2} \big( n(1-p) \big) + np\log_{2} \big( n(1-p) \big) \\ &= n \log_{2} n - np \log_{2} n - np \log_{2} p \\ &\qquad - n\log_{2} n - n\log_{2}(1-p) + np\log_{2} n + np\log_{2} (1-p) \\ &= - np \log_{2} p - n\log_{2}(1-p) + np\log_{2} (1-p) \\ &= - np \log_{2} p - n(1-p)\log_{2}(1-p) \\ &= n\Big(- p \log_{2} p - (1-p)\log_{2}(1-p) \Big) \end{align*} $$

The expression in parentheses is precisely the entropy $H(p)$. Therefore, we obtain the following:

$$ \log_{2}\dfrac{n!}{(np)! (n-np)!} \approx nH(p) $$
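This approximation can be checked directly: the exact value of $\log_{2}\binom{n}{np}$, divided by $n$, approaches $H(p)$ as $n$ grows. A minimal sketch with arbitrary choices of $p$ and $n$:

```python
import math

def H(p: float) -> float:
    """Binary entropy in bits."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

p = 0.3
for n in (100, 1_000, 10_000):
    k = round(n * p)                        # typical number of 0s in the message
    exact = math.log2(math.comb(n, k))      # bits needed to index the typical messages
    print(n, exact / n, H(p))               # per-symbol bit count approaches H(p)
```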

Thus, when $n$ is sufficiently large, coding with about $nH(p)$ bits is efficient, and the efficient compression rate is $\dfrac{nH(p)}{n} = H(p)$. This result means that the entropy of the random variable $X$ is the compression rate achievable for messages generated by independent trials of $X$.

To generalize, suppose the probability that an alphabet symbol $x_{i}$ appears is $p(x_{i})$; then the average number of times $x_{i}$ appears in the message is $np(x_{i})$. The number of messages of length $n$ made from $k \ge 2$ alphabet symbols with these typical counts is as follows:

$$ \dfrac{n!}{\prod\limits_{1\le i \le k}(np(x_{i}))!} $$

Numbering these messages in binary, the number of bits required is as follows:

$$ \begin{align*} &\log_{2} \dfrac{n!}{\prod\limits_{1\le i \le k}(np(x_{i}))!} \\ &= \log_{2}n! - \sum\limits_{1\le i \le k}\log_{2}(np(x_{i}))! \\ &\approx \left( n \log_{2} n - n \log_{2}e \right) - \sum\limits_{1\le i \le k}\Big( np(x_{i})\log_{2}\big(np(x_{i})\big) - np(x_{i})\log_{2}e \Big) \\ &= n \log_{2} n - \sum\limits_{1\le i \le k} np(x_{i})\Big( \log_{2}n + \log_{2}p(x_{i}) \Big) \qquad \left( \because \sum\limits_{1\le i \le k} p(x_{i}) = 1 \right) \\ &= - n\sum\limits_{1\le i \le k} p(x_{i})\log_{2}p(x_{i}) \\ &= nH \end{align*} $$

Thus, just as in the binary case, roughly $nH$ bits suffice, and the efficient compression rate is again the entropy $H$.
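As in the binary case, this can be checked numerically: the exact $\log_{2}$ of the multinomial coefficient, divided by $n$, approaches $H$. A sketch under an assumed four-symbol distribution (the probabilities and values of $n$ are arbitrary; the last count absorbs rounding so the counts sum to $n$):

```python
import math

probs = [0.5, 0.25, 0.125, 0.125]           # assumed distribution over k = 4 symbols
H = -sum(q * math.log2(q) for q in probs)   # entropy in bits; here H = 1.75

for n in (100, 1_000, 10_000):
    counts = [round(n * q) for q in probs]
    counts[-1] = n - sum(counts[:-1])       # force the counts to sum to n exactly
    multinomial = math.factorial(n)
    for c in counts:
        multinomial //= math.factorial(c)   # n! / prod((n p(x_i))!) stays an integer
    print(n, math.log2(multinomial) / n, H) # per-symbol bit count approaches H
```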


  1. 김영훈·허재성, 양자 정보 이론 (Quantum Information Theory) (2020), pp. 258–261 ↩︎