Encoding and Decoding in Information Theory

Definition [1]

Let $\Sigma$ be a set of alphabets. A function defined as follows is called an encoding scheme with compression rate $0 \lt \alpha \le 1$:

$$ f_{n} : \Sigma^{n} \to \left\{ 0, 1 \right\}^{\lfloor \alpha n \rfloor} $$

Similarly, a decoding scheme with compression rate $\alpha$ is defined as follows:

$$ g_{n} : \left\{ 0, 1 \right\}^{\lfloor \alpha n \rfloor} \to \Sigma^{n} $$
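To make the type signatures concrete, here is a minimal Python sketch, assuming a two-symbol alphabet $\Sigma = \{\mathrm{A}, \mathrm{B}\}$ and the trivial rate $\alpha = 1$; the names `encode` and `decode` are illustrative, not from the source.

```python
from math import floor

# Toy example (our own, not from the source): Sigma = {"A", "B"} and alpha = 1,
# so a message of length n is mapped to floor(alpha * n) = n bits.
SIGMA_TO_BIT = {"A": "0", "B": "1"}
BIT_TO_SIGMA = {"0": "A", "1": "B"}

def encode(message: str, alpha: float = 1.0) -> str:
    """f_n : Sigma^n -> {0,1}^floor(alpha * n)."""
    bits = "".join(SIGMA_TO_BIT[s] for s in message)
    return bits[: floor(alpha * len(message))]

def decode(bits: str) -> str:
    """g_n : {0,1}^floor(alpha * n) -> Sigma^n (inverts encode when alpha = 1)."""
    return "".join(BIT_TO_SIGMA[b] for b in bits)

m = "ABBA"
assert decode(encode(m)) == m  # lossless: m = g_n(f_n(m))
```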

Explanation

Simply put, the compression rate is the number of bits $0$ and $1$ that the alphabets are mapped to, divided by the number of alphabet symbols. For example, if the mapping is $\text{LESSERAFIM} \mapsto 101_{(2)}$, the compression rate is calculated as follows:

$$ 0.3 = \dfrac{3}{10} = \dfrac{\left| \left\{ 1, 0, 1 \right\} \right|}{ \left| \left\{ L, E, S, S, E, R, A, F, I, M \right\} \right| } $$
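In code this is just one division; the helper name `compression_rate` below is ours, for illustration only:

```python
def compression_rate(message: str, codeword: str) -> float:
    """Number of code bits divided by the number of alphabet symbols in the message."""
    return len(codeword) / len(message)

print(compression_rate("LESSERAFIM", "101"))  # 0.3
```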

With a large amount of data, it becomes difficult to store and transmit it efficiently. Thus, information theory is interested in how efficiently a message $m \in \Sigma^{n}$ can be compressed. Here, efficiency covers not just compression but also restoration. A mapping such as $m \mapsto 1$ compresses extremely well, but restoring this $1$ back to the original message $m$ is impossible. Therefore, finding the minimal $\alpha$ for which there exist $f_{n}, g_{n}$ satisfying $m = g_{n}(f_{n}(m))$ amounts to compressing the message efficiently. This idea was organized mathematically by the father of information theory, Claude Shannon.

  • Noiseless coding theorem
  • Noisy channel coding theorem

This article presents a simplified account of efficient message compression. Consider sending a message of length $n$ consisting of $0$s and $1$s. If the probability of $0$ appearing is $p$, then the probability of $1$ appearing is $1-p$.

Law of Large Numbers

Let $\left\{ X_{i} \right\}_{i=1}^{n}$ be independent random variables. If the average of $X_{i}$ is $\mu$, then for any $\epsilon \gt 0$, the following holds:

$$ \lim\limits_{n \to \infty} P\left( \left| \dfrac{1}{n}\sum\limits_{i=1}^{n}X_{i} - \mu \right| \ge \epsilon \right) = 0 $$
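This convergence is easy to observe empirically. The sketch below, with an arbitrary choice of $p = 0.3$ and Bernoulli random variables $X_{i}$, shows the sample mean settling near $\mu = p$ as $n$ grows:

```python
import random

random.seed(0)
p = 0.3  # arbitrary example: P(X_i = 1) = p, so mu = E[X_i] = p

for n in (10, 1_000, 100_000):
    sample_mean = sum(1 if random.random() < p else 0 for _ in range(n)) / n
    print(f"n = {n:>6}: sample mean = {sample_mean:.4f}, |deviation| = {abs(sample_mean - p):.4f}")
# The deviation from mu shrinks as n grows, as the law of large numbers predicts.
```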

The law of large numbers implies that, if the number of trials $n$ is sufficiently large, the sample mean $\dfrac{1}{n}\sum\limits_{i=1}^{n}X_{i}$ will be sufficiently close to the population mean $\mu$. Thus, when $n$ is large, the number of $0$s in the message is about $np$ and the number of $1$s is about $n(1-p) = n - np$. The number of possible messages that can be formed with these $0$s and $1$s is the number of ways to choose, without regard to order, the $np$ slots for $0$ out of $n$ slots:

$$ \dfrac{n!}{(np)! (n-np)!} $$

The number of bits needed to distinguish these messages by numbering them in binary is as follows:

$$ \log_{2}\dfrac{n!}{(np)! (n-np)!} \tag{1} $$

Stirling’s Approximation

$$ \log_{2} n! \approx n \log_{2} n - n \log_{2}e $$
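A quick numerical sanity check of this approximation, using `math.lgamma` (since $n! = \Gamma(n+1)$) to avoid computing huge factorials; the sample values of $n$ are arbitrary:

```python
import math

def log2_factorial(n: int) -> float:
    """log2(n!) via the gamma function: n! = Gamma(n + 1)."""
    return math.lgamma(n + 1) / math.log(2)

def stirling_log2_factorial(n: int) -> float:
    """Stirling's approximation: n log2 n - n log2 e."""
    return n * math.log2(n) - n * math.log2(math.e)

for n in (100, 10_000, 1_000_000):
    exact, approx = log2_factorial(n), stirling_log2_factorial(n)
    print(f"n = {n:>7}: {exact:.1f} vs {approx:.1f} (relative error {abs(exact - approx) / exact:.2%})")
```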

Approximating $(1)$ using Stirling's formula yields the following:

$$
\begin{align*}
& \log_{2} n! - \log_{2}(np)! - \log_{2}(n-np)! \\
&\approx \left( n \log_{2} n - n \log_{2}e \right) - \left( np \log_{2} (np) - np \log_{2}e \right) \\
&\qquad - \left( (n-np) \log_{2} (n-np) - (n-np) \log_{2}e \right) \\
&= n \log_{2} n - {\color{red}\cancel{\color{black}n \log_{2}e}} - np \log_{2} (np) + {\color{green}\cancel{\color{black}np \log_{2}e}} \\
&\qquad - (n-np) \log_{2} (n-np) + ({\color{red}\cancel{\color{black}n}}-{\color{green}\cancel{\color{black}np}}) \log_{2}e \\
&= n \log_{2} n - np \log_{2} n - np \log_{2} p - n\log_{2} \big( n(1-p) \big) + np\log_{2} \big( n(1-p) \big) \\
&= n \log_{2} n - np \log_{2} n - np \log_{2} p \\
&\qquad - n\log_{2} n - n\log_{2}(1-p) + np\log_{2} n + np\log_{2} (1-p) \\
&= - np \log_{2} p - n\log_{2}(1-p) + np\log_{2} (1-p) \\
&= - np \log_{2} p - n(1-p)\log_{2}(1-p) \\
&= n\Big(- p \log_{2} p - (1-p)\log_{2}(1-p) \Big)
\end{align*}
$$

The term in parentheses is the entropy $H(p)$. Therefore, we obtain the following:

$$ \log_{2}\dfrac{n!}{(np)! (n-np)!} \approx nH(p) $$
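This approximation can be checked directly against exact binomial coefficients; in the sketch below, the choices $n = 10000$ and $p = 0.2$ are arbitrary examples and `binary_entropy` is a helper defined here:

```python
import math

def binary_entropy(p: float) -> float:
    """H(p) = -p log2 p - (1 - p) log2(1 - p)."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

n, p = 10_000, 0.2
k = round(n * p)                         # typical number of 0s, i.e. np
exact = math.log2(math.comb(n, k))       # log2 of the number of typical messages
approx = n * binary_entropy(p)           # nH(p)
print(exact, approx, exact / approx)     # the ratio tends to 1 as n grows
```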

Thus, when $n$ is sufficiently large, coding with $nH(p)$ bits is efficient, and the efficient compression rate is $H(p) = \dfrac{nH(p)}{n}$. This result implies that the entropy of the random variable $X$ represents the compression efficiency of messages generated by independent trials of $X$.

To generalize, if the probability of a certain alphabet $x_{i}$ appearing is $p(x_{i})$, then the average number of times the alphabet $x_{i}$ appears in the message is $np(x_{i})$. The number of messages of length $n$ made using $k \ge 2$ alphabets is as follows:

$$ \dfrac{n!}{\prod\limits_{1\le i \le k}(np(x_{i}))!} $$

Numbering these messages in binary, the number of bits required is again approximately $nH$:

$$
\begin{align*}
&\log_{2} \dfrac{n!}{\prod\limits_{1\le i \le k}(np(x_{i}))!} \\
&= \log_{2}n! - \sum\limits_{1\le i \le k}\log_{2}(np(x_{i}))! \\
&\approx \left( n \log_{2} n - n \log_{2}e \right) - \sum\limits_{1\le i \le k}\Big( np(x_{i})\log_{2}\big(np(x_{i})\big) - np(x_{i})\log_{2}e \Big) \\
&= n \log_{2} n - \sum\limits_{1\le i \le k} np(x_{i})\Big( \log_{2}n + \log_{2}p(x_{i}) \Big) \\
&= - n\sum\limits_{1\le i \le k} p(x_{i})\log_{2}p(x_{i}) \\
&= nH
\end{align*}
$$

Here we used $\sum\limits_{1\le i \le k} p(x_{i}) = 1$, so that the $\log_{2}e$ terms and the $n\log_{2}n$ terms cancel.
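The same numerical check works in the general $k$-symbol case; the distribution below with $k = 4$ symbols is an arbitrary example, and `entropy` is a helper defined here:

```python
import math

def entropy(probs) -> float:
    """H = -sum_i p(x_i) log2 p(x_i)."""
    return -sum(p * math.log2(p) for p in probs)

n = 10_000
probs = [0.5, 0.25, 0.125, 0.125]        # arbitrary example distribution (k = 4), sums to 1
counts = [round(n * p) for p in probs]   # typical counts np(x_i)

# log2 of the multinomial coefficient n! / prod_i (np(x_i))!, via lgamma
log2_count = (math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)) / math.log(2)
print(log2_count, n * entropy(probs))    # both are approximately nH
```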


  1. 김영훈·허재성, 양자 정보 이론 (2020), p258~261 ↩︎