Let $\Sigma$ be a set of alphabets. An encoding scheme with compression rate $0 < \alpha \le 1$ is defined as a map of the following form:
$$f_n : \Sigma^n \to \{0,1\}^{\lfloor \alpha n \rfloor}$$
Similarly, a decoding scheme with compression rate $\alpha$ is defined as:
$$g_n : \{0,1\}^{\lfloor \alpha n \rfloor} \to \Sigma^n$$
Explanation
Simply put, the compression rate is the number of bits ($0$s and $1$s) that the message is mapped to, divided by the number of alphabet symbols in the message. For example, if the mapping is $\mathrm{LESSERAFIM} \mapsto 101_{(2)}$, the compression rate is calculated as follows:
$$\frac{\left| \{1,0,1\} \right|}{\left| \{ L,E,S,S,E,R,A,F,I,M \} \right|} = \frac{3}{10} = 0.3$$
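To make the ratio concrete, here is a tiny Python sketch (the variable names are ours, not from the text) that computes the compression rate of the example above by dividing the number of code bits by the number of alphabet symbols.

```python
# Illustrative sketch: compression rate of the mapping LESSERAFIM -> 101_(2).
message = list("LESSERAFIM")   # the ten alphabet symbols of the message
code_bits = [1, 0, 1]          # the three bits the message is mapped to

# compression rate = (number of bits) / (number of alphabet symbols)
alpha = len(code_bits) / len(message)
print(alpha)  # 0.3
```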
When the amount of data is large, storing and transmitting it efficiently becomes difficult. Thus, information theory is interested in how efficiently a message $m \in \Sigma^n$ can be compressed. Here, efficiency includes not only compression but also restoration: if we map $m \mapsto 1$, the compression is extremely efficient, but restoring this $1$ back to the original message $m$ is impossible. Therefore, finding the minimal $\alpha$ for which there exist $f_n, g_n$ satisfying $m = g_n(f_n(m))$ amounts to compressing the message efficiently. This was formalized mathematically by the father of information theory, Claude Shannon.
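As a hedged illustration of why restoration matters, the sketch below (hypothetical function names, not from the text) maps every message to the single bit $1$, exactly as in the $m \mapsto 1$ example: distinct messages collide, so no decoder $g_n$ can satisfy $m = g_n(f_n(m))$ for every $m$.

```python
# Minimal sketch: extreme "compression" that cannot be restored.
def f(m: str) -> str:
    """Map every message to the single bit '1' (hypothetical encoder)."""
    return "1"

messages = ["0101", "1111"]          # two distinct messages
codes = [f(m) for m in messages]
print(codes)                          # ['1', '1'] -- both messages collide
# Since f is not injective, no decoder g can recover both originals from '1'.
print(len(set(codes)) < len(set(messages)))  # True
```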
Noiseless coding theorem
Noisy channel coding theorem
This article aims to give a simplified account of efficient message compression. Consider sending a message of length $n$ consisting of $0$s and $1$s, where the probability of $0$ appearing is $p$ and the probability of $1$ appearing is $1-p$.
Let $\{X_i\}_{i=1}^{n}$ be independent random variables with mean $\mu$. Then for any $\epsilon > 0$, the following holds:

$$\lim_{n \to \infty} P\left( \left| \frac{1}{n} \sum_{i=1}^{n} X_i - \mu \right| \ge \epsilon \right) = 0$$
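The following small simulation, a sketch under our own choice of $p = 0.7$ and sample sizes, illustrates the statement in the binary-message setting of this article: the fraction of $0$s in a random message approaches its expected value $p$ as $n$ grows.

```python
import random

# Simulation sketch of the law of large numbers for a random binary message:
# each symbol is 0 with probability p, and the sample mean of the indicator
# X_i = 1{symbol_i == 0} should approach its expectation p.
p = 0.7
for n in (100, 10_000, 1_000_000):
    msg = [0 if random.random() < p else 1 for _ in range(n)]
    zero_fraction = msg.count(0) / n
    print(n, zero_fraction)   # tends toward p = 0.7 as n grows
```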
The law of large numbers implies that, if the number of trials $n$ is sufficiently large, the sample mean $\frac{1}{n}\sum_{i=1}^{n} X_i$ will be sufficiently close to the population mean $\mu$. Thus, when $n$ is large, the number of $0$s in the message is about $np$ and the number of $1$s is about $n(1-p) = n - np$. The number of possible messages that can be formed with $0$ and $1$ is then the number of ways to choose, without regard to order, the $np$ slots for $0$ out of the $n$ slots:
$$\frac{n!}{(np)! \, (n-np)!}$$
The number of bits needed to distinguish these messages by numbering them in binary is the base-$2$ logarithm of this count. By Stirling's approximation $\log_2 n! \approx n \log_2 n - n \log_2 e$,

$$\log_2 \frac{n!}{(np)! \, (n-np)!} \approx n \bigl( -p \log_2 p - (1-p) \log_2 (1-p) \bigr)$$

The term in the parentheses is the entropy $H(p)$. Therefore, we obtain the following:
$$\log_2 \frac{n!}{(np)! \, (n-np)!} \approx n H(p)$$
Thus, when $n$ is sufficiently large, coding with $nH(p)$ bits is efficient, and the efficient compression rate is $\alpha = \dfrac{nH(p)}{n} = H(p)$. This result implies that the entropy of the random variable $X$ represents the compression efficiency of messages generated by independent trials of $X$.
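A quick numerical check, again only a sketch with parameters we picked ($p = 0.7$ and a few values of $n$), shows the per-symbol bit count $\frac{1}{n}\log_2 \binom{n}{np}$ approaching the entropy $H(p)$.

```python
from math import comb, log2

def H(p: float) -> float:
    """Binary entropy in bits: H(p) = -p log2 p - (1-p) log2 (1-p)."""
    return -p * log2(p) - (1 - p) * log2(1 - p)

p = 0.7
for n in (100, 1_000, 10_000):
    k = round(n * p)                    # expected number of 0s in the message
    bits = log2(comb(n, k))             # bits needed to number all such messages
    print(n, bits / n, H(p))            # per-symbol rate approaches H(0.7) ≈ 0.8813
```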
To generalize, if the probability that a certain alphabet $x_i$ appears is $p(x_i)$, then the average number of times $x_i$ appears in the message is $np(x_i)$. The number of messages of length $n$ made using $k \ge 2$ alphabets is as follows: