Conditional Entropy
Definition
Given a joint probability mass function $p$ or a joint probability density function $f$ for the random variables $X_{1}, \cdots, X_{n}$, the conditional entropy of $X_{1}, \cdots, X_{n}$ given $X_{k}$ is denoted $H\left( X_{1}, \cdots, X_{n} \mid X_{k} \right)$ and defined as follows.
Discrete
$$ H\left( X_{1}, \cdots, X_{n} \mid X_{k} \right) := - \sum_{x_{1}} \cdots \sum_{x_{n}} p \left( x_{1}, \cdots, x_{n} \right) \log_{2} \frac{p \left( x_{1}, \cdots, x_{n} \right)}{p \left( x_{k} \right)} $$
Continuous
$$ H\left( X_{1}, \cdots, X_{n} \mid X_{k} \right) := - \int_{\mathbb{R}} \cdots \int_{\mathbb{R}} f \left( x_{1}, \cdots, x_{n} \right) \log_{2} \frac{f \left( x_{1}, \cdots, x_{n} \right)}{f \left( x_{k} \right)} d x_{1} \cdots d x_{n} $$
- Strictly speaking, $X_{k}$ should be excluded from the list $X_{1}, \cdots, X_{n}$ on the left-hand side; the notation is left imprecise for brevity. In the sums and integrals, however, $x_{k}$ is included among $x_{1}, \cdots, x_{n}$.
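As a concrete illustration of the discrete definition, the sketch below computes $H\left( X_{1}, X_{2} \mid X_{1} \right)$ from a joint pmf table in Python. The function name `conditional_entropy` and the example table are made up for this illustration; the code simply evaluates the sum in the definition above with NumPy.

```python
import numpy as np

def conditional_entropy(joint, k):
    """H(X_1, ..., X_n | X_k) for a discrete joint pmf stored as an n-dimensional array.
    `k` is the 0-based axis corresponding to the conditioning variable X_k."""
    joint = np.asarray(joint, dtype=float)
    # Marginal pmf of X_k: sum the joint pmf over every other axis.
    other_axes = tuple(i for i in range(joint.ndim) if i != k)
    p_k = joint.sum(axis=other_axes, keepdims=True)
    # Restrict to outcomes with positive probability (convention: 0 log 0 = 0).
    mask = joint > 0
    ratio = joint[mask] / np.broadcast_to(p_k, joint.shape)[mask]
    # -sum p(x_1,...,x_n) * log2( p(x_1,...,x_n) / p(x_k) )
    return -np.sum(joint[mask] * np.log2(ratio))

# Example: a 2x2 joint pmf for (X_1, X_2); condition on X_1 (axis k = 0).
p = [[0.25, 0.25],
     [0.10, 0.40]]
print(conditional_entropy(p, k=0))  # H(X_1, X_2 | X_1) = H(X_2 | X_1) ≈ 0.861 bits
```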
Theorem
- [1] For two random variables $X, Y$, the following holds:
$$ H(X, Y) = H(X) + H(Y \mid X) $$
In particular, if $X$ and $Y$ are independent:
$$ H(X \mid Y) = H(X) \qquad H(Y \mid X) = H(Y) $$
- [2] Chain Rule (both [1] and [2] are checked numerically in the sketch after this list):
$$ \begin{align*} H\left( X_{1}, \cdots, X_{n} \right) &= H\left( X_{1} \right) + \sum_{k=2}^{n} H\left( X_{k} \mid X_{1}, \cdots, X_{k-1} \right) \\ &= H\left( X_{1} \right) + H\left( X_{2} \mid X_{1} \right) + H\left( X_{3} \mid X_{1}, X_{2} \right) + \cdots + H\left( X_{n} \mid X_{1}, \cdots, X_{n-1} \right) \end{align*} $$
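The sketch below is a minimal numerical check of both statements in Python, assuming a small randomly generated joint pmf for three variables. The helper names `entropy` and `cond_entropy` are invented here, and conditioning on several variables at once is taken as the natural extension of the single-variable definition above.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a pmf given as an array of any shape."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                       # convention: 0 log 0 = 0
    return -np.sum(p * np.log2(p))

def cond_entropy(joint, cond_axes):
    """Conditional entropy of all variables given those on `cond_axes`,
    computed directly from the definition: -sum p(x) log2( p(x) / p(x_cond) )."""
    joint = np.asarray(joint, dtype=float)
    other = tuple(i for i in range(joint.ndim) if i not in cond_axes)
    p_cond = joint.sum(axis=other, keepdims=True)
    mask = joint > 0
    ratio = joint[mask] / np.broadcast_to(p_cond, joint.shape)[mask]
    return -np.sum(joint[mask] * np.log2(ratio))

rng = np.random.default_rng(0)
joint = rng.random((2, 3, 4))          # random weights for (X_1, X_2, X_3)
joint /= joint.sum()                   # normalize to a joint pmf

p12 = joint.sum(axis=2)                # joint pmf of (X_1, X_2)
p1 = joint.sum(axis=(1, 2))            # marginal pmf of X_1

# [1]  H(X_1, X_2) = H(X_1) + H(X_2 | X_1)
print(np.isclose(entropy(p12), entropy(p1) + cond_entropy(p12, cond_axes=(0,))))  # True

# [2]  Chain rule: H(X_1, X_2, X_3) = H(X_1) + H(X_2 | X_1) + H(X_3 | X_1, X_2)
rhs = entropy(p1) + cond_entropy(p12, (0,)) + cond_entropy(joint, (0, 1))
print(np.isclose(entropy(joint), rhs))                                            # True
```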
Explanation
Simply put, conditional entropy is joint entropy with an additional condition imposed. To understand the formula intuitively,
$$ H(Y \mid X) = H(X, Y) - H(X) $$
can be read as the uncertainty that remains once the information of $X$ has been removed from the original disorder $H(X, Y)$. The chain rule is a generalization of this.
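For completeness, here is a short derivation of the two-variable identity directly from the discrete definition above; it is the standard argument, written out as a sketch.

$$ \begin{align*} H(X, Y) &= -\sum_{x} \sum_{y} p(x, y) \log_{2} p(x, y) \\ &= -\sum_{x} \sum_{y} p(x, y) \log_{2} \left[ p(x) \cdot \frac{p(x, y)}{p(x)} \right] \\ &= -\sum_{x} \sum_{y} p(x, y) \log_{2} p(x) - \sum_{x} \sum_{y} p(x, y) \log_{2} \frac{p(x, y)}{p(x)} \\ &= H(X) + H(Y \mid X) \end{align*} $$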