
Joint Entropy

Definition

Let us assume that the joint probability mass function $p$ or the joint probability density function $f$ is given for random variables $X_{1}, \cdots , X_{n}$.

Discrete

$$ H \left( X_{1}, \cdots , X_{n} \right) := - \sum_{x_{1}} \cdots \sum_{x_{n}} p \left( x_{1} , \cdots , x_{n} \right) \log_{2} p \left( x_{1} , \cdots , x_{n} \right) $$
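
For a small table of joint probabilities this sum can be evaluated directly. The following is a minimal Python sketch; the 2×2 joint pmf is a made-up example, not taken from the text:

```python
import numpy as np

# Hypothetical joint pmf p(x1, x2) for two binary random variables X1, X2.
# Rows index x1, columns index x2; the entries sum to 1.
p = np.array([[0.25, 0.25],
              [0.25, 0.25]])

# H(X1, X2) = -sum p(x1, x2) * log2 p(x1, x2),
# skipping zero-probability cells (0 * log 0 is taken as 0).
mask = p > 0
H_joint = -np.sum(p[mask] * np.log2(p[mask]))

print(H_joint)  # 2.0 bits for this uniform 2x2 table
```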

Continuous

$$ H \left( X_{1}, \cdots , X_{n} \right) := - \int_{\mathbb{R}} \cdots \int_{\mathbb{R}} f \left( x_{1} , \cdots , x_{n} \right) \log_{2} f \left( x_{1} , \cdots , x_{n} \right) d x_{1} \cdots d x_{n} $$
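
In the continuous case the integral is usually computed in closed form or approximated numerically. As a rough sketch, assuming SciPy is available and taking two independent standard normal variables as a made-up example, a truncated double integral can be compared against the known Gaussian closed form $\frac{1}{2} \log_{2} \left( (2 \pi e)^{2} \right)$:

```python
import numpy as np
from scipy import integrate
from scipy.stats import multivariate_normal

# Hypothetical example: two independent standard normal variables,
# so f(x1, x2) is the bivariate standard normal density.
rv = multivariate_normal(mean=[0.0, 0.0], cov=np.eye(2))

def integrand(x2, x1):
    fx = rv.pdf([x1, x2])
    # Treat f * log2 f as 0 where the density is effectively zero.
    return -fx * np.log2(fx) if fx > 0 else 0.0

# Truncate the integration to [-8, 8]^2, which captures essentially all the mass.
H, _ = integrate.dblquad(integrand, -8, 8, lambda x1: -8, lambda x1: 8)

# Closed form for comparison: log2(2 * pi * e) ≈ 4.094 bits
print(H, np.log2(2 * np.pi * np.e))
```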

Theorem

Joint entropy has the following properties:

  • [1] Inequality:
    $$ 0 \le \max_{k=1, \cdots, n} \left\{ H \left( X_{k} \right) \right\} \le H \left( X_{1} , \cdots , X_{n} \right) \le \sum_{k=1}^{n} H \left( X_{k} \right) $$
    If $X_{1} , \cdots , X_{n}$ are mutually independent, the last inequality becomes an equality (as checked numerically in the sketch after this list). That is,
    $$ H \left( X_{1} , \cdots , X_{n} \right) = \sum_{k=1}^{n} H \left( X_{k} \right) $$
  • [2] Symmetry: $H \left( X, Y \right) = H \left( Y, X \right)$
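
Property [1] is easy to check numerically for a concrete joint pmf. Below is a small Python sketch (the joint table is a hypothetical example); it also verifies that the upper bound is attained when the joint pmf is the product of the marginals, i.e. under independence:

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a probability array (zeros are skipped)."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Hypothetical joint pmf for two dependent binary variables X and Y.
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])

p_x = p_xy.sum(axis=1)   # marginal of X
p_y = p_xy.sum(axis=0)   # marginal of Y

H_x, H_y = entropy(p_x), entropy(p_y)
H_xy = entropy(p_xy.ravel())

# [1] max{H(X), H(Y)} <= H(X, Y) <= H(X) + H(Y)
print(max(H_x, H_y) <= H_xy <= H_x + H_y)               # True

# Equality in the upper bound when X and Y are independent:
p_indep = np.outer(p_x, p_y)                            # product of the marginals
print(np.isclose(entropy(p_indep.ravel()), H_x + H_y))  # True
```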

Explanation

$$ \max_{k=1, \cdots, n} \left\{ H \left( X_{k} \right) \right\} \le H \left( X_{1} , \cdots , X_{n} \right) $$
The important point in this inequality is that as the number of random variables grows, the joint entropy can increase but never decrease. This matches the intuition that adding more random variables can only add to the level of disorder, never reduce it.

In itself, joint entropy is merely an extension of entropy, but it is still essential to be familiar with its definition. The reason it leads naturally to conditional entropy, and behaves differently from ordinary expectations of random variables, is precisely the $\log_{2}$ in the definition.