
Joint Entropy

Definition

Let us assume that the joint probability mass function $p$ or the joint probability density function $f$ is given for random variables $X_{1}, \cdots , X_{n}$.

Discrete

$$ H \left( X_{1}, \cdots , X_{n} \right) := - \sum_{x_{1}} \cdots \sum_{x_{n}} p \left( x_{1} , \cdots , x_{n} \right) \log_{2} p \left( x_{1} , \cdots , x_{n} \right) $$
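
For a small table of joint probabilities this sum can be evaluated directly. The following is a minimal Python sketch; the 2×2 joint pmf is a made-up example, not taken from the text:

```python
import numpy as np

# Hypothetical joint pmf p(x1, x2) for two binary random variables X1, X2.
# Rows index x1, columns index x2; the entries sum to 1.
p = np.array([[0.25, 0.25],
              [0.25, 0.25]])

# H(X1, X2) = -sum p(x1, x2) * log2 p(x1, x2),
# skipping zero-probability cells (0 * log 0 is taken as 0).
mask = p > 0
H_joint = -np.sum(p[mask] * np.log2(p[mask]))

print(H_joint)  # 2.0 bits for this uniform 2x2 table
```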

Continuous

$$ H \left( X_{1}, \cdots , X_{n} \right) := - \int_{\mathbb{R}} \cdots \int_{\mathbb{R}} f \left( x_{1} , \cdots , x_{n} \right) \log_{2} f \left( x_{1} , \cdots , x_{n} \right) d x_{1} \cdots d x_{n} $$
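
In the continuous case the integral is usually computed in closed form or approximated numerically. As a rough sketch, assuming SciPy is available and taking two independent standard normal variables as a made-up example, a truncated double integral can be compared against the known Gaussian closed form $\frac{1}{2} \log_{2} \left( (2 \pi e)^{2} \right)$:

```python
import numpy as np
from scipy import integrate
from scipy.stats import multivariate_normal

# Hypothetical example: two independent standard normal variables,
# so f(x1, x2) is the bivariate standard normal density.
rv = multivariate_normal(mean=[0.0, 0.0], cov=np.eye(2))

def integrand(x2, x1):
    fx = rv.pdf([x1, x2])
    # Treat f * log2 f as 0 where the density is effectively zero.
    return -fx * np.log2(fx) if fx > 0 else 0.0

# Truncate the integration to [-8, 8]^2, which captures essentially all the mass.
H, _ = integrate.dblquad(integrand, -8, 8, lambda x1: -8, lambda x1: 8)

# Closed form for comparison: log2(2 * pi * e) ≈ 4.094 bits
print(H, np.log2(2 * np.pi * np.e))
```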

Theorem

Joint entropy has the following properties:

  • [1] Inequality:
    $$ 0 \le \max_{k=1, \cdots, n} \left\{ H \left( X_{k} \right) \right\} \le H \left( X_{1} , \cdots , X_{n} \right) \le \sum_{k=1}^{n} H \left( X_{k} \right) $$
    If $X_{1} , \cdots , X_{n}$ are mutually independent, the last inequality becomes an equality (as checked numerically in the sketch after this list). That is,
    $$ H \left( X_{1} , \cdots , X_{n} \right) = \sum_{k=1}^{n} H \left( X_{k} \right) $$
  • [2] Symmetry: $H \left( X, Y \right) = H \left( Y, X \right)$
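
Property [1] is easy to check numerically for a concrete joint pmf. Below is a small Python sketch (the joint table is a hypothetical example); it also verifies that the upper bound is attained when the joint pmf is the product of the marginals, i.e. under independence:

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a probability array (zeros are skipped)."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Hypothetical joint pmf for two dependent binary variables X and Y.
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])

p_x = p_xy.sum(axis=1)   # marginal of X
p_y = p_xy.sum(axis=0)   # marginal of Y

H_x, H_y = entropy(p_x), entropy(p_y)
H_xy = entropy(p_xy.ravel())

# [1] max{H(X), H(Y)} <= H(X, Y) <= H(X) + H(Y)
print(max(H_x, H_y) <= H_xy <= H_x + H_y)               # True

# Equality in the upper bound when X and Y are independent:
p_indep = np.outer(p_x, p_y)                            # product of the marginals
print(np.isclose(entropy(p_indep.ravel()), H_x + H_y))  # True
```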

Explanation

$$ \max_{k=1, \cdots, n} \left\{ H \left( X_{k} \right) \right\} \le H \left( X_{1} , \cdots , X_{n} \right) $$
The important point in this inequality is that as the number of random variables grows, the joint entropy can increase but never decrease. This matches the intuition that adding more random variables can only add to the level of disorder, never reduce it.

In itself, joint entropy is merely an extension of entropy, but it is still essential to be familiar with its definition. The reason it leads naturally to conditional entropy, and behaves differently from ordinary expectations of random variables, is precisely the $\log_{2}$ in the definition.