Categorical Distribution

Definition¹

Given a sample space with $k (\ge 2)$ categories, $\Omega = \left\{ 1, 2, \dots, k \right\}$, and a probability vector $\mathbf{p} = (p_{1}, \dots, p_{k})$, the discrete probability distribution with the following probability mass function is called the Categorical distribution.

$p(x = i) = p_{i}, \qquad i \in \left\{ 1, 2, \dots, k \right\}$

Description

The vector $\mathbf{p} = (p_{1}, \dots, p_{k})$ lists the probability of each of the $k$ categories occurring. It must therefore satisfy the following condition.

$\sum_{i=1}^{k} p_{i} = 1, \qquad p_{i} \ge 0$
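As a minimal sketch of the definition and the condition above, the following Python snippet checks whether a list is a valid probability vector and evaluates $p(x = i)$. The function names `is_probability_vector` and `categorical_pmf`, the tolerance `tol`, and the example vector are assumptions made for illustration only.

```python
import math

def is_probability_vector(p, tol=1e-9):
    """Return True if every entry is non-negative and the entries sum to 1."""
    return all(p_i >= 0 for p_i in p) and math.isclose(sum(p), 1.0, abs_tol=tol)

def categorical_pmf(i, p):
    """Return p(x = i) for a category i in {1, ..., k}, where k = len(p)."""
    if not is_probability_vector(p):
        raise ValueError("p must be non-negative and sum to 1")
    if not 1 <= i <= len(p):
        raise ValueError("i must lie in {1, ..., k}")
    return p[i - 1]  # categories are 1-indexed, Python lists are 0-indexed

p = [0.2, 0.5, 0.3]                        # hypothetical example with k = 3
print(is_probability_vector(p))            # the entries sum to 1
print(is_probability_vector([0.6, 0.6]))   # the entries sum to 1.2
print(categorical_pmf(2, p))               # probability of category 2
```

The tolerance is there because floating-point sums are rarely exactly $1$.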

If the Bernoulli distribution is compared to “flipping a coin once,” the Categorical distribution can be compared to “rolling a die once.”

$\Omega = \left\{ ⚀, ⚁, ⚂, ⚃, ⚄, ⚅ \right\}$

$\mathbf{p} = \left( \dfrac{1}{6}, \dfrac{1}{6}, \dfrac{1}{6}, \dfrac{1}{6}, \dfrac{1}{6}, \dfrac{1}{6} \right)$
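The die example can be simulated with a short sketch that uses only the Python standard library. The helper name `sample_categorical`, the seed, and the number of draws are illustrative choices, not part of the definition.

```python
import random
from collections import Counter

def sample_categorical(p, n_draws, seed=None):
    """Draw n_draws independent categories from {1, ..., k} with probabilities p."""
    rng = random.Random(seed)
    categories = range(1, len(p) + 1)
    return rng.choices(categories, weights=p, k=n_draws)

p_die = [1 / 6] * 6                                   # fair six-sided die
rolls = sample_categorical(p_die, n_draws=60_000, seed=0)

# The empirical frequencies should be close to 1/6 for every face.
counts = Counter(rolls)
for face in range(1, 7):
    print(face, counts[face] / len(rolls))
```

Each draw here is a single Categorical trial. Counting how often each face appears over many draws is what the Multinomial distribution describes.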

The following notation is used.

$\operatorname{Cat}(k; p_{1}, \dots, p_{k}) = \operatorname{Cat}(k; \mathbf{p})$

The Categorical distribution generalizes the Bernoulli distribution from $2$ categories to $k$ categories. Generalizing further from a single trial to $n$ trials gives the Multinomial distribution.

Categories \ Trials	$1$ trial	$n$ trials
$2$ categories	Bernoulli distribution	Binomial distribution
$k$ categories	Categorical distribution	Multinomial distribution
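The relationships in the table can be checked by a short worked reduction. The standard Multinomial probability mass function is not stated in this post, so it is taken here as an assumption. For counts $x_{i} \ge 0$ with $\sum_{i=1}^{k} x_{i} = n$, it reads:

$p(x_{1}, \dots, x_{k}) = \dfrac{n!}{x_{1}! \cdots x_{k}!} \prod\limits_{i=1}^{k} p_{i}^{x_{i}}$

Setting $n = 1$ forces exactly one count $x_{j}$ to equal $1$ and all others to equal $0$, so the coefficient becomes $1! / (0! \cdots 1! \cdots 0!) = 1$ and

$p(x_{1}, \dots, x_{k}) = \prod\limits_{i=1}^{k} p_{i}^{x_{i}} = p_{j}$

which is the Categorical probability mass function. Setting in addition $k = 2$ with $p_{1} = p$ and $p_{2} = 1 - p$ gives $p^{x_{1}} (1 - p)^{1 - x_{1}}$, the Bernoulli probability mass function.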

The probability mass function can also be expressed as follows.

$p(j) = \prod\limits_{i=1}^{k} p_{i}^{\delta_{ji}} = \sum\limits_{i=1}^{k} \delta_{ji} p_{i}, \qquad j \in \left\{ 1, 2, \dots, k \right\}$

Here $\delta_{ji}$ is the Kronecker delta, which equals $1$ if $j = i$ and $0$ otherwise.
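A small numerical check can make the two Kronecker delta forms concrete. The sketch below evaluates the product form and the sum form and prints them side by side. The function names and the example vector are hypothetical.

```python
import math

def kronecker_delta(j, i):
    """Return 1 if j == i, otherwise 0."""
    return 1 if j == i else 0

def pmf_product_form(j, p):
    """Evaluate prod_i p_i ** delta_ji; every factor with i != j equals 1."""
    return math.prod(p_i ** kronecker_delta(j, i) for i, p_i in enumerate(p, start=1))

def pmf_sum_form(j, p):
    """Evaluate sum_i delta_ji * p_i; every term with i != j equals 0."""
    return sum(kronecker_delta(j, i) * p_i for i, p_i in enumerate(p, start=1))

p = [0.2, 0.5, 0.3]  # hypothetical probability vector with k = 3
for j in range(1, len(p) + 1):
    print(j, pmf_product_form(j, p), pmf_sum_form(j, p))
```

Both forms select the single entry $p_{j}$: the product does so by turning every other factor into $1$, the sum by turning every other term into $0$.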

Alternatively, the sample space can be identified with the standard basis of the Euclidean space $\mathbb{R}^{k}$, so that each realization is a one-hot vector. In this case, for a random vector $\mathbf{x} = (x_{1}, \dots, x_{k})$ satisfying the constraint below, the Categorical distribution is written $\operatorname{Cat}(\mathbf{x}; \mathbf{p})$ and has the following probability mass function.

$x_{i} \in \left\{ 0, 1 \right\}, \qquad \sum_{i=1}^{k} x_{i} = 1$

$p(\mathbf{x}) = p(x_{1}, \dots, x_{k}) = \prod\limits_{i=1}^{k} p_{i}^{x_{i}}$
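The one-hot form can be sketched in a few lines of Python. The helpers `one_hot` and `categorical_pmf_one_hot` are illustrative names, and the validation step simply enforces the constraint $x_{i} \in \{0, 1\}$, $\sum_{i} x_{i} = 1$.

```python
import math

def one_hot(i, k):
    """Return the one-hot vector of length k for category i in {1, ..., k}."""
    return [1 if idx == i else 0 for idx in range(1, k + 1)]

def categorical_pmf_one_hot(x, p):
    """Evaluate prod_i p_i ** x_i for a one-hot vector x."""
    if len(x) != len(p):
        raise ValueError("x and p must have the same length")
    if any(x_i not in (0, 1) for x_i in x) or sum(x) != 1:
        raise ValueError("x must be a one-hot vector")
    return math.prod(p_i ** x_i for p_i, x_i in zip(p, x))

p = [0.2, 0.5, 0.3]            # hypothetical probability vector with k = 3
x = one_hot(3, len(p))         # third standard basis vector
print(x)
print(categorical_pmf_one_hot(x, p))
```

Only the factor whose exponent is $1$ survives, so the result is the probability of the category that the one-hot vector encodes.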

¹ https://en.wikipedia.org/wiki/Categorical_distribution