

Categorical Distribution

Definition¹

Given a sample space $\Omega = \left\{ 1, 2, \dots, k \right\}$ with $k (\ge 2)$ categories and a probability vector $\mathbf{p} = (p_{1}, \dots, p_{k})$, the discrete probability distribution with the following probability mass function is called the Categorical distribution.

$$ p(x = i) = p_{i}, \qquad x \in \left\{ 1, 2, \dots, k \right\} $$

Description

The vector $\mathbf{p} = (p_{1}, \dots, p_{k})$ gives the probability of each of the $k$ categories occurring. Therefore, $\mathbf{p}$ must satisfy the following condition.

$$ \sum_{i=1}^{k} p_{i} = 1, \qquad p_{i} \ge 0 $$
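The condition on the probability vector and the probability mass function from the definition can be checked directly. The following is a minimal sketch in Python; the function names `is_probability_vector` and `categorical_pmf` and the tolerance `tol` are illustrative choices, not part of the definition.

```python
import math

def is_probability_vector(p, tol=1e-9):
    """Return True if every entry is non-negative and the entries sum to 1."""
    return all(p_i >= 0 for p_i in p) and math.isclose(sum(p), 1.0, abs_tol=tol)

def categorical_pmf(i, p):
    """p(x = i) = p_i for categories numbered 1..k; 0 outside that range."""
    if not is_probability_vector(p):
        raise ValueError("p must be non-negative and sum to 1")
    return p[i - 1] if 1 <= i <= len(p) else 0.0

p = [0.2, 0.5, 0.3]               # hypothetical example with k = 3
print(is_probability_vector(p))   # True
print(categorical_pmf(2, p))      # 0.5
```

The tolerance is there only because floating-point sums are rarely exactly 1.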

If the Bernoulli distribution is compared to “flipping a coin once,” the Categorical distribution can be compared to “rolling a die once.”

$$ \Omega = \Big\{ ⚀, ⚁, ⚂, ⚃, ⚄, ⚅ \Big\} $$

$$ \mathbf{p} = \left( \dfrac{1}{6}, \dfrac{1}{6}, \dfrac{1}{6}, \dfrac{1}{6}, \dfrac{1}{6}, \dfrac{1}{6} \right) $$
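The die analogy can be simulated to see the uniform probability vector appear in the empirical frequencies. This is a rough sketch using only Python's standard library; `sample_categorical`, the seed, and the number of rolls `n` are assumptions made for illustration.

```python
import random
from collections import Counter

def sample_categorical(p, rng=random):
    """Draw one category from {1, ..., k} with probabilities p."""
    return rng.choices(range(1, len(p) + 1), weights=p, k=1)[0]

rng = random.Random(0)          # fixed seed so the run is repeatable
p = [1 / 6] * 6                 # fair die
n = 60_000                      # number of independent rolls
counts = Counter(sample_categorical(p, rng) for _ in range(n))
freqs = {face: counts[face] / n for face in range(1, 7)}
print(freqs)                    # each value should be close to 1/6
```

Replacing `p` with a non-uniform vector would model a loaded die with the same code.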

The following notation is used.

$$ \operatorname{Cat}(k; p_{1}, \dots, p_{k}) = \operatorname{Cat}(k; \mathbf{p}) $$

The Categorical distribution can be viewed as a generalization of the Bernoulli distribution from $2$ categories to $k$ categories. Further generalizing from $1$ trial to $n$ trials leads to the Multinomial distribution.

| Categories \ Trials | $1$ trial | $n$ trials |
|---|---|---|
| $2$ categories | Bernoulli distribution | Binomial distribution |
| $k$ categories | Categorical distribution | Multinomial distribution |
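The relationship between these distributions can be illustrated by counting the outcomes of repeated categorical draws: the vector of counts over $n$ independent draws follows a Multinomial distribution, and $n = 1$ gives back a single categorical draw. The sketch below assumes a hypothetical helper `multinomial_counts` and uses categories indexed from 0 for convenience.

```python
import random

def multinomial_counts(n, p, rng):
    """Count how often each of the k categories occurs in n independent categorical draws."""
    counts = [0] * len(p)
    for category in rng.choices(range(len(p)), weights=p, k=n):
        counts[category] += 1
    return counts

rng = random.Random(1)                                  # fixed seed for repeatability
bernoulli_like = multinomial_counts(1, [0.3, 0.7], rng)     # k = 2, n = 1
binomial_like = multinomial_counts(10, [0.3, 0.7], rng)     # k = 2, n = 10
categorical_like = multinomial_counts(1, [0.2, 0.5, 0.3], rng)   # k = 3, n = 1
multinomial_like = multinomial_counts(10, [0.2, 0.5, 0.3], rng)  # k = 3, n = 10
print(bernoulli_like, binomial_like, categorical_like, multinomial_like)
```

With a single trial the count vector has exactly one entry equal to 1; with $n$ trials the entries sum to $n$.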

The probability mass function can also be expressed as follows.

$$ p(j) = \prod\limits_{i=1}^{k} p_{i}^{\delta_{ji}} = \sum\limits_{i=1}^{k} \delta_{ji} p_{i}, \qquad j \in \left\{ 1, 2, \dots, k \right\} $$

Here $\delta_{ji}$ is the Kronecker delta, which equals $1$ if $j = i$ and $0$ otherwise.
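Both forms reduce to the single entry of the probability vector selected by the Kronecker delta: in the product every other factor is raised to the power 0, and in the sum every other term is multiplied by 0. A small sketch that evaluates the two forms side by side, with illustrative function names:

```python
import math

def kronecker_delta(j, i):
    """1 if the indices coincide, 0 otherwise."""
    return 1 if j == i else 0

def pmf_product_form(j, p):
    """Product over i of p_i raised to delta_ji."""
    return math.prod(p_i ** kronecker_delta(j, i) for i, p_i in enumerate(p, start=1))

def pmf_sum_form(j, p):
    """Sum over i of delta_ji times p_i."""
    return sum(kronecker_delta(j, i) * p_i for i, p_i in enumerate(p, start=1))

p = [0.2, 0.5, 0.3]   # hypothetical example with k = 3
for j in range(1, len(p) + 1):
    print(j, pmf_product_form(j, p), pmf_sum_form(j, p))
```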

Meanwhile, the sample space can be identified with the standard basis of $k$-dimensional Euclidean space, so that each realization is a one-hot vector. In this case, for a random vector $\mathbf{x} = (x_{1}, \dots, x_{k})$ satisfying the constraints and probability mass function below, the Categorical distribution is written $\operatorname{Cat}(\mathbf{x}; \mathbf{p})$.

$$ x_{i} \in \left\{ 0, 1 \right\}, \qquad \sum_{i=1}^{k} x_{i} = 1 $$

$$ p(\mathbf{x}) = p(x_{1}, \dots, x_{k}) = \prod\limits_{i=1}^{k} p_{i}^{x_{i}} $$
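The one-hot form can be sketched the same way: the realization is a 0/1 vector with a single 1, and the product picks out the matching probability. The helpers `one_hot` and `pmf_one_hot` below are hypothetical names used only for this illustration.

```python
import math

def one_hot(i, k):
    """Standard basis vector e_i of length k, for i in 1..k."""
    return [1 if j == i else 0 for j in range(1, k + 1)]

def pmf_one_hot(x, p):
    """Product over i of p_i ** x_i for a one-hot vector x."""
    if len(x) != len(p) or any(x_i not in (0, 1) for x_i in x) or sum(x) != 1:
        raise ValueError("x must be a one-hot vector with the same length as p")
    return math.prod(p_i ** x_i for p_i, x_i in zip(p, x))

p = [0.2, 0.5, 0.3]             # hypothetical example with k = 3
x = one_hot(3, len(p))
print(x)                        # [0, 0, 1]
print(pmf_one_hot(x, p))        # 0.3
```

Summing `pmf_one_hot` over all $k$ one-hot vectors recovers the sum of the entries of the probability vector, which is 1.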