For $n \in \mathbb{N}$ and $k \in \mathbb{N}$, let $(X_{1}, \cdots, X_{k})$ denote a random vector of $k$ random variables satisfying
$$ \sum_{i=1}^{k} X_{i} = n \qquad \& \qquad \sum_{i=1}^{k} p_{i} = 1 $$
For $\mathbf{p} = (p_{1}, \cdots, p_{k}) \in [0,1]^{k}$ satisfying this, the multivariate probability distribution $M_{k}(n, \mathbf{p})$ with the following probability mass function is called the Multinomial Distribution.
$$ p(x_{1}, \cdots, x_{k}) = \frac{n!}{x_{1}! \cdots x_{k}!} p_{1}^{x_{1}} \cdots p_{k}^{x_{k}}, \qquad x_{1}, \cdots, x_{k} \in \mathbb{N}_{0} $$
Interpreting the definition directly, $(X_{1}, \cdots, X_{k})$ is a random vector recording how many of $n$ elements actually land in each of $k$ categories, where each element falls into the $i$th category with probability $p_{i}$. It has the probability mass function
$$ \begin{align*} p(x_{1}, \cdots, x_{k}) &= P(X_{1} = x_{1}, \cdots, X_{k} = x_{k}) \\ &= \frac{n!}{x_{1}! \cdots x_{k}!} p_{1}^{x_{1}} \cdots p_{k}^{x_{k}} \end{align*} $$
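As a minimal sketch of this pmf in pure standard-library Python (the function name `multinomial_pmf` and the test values of `n` and `p` are illustrative choices, not from the original text), one can also confirm numerically that the pmf sums to $1$ over its support:

```python
from math import factorial
from itertools import product

def multinomial_pmf(x, p):
    """p(x_1,...,x_k) = n!/(x_1!...x_k!) * p_1^x_1 * ... * p_k^x_k."""
    coef = factorial(sum(x))
    for xi in x:
        coef //= factorial(xi)  # multinomial coefficient n!/(x_1!...x_k!)
    prob = float(coef)
    for xi, pi in zip(x, p):
        prob *= pi ** xi
    return prob

# Sanity check: summing over all (x_1,...,x_k) with x_1+...+x_k = n gives 1.
n, p = 4, (0.2, 0.3, 0.5)
total = sum(
    multinomial_pmf(x, p)
    for x in product(range(n + 1), repeat=len(p))
    if sum(x) == n
)
```

Here `total` should be $1$ up to floating-point error, reflecting $\sum p(x_{1}, \cdots, x_{k}) = 1$.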
In particular, when $k = 2$ the multinomial distribution reduces to the binomial distribution, so it is a direct generalization of the binomial distribution itself.
Basic Properties
Mean and Covariance
[1]: If $X := (X_{1}, \cdots, X_{k}) \sim M_{k}(n, \mathbf{p})$, the expected value of the $i$th component $X_{i}$ is
$$ E(X_{i}) = n p_{i} $$
and the covariance matrix is as follows.
$$ \operatorname{Cov}(X) = n \begin{pmatrix} p_{1}(1 - p_{1}) & -p_{1}p_{2} & \cdots & -p_{1}p_{k} \\ -p_{2}p_{1} & p_{2}(1 - p_{2}) & \cdots & -p_{2}p_{k} \\ \vdots & \vdots & \ddots & \vdots \\ -p_{k}p_{1} & -p_{k}p_{2} & \cdots & p_{k}(1 - p_{k}) \end{pmatrix} $$
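These formulas can be checked exactly for a small case by enumerating the full support and computing the moments directly (the helper `pmf` and the values of `n` and `p` below are illustrative, not part of the original statement):

```python
from math import factorial
from itertools import product

def pmf(x, p):
    # Multinomial pmf: n!/(x_1!...x_k!) * p_1^x_1 * ... * p_k^x_k
    c = factorial(sum(x))
    for xi in x:
        c //= factorial(xi)
    prob = float(c)
    for xi, pi in zip(x, p):
        prob *= pi ** xi
    return prob

n, p = 3, (0.2, 0.3, 0.5)
k = len(p)
support = [x for x in product(range(n + 1), repeat=k) if sum(x) == n]

# E(X_i) by direct summation over the support
mean = [sum(x[i] * pmf(x, p) for x in support) for i in range(k)]
# Cov(X_i, X_j) = E[(X_i - E X_i)(X_j - E X_j)]
cov = [[sum((x[i] - mean[i]) * (x[j] - mean[j]) * pmf(x, p) for x in support)
        for j in range(k)] for i in range(k)]
```

The diagonal entries come out as $n p_{i}(1 - p_{i})$ and the off-diagonal entries as $-n p_{i} p_{j}$, matching the matrix above.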
Theorem
Lumping Property
For $i \neq j$, $X_{i} + X_{j}$ follows the binomial distribution $\operatorname{Bin}(n, p_{i} + p_{j})$.
$$ X_{i} + X_{j} \sim \operatorname{Bin}(n, p_{i} + p_{j}) $$
This is called the Lumping Property.
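The lumping property can also be verified exactly for a small case: summing the multinomial pmf over all outcomes with $x_{i} + x_{j} = s$ should reproduce the $\operatorname{Bin}(n, p_{i} + p_{j})$ pmf. This is a sketch with illustrative values of `n`, `p`, `i`, `j`:

```python
from math import comb, factorial
from itertools import product

def multinomial_pmf(x, p):
    c = factorial(sum(x))
    for xi in x:
        c //= factorial(xi)
    prob = float(c)
    for xi, pi in zip(x, p):
        prob *= pi ** xi
    return prob

def binom_pmf(s, n, q):
    # Bin(n, q) pmf: C(n, s) q^s (1-q)^(n-s)
    return comb(n, s) * q ** s * (1 - q) ** (n - s)

n, p = 5, (0.2, 0.3, 0.5)
i, j = 0, 1
support = [x for x in product(range(n + 1), repeat=len(p)) if sum(x) == n]

# P(X_i + X_j = s), obtained by lumping categories i and j together
lumped = [sum(multinomial_pmf(x, p) for x in support if x[i] + x[j] == s)
          for s in range(n + 1)]
target = [binom_pmf(s, n, p[i] + p[j]) for s in range(n + 1)]
```

The two lists agree term by term, up to floating-point error.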
Proof
Mean
Looking at each component $X_{i}$ alone, it simply counts, over $n$ trials, whether an outcome falls into category $i$ (with probability $p_{i}$) or not, hence $X_{i} \sim \operatorname{Bin}(n, p_{i})$ and its expected value is $E(X_{i}) = n p_{i}$.
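This "n independent trials" view can be sketched as a small simulation: draw $n$ categories per trial and count how often category $i$ appears, so that the sample mean of $X_{i}$ approaches $n p_{i}$. The seed, trial count, and parameter values are illustrative:

```python
import random

random.seed(0)
n, p, i = 5, (0.2, 0.3, 0.5), 0
trials = 20000

total = 0
for _ in range(trials):
    # One multinomial observation: n independent category draws
    draws = random.choices(range(len(p)), weights=p, k=n)
    total += draws.count(i)  # X_i for this trial

mean_xi = total / trials  # should be close to n * p_i = 1.0
```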
In the case of $n = 1$, that is, a single trial, $X_{i} + X_{j}$ equals exactly $1$ when the outcome of that trial belongs to category $i$ or category $j$, and equals $0$ otherwise, so it follows the Bernoulli distribution $\operatorname{Bin}(1, p_{i} + p_{j})$.
Since the $n$ trials are conducted independently, the addition of binomial distributions gives the following.
$$ X_{i} + X_{j} \sim \operatorname{Bin} \left( \sum_{m=1}^{n} 1, \; p_{i} + p_{j} \right) = \operatorname{Bin}(n, p_{i} + p_{j}) $$