Hypergeometric Distribution

Definition 1

For natural numbers $n, N, D \in \mathbb{N}$ with $n \le N$ and $D \le N$, the discrete probability distribution with the following probability mass function is called the hypergeometric distribution. $$ p(x) = {\frac{ \binom{D}{x} \binom{N - D}{n - x} }{ \binom{N}{n} }} \qquad , x \in \left\{ 0, 1, \cdots , n \right\} $$ Here $\binom{N}{n} = {}_{N} C _{n}$ denotes the binomial coefficient, with the convention that $\binom{a}{b} = 0$ when $b > a$.
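The probability mass function can be evaluated directly from the definition. The following is a minimal sketch in Python; the function name `hypergeom_pmf` and the example values $N = 10$, $D = 4$, $n = 3$ are illustrative choices, not part of the definition.

```python
from math import comb


def hypergeom_pmf(x: int, N: int, D: int, n: int) -> float:
    """P(X = x) for X ~ HG(N, D, n), computed straight from the definition."""
    # Values outside {0, ..., n} are not in the support.
    if x < 0 or x > n:
        return 0.0
    # math.comb(a, b) returns 0 when b > a, so impossible counts
    # (for example x > D) automatically get probability 0.
    return comb(D, x) * comb(N - D, n - x) / comb(N, n)


# Illustrative values (assumed for this example): N = 10, D = 4, n = 3.
probs = [hypergeom_pmf(x, 10, 4, 3) for x in range(4)]
print(probs)
print(sum(probs))  # should be 1 up to floating-point rounding
```

Summing the values over $x = 0, 1, \cdots, n$ is a quick sanity check that the function really describes a probability distribution.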

Description

Typically, in the hypergeometric distribution $N$ denotes the size of the entire population, $n$ the sample size, and $D$ the number of items of interest in the population. For example, suppose a list of length $N$ consists of $D$ items of type $1$ and $N - D$ items of type $0$. $$ \left( 0 , 0, 1, \cdots , 1, 0, 1 \right) $$ When $n$ items are drawn from this list without replacement, the sum of the drawn values, that is, the number of items of type $1$ in the sample, is a random variable $X$ that follows the hypergeometric distribution.
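The sampling description above can be checked by simulation. The sketch below builds such a list, repeatedly draws $n$ items without replacement using the standard library's `random.Random.sample`, and compares the observed frequencies of the sum with the probability mass function. The helper name `simulate_counts`, the seed, the number of trials, and the values $N = 10$, $D = 4$, $n = 3$ are all assumptions made for illustration.

```python
import random
from math import comb


def simulate_counts(N, D, n, trials, seed=0):
    """Empirical distribution of the number of 1s in n draws without replacement."""
    rng = random.Random(seed)
    population = [1] * D + [0] * (N - D)  # D items of type 1, N - D of type 0
    counts = [0] * (n + 1)
    for _ in range(trials):
        sample = rng.sample(population, n)  # sampling without replacement
        counts[sum(sample)] += 1
    return [c / trials for c in counts]


N, D, n = 10, 4, 3
empirical = simulate_counts(N, D, n, trials=100_000)
exact = [comb(D, x) * comb(N - D, n - x) / comb(N, n) for x in range(n + 1)]

for x, (e, p) in enumerate(zip(empirical, exact)):
    print(f"x={x}: simulated {e:.4f}, exact {p:.4f}")
```

With this many trials the simulated frequencies should agree with the exact values to roughly two decimal places.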

From a sampling perspective, the natural counterpart of the hypergeometric distribution is the binomial distribution. If we call “tossing a coin with probability of heads $p$ in $n$ independent trials and counting the number of heads” a binomial experiment, it corresponds to sampling with replacement, because heads can occur again even after it has already occurred. In contrast, in sampling without replacement an item that has been drawn cannot be drawn again, so a different probability distribution, the hypergeometric distribution, is required.
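To see that the two sampling schemes really lead to different distributions, the sketch below evaluates both probability mass functions side by side for the same $n$ and with $p = D / N$. The function names and the values $N = 10$, $D = 4$, $n = 3$ are illustrative assumptions.

```python
from math import comb


def binom_pmf(x, n, p):
    """P(X = x) for X ~ Bin(n, p): sampling with replacement."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def hypergeom_pmf(x, N, D, n):
    """P(X = x) for X ~ HG(N, D, n): sampling without replacement."""
    return comb(D, x) * comb(N - D, n - x) / comb(N, n)


N, D, n = 10, 4, 3
p = D / N  # proportion of type-1 items in the population

for x in range(n + 1):
    print(x, round(binom_pmf(x, n, p), 4), round(hypergeom_pmf(x, N, D, n), 4))
```

In this small population, drawing all three items of type $1$ is less likely without replacement than with replacement, since every type-$1$ item drawn lowers the chance of drawing another.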

The probability mass function follows from a counting argument. The number of ways to choose $n$ items from a list of length $N$ is $_{N} C _{n}$. The number of ways to choose $x$ items from $D$ items of type $1$ is $_{D} C _{x}$, and the number of ways to choose $n - x$ items from $N - D$ items of type $0$ is $_{N - D} C _{n - x}$. Hence the probability that $x$ items of type $1$ are chosen from the $D$ and $n - x$ items of type $0$ are chosen from the $N - D$ is exactly the hypergeometric distribution’s probability mass function. $$ p(x) = {\frac{ 1 }{ \underbrace{\binom{N}{n}}_{\text{total}} }} \overbrace{\binom{D}{x}}^{\text{number of } 1} \cdot \underbrace{\binom{N - D}{n - x}}_{\text{number of } 0} $$
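The counting argument can be confirmed by brute force on a small list: enumerate every way of choosing $n$ positions, tally the selections by the number of $1$s they contain, and compare with $\binom{D}{x} \binom{N - D}{n - x}$. The values $N = 7$, $D = 3$, $n = 4$ below are assumed purely for illustration and are kept small so the enumeration stays cheap.

```python
from itertools import combinations
from math import comb

N, D, n = 7, 3, 4
population = [1] * D + [0] * (N - D)

# Choose positions rather than values, so items of the same type
# are still counted as distinct items.
tally = [0] * (n + 1)
for positions in combinations(range(N), n):
    ones = sum(population[i] for i in positions)
    tally[ones] += 1

total = sum(tally)
formula = [comb(D, x) * comb(N - D, n - x) for x in range(n + 1)]

print(total, comb(N, n))  # both count all possible selections
print(tally)              # selections grouped by number of 1s
print(formula)            # the same counts from the product formula
```

For these values there are $35$ selections in total, split as $1, 12, 18, 4, 0$ for $x = 0, \cdots, 4$; the last count is zero because only $D = 3$ items of type $1$ exist.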

Mean and variance of the binomial distribution: If $X \sim \text{Bin}(n,p)$, then $$ \begin{align*} E(X) &= np \\ \operatorname{Var}(X) &= np(1-p) \end{align*} $$

In fact, with $p = \frac{D}{N}$, the mean of the hypergeometric distribution is the same as that of the binomial distribution, $np$, and the variance differs only in that it is multiplied by the factor $\frac{N - n}{N - 1}$.

Basic properties

Mean and variance

Let $p = \frac{D}{N}$. If $X \sim \operatorname{HG}(N, D, n)$, then $$ \begin{align*} E \left( X \right) &= n \frac{D}{N} = n p \\ \operatorname{Var} \left( X \right) &= n {\frac{ D }{ N }} {\frac{ N - D }{ N }} {\frac{ N - n }{ N - 1 }} = np(1 - p) \frac{N - n}{N - 1} \end{align*} $$
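These formulas can be verified exactly for particular parameters by summing over the probability mass function with rational arithmetic. The sketch below uses `fractions.Fraction` so that no rounding is involved; the helper name `hypergeom_moments` and the values $N = 10$, $D = 4$, $n = 3$ are assumptions for illustration, and a check at one parameter set is of course not a proof.

```python
from fractions import Fraction
from math import comb


def hypergeom_moments(N, D, n):
    """Exact mean and variance of HG(N, D, n), summed directly from the pmf."""
    pmf = [
        Fraction(comb(D, x) * comb(N - D, n - x), comb(N, n))
        for x in range(n + 1)
    ]
    mean = sum(x * q for x, q in enumerate(pmf))
    var = sum((x - mean) ** 2 * q for x, q in enumerate(pmf))
    return mean, var


N, D, n = 10, 4, 3
p = Fraction(D, N)
mean, var = hypergeom_moments(N, D, n)

print(mean, n * p)                                   # mean vs. np
print(var, n * p * (1 - p) * Fraction(N - n, N - 1))  # variance vs. closed form
```

Here the mean is $\frac{6}{5}$ and the variance is $\frac{14}{25}$, whereas the binomial variance $np(1-p)$ would be $\frac{18}{25}$; the ratio is the factor $\frac{N - n}{N - 1} = \frac{7}{9}$.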


  1. Hogg et al. (2018). Introduction to Mathematical Statistics (8th Edition): p. 162.