Derivation of the Finite Population Correction Factor
Formula
Given a population variance $\sigma^{2}$ and a random sample $X_{1} , \cdots , X_{N}$, the variance of the sample mean $\overline{X}_{N}$ for the full sample is $\sigma^{2} / N$. For a sample of size $n \le N$ drawn by sampling without replacement, the sample mean $\overline{X}_{n}$ has variance
$$ \Var \left( \overline{X}_{n} \right) = \text{s.e.}^{2} \left( \overline{X}_{n} \right) \cdot \text{FPC} = {\frac{ \sigma^{2} }{ n }} \cdot {\frac{ N - n }{ N - 1 }} $$
where $\text{s.e.}^{2} \left( \overline{X}_{n} \right) = \sigma^{2} / n$ is the squared standard error under sampling with replacement, and the factor $\text{FPC} = \left( N - n \right) / \left( N - 1 \right)$ that multiplies it is called the finite population correction factor.
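As a quick numerical illustration of the formula, the following is a minimal sketch using exact rational arithmetic. The function names `fpc` and `var_sample_mean` and the numbers ($N = 100$, $n = 20$, $\sigma^{2} = 25$) are hypothetical and chosen only for this example.

```python
from fractions import Fraction

def fpc(N, n):
    """Finite population correction (N - n) / (N - 1)."""
    if N < 2 or not 1 <= n <= N:
        raise ValueError("need N >= 2 and 1 <= n <= N")
    return Fraction(N - n, N - 1)

def var_sample_mean(sigma2, N, n):
    """Variance of the sample mean under sampling without replacement."""
    return Fraction(sigma2) / n * fpc(N, n)

# Hypothetical numbers: population of 100 with variance 25, sample of 20.
print(fpc(100, 20))                  # 80/99
print(var_sample_mean(25, 100, 20))  # 100/99, i.e. (25/20) * (80/99)
```

Without the correction the variance would be $25 / 20 = 1.25$; with it the value drops to $100 / 99 \approx 1.01$.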
Explanation
The FPC is the factor applied when sampling without replacement from a finite population. Numerically it lies between $0$ and $1$ (it equals $1$ when $n = 1$ and $0$ when $n = N$), so it can be viewed as shrinking the variance of $\overline{X}_{n}$ relative to $\sigma^{2} / n$, which would otherwise be excessively large.
This formula is used when deriving the test statistic for the Kruskal–Wallis $H$ test (see ../2693).
Derivation
There is a more algebraically tidy proof [1], but to build intuition it is more instructive to see a derivation via the hypergeometric distribution [2].
Mean and variance of the binomial distribution: if $X \sim \text{Bin}(n,p)$ then $$ \begin{align*} E(X) =& np \\ \Var(X) =& np(1-p) \end{align*} $$
Mean and variance of the hypergeometric distribution: let $p = \frac{D}{N}$. If $X \sim \operatorname{HG}(N, D, n)$ then $$ \begin{align*} E \left( X \right) =& n p \\ \Var \left( X \right) =& np(1 - p) \frac{N - n}{N - 1} \end{align*} $$
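The two variance formulas can be checked exactly for a small case. The sketch below computes the hypergeometric mean and variance directly from the probability mass function and compares them with $np$ and $np(1-p)(N-n)/(N-1)$. The helper names and the values $N = 20$, $D = 8$, $n = 5$ are assumptions made for illustration.

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(N, D, n, x):
    """P(X = x): n draws without replacement from N items, D of them marked."""
    return Fraction(comb(D, x) * comb(N - D, n - x), comb(N, n))

def hypergeom_mean_var(N, D, n):
    """Exact mean and variance computed directly from the pmf."""
    support = range(max(0, n - (N - D)), min(n, D) + 1)
    mean = sum(x * hypergeom_pmf(N, D, n, x) for x in support)
    second = sum(x * x * hypergeom_pmf(N, D, n, x) for x in support)
    return mean, second - mean ** 2

N, D, n = 20, 8, 5            # hypothetical values
p = Fraction(D, N)
mean, var = hypergeom_mean_var(N, D, n)
binom_var = n * p * (1 - p)   # variance of Bin(n, p)

print(mean == n * p)                              # True
print(var == binom_var * Fraction(N - n, N - 1))  # True
```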
Suppose that among the entire population only $n$ items are of interest and that these are marked in a particular color. The distribution of the number of marked items drawn depends on the sampling scheme. For each draw, consider the indicator random variable $Y_{k} : \Omega \to \left\{ 0 , 1 \right\}$, which equals $1$ if the $k$-th draw is marked and $0$ if it is not; the sum $Y = \sum_{k=1}^{N} Y_{k}$ follows a binomial distribution under sampling with replacement and a hypergeometric distribution under sampling without replacement. The sample mean can then be written as
$$ \overline{X}_{n} = {\frac{ 1 }{ n }} \sum_{k=1}^{N} X_{k} Y_{k} $$
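One way to read this identity, which is an assumption about the intended meaning rather than something stated explicitly, is to take $Y_{k}$ as the indicator that population item $k$ ends up in the sample. A minimal sketch with a hypothetical six-element population:

```python
import random

population = [3, 7, 8, 12, 15, 21]   # hypothetical values X_1, ..., X_N
N, n = len(population), 3

rng = random.Random(0)
chosen = set(rng.sample(range(N), n))            # indices drawn without replacement
Y = [1 if k in chosen else 0 for k in range(N)]  # assumed inclusion indicators Y_k

mean_direct = sum(population[k] for k in chosen) / n
mean_indicator = sum(x * y for x, y in zip(population, Y)) / n

print(sum(Y) == n)                    # True: exactly n indicators equal 1
print(mean_direct == mean_indicator)  # True: both give the same sample mean
```

Under this reading the population values $X_{k}$ are fixed numbers and all of the randomness sits in the indicators $Y_{k}$.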
Comparing the variance of the binomial with that of the hypergeometric, the only difference is whether it is multiplied by the FPC $(N-n)/(N-1)$. The binomial variance $n p ( 1 - p )$ and the hypergeometric variance $n p ( 1 - p ) (N - n) / (N - 1)$ are what results when every $X_{k} = 1$; if these are replaced by the original $X_{k}$, only the component multiplied by the FPC changes, to $\sigma^{2} / n$.
■
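The result can also be confirmed by brute force on a small population. The sketch below enumerates every equally likely subset of size $n$, computes the exact variance of the sample mean, and compares it with $\sigma^{2} / n \cdot (N - n) / (N - 1)$, where $\sigma^{2}$ is taken with divisor $N$. The population values and the function name `exact_var_of_sample_mean` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def exact_var_of_sample_mean(population, n):
    """Variance of the sample mean over all equally likely size-n subsets."""
    means = [Fraction(sum(s), n) for s in combinations(population, n)]
    mu = sum(means) / len(means)
    return sum((m - mu) ** 2 for m in means) / len(means)

population = [3, 7, 8, 12, 15, 21]   # hypothetical population values
N = len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N   # population variance, divisor N

for n in range(1, N + 1):
    lhs = exact_var_of_sample_mean(population, n)
    rhs = sigma2 / n * Fraction(N - n, N - 1)
    print(n, lhs, lhs == rhs)   # the comparison holds for every n
```

Exact fractions are used so that the comparison is an equality rather than an approximate floating-point match.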
[1] P VN, Explanation of finite population correction factor?, URL (version: 2022-10-06): https://stats.stackexchange.com/q/514259
[2] chl, Explanation of finite population correction factor?, URL (version: 2010-12-05): https://stats.stackexchange.com/q/5164