Test of Homogeneity of Population

Hypothesis Testing ¹

Suppose the data of a multinomial experiment are drawn from $R$ populations, each classified into the same $C$ categories. Let $p_{ij}$ be the probability that an element of the $i$-th population ($i = 1 , \cdots , R$) falls into the $j$-th category ($j = 1 , \cdots , C$), and denote the proportion vector of the $i$-th population by $\mathbf{p}_{i} = \left( p_{i1} , \cdots , p_{iC} \right)$. The following hypothesis test, which uses the Pearson chi-square test statistic, is called the test of homogeneity.

$H_{0}$: The proportions of elements are the same in all populations: $$\mathbf{p}_{1} = \cdots = \mathbf{p}_{R}$$
$H_{1}$: At least one population has proportions of elements that differ from those of the others.

Test Statistic

The test uses the Pearson chi-square test statistic. $$ \mathcal{X}^{2} = \sum_{i=1}^{R} \sum_{j=1}^{C} {{ \left( O_{ij} - E_{ij} \right)^{2} } \over { E_{ij} }} $$ Here, the observed frequency $O_{ij}$ is the number of data points in the $i$-th population that belong to the $j$-th category. The expected frequency $E_{ij}$ is the product of the sample size $r_{i}$ of the $i$-th population and the total count $c_{j}$ of the $j$-th category, divided by the overall sample size $n$: $$ E_{ij} = {{ r_{i} c_{j} } \over { n }} $$ Under $H_{0}$, this test statistic approximately follows a chi-square distribution whose degrees of freedom are the product of the number of populations $R$ and the number of categories $C$, each reduced by $1$, that is, $(R-1)(C-1)$.
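
The definitions above translate almost directly into code. The following is a minimal sketch in plain Python, not a reference implementation; the function name `pearson_chi_square` and the small 2 x 3 table are hypothetical and chosen only for illustration.

```python
def pearson_chi_square(observed):
    """Return (statistic, degrees_of_freedom, expected) for an R x C table.

    `observed` is a list of R rows (populations), each holding C counts.
    """
    row_totals = [sum(row) for row in observed]        # r_i
    col_totals = [sum(col) for col in zip(*observed)]  # c_j
    n = sum(row_totals)                                # overall sample size

    # Expected frequency E_ij = r_i * c_j / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]

    # Pearson statistic: sum over all cells of (O_ij - E_ij)^2 / E_ij
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

    # Degrees of freedom (R - 1)(C - 1)
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof, expected


# Hypothetical 2 x 3 table, used only to exercise the function.
table = [[10, 20, 30],
         [20, 20, 20]]
stat, dof, expected = pearson_chi_square(table)
print(stat, dof)
```

The sketch assumes every expected frequency is positive; a category with a column total of zero would cause a division by zero and should be dropped beforehand.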

Explanation

Degrees of Freedom

$$ \begin{align*} p_{i} = \sum_{j=1}^{C} p_{ij} = {\frac{ r_{i} }{ n }} \implies & \sum_{i=1}^{R} p_{i} = 1 \\ p_{j} = \sum_{i=1}^{R} p_{ij} = {\frac{ c_{j} }{ n }} \implies & \sum_{j=1}^{C} p_{j} = 1 \end{align*} $$ Considering the marginal probabilities, these constraints mean that once $R-1$ of the population marginals and $C-1$ of the category marginals are known, the remaining ones are determined. Hence, the degrees of freedom are their product, $(R - 1) (C - 1)$.

X\Y	y1	y2
x1	0000	0000
x2	0000	0000
x3	0000	0000

For example, the above contingency table shows data from $R = 3$ populations and $C = 2$ categories, with degrees of freedom $(3-1)(2-1) = 2$.
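
The same count can be cross-checked by comparing free parameters with and without $H_{0}$. This is a supplementary sketch of a standard counting argument, not the derivation used above. Without $H_{0}$, each of the $R$ proportion vectors has $C - 1$ free entries, since its entries sum to $1$. Under $H_{0}$, a single shared vector with $C - 1$ free entries remains. $$ \begin{align*} \text{without } H_{0} : & \quad R (C - 1) \\ \text{under } H_{0} : & \quad C - 1 \\ \text{difference} : & \quad R (C - 1) - (C - 1) = (R - 1)(C - 1) \end{align*} $$ For $R = 3$ and $C = 2$ this gives $3 \cdot 1 - 1 = 2$, in agreement with $(3-1)(2-1) = 2$.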

Homogeneity ²

In fact, the procedure for the test of homogeneity is exactly the same as that for the test of independence and can be seen as an extension of it; only the null and alternative hypotheses and their interpretation differ. In the test of independence, two categorical variables are observed on a single sample, so both the row sums $r_{i}$ and the column sums $c_{j}$ are random. In the test of homogeneity, however, the sample size $r_{i}$ of each population is fixed in advance, and the $O_{ij}$ within each population are determined randomly according to the proportion vector $\mathbf{p}_{i}$. If the proportion vector $\mathbf{p}_{i}$ is the same regardless of $i$, there is no difference between populations, which ultimately corresponds to independence between the choice of population and the category.
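
The sampling scheme of the homogeneity setting can be made concrete with a small simulation. The sketch below is illustrative only: the helper `sample_homogeneity_table`, the seed, the row sizes, and the shared proportion vector are all assumptions. Each population's sample size is fixed, and only the spread of counts across categories is random.

```python
import random


def sample_homogeneity_table(row_sizes, proportions, rng):
    """Draw one R x C table whose row totals are fixed in advance.

    row_sizes[i]   : fixed sample size r_i of population i
    proportions[i] : category probability vector p_i of population i
    """
    n_categories = len(proportions[0])
    table = []
    for r_i, p_i in zip(row_sizes, proportions):
        # r_i independent draws of a category index, weighted by p_i
        draws = rng.choices(range(n_categories), weights=p_i, k=r_i)
        table.append([draws.count(j) for j in range(n_categories)])
    return table


rng = random.Random(0)                 # seeded for reproducibility
row_sizes = [130, 250]                 # hypothetical fixed sample sizes
shared_p = [0.1, 0.3, 0.3, 0.2, 0.1]   # hypothetical p_1 = p_2 (H0 holds)

table = sample_homogeneity_table(row_sizes, [shared_p, shared_p], rng)
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]

print(table)
print(row_totals)   # always equals row_sizes, by construction
print(col_totals)   # varies from one simulated table to the next
```

Rerunning with a different seed changes the individual counts and the column totals but never the row totals, which is the distinction from the independence setting described above.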

For a simple example, if we want to determine if there is a grade difference based on gender in a university course, we can create a contingency table as follows to conduct a homogeneity test.

Gender	A	B	C	D	F	Total
Male	29	52	26	17	6	130
Female	6	75	98	62	9	250

$H_{0}$: Grade ratios are the same regardless of gender.
$H_{1}$: Grade ratios differ by gender.

Because there are five grade bands from A to F and the two sample sizes stand in the ratio 13:25, the difference is hard to read off from the raw counts. To test whether the grade ratios of males and females are the same at the significance level $\alpha = 0.05$, we have $R = 2$ populations and $C = 5$ categories, so the reference distribution is the chi-square distribution with $(2-1)(5-1) = 4$ degrees of freedom. The actual calculation yields $\mathcal{X}^{2} \approx 54.899$, which greatly exceeds the critical value $\chi_{1 - \alpha}^{2} (4) \approx 9.488$, so $H_{0}$ is rejected: there is a grade difference by gender at the $5\%$ significance level.
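
The calculation for the grade table can be reproduced in a few lines of plain Python. This is a sketch under stated assumptions: the critical value `9.488` is hard-coded from a standard chi-square table rather than computed, and the p-value uses the closed-form chi-square survival function $e^{-x/2} (1 + x/2)$, which is valid only for exactly $4$ degrees of freedom.

```python
import math

# Observed frequencies from the grade table (columns: A, B, C, D, F)
observed = [
    [29, 52, 26, 17, 6],   # Male
    [6, 75, 98, 62, 9],    # Female
]

row_totals = [sum(row) for row in observed]        # r_i
col_totals = [sum(col) for col in zip(*observed)]  # c_j
n = sum(row_totals)                                # overall sample size

# Pearson statistic with E_ij = r_i * c_j / n
statistic = 0.0
for i, row in enumerate(observed):
    for j, o_ij in enumerate(row):
        e_ij = row_totals[i] * col_totals[j] / n
        statistic += (o_ij - e_ij) ** 2 / e_ij

dof = (len(row_totals) - 1) * (len(col_totals) - 1)

# Assumption: 0.95 quantile of chi-square(4), taken from a standard table.
critical_value = 9.488
reject_h0 = statistic > critical_value

# Closed-form survival function of the chi-square distribution, 4 dof only.
p_value = math.exp(-statistic / 2) * (1 + statistic / 2)

print(f"statistic = {statistic:.3f}, dof = {dof}")
print(f"reject H0 at alpha = 0.05: {reject_h0}")
print(f"p-value = {p_value:.3e}")
```

The statistic comes out at roughly $54.9$, far beyond the critical value. For tables of other sizes, the hard-coded quantile and the closed-form p-value would need to be replaced by a general chi-square routine.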

It is important to note that the statistical conclusion here is only ‘not the same.’ While it is apparent in this hypothetical example that the male grades are better, a remark like ‘Men perform better here’ comes not from the homogeneity test but from outside knowledge such as ‘A is the best grade and F is the worst.’ Even when such information is available, the null and alternative hypotheses of the homogeneity test say nothing about superiority, which is consistent with the underlying mathematics.