Test of Homogeneity of Population

Hypothesis Testing [1]

Suppose multinomial experiments are carried out on $R$ populations, and each observation falls into one of $C$ categories. Let $p_{ij}$ be the probability that an element of the $i$-th population ($i = 1, \cdots, R$) falls into the $j$-th category ($j = 1, \cdots, C$), and denote the proportion vector of the $i$-th population by $\mathbf{p}_{i} = \left( p_{i1}, \cdots, p_{iC} \right)$. The following hypothesis test, which uses the Pearson chi-square test statistic, is called the test of homogeneity.

  • $H_{0}$: The proportions are the same in all populations: $\mathbf{p}_{1} = \cdots = \mathbf{p}_{R}$
  • $H_{1}$: At least one population has proportions that differ from the others.

Test Statistic

The test uses the Pearson chi-square test statistic
$$ \mathcal{X}^{2} = \sum_{i=1}^{R} \sum_{j=1}^{C} \frac{ \left( O_{ij} - E_{ij} \right)^{2} }{ E_{ij} } $$
Here the observed frequency $O_{ij}$ is the number of observations from the $i$-th population that fall into the $j$-th category. Let $r_{i}$ be the sample size of the $i$-th population (the $i$-th row total), $c_{j}$ the number of observations in the $j$-th category across all populations (the $j$-th column total), and $n$ the total sample size; the expected frequency is then $E_{ij} = r_{i} c_{j} / n$. Under $H_{0}$, this statistic approximately follows a chi-square distribution with $(R-1)(C-1)$ degrees of freedom, where $R$ is the number of populations and $C$ the number of categories.
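The formula translates directly into code. Below is a minimal pure-Python sketch, assuming the table is given as a list of $R$ rows of $C$ counts with no empty row or column (otherwise some $E_{ij}$ would be zero). The function name `pearson_chi_square` and the small $2 \times 3$ table are hypothetical, chosen only for illustration.

```python
def pearson_chi_square(observed):
    """Return (statistic, expected, df) for an R x C table of observed counts.

    `observed` is a list of R rows, each holding C counts. Every row and
    column total must be positive, otherwise some E_ij would be zero.
    """
    R, C = len(observed), len(observed[0])
    row_totals = [sum(row) for row in observed]                       # r_i
    col_totals = [sum(row[j] for row in observed) for j in range(C)]  # c_j
    n = sum(row_totals)
    # Expected frequency E_ij = r_i * c_j / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
        for i in range(R)
        for j in range(C)
    )
    return stat, expected, (R - 1) * (C - 1)


# Hypothetical 2 x 3 table, for illustration only
stat, expected, df = pearson_chi_square([[10, 20, 30], [20, 20, 20]])
print(round(stat, 3), df)  # 5.333 2
```

For real analyses, SciPy's `scipy.stats.chi2_contingency` performs the same computation and also returns a p-value; note that by default it applies Yates' continuity correction when the degrees of freedom equal 1.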

Explanation

Degrees of Freedom

$$
\begin{align*}
p_{i\cdot} = \sum_{j=1}^{C} p_{ij} = \frac{ r_{i} }{ n } \implies & \sum_{i=1}^{R} p_{i\cdot} = 1 \\
p_{\cdot j} = \sum_{i=1}^{R} p_{ij} = \frac{ c_{j} }{ n } \implies & \sum_{j=1}^{C} p_{\cdot j} = 1
\end{align*}
$$
Consider the marginal probabilities. Because the row marginals sum to $1$, knowing $R-1$ of them determines the last one, and likewise $C-1$ column marginals determine the last column marginal. In the same way, once the row and column totals are fixed, choosing the $(R-1)(C-1)$ cells outside the last row and last column determines every remaining cell. Hence the degrees of freedom are $(R-1)(C-1)$.
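The same count can be written as a short piece of arithmetic, assuming the row and column totals are treated as fixed. There are $RC$ cells, while the $R$ row totals and $C$ column totals impose only $R + C - 1$ independent constraints, because both sets of totals add up to the same $n$:

$$
\underbrace{RC}_{\text{cells}} - \underbrace{(R + C - 1)}_{\text{independent constraints}} = RC - R - C + 1 = (R-1)(C-1)
$$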

| $X \backslash Y$ | $y_{1}$ | $y_{2}$ |
|---|---|---|
| $x_{1}$ |  |  |
| $x_{2}$ |  |  |
| $x_{3}$ |  |  |

For example, the contingency table above has $R = 3$ populations and $C = 2$ categories, so its degrees of freedom are $(3-1)(2-1) = 2$.
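To make the counting concrete, the sketch below fills a $3 \times 2$ table from its margins and the $(3-1)(2-1) = 2$ freely chosen cells in the first column; the last column and the last row are then forced. The function `complete_table` and the totals used are hypothetical illustration values, and the sketch assumes the row and column totals are consistent (both sum to $n$).

```python
def complete_table(free, row_totals, col_totals):
    """Rebuild an R x C table from its top-left (R-1) x (C-1) block and margins.

    Assumes sum(row_totals) == sum(col_totals), i.e. both margins add up to n.
    """
    R, C = len(row_totals), len(col_totals)
    # Copy the free block and leave room for the forced last column and row
    table = [list(row) + [None] for row in free] + [[None] * C]
    for i in range(R - 1):
        # Last cell of each row is forced by its row total r_i
        table[i][C - 1] = row_totals[i] - sum(free[i])
    for j in range(C):
        # Last row is forced by the column totals c_j
        table[R - 1][j] = col_totals[j] - sum(table[i][j] for i in range(R - 1))
    return table


# Hypothetical 3 x 2 example: only the x1 and x2 cells of the first column are free
table = complete_table(free=[[5], [7]], row_totals=[10, 12, 8], col_totals=[18, 12])
print(table)  # [[5, 5], [7, 5], [6, 2]]
```

The last row automatically sums to its row total $8$, which is why it contributes no extra freedom.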

Homogeneity [2]

In fact, the procedure for the test of homogeneity is exactly the same as for the test of independence, and it can be seen as an extension of it; only the null and alternative hypotheses and their interpretation differ. In the test of independence, a single sample of size $n$ is cross-classified by two categorical variables, so both the row totals $r_{i}$ and the column totals $c_{j}$ are random. In the test of homogeneity, by contrast, the sample size $r_{i}$ of each population is fixed in advance, and the proportion vector $\mathbf{p}_{i}$ governs how the counts $O_{ij}$ are randomly spread across the categories. If $\mathbf{p}_{i}$ is the same for every $i$, knowing which population an observation came from says nothing about its category, which ultimately corresponds to independence between the choice of population and the category.
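The sampling scheme of the test of homogeneity can be simulated directly. The sketch below assumes a hypothetical common proportion vector `p` and fixed sample sizes $r_{1} = 130$, $r_{2} = 250$; it draws each row multinomially with the same `p` (so $H_{0}$ holds) and averages the resulting $\mathcal{X}^{2}$ values. A chi-square distribution with $(R-1)(C-1)$ degrees of freedom has mean $(R-1)(C-1)$, so the average should land near $4$ here. The helper names are assumptions for illustration, not a library API.

```python
import random


def pearson_chi_square(observed):
    """Pearson X^2 for an R x C table (all row and column totals must be > 0)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return sum(
        (o - r * c / n) ** 2 / (r * c / n)
        for row, r in zip(observed, row_totals)
        for o, c in zip(row, col_totals)
    )


def simulate_homogeneous(row_totals, p, reps, seed=0):
    """Simulate X^2 under H0: row i is multinomial(r_i, p) with the SAME p."""
    rng = random.Random(seed)
    categories = range(len(p))
    stats = []
    for _ in range(reps):
        table = []
        for r in row_totals:
            # Fixed sample size r_i; each draw picks category j with probability p_j
            draws = rng.choices(categories, weights=p, k=r)
            table.append([draws.count(j) for j in categories])
        # Skip the rare table with an empty category (E_ij would be 0)
        if all(sum(col) > 0 for col in zip(*table)):
            stats.append(pearson_chi_square(table))
    return stats


# Hypothetical common proportions; R = 2, C = 5, so df = 4
stats = simulate_homogeneous(row_totals=[130, 250], p=[0.1, 0.3, 0.3, 0.2, 0.1], reps=1000)
print(sum(stats) / len(stats))  # should be close to 4
```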

As a simple example, suppose we want to determine whether grades in a university course differ by gender. We can build the following contingency table and run a test of homogeneity.

| Gender | A | B | C | D | F | Total |
|---|---|---|---|---|---|---|
| Male | 29 | 52 | 26 | 17 | 6 | 130 |
| Female | 6 | 75 | 98 | 62 | 9 | 250 |

  • $H_{0}$: The grade proportions are the same regardless of gender.
  • $H_{1}$: The grade proportions differ by gender.

As the table shows, the two groups have different sizes (130 and 250, a ratio of 13:25) and grades fall into five bands from A to F, so it is hard to judge the difference just by comparing the counts against that ratio. To check at significance level $\alpha = 0.05$ whether the grade proportions of males and females are the same, we have $R = 2$ populations and $C = 5$ categories, so the statistic is compared with the chi-square distribution with $(2-1)(5-1) = 4$ degrees of freedom. The actual calculation gives $\mathcal{X}^{2} \approx 54.899$, which far exceeds $\chi_{1 - \alpha}^{2}(4) \approx 9.488$, so we conclude that grades differ by gender at the $5\%$ significance level.
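The numbers in the table can be checked with a short pure-Python sketch. It recomputes $\mathcal{X}^{2}$ and, to avoid depending on an external library, computes the p-value from the closed-form chi-square survival function that holds only for even degrees of freedom $\nu$, namely $P(X > x) = e^{-x/2} \sum_{k=0}^{\nu/2 - 1} (x/2)^{k} / k!$. The helper names `pearson_chi_square` and `chi2_sf_even_df` are hypothetical.

```python
import math


def pearson_chi_square(observed):
    """Pearson X^2 for an R x C table (all row and column totals must be > 0)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return sum(
        (o - r * c / n) ** 2 / (r * c / n)
        for row, r in zip(observed, row_totals)
        for o, c in zip(row, col_totals)
    )


def chi2_sf_even_df(x, df):
    """P(X > x) for X ~ chi-square(df); this closed form is valid only for even df."""
    if df % 2 != 0:
        raise ValueError("closed form requires even degrees of freedom")
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))


grades = [
    [29, 52, 26, 17, 6],  # Male:   A, B, C, D, F
    [6, 75, 98, 62, 9],   # Female: A, B, C, D, F
]
stat = pearson_chi_square(grades)
df = (len(grades) - 1) * (len(grades[0]) - 1)  # (R - 1)(C - 1) = 4
p_value = chi2_sf_even_df(stat, df)
print(f"X^2 = {stat:.3f}, df = {df}, p = {p_value:.2e}")
print("reject H0 at 5%:", p_value < 0.05)
```

The statistic comes out at about $54.899$ and the p-value is far below $0.05$. As a sanity check, the same closed form with $\nu = 4$ returns about $0.05$ at $x = 9.488$, matching the tabulated critical value $\chi_{0.95}^{2}(4)$.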

Note that this statistical analysis supports only the conclusion 'the proportions are not the same.' In this hypothetical example males clearly seem to do better, but a remark like 'men perform better here' comes not from the test of homogeneity but from outside knowledge such as 'A is the best grade and F the worst.' Even when that information is available, the null and alternative hypotheses of the test of homogeneity say nothing about which group is better, and this agrees with the underlying mathematics: the statistic treats the categories as unordered labels, so reordering the grade columns does not change $\mathcal{X}^{2}$.
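The claim that the test ignores the ordering of the grades can be checked numerically: permuting the category columns only permutes the terms of the double sum. A minimal sketch using the grade table, with a hypothetical `pearson_chi_square` helper and an arbitrary shuffle chosen for illustration:

```python
import math


def pearson_chi_square(observed):
    """Pearson X^2 for an R x C table (all row and column totals must be > 0)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return sum(
        (o - r * c / n) ** 2 / (r * c / n)
        for row, r in zip(observed, row_totals)
        for o, c in zip(row, col_totals)
    )


grades = [[29, 52, 26, 17, 6], [6, 75, 98, 62, 9]]  # columns A, B, C, D, F
original = pearson_chi_square(grades)

# Reverse the grade order (F, D, C, B, A) and try an arbitrary shuffle
reversed_cols = [row[::-1] for row in grades]
shuffled_cols = [[row[j] for j in (2, 0, 4, 1, 3)] for row in grades]

print(math.isclose(original, pearson_chi_square(reversed_cols)))  # True
print(math.isclose(original, pearson_chi_square(shuffled_cols)))  # True
```

Tests that do use the ordering of the categories exist, but they answer a different question from the test of homogeneity described here.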

  1. Mendenhall. (2012). Introduction to Probability and Statistics (13th Edition): p610.

  2. Department of Statistics, Kyungpook National University. (2008). 엑셀을 이용한 통계학 (Statistics Using Excel): p274.