Test of Independence
Hypothesis Testing [1]
In a multinomial experiment, assume we have categorical data with two characteristics $X$ and $Y$, where $X$ has $R$ categories and $Y$ has $C$ categories, obtained from $n$ independent trials. The following hypothesis test, which uses the Pearson chi-square test statistic, is called the test of independence.
- $H_{0}$: The two characteristics are independent.
- $H_{1}$: The two characteristics are dependent.
Test Statistic
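The test statistic is the Pearson chi-square test statistic:

$$\mathcal{X}^{2} = \sum_{i=1}^{R} \sum_{j=1}^{C} \frac{\left( O_{ij} - E_{ij} \right)^{2}}{E_{ij}}$$

Here, the frequency $O_{ij}$ of the data belonging to the $i$-th category of $X$ and the $j$-th category of $Y$ is called the observed frequency. The product of the number of samples belonging to the $i$-th category of $X$, $n_{i \cdot}$, and the number of samples belonging to the $j$-th category of $Y$, $n_{\cdot j}$, divided by the sample size $n$, that is $E_{ij} = n_{i \cdot} n_{\cdot j} / n$, is called the expected frequency. Under the null hypothesis this test statistic approximately follows a chi-square distribution whose degrees of freedom is the product of the numbers of categories of $X$ and $Y$, $R$ and $C$, minus $1$ each, giving $(R-1)(C-1)$.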
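The definitions above can be sketched in a few lines of Python. This is only an illustrative sketch, not a reference implementation: the function name `pearson_chi_square` and the sample table are hypothetical, and the table is assumed to be a list of rows of non-negative counts with no empty row or column.

```python
def pearson_chi_square(observed):
    """Return (statistic, degrees_of_freedom, expected) for an R x C table of counts."""
    row_totals = [sum(row) for row in observed]          # n_i.
    col_totals = [sum(col) for col in zip(*observed)]    # n_.j
    n = sum(row_totals)                                  # sample size

    # Expected frequency under independence: E_ij = n_i. * n_.j / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]

    # Sum of (O_ij - E_ij)^2 / E_ij over every cell of the table
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, expected


# Hypothetical 3 x 2 table of counts, used only for illustration.
table = [[10, 20], [25, 35], [30, 60]]
statistic, df, expected = pearson_chi_square(table)
print(statistic, df)
```

Note that the expected frequencies reproduce the observed row and column totals, which is exactly the set of constraints that reduces the degrees of freedom.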
Explanation
Degrees of Freedom
For simplicity, let the ratio of the frequency of data belonging to the $i$-th category of $X$ and the $j$-th category of $Y$ to the sample size be $p_{ij} = O_{ij} / n$. Considering the marginal probabilities $p_{i \cdot} = \sum_{j=1}^{C} p_{ij}$ and $p_{\cdot j} = \sum_{i=1}^{R} p_{ij}$, we see that, given these constraints, knowing $(R-1)$ and $(C-1)$ probabilities along each characteristic determines the remaining ones. Therefore, the degrees of freedom must be the product, $(R-1)(C-1)$.
X\Y | y1 | y2 |
---|---|---|
x1 | 0000 | 0000 |
x2 | 0000 | 0000 |
x3 | 0000 | 0000 |
For example, the above contingency table represents data where $X$ has $R = 3$ categories and $Y$ has $C = 2$ categories, giving $(3-1)(2-1) = 2$ degrees of freedom.
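One way to see the counting argument concretely: once the marginal totals are fixed, the top-left block of $(R-1) \times (C-1)$ cells is enough to rebuild the whole table. The sketch below assumes at least two rows and two columns and consistent totals; `complete_table` and the numbers are hypothetical, chosen only to match the 3 x 2 shape above.

```python
def complete_table(free_cells, row_totals, col_totals):
    """Rebuild an R x C table from its top-left (R-1) x (C-1) block and the marginal totals."""
    # The last cell of each of the first R-1 rows is forced by that row's total.
    rows = [
        list(cells) + [total - sum(cells)]
        for cells, total in zip(free_cells, row_totals)
    ]
    # Every cell of the last row is forced by its column's total.
    last_row = [total - sum(col) for col, total in zip(zip(*rows), col_totals)]
    rows.append(last_row)
    return rows


# 3 x 2 table: only (3-1) * (2-1) = 2 cells are free; the other 4 follow.
free_cells = [[10], [25]]      # hypothetical counts for (x1, y1) and (x2, y1)
row_totals = [30, 60, 90]
col_totals = [65, 115]
print(complete_table(free_cells, row_totals, col_totals))
```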
Independence
Despite the seemingly complex explanations involving contingency tables and products of probabilities, a test of independence is essentially an extension of a goodness-of-fit test. In a goodness-of-fit test, the null hypothesis is that "the given data was sampled so as to conform to the theoretical probabilities"; in a test of independence, the theoretical probability of each cell is simply the product of the marginal probabilities of its two categories.
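Independence of Events: Assume a probability space $(\Omega, \mathcal{F}, P)$.
- For $P(B) > 0$, denote $P(A \mid B) = P(A \cap B) / P(B)$ as the conditional probability of $A$ with respect to $B$.
- If $P(A \mid B) = P(A)$, i.e., $P(A \cap B) = P(A) P(B)$, then $A, B$ are said to be independent.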
In practice, the independence of events is defined such that the probability of both events occurring is the product of the probabilities of each event occurring separately. In the context of a test of independence, if the null hypothesis is true, then $p_{ij} = p_{i \cdot} p_{\cdot j}$ should hold for every cell. A significant difference between the observed frequency $O_{ij}$ and the expected frequency $E_{ij}$ produces a large value of the chi-square test statistic, which indicates that the data deviate from the expected value $E_{ij} = n p_{i \cdot} p_{\cdot j}$ implied by the probabilities assumed under independence. In that case, one would need to question the null hypothesis itself.
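As a small illustration of the product rule, the sketch below compares each joint proportion with the product of its marginal proportions and reports the largest gap. The helper name and both tables are hypothetical; exact fractions are used so that an exactly independent table gives a gap of exactly zero.

```python
from fractions import Fraction


def max_departure_from_independence(observed):
    """Largest |p_ij - p_i. * p_.j| over all cells, computed with exact fractions."""
    n = sum(sum(row) for row in observed)
    row_p = [Fraction(sum(row), n) for row in observed]        # p_i.
    col_p = [Fraction(sum(col), n) for col in zip(*observed)]  # p_.j
    return max(
        abs(Fraction(o, n) - pr * pc)
        for row, pr in zip(observed, row_p)
        for o, pc in zip(row, col_p)
    )


# Rows are proportional, so every joint proportion equals the product of marginals.
print(max_departure_from_independence([[20, 30], [40, 60]]))

# Swapping two counts in the first row breaks the proportionality.
print(max_departure_from_independence([[30, 20], [40, 60]]))
```

The chi-square statistic plays the same role as this gap, but scales each cell's deviation so that its sampling distribution under independence is known.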
Example
Mendel’s Law of Inheritance
(Continuing from the goodness-of-fit test post)
Mendel’s law of inheritance states that in terms of color, yellow is dominant and green is recessive, and in terms of shape, round is dominant and wrinkled is recessive. When purebred yellow round peas are crossed with purebred green wrinkled peas, the first generation exhibits the dominant traits (yellow round). The second generation appears in a 3:1 ratio for both yellow to green and round to wrinkled, resulting in a combined ratio of 9:3:3:1 overall [2].
Round Yellow | Wrinkled Yellow | Round Green | Wrinkled Green | |
---|---|---|---|---|
Observed Frequency | 773 | 231 | 238 | 59 |
Assuming the observed frequencies of the second-generation peas are as shown [3], the total sample size is $n = 773 + 231 + 238 + 59 = 1301$ and the number of categories is $k = 4$. In a test of independence, a contingency table would be constructed as follows:
Color\Shape | Round | Wrinkled |
---|---|---|
Yellow | 773 | 231 |
Green | 238 | 59 |
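Let’s check whether the two characteristics, color and shape, are independent at a significance level of $\alpha = 0.05$. In this test, the degrees of freedom of the chi-square distribution is not $k - 1 = 3$ as in the goodness-of-fit test but $(R-1)(C-1)$, with $R = 2$ color categories and $C = 2$ shape categories, so that it’s $(2-1)(2-1) = 1$.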
- $H_{0}$: The experimental results show that color and shape are independent.
- $H_{1}$: The experimental results show that color and shape are dependent.
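The null and alternative hypotheses are as stated above. With row totals $1004$ and $297$, column totals $1011$ and $290$, and expected frequencies $E_{ij} = n_{i \cdot} n_{\cdot j} / n$, the Pearson chi-square test statistic is calculated as

$$\mathcal{X}^{2} = \sum_{i=1}^{2} \sum_{j=1}^{2} \frac{\left( O_{ij} - E_{ij} \right)^{2}}{E_{ij}} = \frac{(773 - 780.20)^{2}}{780.20} + \frac{(231 - 223.80)^{2}}{223.80} + \frac{(238 - 230.80)^{2}}{230.80} + \frac{(59 - 66.20)^{2}}{66.20} \approx 1.31$$

For the given significance level $\alpha = 0.05$, the upper bound of the acceptance region for the chi-square distribution with $1$ degree of freedom is approximately $3.84$, and since $1.31 < 3.84$, the null hypothesis cannot be rejected. This means that while we cannot support Mendel’s law of inheritance based on this data through a goodness-of-fit test, the data give no grounds to doubt that the two characteristics are independent.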
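The calculation for the pea data can be reproduced with the standard library alone. The sketch below is an illustration rather than the author's own computation: the variable names are made up, the critical value 3.841 is hard-coded from a standard chi-square table, and the p-value uses the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal, so its upper tail probability is `erfc(sqrt(x / 2))`.

```python
import math

# Rows: Yellow, Green. Columns: Round, Wrinkled.
observed = [[773, 231], [238, 59]]

row_totals = [sum(row) for row in observed]          # color totals
col_totals = [sum(col) for col in zip(*observed)]    # shape totals
n = sum(row_totals)                                  # total sample size

# Expected frequencies under independence: E_ij = n_i. * n_.j / n
expected = [[r * c / n for c in col_totals] for r in row_totals]

statistic = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(row_totals) - 1) * (len(col_totals) - 1)

# Upper 5% point of the chi-square distribution with 1 degree of freedom (table value).
critical_value = 3.841
reject_null = statistic > critical_value

# Upper tail probability; this closed form is valid only for df = 1.
p_value = math.erfc(math.sqrt(statistic / 2))

print(f"n = {n}, df = {df}")
print(f"statistic = {statistic:.3f}, p-value = {p_value:.3f}, reject H0: {reject_null}")
```

If SciPy is available, `scipy.stats.chi2_contingency(observed, correction=False)` should give the same statistic; the `correction=False` argument matters because 2 x 2 tables otherwise get Yates' continuity correction by default.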
[1] Mendenhall. (2012). Introduction to Probability and Statistics (13th Edition): p602.
[2] http://legacy.biotechlearn.org.nz/themes/mendel_and_inheritance/images/inheritance_of_multiple_traits_in_peas
[3] Department of Statistics, Kyungpook National University (경북대학교 통계학과). (2008). Statistics Using Excel (엑셀을 이용한 통계학): p269.