One-way Analysis of Variance

Hypothesis Testing [1]

In an experimental design with $k$ treatments, suppose we draw $n_{j}$ samples from the $j$-th treatment, for a total of $n = n_{1} + \cdots + n_{k}$ samples. Assume that the samples from treatment $j = 1 , \cdots , k$ are drawn independently and at random from a normal distribution $N \left( \mu_{j} , \sigma_{j}^{2} \right)$, and that all treatments share a common population variance $\sigma^{2} = \sigma_{1}^{2} = \cdots = \sigma_{k}^{2}$. One-way ANOVA compares the population means across treatments by testing the following hypotheses.

  • $H_{0}$: $\mu_{1} = \cdots = \mu_{k}$
  • $H_{1}$: At least one $\mu_{j}$ is different from the other population means.

Test Statistic

Suppose we have an ANOVA table under a completely randomized design.

| Source | df | SS | MS | F |
|---|---|---|---|---|
| Treatments | $k-1$ | SST | MST | MST/MSE |
| Error | $n-k$ | SSE | MSE | |
| Total | $n-1$ | TSS | | |

The test statistic is as follows. $$ F = {\frac{ \text{MST} }{ \text{MSE} }} = {\frac{ \text{SST} / (k - 1) }{ \text{SSE} / (n - k) }} $$ Under the null hypothesis, this statistic follows the F-distribution $F \left( k - 1 , n - k \right)$, with $k-1$ numerator and $n-k$ denominator degrees of freedom.

Explanation

Let the treatment means be $\bar{x}_{j} := \sum_{i} x_{ij} / n_{j}$, the overall mean be $\bar{x} := \sum_{ij} x_{ij} / n$, and the sample variance of the $j$-th treatment be $s_{j}^{2} := \sum_{i} \left( x_{ij} - \bar{x}_{j} \right)^{2} / \left( n_{j} - 1 \right)$. $$ \begin{align*} \text{SST} =& \sum_{j=1}^{k} n_{j} \left( \bar{x}_{j} - \bar{x} \right)^{2} \\ \text{SSE} =& \left( n_{1} - 1 \right) s_{1}^{2} + \cdots + \left( n_{k} - 1 \right) s_{k}^{2} \\ \text{MST} =& {\frac{ \text{SST} }{ k - 1 }} \\ \text{MSE} =& {\frac{ \text{SSE} }{ n - k }} \\ F =& {\frac{ \text{MST} }{ \text{MSE} }} = {\frac{ \text{SST}/ (k - 1) }{ \text{SSE} / (n - k) }} \end{align*} $$ For the derivation of the test statistic, refer to the F-test in ANOVA.
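
The Total row of the ANOVA table can be written in the same notation. The text does not spell out TSS, so the following is a sketch that assumes the standard definition of the total sum of squares and uses $\left( n_{j} - 1 \right) s_{j}^{2} = \sum_{i} \left( x_{ij} - \bar{x}_{j} \right)^{2}$: $$ \begin{align*} \text{TSS} =& \sum_{j=1}^{k} \sum_{i=1}^{n_{j}} \left( x_{ij} - \bar{x} \right)^{2} \\ =& \sum_{j=1}^{k} \sum_{i=1}^{n_{j}} \left( x_{ij} - \bar{x}_{j} \right)^{2} + \sum_{j=1}^{k} n_{j} \left( \bar{x}_{j} - \bar{x} \right)^{2} \\ =& \text{SSE} + \text{SST} \end{align*} $$ The cross term vanishes because the deviations $x_{ij} - \bar{x}_{j}$ sum to zero within each treatment. The degrees of freedom split the same way, $n - 1 = (k - 1) + (n - k)$.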

Example

One-way ANOVA is used to determine whether treatments lead to differences in population means. Let’s analyze the heights of three girl groups from the Korean K-pop agency STARSHIP under a completely randomized design. The null hypothesis is that the mean heights of the groups are the same, while the alternative hypothesis is that at least one group’s mean height is different.

  • SISTAR: {Bora: 164cm, Hyorin: 163cm, Soyou: 168cm, Dasom: 167cm}
  • WJSN: {Seola: 165cm, Bona: 163cm, Exy: 166cm, Soobin: 156cm, Luda: 158cm, Dawon: 167cm, Eunseo: 170cm, Yeoreum: 161cm, Dayoung: 161cm, Yeonjung: 166cm}
  • IVE: {Yujin: 173cm, Gaeul: 164cm, Rei: 169cm, Wonyoung: 173cm, Liz: 171cm, Leeseo: 165cm}
  • The heights of the group KiiiKiii, debuted in 2025, are undisclosed.
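
| SISTAR (cm) | WJSN (cm) | IVE (cm) |
|---|---|---|
| 164 | 165 | 173 |
| 163 | 163 | 164 |
| 168 | 166 | 169 |
| 167 | 156 | 173 |
| | 158 | 171 |
| | 167 | 165 |
| | 170 | |
| | 161 | |
| | 161 | |
| | 166 | |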

The overall average height is 165.5cm, and the group averages are SISTAR 165.5cm, WJSN 163.3cm, and IVE 169.2cm.

IVE clearly has the highest sample mean, but to determine whether the difference is statistically significant we need to complete the ANOVA table and conduct an F-test. The sample size is the total number of members, $n = 4 + 10 + 6 = 20$, and the number of groups is $k = 3$.

| Source | df | SS | MS | F |
|---|---|---|---|---|
| Treatments | $2$ | SST | SST/$2$ | MST/MSE |
| Error | $17$ | SSE | SSE/$17$ | |
| Total | $19$ | TSS | | |

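$$ \begin{align*} \text{SST} =& 4 \cdot (165.5 - 165.5)^{2} + 10 \cdot (163.3 - 165.5)^{2} + 6 \cdot (169.2 - 165.5)^{2} = 129.1 \\ \text{SSE} =& 3 \cdot {\frac{ 17.0 }{ 3 }} + 9 \cdot {\frac{ 168.1 }{ 9 }} + 5 \cdot {\frac{ 76.8 }{ 5 }} = 17.0 + 168.1 + 76.8 = 261.9 \\ F =& {\frac{ 129.1 / 2 }{ 261.9 / 17 }} = {\frac{ 64.5 }{ 15.4 }} = 4.19 \end{align*} $$ Here $17.0$, $168.1$, and $76.8$ are the within-group sums of squared deviations $\left( n_{j} - 1 \right) s_{j}^{2}$, so the sample variances themselves are $17.0 / 3$, $168.1 / 9$, and $76.8 / 5$. The results are computed from the unrounded group means; substituting the rounded $169.2$ for IVE would give $\text{SST} \approx 130.5$.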
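
The sums of squares in this example can be recomputed from the raw heights. The following is a minimal sketch in standard-library Python; the function name `one_way_anova` and the dictionary keys are hypothetical names chosen for illustration, and the heights are copied from the member lists above.

```python
# Heights in cm, copied from the member lists in this example.
heights = {
    "SISTAR": [164, 163, 168, 167],
    "WJSN": [165, 163, 166, 156, 158, 167, 170, 161, 161, 166],
    "IVE": [173, 164, 169, 173, 171, 165],
}


def one_way_anova(groups):
    """Return one-way ANOVA table entries for a list of samples (one list per treatment)."""
    k = len(groups)                              # number of treatments
    all_values = [x for g in groups for x in g]
    n = len(all_values)                          # total sample size
    grand_mean = sum(all_values) / n
    group_means = [sum(g) / len(g) for g in groups]

    # Between-treatment sum of squares: n_j * (group mean - overall mean)^2
    sst = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-treatment sum of squares, equal to the sum of (n_j - 1) * s_j^2
    sse = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    # Total sum of squares around the overall mean
    tss = sum((x - grand_mean) ** 2 for x in all_values)

    mst = sst / (k - 1)
    mse = sse / (n - k)
    return {
        "df": (k - 1, n - k),
        "SST": sst,
        "SSE": sse,
        "TSS": tss,
        "MST": mst,
        "MSE": mse,
        "F": mst / mse,
    }


table = one_way_anova(list(heights.values()))
for name, value in table.items():
    print(name, value)
```

Rounded to the precision used in the text, this gives SST of 129.1, SSE of 261.9, and F of 4.19, with SST and SSE adding up to a TSS of 391.0.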

At significance level $\alpha = 5 \%$, the critical region begins at $F_{2,17} (0.05) = 3.59$. Since $F = 4.19 > 3.59 = F_{2,17} (0.05)$, the null hypothesis is rejected. In other words, the data give evidence that at least one group’s mean height differs from the others.
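
The critical value can be checked without a table in this particular case. When the numerator degrees of freedom equal 2 (that is, $k = 3$), the upper tail of $F(2, d)$ has the closed form $P(F > f) = (1 + 2f/d)^{-d/2}$. The sketch below relies on that special case only; the function names are hypothetical, and for other values of $k$ a statistics library (for example SciPy's `scipy.stats.f`) would be needed instead.

```python
def f_upper_tail_df1_2(f, d2):
    """P(F > f) for F(2, d2). Closed form valid ONLY when the numerator df is 2."""
    return (1 + 2 * f / d2) ** (-d2 / 2)


def f_critical_df1_2(alpha, d2):
    """Upper-alpha critical value of F(2, d2), obtained by inverting the tail formula."""
    return (d2 / 2) * (alpha ** (-2 / d2) - 1)


d2 = 17                                   # error degrees of freedom, n - k
f_obs = (129.1 / 2) / (261.9 / 17)        # MST / MSE from the example
crit = f_critical_df1_2(0.05, d2)         # about 3.59
p_value = f_upper_tail_df1_2(f_obs, d2)   # about 0.033

print(f"F = {f_obs:.2f}, critical value = {crit:.2f}, p-value = {p_value:.3f}")
print("reject H0" if f_obs > crit else "fail to reject H0")
```

The p-value of roughly 0.033 is below 0.05, which matches the comparison of $F$ against the critical value.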

Verification

These results can also be reproduced using Excel.

See Also

| Experimental Design | Parametric Methods | Non-parametric Methods |
|---|---|---|
| Randomized Design | One-way ANOVA | Kruskal–Wallis $H$ Test |
| Randomized Block Design | Two-way ANOVA | Friedman $F_{r}$ Test |

  1. Mendenhall. (2012). Introduction to Probability and Statistics (13th Edition): p455.