
One-way Analysis of Variance

Hypothesis Testing [1]

In an experimental design with $k$ treatments, suppose that we obtain $n_{j}$ samples from the $j$-th treatment, for a total of $n = n_{1} + \cdots + n_{k}$ samples. Assume that the samples from the $j$-th treatment, $j = 1 , \cdots , k$, are drawn independently and at random from a normal distribution $N \left( \mu_{j} , \sigma_{j}^{2} \right)$, and that the population variances are all equal, $\sigma^{2} = \sigma_{1}^{2} = \cdots = \sigma_{k}^{2}$. One-way ANOVA compares the population means among the groups by testing the following hypotheses.

  • $H_{0}$: $\mu_{1} = \cdots = \mu_{k}$
  • $H_{1}$: At least one $\mu_{j}$ is different from the other population means.

Test Statistic

Suppose we have an ANOVA table under a completely randomized design.

| Source | df | SS | MS | F |
|---|---|---|---|---|
| Treatments | $k-1$ | SST | MST | MST/MSE |
| Error | $n-k$ | SSE | MSE | |
| Total | $n-1$ | TSS | | |

The test statistic is as follows.
$$
F = {\frac{ \text{MST} }{ \text{MSE} }} = {\frac{ \text{SST} / (k - 1) }{ \text{SSE} / (n - k) }}
$$
Assuming the null hypothesis is true, this test statistic follows an F-distribution $F \left( k - 1 , n - k \right)$ with degrees of freedom $(k-1), (n-k)$.

Explanation

Let $x_{ij}$ be the $i$-th observation from the $j$-th treatment, let the treatment means be $\bar{x}_{j} := \sum_{i} x_{ij} / n_{j}$, let the overall mean be $\bar{x} := \sum_{ij} x_{ij} / n$, and let $s_{j}^{2}$ be the sample variance of the $j$-th treatment.
$$
\begin{align*}
\text{SST} =& \sum_{j=1}^{k} n_{j} \left( \bar{x}_{j} - \bar{x} \right)^{2}
\\ \text{SSE} =& \left( n_{1} - 1 \right) s_{1}^{2} + \cdots + \left( n_{k} - 1 \right) s_{k}^{2}
\\ \text{MST} =& {\frac{ \text{SST} }{ k - 1 }}
\\ \text{MSE} =& {\frac{ \text{SSE} }{ n - k }}
\\ F =& {\frac{ \text{MST} }{ \text{MSE} }} = {\frac{ \text{SST}/ (k - 1) }{ \text{SSE} / (n - k) }}
\end{align*}
$$
For the derivation of the test statistic, refer to the F-test in ANOVA.
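The definitions above translate almost line by line into code. The following is a minimal sketch, not a reference implementation: the function name `one_way_anova` and the toy data are hypothetical and chosen only for illustration. It uses the fact that $(n_{j} - 1) s_{j}^{2}$ equals the sum of squared deviations within the $j$-th treatment.

```python
from typing import Dict, Sequence


def one_way_anova(groups: Sequence[Sequence[float]]) -> Dict[str, float]:
    """Return the ANOVA table quantities for a list of treatment samples."""
    k = len(groups)                      # number of treatments
    n = sum(len(g) for g in groups)      # total sample size
    grand_mean = sum(sum(g) for g in groups) / n

    sst = 0.0
    sse = 0.0
    for g in groups:
        n_j = len(g)
        mean_j = sum(g) / n_j
        # Between-treatment part: n_j * (treatment mean - overall mean)^2
        sst += n_j * (mean_j - grand_mean) ** 2
        # Within-treatment part: (n_j - 1) * s_j^2, written as a sum of squares
        sse += sum((x - mean_j) ** 2 for x in g)

    mst = sst / (k - 1)
    mse = sse / (n - k)
    return {
        "df_treatments": k - 1,
        "df_error": n - k,
        "SST": sst,
        "SSE": sse,
        "MST": mst,
        "MSE": mse,
        "F": mst / mse,
    }


# Hypothetical toy data (not from this article): three treatments of unequal size.
toy = [[1.0, 2.0, 3.0], [2.0, 4.0], [5.0, 6.0, 7.0, 6.0]]
table = one_way_anova(toy)
print(table)
```

For the toy data the treatment means are 2, 3, and 6 around an overall mean of 4, so SST is 30, SSE is 6, and the F value is 15 on (2, 6) degrees of freedom.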

Example

One-way ANOVA is used to determine whether treatments produce differences in population means. Let’s analyze the heights of three girl groups that debuted under Korea’s K-POP agency STARSHIP, treating the data as a completely randomized design. The null hypothesis is that the mean heights of the groups are all the same, while the alternative hypothesis is that at least one group’s mean height is different.

  • SISTAR: {Bora: 164cm, Hyorin: 163cm, Soyou: 168cm, Dasom: 167cm}
  • WJSN: {Seola: 165cm, Bona: 163cm, Exy: 166cm, Soobin: 156cm, Luda: 158cm, Dawon: 167cm, Eunseo: 170cm, Yeoreum: 161cm, Dayoung: 161cm, Yeonjung: 166cm}
  • IVE: {Yujin: 173cm, Gaeul: 164cm, Rei: 169cm, Wonyoung: 173cm, Liz: 171cm, Leeseo: 165cm}
  • The heights of the group KiiiKiii, debuted in 2025, are undisclosed.
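
| SISTAR | WJSN | IVE |
|---|---|---|
| 164 | 165 | 173 |
| 163 | 163 | 164 |
| 168 | 166 | 169 |
| 167 | 156 | 173 |
| | 158 | 171 |
| | 167 | 165 |
| | 170 | |
| | 161 | |
| | 161 | |
| | 166 | |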

The overall average height is 165.5cm, and the group averages are SISTAR 165.5cm, WJSN 163.3cm, and IVE 169.2cm.

Although one can easily see that IVE has the highest average height, determining whether this difference is statistically significant requires completing the ANOVA table and conducting an F-test. The sample size is the total number of members, $n = 4 + 10 + 6 = 20$, and the number of groups is $k = 3$.

| Source | df | SS | MS | F |
|---|---|---|---|---|
| Treatments | $2$ | SST | SST/$2$ | MST/MSE |
| Error | $17$ | SSE | SSE/$17$ | |
| Total | $19$ | TSS | | |

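$$
\begin{align*}
\text{SST} =& 4 \cdot (165.5 - 165.5)^{2} + 10 \cdot (163.3 - 165.5)^{2} + 6 \cdot (169.2 - 165.5)^{2} = 129.1
\\ \text{SSE} =& 3 s_{1}^{2} + 9 s_{2}^{2} + 5 s_{3}^{2} = 17.0 + 168.1 + 76.8 = 261.9
\\ F =& {\frac{ 129.1 / 2 }{ 261.9 / 17 }} = {\frac{ 64.5 }{ 15.4 }} = 4.19
\end{align*}
$$
Here $17.0$, $168.1$, and $76.8$ are the within-group sums of squared deviations $\left( n_{j} - 1 \right) s_{j}^{2}$ for SISTAR, WJSN, and IVE, respectively. The figures are computed from unrounded group means (for IVE, $1015 / 6 \approx 169.17$), so substituting the rounded means shown above gives slightly different values.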
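The arithmetic above can be checked with a short script. This is only a sketch using Python's standard `statistics` module; the variable names are arbitrary, and the heights are copied from the member lists in this example.

```python
from statistics import mean, variance

# Heights in cm, copied from the lists in this example.
heights = {
    "SISTAR": [164, 163, 168, 167],
    "WJSN": [165, 163, 166, 156, 158, 167, 170, 161, 161, 166],
    "IVE": [173, 164, 169, 173, 171, 165],
}

k = len(heights)                              # number of groups
n = sum(len(v) for v in heights.values())     # total sample size
grand_mean = mean([x for v in heights.values() for x in v])

# Within-group sums of squares, (n_j - 1) * s_j^2, one per group.
within = {name: (len(v) - 1) * variance(v) for name, v in heights.items()}

sst = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in heights.values())
sse = sum(within.values())
f_stat = (sst / (k - 1)) / (sse / (n - k))

for name, ss in within.items():
    print(f"{name}: mean = {mean(heights[name]):.1f}, (n_j - 1) s_j^2 = {ss:.1f}")
print(f"SST = {sst:.1f}, SSE = {sse:.1f}, F = {f_stat:.2f}")
```

Run as written, the last line should report SST = 129.1, SSE = 261.9, and F = 4.19, in agreement with the hand calculation.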

If the significance level is $\alpha = 5 \%$, the lower bound of the critical region is $F_{2,17} (0.05) = 3.59$. Since $F = 4.19 > 3.59 = F_{2,17} (0.05)$, the null hypothesis can be rejected. In other words, at least one group’s mean height is different from the others.
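The critical value is normally read from an F table or computed with statistical software. In this particular example the numerator degrees of freedom are $k - 1 = 2$, and in that special case the upper-tail probability of $F(2, d)$ has the closed form $P(F > x) = \left( 1 + 2x / d \right)^{-d/2}$, which can be inverted by hand. The sketch below relies on that special case only and does not apply to other values of $k$; the function names are hypothetical.

```python
def f_sf_df1_2(x: float, d2: int) -> float:
    """Upper-tail probability P(F > x) for F(2, d2); valid only for numerator df = 2."""
    return (1.0 + 2.0 * x / d2) ** (-d2 / 2.0)


def f_crit_df1_2(alpha: float, d2: int) -> float:
    """Upper-alpha critical value of F(2, d2), obtained by inverting f_sf_df1_2."""
    return (d2 / 2.0) * (alpha ** (-2.0 / d2) - 1.0)


d2 = 17          # error degrees of freedom, n - k = 20 - 3
f_stat = 4.19    # F value from the worked example
critical = f_crit_df1_2(0.05, d2)
p_value = f_sf_df1_2(f_stat, d2)

print(f"critical value = {critical:.2f}")
print(f"p-value = {p_value:.3f}")
print("reject H0" if f_stat > critical else "fail to reject H0")
```

The computed critical value rounds to 3.59, matching the table value used above, and the p-value falls below 0.05, which is the same decision expressed in a different form.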

Verification

These results can also be reproduced using Excel.


See Also

| Experimental Design | Parametric Methods | Non-parametric Methods |
|---|---|---|
| Randomized Design | One-way ANOVA | Kruskal–Wallis $H$ Test |
| Randomized Block Design | Two-way ANOVA | Friedman $F_{r}$ Test |

  1. Mendenhall. (2012). Introduction to Probability and Statistics (13th Edition): p455.