Assume an experimental design with $k$ treatments, where the $j$th treatment contains $n_j$ samples, for a total of $n = n_1 + \cdots + n_k$ samples. Assume that each sample in treatment $j = 1, \cdots, k$ is independently and randomly drawn from a normal distribution $N(\mu_j, \sigma_j^2)$, and that the population variances are equal: $\sigma^2 = \sigma_1^2 = \cdots = \sigma_k^2$. In the analysis of variance designed to compare population means, the hypothesis test is set up as follows:
$H_0$: $\mu_1 = \cdots = \mu_k$
$H_1$: At least one $\mu_j$ is different from the others.
The test statistic is as follows:
$$ F = \frac{\text{MST}}{\text{MSE}} = \frac{\text{SST}/(k-1)}{\text{SSE}/(n-k)} $$
Under the null hypothesis, this test statistic is assumed to follow the F-distribution $F(k-1, n-k)$ with degrees of freedom $(k-1)$ and $(n-k)$.
Explanation
Whether it’s a one-way ANOVA or a two-way ANOVA, the mathematical derivations are very similar, differing only in the presence of blocks. For convenience, in this post, I’ll focus only on the theoretical background of one-way ANOVA under a completely randomized design.
Let’s explore how the quantities derived in the ANOVA table lead to the hypothesis test as we derive the test statistic. Since this requires considerable background in linear algebra and mathematical statistics, undergraduates may skip it, but graduate students are encouraged to work through it.
Let the treatment mean be $\bar{x}_j := \sum_i x_{ij} / n_j$, and the overall mean be $\bar{x} := \sum_{ij} x_{ij} / n$.
$$
\begin{align*}
\text{SST} &= \sum_{j=1}^{k} n_j \left( \bar{x}_j - \bar{x} \right)^2 \\
\text{SSE} &= (n_1 - 1) s_1^2 + \cdots + (n_k - 1) s_k^2 \\
\text{MST} &= \frac{\text{SST}}{k-1} \\
\text{MSE} &= \frac{\text{SSE}}{n-k} \\
F &= \frac{\text{MST}}{\text{MSE}} = \frac{\text{SST}/(k-1)}{\text{SSE}/(n-k)}
\end{align*}
$$
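As a quick numerical check of the table above, we can compute SST, SSE, MST, MSE, and $F$ by hand and compare against `scipy.stats.f_oneway`. This is only a sketch on made-up data; the group values below are arbitrary.

```python
import numpy as np
from scipy import stats

# three arbitrary treatment groups (made-up data for illustration)
groups = [np.array([5.1, 4.9, 6.0, 5.5]),
          np.array([6.2, 6.8, 7.1, 6.5, 6.9]),
          np.array([4.8, 5.0, 5.3])]

k = len(groups)
n = sum(len(g) for g in groups)
grand_mean = np.concatenate(groups).mean()

# SST: between-treatment sum of squares, sum_j n_j (xbar_j - xbar)^2
sst = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# SSE: within-treatment sum of squares, (n_1-1)s_1^2 + ... + (n_k-1)s_k^2
sse = sum((len(g) - 1) * g.var(ddof=1) for g in groups)

mst = sst / (k - 1)
mse = sse / (n - k)
F = mst / mse

# scipy computes the same statistic directly
F_scipy, p = stats.f_oneway(*groups)
print(F, F_scipy)  # the two values agree
```

The agreement of the hand-computed $F$ with `f_oneway` confirms the table's formulas.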
The ANOVA table for one-way ANOVA appears as above. Assuming the null hypothesis is true, for some $\mu$ we can set $\mu = \mu_1 = \cdots = \mu_k$, and define the z-score $Z_{ij}$ as follows:
$$ Z_{ij} := \frac{x_{ij} - \mu}{\sigma} \sim N(0, 1) $$
The sum of squares of the $Z_{ij}$ can be expanded as follows:
$$
\begin{align*}
& \sum_{j=1}^{k} \sum_{i=1}^{n_j} Z_{ij}^2 \\
=& \sum_{j=1}^{k} \sum_{i=1}^{n_j} \left( \frac{x_{ij} - \mu}{\sigma} \right)^2 \\
=& \frac{1}{\sigma^2} \sum_{j=1}^{k} \sum_{i=1}^{n_j} \left[ (x_{ij} - \bar{x}_j) + (\bar{x}_j - \bar{x}) + (\bar{x} - \mu) \right]^2 \\
=& \frac{1}{\sigma^2} \sum_{j=1}^{k} \sum_{i=1}^{n_j} \left[ (x_{ij} - \bar{x}_j)^2 + (\bar{x}_j - \bar{x})^2 + (\bar{x} - \mu)^2 \right] \\
&+ \frac{2}{\sigma^2} \sum_{j=1}^{k} \sum_{i=1}^{n_j} \left[ (x_{ij} - \bar{x}_j)(\bar{x}_j - \bar{x}) + (\bar{x}_j - \bar{x})(\bar{x} - \mu) + (\bar{x} - \mu)(x_{ij} - \bar{x}_j) \right]
\end{align*}
$$
Here, since
$$
\sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j) = \sum_{i=1}^{n_j} x_{ij} - n_j \cdot \frac{1}{n_j} \sum_{i=1}^{n_j} x_{ij} = 0
$$
the first and third cross terms vanish, and since $\sum_{j=1}^{k} n_j (\bar{x}_j - \bar{x}) = n\bar{x} - n\bar{x} = 0$, the middle cross term vanishes as well. Thus all cross terms reduce to $0$, and the sum of squares of $Z_{ij}$ can be expressed as:
$$
\sum_{j=1}^{k} \sum_{i=1}^{n_j} Z_{ij}^2 = \sum_{j=1}^{k} \sum_{i=1}^{n_j} \left( \frac{x_{ij} - \bar{x}_j}{\sigma} \right)^2 + \sum_{j=1}^{k} \sum_{i=1}^{n_j} \left( \frac{\bar{x}_j - \bar{x}}{\sigma} \right)^2 + \sum_{j=1}^{k} \sum_{i=1}^{n_j} \left( \frac{\bar{x} - \mu}{\sigma} \right)^2
$$
Now, label the three sums on the right-hand side as $Q_1, Q_2, Q_3$ in order.
$$ \sum_{j=1}^{k} \sum_{i=1}^{n_j} Z_{ij}^2 = Q_1 + Q_2 + Q_3 $$
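Before moving on, it's easy to convince ourselves numerically that this decomposition is exact. The sketch below simulates data under $H_0$ with an arbitrary common mean and variance (the values $\mu = 10$, $\sigma = 2$ and the group sizes are made up) and checks that $Q_1 + Q_2 + Q_3$ reproduces $\sum_{ij} Z_{ij}^2$.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 10.0, 2.0                    # common mean/variance under H0 (arbitrary)
groups = [rng.normal(mu, sigma, size=nj) for nj in (4, 6, 5)]

x = np.concatenate(groups)
Z = (x - mu) / sigma                     # the z-scores Z_ij
xbar = x.mean()

# expand each treatment mean xbar_j to the length of its group
group_means = np.concatenate([np.full(len(g), g.mean()) for g in groups])

Q1 = np.sum(((x - group_means) / sigma) ** 2)    # within-treatment sum
Q2 = np.sum(((group_means - xbar) / sigma) ** 2)  # between-treatment sum
Q3 = len(x) * ((xbar - mu) / sigma) ** 2          # grand-mean-vs-mu sum

print(np.sum(Z**2), Q1 + Q2 + Q3)  # identical up to rounding
```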
Let’s define three symmetric matrices $A_1, A_2, A_3$ using the $n \times n$ identity matrix $I_n$, the all-ones matrix $J_n$ (every entry equal to $1$), and a block diagonal matrix $\mathrm{diag}(\cdot)$ as follows:
$$
\begin{align*}
A_1 &:= I_n - \mathrm{diag} \left( \frac{1}{n_1} J_{n_1}, \cdots, \frac{1}{n_k} J_{n_k} \right) \\
A_2 &:= \mathrm{diag} \left( \frac{1}{n_1} J_{n_1}, \cdots, \frac{1}{n_k} J_{n_k} \right) - \frac{1}{n} J_n \\
A_3 &:= \frac{1}{n} J_n
\end{align*}
$$
Note that each block of the block diagonal matrix sets the stage for the $(n_j - 1) s_j^2$ of each treatment $j$. From their definitions, the sum of these three matrices is $A_1 + A_2 + A_3 = I_n$, and starting from the fact that $\mathrm{rank} J_n = 1$, it is not hard to see that the ranks of these three matrices are:
$$ \mathrm{rank} A_1 = n - k \qquad \mathrm{rank} A_2 = k - 1 \qquad \mathrm{rank} A_3 = 1 $$
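These rank claims can also be verified numerically. The sketch below builds $A_1, A_2, A_3$ for made-up group sizes $(n_1, n_2, n_3) = (4, 6, 5)$ and checks both the identity $A_1 + A_2 + A_3 = I_n$ and the ranks.

```python
import numpy as np
from scipy.linalg import block_diag

n_sizes = (4, 6, 5)                   # group sizes n_j for k = 3 (arbitrary)
n, k = sum(n_sizes), len(n_sizes)

# diag(J_{n_1}/n_1, ..., J_{n_k}/n_k): block of within-group averaging matrices
D = block_diag(*[np.ones((nj, nj)) / nj for nj in n_sizes])
A1 = np.eye(n) - D
A2 = D - np.ones((n, n)) / n
A3 = np.ones((n, n)) / n

assert np.allclose(A1 + A2 + A3, np.eye(n))   # the three matrices sum to I_n
print(np.linalg.matrix_rank(A1), np.linalg.matrix_rank(A2), np.linalg.matrix_rank(A3))
# ranks come out as n-k, k-1, 1
```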
Let’s define the vector $Z \in \mathbb{R}^{n \times 1}$ by stacking the $n_j$-dimensional vectors $z_j := \left( Z_{1j}, \cdots, Z_{n_j j} \right)^T \in \mathbb{R}^{n_j \times 1}$ as follows:
$$
Z := \begin{bmatrix} z_1 \\ \vdots \\ z_k \end{bmatrix} = \begin{bmatrix} Z_{11} \\ \vdots \\ Z_{n_k k} \end{bmatrix}
$$
Thus, in this vector notation, the sum of squares of $Z_{ij}$ can be represented as follows:
$$
\begin{align*}
\sum_{j=1}^{k} \sum_{i=1}^{n_j} Z_{ij}^2 &= Q_1 + Q_2 + Q_3 \\
&= Z^T A_1 Z + Z^T A_2 Z + Z^T A_3 Z
\end{align*}
$$
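A quick numerical sanity check ties the matrix picture to the earlier sums: the quadratic forms $Z^T A_l Z$ should reproduce $Q_1, Q_2, Q_3$ exactly. The data, group sizes, and parameters below are arbitrary; this is a sketch, not part of the derivation.

```python
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(1)
n_sizes = (4, 6, 5)                   # arbitrary group sizes
n, k = sum(n_sizes), len(n_sizes)
mu, sigma = 10.0, 2.0                 # arbitrary common mean/variance under H0

groups = [rng.normal(mu, sigma, size=nj) for nj in n_sizes]
x = np.concatenate(groups)
Z = (x - mu) / sigma                  # the stacked vector Z

D = block_diag(*[np.ones((nj, nj)) / nj for nj in n_sizes])
A1 = np.eye(n) - D
A2 = D - np.ones((n, n)) / n
A3 = np.ones((n, n)) / n

# quadratic forms Z^T A_l Z reproduce the three sums Q_1, Q_2, Q_3
q1 = Z @ A1 @ Z
q2 = Z @ A2 @ Z
q3 = Z @ A3 @ Z
print(np.sum(Z**2), q1 + q2 + q3)  # equal up to rounding
```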
Cochran’s Theorem: Let the sample $X = (X_1, \cdots, X_n)$ follow an iid normal distribution, $X_1, \cdots, X_n \overset{\text{iid}}{\sim} N(0, \sigma^2)$. Given symmetric matrices $A_1, \cdots, A_k \in \mathbb{R}^{n \times n}$ with ranks $r_j$, if random variables $Q_1, \cdots, Q_k$ are expressed as quadratic forms $Q_j := X^T A_j X$ and the sum of squares of the samples satisfies $\sum_{i=1}^{n} X_i^2 = \sum_{j=1}^{k} Q_j$, then the following holds:
$$
\forall j, \ \frac{Q_j}{\sigma^2} \sim \chi^2 (r_j) \quad \land \quad \forall j_1 \ne j_2, \ Q_{j_1} \perp Q_{j_2} \iff \sum_{j=1}^{k} r_j = n
$$
In other words, the fact that the $Q_j$ are independent and follow chi-squared distributions $\chi^2(r_j)$ is equivalent to the sum of the ranks $r_j$ being equal to the sample size $n$.
Each component of $Z$ is independently drawn from the standard normal distribution $N(0, 1^2)$, and $\sum_{l=1}^{3} \mathrm{rank} A_l = n$, so by Cochran’s theorem, $Q_1$ and $Q_2$ independently follow chi-squared distributions as follows:
$$
\begin{align*}
Q_1 = \frac{Q_1}{1^2} &\sim \chi^2 \left( \mathrm{rank} A_1 \right) = \chi^2 (n - k) \\
Q_2 = \frac{Q_2}{1^2} &\sim \chi^2 \left( \mathrm{rank} A_2 \right) = \chi^2 (k - 1)
\end{align*}
$$
Meanwhile, these $Q_1$ and $Q_2$ can be expressed in terms of SSE and SST as follows:
$$
\begin{align*}
Q_1 &= \sum_{j=1}^{k} \sum_{i=1}^{n_j} \left( \frac{x_{ij} - \bar{x}_j}{\sigma} \right)^2 = \frac{1}{\sigma^2} \left[ (n_1 - 1) s_1^2 + \cdots + (n_k - 1) s_k^2 \right] = \frac{1}{\sigma^2} \text{SSE} \\
Q_2 &= \sum_{j=1}^{k} \sum_{i=1}^{n_j} \left( \frac{\bar{x}_j - \bar{x}}{\sigma} \right)^2 = \frac{1}{\sigma^2} \sum_{j=1}^{k} n_j \left( \bar{x}_j - \bar{x} \right)^2 = \frac{1}{\sigma^2} \text{SST}
\end{align*}
$$
$$
\begin{align*}
F &= \frac{\text{MST}}{\text{MSE}} \\
&= \frac{\text{SST}/(k-1)}{\text{SSE}/(n-k)} \\
&= \frac{(\text{SST}/\sigma^2)/(k-1)}{(\text{SSE}/\sigma^2)/(n-k)} \\
&= \frac{Q_2 / (k-1)}{Q_1 / (n-k)} \\
&\sim F(k-1, n-k)
\end{align*}
$$
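As a final Monte Carlo check, we can simulate many datasets under $H_0$ and verify that the $F$ statistic behaves like an $F(k-1, n-k)$ variable; for example, its sample mean should be close to the theoretical mean $(n-k)/(n-k-2)$. The design below (three balanced groups of 10) is arbitrary, and this is only a sketch of the idea.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_sizes = (10, 10, 10)               # arbitrary balanced design
k, n = len(n_sizes), sum(n_sizes)

# simulate many datasets with equal means (H0 true) and collect F statistics
F_vals = []
for _ in range(10000):
    groups = [rng.normal(0.0, 1.0, size=nj) for nj in n_sizes]
    F_vals.append(stats.f_oneway(*groups).statistic)
F_vals = np.array(F_vals)

# under H0, F ~ F(k-1, n-k), whose mean is (n-k)/(n-k-2)
print(F_vals.mean(), (n - k) / (n - k - 2))
```

The two printed values should agree to within simulation noise, which is exactly what the derivation above predicts.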
Hence, it is confirmed that under the assumption that the null hypothesis is true, the test statistic $F$ follows the F-distribution $F(k-1, n-k)$.