Assume that two independent populations follow the normal distributions $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$ with $\sigma_1^2 = \sigma^2 = \sigma_2^2$, i.e., the population variances are unknown but assumed to be equal. When the samples are small, meaning the sample sizes satisfy $n_1, n_2 < 30$, the hypothesis test for whether the difference between the two population means equals a candidate value $D_0$ is as follows:
$H_0$: $\mu_1 - \mu_2 = D_0$. In other words, the difference in population means is $D_0$.
$H_1$: $\mu_1 - \mu_2 \ne D_0$. In other words, the difference in population means is not $D_0$.
Test Statistic
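The test statistic, using the sample means $\bar{X}_1, \bar{X}_2$ and the sample standard deviations $s_1, s_2$, is as follows:
$$
t = \frac{\left( \bar{X}_1 - \bar{X}_2 \right) - D_0}{\sqrt{s_p^2 \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}}
$$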
Here, $s_p^2$ is the pooled sample variance, calculated as follows:
$$
s_p^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}
$$
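Under $H_0$, this test statistic approximately follows a t-distribution, with its degrees of freedom $\mathrm{df}$ calculated using the floor function $\lfloor \cdot \rfloor$ as follows:
$$
\mathrm{df} = \left\lfloor \frac{\left( \dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2} \right)^2}{\dfrac{\left( s_1^2 / n_1 \right)^2}{n_1 - 1} + \dfrac{\left( s_2^2 / n_2 \right)^2}{n_2 - 1}} \right\rfloor
$$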
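The computation can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `pooled_t_test` and the sample data are hypothetical, and it assumes the pooled-variance statistic together with floored Satterthwaite-style degrees of freedom, both computed from the sample variances $s_1^2, s_2^2$.

```python
import math
from statistics import mean, variance


def pooled_t_test(x1, x2, d0=0.0):
    """Return (t, pooled variance, floored df) for two small samples.

    Hypothetical helper: assumes both samples come from normal
    populations sharing one unknown variance.
    """
    n1, n2 = len(x1), len(x2)
    s1_sq, s2_sq = variance(x1), variance(x2)  # unbiased sample variances

    # Pooled sample variance: weighted average of the two sample variances.
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)

    # Test statistic for H0: mu1 - mu2 = d0.
    t = (mean(x1) - mean(x2) - d0) / math.sqrt(sp_sq * (1 / n1 + 1 / n2))

    # Satterthwaite-style degrees of freedom, rounded down.
    v1, v2 = s1_sq / n1, s2_sq / n2
    df = math.floor((v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1)))
    return t, sp_sq, df


# Made-up data, chosen so the arithmetic is easy to follow by hand.
t, sp_sq, df = pooled_t_test([2.0, 4.0, 6.0], [1.0, 2.0, 3.0, 4.0, 5.0])
print(round(t, 4), sp_sq, df)  # 0.7906 3.0 3
```

With these made-up samples the sample variances are 4 and 2.5, so the pooled variance is $(2 \cdot 4 + 4 \cdot 2.5)/6 = 3$, and the degrees of freedom come out as $\lfloor 3.53\ldots \rfloor = 3$.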
Pooled Sample Variance: When the population variances are unknown but assumed to be equal, an unbiased estimator of the common population variance is as follows:
$$
s_p^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}
$$
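The unbiasedness can be checked in one line, assuming $s_1^2$ and $s_2^2$ are the usual sample variances, each with expectation $E\left[ s_i^2 \right] = \sigma^2$ under the equal-variance assumption:
$$
E\left[ \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2} \right] = \frac{(n_1 - 1) \sigma^2 + (n_2 - 1) \sigma^2}{n_1 + n_2 - 2} = \sigma^2
$$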
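Satterthwaite's Approximation: Let $k = 1, \cdots, n$, and assume $Y_k \sim \chi_{r_k}^2$ and $a_k \in \mathbb{R}$. If for some $\nu > 0$
$$
\sum_{k=1}^{n} a_k Y_k \sim \frac{\chi_{\nu}^2}{\nu}
$$
then the following estimator $\hat{\nu}$ of $\nu$ can be used:
$$
\hat{\nu} = \frac{\left( \displaystyle\sum_{k} a_k Y_k \right)^2}{\displaystyle\sum_{k} \frac{a_k^2}{r_k} Y_k^2}
$$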
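As a rough sketch of how this estimator turns into a degrees-of-freedom formula, the following Python snippet evaluates $\hat{\nu} = \left( \sum a_k Y_k \right)^2 / \sum (a_k^2 / r_k) Y_k^2$ directly. The function name `satterthwaite_nu` and the numbers are hypothetical. Since $\hat{\nu}$ depends only on the products $a_k Y_k$ and on $r_k$, the example takes $a_k = 1 / \left( n_k (n_k - 1) \right)$ and $Y_k = (n_k - 1) s_k^2$, so that $a_k Y_k = s_k^2 / n_k$. Any common scale factor such as the unknown $\sigma^2$ cancels.

```python
import math


def satterthwaite_nu(a, y, r):
    """Satterthwaite estimate of nu from weights a_k, values y_k and
    chi-squared degrees of freedom r_k (hypothetical helper)."""
    numerator = sum(ak * yk for ak, yk in zip(a, y)) ** 2
    denominator = sum((ak ** 2 / rk) * yk ** 2 for ak, yk, rk in zip(a, y, r))
    return numerator / denominator


# Made-up two-sample setting: n1 = 3, s1^2 = 4 and n2 = 5, s2^2 = 2.5.
n = [3, 5]
s_sq = [4.0, 2.5]
a = [1 / (nk * (nk - 1)) for nk in n]              # a_k = 1 / (n_k (n_k - 1))
y = [(nk - 1) * sk for nk, sk in zip(n, s_sq)]     # y_k = (n_k - 1) s_k^2
r = [nk - 1 for nk in n]                           # r_k = n_k - 1

nu_hat = satterthwaite_nu(a, y, r)
print(math.floor(nu_hat))  # 3
```

A quick sanity check on the design: with a single term and $a_1 = 1 / r_1$, the estimator returns exactly $r_1$, as expected for $Y_1 / r_1 \sim \chi_{r_1}^2 / r_1$.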
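Write the test statistic, with sample means $\bar{X}_1, \bar{X}_2$, as
$$
t = \frac{\dfrac{\left( \bar{X}_1 - \bar{X}_2 \right) - D_0}{\sqrt{\sigma^2 \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}}}{\sqrt{\dfrac{s_p^2}{\sigma^2}}}
$$
Under $H_0$ the numerator on the right-hand side follows a standard normal distribution, and according to Satterthwaite's approximation the quantity $s_p^2 / \sigma^2$ in the denominator is approximately distributed as a chi-squared variable with $\mathrm{df}$ degrees of freedom divided by $\mathrm{df}$. Hence $t$ approximately follows a t-distribution with $\mathrm{df}$ degrees of freedom. Let the random variable $Y$ follow the t-distribution $t(\mathrm{df})$, and let $t_{\alpha}$ be the value satisfying $P\left( Y \ge t_{\alpha} \right) = \alpha$. Rejecting $H_0$ at the significance level $\alpha$ is then equivalent to:
$$
\left| t \right| \ge t_{\alpha/2}
$$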
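This means that, if the null hypothesis $\mu_1 - \mu_2 = D_0$ were true, the observed difference in sample means $\bar{X}_1 - \bar{X}_2$ would be too far from $D_0$ for the null hypothesis to remain credible.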
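The two-sided decision rule $\left| t \right| \ge t_{\alpha/2}$ can be illustrated with a small Python sketch. It is only an illustration: the names `T_CRIT_025` and `reject_h0` are hypothetical, the significance level is fixed at $\alpha = 0.05$, and the critical values are the usual rounded entries of a printed t-table for 1 to 10 degrees of freedom.

```python
# Upper 2.5% points t_{0.025}(df) from a standard t-table (alpha = 0.05, two-sided).
T_CRIT_025 = {
    1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
    6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228,
}


def reject_h0(t_stat, df):
    """Return True when |t| >= t_{alpha/2}(df) for alpha = 0.05.

    Hypothetical helper; only df = 1..10 are covered by the table above.
    """
    return abs(t_stat) >= T_CRIT_025[df]


# A statistic of 0.79 with 3 degrees of freedom is well inside the
# acceptance region, while -3.5 lies beyond the critical value 3.182.
print(reject_h0(0.79, 3))   # False
print(reject_h0(-3.5, 3))   # True
```

Taking the absolute value is what makes the rule two-sided: a large negative $t$ is just as much evidence against $H_0$ as a large positive one.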
