Suppose we have $n$ independent random variables $Y_k \sim \chi^2_{r_k}$, each following a chi-squared distribution with $r_k$ degrees of freedom. As is well known, their sum $\sum_{k=1}^{n} Y_k$ follows a chi-squared distribution with $\sum_{k=1}^{n} r_k$ degrees of freedom. This fact is particularly useful when looking at the denominator of $W / \sqrt{V/r}$, which follows a t-distribution. Unfortunately, there is a problem when it is applied directly to a pooled sample, that is, a mix of heterogeneous populations. For instance, given ratios or, more generally, weights $a_1, \cdots, a_n \in \mathbb{R}$, the distribution of $\sum_{k=1}^{n} a_k Y_k$ is hard to pin down. It plausibly behaves like a chi-squared distribution, but its degrees of freedom are difficult to determine. To address this, Satterthwaite proposed a reasonably good statistic under the assumption that $\sum a_k Y_k$ follows a (scaled) chi-squared distribution. A key application of the Satterthwaite approximation is hypothesis testing for the difference between two population means with small samples.
Formula
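Let $Y_k \sim \chi^2_{r_k}$ for $k = 1, \cdots, n$ and $a_k \in \mathbb{R}$. If we assume that for some $\nu > 0$,
$$\sum_{k=1}^{n} a_k Y_k \sim \frac{\chi^2_{\nu}}{\nu},$$
then we can use the following $\hat{\nu}$ as an estimator.
$$\hat{\nu} = \frac{\left( \sum_{k} a_k Y_k \right)^2}{\sum_{k} \frac{a_k^2}{r_k} Y_k^2}$$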
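As a minimal sketch of the formula above, the following Python function evaluates $\hat{\nu}$ from weights, degrees of freedom, and observed values. The function name `satterthwaite_df` and the sample numbers are illustrative assumptions, not something taken from the derivation itself.

```python
def satterthwaite_df(a, r, y):
    """Evaluate (sum_k a_k y_k)^2 / sum_k (a_k^2 / r_k) y_k^2.

    a: weights a_k, r: degrees of freedom r_k, y: observed values of Y_k.
    """
    if not (len(a) == len(r) == len(y)):
        raise ValueError("a, r and y must have the same length")
    numerator = sum(ak * yk for ak, yk in zip(a, y)) ** 2
    denominator = sum((ak * yk) ** 2 / rk for ak, rk, yk in zip(a, r, y))
    return numerator / denominator


# Hypothetical inputs: two components with 4 and 9 degrees of freedom.
a = [0.05, 0.08]
r = [4, 9]
y = [3.2, 10.5]
nu_hat = satterthwaite_df(a, r, y)
print(f"estimated degrees of freedom: {nu_hat:.3f}")
```

Because the numerator and the denominator both scale with the square of the weights, multiplying every $a_k$ by the same constant leaves $\hat{\nu}$ unchanged. For positive weights the value lies between the smallest $r_k$ and $\sum_k r_k$.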
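Start from the assumption
$$\sum_{k=1}^{n} a_k Y_k \sim \frac{\chi^2_{\nu}}{\nu}.$$
Since the mean of the chi-squared distribution $\chi^2_{\nu}$ is $\nu$, we have $E \left( \chi^2_{\nu} / \nu \right) = 1$ and therefore
$$E \sum_{k=1}^{n} a_k Y_k = 1.$$
Since $E Y_k = r_k$ for each $k$, the first moment gives
$$\begin{aligned} 1 =& E \left( \sum_{k=1}^{n} a_k Y_k \right) \\ =& \sum_{k=1}^{n} a_k E Y_k \\ =& \sum_{k=1}^{n} a_k r_k. \end{aligned}$$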
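Given that the mean of $\chi^2_{\nu}$ is $\nu$ and its variance is $2\nu$, so that $E \left[ \left( \chi^2_{\nu} \right)^2 \right] = 2\nu + \nu^2$, the second moment gives
$$\begin{aligned} E \left( \sum_{k=1}^{n} a_k Y_k \right)^2 =& E \left[ \left( \frac{\chi^2_{\nu}}{\nu} \right)^2 \right] \\ =& \frac{1}{\nu^2} E \left[ \left( \chi^2_{\nu} \right)^2 \right] \\ =& \frac{1}{\nu^2} \left[ 2\nu + \nu^2 \right] \\ =& \frac{2}{\nu} + 1. \end{aligned}$$
Replacing the expectation with the observed value and solving for $\nu$ yields the following estimator:
$$\hat{\nu} = \frac{2}{\left( \sum_{k=1}^{n} a_k Y_k \right)^2 - 1}$$
This is a reasonable estimator, but the denominator is problematic: the estimate diverges as $\left( \sum_{k=1}^{n} a_k Y_k \right)^2$ approaches $1$ and becomes negative when it falls below $1$. To overcome this risk, let's take a closer look at $\left( \sum_{k=1}^{n} a_k Y_k \right)^2$.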
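The instability of this first estimator can be illustrated with a small simulation. The sketch below is only an illustration under assumed values: the degrees of freedom `r`, the weights `a` (chosen so that $\sum_k a_k r_k = 1$), the seed, and the helper names are all hypothetical. It draws chi-squared variables through the standard identity $\chi^2_{r} = \text{Gamma}(r/2, \text{scale } 2)$ and counts how often the estimate comes out negative.

```python
import math
import random


def chi2_draw(rng, df):
    # A chi-squared variable with df degrees of freedom is Gamma(shape=df/2, scale=2).
    return rng.gammavariate(df / 2, 2.0)


def naive_df(a, y):
    # First moment-matching estimator: 2 / ((sum_k a_k y_k)^2 - 1).
    s2 = sum(ak * yk for ak, yk in zip(a, y)) ** 2
    if s2 == 1.0:
        return math.inf  # the estimator is undefined at this point
    return 2.0 / (s2 - 1.0)


rng = random.Random(0)
r = [4, 9]              # assumed degrees of freedom
a = [0.5 / 4, 0.5 / 9]  # assumed weights, normalized so that sum_k a_k r_k = 1
n_trials = 10_000

negative = 0
for _ in range(n_trials):
    y = [chi2_draw(rng, rk) for rk in r]
    if naive_df(a, y) < 0:
        negative += 1

share = negative / n_trials
print(f"share of negative estimates: {share:.3f}")
```

Since $\sum_k a_k Y_k$ has mean $1$, it falls below $1$ in a large share of draws, and each such draw produces a negative estimate of the degrees of freedom.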
Correction
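Since $E \sum_{k=1}^{n} a_k Y_k = 1$ from the first-moment condition, the variance identity $E Z^2 = \operatorname{Var} Z + (E Z)^2$ gives
$$\begin{aligned} \frac{2}{\nu} + 1 =& E \left( \sum_{k=1}^{n} a_k Y_k \right)^2 \\ =& \operatorname{Var} \left( \sum_{k=1}^{n} a_k Y_k \right) + \left( E \sum_{k=1}^{n} a_k Y_k \right)^2 \\ =& \left( E \sum_{k=1}^{n} a_k Y_k \right)^2 \left[ \frac{\operatorname{Var} \left( \sum_{k=1}^{n} a_k Y_k \right)}{\left( E \sum_{k=1}^{n} a_k Y_k \right)^2} + 1 \right] \\ =& 1^2 \cdot \left[ \frac{\operatorname{Var} \left( \sum_{k=1}^{n} a_k Y_k \right)}{\left( E \sum_{k=1}^{n} a_k Y_k \right)^2} + 1 \right] \end{aligned}$$
Solving the resulting equation
$$\frac{2}{\nu} + 1 = \frac{\operatorname{Var} \left( \sum_{k=1}^{n} a_k Y_k \right)}{\left( E \sum_{k=1}^{n} a_k Y_k \right)^2} + 1$$
for $\nu$, we get
$$\nu = \frac{2 \left( E \sum_{k=1}^{n} a_k Y_k \right)^2}{\operatorname{Var} \left( \sum_{k=1}^{n} a_k Y_k \right)}$$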
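The expression for $\nu$ in terms of the mean and variance of $\sum_k a_k Y_k$ can be checked numerically. The following sketch, with hypothetical weights, degrees of freedom, and seed, compares the value obtained from simulated moments with the value obtained from the exact moments $E Y_k = r_k$ and $\operatorname{Var} Y_k = 2 r_k$ of independent chi-squared variables.

```python
import random
import statistics

rng = random.Random(1)
r = [4, 9]              # assumed degrees of freedom
a = [0.5 / 4, 0.5 / 9]  # assumed weights with sum_k a_k r_k = 1


def weighted_sum(rng):
    # Each chi-squared(df) draw is generated as Gamma(shape=df/2, scale=2).
    return sum(ak * rng.gammavariate(rk / 2, 2.0) for ak, rk in zip(a, r))


# Monte Carlo version of nu = 2 (E S)^2 / Var(S), with S = sum_k a_k Y_k.
draws = [weighted_sum(rng) for _ in range(20_000)]
nu_mc = 2 * statistics.mean(draws) ** 2 / statistics.variance(draws)

# Exact moments, using independence of the Y_k.
mean_exact = sum(ak * rk for ak, rk in zip(a, r))
var_exact = 2 * sum(ak ** 2 * rk for ak, rk in zip(a, r))
nu_exact = 2 * mean_exact ** 2 / var_exact

print(f"simulated nu: {nu_mc:.2f}, exact nu: {nu_exact:.2f}")
```

The two values should agree up to simulation noise, which is what makes $2 (E S)^2 / \operatorname{Var}(S)$ a sensible target for an estimator.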
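To compute $\operatorname{Var} \left( \sum_{k=1}^{n} a_k Y_k \right)$ in the denominator directly, note that $\operatorname{Var} Y_k = 2 r_k = 2 (E Y_k)^2 / r_k$ and that the $Y_k$ are independent, so
$$\begin{aligned} \operatorname{Var} \left( \sum_{k=1}^{n} a_k Y_k \right) =& \sum_{k=1}^{n} a_k^2 \operatorname{Var} Y_k \\ =& \sum_{k=1}^{n} a_k^2 \frac{2 (E Y_k)^2}{r_k} \\ =& 2 \sum_{k=1}^{n} \frac{a_k^2}{r_k} (E Y_k)^2 \end{aligned}$$
Substituting this, the factor $2$ cancels, and replacing each expectation with its observed value yields the following estimator:
$$\hat{\nu} = \frac{\left( \sum_{k=1}^{n} a_k Y_k \right)^2}{\sum_{k=1}^{n} \frac{a_k^2}{r_k} (Y_k)^2}$$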
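For the two-sample application mentioned in the introduction, the estimator takes a familiar form. With normal samples of sizes $n_1, n_2$ and sample variances $S_1^2, S_2^2$, the quantities $(n_k - 1) S_k^2 / \sigma_k^2$ are chi-squared with $r_k = n_k - 1$ degrees of freedom, and choosing the weights so that $a_k Y_k$ is proportional to $S_k^2 / n_k$ gives the Welch-Satterthwaite degrees of freedom. The sketch below assumes that setup; the function names and the data are made up for illustration.

```python
import math
import statistics


def welch_df(x, y):
    # Satterthwaite estimator with a_k Y_k proportional to s_k^2 / n_k and r_k = n_k - 1.
    n1, n2 = len(x), len(y)
    v1 = statistics.variance(x) / n1  # s_1^2 / n_1
    v2 = statistics.variance(y) / n2  # s_2^2 / n_2
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))


def welch_t(x, y):
    # Test statistic for the difference of means without assuming equal variances.
    v1 = statistics.variance(x) / len(x)
    v2 = statistics.variance(y) / len(y)
    return (statistics.mean(x) - statistics.mean(y)) / math.sqrt(v1 + v2)


# Made-up small samples, for illustration only.
x = [5.1, 4.9, 5.6, 5.8, 6.0]
y = [4.2, 4.8, 3.9, 4.5, 5.0, 4.1, 4.4]

df = welch_df(x, y)
t = welch_t(x, y)
print(f"t = {t:.3f}, approximate degrees of freedom = {df:.2f}")
```

The statistic would then be compared against a t-distribution with the estimated, generally non-integer, degrees of freedom.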