(R3): The probability density function $f$ is twice differentiable with respect to $\theta$.
(R4): The integral $\int f(x;\theta) \, dx$ can be differentiated twice under the integral sign with respect to $\theta$.
Given a random sample $X_{1}, \cdots, X_{n}$ with probability density function $f(x;\theta)$ and parameter $\theta$ satisfying the regularity conditions (R0)~(R4), if the statistic $Y := u(X_{1}, \cdots, X_{n})$ satisfies $E(Y) = k(\theta)$, then
$$ \operatorname{Var}(Y) \ge \frac{[k'(\theta)]^{2}}{n I(\theta)} $$
Here, the right-hand side $\dfrac{[k'(\theta)]^{2}}{n I(\theta)}$ is called the Rao-Cramér Lower Bound.
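For illustration (this example is not in the source text, just a standard sanity check): if $X_{1}, \cdots, X_{n}$ are i.i.d. $N(\theta, \sigma^{2})$ with $\sigma^{2}$ known, then $I(\theta) = 1/\sigma^{2}$, and for $Y = \overline{X}$ we have $k(\theta) = E(\overline{X}) = \theta$, so the bound reads
$$ \operatorname{Var}(\overline{X}) \ge \frac{1^{2}}{n \cdot (1/\sigma^{2})} = \frac{\sigma^{2}}{n} $$
Since $\operatorname{Var}(\overline{X}) = \sigma^{2}/n$, the sample mean attains the bound with equality in this case.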
The proof is for the continuous case, but the discrete case is not significantly different.
Writing $k(\theta) = E(Y)$ in integral form gives
$$ k(\theta) = \int_{\mathbb{R}} \cdots \int_{\mathbb{R}} u(x_{1}, \cdots, x_{n}) f(x_{1};\theta) \cdots f(x_{n};\theta) \, dx_{1} \cdots dx_{n} $$
Differentiating both sides with respect to $\theta$, treating $f$ as a function of $\theta$, and using the logarithmic derivative $(\log g)' = \dfrac{g'}{g}$, we obtain
$$
\begin{align*}
k'(\theta) &= \int_{\mathbb{R}} \cdots \int_{\mathbb{R}} u(x_{1}, \cdots, x_{n}) \left[ \sum_{k=1}^{n} \frac{1}{f(x_{k};\theta)} \frac{\partial f(x_{k};\theta)}{\partial \theta} \right] \times f(x_{1};\theta) \cdots f(x_{n};\theta) \, dx_{1} \cdots dx_{n}
\\ &= \int_{\mathbb{R}} \cdots \int_{\mathbb{R}} u(x_{1}, \cdots, x_{n}) \left[ \sum_{k=1}^{n} \frac{f'}{f} \right] \times f(x_{1};\theta) \cdots f(x_{n};\theta) \, dx_{1} \cdots dx_{n}
\\ &= \int_{\mathbb{R}} \cdots \int_{\mathbb{R}} u(x_{1}, \cdots, x_{n}) \left[ \sum_{k=1}^{n} \frac{\partial \log f(x_{k};\theta)}{\partial \theta} \right] \times f(x_{1};\theta) \cdots f(x_{n};\theta) \, dx_{1} \cdots dx_{n}
\end{align*}
$$
Defining a new random variable $\displaystyle Z := \sum_{k=1}^{n} \frac{\partial \log f(X_{k};\theta)}{\partial \theta}$, and noting that the last integral is exactly the expectation of $u(X_{1}, \cdots, X_{n}) \cdot Z$ with respect to the joint density $f(x_{1};\theta) \cdots f(x_{n};\theta)$, the above equation simplifies to
$$ k'(\theta) = E(YZ) $$
Here, since $Z$ is a sum of score functions, Bartlett's identity and the variance form of the Fisher information give
$$
\begin{align*}
E(Z) &= 0
\\ \operatorname{Var}(Z) &= n I(\theta)
\end{align*}
$$
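To spell this step out (a supplementary derivation, not part of the quoted proof): the regularity conditions allow the derivative to pass under the integral sign, so for a single observation
$$ E\left[ \frac{\partial \log f(X;\theta)}{\partial \theta} \right] = \int_{\mathbb{R}} \frac{1}{f(x;\theta)} \frac{\partial f(x;\theta)}{\partial \theta} \, f(x;\theta) \, dx = \frac{\partial}{\partial \theta} \int_{\mathbb{R}} f(x;\theta) \, dx = \frac{\partial}{\partial \theta} 1 = 0 $$
and, since the $X_{k}$ are i.i.d., $\operatorname{Var}(Z) = \sum_{k=1}^{n} \operatorname{Var}\left( \dfrac{\partial \log f(X_{k};\theta)}{\partial \theta} \right) = n I(\theta)$ by the variance form of the Fisher information.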
Expressing $k'(\theta)$ in covariance form, with $\sigma_{Y}, \sigma_{Z}$ denoting the standard deviations of $Y, Z$ and $\rho$ the Pearson correlation coefficient between them, we get
$$
\begin{align*}
k'(\theta) &= E(YZ)
\\ &= E(Y)E(Z) + \rho \sigma_{Y} \sigma_{Z}
\\ &= E(Y) \cdot 0 + \rho \sigma_{Y} \sqrt{n I(\theta)}
\end{align*}
$$
Squaring both sides and rearranging for $\rho^{2}$ gives
$$ \frac{[k'(\theta)]^{2}}{\sigma_{Y}^{2} \, n I(\theta)} \le \rho^{2} $$
Since $\rho^{2} \le 1$, multiplying both sides by $\sigma_{Y}^{2} = \operatorname{Var}(Y)$ results in
$$ \frac{[k'(\theta)]^{2}}{n I(\theta)} \le \operatorname{Var}(Y) $$
■
Corollary
If $k(\theta) = \theta$, that is, if $Y$ is an unbiased estimator of $\theta$,
$$
\begin{align*}
& k(\theta) = \theta
\\ \implies& k'(\theta) = 1
\\ \implies& [k'(\theta)]^{2} = 1
\end{align*}
$$
then
$$ \operatorname{Var}(Y) \ge \frac{1}{n I(\theta)} $$
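As a numerical sanity check of this corollary (a rough sketch added for illustration; the Poisson model, sample size, and variable names are arbitrary choices, not from the source), the snippet below estimates $\operatorname{Var}(\overline{X})$ by Monte Carlo and compares it with $1/(n I(\theta))$. For $X \sim \text{Poisson}(\theta)$, $I(\theta) = 1/\theta$, so the bound for an unbiased estimator is $\theta / n$, which the sample mean attains.

```python
import numpy as np

# Monte Carlo check of the Rao-Cramér lower bound for an unbiased estimator.
# Model: X_1, ..., X_n iid Poisson(theta), so I(theta) = 1/theta and the
# bound for an unbiased estimator of theta is theta / n.
rng = np.random.default_rng(0)
theta, n, reps = 3.0, 50, 100_000

samples = rng.poisson(theta, size=(reps, n))
estimates = samples.mean(axis=1)        # Y = X-bar is unbiased for theta

empirical_var = estimates.var()         # Monte Carlo estimate of Var(Y)
lower_bound = theta / n                 # 1 / (n * I(theta))

print(f"empirical Var(X-bar) ≈ {empirical_var:.5f}")
print(f"Rao-Cramér lower bound = {lower_bound:.5f}")
```

The empirical variance should come out very close to the bound, consistent with the sample mean being an efficient estimator in this model.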
Hogg et al. (2013). Introduction to Mathematical Statistics (7th Edition): p337. ↩︎