

Rao-Blackwell-Kolmogorov Theorem

Theorem 1

Regularity Conditions:

  • (R0): The probability density function $f$ is injective with respect to $\theta$. In formula, it satisfies $\theta \ne \theta ' \implies f \left( x_{k} ; \theta \right) \ne f \left( x_{k} ; \theta ' \right)$.
  • (R1): The probability density function $f$ has the same support for all $\theta$.
  • (R2): The true value $\theta_{0}$ is an interior point of $\Omega$.
  • (R3): The probability density function $f$ is twice differentiable with respect to $\theta$.
  • (R4): The integral $\int f (x; \theta) dx$ can be differentiated twice under the integral sign with respect to $\theta$.
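As an aside (a standard example, not from the cited text): the $\text{Uniform}(0, \theta)$ family fails (R1), since its support $[0, \theta]$ depends on $\theta$, and the lower bound below need not hold for such families.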

Given a random sample $X_{1} , \cdots , X_{n}$ with probability density function $f(x; \theta)$ and parameter $\theta$ satisfying the regularity conditions (R0)~(R4), if the statistic $Y := u \left( X_{1} , \cdots , X_{n} \right)$ has expectation $E(Y) = k(\theta)$, then
$$ \operatorname{Var} (Y) \ge {{ \left[ k'(\theta) \right]^{2} } \over { n I (\theta) }} $$
Here, the right-hand side ${{ \left[ k'(\theta) \right]^{2} } \over { n I (\theta) }}$ is called the Rao-Cramér Lower Bound.
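As a concrete illustration (a standard example, not taken from the cited text), let $X_{1}, \cdots, X_{n}$ be a random sample from a Poisson distribution with mean $\theta$, for which $I(\theta) = 1 / \theta$. For estimating $k(\theta) = e^{-\theta} = P(X = 0)$, the bound reads
$$ \operatorname{Var}(Y) \ge {{ \left[ -e^{-\theta} \right]^{2} } \over { n / \theta }} = {{ \theta e^{-2\theta} } \over { n }} $$
The unbiased estimator $Y = n^{-1} \sum_{k=1}^{n} \mathbf{1} \left( X_{k} = 0 \right)$ has $\operatorname{Var}(Y) = e^{-\theta} \left( 1 - e^{-\theta} \right) / n$, which is indeed at least as large, since $e^{\theta} \ge 1 + \theta$.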


Proof

The proof is for the continuous case, but the discrete case is not significantly different.


Writing $k(\theta) = E(Y)$ in integral form gives
$$ k(\theta) = \int_{\mathbb{R}} \cdots \int_{\mathbb{R}} u \left( x_{1}, \cdots , x_{n} \right) f \left( x_{1}; \theta \right) \cdots f \left( x_{n}; \theta \right) d x_{1} \cdots d x_{n} $$
Differentiating both sides with respect to $\theta$, where (R4) permits differentiation under the integral sign, and applying the product rule together with the log-derivative identity $\displaystyle \left( \log g \right)' = {{ g' } \over { g }}$ yields
$$ \begin{align*} k'(\theta) =& \int_{\mathbb{R}} \cdots \int_{\mathbb{R}} u \left( x_{1}, \cdots , x_{n} \right) \left[ \sum_{k=1}^{n} {{ 1 } \over { f \left( x_{k} ; \theta \right) }} {{ \partial f \left( x_{k} ; \theta \right) } \over { \partial \theta }} \right] \\ & \times f \left( x_{1}; \theta \right) \cdots f \left( x_{n}; \theta \right) d x_{1} \cdots d x_{n} \\ =& \int_{\mathbb{R}} \cdots \int_{\mathbb{R}} u \left( x_{1}, \cdots , x_{n} \right) \left[ \sum_{k=1}^{n} {{ \partial \log f \left( x_{k} ; \theta \right) } \over { \partial \theta }} \right] \\ & \times f \left( x_{1}; \theta \right) \cdots f \left( x_{n}; \theta \right) d x_{1} \cdots d x_{n} \end{align*} $$
Defining a new random variable $\displaystyle Z := \sum_{k=1}^{n} {{ \partial \log f \left( X_{k} ; \theta \right) } \over { \partial \theta }}$, the above equation simplifies to
$$ k'(\theta) = E(YZ) $$

Bartlett's First Identity:
$$ E \left[ {{ \partial \log f ( X ; \theta ) } \over { \partial \theta }} \right] = 0 $$
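For reference, under (R4) this identity follows by differentiating $\int f(x ; \theta) \, dx = 1$ under the integral sign:
$$ E \left[ {{ \partial \log f ( X ; \theta ) } \over { \partial \theta }} \right] = \int {{ \partial f (x ; \theta) } \over { \partial \theta }} dx = {{ d } \over { d \theta }} \int f (x ; \theta) \, dx = {{ d } \over { d \theta }} 1 = 0 $$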

Variance Form of Fisher Information:
$$ \operatorname{Var} \left( {{ \partial \log L ( \theta ; \mathbf{X} ) } \over { \partial \theta }} \right) = n I (\theta) $$

Here, since $Z$ is the sum of the score functions, that is, $Z = \partial \log L (\theta ; \mathbf{X}) / \partial \theta$ for the likelihood $L (\theta ; \mathbf{X}) = \prod_{k=1}^{n} f \left( X_{k} ; \theta \right)$, Bartlett's identity and the variance form of the Fisher information give
$$ \begin{align*} E(Z) =& 0 \\ \operatorname{Var}(Z) =& n I (\theta) \end{align*} $$
Expressing $k'(\theta)$ through the covariance form, the standard deviations of $Y, Z$, and the Pearson correlation coefficient $\rho$ gives
$$ \begin{align*} k'(\theta) =& E(YZ) \\ =& E(Y)E(Z) + \rho \sigma_{Y} \sigma_{Z} \\ =& E(Y) \cdot 0 + \rho \sigma_{Y} \sqrt{n I(\theta)} \end{align*} $$
Squaring both sides and solving for $\rho^{2}$ gives
$$ {{ \left[ k'(\theta) \right]^{2} } \over { \sigma_{Y}^{2} \, n I (\theta) }} = \rho^{2} $$
Since $\rho^{2} \le 1$, multiplying both sides by $\sigma_{Y}^{2} = \operatorname{Var} (Y)$ results in
$$ {{ \left[ k'(\theta) \right]^{2} } \over { n I (\theta) }} \le \operatorname{Var} (Y) $$
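The bound can also be checked numerically. Below is a minimal Monte Carlo sketch (not part of the original text), assuming NumPy is available; it compares the variance of the sample mean with the Rao-Cramér lower bound $\theta / n$ for a Poisson$(\theta)$ sample, where $I(\theta) = 1 / \theta$.

```python
import numpy as np

# Monte Carlo sketch (illustrative, not from the cited text):
# for a Poisson(theta) sample, I(theta) = 1/theta, so any unbiased
# estimator of theta has variance at least theta / n.
rng = np.random.default_rng(0)
theta, n, reps = 3.0, 50, 20_000

samples = rng.poisson(theta, size=(reps, n))   # reps independent samples of size n
y = samples.mean(axis=1)                       # unbiased estimator Y = sample mean

empirical_var = y.var(ddof=1)                  # Monte Carlo estimate of Var(Y)
crlb = theta / n                               # Rao-Cramér lower bound

print(f"empirical Var(Y) = {empirical_var:.5f}")
print(f"Rao-Cramér bound = {crlb:.5f}")
# The sample mean attains the bound here, so the two values should nearly agree.
```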

Corollary

If $k(\theta) = \theta$, that is, if $Y$ is an unbiased estimator of $\theta$, then
$$ \begin{align*} & k(\theta) = \theta \\ \implies& k'(\theta) = 1 \\ \implies& \left[ k'(\theta) \right]^{2} = 1 \end{align*} $$
and therefore
$$ \operatorname{Var} (Y) \ge {{ 1 } \over { n I (\theta) }} $$
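For example (a standard fact, not from the cited text), if $X_{1}, \cdots, X_{n}$ is a random sample from $N(\theta, \sigma^{2})$ with $\sigma^{2}$ known, then $I(\theta) = 1 / \sigma^{2}$, so the corollary gives $\operatorname{Var}(Y) \ge \sigma^{2} / n$ for every unbiased estimator $Y$ of $\theta$. The sample mean $\overline{X}$ attains this bound, since $\operatorname{Var}(\overline{X}) = \sigma^{2} / n$; an unbiased estimator attaining the Rao-Cramér lower bound is called efficient.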


  1. Hogg et al. (2013). Introduction to Mathematical Statistics (7th Edition): p337. ↩︎