Rao-Cramér Lower Bound
Theorem 1
- (R0): The probability density function $f$ is injective with respect to $\theta$. That is, it satisfies: $$ \theta \ne \theta' \implies f \left( x_{k} ; \theta \right) \ne f \left( x_{k} ; \theta' \right) $$
- (R1): The probability density function $f$ has the same support for all $\theta$.
- (R2): The true value $\theta_{0}$ is an interior point of the parameter space $\Omega$.
- (R3): The probability density function $f$ is twice differentiable with respect to $\theta$.
- (R4): The integral $\int f (x; \theta) dx$ allows two differentiations across the integral sign with respect to $\theta$.
Let $X_{1}, \cdots , X_{n}$ be a random sample with probability density function $f(x; \theta)$ whose parameter $\theta$ satisfies the regularity conditions (R0) through (R4). If the statistic $Y := u \left( X_{1} , \cdots , X_{n} \right)$ has expectation $E(Y) = k(\theta)$, then $$ \operatorname{Var} (Y) \ge {{ \left[ k'(\theta) \right]^{2} } \over { n I (\theta) }} $$ Here, the right-hand side ${{ \left[ k'(\theta) \right]^{2} } \over { n I (\theta) }}$ is called the Rao-Cramér Lower Bound.
- $nI(\theta)$ is the Fisher information of a random sample of size $n$.
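The inequality can be checked numerically on a small example. The sketch below is an illustration only: it assumes a Bernoulli($\theta$) sample (a discrete model chosen here so that expectations can be computed exactly by enumeration), for which $I(\theta) = 1 / \left[ \theta (1 - \theta) \right]$. The helper names `exact_moments`, `rao_cramer_bound`, `sample_mean` and `squared_mean` are hypothetical and do not come from the source.

```python
from itertools import product


def exact_moments(u, n, theta):
    """Return (E[Y], Var(Y)) for Y = u(X_1, ..., X_n), X_i iid Bernoulli(theta).

    Every outcome in {0, 1}^n is enumerated, so the result is exact
    up to floating-point rounding (no simulation involved).
    """
    mean = 0.0
    second = 0.0
    for xs in product((0, 1), repeat=n):
        p = 1.0
        for x in xs:
            p *= theta if x == 1 else 1.0 - theta
        y = u(xs)
        mean += y * p
        second += y * y * p
    return mean, second - mean ** 2


def rao_cramer_bound(u, n, theta, h=1e-5):
    """Return [k'(theta)]^2 / (n I(theta)) for the assumed Bernoulli model."""
    # k'(theta) approximated by a central difference of k(theta) = E[Y].
    k_plus, _ = exact_moments(u, n, theta + h)
    k_minus, _ = exact_moments(u, n, theta - h)
    k_prime = (k_plus - k_minus) / (2.0 * h)
    fisher = 1.0 / (theta * (1.0 - theta))  # I(theta) for Bernoulli(theta)
    return k_prime ** 2 / (n * fisher)


def sample_mean(xs):
    return sum(xs) / len(xs)


def squared_mean(xs):
    # A biased statistic for theta: E[Y] = theta^2 + theta (1 - theta) / n.
    return sample_mean(xs) ** 2


if __name__ == "__main__":
    n, theta = 5, 0.3
    for name, u in (("sample mean", sample_mean), ("squared mean", squared_mean)):
        _, var = exact_moments(u, n, theta)
        bound = rao_cramer_bound(u, n, theta)
        print(f"{name}: Var(Y) = {var:.6f}, lower bound = {bound:.6f}")
```

Under these assumptions the sample mean attains the bound, while the squared sample mean has variance strictly above it.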
Proof
We prove the continuous case; the discrete case is analogous, with sums in place of integrals.
Writing $k(\theta) = E(Y)$ in integral form gives $$ k(\theta) = \int_{\mathbb{R}} \cdots \int_{\mathbb{R}} u \left( x_{1}, \cdots , x_{n} \right) f \left( x_{1}; \theta \right) \cdots f \left( x_{n}; \theta \right) d x_{1} \cdots d x_{n} $$ Differentiate both sides with respect to $\theta$ under the integral sign. Applying the product rule to $f \left( x_{1}; \theta \right) \cdots f \left( x_{n}; \theta \right)$ and then the derivative of the logarithm, $\displaystyle \left( \log g \right)' = {{ g' } \over { g }}$, yields $$ \begin{align*} k'(\theta) =& \int_{\mathbb{R}} \cdots \int_{\mathbb{R}} u \left( x_{1}, \cdots , x_{n} \right) \left[ \sum_{k=1}^{n} {{ 1 } \over { f \left( x_{k} ; \theta \right) }} {{ \partial f \left( x_{k} ; \theta \right) } \over { \partial \theta }} \right] \\ & \times f \left( x_{1}; \theta \right) \cdots f \left( x_{n}; \theta \right) d x_{1} \cdots d x_{n} \\ =& \int_{\mathbb{R}} \cdots \int_{\mathbb{R}} u \left( x_{1}, \cdots , x_{n} \right) \left[ \sum_{k=1}^{n} {{ \partial \log f \left( x_{k} ; \theta \right) } \over { \partial \theta }} \right] \\ & \times f \left( x_{1}; \theta \right) \cdots f \left( x_{n}; \theta \right) d x_{1} \cdots d x_{n} \end{align*} $$ Defining a new random variable $\displaystyle Z := \sum_{k=1}^{n} {{ \partial \log f \left( X_{k} ; \theta \right) } \over { \partial \theta }}$, the above equation simplifies to $$ k'(\theta) = E(YZ) $$
Bartlett’s First Identity: $$ E \left[ {{ \partial \log f ( X ; \theta ) } \over { \partial \theta }} \right] = 0 $$
Variance Form of Fisher Information: $$ \operatorname{Var} \left( {{ \partial \log L ( \theta ; \mathbf{X} ) } \over { \partial \theta }} \right) = n I (\theta) $$
Since $Z$ is the sum of the score functions, Bartlett’s first identity and the variance form of the Fisher information give $$ \begin{align*} E(Z) =& 0 \\ \operatorname{Var}(Z) =& n I (\theta) \end{align*} $$ Now write $E(YZ)$ in covariance form, where $\sigma_{Y}, \sigma_{Z}$ are the standard deviations of $Y, Z$ and $\rho$ is their Pearson correlation coefficient: $$ \begin{align*} k'(\theta) =& E(YZ) \\ =& E(Y)E(Z) + \rho \sigma_{Y} \sigma_{Z} \\ =& E(Y) \cdot 0 + \rho \sigma_{Y} \sqrt{n I(\theta)} \end{align*} $$ Squaring both sides and solving for $\rho^{2}$ gives $$ {{ \left[ k'(\theta) \right]^{2} } \over { \sigma_{Y}^{2} n I (\theta) }} = \rho^{2} $$ Since $\rho^{2} \le 1$, the left-hand side is at most $1$, and multiplying both sides of that inequality by $\sigma_{Y}^{2} = \operatorname{Var} (Y)$ results in $$ {{ \left[ k'(\theta) \right]^{2} } \over { n I (\theta) }} \le \operatorname{Var} (Y) $$
■
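The three facts the proof relies on, $E(Z) = 0$, $\operatorname{Var}(Z) = n I(\theta)$ and $k'(\theta) = E(YZ)$, can be verified exactly on a toy model. The following is a minimal sketch, again assuming a Bernoulli($\theta$) sample and the statistic $Y = \bar{X}^{2}$, neither of which appears in the source; `score` and `expectation` are hypothetical helper names.

```python
from itertools import product


def score(x, theta):
    """d/dtheta of log f(x; theta) for f(x; theta) = theta^x (1 - theta)^(1 - x)."""
    return x / theta - (1 - x) / (1.0 - theta)


def expectation(g, n, theta):
    """Exact E[g(X_1, ..., X_n)] for X_i iid Bernoulli(theta), by enumeration."""
    total = 0.0
    for xs in product((0, 1), repeat=n):
        p = 1.0
        for x in xs:
            p *= theta if x == 1 else 1.0 - theta
        total += g(xs) * p
    return total


n, theta = 4, 0.3


def Y(xs):
    # Assumed statistic: the squared sample mean.
    return (sum(xs) / n) ** 2


def Z(xs):
    # Sum of the score functions, as in the proof.
    return sum(score(x, theta) for x in xs)


e_y = expectation(Y, n, theta)
var_y = expectation(lambda xs: Y(xs) ** 2, n, theta) - e_y ** 2
e_z = expectation(Z, n, theta)
var_z = expectation(lambda xs: Z(xs) ** 2, n, theta) - e_z ** 2
e_yz = expectation(lambda xs: Y(xs) * Z(xs), n, theta)

# Closed form for this example: k(theta) = theta^2 + theta (1 - theta) / n,
# hence k'(theta) = 2 theta + (1 - 2 theta) / n.
k_prime = 2.0 * theta + (1.0 - 2.0 * theta) / n

print("E(Z)     =", e_z)
print("Var(Z)   =", var_z, " n I(theta) =", n / (theta * (1.0 - theta)))
print("E(YZ)    =", e_yz, " k'(theta)  =", k_prime)
print("rho^2    =", e_yz ** 2 / (var_y * var_z))
```

The last line prints the squared correlation between $Y$ and $Z$; the bound is exactly the statement that this quantity cannot exceed $1$.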
Corollary
If $k(\theta) = \theta$, that is, if $Y$ is an unbiased estimator of $\theta$, then $$ \begin{align*} & k(\theta) = \theta \\ \implies& k'(\theta) = 1 \\ \implies& \left[ k'(\theta) \right]^{2} = 1 \end{align*} $$ and therefore $$ \operatorname{Var} (Y) \ge {{ 1 } \over { n I (\theta) }} $$
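As a worked illustration of the corollary, consider a model that is not taken from the source: a random sample from $N \left( \theta , \sigma^{2} \right)$ with $\sigma^{2}$ known. The log-density, the score function and the Fisher information are $$ \begin{align*} \log f (x ; \theta) =& - {{ 1 } \over { 2 }} \log \left( 2 \pi \sigma^{2} \right) - {{ (x - \theta)^{2} } \over { 2 \sigma^{2} }} \\ {{ \partial \log f (x ; \theta) } \over { \partial \theta }} =& {{ x - \theta } \over { \sigma^{2} }} \\ I (\theta) =& \operatorname{Var} \left( {{ X - \theta } \over { \sigma^{2} }} \right) = {{ \sigma^{2} } \over { \sigma^{4} }} = {{ 1 } \over { \sigma^{2} }} \end{align*} $$ so any unbiased estimator $Y$ of $\theta$ satisfies $$ \operatorname{Var} (Y) \ge {{ 1 } \over { n I (\theta) }} = {{ \sigma^{2} } \over { n }} $$ The sample mean $\overline{X}$ is unbiased with $\operatorname{Var} \left( \overline{X} \right) = \sigma^{2} / n$, so in this model it attains the lower bound.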
Hogg et al. (2013). Introduction to Mathematical Statistics (7th Edition): p337.