
Consistent Estimator

Definition 1

Let the random variable $X$ have a cumulative distribution function $F(x ; \theta)$, $\theta \in \Theta$. When $X_{1} , \cdots , X_{n}$ is a random sample drawn from $X$, a statistic $T_{n}$ that satisfies the following for the parameter $\theta$ is called a Consistent Estimator.

$$ T_{n} \overset{P}{\to} \theta \quad \text{as } n \to \infty $$


  • $\overset{P}{\to}$ denotes convergence in probability.
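
As a quick numerical illustration of this definition (a minimal sketch, not from the source; it assumes NumPy, and the exponential population with mean $\theta = 2$ and all other numbers are my own choices), the sample mean plays the role of $T_{n}$, and the estimated probability $P \left[ \left| T_{n} - \theta \right| \ge \varepsilon \right]$ should shrink toward $0$ as $n$ grows.

```python
import numpy as np

# Minimal sketch (illustrative choices): theta = 2 is the mean of an
# exponential population, T_n is the sample mean, and we estimate
# P(|T_n - theta| >= eps) by Monte Carlo for growing n.
rng = np.random.default_rng(0)
theta, eps, reps = 2.0, 0.1, 5000

for n in [10, 100, 1000, 10000]:
    samples = rng.exponential(scale=theta, size=(reps, n))
    T_n = samples.mean(axis=1)                      # the statistic T_n
    prob = np.mean(np.abs(T_n - theta) >= eps)      # estimated P(|T_n - theta| >= eps)
    print(f"n = {n:6d}   P(|T_n - theta| >= {eps}) ~ {prob:.4f}")
```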

Explanation

Whereas the unbiased estimator evaluates an estimator through the concept of expected value, the consistent estimator asks whether the statistic itself converges to the parameter in the sense of limits from [analysis](../1186), more precisely, through the convergence of a sequence of functions, namely the random variables themselves.

$$
\begin{align*}
{{ 1 } \over { n - 1 }} \sum_{k=1}^{n} \left( X_{k} - \overline{X}_{n} \right)^{2} \overset{P}{\to}& \sigma^{2} \qquad \cdots 🤔 ? \\
{{ 1 } \over { n }} \sum_{k=1}^{n} \left( X_{k} - \overline{X}_{n} \right)^{2} \overset{P}{\to}& \sigma^{2} \qquad \cdots 🤔 !
\end{align*}
$$
As a simple example, consider the following theorem: its proof shows that a sample variance whose denominator is $n$ rather than the degrees of freedom $(n-1)$ still poses no problem as a consistent estimator. This mathematically justifies the intuition that ‘after all, if $n$ grows, aren’t $n$ and $(n-1)$ essentially the same?’, but to justify that intuition with the following theorem, the existence of kurtosis is needed.
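
A small sketch of this intuition (my own illustrative example, assuming NumPy and a normal population with $\sigma^{2} = 4$): for large $n$, the $(n-1)$- and $n$-denominator versions of the sample variance are practically indistinguishable and both sit near $\sigma^{2}$.

```python
import numpy as np

# Illustrative sketch: compare dividing the sum of squared deviations
# by (n - 1) and by n; both settle near the true variance sigma^2 = 4.
rng = np.random.default_rng(1)
sigma2 = 4.0

for n in [10, 100, 10000]:
    x = rng.normal(loc=0.0, scale=np.sqrt(sigma2), size=n)
    ss = np.sum((x - x.mean()) ** 2)                 # sum of squared deviations
    print(f"n = {n:6d}   /(n-1): {ss / (n - 1):.4f}   /n: {ss / n:.4f}")
```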

Theorem

Consistency of Sample Variance

If $X_{1} , \cdots , X_{n}$ is a random sample following the probability distribution $\left( \mu, \sigma^{2} \right)$, that is, $X_{1} , \cdots , X_{n} \overset{\text{iid}}{\sim} \left( \mu, \sigma^{2} \right)$, and if kurtosis exists, then the sample variance $S_{n}^{2}$ is a consistent estimator of the population variance $\sigma^{2}$:
$$ S_{n}^{2} \overset{P}{\to} \sigma^{2} \quad \text{as } n \to \infty $$
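
Before the proof, here is a hedged Monte Carlo check of the theorem (illustrative only, assuming NumPy; the $\mathrm{Uniform}(0,1)$ population with $\sigma^{2} = 1/12$ and the numbers below are my own choices): the estimated probability that $S_{n}^{2}$ deviates from $\sigma^{2}$ by at least $\varepsilon$ should decrease as $n$ grows.

```python
import numpy as np

# Illustrative Monte Carlo check: Uniform(0, 1) has sigma^2 = 1/12 and a
# finite fourth moment, so S_n^2 should converge in probability to sigma^2.
rng = np.random.default_rng(2)
sigma2, eps, reps = 1.0 / 12.0, 0.005, 4000

for n in [50, 500, 5000]:
    x = rng.uniform(size=(reps, n))
    S2 = x.var(axis=1, ddof=1)                       # sample variance with n - 1
    prob = np.mean(np.abs(S2 - sigma2) >= eps)       # estimated deviation probability
    print(f"n = {n:5d}   P(|S_n^2 - sigma^2| >= {eps}) ~ {prob:.4f}")
```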

Proof 2

Since $X_{1} , \cdots , X_{n}$ is an iid random sample, hence independent, the sample variance $S_{n}^{2}$ can be written as follows.
$$
\begin{align*}
S_{n}^{2} =& {{ 1 } \over { n - 1 }} \sum_{k=1}^{n} \left( X_{k} - \overline{X}_{n} \right)^{2} \\
=& {{ n } \over { n - 1 }} \left[ {{ 1 } \over { n }} \sum_{k=1}^{n} X_{k}^{2} - \overline{X}_{n}^{2} \right]
\end{align*}
$$
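
The identity above is purely algebraic, so it can be verified on any data; the sketch below (assuming NumPy, with arbitrary simulated data) simply confirms that the two expressions agree.

```python
import numpy as np

# Sketch verifying the algebraic identity used in the proof:
# (1/(n-1)) * sum (X_k - Xbar)^2  ==  (n/(n-1)) * (mean(X_k^2) - Xbar^2)
rng = np.random.default_rng(3)
x = rng.normal(size=100)
n, xbar = len(x), x.mean()

lhs = np.sum((x - xbar) ** 2) / (n - 1)
rhs = (n / (n - 1)) * (np.mean(x ** 2) - xbar ** 2)
print(lhs, rhs, np.isclose(lhs, rhs))                # both values coincide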

Weak Law of Large Numbers: If $\left\{ X_{k} \right\}_{k=1}^{n}$ are iid random variables following the probability distribution $\left( \mu, \sigma^2 \right)$, then
$$ \overline{X}_n \overset{P}{\to} \mu $$

Continuous Mapping Theorem: $X_{n} \overset{P}{\to} X \implies g \left( X_{n} \right) \overset{P}{\to} g(X)$

Continuity and Limits: For a function $f : X \to Y$, the following conditions are equivalent.

  • $f : X \to Y$ is continuous.
  • $\forall p \in X$, for every sequence $\left\{ p_{n} \right\}$ in $X$, $\displaystyle \lim_{n \to \infty} p_{n} = p \implies \lim_{n \to \infty} f(p_{n}) = f(p)$

Since the polynomial $\lambda (x) = x^{2}$, which squares its argument, is a continuous function, the following holds as $n \to \infty$ by the Weak Law of Large Numbers and the Continuous Mapping Theorem.
$$ \overline{X}_{n}^{2} \overset{P}{\to} \mu^{2} $$
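
A hedged numerical sketch of this step (assuming NumPy; the normal population with $\mu = 3$ is an illustrative choice): as $n$ grows, $\overline{X}_{n}^{2}$ settles near $\mu^{2} = 9$.

```python
import numpy as np

# Illustrative sketch of the continuous mapping step: since x -> x^2 is
# continuous and Xbar_n ->P mu, the squared sample mean approaches mu^2.
rng = np.random.default_rng(4)
mu = 3.0

for n in [10, 1000, 100000]:
    xbar = rng.normal(loc=mu, scale=2.0, size=n).mean()
    print(f"n = {n:6d}   Xbar_n^2 = {xbar ** 2:.4f}   (mu^2 = {mu ** 2})")
```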

The Continuous Mapping Theorem itself can be challenging at the undergraduate level, and it is fine to gloss over it as an analogue of the properties of continuous functions covered in Introduction to Analysis.

Definition and Equivalent Condition of Convergence in Probability: When a random variable $X$ and a sequence of random variables $\left\{ X_{n} \right\}$ satisfy the following, $X_{n}$ is said to converge in probability to $X$ as $n \to \infty$, written $X_{n} \overset{P}{\to} X$.
$$ \forall \varepsilon > 0 , \lim_{n \to \infty} P \left[ \left| X_{n} - X \right| < \varepsilon \right] = 1 $$
In calculations, the following equivalent form is more often preferred.
$$ \forall \varepsilon > 0 , \lim_{n \to \infty} P \left[ \left| X_{n} - X \right| \ge \varepsilon \right] = 0 $$

Chebyshev’s Inequality: If a random variable $X$ has finite variance $\sigma^2 < \infty$, then for $\mu := E(X)$ and any positive number $K>0$,
$$ P(|X-\mu| \ge K\sigma) \le {1 \over K^2} $$
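
For intuition, a small sketch (assuming NumPy; the exponential population with $\mu = \sigma = 1$ is my own choice) compares the empirical tail probability against Chebyshev's bound $1/K^{2}$.

```python
import numpy as np

# Illustrative check of Chebyshev's inequality: for Exponential(1),
# mu = 1 and sigma = 1, and the empirical tail never exceeds 1/K^2.
rng = np.random.default_rng(5)
x = rng.exponential(scale=1.0, size=200_000)
mu, sigma = 1.0, 1.0

for K in [1.5, 2.0, 3.0]:
    empirical = np.mean(np.abs(x - mu) >= K * sigma)
    print(f"K = {K}   empirical tail {empirical:.4f}   <=   bound {1 / K**2:.4f}")
```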

Since the premise of the theorem includes the existence of kurtosis, the fourth moment $E \left( X_{1}^{4} \right) < \infty$ exists, so the variance of $X_{1}^{2}$ is finite and can be written as $c^{2} \sigma^{4}$ for some constant $c^{2} > 0$. In formula form,
$$ {{ 1 } \over { n }} \sum_{k=1}^{n} X_{k}^{2} \sim \left( E \left( X_{1}^{2} \right) , {{ c^{2} \sigma^{4} } \over { n }} \right) $$
and for any given $\varepsilon > 0$, choosing the positive number $K := \sqrt{n} \varepsilon / c \sigma^{2}$ in Chebyshev's inequality gives
$$
\begin{align*}
& \forall \varepsilon > 0, P \left( \left| {{ 1 } \over { n }} \sum_{k=1}^{n} X_{k}^{2} - E \left( X_{1}^{2} \right) \right| \ge K {{ c \sigma^{2} } \over { \sqrt{n} }} \right) \le {{ 1 } \over { K^{2} }} \\
\implies & \forall \varepsilon > 0, P \left( \left| {{ 1 } \over { n }} \sum_{k=1}^{n} X_{k}^{2} - E \left( X_{1}^{2} \right) \right| \ge \varepsilon \right) \le {{ c^{2} \sigma^{4} } \over { n \varepsilon^{2} }} \\
\implies & \forall \varepsilon > 0, \lim_{n \to \infty} P \left( \left| {{ 1 } \over { n }} \sum_{k=1}^{n} X_{k}^{2} - E \left( X_{1}^{2} \right) \right| \ge \varepsilon \right) = 0 \\
\implies & {{ 1 } \over { n }} \sum_{k=1}^{n} X_{k}^{2} \overset{P}{\to} E \left( X_{1}^{2} \right)
\end{align*}
$$
Summarizing,
$$
\begin{align*}
S_{n}^{2} =& {{ n } \over { n - 1 }} \left[ {{ 1 } \over { n }} \sum_{k=1}^{n} X_{k}^{2} - \overline{X}_{n}^{2} \right] \\
\overset{P}{\to}& 1 \cdot \left[ E \left( X_{1}^{2} \right) - \mu^{2} \right] = \sigma^{2}
\end{align*}
$$
and $S_{n}^{2}$ is a consistent estimator of the population variance $\sigma^{2}$. From the step $n / (n-1) \to 1$, one can also see that dividing the sum of squared deviations by $(n+a)$ for an appropriate fixed constant $a$, rather than by $n$ or $(n-1)$, would still pose no problem for consistency.
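
The closing remark can also be seen numerically; the following sketch (assuming NumPy; the normal population with $\sigma^{2} = 4$ and the constant $a = 7$ are arbitrary illustrative choices) divides the sum of squared deviations by $(n + a)$ and still lands on $\sigma^{2}$ for large $n$.

```python
import numpy as np

# Illustrative sketch of the final remark: since n / (n + a) -> 1 for any
# fixed a, dividing by (n + a) instead of (n - 1) still gives sigma^2 in the limit.
rng = np.random.default_rng(6)
sigma2, a = 4.0, 7

for n in [100, 10_000, 1_000_000]:
    x = rng.normal(scale=np.sqrt(sigma2), size=n)
    ss = np.sum((x - x.mean()) ** 2)
    print(f"n = {n:8d}   /(n+{a}): {ss / (n + a):.4f}   (sigma^2 = {sigma2})")
```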


  1. Hogg et al. (2013). Introduction to Mathematical Statistics (7th Edition): p298. ↩︎

  2. Hogg et al. (2018). Introduction to Mathematical Statistics (8th Edition): p325. ↩︎