Convergence in Probability in Mathematical Statistics
Definition 1
A sequence of random variables $\left\{ X_{n} \right\}$ is said to converge in probability to a random variable $X$ as $n \to \infty$, denoted by $X_{n} \overset{P}{\to} X$, if the following holds. $$ \forall \varepsilon > 0 , \lim_{n \to \infty} P \left[ \left| X_{n} - X \right| < \varepsilon \right] = 1 $$
Explanation
The condition for convergence in probability is exactly what the definition says in terms of probabilities: as $n$ increases, the probability that $X_{n}$ and $X$ differ by more than an arbitrarily small error $\varepsilon$ vanishes. In equations, this is usually written in the following equivalent but more convenient form. $$ \forall \varepsilon > 0 , \lim_{n \to \infty} P \left[ \left| X_{n} - X \right| \ge \varepsilon \right] = 0 $$

As is well known, a random variable is a function from a sample space to the real numbers, and comparing two functions through a tolerance $\varepsilon$ on their difference makes this analogous to the uniform convergence of functions in analysis. The analogy extends further: just as uniform convergence implies pointwise convergence, convergence in probability implies convergence in distribution.

If the sudden appearance of an epsilon is unwelcome, it is time to get used to it or give up on mathematical statistics. In statistics, letting $n$ increase is not just about sending some number to infinity; it is the mathematical expression of assuming a sufficiently large sample size, and if one cannot discuss sample size in a theoretical development built on probability theory, then there is essentially nothing to be done. No matter how awkward analysis may seem to the reader, an effort should be made to at least read and understand Part 1 of the proof of [3] presented in this post.

Here are some intuitive properties of convergence in probability.
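To make the definition concrete, here is a minimal Monte Carlo sketch (an illustration, not part of the original post): take $X_{n}$ to be the sample mean of $n$ iid $\text{Uniform}(0,1)$ draws and $X = 1/2$ a constant, so that $X_{n} \overset{P}{\to} X$ by the weak law of large numbers. The function name `tail_prob` and all parameter choices are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def tail_prob(n, eps=0.05, trials=10_000):
    """Monte Carlo estimate of P[|X_n - 1/2| >= eps] for the sample mean X_n."""
    sample_means = rng.uniform(0, 1, size=(trials, n)).mean(axis=1)
    return np.mean(np.abs(sample_means - 0.5) >= eps)

for n in [10, 100, 1000, 10000]:
    print(n, tail_prob(n))  # the estimated tail probability shrinks toward 0
```

As $n$ grows, the printed estimates of $P \left[ \left| X_{n} - X \right| \ge \varepsilon \right]$ drop toward $0$, which is exactly the displayed condition.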
Theorem
Let’s assume $X_{n} \overset{P}{\to} X$.
- [1] Continuous Mapping Theorem: For a continuous function $g$, $$ g\left( X_{n} \right) \overset{P}{\to} g (X) $$
- [2]: Convergence in probability implies convergence in distribution. That is, $$ X_{n} \overset{P}{\to} X \implies X_{n} \overset{D}{\to} X $$
- [3]: If $a \in \mathbb{R}$ is a constant and $Y_{n} \overset{P}{\to} Y$, then $$ aX_{n} \overset{P}{\to} a X \\ X_{n} + Y_{n} \overset{P}{\to} X + Y \\ X_{n} Y_{n} \overset{P}{\to} XY $$ A numerical sanity check of [1] and [3] is sketched after this list.
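The following Monte Carlo sketch checks [1] and [3] empirically (an illustration, not part of the theorem): $X_{n}$ and $Y_{n}$ are sample means of independent $\text{Uniform}(0,1)$ samples, so $X_{n} \overset{P}{\to} 1/2$ and $Y_{n} \overset{P}{\to} 1/2$. The helper `est` and all parameter choices are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

def est(event_fn, n, trials=10_000):
    """Estimate P[event] where x, y are sample means of n Uniform(0,1) draws."""
    x = rng.uniform(0, 1, size=(trials, n)).mean(axis=1)
    y = rng.uniform(0, 1, size=(trials, n)).mean(axis=1)
    return np.mean(event_fn(x, y))

eps = 0.05
for n in [10, 100, 1000]:
    cmt = est(lambda x, y: np.abs(x**2 - 0.25) >= eps, n)    # [1] with g(x) = x^2
    add = est(lambda x, y: np.abs((x + y) - 1.0) >= eps, n)  # [3] sum
    mul = est(lambda x, y: np.abs(x * y - 0.25) >= eps, n)   # [3] product
    print(n, cmt, add, mul)  # all three estimates shrink toward 0
```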
Proof
[1]
The known proofs require tools beyond the undergraduate level, and at this depth of mathematical statistics it is not necessary to work through them. It is acceptable to simply accept the result and move on.
■
[2]
Let $x$ be a continuity point of the distribution function $F_{X}$ and let $\varepsilon > 0$. From the event inclusions $\left( X_{n} \le x \right) \subset \left( X \le x + \varepsilon \right) \cup \left( \left| X_{n} - X \right| \ge \varepsilon \right)$ and $\left( X \le x - \varepsilon \right) \subset \left( X_{n} \le x \right) \cup \left( \left| X_{n} - X \right| \ge \varepsilon \right)$, we obtain $$ F_{X} (x - \varepsilon) - P \left[ \left| X_{n} - X \right| \ge \varepsilon \right] \le F_{X_{n}} (x) \le F_{X} (x + \varepsilon) + P \left[ \left| X_{n} - X \right| \ge \varepsilon \right] $$ Letting $n \to \infty$ and then $\varepsilon \to 0$ gives $\lim_{n \to \infty} F_{X_{n}} (x) = F_{X} (x)$, which is exactly convergence in distribution.
■
[3]
Part 1. $aX_{n} \overset{P}{\to} a X $
Although this also follows directly from the Continuous Mapping Theorem, we give a direct derivation as an example of an analytic proof. The claim is trivial if $a = 0$, so assume $a \ne 0$.
Fix $\varepsilon > 0$. Dividing by $|a|$ inside the probability gives the following. $$ \begin{align*} P \left( \left| a X_{n} - aX \right| \ge \varepsilon \right) =& P \left( |a| \left| X_{n} - X \right| \ge \varepsilon \right) \\ =& P \left( \left| X_{n} - X \right| \ge {{ \varepsilon } \over { |a| }} \right) \end{align*} $$ By the assumption that $X_{n} \overset{P}{\to} X$, the last term converges to $0$ as $n \to \infty$, so taking the limit of the first term yields the following. $$ \lim_{n \to \infty} P \left( \left| a X_{n} - aX \right| \ge \varepsilon \right) = 0 $$
Part 2. $X_{n} + Y_{n} \overset{P}{\to} X + Y$
It is not too difficult as long as the direction of the inequality is not confused. By the Triangle Inequality, $$ \left| \left( X_{n} - X \right) + \left( Y_{n} - Y \right) \right| \le \left| X_{n} - X \right| + \left| Y_{n} - Y \right| $$ Moreover, the inclusion relation of the two events $$ \color{blue}{\left( \left| X_{n} - X \right| + \left| Y_{n} - Y \right| \ge \varepsilon \right) } \subset \color{orange}{ \left[ \left( \left| X_{n} - X \right| \ge \varepsilon / 2 \right) \cup \left( \left| Y_{n} - Y \right| \ge \varepsilon / 2 \right) \right] } $$ holds, since if both $\left| X_{n} - X \right| < \varepsilon / 2$ and $\left| Y_{n} - Y \right| < \varepsilon / 2$, then their sum is less than $\varepsilon$. Now, for any $\varepsilon > 0$, $$ \begin{align*} P \left[ \left| \left( X_{n} + Y_{n} \right) - \left( X + Y \right) \right| \ge \varepsilon \right] =& P \left[ \left| \left( X_{n} - X \right) + \left( Y_{n} - Y \right) \right| \ge \varepsilon \right] \\ \le & P \left[ \color{blue}{ \left| X_{n} - X \right| + \left| Y_{n} - Y \right| \ge \varepsilon } \right] \\ \le & P \left[ \color{orange}{ \left( \left| X_{n} - X \right| \ge \varepsilon / 2 \right) \cup \left( \left| Y_{n} - Y \right| \ge \varepsilon / 2 \right) } \right] \\ \le & P \left[ \left| X_{n} - X \right| \ge \varepsilon / 2 \right] + P \left[ \left| Y_{n} - Y \right| \ge \varepsilon / 2 \right] \end{align*} $$ Since the last term converges to $0$ as $n \to \infty$ and probabilities are nonnegative, we obtain the following. $$ \lim_{n \to \infty} P \left[ \left| \left( X_{n} + Y_{n} \right) - \left( X + Y \right) \right| \ge \varepsilon \right] = 0 $$
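Under the same illustrative setup as in the sketches above (sample means of independent $\text{Uniform}(0,1)$ samples, an assumption for demonstration only), the union bound used here can be checked numerically. Since the event inclusion holds pathwise, the estimated left side can never exceed the estimated right side.

```python
import numpy as np

rng = np.random.default_rng(2)
n, trials, eps = 100, 100_000, 0.05

# Deviations X_n - X and Y_n - Y for sample means of Uniform(0,1), with X = Y = 1/2.
dx = rng.uniform(0, 1, size=(trials, n)).mean(axis=1) - 0.5
dy = rng.uniform(0, 1, size=(trials, n)).mean(axis=1) - 0.5

lhs = np.mean(np.abs(dx + dy) >= eps)  # P[|(X_n - X) + (Y_n - Y)| >= eps]
rhs = np.mean(np.abs(dx) >= eps / 2) + np.mean(np.abs(dy) >= eps / 2)
print(lhs, rhs, lhs <= rhs)  # the union-bound estimate always dominates
```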
Part 3. $X_{n} Y_{n} \overset{P}{\to} XY$
$$ g(x) := x^{2} $$ is a continuous function, so by theorem [1] $X_{n}^{2} \overset{P}{\to} X^{2}$ and $Y_{n}^{2} \overset{P}{\to} Y^{2}$; likewise, by Parts 1 and 2, $X_{n} - Y_{n} \overset{P}{\to} X - Y$, and applying [1] again gives $\left( X_{n} - Y_{n} \right)^{2} \overset{P}{\to} \left( X - Y \right)^{2}$. Therefore $$ \begin{align*} X_{n} Y_{n} =& {{ 1 } \over { 2 }} X_{n}^{2} + {{ 1 } \over { 2 }} Y_{n}^{2} - {{ 1 } \over { 2 }} \left( X_{n} - Y_{n} \right)^{2} \\ \overset{P}{\to}& {{ 1 } \over { 2 }} X^{2} + {{ 1 } \over { 2 }} Y^{2} - {{ 1 } \over { 2 }}\left( X - Y \right)^{2} \\ =& XY \end{align*} $$
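For completeness, expanding the square verifies the algebraic identity used in the first line above: $$ {{ 1 } \over { 2 }} x^{2} + {{ 1 } \over { 2 }} y^{2} - {{ 1 } \over { 2 }} \left( x - y \right)^{2} = {{ 1 } \over { 2 }} x^{2} + {{ 1 } \over { 2 }} y^{2} - {{ 1 } \over { 2 }} x^{2} + xy - {{ 1 } \over { 2 }} y^{2} = xy $$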
■
Rigorous Definition
Hogg et al. (2013). Introduction to Mathematical Statistics (7th Edition): p295. ↩︎