Proof of the Continuous Mapping Theorem

Theorem 1

The following is a measure-theoretic description of the continuous mapping theorem.

For metric spaces $(S, d)$ and $(S', d')$, let $g : S \to S'$ be continuous on $C_{g} \subset S$. For a random element $X$ in $S$ and a sequence of random elements $\left\{ X_{n} \right\}_{n \in \mathbb{N}}$, the following implications hold whenever $P \left( X \in C_{g} \right) = 1$:
$$
X_{n} \overset{D}{\to} X \implies g \left( X_{n} \right) \overset{D}{\to} g(X)
$$
$$
X_{n} \overset{P}{\to} X \implies g \left( X_{n} \right) \overset{P}{\to} g(X)
$$
$$
X_{n} \overset{\text{a.s.}}{\to} X \implies g \left( X_{n} \right) \overset{\text{a.s.}}{\to} g(X)
$$
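As an illustrative sanity check (not part of the proof), the distributional case can be observed numerically. The sketch below makes the arbitrary choices $g(x) = x^{2}$ and $X_{n} =$ the standardized mean of $n$ Uniform$(0,1)$ draws, so that $X_{n} \overset{D}{\to} N(0,1)$ by the central limit theorem and $g(X_{n})$ should approach a $\chi^{2}(1)$ distribution:

```python
import numpy as np

rng = np.random.default_rng(0)

def standardized_mean(n, size):
    # X_n: standardized mean of n Uniform(0, 1) draws;
    # by the CLT, X_n -> N(0, 1) in distribution as n grows.
    u = rng.uniform(size=(size, n))
    return (u.mean(axis=1) - 0.5) / (np.sqrt(1 / 12) / np.sqrt(n))

g = np.square  # continuous on all of R, so P(X in C_g) = 1 holds trivially

samples = g(standardized_mean(n=500, size=20_000))
# If g(X_n) -> g(X) = X^2 ~ chi-square(1) in distribution, the empirical
# mean and variance should be close to 1 and 2, the chi-square(1) moments.
print(samples.mean(), samples.var())
```

The printed moments are only a crude check of convergence in distribution, but they are cheap to compute and match the $\chi^{2}(1)$ limit here.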


Explanation

The fact that convergence is preserved under continuous functions is a phenomenon seen throughout mathematics, however convergence happens to be defined; the name Continuous Mapping Theorem, though, is used predominantly in probability theory. A famous corollary is Slutsky's theorem, which in undergraduate mathematical statistics is usually introduced only as a statement, without proof.

Slutsky's Theorem²: For constants $a, b$ and random variables $A_{n}, B_{n}, X_{n}, X$, if $A_{n} \overset{P}{\to} a$, $B_{n} \overset{P}{\to} b$, and $X_{n} \overset{D}{\to} X$, then
$$
A_{n} + B_{n} X_{n} \overset{D}{\to} a + b X
$$

Although the fact itself is freely used in basic courses such as undergraduate mathematical statistics, it is hard to find a proof that is accessible without background knowledge, so a proof involving measure theory is given here. If you are an undergraduate who has not yet studied real analysis, it is normal not to follow the proof, and there is no need to be discouraged. It is enough to note that more advanced mathematics is required and, for the time being, to use the result as a fact.
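Slutsky's theorem itself is easy to check numerically. The sketch below is illustrative only (the Exponential$(1)$ choice and the sample sizes are arbitrary): taking $A_{n}$ to be the sample mean, $B_{n}$ the sample standard deviation, and $X_{n}$ the standardized mean, we have $a = b = 1$ and expect $A_{n} + B_{n} X_{n} \overset{D}{\to} 1 + 1 \cdot N(0,1) = N(1, 1)$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 500, 20_000

# Exponential(1) draws: mean 1 and variance 1 (an arbitrary illustrative choice).
x = rng.exponential(scale=1.0, size=(reps, n))

A_n = x.mean(axis=1)             # -> a = 1 in probability (law of large numbers)
B_n = x.std(axis=1, ddof=1)      # -> b = 1 in probability
X_n = np.sqrt(n) * (A_n - 1.0)   # -> N(0, 1) in distribution (CLT)

# Slutsky: A_n + B_n * X_n -> a + b * X = 1 + N(0, 1), i.e. N(1, 1).
z = A_n + B_n * X_n
print(z.mean(), z.var())
```

The empirical mean and variance of $z$ land near $1$ and $1$, consistent with the $N(1,1)$ limit (up to finite-$n$ bias and Monte Carlo noise).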

Proof

Convergence in Distribution

It can be obtained as a corollary to the portmanteau theorem.

Convergence in Probability

Fix $\varepsilon > 0$ and, for each $\delta > 0$, define the following set $C_{g}^{\delta} \subset C_{g}$.
$$
C_{g}^{\delta} := \left\{ x \in C_{g} \mid \exists y : y \in B \left( x ; \delta \right) \land g(y) \notin B' \left( g(x) ; \varepsilon \right) \right\}
$$
This set collects the points $x$ at which $g$ is continuous but near which one can still pick some $y$ within radius $\delta$ of $x$ whose image $g(y)$ lies at least $\varepsilon$ away from $g(x)$. Naturally, as $\delta > 0$ shrinks it becomes harder and harder for such a $y$ to exist within the radius, and since $g$ is continuous at every point of $C_{g}$ we get $\displaystyle \lim_{\delta \to 0} C_{g}^{\delta} = \emptyset$. Now suppose, for the sake of argument, that $d' \left( g(X), g \left( X_{n} \right) \right) \ge \varepsilon$. Then at least one of the following three must hold:

  • (1): $d \left( X, X_{n} \right) \ge \delta$: $X$ and $X_{n}$ are simply far apart, so regardless of whether $g$ is continuous, $g(X)$ and $g \left( X_{n} \right)$ can be far apart as well.
  • (2): $X \in C_{g}^{\delta}$: although $g$ is continuous at $X$, there is a point within radius $\delta$ of $X$ (here, $X_{n}$) whose image is far from $g(X)$.
  • (3): $X \notin C_{g}$: $g$ is not continuous at $X$, so $g(X)$ and $g \left( X_{n} \right)$ can be far apart.

Expressing this in terms of probabilities,
$$
P \left( d' \left( g \left( X_{n} \right), g(X) \right) \ge \varepsilon \right) \le P \left( d \left( X_{n}, X \right) \ge \delta \right) + P \left( X \in C_{g}^{\delta} \right) + P \left( X \notin C_{g} \right)
$$
and the terms on the right-hand side behave as follows:

  • (1): since $X_{n} \overset{P}{\to} X$ by assumption, for all $\delta > 0$
$$
\lim_{n \to \infty} P \left( d \left( X_{n}, X \right) \ge \delta \right) = 0
$$
  • (2): since $\displaystyle \lim_{\delta \to 0} C_{g}^{\delta} = \emptyset$ as noted above,
$$
\lim_{\delta \to 0} P \left( X \in C_{g}^{\delta} \right) = 0
$$
  • (3): since $P \left( X \in C_{g} \right) = 1$ by assumption,
$$
P \left( X \notin C_{g} \right) = P \left( X \in C_{g}^{c} \right) = 0
$$

Letting $n \to \infty$ and then $\delta \to 0$, we conclude that for every $\varepsilon > 0$
$$
\lim_{n \to \infty} P \left( d' \left( g \left( X_{n} \right), g(X) \right) \ge \varepsilon \right) = 0
$$
that is, $g \left( X_{n} \right) \overset{P}{\to} g(X)$.
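This shrinking probability can also be observed numerically. A minimal sketch, with the purely illustrative choices $X \sim N(0,1)$, $X_{n} = X + Z_{n} / \sqrt{n}$ for independent standard normal noise $Z_{n}$, $g = \tanh$ (continuous everywhere, so $P(X \in C_{g}) = 1$), and $\varepsilon = 0.1$:

```python
import numpy as np

rng = np.random.default_rng(2)
reps, eps = 200_000, 0.1

X = rng.normal(size=reps)
g = np.tanh  # continuous on all of R, so C_g = R and P(X in C_g) = 1

fracs = []
for n in [1, 10, 100, 1000]:
    X_n = X + rng.normal(size=reps) / np.sqrt(n)  # X_n -> X in probability
    # Monte Carlo estimate of P( d'(g(X_n), g(X)) >= eps )
    frac = np.mean(np.abs(g(X_n) - g(X)) >= eps)
    fracs.append(frac)
    print(n, frac)
```

The estimated probabilities drop toward $0$ as $n$ grows, exactly as the bound above predicts.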

Almost Sure Convergence

For any point $\omega$ with $X(\omega) \in C_{g}$, i.e., at which $g$ is continuous,
$$
\lim_{n \to \infty} X_{n} (\omega) = X (\omega) \implies \lim_{n \to \infty} g \left( X_{n} (\omega) \right) = g \left( X (\omega) \right)
$$
Viewing these as events, this yields the inclusion
$$
\left[ \lim_{n \to \infty} X_{n} = X , X \in C_{g} \right] \subset \left[ \lim_{n \to \infty} g \left( X_{n} \right) = g \left( X \right) \right]
$$
Since $X_{n} \overset{\text{a.s.}}{\to} X$ and $P \left( X \in C_{g} \right) = 1$ by assumption, we have $\displaystyle P \left( \lim_{n \to \infty} X_{n} = X , X \in C_{g} \right) = 1$, and therefore
$$
\begin{align*}
P \left[ \lim_{n \to \infty} g \left( X_{n} \right) = g \left( X \right) \right] \ge & P \left[ \lim_{n \to \infty} g \left( X_{n} \right) = g \left( X \right) , X \in C_{g} \right] \\
\ge & P \left[ \lim_{n \to \infty} X_{n} = X , X \in C_{g} \right] \\
= & 1
\end{align*}
$$
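The pointwise implication above is nothing more than ordinary continuity along a single sample path. A tiny sketch, with the arbitrary illustrative choices $X_{n}(\omega) = \omega + 1/n$ and $g = \exp$:

```python
import math

# One fixed sample path: X_n(omega) = omega + 1/n converges to X(omega) = omega,
# so continuity of g alone forces g(X_n(omega)) -> g(X(omega)) along this path.
g = math.exp
omega = 0.3
gaps = [abs(g(omega + 1 / n) - g(omega)) for n in (1, 10, 100, 1000)]
print(gaps)  # the gaps shrink toward 0
```

Almost sure convergence then just says this pathwise argument applies to a set of $\omega$ of probability one.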


  1. https://en.wikipedia.org/wiki/Continuous_mapping_theorem ↩︎

  2. Hogg et al. (2013). Introduction to Mathematical Statistics (7th Edition): p306. ↩︎