Let $X$ be a random variable and $\{X_n\}_{n \in \mathbb{N}}$ a sequence of random variables defined on a probability space $(\Omega, \mathcal{F}, P)$, taking values in a metric space $S$. If the following holds as $n \to \infty$ for all $f \in C_b(S)$, then $\{X_n\}$ is said to converge in distribution to $X$, denoted $X_n \overset{D}{\to} X$.
$$\int_\Omega f(X_n)\, dP \to \int_\Omega f(X)\, dP$$
Here $C_b(S)$ denotes the set of bounded continuous functions defined on $S$.
$$C_b(S) := \{ f : S \to \mathbb{R} \mid f \text{ is bounded and continuous} \}$$
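As a quick illustration of the definition, here is a minimal numerical sketch of my own (not from the original text). It assumes $S = \mathbb{R}$, takes $X_n$ to be the standardized mean of $n$ i.i.d. $\mathrm{Uniform}(0,1)$ variables, so that $X_n \overset{D}{\to} X \sim N(0,1)$ by the central limit theorem, and uses the bounded continuous test function $f(x) = 1/(1+x^2)$. The Monte Carlo estimates of $\int_\Omega f(X_n)\, dP$ approach $\int_\Omega f(X)\, dP$.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # A bounded continuous test function: f ∈ C_b(R).
    return 1 / (1 + x**2)

def sample_X_n(n, size):
    # Standardized mean of n i.i.d. Uniform(0,1) variables;
    # by the CLT, X_n converges in distribution to X ~ N(0,1).
    u = rng.uniform(size=(size, n))
    return (u.mean(axis=1) - 0.5) * np.sqrt(12 * n)

# Monte Carlo estimate of ∫_Ω f(X) dP = E[f(X)] for the limit X ~ N(0,1).
X = rng.standard_normal(1_000_000)
print(f"E[f(X)]   ≈ {f(X).mean():.4f}")

# E[f(X_n)] approaches E[f(X)] as n grows.
for n in [1, 2, 10, 100]:
    print(f"E[f(X_{n})] ≈ {f(sample_X_n(n, 1_000_000)).mean():.4f}")
```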
Theorem
[1]: Let the sequence $\{P_n\}_{n \in \mathbb{N}}$ of probability measures on $(S, \mathcal{S})$ be defined by
$$P_n(B) := P\left( X_n^{-1}(B) \right)$$
for all Borel sets $B \in \mathcal{B}(S)$, and likewise let $P_X(B) := P\left( X^{-1}(B) \right)$. Then the following holds.
$$X_n \overset{D}{\to} X \iff P_n \overset{W}{\to} P_X$$
[2]: $X_n \overset{D}{\to} X$ is equivalent to the statement that every subsequence $\{X_{n'}\} \subset \{X_n\}$ has a further subsequence $\{X_{n''}\} \subset \{X_{n'}\}$ satisfying $X_{n''} \overset{D}{\to} X$. In formula, it is expressed as follows.
$$X_n \overset{D}{\to} X \iff \forall \{X_{n'}\} \subset \{X_n\}, \ \exists \{X_{n''}\} \subset \{X_{n'}\} : X_{n''} \overset{D}{\to} X$$
[3] Continuous Mapping Theorem: For a measurable function $h : (S, \mathcal{S}) \to (S', \mathcal{S}')$, define $C_h := \{ x \in S : h \text{ is continuous at } x \}$, the set of points at which $h$ is continuous. If $X_n \overset{D}{\to} X$ and $P(X \in C_h) = 1$, then $h(X_n) \overset{D}{\to} h(X)$. In formula, it is expressed as follows.
$$X_n \overset{D}{\to} X \ \land \ P(X \in C_h) = 1 \implies h(X_n) \overset{D}{\to} h(X)$$
Description
[1]: A $P_n$ defined as in the theorem is called an Induced Probability Measure. It is important to distinguish $X_n \overset{D}{\to} X$, the convergence of random 'variables', from $P_n \overset{W}{\to} P_X$, the convergence of probability 'measures'.
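To see the two sides of this distinction concretely, here is a small sketch of my own (the same hypothetical CLT setup as above): the induced measures $P_n(B) = P(X_n^{-1}(B))$ of the Borel set $B = (-\infty, 1]$, estimated empirically, approach $P_X(B) = \Phi(1)$.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(1)

def sample_X_n(n, size):
    # Standardized mean of n i.i.d. Uniform(0,1): X_n →_D X ~ N(0,1).
    u = rng.uniform(size=(size, n))
    return (u.mean(axis=1) - 0.5) * np.sqrt(12 * n)

# B = (-∞, 1] is a Borel set of S = R; its boundary {1} is P_X-null.
# P_X(B) = P(X^{-1}(B)) = Φ(1) for the limit X ~ N(0,1).
P_X_B = 0.5 * (1 + erf(1 / sqrt(2)))
print(f"P_X(B) = Φ(1) ≈ {P_X_B:.4f}")

for n in [1, 5, 50, 500]:
    # P_n(B) = P(X_n^{-1}(B)): a measure on S, estimated from samples.
    P_n_B = (sample_X_n(n, 500_000) <= 1.0).mean()
    print(f"n={n:>3}: P_n(B) ≈ {P_n_B:.4f}")
```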
[2]: At first glance this theorem may seem contrived, but it becomes an important property when considered alongside the concept of relative compactness. An example of a sequence that fails the criterion is sketched below.
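For intuition about why [2] has teeth, consider this sketch (my own example, not from the text) of a sequence that fails the criterion: $X_n \sim N(0,1)$ for even $n$ and $X_n \sim N(3,1)$ for odd $n$. The estimates of $E[f(X_n)]$ oscillate, so the odd-indexed subsequence has no further subsequence converging in distribution to the even-indexed limit, and [2] correctly rules out $X_n \overset{D}{\to} X$.

```python
import numpy as np

rng = np.random.default_rng(2)

def f(x):
    # Bounded continuous test function.
    return 1 / (1 + x**2)

def sample_X_n(n, size):
    # Alternating sequence: N(0,1) for even n, N(3,1) for odd n.
    # No single X can be the distributional limit of both subsequences.
    mean = 0.0 if n % 2 == 0 else 3.0
    return rng.normal(mean, 1.0, size)

for n in range(1, 9):
    print(f"n={n}: E[f(X_n)] ≈ {f(sample_X_n(n, 500_000)).mean():.4f}")
```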
[3]: The Continuous Mapping Theorem in fact generalizes to almost sure convergence and convergence in probability as well. Since $h$ is a function and random variables are themselves functions, it is natural to consider the composite $h \circ X$. It is worth pausing over whether the following formula makes sense for $A \in \mathcal{S}'$, and working through why it does.
$$P\left( (h(X))^{-1}(A) \right) = P\left( X \in h^{-1}(A) \right)$$
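A quick empirical sanity check of this identity (my own sketch, taking $h(x) = x^2$ and $A = [0,1]$, so that $h^{-1}(A) = [-1,1]$): the two events coincide pointwise as subsets of $\Omega$, so the two sample proportions agree exactly, not merely in the limit.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal(1_000_000)  # X ~ N(0,1) on a simulated Ω

h = lambda x: x**2
# For A = [0, 1] ⊂ S', the preimage is h^{-1}(A) = [-1, 1] ⊂ S.
event_lhs = (0 <= h(X)) & (h(X) <= 1)  # ω ∈ (h(X))^{-1}(A)
event_rhs = (-1 <= X) & (X <= 1)       # ω ∈ X^{-1}(h^{-1}(A))

# The events are literally the same subset of Ω, so probabilities match.
print(f"P(h(X) in A)    ≈ {event_lhs.mean():.4f}")
print(f"P(X in h^-1(A)) ≈ {event_rhs.mean():.4f}")
print(f"events identical: {np.array_equal(event_lhs, event_rhs)}")
```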
The notation $\overset{D}{=}$ is sometimes used to indicate that two random variables have the same distribution. For all $A \in \mathcal{S}'$ and continuous functions $h : S \to S'$, it is defined as follows.
$$h(X) \overset{D}{=} h(Y) \overset{\text{def}}{\iff} P\left( (h(X))^{-1}(A) \right) = P\left( (h(Y))^{-1}(A) \right)$$
Rewriting the expression of convergence in distribution in these terms (for sets $A$ whose boundary is a null set, by condition (5) of the Portmanteau theorem), it proceeds as follows.
$$h(X_n) \overset{D}{\to} h(X) \iff P\left( (h(X_n))^{-1}(A) \right) \to P\left( (h(X))^{-1}(A) \right)$$
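Putting the pieces together, here is a sketch of the continuous mapping theorem in action (my own example, not from the text: the CLT setup $X_n \overset{D}{\to} X \sim N(0,1)$ and $h(x) = x^2$, which is continuous everywhere so $P(X \in C_h) = 1$; then $h(X) \sim \chi^2(1)$, and scipy is used only for the reference CDF).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

def sample_X_n(n, size):
    # Standardized mean of n i.i.d. Uniform(0,1): X_n →_D X ~ N(0,1).
    u = rng.uniform(size=(size, n))
    return (u.mean(axis=1) - 0.5) * np.sqrt(12 * n)

h = lambda x: x**2  # continuous on all of S = R, so C_h = R

# By the continuous mapping theorem, h(X_n) →_D h(X) ~ chi-squared(1).
t = 1.5
print(f"P(h(X) <= {t}) = {stats.chi2(df=1).cdf(t):.4f}")
for n in [1, 5, 50, 500]:
    p = (h(sample_X_n(n, 500_000)) <= t).mean()
    print(f"n={n:>3}: P(h(X_n) <= {t}) ≈ {p:.4f}")
```

The continuity-set hypothesis matters here: with the deterministic sequence $X_n = 1/n \overset{D}{\to} X = 0$ and the discontinuous $h = \mathbb{1}_{(0,\infty)}$, we get $h(X_n) = 1$ for all $n$ while $h(X) = 0$, precisely because $P(X \in C_h) = 0$.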
Proof
[1]
For all $f \in C_b(S)$,
$$\begin{aligned} P_n \overset{W}{\to} P_X &\iff \int_S f\, dP_n \to \int_S f\, dP_X \\ &\iff \int_\Omega f(X_n)\, dP \to \int_\Omega f(X)\, dP \\ &\iff X_n \overset{D}{\to} X \end{aligned}$$
The first equivalence is the definition of weak convergence, and the second follows from the change-of-variables formula for induced measures, $\int_S f\, dP_n = \int_\Omega f(X_n)\, dP$.
■
[2]
$(\Leftarrow)$ Suppose, for contradiction, that $X_n \overset{D}{\to} X$ does not hold, that is, there exists $f \in C_b(S)$ for which $\int_\Omega f(X_n)\, dP \to \int_\Omega f(X)\, dP$ fails. Then there exist $\varepsilon > 0$ and a subsequence index set $\{n'\}$ satisfying
$$\left| \int_\Omega f(X_{n'})\, dP - \int_\Omega f(X)\, dP \right| > \varepsilon$$
But this is a contradiction, because by assumption there always exists a further subsequence of indices $\{n''\} \subset \{n'\}$ satisfying
$$\int_\Omega f(X_{n''})\, dP \to \int_\Omega f(X)\, dP$$
$(\Rightarrow)$ This direction is trivially true by taking $\{n''\} = \{n'\}$, since every subsequence of a convergent sequence converges to the same limit.
■
[3]
Let's denote the probability measure induced by $X$ as $P_X(A) := P\left( X^{-1}(A) \right) = P(X \in A)$.
$$\overline{h^{-1}(B)} \subset h^{-1}(B) \cup C_h^c$$
For every closed set $B$ in $S'$, the above inclusion holds: for an arbitrary $x \in \overline{h^{-1}(B)}$, if $h$ is continuous at $x$, then continuity carries limits of points of $h^{-1}(B)$ into the closed set $B$, so $x \in h^{-1}(B)$; otherwise, $x \in C_h^c$. Since the closure $\overline{h^{-1}(B)}$ is a closed set in $S$,
$$\begin{aligned} \limsup_{n\to\infty} P\left( h(X_n) \in B \right) &= \limsup_{n\to\infty} P\left( X_n \in h^{-1}(B) \right) \\ &= \limsup_{n\to\infty} P\left( (h(X_n))^{-1}(B) \right) \\ &= \limsup_{n\to\infty} P\left( [X_n^{-1} \circ h^{-1}](B) \right) \\ &= \limsup_{n\to\infty} P\left( X_n^{-1}(h^{-1}(B)) \right) \\ &= \limsup_{n\to\infty} P_n\left( h^{-1}(B) \right) \\ &\le \limsup_{n\to\infty} P_n\left( \overline{h^{-1}(B)} \right) \end{aligned}$$
Here (1), (3), and (5) refer to the conditions of the Portmanteau theorem: (1) $P_n \overset{W}{\to} P$; (3) $\limsup_{n\to\infty} P_n(F) \le P(F)$ for every closed set $F$; (5) $\lim_{n\to\infty} P_n(A) = P(A)$ for every $A$ with $P(\partial A) = 0$.
Following [1], if $X_n \overset{D}{\to} X$, then $P_n \overset{W}{\to} P_X$, and by the assumption $P_X(C_h^c) = P(X \in C_h^c) = 0$ and by $(1) \implies (3)$ of the Portmanteau theorem,
$$\begin{aligned} \limsup_{n\to\infty} P\left( (h(X_n))^{-1}(B) \right) &\le \limsup_{n\to\infty} P_n\left( \overline{h^{-1}(B)} \right) \\ &\le P_X\left( \overline{h^{-1}(B)} \right) \\ &\le P_X\left( h^{-1}(B) \cup C_h^c \right) \\ &\le P_X\left( h^{-1}(B) \right) + P_X\left( C_h^c \right) \\ &= P_X\left( h^{-1}(B) \right) \\ &= P\left( X^{-1}(h^{-1}(B)) \right) \\ &= P\left( (h(X))^{-1}(B) \right) \end{aligned}$$
Showing $P\left( (h(X))^{-1}(B) \right) \le \liminf_{n\to\infty} P\left( (h(X_n))^{-1}(B) \right)$ by the same method applied to open sets, we obtain
$$\lim_{n\to\infty} P\left( (h(X_n))^{-1}(B) \right) = P\left( (h(X))^{-1}(B) \right)$$
which is exactly $h(X_n) \overset{D}{\to} h(X)$.
■