Convergence of Distributions in Mathematical Statistics
Definition 1
Given a random variable $X$ and a sequence of random variables $\left\{ X_{n} \right\}$, if the following condition holds as $n \to \infty$, we say that $X_{n}$ converges in distribution to $X$ and write $X_{n} \overset{D}{\to} X$. $$ \lim_{n \to \infty} F_{X_{n}} (x) = F_{X} (x) \qquad , \forall x \in C_{F_{X}} $$
- $F_{X}$ is the cumulative distribution function of the random variable $X$.
- $C_{F_{X}}$ represents the set of points where the function $F_{X}$ is continuous.
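To see why the definition only asks for convergence on $C_{F_{X}}$, the following is a minimal sketch in Python. The example is an illustrative assumption, not part of the original text: $X_{n}$ is the constant $1/n$ and $X$ is the constant $0$, and the function names `cdf_point_mass`, `F_n`, and `F` are hypothetical helpers.

```python
def cdf_point_mass(c, x):
    """CDF of a random variable that equals the constant c with probability 1."""
    return 1.0 if x >= c else 0.0


def F_n(n, x):
    """CDF of X_n, the constant 1/n (illustrative choice)."""
    return cdf_point_mass(1.0 / n, x)


def F(x):
    """CDF of the candidate limit X, the constant 0."""
    return cdf_point_mass(0.0, x)


# F is continuous everywhere except at x = 0.
for x in (-0.5, 0.0, 0.5):
    values = [F_n(n, x) for n in (1, 10, 100, 1000)]
    print(f"x = {x:+.1f}  F_n(x) for n = 1, 10, 100, 1000: {values}  F(x) = {F(x)}")
```

At $x = 0$, the only discontinuity of $F_{X}$, the values $F_{X_{n}}(0)$ stay at $0$ while $F_{X}(0) = 1$. Since $0 \notin C_{F_{X}}$, this mismatch is ignored, and $X_{n} \overset{D}{\to} X$ still holds under the definition.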
Explanation
Convergence in distribution is, like convergence in probability, a notion of convergence for random variables, here defined through their distributions. Requiring convergence at each $x \in C_{F_{X}}$ is analogous to pointwise convergence of functions in analysis, and the analogy extends further: just as uniform convergence implies pointwise convergence, convergence in probability implies convergence in distribution.
It is important to note that although we speak of convergence in distribution, the notation $X_{n} \overset{D}{\to} X$ is written in terms of random variables. What converges, pointwise on the continuity set of the distribution function, is not the random variable itself but one of its properties, namely its distribution. This is a much weaker requirement than convergence of the random variable itself. Even if the distributions become indistinguishable, that does not mean the random variables themselves converge.
Indeed, $X_{n} \overset{D}{\to} X$ and $Y_{n} \overset{D}{\to} Y$ do not guarantee that $X_{n} + Y_{n}$ converges in distribution to $X + Y$. Unlike convergence in probability, convergence in distribution requires only the weak condition of pointwise convergence of the cumulative distribution functions, so it lacks even such basic properties.
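The failure for sums can be illustrated with a small simulation. This is only a sketch under assumptions not stated in the original: take $X \sim N(0,1)$, $X_{n} = X$, $Y_{n} = -X$, and $Y = X$. By symmetry $-X$ has the same distribution as $X$, so $Y_{n} \overset{D}{\to} Y$, yet $X_{n} + Y_{n} = 0$ for every $n$ while $X + Y = 2X \sim N(0,4)$. The variable names are hypothetical.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is reproducible

# Draws of X ~ N(0, 1); the sample size is an arbitrary illustrative choice.
x = [random.gauss(0.0, 1.0) for _ in range(10_000)]

x_n = x                  # X_n = X, so X_n -> X in distribution trivially
y_n = [-v for v in x]    # Y_n = -X, which has the same N(0, 1) distribution as Y = X

sum_n = [a + b for a, b in zip(x_n, y_n)]  # X_n + Y_n, identically zero
sum_limit = [2.0 * v for v in x]           # X + Y with Y = X, i.e. 2X ~ N(0, 4)

print("variance of X_n + Y_n:", statistics.pvariance(sum_n))      # exactly zero
print("variance of X + Y:   ", statistics.pvariance(sum_limit))   # close to 4
```

The two sums have visibly different distributions, so $X_{n} + Y_{n}$ cannot converge in distribution to $X + Y$ in this construction.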
Theorems
Suppose $X_{n} \overset{D}{\to} X$.
- [1] Continuous Mapping Theorem: For a continuous function $g$, $$ g\left( X_{n} \right) \overset{D}{\to} g (X) $$
- [2]: If $X_{n}$ converges in probability, it converges in distribution. That is, $$ X_{n} \overset{P}{\to} X \implies X_{n} \overset{D}{\to} X $$
- [3]: If $X_{n}$ converges in distribution, it is bounded in probability.
- [4] Slutsky’s Theorem: For constants $a, b$ and random variables $A_{n}, B_{n}, X_{n}, X$, if $A_{n} \overset{P}{\to} a$, $B_{n} \overset{P}{\to} b$, and $X_{n} \overset{D}{\to} X$, then $$ A_{n} + B_{n} X_{n} \overset{D}{\to} a + b X $$
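Slutsky's theorem [4] can be checked in closed form in one simple case. The setup below is an assumption chosen for illustration, not taken from the original: $X_{n} = X \sim N(0,1)$, $B_{n} = b$, and $A_{n} = a + Z/\sqrt{n}$ with $Z \sim N(0,1)$ independent of $X$, so that $A_{n} \overset{P}{\to} a$ and $A_{n} + B_{n} X_{n} \sim N(a, b^{2} + 1/n)$ exactly. The helper names `normal_cdf`, `cdf_n`, `cdf_limit`, and `gap` are hypothetical.

```python
import math


def normal_cdf(x, mean=0.0, sd=1.0):
    """CDF of N(mean, sd^2), written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


a, b = 1.0, 2.0  # illustrative constants


def cdf_n(n, x):
    """Exact CDF of A_n + B_n X_n = a + Z/sqrt(n) + b X, which is N(a, b^2 + 1/n)."""
    return normal_cdf(x, mean=a, sd=math.sqrt(b * b + 1.0 / n))


def cdf_limit(x):
    """CDF of the claimed limit a + b X, which is N(a, b^2)."""
    return normal_cdf(x, mean=a, sd=abs(b))


def gap(n, x=2.0):
    """Absolute CDF difference at the point x between step n and the limit."""
    return abs(cdf_n(n, x) - cdf_limit(x))


for n in (10, 100, 1000, 10000):
    print(n, gap(n))
```

In this special case the gap shrinks as $n$ grows, which is what the theorem predicts. It is a sanity check of one instance, not a proof.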
Limit Distributions
Meanwhile, if $X_{n} \overset{D}{\to} X$, then the distribution of $X$ is also called the asymptotic or limiting distribution of $\left\{ X_{n} \right\}$. For convenience, the name of the distribution is sometimes written in place of $X$. For instance, if $X \sim N(0,1)$, we may write: $$ X_{n} \overset{D}{\to} N(0,1) $$
Examples
[a] Deriving the Poisson distribution as the limit distribution of the binomial distribution: Let $X_{n} \sim B(n,p)$, where $p = p_{n}$ may depend on $n$. If $np \to \mu$ as $n \to \infty$, so that $\mu \approx np$ for large $n$, then $$ X_{n} \overset{D}{\to} \text{Poi} (\mu) $$
[b] Deriving the standard normal distribution as the limit distribution of the binomial distribution: If $X_i \sim B(1,p)$ and $Y_n = X_1 + X_2 + \cdots + X_n$, then $Y_n \sim B(n,p)$ and $$ { { Y_n - np } \over {\sqrt{ np(1-p) } } }\overset{D}{\to} N(0,1) $$
[c] Deriving the standard normal distribution as the limit distribution of the Poisson distribution: If $X_{n} \sim \text{Poi} \left( n \right)$ and $\displaystyle Y_{n} := {{ X_{n} - n } \over { \sqrt{n} }}$, then $$ Y_{n} \overset{D}{\to} N(0,1) $$
[d] Deriving the standard normal distribution as the limit distribution of the Student t-distribution: If $T_n \sim t(n)$, then $$ T_n \ \overset{D}{\to} N(0,1) $$
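Examples [a] and [b] can be checked numerically using only the Python standard library. This is a rough sketch, not a proof: the choices $\mu = 2$, $p = 0.5$, the evaluation point $x = 0$, and the helper names `binom_pmf`, `poisson_pmf`, `poisson_gap`, and `normal_gap` are all assumptions made for illustration.

```python
import math


def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p) with 0 < p < 1, computed in log space for stability."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1.0 - p))
    return math.exp(log_pmf)


def poisson_pmf(k, mu):
    """P(X = k) for X ~ Poi(mu)."""
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))


def normal_cdf(x):
    """CDF of N(0, 1)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def poisson_gap(n, mu=2.0):
    """[a] Largest pmf difference between B(n, mu/n) and Poi(mu) over k = 0..n."""
    p = mu / n  # requires n > mu so that 0 < p < 1
    return max(abs(binom_pmf(k, n, p) - poisson_pmf(k, mu)) for k in range(n + 1))


def normal_gap(n, p=0.5, x=0.0):
    """[b] |P((Y_n - np) / sqrt(np(1-p)) <= x) - Phi(x)| for Y_n ~ B(n, p)."""
    cutoff = math.floor(n * p + x * math.sqrt(n * p * (1.0 - p)))
    cutoff = min(cutoff, n)  # Y_n never exceeds n
    cdf = sum(binom_pmf(k, n, p) for k in range(0, cutoff + 1))
    return abs(cdf - normal_cdf(x))


for n in (10, 100, 1000):
    print(n, poisson_gap(n), normal_gap(n))
```

Both gaps shrink as $n$ grows, although the binomial stays discrete for every $n$. The agreement is in the distribution functions, not in the random variables themselves.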
Why Limit Distributions are Needed
From these asymptotic distributions, it is clear that convergence in distribution falls short of convergence of the random variable itself. For example, even if $n$ is large enough that a binomial distribution is well approximated by a normal distribution, the variable itself does not become normal. No matter how large $n$ becomes, a binomial distribution remains binomial and a normal distribution remains normal. However, since the distributions resemble each other, they may appear indistinguishable at a glance.
Nonetheless, we consider convergence in distribution because being indistinguishable to that extent is often sufficient, and sometimes the conditions cannot be relaxed any further. As mentioned above, a discrete probability distribution can never become a continuous one, however much it changes. Still, if the notion of weak convergence (another name for convergence in distribution) lets us treat a discrete distribution as a continuous one in practice, there is no reason not to use it.
Proofs
[1][4]
■
[2]
■
[3]
■
[a]
■
[b]
■
[c]
■
[d]
■