Inequalities of Martingales
Theorem
Let $\left\{ (X_{n} , \mathcal{F}_{n}) \right\}$ be a supermartingale.
- [1]: For all $\lambda > 0$, $$ \begin{align*} \lambda P \left( \max_{n \le N} X_{n} \ge \lambda \right) \le & E X_{1} - \int_{(\max_{n \le N} X_{n} < \lambda)} X_{N} dP \\ \le & E X_{1} + E X_{N}^{-} \text{ a.s.} \end{align*} $$
- [2]: For all $\lambda > 0$, $$ \begin{align*} \lambda P \left( \min_{n \le N} X_{n} \le - \lambda \right) \le & - \int_{(\min_{n \le N} X_{n} \le - \lambda)} X_{N} dP \\ \le & E X_{N}^{-} \text{ a.s.} \end{align*} $$
Let $\left\{ (X_{n} , \mathcal{F}_{n}) \right\}$ be a submartingale.
- [3]: For all $\lambda > 0$, $$ \begin{align*} \lambda P \left( \max_{n \le N} X_{n} \ge \lambda \right) \le & \int_{(\max_{n \le N} X_{n} \ge \lambda)} X_{N} dP \\ \le & E \left| X_{N} \right| \text{ a.s.} \end{align*} $$
- [4]: For all $\lambda > 0$, $$ \lambda P \left( \min_{n \le N} X_{n} \le - \lambda \right) \le E X_{n}^{+} - E X_{1} \text{ a.s.} $$
Let $\left\{ (X_{n} , \mathcal{F}_{n}) \right\}$ be a martingale.
- [5]: For all $\lambda > 0$, $$ P \left( \max_{n \le N} | X_{n} | \ge \lambda \right) \le {{1} \over {\lambda^{p}}} E |X_{N}|^{p} \text{ a.s.} $$
Explanation
The inequalities introduced here can be similarly used by dividing both sides by $\lambda$ to specify bounds of probability. Notably, since a martingale is both a supermartingale and a submartingale, all the above inequalities can be utilized.
- [3]: According to the Markov inequality, we already know $\displaystyle P(\max_{n \le N} X_{n} \ge \lambda ) \le {{ 1 } \over { \lambda }} E \left| \max_{n \le N} X_{n} \right|$, but with the condition being a submartingale - adding information, we can consider the reduction to bounds by just looking at $N$ as in $\displaystyle P(\max_{n \le N} X_{n} \ge \lambda ) \le {{ 1 } \over { \lambda }} E \left| X_{N} \right|$.
- [4]: Conditions of submartingale-supermartingale contrast with [1] and can be considered as a pair.
- [5]: This formula, similar to the generalized Markov inequality $\displaystyle P(|X|^{p} \ge C^{p}) \le {{ 1 } \over { C^{p} }} E | X |^{p}$, allows for reducing probability bounds as $p$ increases given $\lambda>1$, but one must remember that $X_{N}$ moments of $p$ must exist beforehand.
Proof
Strategy [1], [2]: Considering the bounded $N$, we start from the following theorem.
Optional Sampling Theorem: If $\left\{ ( X_{n} , \mathcal{F}_{n} ) \right\}$ is a supermartingale and $\tau$ and $\sigma$ are bounded stopping times such that $\sigma \le \tau$ and for $\mathcal{F}_{n}$, then $$ E \left( X_{\tau} | \mathcal{F}_{\sigma} \right) \le X_{\sigma} \text{ a.s.} $$
In the proof, the defined $T$ is the moment when $X_{n}$ exceeds $\lambda$ for the first time, considering abandoning the count if it exceeds $N$ and simply taking $N$. Understanding this is the crux, and this forms the basis for using a trick to split the integration domain.
[1]
Part 1. Definition of stopping time $T$
Define the random variable $ T:= \begin{cases} \inf \left\{ n \le N: X_{n} \ge \lambda \right\} &, n < N \\ N &, \text{otherwise} \end{cases}$. Now for all $k \in \mathbb{N}$, check if $(T = k) \in \mathcal{F}_{k}$ to demonstrate that $T$ is a stopping time. For $k = n < N$, $$ (T = n) = \left( X_{1} < \lambda , \cdots , X_{n-1} < \lambda, X_{n} \ge \lambda \right) \in \mathcal{F}_{n} $$ and for $k = N$, it is sufficient to check up to the $k = N-1$th to see if it is $X_{k} < \lambda$, so $$ (T = N) = \left( X_{1} < \lambda , \cdots , X_{n-1} < \lambda \right) \in \mathcal{F}_{N-1} \subset \mathcal{F}_{N} $$
Part 2.
According to the optional sampling theorem, almost surely, $E \left( X_{T} | \mathcal{F}_{1} \right) \le X_{1}$ hence $$ \begin{align*} \int_{\Omega} X_{1} dP \ge& \int_{\Omega} E \left( X_{T} | \mathcal{F}_{1} \right) dP \\ =& \int_{\Omega} X_{T} dP \\ =& \int_{(\max_{n \le N} X_{n} \ge \lambda)} X_{T} dP + \int_{(\max_{n \le N} X_{n} < \lambda)} X_{T} dP\text{ a.s.} \end{align*} $$ The last line of the above development divides the integration domain into $(\max_{n \le N} X_{n} \ge \lambda)$ and $(\max_{n \le N} X_{n} < \lambda)$. Then, by the definition of $T$, on $(\max_{n \le N} X_{n} \ge \lambda)$ it is $X_{T} \ge \lambda$ and on $(\max_{n \le N} X_{n} < \lambda)$ it is $X_{T} = X_{N}$, so $$ \begin{align*} \int_{\Omega} X_{1} dP \ge& \int_{(\max_{n \le N} X_{n} \ge \lambda)} X_{T} dP + \int_{(\max_{n \le N} X_{n} < \lambda)} X_{T} dP \\ =& \int_{(\max_{n \le N} X_{n} \ge \lambda)} \lambda dP + \int_{(\max_{n \le N} X_{n} < \lambda)} X_{N} dP \\ =& \lambda P \left( \max_{n \le N} X_{n} \ge \lambda \right) + \int_{(\max_{n \le N} X_{n} < \lambda)} X_{N} dP \text{ a.s.} \end{align*} $$
Part 3.
Simplifying the terms yields $$ \begin{align*} \lambda P \left( \max_{n \le N} X_{n} \ge \lambda \right) \le & & \int_{\Omega} X_{1} dP - \int_{(\max_{n \le N} X_{n} < \lambda)} X_{N} dP \\ =& E X_{1} - \int_{(\max_{n \le N} X_{n} < \lambda)} X_{N}^{+} dP + \int_{(\max_{n \le N} X_{n} < \lambda)} X_{N}^{-} dP \text{ a.s.} \end{align*} $$ Since $X_{N}^{+} \ge 0$, eliminate $\displaystyle - \int_{(\max_{n \le N} X_{n} < \lambda)} X_{N}^{+} dP$ and $$ \int_{(\max_{n \le N} X_{n} < \lambda)} X_{N}^{-} dP \le \int_{\Omega} X_{N}^{-} dP = E X_{N}^{-} $$ thus $$ \lambda P \left( \max_{n \le N} X_{n} \ge \lambda \right) \le E X_{1} + E X_{N}^{-} \text{ a.s.} $$
■
[2]
Part 1. Definition of stopping time $T$
Define the random variable $ T:= \begin{cases} \inf \left\{ n \le N: X_{n} \le - \lambda \right\} &, n < N \\ N &, \text{otherwise} \end{cases}$. Now for all $k \in \mathbb{N}$, check if $(T = k) \in \mathcal{F}_{k}$ to demonstrate that $T$ is a stopping time. For $k = n < N$, $$ (T = n) = \left( X_{1} > - \lambda , \cdots , X_{n-1} > - \lambda, X_{n} \le - \lambda \right) \in \mathcal{F}_{n} $$ and for $k = N$, it is sufficient to check up to the $k = N-1$th to see if it is $X_{k} > - \lambda$, so $$ (T = N) = \left( X_{1} > - \lambda , \cdots , X_{n-1} > - \lambda \right) \in \mathcal{F}_{N-1} \subset \mathcal{F}_{N} $$ and $T \le N$ is true.
Part 2.
Property of Conditional Expectation: For all sigma fields $\mathcal{G}$, $E \left[ E ( X | \mathcal{G} ) \right] = E(X)$
According to the optional sampling theorem, almost surely $E \left( X_{N} | \mathcal{F}_{T} \right) \le X_{n}$, and taking the expectation $E$ on both sides similar to Part 2. of proof [1], $$ \begin{align*} E X_{N} \le & E X_{T} \\ =& \int_{\Omega} X_{T} dP \\ =& \int_{(\min_{n \le N} X_{n} \le - \lambda )} X_{T} dP + \int_{(\min_{n \le N} X_{n} > - \lambda )} X_{T} dP \\ \le & - \lambda P \left( \min_{n \le N} X_{n} \le - \lambda \right)+ \int_{(\min_{n \le N} X_{n} > - \lambda )} X_{N} dP \end{align*} $$ But since $\displaystyle E X_{N} = \int_{(\min_{n \le N} X_{n} \le - \lambda )} X_{N} dP + \int_{(\min_{n \le N} X_{n} > - \lambda )} X_{N} dP$, $$ \begin{align*} & \int_{(\min_{n \le N} X_{n} \le - \lambda )} X_{N} dP + \int_{(\min_{n \le N} X_{n} > - \lambda )} X_{N} dP \\ \le & - \lambda P \left( \min_{n \le N} X_{n} \le - \lambda \right)+ \int_{(\min_{n \le N} X_{n} > - \lambda )} X_{N} dP \end{align*} $$ eliminating $\displaystyle \int_{(\min_{n \le N} X_{n} > - \lambda )} X_{N} dP$ from both sides yields $$ \int_{(\min_{n \le N} X_{n} \le - \lambda )} X_{N} dP \le - \lambda P \left( \min_{n \le N} X_{n} \le - \lambda \right) $$
Part 3.
Reversing the sign gives $$ \begin{align*} \lambda P \left( \min_{n \le N} X_{n} \le - \lambda \right) \le & - \int_{(\min_{n \le N} X_{n} \le - \lambda )} X_{N} dP \\ =& - \int_{(\min_{n \le N} X_{n} \le - \lambda )} X_{N}^{+} dP + \int_{(\min_{n \le N} X_{n} \le - \lambda )} X_{N}^{-} dP \end{align*} $$ Similarly to Part 3. of proof [1], since $X_{N}^{+} \ge 0$, eliminate $\displaystyle - \int_{(\min_{n \le N} X_{n} \le - \lambda )} X_{N}^{+} dP$ and $$ \int_{(\min_{n \le N} X_{n} \le - \lambda )} X_{N}^{-} dP \le \int_{\Omega} X_{N}^{-} dP = E X_{N}^{-} $$ thus $$ \lambda P \left( \min_{n \le N} X_{n} \le - \lambda \right) \le E X_{N}^{-} \text{ a.s.} $$
■
[3]
$$ \begin{align*} \lambda P \left( \max_{n \le N} X_{n} \ge \lambda \right) =& \lambda P \left( - \max_{n \le N} X_{n} \le - \lambda \right) \\ =& \lambda P \left( \min_{n \le N} (-X_{n}) \le - \lambda \right) \end{align*} $$ Since $\left\{ \left( -X_{n} , \mathcal{F}_{n} \right) \right\}$ is a supermartingale, according to [2] $$ \begin{align*} \lambda P \left( \min_{n \le N} (-X_{n}) \le - \lambda \right) \le & - \int_{(\min_{n \le N} X_{n} \le - \lambda)} (-X_{N}) dP \\ \le & \int_{(\max_{n \le N} X_{n} \ge \lambda)} X_{N} dP \\ \le & \int_{(X_{N} \ge 0)} X_{N} dP \\ \le & E X_{N}^{+} \\ \le & E \left| X_{N} \right| \text{ a.s.} \end{align*} $$
■
[4]
Since $\left\{ \left( -X_{n} , \mathcal{F}_{n} \right) \right\}$ is a supermartingale, according to [1] $$ \begin{align*} \lambda P \left( \min_{n \le N} X_{n} \le - \lambda \right) =& \lambda P \left( \max_{n \le N} \left( -X_{n} \right) \ge \lambda \right) \\ \le & E \left( -X_{1} \right) + E \left( -X_{N} \right)^{-} \\ =& E \left( -X_{N} \right)^{-} - E X_{1} \\ =& E \left( -X_{N}^{+} + X_{N}^{-} \right)^{-} - E X_{1} \\ =& E \left( X_{N} \right)^{+} - E X_{1} \text{ a.s.} \end{align*} $$
■
[5]
According to the conditional Jensen’s inequality corollary, since $\left\{ \left( |X_{n}|^{p} , \mathcal{F}_{n} \right) \right\}$ is a submartingale, according to [3] $$ \begin{align*} & \lambda^{p} P \left( \max_{n \le N} | X_{n} |^{p} \ge \lambda^{p} \right) \le E \left| X_{N} \right|^{p} \text{ a.s.} \\ \iff & P \left( \max_{n \le N} | X_{n} |^{p} \ge \lambda^{p} \right) \le {{ 1 } \over { \lambda^{p} }} E \left| X_{N} \right|^{p} \text{ a.s.} \\ \iff & P \left( \max_{n \le N} | X_{n} | \ge \lambda \right) \le {{ 1 } \over { \lambda^{p} }} E \left| X_{N} \right|^{p} \text{ a.s.} \end{align*} $$
■