Summary of Measure Theory and Probability Theory
Overview
This is a summary of definitions and concepts for those who have already studied measure theory and probability. It is intended to be viewed when definitions are confusing or unrecognizable, and when a general review is needed.
Measure Theory
Algebras
An algebra of sets on a nonempty set $X$ is a nonempty collection $\mathcal{A}$ of subsets of $X$ that is closed under finite unions and complements.
A $\sigma$-algebra is an algebra that is closed under countable unions.
Note:
- $\mathcal{A}$ is also closed under finite intersections, because $E_{1} \cap E_{2} = \left( E_{1}^{c} \cup E_{2}^{c} \right)^{c} \in \mathcal{A}$ for $E_{1}, E_{2} \in \mathcal{A}$.
- $\varnothing, X \in \mathcal{A}$, since if $E \in \mathcal{A}$ we have $\varnothing = E \cap E^{c} \in \mathcal{A}$ and $X = E \cup E^{c} \in \mathcal{A}$.
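On a finite set, the two notes above can be checked by brute force. The following is a minimal sketch, not part of the original notes: the helper `generate_algebra` is a hypothetical name, and it simply closes a collection of subsets under complements and pairwise unions until nothing new appears. On a finite set every algebra is automatically a $\sigma$-algebra, so the example says nothing about the countable case.

```python
from itertools import combinations


def generate_algebra(X, generators):
    """Return the smallest algebra on the finite set X containing `generators`."""
    X = frozenset(X)
    algebra = {frozenset(g) for g in generators} | {X}
    changed = True
    while changed:
        changed = False
        current = list(algebra)
        # Close under complements.
        for E in current:
            if X - E not in algebra:
                algebra.add(X - E)
                changed = True
        # Close under pairwise (hence finite) unions.
        for E1, E2 in combinations(current, 2):
            if E1 | E2 not in algebra:
                algebra.add(E1 | E2)
                changed = True
    return algebra


X = {1, 2, 3, 4}
A = generate_algebra(X, [{1}, {2}])

# Both notes can be checked directly on the result.
closed_under_intersections = all(E1 & E2 in A for E1 in A for E2 in A)
contains_empty_and_whole = frozenset() in A and frozenset(X) in A
```

Here the generated algebra has the atoms $\{1\}$, $\{2\}$, $\{3, 4\}$, so it consists of all unions of these atoms.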
If $X$ is any topological space, the $\sigma$-algebra generated by the family of open sets in $X$ is called the Borel $\sigma$-algebra on $X$ and is denoted by $\mathcal{B}_{X}$.
- The Borel $\sigma$-algebra is the unique smallest $\sigma$-algebra containing all open sets.
Let $\mathcal{E}$ be a $\sigma$-algebra on $X$. Then $(X, \mathcal{E})$ is called a measurable space, and each $E \in \mathcal{E}$ is called an ($\mathcal{E}$-)measurable set.
In the following, we shall consider a fixed measurable space $(X, \mathcal{E})$.
Measurable Functions
A function $f : X \to \mathbb{R}$ is said to be ($\mathcal{E}$-)measurable, if for every real number $\alpha \in \mathbb{R}$ the set $\left\{ x \in X : f(x) \gt \alpha \right\}$ belongs to $\mathcal{E}$.
Generalization
Let $(X, \mathcal{E})$ and $(Y, \mathcal{F})$ be measurable spaces. A function $f : X \to Y$ is called $(\mathcal{E}, \mathcal{F})$-measurable, if $f^{-1}(F) = \left\{ x \in X : f(x) \in F \right\}$ belongs to $\mathcal{E}$ for all $F \in \mathcal{F}$.
Note: In the case $(Y, \mathcal{F}) = (\mathbb{R}, \mathcal{B}_{\mathbb{R}})$, this definition is equivalent to $\mathcal{E}$-measurability as defined above, because the intervals $(\alpha, \infty)$ generate $\mathcal{B}_{\mathbb{R}}$.
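The general definition can be tested mechanically when both spaces are finite. The sketch below is illustrative only: `preimage` and `is_measurable` are hypothetical helper names, and the two $\sigma$-algebras are small hand-picked examples rather than anything from the text.

```python
def preimage(f, domain, F):
    """Return f^{-1}(F) = {x in domain : f(x) in F} as a frozenset."""
    return frozenset(x for x in domain if f(x) in F)


def is_measurable(f, domain, sigma_E, sigma_F):
    """Check that every set in sigma_F pulls back to a set in sigma_E."""
    return all(preimage(f, domain, F) in sigma_E for F in sigma_F)


X = frozenset({1, 2, 3, 4})
sigma_E = {frozenset(), frozenset({1, 2}), frozenset({3, 4}), X}
Y = frozenset({0, 1})
sigma_F = {frozenset(), frozenset({0}), frozenset({1}), Y}


def f(x):
    # Constant on the atoms {1, 2} and {3, 4} of sigma_E.
    return 0 if x <= 2 else 1


def g(x):
    # Separates 1 from 2, which sigma_E cannot distinguish.
    return x % 2


f_ok = is_measurable(f, X, sigma_E, sigma_F)
g_ok = is_measurable(g, X, sigma_E, sigma_F)
```

The function `g` fails because $g^{-1}(\{1\}) = \{1, 3\}$ is not in the chosen $\sigma$-algebra on $X$.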
Measures
A measure on $\mathcal{E}$ (or on $(X, \mathcal{E})$, or simply on $X$ if $\mathcal{E}$ is understood) is a function $\mu : \mathcal{E} \to [0, \infty]$ such that
- Null empty set: $\mu (\varnothing) = 0$.
- Countable additivity: If $\left\{ E_{j} \right\}$ is a sequence of disjoint sets in $\mathcal{E}$, then $\displaystyle \mu \left( \bigcup\limits_{j} E_{j} \right) = \sum\limits_{j} \mu (E_{j})$.
A triple $(X, \mathcal{E}, \mu)$ is called a measure space and we shall be working on a fixed measure space $(X, \mathcal{E}, \mu)$.
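As a concrete instance, a measure on a finite set with the power set as $\sigma$-algebra is determined by a nonnegative weight at each point. The sketch below assumes such a weighted setting; the weights and the name `mu` are illustrative choices, and exact fractions are used so that additivity can be compared with `==`.

```python
from fractions import Fraction

# Point masses on X = {a, b, c}; an assumption made for this example.
weights = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}


def mu(E):
    """Measure of a subset E of X, obtained by summing point masses."""
    return sum((weights[x] for x in E), Fraction(0))


empty_is_null = mu(set()) == 0
# Additivity on the disjoint sets {a} and {b}.
additive = mu({"a", "b"}) == mu({"a"}) + mu({"b"})
total_mass = mu(weights)  # iterating a dict yields its keys, i.e. all of X
```

Since the total mass here equals $1$, this particular $\mu$ is also a probability measure.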
A Borel measure on $\mathbb{R}$ is a measure whose domain is the Borel $\sigma$-algebra $\mathcal{B}_{\mathbb{R}}$: $$ \mu : \mathcal{B}_{\mathbb{R}} \to [0, \infty] $$
For $\sigma$-finite measures $\mu$ on $(X, \mathcal{E})$ and $\nu$ on $(Y, \mathcal{F})$, the product measure $\mu \times \nu$ is the unique measure on the product $\sigma$-algebra generated by the rectangles (denoted here by $\mathcal{E} \times \mathcal{F}$) such that $(\mu \times \nu) (E \times F) = \mu (E) \nu (F)$ for all rectangles $E \times F$ with $E \in \mathcal{E}$ and $F \in \mathcal{F}$.
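For finite sets with point masses, the product measure can be written down explicitly, and the rectangle identity becomes a finite computation. This is a sketch under that assumption; the weights `mu_w`, `nu_w` and the helper names are made up for illustration.

```python
from fractions import Fraction
from itertools import product

mu_w = {"a": Fraction(1, 2), "b": Fraction(3, 2)}               # mu on X
nu_w = {0: Fraction(1, 3), 1: Fraction(2, 3), 2: Fraction(1)}   # nu on Y


def measure(weights, E):
    """Measure of E under the point masses in `weights`."""
    return sum((weights[x] for x in E), Fraction(0))


def product_measure(G):
    """(mu x nu)(G) for any subset G of X x Y, summing products of point masses."""
    return sum((mu_w[x] * nu_w[y] for (x, y) in G), Fraction(0))


E, F = {"a"}, {1, 2}
rectangle = set(product(E, F))
lhs = product_measure(rectangle)
rhs = measure(mu_w, E) * measure(nu_w, F)
```

Both sides evaluate to the same fraction, as the defining property of $\mu \times \nu$ requires.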
The Integral
A real-valued function $f$ is simple if it has only a finite number of values.
A simple measurable function $\varphi$ can be represented in the form $$ \begin{equation} \varphi = \sum\limits_{j=1}^{n} a_{j}\chi_{E_{j}}, \text{ where } E_{j} = \varphi^{-1}(\left\{ a_{j} \right\}) \text{ and } \operatorname{range} (\varphi) = \left\{ a_{1}, \dots, a_{n} \right\}. \end{equation} $$ Here $\chi_{E_{j}}$ is the characteristic function of $E_{j}$. We call this the standard representation of $\varphi$.
If $\varphi$ is a nonnegative simple measurable function with standard representation $(1)$, we define the integral of $\varphi$ with respect to the measure $\mu$ by $$ \int \varphi d\mu := \sum\limits_{j=1}^{n} a_{j}\mu (E_{j}). $$ Notation: $$ \int \varphi d\mu = \int \varphi = \int \varphi(x) d\mu (x), \qquad \int = \int_{X}. $$
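The definition of the integral of a simple function translates directly into a finite sum. The following sketch assumes a finite set with point masses (the uniform weights and the function `phi` are illustrative choices): it builds each set $E_{j} = \varphi^{-1}(\{a_{j}\})$ from the standard representation and adds up $a_{j}\mu(E_{j})$.

```python
from fractions import Fraction

# Uniform point masses on X = {1, 2, 3, 4}; an assumption for this example.
weights = {x: Fraction(1, 4) for x in (1, 2, 3, 4)}


def phi(x):
    """A simple function with range {2, 5}."""
    return 2 if x <= 2 else 5


def integrate_simple(phi, weights):
    """Sum a_j * mu(E_j) over the distinct values a_j of phi."""
    total = Fraction(0)
    for a in {phi(x) for x in weights}:
        E = [x for x in weights if phi(x) == a]          # E_j = phi^{-1}({a_j})
        mu_E = sum((weights[x] for x in E), Fraction(0))
        total += a * mu_E
    return total


integral = integrate_simple(phi, weights)
```

Here $E_{1} = \{1, 2\}$ and $E_{2} = \{3, 4\}$ each have measure $1/2$, so the integral is $2 \cdot \tfrac{1}{2} + 5 \cdot \tfrac{1}{2} = \tfrac{7}{2}$.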
If $f$ is a nonnegative measurable function on $(X, \mathcal{E})$, we define the integral of $f$ with respect to $\mu$ by $$ \int f d\mu := \sup \left\{ \int \varphi d\mu : 0 \le \varphi \le f, \varphi \text{ is simple and measurable} \right\}. $$
The positive and negative parts of $f : X \to \mathbb{R}$ are defined respectively as $$ f^{+}(x) := \max \left( f(x), 0 \right),\qquad f^{-}(x) := \max \left(-f(x), 0 \right). $$ Both are nonnegative, and $f = f^{+} - f^{-}$, $\left| f \right| = f^{+} + f^{-}$. If $\displaystyle \int f^{+}$ and $\displaystyle \int f^{-}$ are both finite, then we say that $f$ is integrable and set $\displaystyle \int f := \int f^{+} - \int f^{-}$.
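A quick numerical check of the positive and negative parts, using the standard conventions $f^{+} = \max(f, 0)$ and $f^{-} = \max(-f, 0)$, under which $f = f^{+} - f^{-}$ and $|f| = f^{+} + f^{-}$. The sample function and the sample points are arbitrary choices for illustration.

```python
def positive_part(f):
    """Return the function x -> max(f(x), 0)."""
    return lambda x: max(f(x), 0)


def negative_part(f):
    """Return the function x -> max(-f(x), 0), which is also nonnegative."""
    return lambda x: max(-f(x), 0)


def f(x):
    return x * x - 2


points = [-2, -1, 0, 1, 2]
f_plus, f_minus = positive_part(f), negative_part(f)

decomposition_holds = all(f(x) == f_plus(x) - f_minus(x) for x in points)
abs_identity_holds = all(abs(f(x)) == f_plus(x) + f_minus(x) for x in points)
```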
The set of real-valued integrable functions is a vector space and the integral is a linear functional on it. This vector space is denoted as: $$ L = L(X, \mathcal{E}, \mu) = L(X, \mu) = L(X) = L(\mu), \qquad L = L^{1} $$
$L^{p}$ space
For a measure space $(X, \mathcal{E}, \mu)$ and $0 \lt p \lt \infty$, we define $$ L^{p}(X, \mathcal{E}, \mu) := \left\{ f : X \to \mathbb{R} \left| f \text{ is measurable and } \left( \int \left| f \right|^{p} d\mu \right)^{1/p} \lt \infty \right. \right\}. $$
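On a finite set with point masses, the quantity $\left( \int |f|^{p} d\mu \right)^{1/p}$ is a weighted finite sum, so every function is in $L^{p}$ and the value can be computed directly. The sketch below assumes that setting; `lp_norm` and the sample data are illustrative, and floating-point arithmetic is used because of the fractional power.

```python
def lp_norm(f, weights, p):
    """Compute (sum over x of |f(x)|^p * mu({x}))^(1/p) for 0 < p < infinity."""
    return sum(abs(f(x)) ** p * w for x, w in weights.items()) ** (1.0 / p)


weights = {"a": 1.0, "b": 1.0}        # counting measure on {a, b}
values = {"a": 3.0, "b": -4.0}        # the function f as a lookup table

norm_1 = lp_norm(values.get, weights, 1)
norm_2 = lp_norm(values.get, weights, 2)
```

With counting measure, `norm_1` is $|3| + |-4|$ and `norm_2` is $\sqrt{9 + 16}$.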
Probability Theory
Notation and Terminology
$$ \begin{array}{lll} \text{Analysts’ Term} && \text{Probabilists’ Term} \\ \hline \text{Measure space } (X, \mathcal{E}, \mu) \text{ such that } \mu (X) = 1 && \text{Probability space } (\Omega, \mathcal{F}, P) \\ \text{Measure } \mu : \mathcal{E} \to \mathbb{R} \text{ such that } \mu (X) = 1 && \text{Probability } P : \mathcal{F} \to \mathbb{R} \\ (\sigma\text{-)algebra $\mathcal{E}$ on $X$} && (\sigma\text{-)field $\mathcal{F}$ on $\Omega$} \\ \text{Measurable set } E \in \mathcal{E} && \text{Event } E \in \mathcal{F} \\ \text{Measurable real-valued function } f : X \to \mathbb{R} && \text{Random variable } X : \Omega \to \mathbb{R} \\ \text{Integral of } f, {\displaystyle \int f d\mu} && \text{Expectation of } X, E(X) \\ f \in L^{p} && X \text{ has finite $p$th moment} \\ \text{Almost everywhere, a.e.} && \text{Almost surely, a.s.} \end{array} $$
$$ \begin{align*} \left\{ X \gt a \right\} &:= \left\{ w : X(w) \gt a \right\} \\ P\left( X \gt a \right) &:= P\left( \left\{ w : X(w) \gt a \right\} \right) \end{align*} $$
Basic Definitions
For measurable spaces $(\Omega, \mathcal{F})$ and $(\mathbb{R}, \mathcal{B}_{\mathbb{R}})$, an $(\mathcal{F}, \mathcal{B}_{\mathbb{R}})$-measurable function $X : \Omega \to \mathbb{R}$ is called a random variable. Namely, $$ X^{-1}(B) \in \mathcal{F}\qquad \forall B \in \mathcal{B}_{\mathbb{R}}. $$
A probability (or probability measure) on $(\Omega, \mathcal{F})$ is a measure $P : \mathcal{F} \to \mathbb{R}$ such that $P(\Omega) = 1$.
If $X$ is a random variable,
- expectation: $\displaystyle E(X) := \int X dP$
- variance: $\sigma^{2}(X) := E\left[ (X - E(X))^{2} \right] = E(X^{2}) - E(X)^{2}$
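On a finite probability space the integral $\int X dP$ is a finite weighted sum, which makes both formulas for the variance easy to compare. The sketch below uses a fair six-sided die as an assumed example; `expectation` is a hypothetical helper name.

```python
from fractions import Fraction

# Fair die: Omega = {1, ..., 6} with uniform probability (an assumed example).
P = {w: Fraction(1, 6) for w in range(1, 7)}


def X(w):
    """The random variable giving the face shown."""
    return w


def expectation(Z, P):
    """E(Z) as the sum of Z(w) * P({w}) over the sample space."""
    return sum((Z(w) * p for w, p in P.items()), Fraction(0))


mean = expectation(X, P)
var_by_definition = expectation(lambda w: (X(w) - mean) ** 2, P)
var_by_shortcut = expectation(lambda w: X(w) ** 2, P) - mean ** 2
```

Both variance computations give $35/12$, with mean $7/2$.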
The (probability) distribution of $X$ is a probability on $\mathbb{R}$, $P_{X} : \mathcal{B}_{\mathbb{R}} \to \mathbb{R}$ such that $$ P_{X}(B) := P(X^{-1}(B)). $$
The distribution function of $X$ is defined as $$ F_{X}(a) := P_{X}\left( (-\infty, a] \right) = P(X \le a). $$
For any finite sequence of random variables $\left\{ X_{i} \right\}_{i=1}^{n}$, the random vector $(X_{1}, \dots, X_{n})$ is defined as the map $\Omega \to \mathbb{R}^{n}$ given by $$ (X_{1}, \dots, X_{n})(x) := (X_{1}(x), \dots, X_{n}(x)). $$ Note: $(X_{1}, \dots, X_{n})^{-1}(B_{1} \times \cdots \times B_{n})= X_{1}^{-1}(B_{1}) \cap \cdots \cap X_{n}^{-1}(B_{n})$.
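For a discrete example, the distribution $P_{X}$ and the distribution function $F_{X}$ can be computed by summing over outcomes. The sketch assumes two fair coin tosses with $X$ counting heads; the helper names `distribution` and `cdf` are illustrative.

```python
from fractions import Fraction

# Two fair coin tosses (an assumed example).
P = {w: Fraction(1, 4) for w in ("HH", "HT", "TH", "TT")}


def X(w):
    """Number of heads in the outcome w."""
    return w.count("H")


def distribution(X, P, B):
    """P_X(B) = P(X^{-1}(B)) for a finite set B of real values."""
    return sum((p for w, p in P.items() if X(w) in B), Fraction(0))


def cdf(X, P, a):
    """F_X(a) = P(X <= a)."""
    return sum((p for w, p in P.items() if X(w) <= a), Fraction(0))


p_one_head = distribution(X, P, {1})
cdf_values = [cdf(X, P, a) for a in (-1, 0, 1, 2)]
```

The list `cdf_values` is nondecreasing and runs from $0$ to $1$, as a distribution function should.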
For $(X, Y) : \Omega \to \mathbb{R}^{2}$, $$ (X, Y)^{-1} (a, b) = \left\{ x \in \Omega : X(x) = a \right\} \cap \left\{ x \in \Omega : Y(x) = b \right\}. $$ Thus, for all Borel sets $B_{1}$ and $B_{2} \in \mathcal{B}_{\mathbb{R}}$ we have $$ (X, Y)^{-1}(B_{1} \times B_{2}) = (X, Y)^{-1}(B_{1}, B_{2}) = X^{-1}(B_{1}) \cap Y^{-1}(B_{2}) $$ and extending to $\mathbb{R}^{n}$ we obtain $$ \begin{equation} \begin{aligned} (X_{1}, \dots, X_{n})^{-1}(B_{1} \times \cdots \times B_{n}) &= (X_{1}, \dots, X_{n})^{-1}(B_{1}, \dots, B_{n}) \\ &= X_{1}^{-1}(B_{1}) \cap \cdots \cap X_{n}^{-1}(B_{n}). \end{aligned} \end{equation} $$
The joint distribution of $X_{1}, \dots, X_{n}$ is the probability distribution of the random vector $(X_{1}, \dots, X_{n})$: $$ P_{(X_{1}, \dots, X_{n})} : \mathcal{B}_{\mathbb{R}^{n}} \to \mathbb{R}, $$ $$ P_{(X_{1}, \dots, X_{n})}(B_{1} \times \cdots \times B_{n}) := P\left( (X_{1}, \dots, X_{n})^{-1}(B_{1} \times \cdots \times B_{n}) \right). $$
Independence
For an event $E$ such that $P(E) \gt 0$, the probability on $\Omega$ defined by $$ P_{E}(F) = P(F|E) := P(E \cap F)/P(E) $$ is called the conditional probability given $E$.
If $P_{E}(F) = P(F)$, then $F$ is said to be independent of $E$: $$ \text{$F$ is independent of $E$} \iff P(E \cap F) = P(E)P(F). $$ A collection $\left\{ E_{j} \right\}$ of events in $\Omega$ is independent if, for every finite subcollection of distinct events $E_{1}, \dots, E_{n}$, $$ P(E_{1} \cap \cdots \cap E_{n}) = P(E_{1}) P(E_{2}) \cdots P(E_{n}) = \prod \limits_{j=1}^{n} P(E_{j}). $$
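The product criterion for two events can be checked by counting on a finite uniform sample space. The sketch below assumes a fair die; the events `E`, `F`, `G` and the helper names are chosen only for illustration, and `conditional(F, E)` computes the probability of $F$ given $E$.

```python
from fractions import Fraction

OMEGA = frozenset(range(1, 7))  # fair die (an assumed example)


def P(A):
    """Uniform probability of an event A, a subset of OMEGA."""
    return Fraction(len(A), len(OMEGA))


def conditional(F, E):
    """Probability of F given E, i.e. P(E & F) / P(E); requires P(E) > 0."""
    return P(E & F) / P(E)


def independent(E, F):
    """Check the product criterion P(E & F) == P(E) * P(F)."""
    return P(E & F) == P(E) * P(F)


E = frozenset({2, 4, 6})      # even outcome
F = frozenset({1, 2, 3, 4})   # at most four
G = frozenset({1, 2, 3})      # at most three

f_given_e = conditional(F, E)
e_f_independent = independent(E, F)
e_g_independent = independent(E, G)
```

Conditioning on $E$ leaves the probability of $F$ unchanged at $2/3$, while $E$ and $G$ fail the product criterion because $P(E \cap G) = 1/6 \ne 1/4$.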
A collection $\left\{ X_{j} \right\}$ of random variables on $\Omega$ is independent if the events $\left\{ X_{j}^{-1}(B_{j}) \right\}$ are independent for all Borel sets $B_{j} \in \mathcal{B}_{\mathbb{R}}$, namely $$ P\left(X_{1}^{-1}(B_{1}) \cap \cdots \cap X_{n}^{-1}(B_{n})\right) = \prod \limits_{j=1}^{n} P(X_{j}^{-1}(B_{j})). $$
By $(2)$ and the definition of the joint distribution, the left-hand side is $$ \begin{align*} P\left(X_{1}^{-1}(B_{1}) \cap \cdots \cap X_{n}^{-1}(B_{n})\right) &= P\left( (X_{1}, \dots, X_{n})^{-1}(B_{1} \times \cdots \times B_{n}) \right) \\ &= P_{(X_{1}, \dots, X_{n})} \left( B_{1} \times \cdots \times B_{n} \right). \end{align*} $$ On the other hand, by the definitions of distribution and product measure, the right-hand side is $$ \prod \limits_{j=1}^{n} P(X_{j}^{-1}(B_{j})) = \prod \limits_{j=1}^{n} P_{X_{j}}(B_{j}) = \left( \prod \limits_{j=1}^{n} P_{X_{j}} \right) \left( B_{1} \times \cdots \times B_{n} \right). $$ Therefore, if $\left\{ X_{j} \right\}$ are independent, the joint distribution and the product of the individual distributions agree on all rectangles, and hence $$ P_{(X_{1}, \dots, X_{n})} = \prod\limits_{j=1}^{n}P_{X_{j}}. $$
$\left\{ X_{j} \right\}$ is an independent set of random variables if and only if the joint distribution of $\left\{ X_{j} \right\}$ is the product of their individual distributions.
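In the discrete case this characterization can be verified pointwise, since a distribution on a finite set of values is determined by its masses at single points. The sketch below assumes two fair coin tosses; `marginal`, `joint`, and `joint_is_product` are hypothetical helper names.

```python
from fractions import Fraction
from itertools import product

# Two fair coin tosses (an assumed example).
P = {w: Fraction(1, 4) for w in ("HH", "HT", "TH", "TT")}


def X(w):
    return 1 if w[0] == "H" else 0   # first toss is heads


def Y(w):
    return 1 if w[1] == "H" else 0   # second toss is heads


def Z(w):
    return X(w) + Y(w)               # total number of heads


def marginal(U, P):
    """Point masses of the distribution P_U."""
    dist = {}
    for w, p in P.items():
        dist[U(w)] = dist.get(U(w), Fraction(0)) + p
    return dist


def joint(U, V, P):
    """Point masses of the joint distribution P_(U, V)."""
    dist = {}
    for w, p in P.items():
        key = (U(w), V(w))
        dist[key] = dist.get(key, Fraction(0)) + p
    return dist


def joint_is_product(U, V, P):
    """Compare P_(U, V) with P_U x P_V at every pair of values."""
    pu, pv, puv = marginal(U, P), marginal(V, P), joint(U, V, P)
    return all(puv.get((a, b), Fraction(0)) == pu[a] * pv[b]
               for a, b in product(pu, pv))


xy_independent = joint_is_product(X, Y, P)
xz_independent = joint_is_product(X, Z, P)
```

The tosses $X$ and $Y$ pass the check, while $X$ and $Z$ do not: the event $\{X = 1, Z = 0\}$ is impossible, yet the product of the marginals assigns it mass $1/8$.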