

Summary of Measure Theory and Probability Theory

Overview

This is a summary of definitions and concepts for those who have already studied measure theory and probability. It is intended to be consulted when a definition is confusing or hard to recall, or when a general review is needed.

Measure Theory

Algebras

An algebra of sets on a nonempty set $X$ is a nonempty collection $\mathcal{A}$ of subsets of $X$ that is closed under finite unions and complements.

A $\sigma$-algebra is an algebra that is closed under countable unions.

Note:

  • $\mathcal{A}$ is also closed under intersections, because $E_{1} \cap E_{2} = \left( E_{1}^{c} \cup E_{2}^{c} \right)^{c} \in \mathcal{A}$ for $E_{1}, E_{2} \in \mathcal{A}$.
  • $\varnothing, X \in \mathcal{A}$, since if $E \in \mathcal{A}$ we have $\varnothing = E \cap E^{c} \in \mathcal{A}$ and $X = E \cup E^{c} \in \mathcal{A}$.
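
For a quick example, the smallest $\sigma$-algebra containing a single subset $E \subseteq X$ is $$ \sigma \left( \left\{ E \right\} \right) = \left\{ \varnothing, E, E^{c}, X \right\}, $$ and one checks directly that this collection is closed under complements and countable unions.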

If $X$ is any topological space, the $\sigma$-algebra generated by the family of open sets in $X$ is called the Borel $\sigma$-algebra on $X$ and is denoted by $\mathcal{B}_{X}$.

  • The Borel $\sigma$-algebra is the unique smallest $\sigma$-algebra containing all open sets.
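
For example, half-open intervals belong to $\mathcal{B}_{\mathbb{R}}$ even though they are not open, since $$ (a, b] = \bigcap\limits_{n=1}^{\infty} \left( a, b + \tfrac{1}{n} \right) $$ is a countable intersection of open sets.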

Let $\mathcal{E}$ be a $\sigma$-algebra on $X$; then $(X, \mathcal{E})$ is called a measurable space and each $E \in \mathcal{E}$ is called an ($\mathcal{E}$-)measurable set.

In the following, we shall consider a fixed measurable space $(X, \mathcal{E})$.

Measurable Functions

A function $f : X \to \mathbb{R}$ is said to be ($\mathcal{E}$-)measurable, if for every real number $\alpha \in \mathbb{R}$ the set $\left\{ x \in X : f(x) \gt \alpha \right\}$ belongs to $\mathcal{E}$.
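
As a simple check, the characteristic function $\chi_{E}$ of a set $E \subseteq X$ is measurable if and only if $E \in \mathcal{E}$, since $$ \left\{ x \in X : \chi_{E}(x) \gt \alpha \right\} = \begin{cases} X, & \alpha \lt 0 \\ E, & 0 \le \alpha \lt 1 \\ \varnothing, & \alpha \ge 1. \end{cases} $$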

Generalization

Let $(X, \mathcal{E})$ and $(Y, \mathcal{F})$ be measurable spaces. A function $f : X \to Y$ is called $(\mathcal{E}, \mathcal{F})$-measurable if $f^{-1}(F) = \left\{ x \in X : f(x) \in F \right\}$ belongs to $\mathcal{E}$ for all $F \in \mathcal{F}$.

Note: In the case $(Y, \mathcal{F}) = (\mathbb{R}, \mathcal{B}_{\mathbb{R}})$, this definition is equivalent to $\mathcal{E}$-measurability as defined above.

Measures

A measure on $\mathcal{E}$ (or on $(X, \mathcal{E})$, or simply on $X$ if $\mathcal{E}$ is understood) is a function $\mu : \mathcal{E} \to [0, \infty]$ such that

  1. Null empty set: $\mu (\varnothing) = 0$.
  2. Countable additivity: If $\left\{ E_{j} \right\}$ is a sequence of disjoint sets in $\mathcal{E}$, then $\displaystyle \mu \left( \bigcup\limits_{j} E_{j} \right) = \sum\limits_{j} \mu (E_{j})$.

A triple $(X, \mathcal{E}, \mu)$ is called a measure space and we shall be working on a fixed measure space $(X, \mathcal{E}, \mu)$.
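
A simple example is the Dirac measure at a point $x_{0} \in X$, $$ \delta_{x_{0}}(E) := \begin{cases} 1, & x_{0} \in E \\ 0, & x_{0} \notin E, \end{cases} $$ which is countably additive because $x_{0}$ belongs to at most one set of any disjoint sequence $\left\{ E_{j} \right\}$.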

A Borel measure on $\mathbb{R}$ is a measure whose domain is the Borel $\sigma$-algebra $\mathcal{B}_{\mathbb{R}}$: $$ \mu : \mathcal{B}_{\mathbb{R}} \to [0, \infty] $$

For two measures $\mu$ and $\nu$ on $(X, \mathcal{E})$ and $(Y, \mathcal{F})$ respectively, the product of $\mu$ and $\nu$ is the measure $\mu \times \nu$ on the product $\sigma$-algebra $\mathcal{E} \times \mathcal{F}$ such that $(\mu \times \nu) (E \times F) = \mu (E) \nu (F)$ for all rectangles $E \times F$; when $\mu$ and $\nu$ are $\sigma$-finite, this condition determines $\mu \times \nu$ uniquely.
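
For example, with $\lambda$ denoting Lebesgue measure on $\mathbb{R}$, the product $\lambda \times \lambda$ assigns to a rectangle its usual area: $$ (\lambda \times \lambda)\left( [0, 1] \times [0, 3] \right) = \lambda ([0,1]) \lambda ([0,3]) = 1 \cdot 3 = 3. $$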

The Integral

A real-valued function $f$ is simple if it has only a finite number of values.

A simple measurable function $\varphi$ can be represented in the form $$ \begin{equation} \varphi = \sum\limits_{j=1}^{n} a_{j}\chi_{E_{j}}, \text{ where } E_{j} = \varphi^{-1}(\left\{ a_{j} \right\}) \text{ and } \operatorname{range} (\varphi) = \left\{ a_{1}, \dots, a_{n} \right\}, \end{equation} $$ where $\chi_{E_{j}}$ is the characteristic function of $E_{j}$. We call this the standard representation of $\varphi$.

If $\varphi$ is a simple measurable function with standard representation $(1)$, we define the integral of $\varphi$ with respect to the measure $\mu$ by $$ \int \varphi d\mu := \sum\limits_{j=1}^{n} a_{j}\mu (E_{j}). $$ Notation: $$ \int \varphi d\mu = \int \varphi = \int \varphi(x) d\mu (x), \qquad \int = \int_{X}. $$
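
For instance, on $X = [0, 3]$ with Lebesgue measure $\lambda$, the simple function $\varphi = 2\chi_{[0,1]} + 5\chi_{(1,3]}$ has $$ \int \varphi d\lambda = 2 \lambda ([0,1]) + 5 \lambda ((1,3]) = 2 \cdot 1 + 5 \cdot 2 = 12. $$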

If $f$ is a nonnegative measurable function on $(X, \mathcal{E})$, we define the integral of $f$ with respect to $\mu$ by $$ \int f d\mu := \sup \left\{ \int \varphi d\mu : 0 \le \varphi \le f, \varphi \text{ is simple and measurable} \right\}. $$

The positive and negative parts of $f : X \to \mathbb{R}$ are defined respectively as $$ f^{+}(x) := \max \left( f(x), 0 \right), \qquad f^{-}(x) := \max \left( -f(x), 0 \right), $$ so that $f = f^{+} - f^{-}$ and $\left| f \right| = f^{+} + f^{-}$. If $\displaystyle \int f^{+}$ and $\displaystyle \int f^{-}$ are both finite, then we say that $f$ is integrable and define $$ \int f d\mu := \int f^{+} d\mu - \int f^{-} d\mu. $$
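
As a worked example, take $f(x) = x$ on $[-1, 1]$ with Lebesgue measure: $f^{+} = x \chi_{[0,1]}$ and $f^{-} = -x \chi_{[-1,0]}$, so $$ \int f^{+} = \int f^{-} = \frac{1}{2}, \qquad \int f = 0, \qquad \int \left| f \right| = 1. $$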

The set of real-valued integrable functions is a vector space and the integral is a linear functional on it. This vector space is denoted as: $$ L = L(X, \mathcal{E}, \mu) = L(X, \mu) = L(X) = L(\mu), \qquad L = L^{1} $$

$L^{p}$ Spaces

For a measure space $(X, \mathcal{E}, \mu)$ and $0 \lt p \lt \infty$, we define $$ L^{p}(X, \mathcal{E}, \mu) := \left\{ f : X \to \mathbb{R} \left| f \text{ is measurable and } \left( \int \left| f \right|^{p} d\mu \right)^{1/p} \lt \infty \right. \right\}. $$
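
These spaces are not nested in general. For example, $f(x) = x^{-1/2}$ on $(0, 1)$ with Lebesgue measure belongs to $L^{1}$ but not to $L^{2}$: $$ \int_{0}^{1} x^{-1/2} dx = 2 \lt \infty, \qquad \int_{0}^{1} \left( x^{-1/2} \right)^{2} dx = \int_{0}^{1} x^{-1} dx = \infty. $$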

Probability Theory

Notation and Terminology

$$ \begin{array}{lll} \text{Analysts’ Term} && \text{Probabilists’ Term} \\ \hline \text{Measure space } (X, \mathcal{E}, \mu) \text{ such that } \mu (X) = 1 && \text{Probability space } (\Omega, \mathcal{F}, P) \\ \text{Measure } \mu : \mathcal{E} \to \mathbb{R} \text{ such that } \mu (X) = 1 && \text{Probability } P : \mathcal{F} \to \mathbb{R} \\ (\sigma\text{-)algebra $\mathcal{E}$ on $X$} && (\sigma\text{-)field $\mathcal{F}$ on $\Omega$} \\ \text{Measurable set } E \in \mathcal{E} && \text{Event } E \in \mathcal{F} \\ \text{Measurable real-valued function } f : X \to \mathbb{R} && \text{Random variable } X : \Omega \to \mathbb{R} \\ \text{Integral of } f, {\displaystyle \int f d\mu} && \text{Expectation of } X, E(X) \\ f \text{ is } L^{p} && X \text{ has finite $p$th moment} \\ \text{Almost everywhere, a.e.} && \text{Almost surely, a.s.} \end{array} $$

$$ \begin{align*} \left\{ X \gt a \right\} &:= \left\{ \omega : X(\omega) \gt a \right\} \\ P\left( X \gt a \right) &:= P\left( \left\{ \omega : X(\omega) \gt a \right\} \right) \end{align*} $$

Basic Definitions

For measurable spaces $(\Omega, \mathcal{F})$ and $(\mathbb{R}, \mathcal{B}_{\mathbb{R}})$, an $(\mathcal{F}, \mathcal{B}_{\mathbb{R}})$-measurable function $X : \Omega \to \mathbb{R}$ is called a random variable. Namely, $$ X^{-1}(B) \in \mathcal{F}\qquad \forall B \in \mathcal{B}_{\mathbb{R}}. $$

A probability (or probability measure) on $(\Omega, \mathcal{F})$ is a measure $P : \mathcal{F} \to \mathbb{R}$ such that $P(\Omega) = 1$.

If $X$ is a random variable,

  • expectation: $\displaystyle E(X) := \int X dP$
  • variance: $\sigma^{2}(X) := E\left[ (X - E(X))^{2} \right] = E(X^{2}) - E(X)^{2}$
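
For example, if $X$ is a fair die roll, uniform on $\left\{ 1, \dots, 6 \right\}$, then $$ E(X) = \frac{21}{6} = \frac{7}{2}, \qquad E(X^{2}) = \frac{91}{6}, \qquad \sigma^{2}(X) = \frac{91}{6} - \frac{49}{4} = \frac{35}{12}. $$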

The (probability) distribution of $X$ is a probability on $\mathbb{R}$, $P_{X} : \mathcal{B}_{\mathbb{R}} \to \mathbb{R}$ such that $$ P_{X}(B) := P(X^{-1}(B)). $$

The distribution function of $X$ is defined as $$ F_{X}(a) := P_{X}\left( (-\infty, a] \right) = P(X \le a). $$
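
For example, if $X$ is Bernoulli with parameter $p$, i.e. $P(X = 1) = p$ and $P(X = 0) = 1 - p$, then $P_{X} = (1-p)\delta_{0} + p\delta_{1}$ and $$ F_{X}(a) = \begin{cases} 0, & a \lt 0 \\ 1 - p, & 0 \le a \lt 1 \\ 1, & a \ge 1. \end{cases} $$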

For any finite sequence of random variables $\left\{ X_{i} \right\}_{i=1}^{n}$, the random vector $(X_{1}, \dots, X_{n})$ is defined as the map from $\Omega$ to $\mathbb{R}^{n}$ given by $$ (X_{1}, \dots, X_{n})(\omega) := (X_{1}(\omega), \dots, X_{n}(\omega)). $$ Note: $(X_{1}, \dots, X_{n})^{-1}(B_{1} \times \cdots \times B_{n})= X_{1}^{-1}(B_{1}) \cap \cdots \cap X_{n}^{-1}(B_{n})$.

For $(X, Y) : \Omega \to \mathbb{R}^{2}$, $$ (X, Y)^{-1} (a, b) = \left\{ \omega \in \Omega : X(\omega) = a \right\} \cap \left\{ \omega \in \Omega : Y(\omega) = b \right\}. $$ Thus, for all Borel sets $B_{1}, B_{2} \in \mathcal{B}_{\mathbb{R}}$ we have $$ (X, Y)^{-1}(B_{1} \times B_{2}) = (X, Y)^{-1}(B_{1}, B_{2}) = X^{-1}(B_{1}) \cap Y^{-1}(B_{2}), $$ and extending to $\mathbb{R}^{n}$ we obtain $$ \begin{equation} \begin{aligned} (X_{1}, \dots, X_{n})^{-1}(B_{1} \times \cdots \times B_{n}) &= (X_{1}, \dots, X_{n})^{-1}(B_{1}, \dots, B_{n}) \\ &= X_{1}^{-1}(B_{1}) \cap \cdots \cap X_{n}^{-1}(B_{n}). \end{aligned} \end{equation} $$

The joint distribution of $X_{1}, \dots, X_{n}$ is the probability distribution of the random vector $(X_{1}, \dots, X_{n})$: $$ P_{(X_{1}, \dots, X_{n})} : \mathcal{B}_{\mathbb{R}^{n}} \to \mathbb{R}, $$ $$ P_{(X_{1}, \dots, X_{n})}(B_{1} \times \cdots \times B_{n}) := P\left( (X_{1}, \dots, X_{n})^{-1}(B_{1} \times \cdots \times B_{n}) \right). $$

Independence

For an event $E$ such that $P(E) \gt 0$, the probability on $\Omega$ $$ P_{E}(F) = P(F|E) := P(E \cap F)/P(E) $$ is called the conditional probability given $E$.
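
For example, for a fair die with $E = \left\{ 2, 4, 6 \right\}$ (even) and $F = \left\{ 4, 5, 6 \right\}$, $$ P_{E}(F) = \frac{P(E \cap F)}{P(E)} = \frac{P(\left\{ 4, 6 \right\})}{P(\left\{ 2, 4, 6 \right\})} = \frac{2/6}{3/6} = \frac{2}{3}. $$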

If $P_{E}(F) = P(F)$, then $F$ is said to be independent of $E$: $$ \text{$F$ is independent of $E$} \iff P(E \cap F) = P(E)P(F). $$ A collection $\left\{ E_{j} \right\}$ of events in $\Omega$ is independent if $$ P(E_{j_{1}} \cap \cdots \cap E_{j_{n}}) = P(E_{j_{1}}) P(E_{j_{2}}) \cdots P(E_{j_{n}}) = \prod \limits_{k=1}^{n} P(E_{j_{k}}) $$ for every finite subcollection $\left\{ E_{j_{1}}, \dots, E_{j_{n}} \right\}$.
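
For example, for a fair die the events $E = \left\{ 2, 4, 6 \right\}$ and $F = \left\{ 1, 2 \right\}$ are independent, since $$ P(E \cap F) = P(\left\{ 2 \right\}) = \frac{1}{6} = \frac{1}{2} \cdot \frac{1}{3} = P(E)P(F). $$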

A collection $\left\{ X_{j} \right\}$ of random variables on $\Omega$ is independent if the events $\left\{ X_{j}^{-1}(B_{j}) \right\}$ are independent for all Borel sets $B_{j} \in \mathcal{B}_{\mathbb{R}}$, namely $$ P\left(X_{1}^{-1}(B_{1}) \cap \cdots \cap X_{n}^{-1}(B_{n})\right) = \prod \limits_{j=1}^{n} P(X_{j}^{-1}(B_{j})). $$

From the left-hand side, by the definition of the joint distribution and $(2)$, $$ \begin{align*} P\left(X_{1}^{-1}(B_{1}) \cap \cdots \cap X_{n}^{-1}(B_{n})\right) &= P\left( (X_{1}, \dots, X_{n})^{-1}(B_{1} \times \cdots \times B_{n}) \right) \\ &= P_{(X_{1}, \dots, X_{n})} \left( B_{1} \times \cdots \times B_{n} \right). \end{align*} $$ On the other hand, from the right-hand side, by the definitions of distribution and product measure, $$ \prod \limits_{j=1}^{n} P(X_{j}^{-1}(B_{j})) = \prod \limits_{j=1}^{n} P_{X_{j}}(B_{j}) = \left( \prod \limits_{j=1}^{n} P_{X_{j}} \right) \left( B_{1} \times \cdots \times B_{n} \right). $$ Therefore, if $\left\{ X_{j} \right\}$ are independent, then $$ P_{(X_{1}, \dots, X_{n})} = \prod\limits_{j=1}^{n}P_{X_{j}}. $$

$\left\{ X_{j} \right\}$ is an independent set of random variables if and only if the joint distribution of $\left\{ X_{j} \right\}$ is the product of their individual distributions.
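
For example, if $X$ and $Y$ are independent fair coin flips taking values in $\left\{ 0, 1 \right\}$, then $$ P_{(X, Y)}\left( \left\{ i \right\} \times \left\{ j \right\} \right) = P_{X}(\left\{ i \right\}) P_{Y}(\left\{ j \right\}) = \frac{1}{4} \qquad \text{for all } (i, j) \in \left\{ 0, 1 \right\}^{2}. $$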

References

  • Robert G. Bartle, The Elements of Integration and Lebesgue Measure (1995)
  • Gerald B. Folland, Real Analysis: Modern Techniques and Their Applications (1999)