
Summary of Measure Theory and Probability Theory

Overview

This is a summary of definitions and concepts for those who have already studied measure theory and probability. It is intended to be viewed when definitions are confusing or unrecognizable, and when a general review is needed.

Measure Theory

Algebras

An algebra of sets on a nonempty set $X$ is a nonempty collection $\mathcal{A}$ of subsets of $X$ that is closed under finite unions and complements.

A $\sigma$-algebra is an algebra that is closed under countable unions.

Note:

  • $\mathcal{A}$ is also closed under intersections, because $E_{1} \cap E_{2} = \left( E_{1}^{c} \cup E_{2}^{c} \right)^{c} \in \mathcal{A}$ for $E_{1}, E_{2} \in \mathcal{A}$.
  • $\varnothing, X \in \mathcal{A}$, since if $E \in \mathcal{A}$ we have $\varnothing = E \cap E^{c} \in \mathcal{A}$ and $X = E \cup E^{c} \in \mathcal{A}$.
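To make the closure properties concrete, here is a minimal Python sketch (the `is_algebra` helper and the example families are illustrative, not from the source) that checks a finite collection of subsets for the algebra axioms. On a finite $X$, closure under pairwise unions already yields closure under all finite unions, and every algebra on a finite set is automatically a $\sigma$-algebra.

```python
from itertools import combinations

def is_algebra(X, A):
    """Check that a finite collection A of subsets of X is an algebra:
    nonempty and closed under complements and pairwise (hence finite) unions."""
    family = {frozenset(S) for S in A}
    if not family:
        return False
    if any(frozenset(X - E) not in family for E in family):
        return False  # not closed under complements
    return all(E1 | E2 in family for E1, E2 in combinations(family, 2))

X = {1, 2, 3, 4}
trivial = [set(), X]                      # the smallest sigma-algebra on X
generated = [set(), {1, 2}, {3, 4}, X]    # generated by the single set {1, 2}
print(is_algebra(X, trivial), is_algebra(X, generated))   # True True
```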

If $X$ is any topological space, the $\sigma$-algebra generated by the family of open sets in $X$ is called the Borel $\sigma$-algebra on $X$ and is denoted by $\mathcal{B}_{X}$.

  • The Borel $\sigma$-algebra is the unique smallest $\sigma$-algebra containing all open sets.

Let $\mathcal{E}$ be a $\sigma$-algebra on $X$; then $(X, \mathcal{E})$ is called a measurable space and $E \in \mathcal{E}$ is called an ($\mathcal{E}$-)measurable set.

In the following, we shall consider a fixed measurable space $(X, \mathcal{E})$.

Measurable Functions

A function $f : X \to \mathbb{R}$ is said to be ($\mathcal{E}$-)measurable if for every real number $\alpha \in \mathbb{R}$ the set $\left\{ x \in X : f(x) > \alpha \right\}$ belongs to $\mathcal{E}$.

Generalization

Let $(X, \mathcal{E})$ and $(Y, \mathcal{F})$ be measurable spaces. A function $f : X \to Y$ is called $(\mathcal{E}, \mathcal{F})$-measurable if $f^{-1}(F) = \left\{ x \in X : f(x) \in F \right\}$ belongs to $\mathcal{E}$ for all $F \in \mathcal{F}$.

Note: $\mathcal{E}$-measurability as defined above is equivalent to this definition in the case $(Y, \mathcal{F}) = (\mathbb{R}, \mathcal{B}_{\mathbb{R}})$.
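On a finite space, $(\mathcal{E}, \mathcal{F})$-measurability can be checked exhaustively by listing preimages. A minimal sketch, with hypothetical helpers `preimage` and `is_measurable` (not from the source):

```python
def preimage(f, X, F):
    """f^{-1}(F) = {x in X : f(x) in F}."""
    return frozenset(x for x in X if f(x) in F)

def is_measurable(f, X, E, F):
    """(E, F)-measurability on finite spaces: the preimage of every set
    in the sigma-algebra F must belong to the sigma-algebra E."""
    E = {frozenset(S) for S in E}
    return all(preimage(f, X, G) in E for G in F)

X = {1, 2, 3, 4}
E = [set(), {1, 2}, {3, 4}, X]            # sigma-algebra on X
F = [set(), {0}, {1}, {0, 1}]             # power set of Y = {0, 1}
f = lambda x: 0 if x <= 2 else 1          # constant on each atom of E
g = lambda x: x % 2                       # mixes the atoms of E
print(is_measurable(f, X, E, F), is_measurable(g, X, E, F))  # True False
```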

Measures

A measure on $\mathcal{E}$ (or on $(X, \mathcal{E})$, or simply on $X$ if $\mathcal{E}$ is understood) is a function $\mu : \mathcal{E} \to [0, \infty]$ such that

  1. Null empty set: $\mu (\varnothing) = 0$.
  2. Countable additivity: If $\left\{ E_{j} \right\}$ is a sequence of disjoint sets in $\mathcal{E}$, then $\displaystyle \mu \left( \bigcup\limits_{j} E_{j} \right) = \sum\limits_{j} \mu (E_{j})$.

A triple $(X, \mathcal{E}, \mu)$ is called a measure space, and we shall be working on a fixed measure space $(X, \mathcal{E}, \mu)$.
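As a sanity check of both axioms, here is a small sketch using a hypothetical `discrete_measure` helper (an illustration, not from the source) that builds a point-mass measure on the power set of a finite set:

```python
def discrete_measure(weights):
    """Measure on the power set of a finite X given by point masses."""
    return lambda E: sum(weights.get(x, 0.0) for x in E)

mu = discrete_measure({"a": 0.5, "b": 1.5, "c": 2.0})
E1, E2 = {"a"}, {"b", "c"}                 # disjoint measurable sets
print(mu(set()))                           # 0.0   (null empty set)
print(mu(E1 | E2), mu(E1) + mu(E2))        # 4.0 4.0   (additivity)
```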

A Borel measure on $\mathbb{R}$ is a measure whose domain is the Borel $\sigma$-algebra $\mathcal{B}_{\mathbb{R}}$:
$$ \mu : \mathcal{B}_{\mathbb{R}} \to [0, \infty]. $$

For two measures $\mu$ and $\nu$ on $(X, \mathcal{E})$ and $(Y, \mathcal{F})$ respectively, the product of $\mu$ and $\nu$ is the measure $\mu \times \nu$ on $\mathcal{E} \times \mathcal{F}$ such that $\mu \times \nu (E \times F) = \mu (E) \nu (F)$ for all rectangles $E \times F$; it is unique when $\mu$ and $\nu$ are $\sigma$-finite.
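A minimal illustration of the defining property on rectangles, with a hypothetical `product_measure` helper and two toy measures (all names are illustrative choices):

```python
def product_measure(mu, nu):
    """Evaluate the product measure mu x nu on a measurable rectangle E x F."""
    return lambda E, F: mu(E) * nu(F)

mu = lambda E: 0.25 * len(E)               # uniform measure on a 4-point space
nu = lambda F: float(len(F))               # counting measure
m = product_measure(mu, nu)
print(m({1, 2}, {"a", "b", "c"}))          # (0.25 * 2) * 3 = 1.5
```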

The Integral

A real-valued function $f$ is simple if it has only a finite number of values.

A simple measurable function $\varphi$ can be represented in the form
$$ \varphi = \sum\limits_{j=1}^{n} a_{j}\chi_{E_{j}}, \text{ where } E_{j} = \varphi^{-1}\left(\left\{ a_{j} \right\}\right) \text{ and } \operatorname{range} (\varphi) = \left\{ a_{1}, \dots, a_{n} \right\}, \tag{1} $$
where $\chi_{E_{j}}$ is the characteristic function of $E_{j}$. We call this the standard representation of $\varphi$.

If $\varphi$ is a simple measurable function with standard representation $(1)$, we define the integral of $\varphi$ with respect to the measure $\mu$ by
$$ \int \varphi \, d\mu := \sum\limits_{j=1}^{n} a_{j}\mu (E_{j}). $$
Notation:
$$ \int \varphi \, d\mu = \int \varphi = \int \varphi(x) \, d\mu (x), \qquad \int = \int_{X}. $$
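The formula is easy to evaluate on a finite space. A minimal sketch (the `integral_simple` helper and the example are illustrative, not from the source) that recovers the standard representation from $\varphi$ and sums $a_{j}\mu(E_{j})$:

```python
def integral_simple(phi, X, mu):
    """Integral of a simple function phi on a finite X with respect to mu,
    via the standard representation: sum over a_j of a_j * mu(E_j)."""
    return sum(a * mu({x for x in X if phi(x) == a})
               for a in {phi(x) for x in X})

X = {1, 2, 3, 4}
mu = lambda E: 0.25 * len(E)               # uniform measure on X
phi = lambda x: 1.0 if x <= 2 else 3.0     # simple: range is {1.0, 3.0}
print(integral_simple(phi, X, mu))         # 1.0*0.5 + 3.0*0.5 = 2.0
```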

If $f$ is a nonnegative measurable function on $(X, \mathcal{E})$, we define the integral of $f$ with respect to $\mu$ by
$$ \int f \, d\mu := \sup \left\{ \int \varphi \, d\mu : 0 \le \varphi \le f,\ \varphi \text{ is simple and measurable} \right\}. $$
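The supremum is approached by the standard staircase approximations $\varphi_{n} = \lfloor 2^{n} f \rfloor / 2^{n}$. The following sketch approximates their integrals against Lebesgue measure on $[0, 1)$ by averaging over a fine grid (a numerical stand-in for the measure, purely for illustration):

```python
import numpy as np

def staircase_integral(f, n, grid=100_000):
    """Integral of the standard simple approximation phi_n = floor(2^n f)/2^n
    (which satisfies 0 <= phi_n <= f) against Lebesgue measure on [0, 1),
    approximated by averaging over a fine grid."""
    x = np.linspace(0.0, 1.0, grid, endpoint=False)
    phi_n = np.floor((2.0 ** n) * f(x)) / (2.0 ** n)   # finitely many values
    return phi_n.mean()

for n in (1, 4, 8):
    print(n, staircase_integral(lambda x: x, n))       # increases toward 0.5
```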

The positive and negative parts of $f : X \to \mathbb{R}$ are defined respectively as
$$ f^{+}(x) := \max \left( f(x), 0 \right), \qquad f^{-}(x) := \max \left( -f(x), 0 \right). $$
If $\displaystyle \int f^{+}$ and $\displaystyle \int f^{-}$ are both finite, then we say that $f$ is integrable. Also $f = f^{+} - f^{-}$ and $\left| f \right| = f^{+} + f^{-}$.
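A quick numerical check of these two identities (the array of values is an arbitrary example):

```python
import numpy as np

f = np.array([2.0, -1.0, 0.5, -3.0])     # values of f on a 4-point space
f_plus = np.maximum(f, 0.0)              # positive part f^+
f_minus = np.maximum(-f, 0.0)            # negative part f^- (nonnegative)
print(np.array_equal(f, f_plus - f_minus))           # True: f = f^+ - f^-
print(np.array_equal(np.abs(f), f_plus + f_minus))   # True: |f| = f^+ + f^-
```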

The set of real-valued integrable functions is a vector space and the integral is a linear functional on it. This vector space is denoted as:
$$ L = L(X, \mathcal{E}, \mu) = L(X, \mu) = L(X) = L(\mu), \qquad L = L^{1}. $$

$L^{p}$ space

For a measure space $(X, \mathcal{E}, \mu)$ and $0 < p < \infty$, we define
$$ L^{p}(X, \mathcal{E}, \mu) := \left\{ f : X \to \mathbb{R} \;\left|\; f \text{ is measurable and } \left( \int \left| f \right|^{p} d\mu \right)^{1/p} < \infty \right. \right\}. $$
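A numerical stand-in for the defining quantity, again approximating Lebesgue measure on $[0, 1)$ by a fine grid (the `lp_norm` helper is illustrative, not from the source):

```python
import numpy as np

def lp_norm(f, p, grid=100_000):
    """Numerical stand-in for (int |f|^p dmu)^(1/p), mu = Lebesgue on [0, 1)."""
    x = np.linspace(0.0, 1.0, grid, endpoint=False)
    return (np.abs(f(x)) ** p).mean() ** (1.0 / p)

print(lp_norm(lambda x: x, 2.0))         # ~ 0.577 = 1/sqrt(3): x is in L^2
```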

Probability Theory

Notation and Terminology

$$ \begin{array}{ll} \text{Analysts' Term} & \text{Probabilists' Term} \\ \hline \text{Measure space } (X, \mathcal{E}, \mu) \text{ such that } \mu (X) = 1 & \text{Probability space } (\Omega, \mathcal{F}, P) \\ \text{Measure } \mu : \mathcal{E} \to \mathbb{R} \text{ such that } \mu (X) = 1 & \text{Probability } P : \mathcal{F} \to \mathbb{R} \\ (\sigma\text{-)algebra } \mathcal{E} \text{ on } X & (\sigma\text{-)field } \mathcal{F} \text{ on } \Omega \\ \text{Measurable set } E \in \mathcal{E} & \text{Event } E \in \mathcal{F} \\ \text{Measurable real-valued function } f : X \to \mathbb{R} & \text{Random variable } X : \Omega \to \mathbb{R} \\ \text{Integral of } f, \displaystyle \int f \, d\mu & \text{Expectation of } X, E(X) \\ f \text{ is } L^{p} & X \text{ has finite } p\text{th moment} \\ \text{Almost everywhere, a.e.} & \text{Almost surely, a.s.} \end{array} $$

$$ \begin{align*} \left\{ X > a \right\} &:= \left\{ \omega : X(\omega) > a \right\} \\ P\left( X > a \right) &:= P\left( \left\{ \omega : X(\omega) > a \right\} \right) \end{align*} $$

Basic Definitions

For measurable spaces $(\Omega, \mathcal{F})$ and $(\mathbb{R}, \mathcal{B}_{\mathbb{R}})$, an $(\mathcal{F}, \mathcal{B}_{\mathbb{R}})$-measurable function $X : \Omega \to \mathbb{R}$ is called a random variable. Namely,
$$ X^{-1}(B) \in \mathcal{F} \qquad \forall B \in \mathcal{B}_{\mathbb{R}}. $$

A probability (or probability measure) on $(\Omega, \mathcal{F})$ is a measure $P : \mathcal{F} \to \mathbb{R}$ such that $P(\Omega) = 1$.

If XX is a random variable,

  • expectation: $\displaystyle E(X) := \int X \, dP$
  • variance: $\sigma^{2}(X) := E\left[ (X - E(X))^{2} \right] = E(X^{2}) - E(X)^{2}$
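A Monte Carlo check of the variance identity $\sigma^{2}(X) = E(X^{2}) - E(X)^{2}$, using an exponential variable as an arbitrary example (the sample size and distribution are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.exponential(scale=2.0, size=1_000_000)   # a sampled random variable

EX = X.mean()                         # Monte Carlo estimate of E(X), ~ 2
var_def = ((X - EX) ** 2).mean()      # E[(X - E(X))^2]
var_alt = (X ** 2).mean() - EX ** 2   # E(X^2) - E(X)^2
print(EX, var_def, var_alt)           # both variance estimates ~ 4 and agree
```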

The (probability) distribution of $X$ is a probability on $\mathbb{R}$, $P_{X} : \mathcal{B}_{\mathbb{R}} \to \mathbb{R}$, such that
$$ P_{X}(B) := P(X^{-1}(B)). $$

The distribution function of $X$ is defined as
$$ F_{X}(a) := P_{X}\left( (-\infty, a] \right) = P(X \le a). $$
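The distribution function can be estimated empirically from samples. A sketch using a standard normal variable as an arbitrary example (the `F_X` helper is illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=1_000_000)        # samples of a standard normal variable

def F_X(a):
    """Empirical distribution function P(X <= a)."""
    return np.mean(X <= a)

print(F_X(0.0), F_X(1.96))            # ~ 0.5 and ~ 0.975
```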

For any finite sequence of random variables $\left\{ X_{i} \right\}_{i=1}^{n}$, the random vector $(X_{1}, \dots, X_{n})$ is defined as a map from $\Omega$ to $\mathbb{R}^{n}$:
$$ (X_{1}, \dots, X_{n})(x) := (X_{1}(x), \dots, X_{n}(x)). $$
Note: $(X_{1}, \dots, X_{n})^{-1}(B_{1} \times \cdots \times B_{n}) = X_{1}^{-1}(B_{1}) \cap \cdots \cap X_{n}^{-1}(B_{n})$.

For $(X, Y) : \Omega \to \mathbb{R}^{2}$,
$$ (X, Y)^{-1} (a, b) = \left\{ x \in \Omega : X(x) = a \right\} \cap \left\{ x \in \Omega : Y(x) = b \right\}. $$
Thus, for all Borel sets $B_{1}, B_{2} \in \mathcal{B}_{\mathbb{R}}$ we have
$$ (X, Y)^{-1}(B_{1} \times B_{2}) = (X, Y)^{-1}(B_{1}, B_{2}) = X^{-1}(B_{1}) \cap Y^{-1}(B_{2}) $$
and extending to $\mathbb{R}^{n}$ we obtain
$$ \begin{aligned} (X_{1}, \dots, X_{n})^{-1}(B_{1} \times \cdots \times B_{n}) &= (X_{1}, \dots, X_{n})^{-1}(B_{1}, \dots, B_{n}) \\ &= X_{1}^{-1}(B_{1}) \cap \cdots \cap X_{n}^{-1}(B_{n}). \end{aligned} \tag{2} $$

The joint distribution of $X_{1}, \dots, X_{n}$ is the probability distribution of $(X_{1}, \dots, X_{n})$, $P_{(X_{1}, \dots, X_{n})} : \mathcal{B}_{\mathbb{R}^{n}} \to \mathbb{R}$:
$$ P_{(X_{1}, \dots, X_{n})}(B_{1} \times \cdots \times B_{n}) := P\left( (X_{1}, \dots, X_{n})^{-1}(B_{1} \times \cdots \times B_{n}) \right). $$

Independence

For an event $E$ such that $P(E) > 0$, the probability on $\Omega$
$$ P_{E}(F) = P(F|E) := P(E \cap F)/P(E) $$
is called the conditional probability given $E$.

If $P_{E}(F) = P(F)$, then $F$ is said to be independent of $E$:
$$ F \text{ is independent of } E \iff P(E \cap F) = P(E)P(F). $$
A collection $\left\{ E_{j} \right\}$ of events in $\Omega$ is independent if
$$ P(E_{1} \cap \cdots \cap E_{n}) = P(E_{1}) P(E_{2}) \cdots P(E_{n}) = \prod \limits_{j=1}^{n} P(E_{j}) $$
for every finite subcollection $E_{1}, \dots, E_{n}$.
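A Monte Carlo illustration of $P(F|E) = P(E \cap F)/P(E)$ with a fair die (the events are arbitrary examples; here $P(F|E) \neq P(F)$, so $F$ is not independent of $E$):

```python
import numpy as np

rng = np.random.default_rng(2)
roll = rng.integers(1, 7, size=1_000_000)    # fair six-sided die

E = roll >= 4                                 # event {4, 5, 6}, P(E) = 1/2
F = roll % 2 == 0                             # event {2, 4, 6}, P(F) = 1/2

P_EF = np.mean(E & F)                         # P(E and F) = P({4, 6}) = 1/3
print(P_EF / np.mean(E), np.mean(F))          # P(F|E) ~ 2/3 vs P(F) ~ 1/2
```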

A collection $\left\{ X_{j} \right\}$ of random variables on $\Omega$ is independent if the events $\left\{ X_{j}^{-1}(B_{j}) \right\}$ are independent for all Borel sets $B_{j} \in \mathcal{B}_{\mathbb{R}}$, namely
$$ P\left(X_{1}^{-1}(B_{1}) \cap \cdots \cap X_{n}^{-1}(B_{n})\right) = \prod \limits_{j=1}^{n} P(X_{j}^{-1}(B_{j})). $$

From the left-hand side, by the definition of the distribution and $(2)$, we have
$$ \begin{aligned} P\left(X_{1}^{-1}(B_{1}) \cap \cdots \cap X_{n}^{-1}(B_{n})\right) &= P\left( (X_{1}, \dots, X_{n})^{-1}(B_{1} \times \cdots \times B_{n}) \right) \\ &= P_{(X_{1}, \dots, X_{n})} \left( B_{1} \times \cdots \times B_{n} \right). \end{aligned} $$
On the other hand, from the right-hand side, by the definitions of the product measure and the distribution,
$$ \prod \limits_{j=1}^{n} P(X_{j}^{-1}(B_{j})) = \prod \limits_{j=1}^{n} P_{X_{j}}(B_{j}) = \left( \prod \limits_{j=1}^{n} P_{X_{j}} \right) \left( B_{1} \times \cdots \times B_{n} \right). $$
Therefore, if $\left\{ X_{j} \right\}$ are independent, then
$$ P_{(X_{1}, \dots, X_{n})} = \prod\limits_{j=1}^{n}P_{X_{j}}. $$

$\left\{ X_{j} \right\}$ is an independent set of random variables if and only if the joint distribution of $\left\{ X_{j} \right\}$ is the product of their individual distributions.
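A numerical check of this equivalence for two independent fair coins: the empirical joint distribution matches the product of the empirical marginals on every rectangle $\{a\} \times \{b\}$ (the sample size and distributions are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.integers(0, 2, size=1_000_000)        # fair coin
Y = rng.integers(0, 2, size=1_000_000)        # a second, independent fair coin

for a in (0, 1):
    for b in (0, 1):
        joint = np.mean((X == a) & (Y == b))           # P_(X,Y)({a} x {b})
        prod = np.mean(X == a) * np.mean(Y == b)       # P_X({a}) * P_Y({b})
        print(a, b, round(joint, 3), round(prod, 3))   # each pair ~ 0.25
```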

References

  • Robert G. Bartle, The Elements of Integration and Lebesgue Measure (1995)
  • Gerald B. Folland, Real Analysis: Modern Techniques and Their Applications (1999)