This is a summary of definitions and concepts for those who have already studied measure theory and probability. It is intended to be consulted when a definition looks confusing or unfamiliar, or when a general review is needed.
Measure Theory
Algebras
An algebra of sets on a nonempty set $X$ is a nonempty collection $\mathcal{A}$ of subsets of $X$ that is closed under finite unions and complements.
A σ-algebra is an algebra that is closed under countable unions.
Note:
$\mathcal{A}$ is also closed under finite intersections, because $E_{1} \cap E_{2} = (E_{1}^{c} \cup E_{2}^{c})^{c} \in \mathcal{A}$ for $E_{1}, E_{2} \in \mathcal{A}$.
$\varnothing, X \in \mathcal{A}$, since if $E \in \mathcal{A}$ we have $\varnothing = E \cap E^{c} \in \mathcal{A}$ and $X = E \cup E^{c} \in \mathcal{A}$.
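As a quick sanity check, here is a minimal Python sketch (construction mine, not from the texts) that verifies the algebra axioms by brute force on a finite set; for a finite collection, closure under pairwise unions already gives closure under all finite unions.

```python
# A minimal sketch: brute-force check of the algebra axioms for a finite
# collection A of subsets of a finite set X.
def is_algebra(X, A):
    A = {frozenset(s) for s in A}
    if not A:
        return False  # an algebra must be a nonempty collection
    has_complements = all(frozenset(X - s) in A for s in A)
    has_unions = all((a | b) in A for a in A for b in A)  # pairwise suffices here
    return has_complements and has_unions

X = {1, 2, 3, 4}
print(is_algebra(X, [set(), {1, 2}, {3, 4}, X]))  # True
print(is_algebra(X, [set(), {1, 2}, X]))          # False: {1,2}^c = {3,4} missing
```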
If $X$ is any topological space, the σ-algebra generated by the family of open sets in $X$ is called the Borel σ-algebra on $X$ and is denoted by $\mathcal{B}_{X}$.
The Borel σ-algebra is the unique smallest σ-algebra containing all open sets.
Let $\mathcal{E}$ be a σ-algebra on $X$. Then $(X, \mathcal{E})$ is called a measurable space, and $E \in \mathcal{E}$ is called an ($\mathcal{E}$-)measurable set.
In the following, we shall consider a fixed measurable space $(X, \mathcal{E})$.
Measurable Functions
A function $f : X \to \mathbb{R}$ is said to be ($\mathcal{E}$-)measurable if for every real number $\alpha \in \mathbb{R}$ the set $\{ x \in X : f(x) > \alpha \}$ belongs to $\mathcal{E}$.
Generalization
Let $(X, \mathcal{E})$ and $(Y, \mathcal{F})$ be measurable spaces. A function $f : X \to Y$ is called ($\mathcal{E}$,$\mathcal{F}$)-measurable if $f^{-1}(F) = \{ x \in X : f(x) \in F \}$ belongs to $\mathcal{E}$ for all $F \in \mathcal{F}$.
Note: In the case $(Y, \mathcal{F}) = (\mathbb{R}, \mathcal{B}_{\mathbb{R}})$, this definition is equivalent to that of an $\mathcal{E}$-measurable function above.
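To make the preimage condition concrete, the following toy Python example (the finite space and σ-algebra are my own choices) checks whether $\{ x \in X : f(x) > \alpha \}$ lands in $\mathcal{E}$ for a few sample thresholds.

```python
# A toy check: on a finite space, verify that the sets {x : f(x) > alpha}
# land in the sigma-algebra E for sample thresholds.
X = [1, 2, 3, 4]
E = [frozenset(), frozenset({1, 2}), frozenset({3, 4}), frozenset(X)]

def is_measurable(f, thresholds):
    return all(frozenset(x for x in X if f[x] > a) in E for a in thresholds)

f = {1: 0.0, 2: 0.0, 3: 5.0, 4: 5.0}   # constant on the atoms {1,2}, {3,4}
g = {1: 0.0, 2: 1.0, 3: 2.0, 4: 3.0}   # separates points the atoms cannot
print(is_measurable(f, [-1.0, 0.0, 5.0]))  # True: preimages are X, {3,4}, ∅
print(is_measurable(g, [0.5]))             # False: {2,3,4} is not in E
```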
Measures
A measure on $\mathcal{E}$ (or on $(X, \mathcal{E})$, or simply on $X$ if $\mathcal{E}$ is understood) is a function $\mu : \mathcal{E} \to [0, \infty]$ such that
Null empty set: $\mu(\varnothing) = 0$.
Countable additivity: if $\{E_{j}\}$ is a sequence of disjoint sets in $\mathcal{E}$, then
\mu\left( \bigcup_{j} E_{j} \right) = \sum_{j} \mu(E_{j}).
A triple $(X, \mathcal{E}, \mu)$ is called a measure space, and we shall be working on a fixed measure space $(X, \mathcal{E}, \mu)$.
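For a concrete instance, the following sketch (example mine) encodes the Dirac measure $\delta_{0}$, with sets given as membership predicates, and spot-checks additivity on two disjoint sets.

```python
# A tiny example: the Dirac measure delta_0, delta_0(E) = 1 if 0 in E else 0;
# additivity is immediate for disjoint sets.
def dirac0(E):            # E is given as a predicate "is x in E?"
    return 1 if E(0.0) else 0

E1 = lambda x: x < -1     # disjoint from E2
E2 = lambda x: -1 <= x <= 1
print(dirac0(E1), dirac0(E2))                 # 0 1
print(dirac0(lambda x: E1(x) or E2(x)))       # 1 = 0 + 1 (additivity)
```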
A Borel measure on $\mathbb{R}$ is a measure whose domain is the Borel σ-algebra $\mathcal{B}_{\mathbb{R}}$:
\mu : \mathcal{B}_{\mathbb{R}} \to [0, \infty]
For (σ-finite) measures $\mu$ on $(X, \mathcal{E})$ and $\nu$ on $(Y, \mathcal{F})$, the product measure $\mu \times \nu$ is the unique measure on $\mathcal{E} \times \mathcal{F}$ such that $\mu \times \nu (E \times F) = \mu(E) \nu(F)$ for all rectangles $E \times F$.
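On finite spaces every measure is determined by the weights of its atoms, so the rectangle formula can be verified directly; the following Python sketch (toy weights mine) does exactly that.

```python
from itertools import product

# On finite spaces, a measure is a weight function on points, and the
# product measure assigns each pair the product of the weights.
mu = {"a": 0.5, "b": 1.5}   # measure on X = {a, b} (weights of atoms)
nu = {1: 2.0, 2: 1.0}       # measure on Y = {1, 2}

mu_x_nu = {(x, y): mu[x] * nu[y] for x, y in product(mu, nu)}

def measure(m, S):
    return sum(m[p] for p in S)

E, F = {"a"}, {1, 2}
rect = {(x, y) for x in E for y in F}
print(measure(mu_x_nu, rect))           # 1.5
print(measure(mu, E) * measure(nu, F))  # 1.5 = mu(E) * nu(F)
```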
The Integral
A real-valued function $f$ is simple if it takes only a finite number of values.
A simple measurable function $\varphi$ can be represented in the form
\varphi = \sum_{j=1}^{n} a_{j} \chi_{E_{j}}, \qquad E_{j} = \varphi^{-1}(\{ a_{j} \}), \quad \operatorname{range}(\varphi) = \{ a_{1}, \dots, a_{n} \}, \tag{1}
where $\chi_{E_{j}}$ is the characteristic function of $E_{j}$. We call this the standard representation of $\varphi$.
If $\varphi$ is a simple measurable function with standard representation $(1)$, we define the integral of $\varphi$ with respect to the measure $\mu$ by
\int \varphi\, d\mu := \sum_{j=1}^{n} a_{j} \mu(E_{j}).
Notation: $\int \varphi\, d\mu = \int \varphi = \int \varphi(x)\, d\mu(x)$, and $\int = \int_{X}$.
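The formula $\int \varphi\, d\mu = \sum_{j} a_{j} \mu(E_{j})$ can be checked numerically; in this sketch (grid approximation mine), Lebesgue measure of $E_{j} \subseteq [0, 1)$ is approximated by the fraction of grid points falling in $E_{j}$.

```python
import numpy as np

# Integral of a simple function via its standard representation, against an
# approximation of Lebesgue measure on [0, 1).
x = np.linspace(0.0, 1.0, 1000, endpoint=False)
phi = np.where(x < 0.5, 2.0, 3.0)   # simple: a_1 = 2 on E_1, a_2 = 3 on E_2

# sum_j a_j * mu(E_j), with mu(E_j) ~ fraction of grid points in E_j
values = np.unique(phi)
integral = sum(a * np.mean(phi == a) for a in values)
print(integral)  # 2 * 0.5 + 3 * 0.5 = 2.5
```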
If $f$ is a nonnegative measurable function on $(X, \mathcal{E})$, we define the integral of $f$ with respect to $\mu$ by
\int f\, d\mu := \sup \left\{ \int \varphi\, d\mu : 0 \le \varphi \le f,\ \varphi \text{ is simple and measurable} \right\}.
The positive and negative parts of $f : X \to \mathbb{R}$ are defined respectively as
f^{+}(x) := \max(f(x), 0), \qquad f^{-}(x) := \max(-f(x), 0).
If $\int f^{+}$ and $\int f^{-}$ are both finite, then we say that $f$ is integrable, and we set $\int f := \int f^{+} - \int f^{-}$. Also $f = f^{+} - f^{-}$ and $|f| = f^{+} + f^{-}$.
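The identities $f = f^{+} - f^{-}$ and $|f| = f^{+} + f^{-}$ are easy to confirm numerically, as in this short sketch (example mine).

```python
import numpy as np

# Positive and negative parts on a grid, and the decomposition identities.
x = np.linspace(-1.0, 1.0, 5)
f = x                                    # f(x) = x takes both signs
f_plus = np.maximum(f, 0.0)              # f^+ = max(f, 0)
f_minus = np.maximum(-f, 0.0)            # f^- = max(-f, 0)
print(np.allclose(f, f_plus - f_minus))           # True
print(np.allclose(np.abs(f), f_plus + f_minus))   # True
```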
The set of real-valued integrable functions is a vector space, and the integral is a linear functional on it. This vector space is denoted by
L = L(X, \mathcal{E}, \mu) = L(X, \mu) = L(X) = L(\mu), \qquad L = L^{1}.
$L^{p}$ Spaces
For a measure space $(X, \mathcal{E}, \mu)$ and $0 < p < \infty$, we define
L^{p}(X, \mathcal{E}, \mu) := \left\{ f : X \to \mathbb{R} \;\middle|\; f \text{ is measurable and } \left( \int |f|^{p}\, d\mu \right)^{1/p} < \infty \right\}.
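As a numeric illustration (the Riemann-sum approximation is my own), the $L^{p}$ norm of $f(x) = x$ on $([0, 1], \text{Lebesgue})$ is $(p + 1)^{-1/p}$, which a midpoint sum reproduces.

```python
import numpy as np

# Approximate the L^p norm (integral |f|^p dmu)^(1/p) on [0, 1] by a
# midpoint Riemann sum.
def lp_norm(f, p, n=100_000):
    x = np.linspace(0.0, 1.0, n, endpoint=False) + 0.5 / n
    return np.mean(np.abs(f(x)) ** p) ** (1.0 / p)

print(lp_norm(lambda x: x, p=2))  # ~0.5774
print((1.0 / 3.0) ** 0.5)         # exact: 1/sqrt(3) ≈ 0.5774
```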
Probability Theory
Notation and Terminology
| Analysts' Term | Probabilists' Term |
| --- | --- |
| Measure space $(X, \mathcal{E}, \mu)$ such that $\mu(X) = 1$ | Probability space $(\Omega, \mathcal{F}, P)$ |
| Measure $\mu : \mathcal{E} \to \mathbb{R}$ such that $\mu(X) = 1$ | Probability $P : \mathcal{F} \to \mathbb{R}$ |
| (σ-)algebra $\mathcal{E}$ on $X$ | (σ-)field $\mathcal{F}$ on $\Omega$ |
| Measurable set $E \in \mathcal{E}$ | Event $E \in \mathcal{F}$ |
| Measurable real-valued function $f : X \to \mathbb{R}$ | Random variable $X : \Omega \to \mathbb{R}$ |
| Integral of $f$, $\int f\, d\mu$ | Expectation of $X$, $E(X)$ |
| $f$ is $L^{p}$ | $X$ has finite $p$th moment |
| Almost everywhere, a.e. | Almost surely, a.s. |
\begin{align*}
\left\{ X > a \right\} &:= \left\{ \omega : X(\omega) > a \right\} \\
P\left( X > a \right) &:= P\left( \left\{ \omega : X(\omega) > a \right\} \right)
\end{align*}
Basic Definitions
For measurable spaces $(\Omega, \mathcal{F})$ and $(\mathbb{R}, \mathcal{B}_{\mathbb{R}})$, an $(\mathcal{F}, \mathcal{B}_{\mathbb{R}})$-measurable function $X : \Omega \to \mathbb{R}$ is called a random variable. Namely,
X^{-1}(B) \in \mathcal{F} \qquad \forall B \in \mathcal{B}_{\mathbb{R}}.
A probability (or probability measure) on $(\Omega, \mathcal{F})$ is a measure $P : \mathcal{F} \to \mathbb{R}$ such that $P(\Omega) = 1$.
If $X$ is a random variable, its expectation is defined as
E(X) := \int X\, dP.
The (probability) distribution of $X$ is a probability on $\mathbb{R}$, $P_{X} : \mathcal{B}_{\mathbb{R}} \to \mathbb{R}$, such that
P_{X}(B) := P(X^{-1}(B)).
The distribution function of $X$ is defined as
F_{X}(a) := P_{X}\left( (-\infty, a] \right) = P(X \le a).
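The distribution function can be estimated from samples by replacing $P$ with an empirical frequency; a small Monte Carlo sketch (setup mine, using a standard normal) follows.

```python
import numpy as np

rng = np.random.default_rng(0)

# Estimate F_X(a) = P(X <= a) from samples of a standard normal.
samples = rng.standard_normal(100_000)

def F_empirical(a):
    return np.mean(samples <= a)   # fraction of samples with X <= a

print(F_empirical(0.0))   # ~0.5, by symmetry of the normal about 0
print(F_empirical(1.96))  # ~0.975
```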
For any finite sequence of random variables $\{ X_{i} \}_{i=1}^{n}$, the random vector $(X_{1}, \dots, X_{n})$ is defined as a map from $\Omega \to \mathbb{R}^{n}$:
(X_{1}, \dots, X_{n})(\omega) := (X_{1}(\omega), \dots, X_{n}(\omega)).
Note: $(X_{1}, \dots, X_{n})^{-1}(B_{1} \times \cdots \times B_{n}) = X_{1}^{-1}(B_{1}) \cap \cdots \cap X_{n}^{-1}(B_{n})$, as derived below.
For $(X, Y) : \Omega \to \mathbb{R}^{2}$,
(X, Y)^{-1}(a, b) = \left\{ \omega \in \Omega : X(\omega) = a \right\} \cap \left\{ \omega \in \Omega : Y(\omega) = b \right\}.
Thus, for all Borel sets $B_{1}, B_{2} \in \mathcal{B}_{\mathbb{R}}$ we have
(X, Y)^{-1}(B_{1} \times B_{2}) = (X, Y)^{-1}(B_{1}, B_{2}) = X^{-1}(B_{1}) \cap Y^{-1}(B_{2}),
and extending to $\mathbb{R}^{n}$ we obtain
\begin{equation}
\begin{aligned}
(X_{1}, \dots, X_{n})^{-1}(B_{1} \times \cdots \times B_{n})
&= (X_{1}, \dots, X_{n})^{-1}(B_{1}, \dots, B_{n}) \\
&= X_{1}^{-1}(B_{1}) \cap \cdots \cap X_{n}^{-1}(B_{n}).
\end{aligned}
\tag{2}
\end{equation}
The joint distribution of $X_{1}, \dots, X_{n}$ is the probability distribution of $(X_{1}, \dots, X_{n})$:
P_{(X_{1}, \dots, X_{n})} : \mathcal{B}_{\mathbb{R}^{n}} \to \mathbb{R},
P_{(X_{1}, \dots, X_{n})}(B_{1} \times \cdots \times B_{n}) := P\left( (X_{1}, \dots, X_{n})^{-1}(B_{1} \times \cdots \times B_{n}) \right).
Independence
For an event $E$ such that $P(E) > 0$, the probability on $\Omega$ defined by
P_{E}(F) = P(F \mid E) := P(E \cap F) / P(E)
is called the conditional probability given $E$.
If $P_{E}(F) = P(F)$, then $F$ is said to be independent of $E$:
F \text{ is independent of } E \iff P(E \cap F) = P(E) P(F).
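A worked example (mine): on a fair die, $E = \{2, 4, 6\}$ and $F = \{1, 2, 3, 4\}$ satisfy the product rule exactly.

```python
from fractions import Fraction

# On a fair die, E = {even} and F = {1,2,3,4} are independent:
# P(E ∩ F) = P(E) P(F).
omega = range(1, 7)

def P(A):
    return Fraction(sum(1 for w in omega if w in A), 6)

E = {2, 4, 6}
F = {1, 2, 3, 4}
print(P(E), P(F), P(E & F))     # 1/2, 2/3, 1/3
print(P(E & F) == P(E) * P(F))  # True: F is independent of E
```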
A collection $\{ E_{j} \}$ of events in $\Omega$ is independent if every finite subcollection $E_{1}, \dots, E_{n}$ satisfies
P(E_{1} \cap \cdots \cap E_{n}) = P(E_{1}) P(E_{2}) \cdots P(E_{n}) = \prod_{j=1}^{n} P(E_{j}).
A collection $\{ X_{j} \}$ of random variables on $\Omega$ is independent if the events $\{ X_{j}^{-1}(B_{j}) \}$ are independent for all Borel sets $B_{j} \in \mathcal{B}_{\mathbb{R}}$, namely
P\left( X_{1}^{-1}(B_{1}) \cap \cdots \cap X_{n}^{-1}(B_{n}) \right) = \prod_{j=1}^{n} P\left( X_{j}^{-1}(B_{j}) \right).
From the left-hand side, by the definition of the distribution and $(2)$, we have
\begin{align*}
P\left( X_{1}^{-1}(B_{1}) \cap \cdots \cap X_{n}^{-1}(B_{n}) \right)
&= P\left( (X_{1}, \dots, X_{n})^{-1}(B_{1} \times \cdots \times B_{n}) \right) \\
&= P_{(X_{1}, \dots, X_{n})} \left( B_{1} \times \cdots \times B_{n} \right).
\end{align*}
On the other hand, from the right-hand side, by the definitions of the product measure and the distribution,
\prod_{j=1}^{n} P\left( X_{j}^{-1}(B_{j}) \right)
= \prod_{j=1}^{n} P_{X_{j}}(B_{j})
= \left( \prod_{j=1}^{n} P_{X_{j}} \right) \left( B_{1} \times \cdots \times B_{n} \right).
Therefore, if $\{ X_{j} \}$ are independent, then
P_{(X_{1}, \dots, X_{n})} = \prod_{j=1}^{n} P_{X_{j}}.
In other words, $\{ X_{j} \}$ is an independent set of random variables if and only if the joint distribution of $(X_{1}, \dots, X_{n})$ is the product of their individual distributions.
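This product rule for rectangles is easy to see in simulation; the following Monte Carlo sketch (distributions and Borel sets chosen by me) compares $P(X \in B_{1}, Y \in B_{2})$ with $P(X \in B_{1}) P(Y \in B_{2})$ for independently drawn samples.

```python
import numpy as np

rng = np.random.default_rng(1)

# For independent X, Y the joint probability of a rectangle factors:
# P(X in B1, Y in B2) = P(X in B1) P(Y in B2).
X = rng.standard_normal(1_000_000)
Y = rng.standard_normal(1_000_000)   # drawn independently of X

in_B1 = X > 0.5      # B1 = (0.5, ∞)
in_B2 = Y < -1.0     # B2 = (-∞, -1)
print(np.mean(in_B1 & in_B2))           # empirical joint probability
print(np.mean(in_B1) * np.mean(in_B2))  # product of marginals; ≈ equal
```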
References
Robert G. Bartle, The Elements of Integration and Lebesgue Measure (1995)
Gerald B. Folland, Real Analysis: Modern Techniques and Their Applications (1999)