Properties of Conditional Expectation

Theorem

Given a probability space $( \Omega , \mathcal{F} , P )$:

  • [1] From measure theory: If measurable functions $f$, $g$ are $\mathcal{F}$-measurable, where $\mathcal{F} = \sigma (f)$, then there exists a Borel function $h : \mathbb{R} \to \mathbb{R}$ satisfying $g = h (f)$.
  • [2] Application in probability theory: For random variables $X$ and $Y$, since $E(Y|X)$ is $\sigma (X)$-measurable, there exists a Borel function $h : \mathbb{R} \to \mathbb{R}$ satisfying $E(Y | X) = h(X)$.
  • [3]: If $X$ is $\mathcal{F}$-measurable, then $E(X|\mathcal{F}) = X \text{ a.s.}$
  • [4]: For the trivial sigma field $\mathcal{G} = \left\{ \emptyset , \Omega \right\}$, $E(X|\mathcal{G}) = E(X) \text{ a.s.}$
  • [5]: For a constant $c$ and every sigma field $\mathcal{G}$, $E(c|\mathcal{G}) = c \text{ a.s.}$
  • [6]: For a constant $c$, $E(cX | \mathcal{G}) = c E(X | \mathcal{G}) \text{ a.s.}$
  • [7]: $E(X+Y | \mathcal{G}) = E(X | \mathcal{G}) + E(Y| \mathcal{G}) \text{ a.s.}$
  • [8]: If $X \ge 0 \text{ a.s.}$, then $E(X | \mathcal{G}) \ge 0 \text{ a.s.}$
  • [9]: If $X \ge Y \text{ a.s.}$, then $E(X | \mathcal{G}) \ge E(Y | \mathcal{G}) \text{ a.s.}$
  • [10]: $\left| E( X | \mathcal{G} ) \right| \le E ( | X | \,|\, \mathcal{G} ) \text{ a.s.}$
  • [11]: For every sigma field $\mathcal{G}$, $E \left[ E ( X | \mathcal{G} ) \right] = E(X)$.

  • $\sigma (X) = \left\{ X^{-1} (B) : B \in \mathcal{B}(\mathbb{R}) \right\}$ denotes the smallest sigma field generated by the random variable $X$.
  • A function $Z$ being $\mathcal{F}$-measurable means that $Z^{-1} (B) \in \mathcal{F}$ holds for every Borel set $B \in \mathcal{B}(\mathbb{R})$.
  • A Borel function is a function $f : \mathbb{R} \to \mathbb{R}$ for which $f^{-1} (B)$ is a Borel set for every Borel set $B \in \mathcal{B}(\mathbb{R})$.

Explanation

  • [1], [2]: These theorems say that the conditional expectation of $Y$ given $X$ can be represented by some function of $X$: given the value of $X$, it takes the form $E(Y | X = a) = h(a)$. [2] is a corollary of [1], and together these results guarantee that the basic properties of expectation commonly used in elementary probability theory carry over, almost surely, to conditional expectation.
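As a quick numerical sketch of [2] (not part of the original theorem; all names below are illustrative): when $X$ takes finitely many values, the Monte Carlo estimate of $E(Y | X = a)$ for each value $a$ traces out exactly such a function $h$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Discrete X in {0, 1, 2}; Y depends on X plus independent noise.
n = 100_000
X = rng.integers(0, 3, size=n)
Y = X**2 + rng.normal(0.0, 1.0, size=n)

# Estimate h(a) = E(Y | X = a) by averaging Y over each level of X.
h = {a: Y[X == a].mean() for a in (0, 1, 2)}
print(h)  # approximately {0: 0.0, 1: 1.0, 2: 4.0}, i.e. h(a) ≈ a**2
```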

Linearity

  • [5]~[7]: The linearity of expectation, $E(aX + b | \mathcal{G}) = aE(X | \mathcal{G}) + b \text{ a.s.}$, is preserved even conditionally.
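A minimal simulation sketch of this linearity: the `cond_exp` helper below is hypothetical, estimating $E(\,\cdot\,| \sigma(X))$ by averaging within each level of a discrete $X$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
X = rng.integers(0, 3, size=n)        # conditioning variable; G = sigma(X)
Y = X + rng.normal(0.0, 1.0, size=n)  # an integrable random variable
a, b = 2.0, 5.0

def cond_exp(Z, X):
    """Estimate E(Z | sigma(X)) by averaging Z over each level of X."""
    out = np.empty_like(Z)
    for level in np.unique(X):
        out[X == level] = Z[X == level].mean()
    return out

lhs = cond_exp(a * Y + b, X)
rhs = a * cond_exp(Y, X) + b
print(np.abs(lhs - rhs).max())  # ~0 up to floating-point error
```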

Sigma Fields as Information

  • [3] $E(X | \mathcal{F}) = X$: Reading the formula, if the random variable $X$ is $\mathcal{F}$-measurable, the sigma field $\mathcal{F}$ carries all the information about $X$; this is precisely why $X$ is called measurable. $E(X|\mathcal{F})$ can therefore be read off directly, with no interference: an $X$ fully known on $\mathcal{F}$ does not need to be computed through $E$.
    • For example, consider a game where you earn as many dollars as the face shown by a fair 6-sided die; the expected earnings are 3.5 dollars. We compute this average precisely because we do not know which face will appear. If, however, the sigma field $\mathcal{F}$ were given in my mind before the die is thrown, the face $X$ could be measured exactly, and the payout would be known in advance. Even at a fee of 3.5 dollars per game, one could then skip the losing games and play only the winning ones, avoiding any loss (see the simulation sketch below). In this sense, random number hacking is an attack technique that steals the sigma field (the random number table) and turns what should have been random into something deterministic. If it succeeds, cryptographic systems that rely on randomness, such as bank security cards or OTPs, are compromised.

Since $\sigma (X)$ is defined as the smallest sigma field that knows everything about $X$, it naturally follows that $E(X| \sigma (X)) = X$, written $E(X|X) = X$ in the notation introduced above.
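Returning to the die game above, here is a minimal simulation sketch of the point (the 3.5-dollar fee is from the example; everything else is illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
rolls = rng.integers(1, 7, size=100_000)  # payoff in dollars = face of the die

# Without extra information, the game is fair at a fee of 3.5 dollars.
print(rolls.mean() - 3.5)                 # long-run profit per game ≈ 0

# Knowing sigma(X) before paying: play only when the known face beats the fee.
play = rolls > 3.5                        # faces 4, 5, 6
profit = np.where(play, rolls - 3.5, 0.0)
print(profit.mean())                      # ≈ (0.5 + 1.5 + 2.5)/6 = 0.75 > 0
```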

  • [4] $E(X|\mathcal{G}) = E(X)$: Reading the formula, the trivial sigma field $\mathcal{G} = \left\{ \emptyset , \Omega \right\}$ gives no information about $X$, so nothing better can be done than scanning the entire probability space $\Omega$ and computing $\displaystyle \int_{\Omega} X \, dP$.
  • [10] $\left| E( X | \mathcal{G} ) \right| \le E ( | X | \,|\, \mathcal{G} )$: Just as with ordinary absolute values, this inequality is equivalent to $- E ( | X | \,|\, \mathcal{G} ) \le E( X | \mathcal{G} ) \le E ( | X | \,|\, \mathcal{G} )$.
  • [11] $E \left[ E ( X | \mathcal{G} ) \right] = E(X)$: This equality is useful across many proofs in probability theory, mainly as a trick when $E(X)$ is hard to compute directly but, for a suitable $\mathcal{G}$, $E(X|\mathcal{G})$ is easy to compute.
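A small numerical sketch of [11]: the discrete generator `G` below is an assumption of the example, standing in for the sub-sigma-field $\sigma(G)$.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
G = rng.integers(0, 4, size=n)              # generator of the sub-sigma-field
X = 2.0 * G + rng.normal(0.0, 1.0, size=n)  # E(X) = 2 * E(G) = 3.0

cond = np.empty(n)
for level in np.unique(G):
    cond[G == level] = X[G == level].mean() # E(X | G) on each atom

print(cond.mean(), X.mean())                # both ≈ 3.0, as [11] predicts
```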

Proof

[1]

Define $h : \mathbb{R} \to \mathbb{R}$ for $z \in \mathbb{R}$ by $h(z) := g \left( f^{-1} ( \left\{ z \right\} ) \right)$.

Since $\left\{ z \right\} \in \mathcal{B}(\mathbb{R})$ and $f$ is $\mathcal{F}$-measurable, $f^{-1}(\left\{ z \right\}) \in \mathcal{F}$ holds; and since $g$ is $\sigma(f)$-measurable, $g$ is constant on each level set $f^{-1}(\left\{ z \right\})$, so $h$ is well defined and satisfies $g (\omega) = ( h \circ f ) ( \omega )$.

For every Borel set $B \in \mathcal{B}(\mathbb{R})$, consider $$ h^{-1}(B) = (f \circ g^{-1})(B) = f \left( g^{-1} (B) \right), $$ and since $g^{-1} (B) \in \mathcal{F}$, we have $f(g^{-1} (B) ) \in \mathcal{B}(\mathbb{R})$. As $h^{-1}(B) \in \mathcal{B}(\mathbb{R})$ holds for every $B \in \mathcal{B}(\mathbb{R})$, $h$ is a Borel function.

[2]

$E ( Y | X ) = E ( Y | \sigma (X) )$ is a $\sigma (X)$-measurable random variable by the definition of conditional expectation, and $X$ is also a $\sigma (X)$-measurable random variable by definition. Therefore, applying [1] with $\mathcal{F} = \sigma (X)$ and $$ f = X, \qquad g = E ( Y | X ), $$ there exists a Borel function $h : \mathbb{R} \to \mathbb{R}$ satisfying $E(Y|X) = h(X)$.

Strategy for [3]~[7]: convert each claim to integral form, show that the integrals agree on every set of the sigma field, and then apply the following theorem, referred to in this post as the Lebesgue Integration Lemma.

Properties of Lebesgue Integration: $$ \forall A \in \mathcal{F}, \quad \int_{A} f \, dm = 0 \iff f = 0 \text{ a.e.} $$
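Before the individual proofs, the defining property $\int_A E(X|\mathcal{G}) \, dP = \int_A X \, dP$ for all $A \in \mathcal{G}$ can be checked concretely on a toy finite space. This is an illustrative sketch only; the four-point space and partition are assumptions of the example.

```python
import numpy as np

# Four-point space Omega = {0, 1, 2, 3}; G is generated by the
# partition {0, 1} | {2, 3} (the atoms of the sub-sigma-field).
P = np.array([0.1, 0.2, 0.3, 0.4])
X = np.array([5.0, 1.0, 2.0, 7.0])
atoms = [np.array([0, 1]), np.array([2, 3])]

# E(X | G) is constant on each atom: the P-weighted average of X there.
condE = np.empty(4)
for A in atoms:
    condE[A] = (X[A] * P[A]).sum() / P[A].sum()

# Defining property: integrals of E(X | G) and X agree on every A in G.
for A in atoms:
    print((condE[A] * P[A]).sum(), (X[A] * P[A]).sum())  # equal pairs
```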

[3]

Since $X$ is $\mathcal{F}$-measurable and trivially satisfies $\displaystyle \int_{A} X \, dP = \int_{A} X \, dP$ for all $A \in \mathcal{F}$, $X$ itself meets the definition of the conditional expectation of $X$ with respect to $\mathcal{F}$, which is a.s. unique. Therefore, for all $A \in \mathcal{F}$, $$ \int_{A} E(X |\mathcal{F}) \, dP = \int_{A} X \, dP $$ and by the Lebesgue Integration Lemma, we have $X = E(X |\mathcal{F}) \text{ a.s.}$

[4]

By the definition of conditional expectation, we have $\displaystyle \int_{A} E(X |\mathcal{G}) \, dP = \int_{A} X \, dP$ for all $A \in \mathcal{G}$.

Case 1. $A = \emptyset$

$$ 0 = \int_{\emptyset} E(X |\mathcal{G}) \, dP = \int_{\emptyset} X \, dP = 0 $$

Case 2. $A = \Omega$

$$ \int_{\Omega} E(X |\mathcal{G}) \, dP = \int_{\Omega} X \, dP = E(X) = E(X) P(\Omega) = E(X) \int_{\Omega} 1 \, dP = \int_{\Omega} E(X) \, dP $$

In either case, $\displaystyle \int_{A} E(X |\mathcal{G}) \, dP = \int_{A} E(X) \, dP$ for all $A \in \mathcal{G}$, and since the constant $E(X)$ is $\mathcal{G}$-measurable, the Lebesgue Integration Lemma gives $E(X) = E(X |\mathcal{G}) \text{ a.s.}$

[5]

Since the constant $c$ is $\mathcal{G}$-measurable, as is $E(c | \mathcal{G})$, by the definition of conditional expectation, for all $A \in \mathcal{G}$, $$ \int_{A} E(c |\mathcal{G}) \, dP = \int_{A} c \, dP $$ and therefore, by the Lebesgue Integration Lemma, we have $c = E(c | \mathcal{G}) \text{ a.s.}$

[6]

By the definition of conditional expectation and the linearity of the Lebesgue integral, for all $A \in \mathcal{G}$, $$ \begin{align*} \int_{A} E( cX |\mathcal{G}) \, dP =& \int_{A} cX \, dP \\ =& c \int_{A} X \, dP \\ =& c \int_{A} E(X|\mathcal{G}) \, dP \\ =& \int_{A} c E(X|\mathcal{G}) \, dP \end{align*} $$ and by the Lebesgue Integration Lemma, we have $E( cX |\mathcal{G}) = c E(X|\mathcal{G}) \text{ a.s.}$

[7]

By the definition of conditional expectation and the linearity of the Lebesgue integral, for all $A \in \mathcal{G}$, $$ \begin{align*} \int_{A} E( X+Y |\mathcal{G}) \, dP =& \int_{A} (X+Y) \, dP \\ =& \int_{A} X \, dP + \int_{A} Y \, dP \\ =& \int_{A} E(X|\mathcal{G}) \, dP + \int_{A} E(Y|\mathcal{G}) \, dP \\ =& \int_{A} \left[ E(X|\mathcal{G}) + E(Y|\mathcal{G}) \right] dP \end{align*} $$ and by the Lebesgue Integration Lemma, $$ E( X +Y |\mathcal{G}) = E(X|\mathcal{G}) + E(Y|\mathcal{G}) \text{ a.s.} $$

[8]

Let $A := \left\{ E(X|\mathcal{G}) < 0 \right\} \in \mathcal{G}$. If $P(A) > 0$, then $\displaystyle \int_{A} E(X|\mathcal{G}) \, dP < 0$, while by the definition of conditional expectation and $X \ge 0 \text{ a.s.}$, $\displaystyle \int_{A} E(X|\mathcal{G}) \, dP = \int_{A} X \, dP \ge 0$, a contradiction. Thus $P(A) = 0$, and $E( X |\mathcal{G}) \ge 0 \text{ a.s.}$ must hold.

[9]

Since $X \ge Y \text{ a.s.}$, we have $Z := X - Y \ge 0 \text{ a.s.}$, so by [8], $$ E(X-Y | \mathcal{G}) \ge 0 \text{ a.s.} $$ and by the linearity of conditional expectation [6], [7], $$ E(X| \mathcal{G}) - E(Y | \mathcal{G}) \ge 0 \text{ a.s.} $$

[10]

Part 1. $X \ge 0$

If $X \ge 0$, then $|X| = X$ holds, leading to $$ E( |X| \,|\, \mathcal{G}) = E(X|\mathcal{G}) $$

By [8], we have $E(X|\mathcal{G}) \ge 0$, so likewise $E(X|\mathcal{G}) = \left| E(X|\mathcal{G}) \right|$, and therefore $$ E( |X| \,|\, \mathcal{G}) = E(X|\mathcal{G}) = \left| E(X|\mathcal{G}) \right| $$


Part 2. $X < 0$

If $X < 0$, then $|X| = -X \ge 0$, so by [6] and [8], $E(-X|\mathcal{G}) = -E(X|\mathcal{G}) \ge 0$, and therefore $$ E( |X| \,|\, \mathcal{G}) = E( -X |\mathcal{G}) = - E(X |\mathcal{G}) = \left| E(X|\mathcal{G}) \right| $$


Part 3. $X = X^{+} - X^{-}$

By the linearity [6], [7] and the triangle inequality, $$ \left| E(X|\mathcal{G}) \right| = \left| E( X^{+} |\mathcal{G}) - E( X^{-} |\mathcal{G}) \right| \le \left| E( X^{+} |\mathcal{G}) \right| + \left| E( X^{-} |\mathcal{G}) \right| $$ Given $X^{+} , X^{-} \ge 0$, by Part 1, $$ \left| E(X|\mathcal{G}) \right| \le E( X^{+} |\mathcal{G}) + E( X^{-} | \mathcal{G}) $$

By [7] and the decomposition of the absolute value $|X| = X^{+} + X^{-}$, $$ \begin{align*} \left| E(X|\mathcal{G}) \right| \le & E( X^{+} + X^{-} \,|\, \mathcal{G}) \\ =& E( \left| X \right| \,|\, \mathcal{G}) \text{ a.s.} \end{align*} $$

[11]

$$ \begin{align*} E \left[ E( X | \mathcal{G} ) \right] =& \int_{\Omega} E ( X | \mathcal{G} ) \, d P \\ =& \int_{\Omega} X \, d P \\ =& E(X) \end{align*} $$ where the second equality takes $A = \Omega \in \mathcal{G}$ in the definition of conditional expectation.