Properties of Conditional Expectation

Theorem

Given a probability space $( \Omega , \mathcal{F} , P)$:

  • [1] From measure theory (Doob–Dynkin lemma): If a function $g$ is $\sigma (f)$-measurable for some measurable function $f$, then there exists a Borel function $h : \mathbb{R} \to \mathbb{R}$ satisfying $g = h (f)$.
  • [2] Application in probability theory: For random variables $X$ and $Y$, the conditional expectation $E(Y | X)$ is $\sigma (X)$-measurable, so there exists a Borel function $h : \mathbb{R} \to \mathbb{R}$ satisfying $E(Y | X) = h(X)$.
  • [3]: If $X$ is $\mathcal{F}$-measurable, then $$E(X|\mathcal{F}) =X \text{ a.s.}$$.
  • [4]: For sigma field $\mathcal{G} = \left\{ \emptyset , \Omega \right\}$, $$E(X|\mathcal{G}) = E(X) \text{ a.s.}$$.
  • [5]: For a constant $c$ and every sigma field $\mathcal{G}$, $$E(c|\mathcal{G}) = c \text{ a.s.}$$.
  • [6]: For constant $c$, $$E(cX | \mathcal{G}) = c E(X | \mathcal{G}) \text{ a.s.}$$.
  • [7]: $$E(X+Y | \mathcal{G}) = E(X | \mathcal{G}) + E(Y| \mathcal{G}) \text{ a.s.}$$.
  • [8]: If $X \ge 0 \text{ a.s.}$, then $$E(X | \mathcal{G}) \ge 0 \text{ a.s.}$$.
  • [9]: If $X \ge Y \text{ a.s.}$, then $$E(X | \mathcal{G}) \ge E(Y | \mathcal{G}) \text{ a.s.}$$.
  • [10]: $$\left| E( X | \mathcal{G} ) \right| \le E ( | X | | \mathcal{G} ) \text{ a.s.}$$.
  • [11]: For all sigma fields $\mathcal{G}$, $$E \left[ E ( X | \mathcal{G} ) \right] = E(X)$$.

  • $\sigma (X) = \left\{ X^{-1} (B) : B \in \mathcal{B}(\mathbb{R}) \right\}$ is the smallest sigma field on $\Omega$ with respect to which $X$ is measurable, called the sigma field generated by the random variable $X$ (see the sketch after this list).
  • A function $Z$ being $\mathcal{F}$-measurable means for all Borel sets $B \in \mathcal{B}(\mathbb{R})$, $Z^{-1} (B) \in \mathcal{F}$ holds.
  • A Borel function is a function $f : \mathbb{R} \to \mathbb{R}$ for which $f^{-1} (B)$ is a Borel set for all Borel sets $B \in \mathcal{B}(\mathbb{R})$.
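When the range of $X$ is finite, every Borel set $B$ matters only through its intersection with that range, so $\sigma (X)$ can be enumerated directly. Below is a minimal sketch in Python; the toy model (a parity random variable on $\Omega = \left\{ 1, \dots, 6 \right\}$) is our own illustration, not from the post:

```python
from itertools import chain, combinations

# Toy sample space: Omega = {1, ..., 6}, X = parity of the outcome.
# sigma(X) collects the preimages X^{-1}(B) over all subsets B of X's range.
Omega = frozenset(range(1, 7))
X = {w: w % 2 for w in Omega}                 # X(w) is 0 (even) or 1 (odd)

def preimage(B):
    return frozenset(w for w in Omega if X[w] in B)

range_X = sorted(set(X.values()))
subsets = chain.from_iterable(combinations(range_X, r)
                              for r in range(len(range_X) + 1))
sigma_X = {preimage(set(B)) for B in subsets}

for A in sorted(sigma_X, key=lambda A: (len(A), sorted(A))):
    print(sorted(A))    # [], [1, 3, 5], [2, 4, 6], [1, 2, 3, 4, 5, 6]
```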

Explanation

  • [1],[2]: These theorems say that the conditional expectation of $Y$ given $X$ can be written as a function of $X$ alone: given the value $X = a$, it reads $E(Y | X = a) = h(a)$. [2] is a corollary of [1], and it is what guarantees, almost surely, the basic properties of expectation familiar from elementary probability theory. A numerical sketch of the representation follows below.
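The representation $E(Y|X) = h(X)$ can be seen empirically: bin the values of $X$ and average $Y$ within each bin. A minimal Monte Carlo sketch, assuming numpy and an illustrative model $Y = X^{2} + \text{noise}$ of our own choosing (nothing here is prescribed by the theorem):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative model: Y = X**2 + noise, so the Doob-Dynkin representation
# should recover h(a) = E(Y | X = a) close to a**2.
n = 200_000
X = rng.uniform(-1, 1, n)
Y = X**2 + rng.normal(0, 0.1, n)

bins = np.linspace(-1, 1, 21)
idx = np.digitize(X, bins)                    # bin index of each sample
for k in (5, 10, 16):
    a = (bins[k - 1] + bins[k]) / 2           # bin center, standing in for X = a
    h_a = Y[idx == k].mean()                  # empirical E(Y | X in bin k)
    print(f"a = {a:+.2f}   h(a) ~ {h_a:.4f}   a**2 = {a**2:.4f}")
```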

Linearity

  • [5]~[7]: The linearity of expectation, $E(aX + b | \mathcal{G}) = aE(X | \mathcal{G}) + b \text{ a.s.}$, is preserved under conditioning; a quick numerical check follows below.
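With a discrete conditioning variable, $E( \cdot | \sigma (Z))$ is just a per-group average, so conditional linearity reduces to linearity of the average. A quick check under an arbitrary illustrative setup, assuming numpy:

```python
import numpy as np

rng = np.random.default_rng(1)

# Condition on a discrete Z: E( . | sigma(Z)) is the average within each group.
n = 100_000
Z = rng.integers(0, 3, n)                     # Z generates sigma(Z)
X = rng.normal(0, 1, n) + Z                   # X depends on Z

a, b = 2.0, -1.0
for z in range(3):
    m = Z == z
    lhs = (a * X[m] + b).mean()               # empirical E(aX + b | Z = z)
    rhs = a * X[m].mean() + b                 # a E(X | Z = z) + b
    # agree up to floating-point error, since the group mean is itself linear
    print(f"Z = {z}: E(aX+b|Z) ~ {lhs:.4f},  aE(X|Z)+b ~ {rhs:.4f}")
```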

Sigma Fields as Information

  • [3] $E(X | \mathcal{F}) = X$: Reading the formula, if the random variable $X$ is $\mathcal{F}$-measurable, then the sigma field $\mathcal{F}$ carries all the information about $X$; this is exactly why it is called measurable. Accordingly, $E(X|\mathcal{F})$ can be read off directly without any interference: an $X$ that is fully known on $\mathcal{F}$ does not need to be computed through $E$.
    • For example, consider a game where you earn as many dollars as the pips shown on a 6-sided die; the expected earnings are 3.5 dollars. We compute this average precisely because we don’t know which face will appear. If, however, the sigma field $\mathcal{F}$ were given in my mind before the throw, the face of the die $X$ could be measured exactly, and I would know in advance how many dollars I will receive. Even if each game costs 3.5 dollars, by skipping the losing games and playing only the winning ones, I could avoid all losses. In this sense, random number hacking is an attack that steals the sigma field (the random number table) and turns what should have been random into something deterministic. If it succeeds, cryptographic systems that rely on randomness, such as bank security cards or OTPs, are compromised.

Since $\sigma (X)$ is by definition the smallest sigma field that knows everything about $X$, it follows naturally that $E(X| \sigma (X)) = X$, written $E(X|X) = X$ in the notation introduced earlier. The simulation below illustrates both facts.
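A simulation of the dice game above, assuming numpy. Conditioning on $\sigma (X)$ amounts to grouping by the outcome itself, which returns $X$ unchanged:

```python
import numpy as np

rng = np.random.default_rng(2)

# Fair 6-sided die paying 1 dollar per pip shown.
rolls = rng.integers(1, 7, 100_000)
print("E(X) ~", round(rolls.mean(), 3))       # ~ 3.5: no information available

# Knowing sigma(X) means knowing the outcome, so E(X | X = k) = k.
for k in range(1, 7):
    print(f"E(X | X = {k}) ~ {rolls[rolls == k].mean():.1f}")
```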

  • [4] $E(X|\mathcal{G}) = E(X)$: Considering the meaning of the formula, the trivial sigma field $\mathcal{G} = \left\{ \emptyset , \Omega \right\}$ gives no information about $X$, thus necessitating a scan of the entire probability space $\Omega$ to calculate $\displaystyle \int_{\Omega} X d P$.
  • [10] $\left| E( X | \mathcal{G} ) \right| \le E ( | X | | \mathcal{G} )$: Since $- |X| \le X \le |X|$, applying [9] and [6] gives $$ - E ( | X | | \mathcal{G} ) \le E( X | \mathcal{G} ) \le E ( | X | | \mathcal{G} ) $$
  • [11] $E \left[ E ( X | \mathcal{G} ) \right] = E(X)$: This equality is useful throughout probability theory, chiefly as a trick when $E(X)$ is hard to compute directly but $E(X|\mathcal{G})$ becomes easy once a suitable $\mathcal{G}$ is given; a simulation check follows below.
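A simulation of [11] with an illustrative two-stage model of our own: $N \sim \text{Poisson}(4)$ and $X | N \sim \text{Binomial}(N, 0.3)$. Here $E(X | N) = 0.3 N$ is immediate, so the tower property gives $E(X) = 0.3 \, E(N) = 1.2$ without ever touching the marginal distribution of $X$:

```python
import numpy as np

rng = np.random.default_rng(3)

# Two-stage model: N ~ Poisson(4), then X | N ~ Binomial(N, 0.3).
n = 500_000
N = rng.poisson(4, n)
X = rng.binomial(N, 0.3)                      # one binomial draw per sampled N

print("E(X)      ~", X.mean())                # direct Monte Carlo estimate
print("E[E(X|N)] ~", (0.3 * N).mean())        # tower property: average of 0.3*N
# Both approach 0.3 * E(N) = 1.2
```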

Proof

[1]

First, suppose $g$ is a simple function, $g = \sum_{i=1}^{n} c_{i} \mathbf{1}_{E_{i}}$. Since $g$ is $\sigma (f)$-measurable, each $E_{i} \in \sigma (f)$, so there exists a Borel set $A_{i} \in \mathcal{B}(\mathbb{R})$ with $E_{i} = f^{-1} (A_{i})$. Then $h := \sum_{i=1}^{n} c_{i} \mathbf{1}_{A_{i}}$ is a Borel function satisfying $g = h (f)$.

For a general $g$, choose simple $\sigma (f)$-measurable functions $g_{n}$ converging to $g$ pointwise, and Borel functions $h_{n}$ with $g_{n} = h_{n} (f)$ as above. Define $$ h(z) := \begin{cases} \limsup_{n \to \infty} h_{n} (z) & , \text{if the limit superior is finite} \\ 0 & , \text{otherwise} \end{cases} $$

Then $h$ is a Borel function, and for every $\omega \in \Omega$, $$ h \left( f(\omega) \right) = \lim_{n \to \infty} h_{n} \left( f ( \omega) \right) = \lim_{n \to \infty} g_{n} (\omega) = g(\omega) $$ so $g = h(f)$.

[2]

By the definition of conditional expectation, $E ( Y | X ) = E ( Y | \sigma (X) )$ is a $\sigma (X)$-measurable random variable, and $X$ is trivially $\sigma (X)$-measurable. Therefore, applying [1] with $$ f = X \\ g = E ( Y | X ) $$ there exists a Borel function $h : \mathbb{R} \to \mathbb{R}$ satisfying $E(Y|X) = h(X)$.

Strategy for [3]~[7]: Convert to integral form, show that the definite integrals agree, and then apply the following theorem, which we will call the Lebesgue Integration Lemma in this post.

Properties of Lebesgue Integration: $$ \forall A \in \mathcal{F}, \int_{A} f dm = 0 \iff f = 0 \text{ a.e.} $$

[3]

Trivially, $\displaystyle \int_{A} X dP = \int_{A} X dP$ holds for all $A \in \mathcal{F}$, and since $X$ is $\mathcal{F}$-measurable, $X$ satisfies the defining property of the conditional expectation of $X$ with respect to $\mathcal{F}$. Hence for all $A \in \mathcal{F}$, $$ \int_{A} E(X |\mathcal{F}) dP = \int_{A} X dP $$ that is, $\displaystyle \int_{A} \left[ E(X |\mathcal{F}) - X \right] dP = 0$, and by the Lebesgue Integration Lemma, we have $X = E(X |\mathcal{F}) \text{ a.s.}$.

[4]

By the definition of conditional expectation, for all $A \in \mathcal{G}$ we have $\displaystyle \int_{A} E(X |\mathcal{G}) dP = \int_{A} X dP$. Since $\mathcal{G} = \left\{ \emptyset , \Omega \right\}$, only two cases arise.

Case 1. $A = \emptyset$

$$ 0 = \int_{\emptyset} E(X |\mathcal{G}) dP = \int_{\emptyset} X dP = 0 $$


Case 2. $A = \Omega$

$$ \int_{\Omega} E(X |\mathcal{G}) dP = \int_{\Omega} X dP = E(X) = E(X) P(\Omega) = E(X) \int_{\Omega} 1 dP = \int_{\Omega} E(X) dP $$


In either case, $\displaystyle \int_{A} E(X |\mathcal{G}) dP = \int_{A} E(X) dP$ for all $A \in \mathcal{G}$, so by the Lebesgue Integration Lemma, we have $E(X |\mathcal{G}) = E(X) \text{ a.s.}$.

[5]

The constant function $c$ is $\mathcal{G}$-measurable, and by the definition of conditional expectation, for all $A \in \mathcal{G}$, $$ \int_{A} E(c |\mathcal{G}) dP = \int_{A} c dP $$ and therefore, by the Lebesgue Integration Lemma, we have $c = E(c | \mathcal{G}) \text{ a.s.}$.

[6]

By the definition of conditional expectation and the linearity of Lebesgue integration, for all $A \in \mathcal{G}$, $$ \begin{align*} \int_{A} E( cX |\mathcal{G}) dP =& \int_{A} cX dP \\ =& c \int_{A} X dP \\ =& c \int_{A} E(X|\mathcal{G}) dP \\ =& \int_{A} c E(X|\mathcal{G}) dP \end{align*} $$ and by the Lebesgue Integration Lemma, we have $E( cX |\mathcal{G}) = c E(X|\mathcal{G}) \text{ a.s.}$.

[7]

By the definition of conditional expectation and the linearity of Lebesgue integration, for all $A \in \mathcal{G}$, $$ \begin{align*} \int_{A} E( X+Y |\mathcal{G}) dP =& \int_{A} (X+Y) dP \\ =& \int_{A} X dP +\int_{A} Y dP \\ =& \int_{A} E(X|\mathcal{G}) dP + \int_{A} E(Y|\mathcal{G}) dP \\ =& \int_{A} \left[ E(X|\mathcal{G}) + E(Y|\mathcal{G}) \right] dP \end{align*} $$ and by the Lebesgue Integration Lemma, $$ E( X +Y |\mathcal{G}) = E(X|\mathcal{G}) + E(Y|\mathcal{G}) \text{ a.s.} $$

[8]

Let $A := \left\{ \omega : E(X|\mathcal{G})(\omega) < 0 \right\}$; since $E(X|\mathcal{G})$ is $\mathcal{G}$-measurable, $A \in \mathcal{G}$. By the definition of conditional expectation and $X \ge 0 \text{ a.s.}$, $$ \int_{A} E(X|\mathcal{G}) dP = \int_{A} X dP \ge 0 $$ but the integrand on the left is strictly negative on $A$, which forces $P(A) = 0$. Therefore $E( X |\mathcal{G}) \ge 0 \text{ a.s.}$ must hold.

[9]

Since $X \ge Y \text{ a.s.}$, we have $Z := X - Y \ge 0 \text{ a.s.}$, so by [8], $$ E(X-Y | \mathcal{G}) \ge 0 \text{ a.s.} $$ and by the linearity of conditional expectation [6], [7], $$ E(X| \mathcal{G}) - E(Y | \mathcal{G}) \ge 0 \text{ a.s.} $$

[10]

Part 1. $X \ge 0$

If $X \ge 0$, then $|X| = X$ holds, leading to $$ E( |X| |\mathcal{G}) = E(X|\mathcal{G}) $$

By [8], we have $E(X|\mathcal{G}) \ge 0$ as well, hence $E(X|\mathcal{G}) = \left| E(X|\mathcal{G}) \right|$ and $$ E( |X| |\mathcal{G}) = E(X|\mathcal{G}) = \left| E(X|\mathcal{G}) \right| $$


Part 2. $X < 0$

If $X < 0$, then $|X| = -X$, and by [9] we have $E(X |\mathcal{G}) \le 0$, so by [6], $$ E( |X| |\mathcal{G}) = E( -X |\mathcal{G}) = - E(X |\mathcal{G}) = \left| E(X|\mathcal{G}) \right| $$


Part 3. $X = X^{+} - X^{-}$

By [7] and the triangle inequality, $$ \left| E(X|\mathcal{G}) \right| = \left| E( X^{+} |\mathcal{G}) - E( X^{-} |\mathcal{G}) \right| \le \left| E( X^{+} |\mathcal{G}) \right| + \left| E( X^{-} |\mathcal{G}) \right| $$ Given $X^{+} , X^{-} \ge 0$, by Part 1, $$ \left| E(X|\mathcal{G}) \right| \le E( X^{+} |\mathcal{G}) + E( X^{-} | \mathcal{G}) $$

By [7] and the decomposition of the absolute value $|X| = X^{+} + X^{-}$, $$ \begin{align*} \left| E(X|\mathcal{G}) \right| \le & E( X^{+} + X^{-} | \mathcal{G}) \\ =& E( \left| X \right| | \mathcal{G}) \text{ a.s.} \end{align*} $$

[11]

$$ \begin{align*} E \left[ E( X | \mathcal{G} ) \right] =& \int_{\Omega} E ( X | \mathcal{G} ) d P \\ =& \int_{\Omega} X d P \\ =& E(X) \end{align*} $$ where the middle equality holds by the definition of conditional expectation, since $\Omega \in \mathcal{G}$.