Smoothing Properties of Conditional Expectation

Theorem

Given a probability space $( \Omega , \mathcal{F} , P )$ and sub-sigma fields $\mathcal{G}, \mathcal{G} ' \subset \mathcal{F}$, assume $X$ and $Y$ are random variables.

  • [1]: If $X$ is $\mathcal{G}$-measurable, then $E(XY | \mathcal{G}) = X E (Y | \mathcal{G}) \text{ a.s.}$
  • [2]: If $\mathcal{G} ' \subset \mathcal{G}$, then $E (X | \mathcal{G} ') = E \left( E ( X | \mathcal{G}) | \mathcal{G} ' \right) = E \left( E ( X | \mathcal{G} ') | \mathcal{G} \right)$

  • $\mathcal{G}$ being a sub-sigma field of $\mathcal{F}$ means both are sigma fields of $\Omega$, but $\mathcal{G} \subset \mathcal{F}$.
  • $X$ being a $\mathcal{G}$-measurable function means that for every Borel set $B \in \mathcal{B}(\mathbb{R})$, we have $X^{-1} (B) \in \mathcal{G}$.

Description

When dealing with conditional expectations, a sigma field can be viewed as ‘information’ about a random variable. In particular, the smoothing properties are best understood intuitively rather than by fixating on the mathematical proof:

  • [1]: The fact that $X$, which is not a scalar, can pass through $E$ is not only surprising but also makes the identity convenient to use everywhere. That $X$ is $\mathcal{G}$-measurable means the sigma field $\mathcal{G}$ has all the information about $X$. Since it already knows $X$ itself, there is no need to compute an expectation; $X$ simply passes through $E$. Mathematically, $X$ is not a scalar, but once $\mathcal{G}$ is given, $X$ becomes a determined value, in effect a scalar.
  • [2]: That $\mathcal{G}$ is a sub-sigma field of $\mathcal{F}$ can be understood as meaning $\mathcal{G}$ has less information than $\mathcal{F}$. Looking at the formulas, one sees that regardless of the order in which the expectations are taken, the result is that of the sigma field with less information. To interpret this intuitively (a numerical sketch follows this list):
    • $E (X | \mathcal{G} ') = E \left( E ( X | \mathcal{G}) | \mathcal{G} ' \right)$: Even if $\mathcal{G}$ provides a lot of information about $X$, the lack of information in $\mathcal{G} '$ means the result is an expected value at the level of $\mathcal{G} '$.
    • $E (X | \mathcal{G} ') = E \left( E ( X | \mathcal{G} ') | \mathcal{G} \right)$: $\mathcal{G} ' \subset \mathcal{G}$ means that whatever information $\mathcal{G} '$ has about $X$ is already known to $\mathcal{G}$, so the expectation obtained is again at the level of $\mathcal{G} '$.
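
Both properties can be checked concretely on a finite probability space, where a sigma field is just a partition of $\Omega$ and conditional expectation is the block-wise weighted average. Below is a minimal sketch of that check; the helper `cond_exp` and the particular partitions are illustrative choices, not anything fixed by the theorem.

```python
import numpy as np

rng = np.random.default_rng(0)

# Finite probability space: Omega = {0, ..., 7} with uniform measure P.
P = np.full(8, 1 / 8)

# Sigma fields as partitions of Omega; every block of G' is a union of blocks of G,
# so G' is coarser (less information) and G is finer (more information).
G_prime = [[0, 1, 2, 3], [4, 5, 6, 7]]
G = [[0, 1], [2, 3], [4, 5], [6, 7]]

def cond_exp(Z, partition):
    """E(Z | partition): on each block, replace Z by its P-weighted block average."""
    out = np.empty_like(Z, dtype=float)
    for block in partition:
        out[block] = np.average(Z[block], weights=P[block])
    return out

# X is G-measurable (constant on every block of G); Y and W are arbitrary.
X = np.array([2.0, 2.0, -1.0, -1.0, 0.5, 0.5, 3.0, 3.0])
Y = rng.normal(size=8)
W = rng.normal(size=8)

# [1]: since G already "knows" X, it factors out of the conditional expectation.
assert np.allclose(cond_exp(X * Y, G), X * cond_exp(Y, G))

# [2]: in either order, iterated conditioning lands at the coarser level G'.
assert np.allclose(cond_exp(W, G_prime), cond_exp(cond_exp(W, G), G_prime))
assert np.allclose(cond_exp(W, G_prime), cond_exp(cond_exp(W, G_prime), G))
```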

Proof

[1]

Strategy: start with indicator functions, generalize to simple functions, extend to non-negative random variables by monotone convergence, and finally handle arbitrary random variables by splitting them into positive and negative parts.


Part 1. $M \in \mathcal{G}$, $X = \mathbb{1}_{M}$

For all $A \in \mathcal{G}$,
$$\begin{align*} \int_{A} E ( XY | \mathcal{G} ) dP =& \int_{A} XY dP \\ =& \int_{A} \mathbb{1}_{M} Y dP \\ =& \int_{A \cap M} Y dP \\ =& \int_{A \cap M} E(Y | \mathcal{G}) dP \\ =& \int_{A} \mathbb{1}_{M} E(Y | \mathcal{G}) dP \\ =& \int_{A} X E(Y | \mathcal{G}) dP \end{align*}$$
where the fourth equality uses the defining property of $E(Y | \mathcal{G})$, valid since $A \cap M \in \mathcal{G}$. Since $\displaystyle \forall A \in \mathcal{F}, \int_{A} f dm = 0 \iff f = 0 \text{ a.e.}$,
$$E ( XY | \mathcal{G} ) = X E(Y | \mathcal{G}) \text{ a.s.}$$


Part 2. $M_{i} \in \mathcal{G}$, $\displaystyle X = \sum_{i=1}^{n} a_{i} \mathbb{1}_{M_{i}}$

By linearity of conditional expectation,
$$\begin{align*} E(XY | \mathcal{G} ) =& E \left( \sum_{i=1}^{n} a_{i} \mathbb{1}_{M_{i}} Y | \mathcal{G} \right) \\ =& \sum_{i=1}^{n} a_{i} E( \mathbb{1}_{M_{i}} Y | \mathcal{G} ) \end{align*}$$
Here, by Part 1, $E( \mathbb{1}_{M_{i}} Y | \mathcal{G} ) = \mathbb{1}_{M_{i}} E( Y | \mathcal{G} )$, so
$$\begin{align*} E(XY | \mathcal{G} ) =& \sum_{i=1}^{n} a_{i} E( \mathbb{1}_{M_{i}} Y | \mathcal{G} ) \\ =& \sum_{i=1}^{n} a_{i} \mathbb{1}_{M_{i}} E( Y | \mathcal{G} ) \\ =& X E(Y | \mathcal{G}) \text{ a.s.} \end{align*}$$


Part 3. $X \ge 0$, $Y \ge 0$

Define a sequence of simple functions $\left\{ X_{n} \right\}_{n \in \mathbb{N}}$ satisfying $X_{n} \nearrow X$ as follows.
$$X_{n} := \sum_{k=1}^{n 2^{n}} {{k-1} \over {2^{n}}} \mathbb{1}_{ \left( {{k-1} \over {2^{n}}} \le X < {{k} \over {2^{n}}} \right)}$$
Then each $X_{n}$ is also $\mathcal{G}$-measurable, since $X$ is, and $X_{n} Y \nearrow XY$. Since $X_{n}$ can pass through $E$ by Part 2, the conditional monotone convergence theorem gives (a short code illustration follows)
$$\begin{align*} E(XY | \mathcal{G} ) =& E \left( \lim_{n \to \infty} X_{n} Y | \mathcal{G} \right) \\ =& \lim_{n \to \infty} E \left( X_{n} Y | \mathcal{G} \right) \\ =& \lim_{n \to \infty} X_{n} E \left( Y | \mathcal{G} \right) \\ =& X E(Y | \mathcal{G}) \text{ a.s.} \end{align*}$$
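
The dyadic construction above is simple to see in code. Below is a small sketch (the function name `dyadic_approx` is mine): on $\left\{ X < n \right\}$ the sum collapses to $\lfloor 2^{n} X \rfloor / 2^{n}$, and no indicator fires where $X \ge n$, so the value there stays $0$ until $n$ grows past $X$.

```python
import numpy as np

def dyadic_approx(X, n):
    """X_n from Part 3: equals (k-1)/2^n on {(k-1)/2^n <= X < k/2^n}, k = 1, ..., n*2^n."""
    return np.where(X < n, np.floor(2.0**n * X) / 2.0**n, 0.0)

X = np.array([0.3, 1.7, 2.5])
for n in (1, 2, 4, 8):
    print(n, dyadic_approx(X, n))
# n=1: [0.       1.5      0.  ]   (2.5 >= 1, so its value is still 0)
# n=2: [0.25     1.5      0.  ]
# n=4: [0.25     1.6875   2.5 ]   (each coordinate climbs monotonically toward X)
# n=8: [0.296875 1.69921875 2.5]
```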


Part 4. $X \ge 0$

Write $Y = Y^{+} - Y^{-}$; then, according to Part 3,
$$\begin{align*} E(XY | \mathcal{G} ) =& E(XY^{+} | \mathcal{G} ) - E(XY^{-} | \mathcal{G} ) \\ =& XE(Y^{+} | \mathcal{G} ) - XE(Y^{-} | \mathcal{G} ) \\ =& X E(Y | \mathcal{G}) \text{ a.s.} \end{align*}$$

Part 5. General case

Write $X = X^{+} - X^{-}$; then, according to Part 4,
$$\begin{align*} E(XY | \mathcal{G} ) =& E(X^{+}Y | \mathcal{G} ) - E(X^{-}Y | \mathcal{G} ) \\ =& X^{+}E(Y | \mathcal{G} ) - X^{-}E(Y | \mathcal{G} ) \\ =& X E(Y | \mathcal{G}) \text{ a.s.} \end{align*}$$
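
With the general case established, [1] can also be sanity-checked by simulation with signed $X$ and $Y$, mirroring Parts 4 and 5. A sketch under assumed toy distributions (the four-level $Z$ generating $\mathcal{G}$ and the values of $f$ are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 100_000

# G is generated by a discrete Z; X = f(Z) is G-measurable and takes both signs.
Z = rng.integers(0, 4, size=N)
f = np.array([2.0, -1.0, 0.5, -3.0])
X = f[Z]
Y = rng.normal(loc=Z - 1.5, size=N)   # Y also takes both signs

# On {Z = z}, X is the constant f(z), so E(XY | Z=z) = f(z) E(Y | Z=z):
for z in range(4):
    mask = Z == z
    lhs = (X[mask] * Y[mask]).mean()   # sample version of E(XY | Z=z)
    rhs = f[z] * Y[mask].mean()        # sample version of X E(Y | Z=z)
    assert np.isclose(lhs, rhs)        # exact up to floating-point roundoff
```

This is the content of the intuition above: conditioned on $\mathcal{G}$, the $\mathcal{G}$-measurable $X$ behaves as a constant and factors out.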

[2]

Part 1. $E (X | \mathcal{G} ') = E \left( E ( X | \mathcal{G}) | \mathcal{G} ' \right)$

For all $A \in \mathcal{G} '$,
$$\begin{align*} \int_{A} E (X | \mathcal{G} ') dP =& \int_{A} X dP \\ =& \int_{A} E(X | \mathcal{G}) dP \\ =& \int_{A} E \left( E(X | \mathcal{G}) | \mathcal{G} ' \right) dP \end{align*}$$
where the second equality holds because $A \in \mathcal{G} ' \subset \mathcal{G}$ and the third because $A \in \mathcal{G} '$. Since $\displaystyle \forall A \in \mathcal{F}, \int_{A} f dm = 0 \iff f = 0 \text{ a.e.}$,
$$E (X | \mathcal{G} ') = E \left( E ( X | \mathcal{G}) | \mathcal{G} ' \right)$$


Part 2. $E (X | \mathcal{G} ') = E \left( E ( X | \mathcal{G} ') | \mathcal{G} \right)$

Since $\mathcal{G} ' \subset \mathcal{G}$, the random variable $E (X | \mathcal{G} ')$ is $\mathcal{G}$-measurable, and $E(1 | \mathcal{G}) = 1$. Hence, by [1],
$$\begin{align*} E (X | \mathcal{G} ') =& E (X | \mathcal{G} ') \cdot E (1 | \mathcal{G}) \\ =& E \left( E ( X | \mathcal{G} ') \cdot 1 | \mathcal{G} \right) \end{align*}$$