
Conditional Variance Defined by Measure Theory

Definition

Given a probability space $( \Omega , \mathcal{F} , P)$ and a sub sigma field $\mathcal{G} \subset \mathcal{F}$, let $X$ and $Y$ be random variables.

The $\text{Var}$ defined as follows is called the variance of $X$ given $\mathcal{G}$. $$ \text{Var} ( X | \mathcal{G}) := E \left[ (X - E(X | \mathcal{G}))^2 | \mathcal{G} \right] $$


  • That $\mathcal{G}$ is a sub sigma field of $\mathcal{F}$ means that both are sigma fields on $\Omega$ and that $\mathcal{G} \subset \mathcal{F}$.
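
To make the definition concrete, here is a minimal sketch of my own (not from the original post) in which $\mathcal{G}$ is generated by a finite partition of $\Omega$; in that case $E(X|\mathcal{G})$ and $\text{Var}(X|\mathcal{G})$ reduce to block-wise averages. The sample space, probabilities, and values of $X$ below are all hypothetical.

```python
import numpy as np

# Hypothetical example: Omega = {0,...,5} with uniform P, and G generated
# by the partition blocks {0,1,2} and {3,4,5}.
p = np.full(6, 1 / 6)                               # P({w}) = 1/6
X = np.array([1., 2., 3., 10., 20., 30.])           # a random variable on Omega
blocks = [np.array([0, 1, 2]), np.array([3, 4, 5])]

# E(X|G) and Var(X|G) are G-measurable, i.e. constant on each block.
E_X_G = np.empty_like(X)
Var_X_G = np.empty_like(X)
for b in blocks:
    pb = p[b].sum()
    m = (X[b] * p[b]).sum() / pb                         # E(X|G) on this block
    E_X_G[b] = m
    Var_X_G[b] = ((X[b] - m) ** 2 * p[b]).sum() / pb     # E[(X - E(X|G))^2 | G]

print(E_X_G)      # [ 2.  2.  2. 20. 20. 20.]
print(Var_X_G)    # [ 0.667  0.667  0.667  66.667  66.667  66.667] (approximately)
```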

Theorem

  • [1]: $\text{Var}( X |\mathcal{G}) = E(X^2 | \mathcal{G}) - \left[ E(X | \mathcal{G}) \right]^2$
  • [2]: $\text{Var}(X) = E \left( \text{Var}(X | \mathcal{G}) \right) + \text{Var}(E(X | \mathcal{G}))$

Explanation

Variance is generally harder to handle than the mean, and conditional quantities are confusing on top of that, but fortunately, once measure theory is introduced, developing these formulas becomes easier than it is with elementary tools alone.

What to remember is that the properties of variance we have always used comfortably still hold, conditional or not, and not to forget the $| \mathcal{G}$. In a moment of confusion you might end up writing something like $E \left[ (X - E(X | \mathcal{G}))^2 \right]$, but the conditional variance is computed under $\mathcal{G}$ alone, whereas $E \left[ (X - E(X | \mathcal{G}))^2 \right]$ is a garbage value computed, out of nowhere, over the entire $\mathcal{F}$.

  • [2]: It is used as a trick in many places when developing formulas. For an intuitive understanding, rearrange it into the following form. $$ \text{Var}( E ( X | \mathcal{G} )) = \text{Var}(X) - E ( \text{Var}( X | \mathcal{G} )) $$ Expectation and variance are not literally these things, but in statistics the parameter of interest is typically the population mean $\mu$, and the variance measures how far an estimate $\overline{X}$ may end up from the actual population mean, in other words a risk. Here the sigma field $\mathcal{G}$ can be thought of as a body of information given about the random variable $X$, and since $E(\text{Var}(X|\mathcal{G})) \ge 0$, $\text{Var}( E ( X | \mathcal{G} ))$ can be understood as the original risk $\text{Var}(X)$ reduced by the information $\mathcal{G}$. This coincides with the intuition that more information leads to more accurate predictions; a numerical check is sketched right below.
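
As an aside of my own, the decomposition in [2] is easy to check by simulation. The sketch below assumes a hypothetical hierarchical model $Z \sim N(0,1)$, $X \mid Z \sim N(Z, 1)$ with $\mathcal{G} = \sigma(Z)$, so that $E(X|\mathcal{G}) = Z$ and $\text{Var}(X|\mathcal{G}) = 1$ are known in closed form.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Hypothetical model: Z ~ N(0,1), X | Z ~ N(Z, 1), and G = sigma(Z).
Z = rng.normal(size=n)
X = rng.normal(loc=Z, scale=1.0)

var_X = X.var()            # Var(X), theoretically 1 + 1 = 2
E_cond_var = 1.0           # E(Var(X|G)), since Var(X|G) = 1 identically
var_cond_mean = Z.var()    # Var(E(X|G)) = Var(Z), theoretically 1

print(var_X)                          # approximately 2
print(E_cond_var + var_cond_mean)     # approximately 2, matching Theorem [2]
```

If the information is thrown away (replacing $\mathcal{G}$ by the trivial sigma field), $\text{Var}(E(X|\mathcal{G}))$ drops to $0$ and all of the risk moves into $E(\text{Var}(X|\mathcal{G}))$, which is the extreme case of the reading above.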

Proof

[1]

Strategy: The proof method is as straightforward as in the unconditional case, but the process itself is extremely important. The $E(X |\mathcal{G})$ appearing in the middle is $\mathcal{G}$-measurable by the definition of conditional expectation. Although $E(X |\mathcal{G})$ is itself a random variable, being $\mathcal{G}$-measurable means that once $\mathcal{G}$ is given, all of its information is known, so in essence it can be treated as a scalar $\mu = E(X |\mathcal{G})$ whose value is already determined.


Smoothing property: If $X$ is $\mathcal{G}$-measurable then $$E(XY | \mathcal{G}) = X E (Y | \mathcal{G}) \text{ a.s.}$$

Since $E(X |\mathcal{G})$ is $\mathcal{G}$-measurable, according to the smoothing property $$ \begin{align*} E \left[ (X - E(X | \mathcal{G}))^2 | \mathcal{G} \right] =& E \left[ X^2 -2 X E(X | \mathcal{G}) + \left( E(X | \mathcal{G}) \right)^2 | \mathcal{G} \right] \\ =& E (X^2 | \mathcal{G}) -2 E \left[ X E(X | \mathcal{G}) | \mathcal{G} \right] + E \left[ \left( E (X | \mathcal{G}) \right)^2 | \mathcal{G} \right] \\ =& E (X^2 | \mathcal{G}) -2 E(X | \mathcal{G}) E \left[ X | \mathcal{G} \right] + \left( E (X | \mathcal{G}) \right)^2 \\ =& E (X^2 | \mathcal{G}) - \left( E (X | \mathcal{G}) \right)^2 \end{align*} $$
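
The step where $E(X | \mathcal{G})$ is pulled out of the conditional expectation can also be seen numerically. The following sketch of my own, with a hypothetical finite partition playing the role of $\mathcal{G}$, checks that $E(XY|\mathcal{G}) = X E(Y|\mathcal{G})$ when $X$ is $\mathcal{G}$-measurable, i.e. constant on each partition block.

```python
import numpy as np

# Hypothetical finite example: Omega = {0,...,5}, uniform P,
# G generated by the partition blocks {0,1,2} and {3,4,5}.
p = np.full(6, 1 / 6)
blocks = [np.array([0, 1, 2]), np.array([3, 4, 5])]
X = np.array([2., 2., 2., -1., -1., -1.])   # constant on each block, hence G-measurable
Y = np.array([1., 4., 7., 0., 3., 9.])      # an arbitrary random variable

def cond_exp(W):
    """E(W|G): replace W by its average over each partition block."""
    out = np.empty_like(W)
    for b in blocks:
        out[b] = (W[b] * p[b]).sum() / p[b].sum()
    return out

print(cond_exp(X * Y))     # E(XY|G)
print(X * cond_exp(Y))     # X * E(Y|G): identical, as the smoothing property says
```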

[2]

Property of conditional expectation: For all sigma fields $\mathcal{G}$ $$E \left[ E ( X | \mathcal{G} ) \right] = E(X)$$

$$ \begin{align*} E \left( \text{Var}(X | \mathcal{G}) \right) =& E \left( E (X^2 | \mathcal{G}) - \left( E (X | \mathcal{G}) \right)^2 \right) \\ =& E \left( X^2 - \left( E (X | \mathcal{G}) \right)^2 \right) \\ =& E \left( X^2 \right) - E \left( \left( E \left( X | \mathcal{G} \right) \right)^2 \right) \end{align*} $$ By [1] $$ \begin{align*} \text{Var}(E(X | \mathcal{G})) =& E \left( \left( E \left( X | \mathcal{G} \right) \right)^2 \right) - \left( E \left( E (X | \mathcal{G}) \right) \right)^2 \\ =& E \left( \left( E \left( X | \mathcal{G} \right) \right)^2 \right) - \left( E (X ) \right)^2 \end{align*} $$ Adding both sides of the equations we obtained $$ \begin{align*} E \left( \text{Var}(X | \mathcal{G} ) \right) + \text{Var}(E(X | \mathcal{G})) =& E \left( X^2 \right) - \left( E (X ) \right)^2 \\ =& \text{Var}(X) \end{align*} $$
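
As a closing sanity check of my own, the decomposition in [2] can also be verified exactly on a finite sample space; the sketch below reuses the hypothetical partition example from the Definition section.

```python
import numpy as np

# Hypothetical finite example: Omega = {0,...,5}, uniform P,
# G generated by the partition blocks {0,1,2} and {3,4,5}.
p = np.full(6, 1 / 6)
blocks = [np.array([0, 1, 2]), np.array([3, 4, 5])]
X = np.array([1., 2., 3., 10., 20., 30.])

def cond_exp(W):
    """E(W|G): replace W by its average over each partition block."""
    out = np.empty_like(W)
    for b in blocks:
        out[b] = (W[b] * p[b]).sum() / p[b].sum()
    return out

def E(W):
    """Expectation under P."""
    return (W * p).sum()

def Var(W):
    """Variance under P."""
    return E(W ** 2) - E(W) ** 2

E_X_G = cond_exp(X)
Var_X_G = cond_exp((X - E_X_G) ** 2)    # Var(X|G) straight from the definition

print(Var(X))                           # Var(X)
print(E(Var_X_G) + Var(E_X_G))          # E(Var(X|G)) + Var(E(X|G)): the same number
```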