Probability Distributions under Conditional Probability in Mathematical Statistics

Definition 1

  1. For a discrete random vector $(X, Y)$, let $p_{X, Y}$ be the joint probability mass function of $(X, Y)$ and let $p_{X}$ be the marginal probability mass function of $X$. Then the following $p_{Y | X}$ is called the conditional probability mass function of $Y$ given $X = x$. $$ p_{Y | X} (y | x) = \dfrac{p_{X, Y}(x, y)}{p_{X}(x)} $$

  2. For a continuous random vector $(X, Y)$, let $f_{X, Y}$ be the joint probability density function of $(X, Y)$ and let $f_{X}$ be the marginal probability density function of $X$. Then the following $f_{Y | X}$ is called the conditional probability density function of $Y$ given $X = x$.

$$ f_{Y | X} (y | x) = \dfrac{f_{X, Y}(x, y)}{f_{X}(x)} $$
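As a quick numerical illustration of Definition 1, the sketch below divides a hypothetical $3 \times 4$ joint pmf table by the marginal of $X$ to recover the conditional pmf; every row of the result sums to 1. The table values are made up for the example.

```python
import numpy as np

# Hypothetical joint pmf table for (X, Y): rows index x, columns index y,
# entries sum to 1.
p_xy = np.array([[0.10, 0.05, 0.05, 0.10],
                 [0.05, 0.20, 0.05, 0.10],
                 [0.10, 0.05, 0.10, 0.05]])

p_x = p_xy.sum(axis=1)               # marginal pmf of X
p_y_given_x = p_xy / p_x[:, None]    # p_{Y|X}(y | x) = p_{X,Y}(x, y) / p_X(x)

print(p_y_given_x[0])                # conditional pmf of Y given X = first value
print(p_y_given_x.sum(axis=1))       # each row sums to 1
```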

Generalization

  1. For a discrete random vector $(X_{1}, \dots, X_{n})$, let $p_{1, \dots, n}$ be the joint probability mass function of $(X_{1}, \dots, X_{n})$ and let $p_{1, \dots, n-1}$ be the joint probability mass function of $(X_{1}, \dots, X_{n-1})$. Then the following $p_{n | 1, \dots, n-1}$ is called the conditional probability mass function of $X_{n}$ given $X_{1} = x_{1}, \dots, X_{n-1} = x_{n-1}$. $$ \begin{equation} p_{n | 1, \dots, n-1} (x_{n} | x_{1}, \dots, x_{n-1}) = \dfrac{p_{1, \dots, n}(x_{1}, \dots, x_{n})}{p_{1, \dots, n-1}(x_{1}, \dots, x_{n-1})} \end{equation} $$

  2. For a continuous random vector $(X_{1}, \dots, X_{n})$, let $f_{1, \dots, n}$ be the joint probability density function of $(X_{1}, \dots, X_{n})$ and let $f_{1, \dots, n-1}$ be the joint probability density function of $(X_{1}, \dots, X_{n-1})$. Then the following $f_{n | 1, \dots, n-1}$ is called the conditional probability density function of $X_{n}$ given $X_{1} = x_{1}, \dots, X_{n-1} = x_{n-1}$. $$ \begin{equation} f_{n | 1, \dots, n-1} (x_{n} | x_{1}, \dots, x_{n-1}) = \dfrac{f_{1, \dots, n}(x_{1}, \dots, x_{n})}{f_{1, \dots, n-1}(x_{1}, \dots, x_{n-1})} \end{equation} $$

  3. For a discrete random vector $(X_{1}, \dots, X_{n})$, the following $p_{2, \cdots , n | 1}$ is called the joint conditional probability mass function of $ X_{2}, \cdots , X_{n}$ given $X_{1} = x_{1}$. $$ \begin{equation} p_{2, \cdots , n | 1} ( x_{2} , \cdots ,x_{n} | x_{1} ) = {{ p_{1, \cdots , n}(x_{1} , x_{2} , \cdots , x_{n}) } \over { p_{1}( x_{1} ) }} \end{equation} $$

  4. For a continuous random vector $(X_{1}, \dots, X_{n})$, the following $f_{2, \cdots , n | 1}$ is called the joint conditional probability density function of $ X_{2}, \cdots , X_{n}$ given $X_{1} = x_{1}$. $$ \begin{equation} f_{2, \cdots , n | 1} ( x_{2} , \cdots ,x_{n} | x_{1} ) = {{ f_{1, \cdots , n}(x_{1} , x_{2} , \cdots , x_{n}) } \over { f_{1}( x_{1} ) }} \end{equation} $$
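The generalizations $(1)$ and $(3)$ amount to the same division applied along different axes of a higher-dimensional array. A minimal sketch, assuming a hypothetical random joint pmf of $(X_{1}, X_{2}, X_{3})$ with each variable binary:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical joint pmf of (X1, X2, X3), each variable taking values in {0, 1}.
p = rng.random((2, 2, 2))
p /= p.sum()

p_12 = p.sum(axis=2)          # joint pmf of (X1, X2)
p_1 = p.sum(axis=(1, 2))      # marginal pmf of X1

# (1): conditional pmf of X3 given X1 = x1, X2 = x2
p_3_given_12 = p / p_12[:, :, None]

# (3): joint conditional pmf of (X2, X3) given X1 = x1
p_23_given_1 = p / p_1[:, None, None]

print(p_3_given_12.sum(axis=2))        # every (x1, x2) slice sums to 1
print(p_23_given_1.sum(axis=(1, 2)))   # every x1 slice sums to 1
```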

Expected Value

  1. Given a function $u$ of $X_{2} , \cdots , X_{n}$, the following is called the conditional expected value of $u( X_{2}, \cdots , X_{n} )$ given $X_{1} = x_{1}$. $$ \begin{align*} & E \left[ u \left( X_{2} , \cdots , X_{n} \right) | x_{1} \right] \\ =& \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} u (x_{2} , \cdots , x_{n}) f_{2 , \cdots , n | 1} (x_{2} , \cdots, x_{n} | x_{1}) dx_{2} \cdots dx_{n} \end{align*} $$
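The definition above is written for the continuous case; the sketch below works with the discrete analogue, replacing the integrals with sums. It assumes a hypothetical joint pmf of $(X_{1}, X_{2})$ on a $3 \times 3$ grid and uses $u(x_{2}) = x_{2}^{2}$ as the example function.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical joint pmf of (X1, X2) on {0, 1, 2} x {0, 1, 2}.
p = rng.random((3, 3))
p /= p.sum()

x2_vals = np.array([0.0, 1.0, 2.0])

def u(x2):
    return x2 ** 2                       # example function u(X2)

p_1 = p.sum(axis=1)                      # marginal pmf of X1
p_2_given_1 = p / p_1[:, None]           # conditional pmf of X2 given X1 = x1

# E[u(X2) | X1 = x1] as a sum over x2, one value per x1
cond_exp = (u(x2_vals)[None, :] * p_2_given_1).sum(axis=1)
print(cond_exp)
```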

Theorem

  • [1] For convenience, omit the subscripts and write $p(x_{n} | x_{1}, \dots, x_{n-1}) = p_{n | 1, \dots, n-1} (x_{n} | x_{1}, \dots, x_{n-1})$ and $p(x_{1}, \dots, x_{n}) = p_{1, \dots, n}(x_{1}, \dots, x_{n})$. Then the following holds. $$ p(x_{1}, \dots, x_{n}) = p(x_{1}) p(x_{2} | x_{1}) p(x_{3} | x_{2}, x_{1}) \cdots p(x_{n} | x_{1}, \dots, x_{n-1}) $$ or $$ p(x_{1}, \dots, x_{n}) = p(x_{n}) p(x_{n-1} | x_{n}) p(x_{n-2} | x_{n-1}, x_{n}) \cdots p(x_{1} | x_{2}, \dots, x_{n}) $$

    • [1-1] If $\left\{ X_{t} \right\}$ is a Markov chain, then the following holds. $$ \begin{align*} p(x_{1}, \dots, x_{n}) &= p(x_{1}) p(x_{2} | x_{1}) p(x_{3} | x_{2}) \cdots p(x_{n} | x_{n-1}) \\ &= p(x_{1}) \prod_{t=2}^{n} p(x_{t} | x_{t-1}) \end{align*} $$ Moreover, the following holds. $$ p(x_{2}, \dots, x_{n} | x_{1}) = \prod_{t=2}^{n} p(x_{t} | x_{t-1}) $$
  • [2] $p(x_{1}, x_{2} | x_{3}, x_{4}) = p(x_{1} | x_{2}, x_{3}, x_{4}) p(x_{2} | x_{3}, x_{4})$ or $$ p(x_{1} | x_{2}, x_{3}, x_{4}) = \dfrac{p(x_{1}, x_{2} | x_{3}, x_{4})}{p(x_{2} | x_{3}, x_{4})} $$

  • [3] Conditional Variance: $$ \begin{align*} \Var (X_{2} | X_{1} = x_{1}) =& E \left[ \left( X_{2} - E (X_{2} | X_{1} = x_{1}) \right)^{2} | X_{1} = x_{1} \right] \\ =& E \left( X_{2}^{2} | X_{1} = x_{1} \right) - \left[ E(X_{2} | X_{1} = x_{1}) \right]^{2} \end{align*} $$

  • [4]: $E \left[ E (X_{2} | X_{1}) \right] = E (X_{2} )$

  • [5]: If $\Var(X_{2})$ exists, then $\Var \left[ E \left( X_{2} | X_{1} \right) \right] \le \Var (X_{2})$
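The identities [3], [4], and [5] are easy to sanity-check numerically. The following is a minimal sketch assuming a hypothetical random joint pmf of $(X_{1}, X_{2})$ on a $4 \times 4$ grid of values $0, 1, 2, 3$; it computes the conditional mean and variance of $X_{2}$ given $X_{1} = x_{1}$ and compares both sides of [4] and [5].

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical joint pmf of (X1, X2) on a 4x4 grid of values.
p = rng.random((4, 4))
p /= p.sum()
x2_vals = np.arange(4.0)

p_1 = p.sum(axis=1)              # marginal pmf of X1
p_2 = p.sum(axis=0)              # marginal pmf of X2
p_2_given_1 = p / p_1[:, None]   # conditional pmf of X2 given X1 = x1

# [3]: conditional mean and conditional variance of X2 given X1 = x1
m = (x2_vals[None, :] * p_2_given_1).sum(axis=1)
v = ((x2_vals[None, :] - m[:, None]) ** 2 * p_2_given_1).sum(axis=1)

# [4]: E[E(X2 | X1)] = E(X2)
print((m * p_1).sum(), (x2_vals * p_2).sum())

# [5]: Var[E(X2 | X1)] <= Var(X2)
var_cond_mean = ((m - (m * p_1).sum()) ** 2 * p_1).sum()
var_x2 = ((x2_vals - (x2_vals * p_2).sum()) ** 2 * p_2).sum()
print(var_cond_mean, var_x2, var_cond_mean <= var_x2)
```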

Explanation

Conditional probability and conditional expectation are among the most computation-heavy topics in mathematical statistics, just as they are in introductory courses, and the amount of calculation only grows in the multivariate case. Of course, the concept of conditioning is worth the complexity. Mathematical statistics relies mostly on calculus, but once the subject is reformulated as measure-theoretic probability theory, these calculations become considerably cleaner. The key takeaway is: don't dismiss it, but don't obsess over it either.

$(3)$ and $(4)$ can be obtained by repeatedly applying $(1)$ and $(2)$.

Proof

[1]

Multiplying both sides of $(1)$ by the denominator gives the following:

$$ p(x_{1}, \dots, x_{n-1})p(x_{n} | x_{1}, \dots, x_{n-1}) = p(x_{1}, \dots, x_{n}) $$ $$ \begin{equation} \implies p(x_{1}, \dots, x_{n}) = p(x_{1}, \dots, x_{n-1})p(x_{n} | x_{1}, \dots, x_{n-1}) \end{equation} $$

Reapplying $(5)$ to $p(x_{1}, \dots, x_{n-1})$ gives:

$$ \begin{align*} &p(x_{1}, \dots, x_{n}) \\ &= \big[ p(x_{1}, \dots, x_{n-2}) p(x_{n-1} | x_{1}, \dots, x_{n-2}) \big] p(x_{n} | x_{1}, \dots, x_{n-1}) \end{align*} $$

Reapplying $(5)$ to $p(x_{1}, \dots, x_{n-2})$ results in:

$$ \begin{align*} &p(x_{1}, \dots, x_{n}) \\ &= \big[ p(x_{1}, \dots, x_{n-3}) p(x_{n-2} | x_{1}, \dots, x_{n-3}) \big] p(x_{n-1} | x_{1}, \dots, x_{n-2}) p(x_{n} | x_{1}, \dots, x_{n-1}) \end{align*} $$

Continuing this repetition eventually gives:

$$ p(x_{1}, \dots, x_{n}) = p(x_{1}) p(x_{2} | x_{1}) p(x_{3} | x_{2}, x_{1}) \cdots p(x_{n} | x_{1}, \dots, x_{n-1}) $$

If $\left\{ X_{t} \right\}$ is a Markov chain, since $p(x_{n} | x_{n-1}, \dots, x_{1}) = p(x_{n} | x_{n-1})$ holds, the following expression is obtained:

$$ p(x_{1}, \dots, x_{n}) = p(x_{1}) p(x_{2} | x_{1}) p(x_{3} | x_{2}) \cdots p(x_{n} | x_{n-1}) = p(x_{1}) \prod_{t=2}^{n} p(x_{t} | x_{t-1}) $$

By $(3)$, since $p(x_{1}, \dots, x_{n}) = p(x_{1}) p(x_{2}, \dots, x_{n} | x_{1})$, comparing with the expression above yields:

$$ p(x_{1}, \dots, x_{n}) = p(x_{1}) \prod_{t=2}^{n} p(x_{t} | x_{t-1}) = p(x_{1}) p(x_{2}, \dots, x_{n} | x_{1}) $$ $$ \implies p(x_{2}, \dots, x_{n} | x_{1}) = \prod_{t=2}^{n} p(x_{t} | x_{t-1}) $$

Also, applying indices in reverse order in the definition of conditional probability yields:

$$ \begin{align*} p(x_{1}, \dots, x_{n}) &= p(x_{2}, \dots, x_{n}) p(x_{1} | x_{2}, \dots, x_{n}) \\ &= p(x_{3}, \dots, x_{n}) p(x_{2} | x_{3}, \dots, x_{n}) p(x_{1} | x_{2}, \dots, x_{n}) \\ &= \vdots \\ &= p(x_{n}) p(x_{n-1} | x_{n}) p(x_{n-2} | x_{n-1}, x_{n}) \cdots p(x_{1} | x_{2}, \dots, x_{n}) \end{align*} $$
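The Markov factorization in [1-1] can also be illustrated numerically: build the joint pmf from a hypothetical initial distribution and transition matrix, then check that dividing by the marginal of $X_{1}$, as in $(3)$, reproduces the product of transition probabilities. The initial pmf and transition matrix below are made-up values for the sketch.

```python
import numpy as np

# Hypothetical 2-state Markov chain with n = 4 steps: initial pmf pi and
# transition matrix P, where P[i, j] = p(x_t = j | x_{t-1} = i).
pi = np.array([0.3, 0.7])
P = np.array([[0.6, 0.4],
              [0.2, 0.8]])
n = 4

# Build the full joint pmf p(x1, ..., x4) from the factorization in [1-1].
joint = np.zeros((2,) * n)
for idx in np.ndindex(*joint.shape):
    prob = pi[idx[0]]
    for a, b in zip(idx, idx[1:]):
        prob *= P[a, b]
    joint[idx] = prob

# The marginal of X1 recovered from the joint matches the initial pmf.
p_1 = joint.sum(axis=(1, 2, 3))
print(p_1, pi)

# p(x2, x3, x4 | x1), computed as in (3), equals the product of transitions.
cond = joint / p_1[:, None, None, None]
x1, x2, x3, x4 = 0, 1, 1, 0
print(cond[x1, x2, x3, x4], P[x1, x2] * P[x2, x3] * P[x3, x4])
```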

[2]

$$ \begin{align*} p(x_{1}, x_{2} | x_{3}, x_{4}) &= \dfrac{p(x_{1}, x_{2}, x_{3}, x_{4})}{p(x_{3}, x_{4})} \\ &= \dfrac{p(x_{1}, x_{2}, x_{3}, x_{4})}{p(x_{3}, x_{4})} \dfrac{p(x_{2}, x_{3}, x_{4})}{p(x_{2}, x_{3}, x_{4})} \\ &= \dfrac{p(x_{1}, x_{2}, x_{3}, x_{4})}{p(x_{2}, x_{3}, x_{4})} \dfrac{p(x_{2}, x_{3}, x_{4})}{p(x_{3}, x_{4})} \\ &= p(x_{1} | x_{2}, x_{3}, x_{4}) p(x_{2} | x_{3}, x_{4}) \end{align*} $$
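The chain of equalities above translates directly into array operations. A short sketch, assuming a hypothetical random joint pmf of $(X_{1}, X_{2}, X_{3}, X_{4})$ with each variable binary:

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical joint pmf of (X1, X2, X3, X4), each variable binary.
p = rng.random((2, 2, 2, 2))
p /= p.sum()

p_34 = p.sum(axis=(0, 1))            # p(x3, x4)
p_234 = p.sum(axis=0)                # p(x2, x3, x4)

lhs = p / p_34[None, None, :, :]                 # p(x1, x2 | x3, x4)
p_1_given_234 = p / p_234[None, :, :, :]         # p(x1 | x2, x3, x4)
p_2_given_34 = p_234 / p_34[None, :, :]          # p(x2 | x3, x4)
rhs = p_1_given_234 * p_2_given_34[None, :, :, :]

print(np.allclose(lhs, rhs))                     # True
```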



  1. Hogg et al. (2018). Introduction to Mathematical Statistics (8th Edition): p109~111, p136~137. ↩︎