
Probability Distributions under Conditional Probability in Mathematical Statistics

Definition¹

  1. For a discrete random vector $(X, Y)$, let $p_{X, Y}$ be the joint probability mass function of $(X, Y)$ and let $p_{X}$ be the marginal probability mass function of $X$. In this case, the following $p_{Y | X}$ is called the conditional probability mass function of $Y$ given $X = x$.
$$ p_{Y | X} (y | x) = \dfrac{p_{X, Y}(x, y)}{p_{X}(x)} $$

  2. For a continuous random vector $(X, Y)$, let $f_{X, Y}$ be the joint probability density function of $(X, Y)$ and let $f_{X}$ be the marginal probability density function of $X$. In this case, the following $f_{Y | X}$ is called the conditional probability density function of $Y$ given $X = x$.

$$ f_{Y | X} (y | x) = \dfrac{f_{X, Y}(x, y)}{f_{X}(x)} $$
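As a quick numerical illustration of the definition, the following minimal sketch conditions a small joint pmf table on $X = x$ by dividing each row by its marginal; the table values are a made-up example.

```python
import numpy as np

# Hypothetical joint pmf of (X, Y): rows index x in {0, 1}, columns index y in {0, 1, 2}.
p_XY = np.array([
    [0.10, 0.20, 0.10],
    [0.30, 0.20, 0.10],
])

p_X = p_XY.sum(axis=1)              # marginal pmf of X: sum out y
p_Y_given_X = p_XY / p_X[:, None]   # p_{Y|X}(y | x) = p_{X,Y}(x, y) / p_X(x)

# Conditioning renormalizes each row of the joint table into a valid pmf.
print(p_Y_given_X.sum(axis=1))      # -> [1. 1.]
```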

Generalization

  1. For a discrete random vector $(X_{1}, \dots, X_{n})$, let $p_{1, \dots, n}$ be the joint probability mass function of $(X_{1}, \dots, X_{n})$ and let $p_{1, \dots, n-1}$ be the joint probability mass function of $(X_{1}, \dots, X_{n-1})$. In this case, the following $p_{n | 1, \dots, n-1}$ is called the conditional probability mass function of $X_{n}$ given $X_{1} = x_{1}, \dots, X_{n-1} = x_{n-1}$.
$$ \begin{equation} p_{n | 1, \dots, n-1} (x_{n} | x_{1}, \dots, x_{n-1}) = \dfrac{p_{1, \dots, n}(x_{1}, \dots, x_{n})}{p_{1, \dots, n-1}(x_{1}, \dots, x_{n-1})} \end{equation} $$

  2. For a continuous random vector $(X_{1}, \dots, X_{n})$, let $f_{1, \dots, n}$ be the joint probability density function of $(X_{1}, \dots, X_{n})$ and let $f_{1, \dots, n-1}$ be the joint probability density function of $(X_{1}, \dots, X_{n-1})$. In this case, the following $f_{n | 1, \dots, n-1}$ is called the conditional probability density function of $X_{n}$ given $X_{1} = x_{1}, \dots, X_{n-1} = x_{n-1}$.
$$ \begin{equation} f_{n | 1, \dots, n-1} (x_{n} | x_{1}, \dots, x_{n-1}) = \dfrac{f_{1, \dots, n}(x_{1}, \dots, x_{n})}{f_{1, \dots, n-1}(x_{1}, \dots, x_{n-1})} \end{equation} $$

  3. For a discrete random vector $(X_{1}, \dots, X_{n})$, the following $p_{2, \cdots , n | 1}$ is called the joint conditional probability mass function of $X_{2}, \cdots , X_{n}$ given $X_{1} = x_{1}$.
$$ \begin{equation} p_{2, \cdots , n | 1} ( x_{2} , \cdots , x_{n} | x_{1} ) = \dfrac{ p_{1, \cdots , n}(x_{1} , x_{2} , \cdots , x_{n}) }{ p_{1}( x_{1} ) } \end{equation} $$

  4. For a continuous random vector $(X_{1}, \dots, X_{n})$, the following $f_{2, \cdots , n | 1}$ is called the joint conditional probability density function of $X_{2}, \cdots , X_{n}$ given $X_{1} = x_{1}$.
$$ \begin{equation} f_{2, \cdots , n | 1} ( x_{2} , \cdots , x_{n} | x_{1} ) = \dfrac{ f_{1, \cdots , n}(x_{1} , x_{2} , \cdots , x_{n}) }{ f_{1}( x_{1} ) } \end{equation} $$
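The generalized definitions translate just as directly into computation. The following minimal sketch, using an arbitrary joint pmf on a small grid, evaluates $(1)$ and $(3)$ as array divisions and checks that each result sums to 1 over the non-conditioned variables; the grid sizes are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical joint pmf of (X1, X2, X3) on an arbitrary 2 x 3 x 4 grid.
p = rng.random((2, 3, 4))
p /= p.sum()

p12 = p.sum(axis=2)            # joint pmf of (X1, X2): the denominator in (1)
p1 = p.sum(axis=(1, 2))        # marginal pmf of X1: the denominator in (3)

p3_given_12 = p / p12[:, :, None]    # equation (1): p(x3 | x1, x2)
p23_given_1 = p / p1[:, None, None]  # equation (3): p(x2, x3 | x1)

# Each is a pmf in its non-conditioned variables, so it sums to 1 over those axes.
print(np.allclose(p3_given_12.sum(axis=2), 1.0))       # True
print(np.allclose(p23_given_1.sum(axis=(1, 2)), 1.0))  # True
```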

Expected Value

  1. Given a function $u$ of $X_{2} , \cdots , X_{n}$, the following is called the conditional expected value of $u( X_{2}, \cdots , X_{n} )$ given $X_{1} = x_{1}$.
$$ \begin{align*} & E \left[ u \left( X_{2} , \cdots , X_{n} \right) | x_{1} \right] \\ =& \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} u (x_{2} , \cdots , x_{n}) f_{2 , \cdots , n | 1} (x_{2} , \cdots, x_{n} | x_{1}) \, dx_{2} \cdots dx_{n} \end{align*} $$
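For a concrete instance of this integral, suppose (purely as an assumed example) the joint density is $f(x, y) = x + y$ on the unit square. Then $f_{X}(x) = x + 1/2$ and, with $u(y) = y$, one gets $E[Y | X = x] = (x/2 + 1/3)/(x + 1/2)$. A midpoint-rule sketch confirms the value at $x = 0.25$:

```python
import numpy as np

# Assumed example density: f(x, y) = x + y on the unit square (it integrates to 1).
def f_joint(x, y):
    return x + y

x0 = 0.25                       # condition on X = x0
n = 200_000
y = (np.arange(n) + 0.5) / n    # midpoints of a uniform grid on [0, 1]
dy = 1.0 / n

f_x0 = np.sum(f_joint(x0, y)) * dy      # marginal f_X(x0) = x0 + 1/2
f_cond = f_joint(x0, y) / f_x0          # conditional density f_{Y|X}(y | x0)
E_Y_given_x0 = np.sum(y * f_cond) * dy  # E[Y | X = x0], with u(y) = y

print(E_Y_given_x0)   # ~0.6111 = (x0/2 + 1/3) / (x0 + 1/2)
```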

Theorem

  • [1] For convenience, omit the subscripts and write $p(x_{n} | x_{1}, \dots, x_{n-1}) = p_{n | 1, \dots, n-1} (x_{n} | x_{1}, \dots, x_{n-1})$ and $p(x_{1}, \dots, x_{n}) = p_{1, \dots, n}(x_{1}, \dots, x_{n})$. The following holds (see the numerical check after this list).
$$ p(x_{1}, \dots, x_{n}) = p(x_{1}) p(x_{2} | x_{1}) p(x_{3} | x_{2}, x_{1}) \cdots p(x_{n} | x_{1}, \dots, x_{n-1}) $$
or
$$ p(x_{1}, \dots, x_{n}) = p(x_{n}) p(x_{n-1} | x_{n}) p(x_{n-2} | x_{n-1}, x_{n}) \cdots p(x_{1} | x_{2}, \dots, x_{n}) $$

    • [1-1] If $\left\{ X_{t} \right\}$ is a Markov chain, then the following holds.
$$ \begin{align*} p(x_{1}, \dots, x_{n}) &= p(x_{1}) p(x_{2} | x_{1}) p(x_{3} | x_{2}) \cdots p(x_{n} | x_{n-1}) \\ &= p(x_{1}) \prod_{t=2}^{n} p(x_{t} | x_{t-1}) \end{align*} $$
Moreover, the following holds.
$$ p(x_{2}, \dots, x_{n} | x_{1}) = \prod_{t=2}^{n} p(x_{t} | x_{t-1}) $$
  • [2] The following holds.
$$ p(x_{1}, x_{2} | x_{3}, x_{4}) = p(x_{1} | x_{2}, x_{3}, x_{4}) p(x_{2} | x_{3}, x_{4}) $$
or
$$ p(x_{1} | x_{2}, x_{3}, x_{4}) = \dfrac{p(x_{1}, x_{2} | x_{3}, x_{4})}{p(x_{2} | x_{3}, x_{4})} $$

  • [3] Conditional Variance:
$$ \begin{align*} \Var (X_{2} | X_{1} = x_{1}) &= E \left[ \left( X_{2} - E (X_{2} | X_{1} = x_{1}) \right)^{2} \,\middle|\, X_{1} = x_{1} \right] \\ &= E \left( X_{2}^{2} | X_{1} = x_{1} \right) - \left[ E(X_{2} | X_{1} = x_{1}) \right]^{2} \end{align*} $$

  • [4] Law of total expectation: $E \left[ E (X_{2} | X_{1}) \right] = E (X_{2})$

  • [5] If $\Var(X_{2})$ exists, then $\Var \left[ E \left( X_{2} | X_{1} \right) \right] \le \Var (X_{2})$
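Theorems [1] and [4] are easy to sanity-check numerically. The minimal sketch below builds an arbitrary joint pmf on a small grid, factors it according to [1], and verifies the law of total expectation [4]; the grid sizes and value sets are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical joint pmf of (X1, X2, X3) on an arbitrary 2 x 3 x 4 grid.
p = rng.random((2, 3, 4))
p /= p.sum()

p1 = p.sum(axis=(1, 2))              # p(x1)
p12 = p.sum(axis=2)                  # p(x1, x2)
p2_given_1 = p12 / p1[:, None]       # p(x2 | x1)
p3_given_12 = p / p12[:, :, None]    # p(x3 | x1, x2)

# Theorem [1]: p(x1, x2, x3) = p(x1) p(x2 | x1) p(x3 | x1, x2)
reconstructed = p1[:, None, None] * p2_given_1[:, :, None] * p3_given_12
print(np.allclose(reconstructed, p))  # True

# Theorem [4]: E[E(X2 | X1)] = E(X2), with X2 taking values 0, 1, 2.
vals2 = np.arange(3)
E_X2_given_x1 = p2_given_1 @ vals2             # E(X2 | X1 = x1) for each x1
print(np.isclose(p1 @ E_X2_given_x1,           # E[E(X2 | X1)]
                 p.sum(axis=(0, 2)) @ vals2))  # E(X2); prints True
```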

Explanation

Conditional probability and conditional expectation are among the most computation-heavy topics in mathematical statistics, just as they are in introductory courses, and the multivariate case only multiplies the calculations. Of course, the concept of conditioning is worth the complexity. Mathematical statistics relies mainly on calculus, but once one moves on to measure-theoretic probability theory, these computations become considerably cleaner. The key takeaway: don't dismiss the calculations, but don't obsess over them either.

$(3)$ and $(4)$ can be obtained by repeatedly applying $(1)$ and $(2)$, as the worked case below illustrates.
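For example, with $n = 3$, substituting $(1)$ into the numerator of $(3)$ gives
$$ p_{2, 3 | 1}(x_{2}, x_{3} | x_{1}) = \dfrac{p_{1,2,3}(x_{1}, x_{2}, x_{3})}{p_{1}(x_{1})} = \dfrac{p_{1,2}(x_{1}, x_{2}) \, p_{3 | 1,2}(x_{3} | x_{1}, x_{2})}{p_{1}(x_{1})} = p_{2 | 1}(x_{2} | x_{1}) \, p_{3 | 1,2}(x_{3} | x_{1}, x_{2}) $$
so the joint conditional distribution factors into one-variable conditionals; the continuous case $(4)$ is identical with $f$ in place of $p$.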

Proof

[1]

From $(1)$, multiplying both sides by the denominator gives the following:

$$ p(x_{1}, \dots, x_{n-1}) p(x_{n} | x_{1}, \dots, x_{n-1}) = p(x_{1}, \dots, x_{n}) $$
$$ \begin{equation} \implies p(x_{1}, \dots, x_{n}) = p(x_{1}, \dots, x_{n-1}) p(x_{n} | x_{1}, \dots, x_{n-1}) \end{equation} $$

Reapplying $(5)$ to $p(x_{1}, \dots, x_{n-1})$ gives:

$$ \begin{align*} &p(x_{1}, \dots, x_{n}) \\ &= \big[ p(x_{1}, \dots, x_{n-2}) p(x_{n-1} | x_{1}, \dots, x_{n-2}) \big] p(x_{n} | x_{1}, \dots, x_{n-1}) \end{align*} $$

Reapplying $(5)$ to $p(x_{1}, \dots, x_{n-2})$ results in:

$$ \begin{align*} &p(x_{1}, \dots, x_{n}) \\ &= \big[ p(x_{1}, \dots, x_{n-3}) p(x_{n-2} | x_{1}, \dots, x_{n-3}) \big] p(x_{n-1} | x_{1}, \dots, x_{n-2}) p(x_{n} | x_{1}, \dots, x_{n-1}) \end{align*} $$

Continuing this process eventually gives:

$$ p(x_{1}, \dots, x_{n}) = p(x_{1}) p(x_{2} | x_{1}) p(x_{3} | x_{2}, x_{1}) \cdots p(x_{n} | x_{1}, \dots, x_{n-1}) $$

If $\left\{ X_{t} \right\}$ is a Markov chain, then $p(x_{n} | x_{n-1}, \dots, x_{1}) = p(x_{n} | x_{n-1})$ holds, so the following expression is obtained:

$$ p(x_{1}, \dots, x_{n}) = p(x_{1}) p(x_{2} | x_{1}) p(x_{3} | x_{2}) \cdots p(x_{n} | x_{n-1}) = p(x_{1}) \prod_{t=2}^{n} p(x_{t} | x_{t-1}) $$

By $(3)$, since $p(x_{1}, \dots, x_{n}) = p(x_{1}) p(x_{2}, \dots, x_{n} | x_{1})$, comparing with the expression above yields:

$$ p(x_{1}, \dots, x_{n}) = p(x_{1}) \prod_{t=2}^{n} p(x_{t} | x_{t-1}) = p(x_{1}) p(x_{2}, \dots, x_{n} | x_{1}) $$
$$ \implies p(x_{2}, \dots, x_{n} | x_{1}) = \prod_{t=2}^{n} p(x_{t} | x_{t-1}) $$

Also, applying indices in reverse order in the definition of conditional probability yields:

$$ \begin{align*} p(x_{1}, \dots, x_{n}) &= p(x_{2}, \dots, x_{n}) p(x_{1} | x_{2}, \dots, x_{n}) \\ &= p(x_{3}, \dots, x_{n}) p(x_{2} | x_{3}, \dots, x_{n}) p(x_{1} | x_{2}, \dots, x_{n}) \\ &\ \ \vdots \\ &= p(x_{n}) p(x_{n-1} | x_{n}) p(x_{n-2} | x_{n-1}, x_{n}) \cdots p(x_{1} | x_{2}, \dots, x_{n}) \end{align*} $$

[2]

$$ \begin{align*} p(x_{1}, x_{2} | x_{3}, x_{4}) &= \dfrac{p(x_{1}, x_{2}, x_{3}, x_{4})}{p(x_{3}, x_{4})} \\ &= \dfrac{p(x_{1}, x_{2}, x_{3}, x_{4})}{p(x_{3}, x_{4})} \dfrac{p(x_{2}, x_{3}, x_{4})}{p(x_{2}, x_{3}, x_{4})} \\ &= \dfrac{p(x_{1}, x_{2}, x_{3}, x_{4})}{p(x_{2}, x_{3}, x_{4})} \dfrac{p(x_{2}, x_{3}, x_{4})}{p(x_{3}, x_{4})} \\ &= p(x_{1} | x_{2}, x_{3}, x_{4}) p(x_{2} | x_{3}, x_{4}) \end{align*} $$
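As with [1], identity [2] can be checked mechanically on an arbitrary joint pmf. In the minimal sketch below, every marginal is obtained by summing axes of a random 4-dimensional array; the grid shape is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical joint pmf of (X1, X2, X3, X4) on an arbitrary 2 x 2 x 3 x 3 grid.
p = rng.random((2, 2, 3, 3))
p /= p.sum()

p234 = p.sum(axis=0)        # p(x2, x3, x4)
p34 = p.sum(axis=(0, 1))    # p(x3, x4)

p1_given_234 = p / p234[None, :, :, :]  # p(x1 | x2, x3, x4)
p2_given_34 = p234 / p34[None, :, :]    # p(x2 | x3, x4)

lhs = p / p34[None, None, :, :]         # p(x1, x2 | x3, x4)
rhs = p1_given_234 * p2_given_34[None, :, :, :]
print(np.allclose(lhs, rhs))            # True: theorem [2] holds
```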



  1. Hogg et al. (2018). Introduction to Mathematical Statistics (8th Edition): pp. 109–111, 136–137. ↩︎