
Random Variables and Probability Distributions in Mathematical Statistics

Definition 1

Let us assume that a probability $P$ is defined on the sample space $\Omega$.

  1. A function $X : \Omega \to \mathbb{R}$ whose domain is the sample space is called a Random Variable. The range $X(\Omega)$ of a random variable is also called its Space.
  2. A function $F_{X} : \mathbb{R} \to [0,1]$ that satisfies the following is called the Cumulative Distribution Function (cdf) of $X$. $$ F_{X}(x) = P_{X}\left( (-\infty,x] \right) = P \left( \left\{ \omega \in \Omega : X(\omega) \le x \right\} \right) $$

Discrete

  • D1: If the space of the random variable $X$ is a countable set, then $X$ is called a Discrete Random Variable and is said to follow a discrete probability distribution.
  • D2: The following $p_{X} : \mathbb{R} \to [0,1]$ is called the Probability Mass Function (pmf) of the discrete random variable $X$. $$ p_{X}(x) := P\left( X=x \right) $$
  • D3: $\mathcal{S}_{X} := \left\{ x \in \mathbb{R} : p_{X}(x) > 0 \right\}$ is called the Support of $X$.
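D1–D3 can be checked concretely in code. The fair six-sided die below is a hypothetical example, not from the text; it is a minimal sketch of a countable space, a pmf, a step-function cdf, and a support:

```python
from fractions import Fraction

# Fair die: the space {1,...,6} is countable, so X is discrete (D1).
# Exact rational arithmetic avoids floating-point noise.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}   # D2: p_X(x) = P(X = x)

def cdf(x):
    """F_X(x) = P(X <= x): a step function for a discrete variable."""
    return sum(p for k, p in pmf.items() if k <= x)

support = [x for x, p in pmf.items() if p > 0]   # D3: S_X = {x : p_X(x) > 0}

print(cdf(3.5))   # 1/2 — three of the six faces are <= 3.5
print(support)    # [1, 2, 3, 4, 5, 6]
```

Note that the cdf jumps at each point of the support, which is exactly why it fails the continuity condition C1 below.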

Continuous

  • C1: If the cumulative distribution function $F_{X}$ of the random variable $X$ is continuous at every $x \in \mathbb{R}$, then $X$ is called a Continuous Random Variable and is said to follow a continuous probability distribution.
  • C2: A function $f_{X} : \mathbb{R} \to [0,\infty)$ that satisfies the following is called the Probability Density Function (pdf) of the continuous random variable $X$, and $X$ is said to be Absolutely Continuous. $$ F_{X}(x) = \int_{-\infty}^{x} f_{X}(t) \, dt $$
  • C3: $\mathcal{S}_{X} := \left\{ t \in \mathbb{R} : f_{X}(t) > 0 \right\}$ is called the Support of $X$.
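The defining relation in C2 can likewise be sanity-checked numerically. The sketch below assumes $X \sim \text{Exponential}(1)$, a hypothetical example chosen because both $f_{X}$ and $F_{X}$ are known in closed form, and verifies $F_{X}(x) = \int_{-\infty}^{x} f_{X}(t)\,dt$ with a crude midpoint Riemann sum:

```python
import math

def pdf(t):
    """f_X(t) for Exponential(1); support S_X = (0, ∞) per C3."""
    return math.exp(-t) if t > 0 else 0.0

def cdf(x):
    """F_X(x) = 1 - e^{-x}, continuous at every real x (C1)."""
    return 1 - math.exp(-x) if x > 0 else 0.0

# Midpoint Riemann sum of f_X over (0, x) approximates F_X(x) per C2.
x, n = 2.0, 100_000
h = x / n
integral = sum(pdf((i + 0.5) * h) * h for i in range(n))

print(abs(integral - cdf(x)) < 1e-6)   # True
```

In contrast to the discrete case, here $P(X = x) = 0$ for every single point $x$, so only the integral of $f_{X}$ over an interval carries probability.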

Explanation

Support, or support set, is simply a set that marks out the region we are interested in. It is not an especially common term, but it captures exactly what probability theory wants to express. Probability is not concerned with what is already determined, and an event of probability $0$ is one that never occurs. Thus $\mathcal{S}$ can be seen as 'the set that really matters' or 'the set we must know', allowing us to direct our limited energy not at the whole of $\Omega$ but at $\mathcal{S}$.

Even when first encountering probability in high school, teachers emphatically state that 'a random variable is a function'. However, genuinely conceptualizing and handling random variables as functions demands a higher level of abstraction. Although the definitions introduced here are not yet mathematically rigorous, describing the concept of probability with sets and functions is no easy task. Don't despair if you don't understand immediately, and don't gloss over it if you think you do.

From the definitions, one can already notice an essential difference between discrete and continuous random variables, and it extends into a formal difference as well. It can be confusing at the undergraduate level, but it is crucial to remember that a Jacobian appears only when transforming continuous random variables.
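The discrete side of this contrast is easy to see in code: for a discrete $X$ and injective $g$, the pmf of $Y = g(X)$ is just $p_{Y}(y) = p_{X}(g^{-1}(y))$, with no Jacobian factor anywhere. A minimal sketch, assuming a fair die and the hypothetical map $g(x) = 2x + 1$:

```python
from fractions import Fraction

# Fair die (hypothetical example): p_X(x) = 1/6 on {1,...,6}.
pmf_X = {x: Fraction(1, 6) for x in range(1, 7)}

g = lambda x: 2 * x + 1          # injective, so probabilities map one-to-one

# p_Y(g(x)) = p_X(x): each probability mass moves with its point, unchanged.
pmf_Y = {g(x): p for x, p in pmf_X.items()}

print(pmf_Y[7])   # 1/6 — P(Y = 7) = P(X = 3), no |dx/dy| factor appears
```

For a continuous $X$, by contrast, the density must be rescaled by $|dx/dy|$, which is exactly what the theorem below establishes.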

Theorem

For a continuous random variable $X$ with support $\mathcal{S}_{X}$ and a differentiable injective function $g$, if we define a random variable $Y := g(X)$, then the probability density function of $Y$ is derived as follows for $y \in \mathcal{S}_{Y}$. [ NOTE: Since $g$ is injective but not assumed to be surjective, $g^{-1}$ is defined only on the range of $g$; this suffices here because $\mathcal{S}_{Y}$ lies in that range. ] $$ f_{Y} (y) = f_{X} \left( g^{-1}(y) \right) \left| {{ d x } \over { d y }} \right| $$


  • Here, $\mathcal{S}_{Y}$ is the support of $Y$, and $x$ denotes $x = g^{-1}(y)$.

Proof

Since $g$ is injective and continuous, it is either strictly increasing or strictly decreasing. Let us consider each case.

Case 1. If $g$ is increasing
$$ \begin{align*} F_{Y}(y) =& P \left( Y \le y \right) \\ =& P \left( g(X) \le y \right) \\ =& P \left( X \le g^{-1}(y) \right) \\ =& F_{X}\left( g^{-1}(y) \right) \end{align*} $$
According to the fundamental theorem of calculus and the chain rule, the probability density function of $Y$ is
$$ \begin{align*} f_{Y}(y) =& {{ d } \over { d y }} F_{Y}(y) \\ =& {{ d } \over { d y }} \int_{-\infty}^{x} f_{X}(t) dt \\ =& {{ d } \over { d x }} \int_{-\infty}^{x} f_{X}(t) dt \cdot {{ d x } \over { d y }} \\ =& f_{X} \left( x \right) {{ d x } \over { d y }} \\ =& f_{X} \left( g^{-1} (y) \right) {{ d x } \over { d y }} \end{align*} $$
Since $g$ is increasing, $\displaystyle {{ d x } \over { d y }} = {{ d g^{-1}(y) } \over { d y }} > 0$, and therefore
$$ {{ d x } \over { d y }} = \left| {{ d x } \over { d y }} \right| $$


Case 2. If $g$ is decreasing
$$ \begin{align*} F_{Y}(y) =& P \left( Y \le y \right) \\ =& P \left( g(X) \le y \right) \\ =& P \left( X \ge g^{-1}(y) \right) \\ =& 1 - F_{X}\left( g^{-1}(y) \right) \end{align*} $$
Similarly, $\displaystyle f_{Y}(y) = - f_{X} \left( g^{-1} (y) \right) {{ d x } \over { d y }}$. Since $g$ is decreasing, $\displaystyle {{ d x } \over { d y }} < 0$, and therefore
$$ - {{ d x } \over { d y }} = \left| {{ d x } \over { d y }} \right| $$
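The theorem can also be verified numerically. The sketch below assumes $X \sim \text{Uniform}(0,1)$ and $g(x) = x^{2}$, a hypothetical example where $g$ is differentiable and injective on $\mathcal{S}_{X} = (0,1)$; the formula's output is compared against a finite-difference derivative of the known cdf $F_{Y}(y) = \sqrt{y}$:

```python
import math

# X ~ Uniform(0,1): f_X = 1 on (0,1), 0 elsewhere.
f_X = lambda x: 1.0 if 0 < x < 1 else 0.0

# g(x) = x^2 is increasing and injective on (0,1), so g^{-1}(y) = sqrt(y).
g_inv = lambda y: math.sqrt(y)
jac = lambda y: 1 / (2 * math.sqrt(y))        # |dx/dy| = |d g^{-1}(y)/dy|

# The theorem's formula: f_Y(y) = f_X(g^{-1}(y)) |dx/dy|.
f_Y = lambda y: f_X(g_inv(y)) * jac(y)

# Independent check: F_Y(y) = P(X^2 <= y) = sqrt(y) on (0,1), so f_Y = F_Y'.
y, h = 0.25, 1e-6
numeric = (math.sqrt(y + h) - math.sqrt(y - h)) / (2 * h)

print(abs(f_Y(y) - numeric) < 1e-6)   # True
```

At $y = 0.25$ both routes give $f_{Y}(y) = 1/(2\sqrt{0.25}) = 1$, matching the Jacobian rescaling the proof derives.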

Strict Definition


  1. Hogg et al. (2013). Introduction to Mathematical Statistics (7th Edition): pp. 32–41. ↩︎