
Random Variables and Probability Distributions in Mathematical Statistics

Definition 1

Let us assume that a probability $P$ is defined on the sample space $\Omega$.

  1. A function $X : \Omega \to \mathbb{R}$ whose domain is the sample space is called a Random Variable. The range $X(\Omega)$ of a random variable is also called its Space.
  2. A function $F_{X} : \mathbb{R} \to [0,1]$ that satisfies the following is called the Cumulative Distribution Function (cdf) of $X$. $$ F_{X}(x) = P_{X}\left( (-\infty,x] \right) = P \left( \left\{ \omega \in \Omega : X(\omega) \le x \right\} \right) $$

Discrete

  • D1: If the space of the random variable $X$ is a countable set, then $X$ is called a Discrete Random Variable and is said to follow a discrete probability distribution.
  • D2: The following $p_{X} : \mathbb{R} \to [0,1]$ is called the Probability Mass Function (pmf) of the discrete random variable $X$. $$ p_{X}(x) := P\left( X=x \right) $$
  • D3: $\mathcal{S}_{X} := \left\{ x \in \mathbb{R} : p_{X}(x) > 0 \right\}$ is called the Support of $X$.
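D1–D3 can be checked concretely in code. The fair six-sided die below is a hypothetical example, not from the text; it is a minimal sketch of a countable space, a pmf, a step-function cdf, and a support:

```python
from fractions import Fraction

# Fair die: the space {1,...,6} is countable, so X is discrete (D1).
# Exact rational arithmetic avoids floating-point noise.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}   # D2: p_X(x) = P(X = x)

def cdf(x):
    """F_X(x) = P(X <= x): a step function for a discrete variable."""
    return sum(p for k, p in pmf.items() if k <= x)

support = [x for x, p in pmf.items() if p > 0]   # D3: S_X = {x : p_X(x) > 0}

print(cdf(3.5))   # 1/2 — three of the six faces are <= 3.5
print(support)    # [1, 2, 3, 4, 5, 6]
```

Note that the cdf jumps at each point of the support, which is exactly why it fails the continuity condition C1 below.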

Continuous

  • C1: If the cumulative distribution function $F_{X}$ of the random variable $X$ is continuous at every $x \in \mathbb{R}$, then $X$ is called a Continuous Random Variable and is said to follow a continuous probability distribution.
  • C2: A function $f_{X} : \mathbb{R} \to [0,\infty)$ that satisfies the following is called the Probability Density Function (pdf) of the continuous random variable $X$, and $X$ is said to be Absolutely Continuous. $$ F_{X}(x) = \int_{-\infty}^{x} f_{X}(t) \, dt $$
  • C3: $\mathcal{S}_{X} := \left\{ t \in \mathbb{R} : f_{X}(t) > 0 \right\}$ is called the Support of $X$.
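The defining relation in C2 can likewise be sanity-checked numerically. The sketch below assumes $X \sim \text{Exponential}(1)$, a hypothetical example chosen because both $f_{X}$ and $F_{X}$ are known in closed form, and verifies $F_{X}(x) = \int_{-\infty}^{x} f_{X}(t)\,dt$ with a crude midpoint Riemann sum:

```python
import math

def pdf(t):
    """f_X(t) for Exponential(1); support S_X = (0, ∞) per C3."""
    return math.exp(-t) if t > 0 else 0.0

def cdf(x):
    """F_X(x) = 1 - e^{-x}, continuous at every real x (C1)."""
    return 1 - math.exp(-x) if x > 0 else 0.0

# Midpoint Riemann sum of f_X over (0, x) approximates F_X(x) per C2.
x, n = 2.0, 100_000
h = x / n
integral = sum(pdf((i + 0.5) * h) * h for i in range(n))

print(abs(integral - cdf(x)) < 1e-6)   # True
```

In contrast to the discrete case, here $P(X = x) = 0$ for every single point $x$, so only the integral of $f_{X}$ over an interval carries probability.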

Explanation

Support, or support set, is simply a set that marks out the region we are interested in. It is not an especially common term, but it captures exactly what probability theory wants to express. Probability is not concerned with what is already determined, and an event of probability $0$ is one that never occurs. Thus $\mathcal{S}$ can be seen as 'the set that really matters' or 'the set we must know', allowing us to direct our limited energy not at the whole of $\Omega$ but at $\mathcal{S}$.

Even when first encountering probability in high school, teachers emphatically state that 'a random variable is a function'. However, genuinely conceptualizing and handling random variables as functions demands a higher level of abstraction. Although the definitions introduced here are not yet mathematically rigorous, describing the concept of probability with sets and functions is no easy task. Don't despair if you don't understand immediately, and don't gloss over it if you think you do.

From the definitions, one can already notice an essential difference between discrete and continuous random variables, and it extends into a formal difference as well. It can be confusing at the undergraduate level, but it is crucial to remember that a Jacobian appears only when transforming continuous random variables.
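The discrete side of this contrast is easy to see in code: for a discrete $X$ and injective $g$, the pmf of $Y = g(X)$ is just $p_{Y}(y) = p_{X}(g^{-1}(y))$, with no Jacobian factor anywhere. A minimal sketch, assuming a fair die and the hypothetical map $g(x) = 2x + 1$:

```python
from fractions import Fraction

# Fair die (hypothetical example): p_X(x) = 1/6 on {1,...,6}.
pmf_X = {x: Fraction(1, 6) for x in range(1, 7)}

g = lambda x: 2 * x + 1          # injective, so probabilities map one-to-one

# p_Y(g(x)) = p_X(x): each probability mass moves with its point, unchanged.
pmf_Y = {g(x): p for x, p in pmf_X.items()}

print(pmf_Y[7])   # 1/6 — P(Y = 7) = P(X = 3), no |dx/dy| factor appears
```

For a continuous $X$, by contrast, the density must be rescaled by $|dx/dy|$, which is exactly what the theorem below establishes.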

Theorem

For a continuous random variable $X$ with support $\mathcal{S}_{X}$ and a differentiable injective function $g$, if we define a random variable $Y := g(X)$, then the probability density function of $Y$ is derived as follows for $y \in \mathcal{S}_{Y}$. [ NOTE: Since $g$ is injective but not assumed to be surjective, $g^{-1}$ is defined only on the range of $g$; this suffices here because $\mathcal{S}_{Y}$ lies in that range. ] $$ f_{Y} (y) = f_{X} \left( g^{-1}(y) \right) \left| {{ d x } \over { d y }} \right| $$


  • Here, $\mathcal{S}_{Y}$ is the support of $Y$, and $x$ denotes $x = g^{-1}(y)$.

Proof

Since $g$ is injective and continuous, it is either strictly increasing or strictly decreasing. Let us consider each case.

Case 1. If $g$ is increasing
$$ \begin{align*} F_{Y}(y) =& P \left( Y \le y \right) \\ =& P \left( g(X) \le y \right) \\ =& P \left( X \le g^{-1}(y) \right) \\ =& F_{X}\left( g^{-1}(y) \right) \end{align*} $$
According to the fundamental theorem of calculus and the chain rule, the probability density function of $Y$ is
$$ \begin{align*} f_{Y}(y) =& {{ d } \over { d y }} F_{Y}(y) \\ =& {{ d } \over { d y }} \int_{-\infty}^{x} f_{X}(t) dt \\ =& {{ d } \over { d x }} \int_{-\infty}^{x} f_{X}(t) dt \cdot {{ d x } \over { d y }} \\ =& f_{X} \left( x \right) {{ d x } \over { d y }} \\ =& f_{X} \left( g^{-1} (y) \right) {{ d x } \over { d y }} \end{align*} $$
Since $g$ is increasing, $\displaystyle {{ d x } \over { d y }} = {{ d g^{-1}(y) } \over { d y }} > 0$, and therefore
$$ {{ d x } \over { d y }} = \left| {{ d x } \over { d y }} \right| $$


Case 2. If $g$ is decreasing
$$ \begin{align*} F_{Y}(y) =& P \left( Y \le y \right) \\ =& P \left( g(X) \le y \right) \\ =& P \left( X \ge g^{-1}(y) \right) \\ =& 1 - F_{X}\left( g^{-1}(y) \right) \end{align*} $$
Similarly, $\displaystyle f_{Y}(y) = - f_{X} \left( g^{-1} (y) \right) {{ d x } \over { d y }}$. Since $g$ is decreasing, $\displaystyle {{ d x } \over { d y }} < 0$, and therefore
$$ - {{ d x } \over { d y }} = \left| {{ d x } \over { d y }} \right| $$
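The theorem can also be verified numerically. The sketch below assumes $X \sim \text{Uniform}(0,1)$ and $g(x) = x^{2}$, a hypothetical example where $g$ is differentiable and injective on $\mathcal{S}_{X} = (0,1)$; the formula's output is compared against a finite-difference derivative of the known cdf $F_{Y}(y) = \sqrt{y}$:

```python
import math

# X ~ Uniform(0,1): f_X = 1 on (0,1), 0 elsewhere.
f_X = lambda x: 1.0 if 0 < x < 1 else 0.0

# g(x) = x^2 is increasing and injective on (0,1), so g^{-1}(y) = sqrt(y).
g_inv = lambda y: math.sqrt(y)
jac = lambda y: 1 / (2 * math.sqrt(y))        # |dx/dy| = |d g^{-1}(y)/dy|

# The theorem's formula: f_Y(y) = f_X(g^{-1}(y)) |dx/dy|.
f_Y = lambda y: f_X(g_inv(y)) * jac(y)

# Independent check: F_Y(y) = P(X^2 <= y) = sqrt(y) on (0,1), so f_Y = F_Y'.
y, h = 0.25, 1e-6
numeric = (math.sqrt(y + h) - math.sqrt(y - h)) / (2 * h)

print(abs(f_Y(y) - numeric) < 1e-6)   # True
```

At $y = 0.25$ both routes give $f_{Y}(y) = 1/(2\sqrt{0.25}) = 1$, matching the Jacobian rescaling the proof derives.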

Strict Definition


  1. Hogg et al. (2013). Introduction to Mathematical Statistics (7th Edition): pp. 32–41. ↩︎