How to Sample Univariate Random Variables

Overview

A method for obtaining a specific realization of a random variable.

Theorem

Let the cumulative distribution function $F = F_{T}$ of a univariate random variable $T$ be a strictly increasing function. Then, for a random variable $U \sim U \left( 0,1 \right)$ following the uniform distribution, the following holds in distribution: $$ T \overset{D}{=} F^{-1} \left( U \right) $$

Explanation

Considering that the range of the cumulative distribution function is always $[0,1]$, the idea seems obvious and ingenious at the same time. Whatever value between $0$ and $1$ we pick, it inevitably falls in the range of $F$, and finding the corresponding inverse image lets us sample $T$ from any distribution. Strictly speaking, it does not work for all distributions, but nevertheless, the idea of tracing back the point where $U$ falls is effective for essentially any distribution.
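
As a concrete illustration, here is a minimal Python sketch of this idea for the exponential distribution, one of the few cases where $F^{-1}$ has a closed form; the function name `sample_exponential` and the rate $2.0$ are illustrative choices, not anything prescribed above.

```python
import math
import random

def sample_exponential(rate: float) -> float:
    """Draw one realization of Exponential(rate) by inverse transform sampling.

    The CDF is F(t) = 1 - exp(-rate * t), whose inverse is
    F^{-1}(u) = -ln(1 - u) / rate.
    """
    u = random.random()               # a realization of U ~ U(0, 1)
    return -math.log(1.0 - u) / rate  # F^{-1}(u)

# Draw a few realizations with rate 2.0.
print([sample_exponential(2.0) for _ in range(5)])
```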

How do you find $U$?

Given $U$, you can sample any desired univariate random variable through its cumulative distribution function, but ironically this method cannot be used to pick $U$ itself, so $U$ has to be sampled by other means.

The analog way to draw a realization of $U$ following the uniform distribution is as follows (a code sketch of the procedure appears after the list):

  • (1) Prepare a random number table (a table where numbers from 0 to 9 are written randomly).
  • (2) Discard the part of the table before the given seed number $s$, and start reading from $s$.
  • (3) Write down digits up to a sufficiently long length $n$. For example, if it is decided to read to the $n=4$th place from 34920 73125, this gives 3492.
  • (4) Prepend $0.$ to the front. Continuing the example, $u = 0.3492$.
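
A minimal sketch of this table-reading procedure, assuming the table is given as a plain digit string and the seed $s$ is a starting position; `u_from_digit_table` is a hypothetical name.

```python
def u_from_digit_table(table: str, seed: int, n: int) -> float:
    """Read n digits from a random number table starting at position `seed`,
    then prepend '0.' to form a realization of U(0, 1)."""
    digits = table.replace(" ", "")[seed:seed + n]
    return float("0." + digits)

# The example from step (3): n = 4 digits from '34920 73125' give u = 0.3492.
print(u_from_digit_table("34920 73125", seed=0, n=4))  # 0.3492
```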

The digital way to draw a realization of $U$ following the uniform distribution is as follows (again, a sketch appears after the list):

  • (1) Prepare a pseudo-random number generator (an algorithm that returns 0 and 1 as pseudo-random numbers).
  • (2) Prepare sufficiently long $n$-bit memory.
  • (3) According to the seed number $s$, store either $0$ or $1$ in each of the $n$ bits of memory. For example, if the generator outputs 10110 10011 and $n=4$, the memory holds 1011.
  • (4) Convert the binary number in memory to decimal and divide by $\left( 2^{n}-1 \right)$. Continuing the example, $1011_{2} = 11$ and $2^{4}-1 = 15$, so $u = 11/15 = 0.733 \cdots$.
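
A minimal sketch of the bit-based procedure, using Python's `random.Random` as a stand-in for the pseudo-random bit generator; `u_from_bits` is a hypothetical name.

```python
import random

def u_from_bits(n: int, seed: int) -> float:
    """Fill n bits from a pseudo-random bit source, read them as a binary
    number, and divide by 2^n - 1 to get a realization of U(0, 1)."""
    rng = random.Random(seed)                      # stand-in bit generator
    bits = "".join(str(rng.randint(0, 1)) for _ in range(n))
    return int(bits, 2) / (2 ** n - 1)

# The example from step (4): bits 1011 give 11 / 15 = 0.733...
print(int("1011", 2) / (2 ** 4 - 1))  # 0.7333...
```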

Proof

It is sufficient to show $P \left( F^{-1} (U) \le t \right) = F(t)$. Since $F$ is strictly increasing, the inverse function $F^{-1}$ exists, and applying $F$ to both sides of an inequality preserves it. For all $t \in \mathbb{R}$, $$ \begin{align*} & P \left( F^{-1} (U) \le t \right) \\ =& P \left( F \left( F^{-1} (U) \right) \le F (t) \right) \\ =& P \left( U \le F (t) \right) \\ =& F(t) \end{align*} $$ The last equality holds because $U$ is uniformly distributed on $(0,1)$ and $F(t) \in [0,1]$, so $P \left( U \le F(t) \right) = F(t)$.
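
A quick Monte Carlo sanity check of this identity, assuming the Exponential$(1)$ distribution with $F(t) = 1 - e^{-t}$; the sample size and the point $t = 1$ are arbitrary choices.

```python
import math
import random

# Check P(F^{-1}(U) <= t) = F(t) empirically for Exponential(1),
# where F(t) = 1 - exp(-t) and F^{-1}(u) = -ln(1 - u).
N, t = 100_000, 1.0
hits = sum(-math.log(1.0 - random.random()) <= t for _ in range(N))
print(hits / N, 1.0 - math.exp(-t))  # both should be near 0.6321
```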

What if $F^{-1}$ doesn’t exist?

If the support is not connected, in other words, if the probability density function is $0$ on some interval, then the $F^{-1}$ used in the proof may not exist. For example, if a probability density function $f$ has two peaks with an interval of zero density between them, then $F$ is constant on that interval, so it is not strictly increasing and the theorem does not apply as stated.

However, we intuitively expect that a random sample is unlikely to hit exactly such a spot. More precisely, each interval on which $F$ is constant corresponds to a single point in the range of $F$, and there are at most countably many such intervals, so the set of problematic values of $u$ is a null set. Bringing measure theory into the picture in such cumbersome cases shows that, almost surely, sampling proceeds without any issue.
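
For completeness, the standard device that formalizes this is the generalized inverse (quantile function) $F^{-}(u) = \inf \left\{ t : F(t) \ge u \right\}$, which is well defined even where $F$ is flat. Below is a sketch under a toy bimodal density of my own choosing, not one from the text.

```python
import bisect
import random

def F(t: float) -> float:
    """CDF of a toy bimodal density: uniform mass 1/2 on each of
    [0, 1/3] and [2/3, 1], with zero density in between (so F is flat there)."""
    if t <= 1 / 3:
        return 1.5 * t
    if t <= 2 / 3:
        return 0.5                    # flat stretch: F^{-1} is undefined here
    return 0.5 + 1.5 * (t - 2 / 3)

def generalized_inverse(grid: list, cdf: list, u: float) -> float:
    """Quantile function F^-(u) = inf{ t : F(t) >= u }, tabulated on a grid."""
    i = bisect.bisect_left(cdf, u)    # first index with cdf[i] >= u
    return grid[min(i, len(grid) - 1)]

grid = [i / 1000 for i in range(1001)]
cdf = [F(t) for t in grid]
samples = [generalized_inverse(grid, cdf, random.random()) for _ in range(5)]
print(samples)  # each sample lands (up to grid resolution) in [0,1/3] or [2/3,1]
```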

However, such considerations are overly complicated for the simple topic of 'univariate random variable sampling'; it suffices to understand clearly how the desired random sample is drawn from a uniform distribution.