
How to Sample Univariate Probability Variables 📂Mathematical Statistics


Overview

A method for obtaining a specific realization of a random variable.

Theorem

Let the cumulative distribution function $F = F_{T}$ of a univariate random variable $T$ be an increasing function. Then, for a random variable $U \sim U \left( 0, 1 \right)$ that follows a uniform distribution, the following holds:
$$ T = F^{-1} \left( U \right) $$

Explanation

Considering that the range of the cumulative distribution function is always $[0,1]$ seems obvious and ingenious at the same time. Pick any value between $0$ and $1$: it necessarily falls in the range of $F$, and taking the corresponding inverse image yields a sample of $T$ from the desired distribution. Strictly speaking, this does not work for every distribution, but the idea of tracing back to the point where $U$ falls is effective quite generally.
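As a concrete sketch of the theorem, consider the exponential distribution (my own illustrative choice, not part of the original text): its CDF $F(t) = 1 - e^{-\lambda t}$ is strictly increasing, so $F^{-1}(u) = -\ln(1-u)/\lambda$ turns a uniform draw directly into an exponential sample.

```python
import math
import random

def sample_exponential(rate):
    """Inverse transform sampling for Exp(rate).

    The CDF F(t) = 1 - exp(-rate * t) is strictly increasing on [0, inf),
    so its inverse is F^{-1}(u) = -ln(1 - u) / rate.
    """
    u = random.random()              # a realization of U ~ Uniform(0, 1)
    return -math.log(1.0 - u) / rate

# The sample mean should approach 1/rate as the number of draws grows.
samples = [sample_exponential(2.0) for _ in range(100_000)]
mean = sum(samples) / len(samples)
```

With `rate = 2.0` the sample mean should land near $1/\lambda = 0.5$.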

How do you find $U$?

Given $U$, one can sample any desired univariate random variable through its cumulative distribution function, but ironically this method cannot be used to pick $U$ itself, so $U$ must be sampled differently.

The analog way to draw a realization of $U$ that follows a uniform distribution is as follows:

  • (1) Prepare a random number table (a table where the digits 0 to 9 are written randomly).
  • (2) Discard the part of the table before the given seed number $s$, and start reading from $s$.
  • (3) Write down digits to a sufficiently long length $n$. For example, reading $n = 4$ digits from 34920 73125 gives 3492.
  • (4) Prepend $0.$ to the front. Continuing the example, $u = 0.3492$.
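The analog steps above can be sketched in code, with a string of digits standing in for the printed table (the function name and arguments are my own illustration):

```python
def uniform_from_table(table, seed, n):
    """Read n decimal digits from a random number table, discarding the
    first `seed` digits, and prepend '0.' to form a uniform realization.

    `table` is a string of digits standing in for a printed table.
    """
    digits = table.replace(" ", "")[seed:seed + n]
    return float("0." + digits)

# Continuing the text's example: n = 4 digits read from 34920 73125.
u = uniform_from_table("34920 73125", seed=0, n=4)   # u = 0.3492
```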

The digital way to draw a realization of $U$ that follows a uniform distribution is as follows:

  • (1) Prepare a pseudo-random number generator (an algorithm that returns 0s and 1s as pseudo-random bits).
  • (2) Prepare sufficiently long $n$-bit memory.
  • (3) According to the seed number $s$, store $0$s and $1$s in the memory up to $n$ bits. For example, with $n = 4$ bits and the stream 10110 10011, one obtains 1011.
  • (4) Interpret the bits in memory as a binary number, convert it to decimal, and divide by $\left( 2^{n} - 1 \right)$. Continuing the example, $1011_{2} = 11$ and $2^{4} - 1 = 15$, so $u = 0.733\cdots$.
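The digital steps above amount to a one-line conversion (the function below is my own sketch):

```python
def uniform_from_bits(bits):
    """Interpret an n-bit string from a pseudo-random bit generator as a
    binary integer and divide by 2^n - 1, giving a value in [0, 1]."""
    n = len(bits)
    return int(bits, 2) / (2 ** n - 1)

# Continuing the text's example: the first n = 4 bits of 10110 10011.
u = uniform_from_bits("1011")   # 11 / 15 = 0.7333...
```

Note that dividing by $2^{n} - 1$ rather than $2^{n}$ makes the all-ones string map exactly to $1$.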

Proof

It is sufficient to show $P \left( F^{-1} (U) \le t \right) = F(t)$. Since $F$ is an increasing function, the inverse function $F^{-1}$ exists, and for all $t \in \mathbb{R}$,
$$ \begin{align*} & P \left( F^{-1} (U) \le t \right) \\ =& P \left( F \left( F^{-1} (U) \right) \le F (t) \right) \\ =& P \left( U \le F (t) \right) \\ =& F(t) \end{align*} $$
The last equality comes from the fact that $U$ is uniformly distributed on $[0,1]$.
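The identity $P \left( F^{-1}(U) \le t \right) = F(t)$ can also be checked empirically; the exponential CDF below is my own illustrative choice:

```python
import math
import random

def F(t, rate=1.0):
    """Exp(rate) CDF, an increasing function."""
    return 1.0 - math.exp(-rate * t)

def F_inv(u, rate=1.0):
    """Its exact inverse on (0, 1)."""
    return -math.log(1.0 - u) / rate

# Empirically, P(F^{-1}(U) <= t) should match F(t) at every t.
N = 200_000
draws = [F_inv(random.random()) for _ in range(N)]
empirical = {t: sum(d <= t for d in draws) / N for t in (0.5, 1.0, 2.0)}
```

Each empirical proportion should agree with $F(t)$ up to Monte Carlo error of order $1/\sqrt{N}$.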

What if $F^{-1}$ doesn't exist?

If the support is not connected, in other words if there is a region where the probability density function is $0$, then the $F^{-1}$ used in the proof may not exist. For example, if a probability density function $f$ has two peaks with the density identically $0$ between them, then $F$ is constant on that middle region, so it is not an increasing function, and the theorem does not apply directly.

However, we intuitively know that a random sample is unlikely to hit exactly such a point; more precisely, the probability of picking exactly that point is $0$. In such cumbersome cases, bringing in measure theory and noting that the values of $U$ corresponding to the flat stretches of $F$ form a null set shows that, almost surely, sampling proceeds without issue.

However, such considerations are overly complicated for the simple topic of univariate random variable sampling; it suffices to clearly understand the principle of drawing the desired random sample from a uniform distribution.
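For completeness, practice usually sidesteps the problem with the generalized inverse (quantile function) $F^{-}(u) = \inf \left\{ t : F(t) \ge u \right\}$, which is defined even where $F$ is flat. A minimal bisection sketch, with a two-peak example density of my own choosing:

```python
def generalized_inverse(F, u, lo, hi, iters=60):
    """Generalized inverse F^{-}(u) = inf { t : F(t) >= u }, found by
    bisection. It is defined even where F has flat stretches, i.e. where
    the density is 0 and a true inverse does not exist.
    Assumes F(lo) < u <= F(hi)."""
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if F(mid) >= u:
            hi = mid
        else:
            lo = mid
    return hi

def F(t):
    """CDF of a two-peak density: uniform mass 1/2 on [0, 1] and on [2, 3].
    F is flat on [1, 2], so it has no true inverse there."""
    return (max(0.0, min(t, 1.0)) + max(0.0, min(t - 2.0, 1.0))) / 2.0

# u = 0.5 sits exactly on the flat stretch; the infimum is its left end, 1.0.
x = generalized_inverse(F, 0.5, lo=-1.0, hi=4.0)
```

Samples drawn this way never land in the zero-density gap $(1, 2)$, which matches the intuition above.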