

Geometric Distribution

Definition 1

[Figure: animated probability mass function of $\text{Geo}(p)$ for varying $p$, generated by the code at the end of this post]

For $p \in (0,1]$, the discrete probability distribution $\text{Geo}(p)$ with the following probability mass function is called the geometric distribution. $$ p(x) = p (1 - p)^{x-1} \qquad , x = 1, 2, 3, \cdots $$


  • Take special care with the support and the formula, as two definitions are in common use: one counts the number of trials up to and including the first success (used here, $x = 1, 2, 3, \cdots$), the other counts the number of failures before it ($x = 0, 1, 2, \cdots$). The sketch below contrasts the two.
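As a concrete comparison, here is a minimal Julia sketch using Distributions.jl (as in the code at the end of this post); the helper geo_pmf and the value p = 0.3 are illustrative. Note that Distributions.jl adopts the failure-counting convention, so its support starts at $0$.

using Distributions

p = 0.3
geo_pmf(x) = p * (1 - p)^(x - 1)   # pmf from Definition 1, support x = 1, 2, 3, …

d = Geometric(p)                   # Distributions.jl counts failures before the first success, support 0, 1, 2, …
for x in 1:5
    println(geo_pmf(x), "  ", pdf(d, x - 1))   # same values, support shifted by one
end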

Basic Properties

Moment Generating Function

  • [1]: $$m(t) = {{ p e^{t} } \over { 1 - (1-p) e^{t} }} \qquad , t < -\log (1-p)$$

Mean and Variance

  • [2]: If $X \sim \text{Geo} (p)$ then $$ \begin{align*} E(X) =& {{ 1 } \over { p }} \\ \operatorname{Var}(X) =& {{ 1-p } \over { p^{2} }} \end{align*} $$
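A quick Monte Carlo sanity check of both formulas, sketched in Julia under the trial-counting convention used here (adding $1$ converts Distributions.jl's failure counts into trial counts; p = 0.3 is illustrative):

using Distributions, Statistics

p = 0.3
X = rand(Geometric(p), 10^6) .+ 1   # convert failure counts to trial counts, so X ∈ {1, 2, 3, …}

println(mean(X), " ≈ ", 1 / p)           # E(X) = 1/p
println(var(X), " ≈ ", (1 - p) / p^2)    # Var(X) = (1-p)/p²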

Sufficient Statistic and Maximum Likelihood Estimator

  • [3]: Suppose a random sample $\mathbf{X} := \left( X_{1} , \cdots , X_{n} \right) \sim \text{Geo} \left( p \right)$ is given. The sufficient statistic $T$ and maximum likelihood estimator $\hat{p}$ for $p$ are as follows. $$ \begin{align*} T =& \sum_{k=1}^{n} X_{k} \\ \hat{p} =& {{ n } \over { \sum_{k=1}^{n} X_{k} }} \end{align*} $$
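Since $\hat{p}$ is just the reciprocal of the sample mean, it is easy to try out; a minimal Julia sketch with the same conventions and illustrative parameters as above:

using Distributions

p, n = 0.3, 10^5
X = rand(Geometric(p), n) .+ 1   # random sample under the trial-counting convention

T = sum(X)    # sufficient statistic
p̂ = n / T     # maximum likelihood estimator; should land near p = 0.3
println(p̂)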

Theorems

Memorylessness

  • [a]: If $X \sim \text{Geo} (p)$ then $$ P(X > s+t \mid X > s) = P(X > t) $$

Generalization to the Negative Binomial Distribution

  • [b]: If $Y = X_{1} + \cdots + X_{r}$ and $X_{i} \overset{\text{iid}}{\sim} \text{Geo}(p)$ then $Y \sim \text{NB}(r,p)$
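A simulation sketch of this theorem in Julia, with illustrative parameters. One caveat: NegativeBinomial(r, p) in Distributions.jl counts the failures before the $r$-th success, so the sum of $r$ trial-counting geometric variables matches it only after subtracting $r$.

using Distributions, Statistics

p, r = 0.3, 5
Y = [sum(rand(Geometric(p), r) .+ 1) for _ in 1:10^5]   # Y = X₁ + ⋯ + Xᵣ under the trial convention

# compare Y - r with NegativeBinomial(r, p); both means should be near r(1-p)/p
println(mean(Y .- r), " ≈ ", mean(NegativeBinomial(r, p)))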

Explanation

Relation with Exponential Distribution

The geometric distribution describes how many trials are needed to obtain the first success when each trial succeeds with probability $0 < p \le 1$. Its probability mass function is the probability of failing $x-1$ times, each with probability $(1-p)$, and then succeeding with probability $p$ on the $x$-th trial. This is why it can be viewed as a discretization of the exponential distribution.
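One way to make this discretization precise: if $T$ follows an exponential distribution with rate $\lambda$, then $\lceil T \rceil \sim \text{Geo} \left( 1 - e^{-\lambda} \right)$, since $$ P \left( \lceil T \rceil = x \right) = P(x-1 < T \le x) = e^{-\lambda (x-1)} - e^{-\lambda x} = \left( 1 - e^{-\lambda} \right) \left( e^{-\lambda} \right)^{x-1} $$ A simulation sketch in Julia (λ = 0.5 is illustrative; Distributions.jl parameterizes Exponential by the scale $1/\lambda$):

using Distributions, Statistics

λ = 0.5
T = rand(Exponential(1 / λ), 10^6)   # exponential with rate λ, i.e. scale 1/λ
X = ceil.(Int, T)                    # ⌈T⌉ should follow Geo(p) with p = 1 - e^{-λ}

p = 1 - exp(-λ)
println(mean(X), " ≈ ", 1 / p)       # E(X) = 1/p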

Naming

The distribution is called the geometric distribution because its probability mass function has the form of a geometric sequence. Setting $a := p$ and $r := (1-p)$ gives the familiar form $p(x) = a r^{x-1}$. Indeed, the formula for a geometric series appears whenever one computes with it, for instance in the moment generating function.
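For instance, summing the probability mass function over its support is nothing but the geometric series formula with first term $a = p$ and ratio $r = 1-p$, confirming that the probabilities add up to $1$: $$ \sum_{x=1}^{\infty} p (1-p)^{x-1} = {{ p } \over { 1 - (1-p) }} = 1 $$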

Proof

[1]

$$ \begin{align*} m(t) =& \sum_{x=1}^{\infty} e^{tx} p(x) \\ =& \sum_{x=1}^{\infty} e^{tx} p (1-p)^{x-1} \\ =& p e^{t} \sum_{x=1}^{\infty} \left[ e^{t}(1-p) \right]^{x-1} \end{align*} $$ When $t < -\log (1-p)$, the ratio satisfies $e^{t}(1-p) < 1$, so by the formula for a geometric series $$ p e^{t} \sum_{x=1}^{\infty} \left[ e^{t}(1-p) \right]^{x-1} = {{ p e^{t} } \over { 1 - (1-p) e^{t} }} $$
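A numerical sanity check of this closed form, sketched in plain Julia by truncating the series (the illustrative values satisfy $t = -0.1 < -\log(1-p) \approx 0.357$ for $p = 0.3$):

p, t = 0.3, -0.1
series = sum(exp(t * x) * p * (1 - p)^(x - 1) for x in 1:10^4)   # truncated mgf series
closed = p * exp(t) / (1 - (1 - p) * exp(t))                     # closed form from [1]
println(series, " ≈ ", closed)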

[2]

There are two methods; one of them differentiates the moment generating function [1], as sketched below.
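Differentiating $m(t)$ and evaluating at $t = 0$ gives the moments: $$ \begin{align*} E(X) =& m ' (0) = \left. {{ p e^{t} } \over { \left[ 1 - (1-p) e^{t} \right]^{2} }} \right|_{t=0} = {{ 1 } \over { p }} \\ E \left( X^{2} \right) =& m''(0) = {{ 2-p } \over { p^{2} }} \\ \operatorname{Var}(X) =& E \left( X^{2} \right) - \left[ E(X) \right]^{2} = {{ 2-p } \over { p^{2} }} - {{ 1 } \over { p^{2} }} = {{ 1-p } \over { p^{2} }} \end{align*} $$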

[3]

Direct deduction.
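As a sketch: the likelihood of the sample factors through $\sum_{k} x_{k}$ alone, $$ L \left( p ; \mathbf{x} \right) = \prod_{k=1}^{n} p (1-p)^{x_{k} - 1} = p^{n} (1-p)^{\sum_{k} x_{k} - n} $$ so $T = \sum_{k} X_{k}$ is sufficient by the factorization theorem, and solving $$ {{ \partial } \over { \partial p }} \log L = {{ n } \over { p }} - {{ \sum_{k} x_{k} - n } \over { 1-p }} = 0 $$ yields $\hat{p} = n / \sum_{k} x_{k}$.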

[a]

Deduced using conditional probability.
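As a sketch: the survival function of the geometric distribution is itself geometric, $P(X > k) = (1-p)^{k}$, so $$ P(X > s+t \mid X > s) = {{ P(X > s+t) } \over { P(X > s) }} = {{ (1-p)^{s+t} } \over { (1-p)^{s} }} = (1-p)^{t} = P(X > t) $$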

[b]

Deduced using the moment generating function.
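As a sketch: by independence, the moment generating function of $Y$ is the $r$-th power of the geometric one from [1], $$ M_{Y}(t) = \left[ m(t) \right]^{r} = \left[ {{ p e^{t} } \over { 1 - (1-p) e^{t} }} \right]^{r} \qquad , t < -\log (1-p) $$ which is the moment generating function of $\text{NB}(r, p)$ under the same trial-counting convention.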

Code

Below is Julia code that renders the probability mass function of the geometric distribution as a GIF.

@time using LaTeXStrings
@time using Distributions
@time using Plots

cd(@__DIR__)

x = 0:20   # support; note that Distributions.jl's Geometric counts failures, so it starts at 0
P = collect(0.01:0.01:0.5); append!(P, reverse(P))   # sweep p up and back down so the animation loops

animation = @animate for p ∈ P
    scatter(x, pdf.(Geometric(p), x),
     color = :black, markerstrokecolor = :black,
     label = "p = $(rpad(p, 4, '0'))", size = (400,300))   # pad p so the legend width stays fixed
    xlims!(0,20); ylims!(0,0.3); title!(L"\mathrm{pmf\,of\,Geo}(p)")
end
gif(animation, "pmf.gif")

  1. Hogg et al. (2013). Introduction to Mathematical Statistics (7th Edition): p. 145.