Geometric Distribution
Definition 1
For $p \in (0,1]$, the discrete probability distribution $\text{Geo}(p)$ with the following probability mass function is called the geometric distribution. $$ p(x) = p (1 - p)^{x-1} \qquad , x = 1 , 2, 3, \cdots $$
- Take special care with the support and the formula, as two conventions are in use: one counts the number of trials up to and including the first success ($x = 1, 2, 3, \cdots$, as here), while the other counts the number of failures before the first success ($x = 0, 1, 2, \cdots$ with pmf $p(1-p)^{x}$).
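For example, Julia's Distributions.jl adopts the second convention; a quick check with illustrative values (assuming Distributions.jl is installed):

using Distributions

# Distributions.jl's Geometric(p) counts failures *before* the first success,
# so its support starts at 0 and its pmf is p(1-p)^x.
p = 0.2
d = Geometric(p)
pdf(d, 0)      # = p = 0.2, i.e. success on the very first trial
mean(d)        # = (1-p)/p = 4.0, not 1/p = 5.0
mean(d) + 1    # shifting by one recovers E(X) = 1/p under this article's convention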
Basic Properties
Moment Generating Function
- [1]: $$m(t) = {{ p e^{t} } \over { 1 - (1-p) e^{t} }} \qquad , t < -\log (1-p)$$
Mean and Variance
- [2]: If $X \sim \text{Geo} (p)$ then $$ \begin{align*} E(X) =& {{ 1 } \over { p }} \\ \operatorname{Var}(X) =& {{ 1-p } \over { p^{2} }} \end{align*} $$
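These can be checked against a simulation; a minimal sketch with an illustrative $p$, shifting Distributions.jl's failure-count samples by one to match this article's convention:

using Distributions, Statistics

p = 0.25
X = rand(Geometric(p), 10^6) .+ 1   # failures + 1 = number of trials
mean(X)   # ≈ 1/p = 4.0
var(X)    # ≈ (1-p)/p^2 = 12.0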
Sufficient Statistic and Maximum Likelihood Estimator
- [3]: Suppose a random sample $\mathbf{X} := \left( X_{1} , \cdots , X_{n} \right) \sim \text{Geo} \left( p \right)$ is given. The sufficient statistic $T$ and maximum likelihood estimator $\hat{p}$ for $p$ are as follows. $$ \begin{align*} T =& \sum_{k=1}^{n} X_{k} \\ \hat{p} =& {{ n } \over { \sum_{k=1}^{n} X_{k} }} \end{align*} $$
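A minimal sketch of computing $\hat{p}$ from a simulated sample, again shifting the failure counts by one (the parameter values are illustrative):

using Distributions

p, n = 0.3, 10^5
X = rand(Geometric(p), n) .+ 1   # random sample under this article's convention
p̂ = n / sum(X)                   # maximum likelihood estimator, ≈ 0.3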
Theorems
Memorylessness
- [a]: If $X \sim \text{Geo} (p)$ then $$ P(X > s+t \mid X > s) = P(X > t) $$
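A quick numeric check of this identity, using the survival function $P(X > n) = (1-p)^{n}$ with illustrative values:

# Memorylessness: the conditional survival probability equals the unconditional one
p, s, t = 0.3, 4, 6
surv(n) = (1 - p)^n              # P(X > n) for X ~ Geo(p)
surv(s + t) / surv(s) ≈ surv(t)  # true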
Generalization to the Negative Binomial Distribution
- [b]: If $Y = X_{1} + \cdots + X_{r}$ and $X_{i} \overset{\text{iid}}{\sim} \text{Geo}(p)$ then $Y \sim \text{NB}(r,p)$
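A Monte Carlo sketch of this theorem with Distributions.jl, whose Geometric and NegativeBinomial both count failures rather than trials, so the sum of $r$ failure counts can be compared with $\text{NB}(r,p)$ directly (values are illustrative):

using Distributions, Statistics

r, p = 5, 0.3
Y = [sum(rand(Geometric(p), r)) for _ in 1:10^5]   # sum of r iid geometrics
mean(Y), mean(NegativeBinomial(r, p))   # both ≈ r(1-p)/p ≈ 11.67
var(Y), var(NegativeBinomial(r, p))     # both ≈ r(1-p)/p^2 ≈ 38.89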
Explanation
Relation with Exponential Distribution
The geometric distribution is concerned with how many trials it takes to obtain the first success, where each trial succeeds with probability $0 < p \le 1$. Its probability mass function gives the probability of failing $x-1$ times, each with probability $(1-p)$, and then succeeding on the $x$-th trial with probability $p$. This characteristic allows it to be seen as the discrete counterpart of the exponential distribution.
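One way to make this precise is through the survival function: for $X \sim \text{Geo}(p)$, $$ P(X > x) = (1-p)^{x} = e^{- \lambda x} \qquad , \lambda := -\log (1-p) $$ which is exactly the survival function of the exponential distribution evaluated at the integers.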
Naming
The distribution is called the geometric distribution because its probability mass function has the form of a geometric sequence. Setting $a := p$ and $r := (1-p)$ gives the familiar form $p(x) = a r ^{x-1}$. Indeed, when computing the moment generating function, the formula for a geometric series appears.
Proof
[1]
$$ \begin{align*} m(t) =& \sum_{x=1}^{\infty} e^{tx} p(x) \\ =& \sum_{x=1}^{\infty} e^{tx} p (1-p)^{x-1} \\ =& p e^{t} \sum_{x=1}^{\infty} \left[ e^{t}(1-p) \right]^{x-1} \end{align*} $$ When $ t < -\log (1-p)$, we have $e^{t} (1-p) < 1$, so by the formula for a geometric series, $$ p e^{t} \sum_{x=1}^{\infty} \left[ e^{t}(1-p) \right]^{x-1} = {{ p e^{t} } \over { 1 - (1-p) e^{t} }} $$
■
[2]
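A sketch, differentiating the moment generating function [1] at $t = 0$: $$ \begin{align*} E(X) =& m'(0) = \left. {{ p e^{t} } \over { \left[ 1 - (1-p) e^{t} \right]^{2} }} \right|_{t=0} = {{ p } \over { p^{2} }} = {{ 1 } \over { p }} \\ E \left( X^{2} \right) =& m''(0) = {{ 2-p } \over { p^{2} }} \\ \operatorname{Var} (X) =& E \left( X^{2} \right) - \left[ E(X) \right]^{2} = {{ 2-p } \over { p^{2} }} - {{ 1 } \over { p^{2} }} = {{ 1-p } \over { p^{2} }} \end{align*} $$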
■
[3]
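A sketch: the joint probability mass function factors as $$ f \left( \mathbf{x} ; p \right) = \prod_{k=1}^{n} p (1-p)^{x_{k} - 1} = p^{n} (1-p)^{\sum_{k} x_{k} - n} $$ so $T = \sum_{k=1}^{n} X_{k}$ is sufficient for $p$ by the factorization theorem. Setting the derivative of the log-likelihood $n \log p + \left( \sum_{k} x_{k} - n \right) \log (1-p)$ with respect to $p$ to zero yields $\hat{p} = n / \sum_{k} x_{k}$.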
■
[a]
Deduced using conditional probability.
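A sketch: since $P(X > n) = (1-p)^{n}$, $$ P(X > s+t \mid X > s) = {{ P(X > s+t) } \over { P(X > s) }} = {{ (1-p)^{s+t} } \over { (1-p)^{s} }} = (1-p)^{t} = P(X > t) $$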
■
[b]
Deduced using the moment generating function.
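A sketch: by independence, the moment generating function of $Y$ is the product of those of the $X_{i}$ [1], $$ m_{Y}(t) = \prod_{i=1}^{r} {{ p e^{t} } \over { 1 - (1-p) e^{t} }} = \left[ {{ p e^{t} } \over { 1 - (1-p) e^{t} }} \right]^{r} $$ which is the moment generating function of the negative binomial distribution $\text{NB}(r,p)$.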
■
Code
Below is Julia code that animates the probability mass function of the geometric distribution as a GIF. Note that Distributions.jl's Geometric follows the failure-count convention, so the plotted support starts at $0$.
@time using LaTeXStrings
@time using Distributions
@time using Plots

cd(@__DIR__)

# Sweep p up from 0.01 to 0.5 and back down for a smoothly looping animation
x = 0:20
P = collect(0.01:0.01:0.5); append!(P, reverse(P))

# Distributions.jl's Geometric(p) counts failures before the first success,
# so pdf.(Geometric(p), x) is evaluated on the support 0, 1, 2, ...
animation = @animate for p ∈ P
    scatter(x, pdf.(Geometric(p), x),
        color = :black, markerstrokecolor = :black,
        label = "p = $(rpad(p, 4, '0'))", size = (400, 300))
    xlims!(0, 20); ylims!(0, 0.3); title!(L"\mathrm{pmf\,of\,Geo}(p)")
end
gif(animation, "pmf.gif")
Hogg et al. (2013). Introduction to Mathematical Statistics (7th Edition): p145. ↩︎