

Mean and Variance of the Geometric Distribution

Formula

If $X \sim \text{Geo}(p)$, then
$$ E(X) = \frac{1}{p} \qquad \operatorname{Var}(X) = \frac{1-p}{p^{2}} $$

Derivation

The mean and variance of the geometric distribution are not as easy to calculate as one might expect. This post introduces two interesting and useful derivations.

By the definition of the geometric distribution, for $p \in (0,1]$, the geometric distribution is the discrete probability distribution with the following probability mass function:
$$ p(x) = p (1-p)^{x-1} \qquad , x = 1, 2, 3, \cdots $$

First Method

Strategy: Use the formula for a geometric series and differentiation.

Mean

$$ E(X) = \sum_{x=1}^{\infty} x p (1-p)^{x-1} $$
Let $f(p) := \sum_{x=0}^{\infty} (1-p)^{x}$. By the formula for a geometric series,
$$ f(p) = \frac{1}{1-(1-p)} = \frac{1}{p} $$
Differentiating with respect to $p$,
$$ f '(p) = -\frac{1}{p^{2}} $$
On the other hand, differentiating the series directly, term by term, gives
$$ f '(p) = \sum_{x=1}^{\infty} -x (1-p)^{x-1} $$
Therefore,
$$ \begin{align*} & -\frac{1}{p^{2}} = -\sum_{x=1}^{\infty} x (1-p)^{x-1} \\ \implies& \frac{1}{p} = p \sum_{x=1}^{\infty} x (1-p)^{x-1} \\ \implies& \frac{1}{p} = \sum_{x=1}^{\infty} x p (1-p)^{x-1} = E(X) \end{align*} $$
Thus $E(X) = \dfrac{1}{p}$.
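As a quick numerical sanity check (not part of the derivation), the partial sums of $\sum_{x \geq 1} x p (1-p)^{x-1}$ can be compared against $1/p$; the values of $p$ and the truncation point $N$ below are illustrative choices.

```python
# Sanity check: the series sum_{x>=1} x * p * (1-p)^(x-1) should equal 1/p.
# p and the truncation point N are illustrative; the tail beyond N is
# negligible because (1-p)^N decays geometrically.
p = 0.3
N = 2000

partial_sum = sum(x * p * (1 - p) ** (x - 1) for x in range(1, N + 1))
print(partial_sum, 1 / p)  # both ≈ 3.3333
```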

Variance

$$ V(X) = E(X^{2}) - \left( E(X) \right)^{2} = \sum_{x=1}^{\infty} x^{2} p (1-p)^{x-1} - \frac{1}{p^{2}} $$
Therefore, it suffices to find $E(X^{2}) = \sum_{x=1}^{\infty} x^{2} p (1-p)^{x-1}$.

Likewise, let $f(p) := \sum_{x=0}^{\infty} (1-p)^{x}$; then
$$ f(p) = \frac{1}{1-(1-p)} = \frac{1}{p} \\ f '(p) = -\frac{1}{p^{2}} \\ f ''(p) = \frac{2}{p^{3}} $$
On the other hand, differentiating the series twice term by term gives $f ''(p) = \sum_{x=1}^{\infty} x(x-1)(1-p)^{x-2}$, so
$$ \begin{align*} & \frac{2}{p^{3}} = \sum_{x=1}^{\infty} x(x-1)(1-p)^{x-2} \\ \implies& \frac{2}{p^{3}} = \sum_{x=1}^{\infty} x^{2} (1-p)^{x-2} - \sum_{x=1}^{\infty} x (1-p)^{x-2} \\ \implies& p \cdot \frac{2}{p^{3}} = p \sum_{x=1}^{\infty} x^{2} (1-p)^{x-2} - p \sum_{x=1}^{\infty} x (1-p)^{x-2} \\ \implies& \frac{2}{p^{2}} = \sum_{x=1}^{\infty} x^{2} p (1-p)^{x-2} - \sum_{x=1}^{\infty} x p (1-p)^{x-2} \\ \implies& \frac{2}{p^{2}} = \frac{1}{1-p} \sum_{x=1}^{\infty} x^{2} p (1-p)^{x-1} - \frac{1}{1-p} \sum_{x=1}^{\infty} x p (1-p)^{x-1} \\ \implies& \frac{2(1-p)}{p^{2}} = E(X^{2}) - \frac{1}{p} \\ \implies& E(X^{2}) = \frac{2-p}{p^{2}} \end{align*} $$
Therefore, $V(X) = \dfrac{1-p}{p^{2}}$.
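The second moment and the variance can be checked numerically the same way; again, $p$ and the truncation point $N$ are illustrative choices, not part of the proof.

```python
# Sanity check: sum_{x>=1} x^2 * p * (1-p)^(x-1) should equal (2-p)/p^2,
# and subtracting (1/p)^2 should give the variance (1-p)/p^2.
# p and the truncation point N are illustrative choices.
p = 0.3
N = 2000

second_moment = sum(x**2 * p * (1 - p) ** (x - 1) for x in range(1, N + 1))
variance = second_moment - (1 / p) ** 2
print(second_moment, (2 - p) / p**2)  # both ≈ 18.8889
print(variance, (1 - p) / p**2)       # both ≈ 7.7778
```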

Second Method

Strategy: Use the memorylessness of the geometric distribution. In a sense this skips the heavy series manipulation and argues in words, but some readers may actually find it harder.
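The memorylessness property itself is easy to verify from the tail formula $P(X > k) = (1-p)^{k}$: the sketch below checks $P(X > m+n \mid X > m) = P(X > n)$ for illustrative values of $p$, $m$, and $n$.

```python
# Memorylessness of Geo(p): P(X > m+n | X > m) = P(X > n),
# which follows from the tail formula P(X > k) = (1-p)^k.
# p, m, n are illustrative choices.
p = 0.3

def tail(k):
    """P(X > k) for X ~ Geo(p): all of the first k trials fail."""
    return (1 - p) ** k

m, n = 4, 7
conditional = tail(m + n) / tail(m)  # P(X > m+n | X > m)
print(conditional, tail(n))          # equal up to floating-point error
```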

Mean

$$ E(X) = 1 \cdot P(\text{success on the first trial}) + E(Y+1) \cdot P(\text{failure on the first trial}) $$
By the definition of expectation, $E(X)$ is the sum of two terms: the probability that the first trial succeeds times the number of trials in that case, $1$, plus the probability that the first trial fails times the expected number of trials in that case, $E(Y+1)$. Here $Y$ follows the same rule as $X$, namely $\text{Geo}(p)$: since the geometric distribution is memoryless, after a failed first trial the process starts over from scratch, and the extra $1$ accounts for the trial already spent. Stated cleanly:
$$ E(X) = 1 \cdot p + E(Y+1) \cdot (1-p) $$
But $E(Y+1)$ can be written as $E(Y+1) = E(Y) + E(1) = E(Y) + 1$, and since $X \sim \text{Geo}(p)$ and $Y \sim \text{Geo}(p)$,
$$ E(Y) = E(X) $$
Solving $E(X) = p + \left( E(X) + 1 \right) (1-p)$ for $E(X)$ gives:
$$ E(X) = \frac{1}{p} $$
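The self-consistency equation $E(X) = 1 \cdot p + (E(X)+1)(1-p)$ can also be solved numerically: the map $e \mapsto p + (e+1)(1-p)$ contracts with factor $1-p < 1$, so fixed-point iteration converges to $1/p$. The starting guess and iteration count below are arbitrary.

```python
# Solve E(X) = 1*p + (E(X) + 1)*(1-p) by fixed-point iteration.
# The map e -> p + (e+1)*(1-p) contracts with factor (1-p) < 1,
# so the iterates converge to the unique fixed point 1/p.
p = 0.3
e = 0.0  # arbitrary starting guess
for _ in range(200):
    e = 1 * p + (e + 1) * (1 - p)
print(e, 1 / p)  # both ≈ 3.3333
```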

Variance

$$ \begin{align*} E(X^{2}) =& 1 \cdot p + E\left( (Y+1)^{2} \right) \cdot (1-p) \\ =& p + \left( E(X^{2}) + 2E(X) + 1 \right)(1-p) \\ =& p + E(X^{2}) + 2E(X) + 1 - pE(X^{2}) - 2pE(X) - p \end{align*} $$
Tidying up,
$$ 0 = 2E(X) + 1 - pE(X^{2}) - 2pE(X) $$
Collecting the terms involving the second moment gives
$$ \begin{align*} pE(X^{2}) =& 2(1-p)E(X) + 1 \\ =& 2(1-p) \frac{1}{p} + 1 \\ =& \frac{2-p}{p} \end{align*} $$
Dividing both sides by $p$ gives
$$ E(X^{2}) = \frac{2-p}{p^{2}} $$
Therefore, $V(X) = \dfrac{1-p}{p^{2}}$.
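The same fixed-point idea applies to the second moment: plugging the known mean $E(X) = 1/p$ into $E(X^{2}) = p + \left( E(X^{2}) + 2E(X) + 1 \right)(1-p)$ and iterating converges to $(2-p)/p^{2}$, from which the variance follows. As before, $p$, the starting guess, and the iteration count are illustrative.

```python
# Solve E(X^2) = p + (E(X^2) + 2*E(X) + 1)*(1-p) by fixed-point iteration,
# plugging in the known mean E(X) = 1/p. The fixed point is (2-p)/p^2,
# and the variance follows as E(X^2) - E(X)^2 = (1-p)/p^2.
p = 0.3
mean = 1 / p
m2 = 0.0  # arbitrary starting guess for E(X^2)
for _ in range(200):
    m2 = p + (m2 + 2 * mean + 1) * (1 - p)
print(m2, (2 - p) / p**2)            # both ≈ 18.8889
print(m2 - mean**2, (1 - p) / p**2)  # both ≈ 7.7778
```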