
Moment Method

Definition 1

When the parameters of a given distribution are unknown, the method of setting up simultaneous equations that relate the moments to the parameters, and taking the solution of these equations as the estimates of the parameters, is known as the Moment Method.
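One common way to spell this out: if the distribution has $t$ unknown parameters $\theta_{1} , \cdots , \theta_{t}$, equate the first $t$ sample moments to the corresponding theoretical moments, which are functions of the parameters,

$$
m_{j} := {{ 1 } \over { n }} \sum_{i=1}^{n} X_{i}^{j} = E \left[ X^{j} \right] \qquad , j = 1 , \cdots , t
$$

and solve this system for $\theta_{1} , \cdots , \theta_{t}$.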

Description

The moment method is a point estimation technique that has been in use since the 1800s, going back to Karl Pearson. Although it often does not produce very good results, it is the simplest and easiest approach, so it is worth trying first in any study.

Example: Crime Rate

Suppose we have a random sample $X_{1} , \cdots , X_{n}$ following a binomial distribution $B \left( k, p \right)$. The binomial distribution is very easy to work with when $k$ and $p$ are given, but consider the case where we only have the data and know neither $k$ nor $p$. This applies, for example, to crime rates, especially sexual crimes, where some incidents are never reported. If $k$ is the total number of incidents and $p$ is the probability that an incident is reported, then the number of reported incidents, which is the data we actually observe, follows the binomial distribution $B \left( k, p \right)$.
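As a minimal simulation sketch of this setup (the values $k = 20$, $p = 0.3$, and $n = 200$ below are made-up assumptions, not taken from any real data):

```python
import numpy as np

rng = np.random.default_rng(0)

k, p, n = 20, 0.3, 200                 # hypothetical true values, unknown to the analyst
reported = rng.binomial(k, p, size=n)  # observed data: reported incidents per period

print(reported[:10])                   # only these counts are observed, not k or p
```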

We know that the $1$st moment is related to the mean and the $2$nd moment is related to the variance. Forming the simultaneous equations

$$
\begin{align*}
m_{1} := {{ 1 } \over { n }} \sum_{i=1}^{n} X_{i} =& kp
\\ m_{2} := {{ 1 } \over { n }} \sum_{i=1}^{n} X_{i}^{2} =& kp(1-p) + k^{2} p^{2}
\end{align*}
$$

and solving them, the estimate $\hat{p}$ of $p$ is, in terms of the estimate $\hat{k}$ of $k$,

$$
\hat{p} = {{ 1 } \over { \hat{k} }} {{ 1 } \over { n }} \sum_{i=1}^{n} X_{i} = {{ \overline{X} } \over { \hat{k} }}
$$

and $\hat{k}$ is

$$
\begin{align*}
& {{ 1 } \over { n }} \sum_{i=1}^{n} X_{i}^{2} = kp(1-p) + k^{2} p^{2}
\\ \implies & {{ 1 } \over { n }} \sum_{i=1}^{n} X_{i}^{2} = {{ 1 } \over { n }} \sum_{i=1}^{n} X_{i} \cdot (1-p) + \left( {{ 1 } \over { n }} \sum_{i=1}^{n} X_{i} \right)^{2}
\\ \implies & {{ 1 } \over { n }} \sum_{i=1}^{n} X_{i}^{2} = {{ 1 } \over { n }} \sum_{i=1}^{n} X_{i} \cdot \left( 1 - {{ \overline{X} } \over { \hat{k} }} \right) + \left( {{ 1 } \over { n }} \sum_{i=1}^{n} X_{i} \right)^{2}
\\ \implies & m_{2} = m_{1} \left( 1 - {{ m_{1} } \over { \hat{k} }} \right) + m_{1}^{2}
\\ \implies & {{ m_{2} - m_{1}^{2} } \over { m_{1} }} = 1 - {{ m_{1} } \over { \hat{k} }}
\\ \implies & {{ m_{1} } \over { \hat{k} }} = {{ m_{1} - \left( m_{2} - m_{1}^{2} \right) } \over { m_{1} }}
\\ \implies & \hat{k} = {{ m_{1}^{2} } \over { m_{1} - \left( m_{2} - m_{1}^{2} \right) }}
\end{align*}
$$

that is,

$$
\hat{k} = {{ \overline{X}^{2} } \over { \overline{X} - \sum_{i=1}^{n} \left( X_{i} - \overline{X} \right)^{2} / n }}
$$

This is a fairly usable estimate, but it becomes hard to use when the denominator is negative or so close to $0$ that the estimate blows up. Looking at the formula, the denominator causes trouble when (i) the data values themselves are small, so $\overline{X}$ is small, or (ii) the variance is large, so $\sum \left( X_{i} - \overline{X} \right)^{2}$ is large. Neither case conflicts much with statistical intuition, so this is a flaw, but a reasonable one.
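A minimal sketch of these estimators in code, assuming NumPy and a hypothetical simulated sample (the true values $k = 20$, $p = 0.3$ below are assumptions for illustration only):

```python
import numpy as np

def binomial_mom(x):
    """Method-of-moments estimates (k_hat, p_hat) for B(k, p) from a sample x."""
    x = np.asarray(x, dtype=float)
    m1 = x.mean()                   # first sample moment, estimates kp
    m2 = np.mean(x ** 2)            # second sample moment, estimates kp(1-p) + k^2 p^2
    denom = m1 - (m2 - m1 ** 2)     # equals X-bar minus (1/n) * sum (X_i - X-bar)^2
    if denom <= 0:
        raise ValueError("non-positive denominator: the moment estimate is unusable")
    k_hat = m1 ** 2 / denom
    p_hat = m1 / k_hat              # p_hat = X-bar / k_hat
    return k_hat, p_hat

rng = np.random.default_rng(42)
sample = rng.binomial(20, 0.3, size=200)   # hypothetical data from B(k=20, p=0.3)
print(binomial_mom(sample))
```

The guard on `denom` corresponds to the failure mode noted above: when $\overline{X}$ is small or the sample variance is large, the denominator can become non-positive and the estimate breaks down.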


  1. Casella. (2001). Statistical Inference (2nd Edition): p312.