Method of Moments
Definition 1
When the parameters of a given distribution are unknown, forming a system of equations that sets the sample moments equal to the corresponding population moments, and taking the solution of these equations as the estimates of the parameters, is known as the Method of Moments.
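Written out in general notation (a standard formulation, not tied to any particular distribution), for a model with parameters $\theta_{1} , \cdots , \theta_{k}$ the method equates the first $k$ sample moments to the corresponding population moments, which are functions of the parameters, and solves for $\theta_{1} , \cdots , \theta_{k}$: $$ m_{j} := {{ 1 } \over { n }} \sum_{i=1}^{n} X_{i}^{j} = E \left[ X^{j} \right] \left( \theta_{1} , \cdots , \theta_{k} \right) , \qquad j = 1 , \cdots , k $$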
Description
The method of moments is a point estimation technique that has been in use since the 1800s, going back to Karl Pearson and others. Although it does not always produce good results, it is the simplest and most straightforward approach, so it is worth trying first in almost any study.
Example: Crime Rate
Suppose we have a random sample $X_{1} , \cdots , X_{n}$ following a binomial distribution $B \left( k, p \right)$. The binomial distribution is very easy to analyze when $k$ and $p$ are known, but consider the case where we only have the data and know neither $k$ nor $p$. This situation arises, for example, with crime rates, especially sexual crimes, where some incidents are never reported. If $k$ denotes the total number of incidents and $p$ the probability that an incident is reported, then the number of reported incidents, which is what the actual data records, follows the binomial distribution $B \left( k, p \right)$.
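As a minimal simulation sketch of this reporting model in Python (numpy assumed; the values $k = 400$, $p = 0.3$ and the sample size are hypothetical choices for illustration, not part of the example):

```python
import numpy as np

# Hypothetical reporting model: k_true total incidents per period, each
# reported independently with probability p_true, so the number of
# reported incidents per period follows B(k_true, p_true).
rng = np.random.default_rng(42)
k_true, p_true = 400, 0.3
n = 50                                        # number of observed periods

sample = rng.binomial(k_true, p_true, size=n)
print(sample[:10])                            # counts of reported incidents
```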
We know that the first moment equals the mean, and the second moment is related to the variance via $E \left[ X^{2} \right] = \operatorname{Var} X + \left( E X \right)^{2}$. Equating the sample moments to the population moments of $B \left( k, p \right)$ gives the system $$ \begin{align*} m_{1} := {{ 1 } \over { n }} \sum_{i=1}^{n} X_{i} =& kp \\ m_{2} := {{ 1 } \over { n }} \sum_{i=1}^{n} X_{i}^{2} =& kp(1-p) + k^{2} p^{2} \end{align*} $$ Solving these equations, the estimate $\hat{p}$ of $p$, expressed in terms of the estimate $\hat{k}$ of $k$, is $$ \hat{p} = {{ 1 } \over { \hat{k} }} {{ 1 } \over { n }} \sum_{i=1}^{n} X_{i} = {{ \overline{X} } \over { \hat{k} }} $$ and $\hat{k}$ follows from $$ \begin{align*} & {{ 1 } \over { n }} \sum_{i=1}^{n} X_{i}^{2} = kp(1-p) + k^{2} p^{2} \\ \implies & {{ 1 } \over { n }} \sum_{i=1}^{n} X_{i}^{2} = {{ 1 } \over { n }} \sum_{i=1}^{n} X_{i} \cdot (1-p) + \left( {{ 1 } \over { n }} \sum_{i=1}^{n} X_{i} \right)^{2} \\ \implies & {{ 1 } \over { n }} \sum_{i=1}^{n} X_{i}^{2} = {{ 1 } \over { n }} \sum_{i=1}^{n} X_{i} \cdot \left( 1 - {{ \overline{X} } \over { \hat{k} }} \right) + \left( {{ 1 } \over { n }} \sum_{i=1}^{n} X_{i} \right)^{2} \\ \implies & m_{2} = m_{1} \left( 1 - {{ m_{1} } \over { \hat{k} }} \right) + m_{1}^{2} \\ \implies & {{ \left( m_{2} - m_{1}^{2} \right) } \over { m_{1} }} = 1 - {{ m_{1} } \over { \hat{k} }} \\ \implies & {{ m_{1} } \over { \hat{k} }} = {{ m_{1} - \left( m_{2} - m_{1}^{2} \right) } \over { m_{1} }} \\ \implies & \hat{k} = {{ m_{1}^{2} } \over { m_{1} - \left( m_{2} - m_{1}^{2} \right) }} \end{align*} $$ Since $m_{2} - m_{1}^{2} = \sum \left( X_{i} - \overline{X} \right)^{2} / n$, this gives $$ \hat{k} = {{ \overline{X}^{2} } \over { \overline{X} - \sum \left( X_{i} - \overline{X} \right)^{2} / n }} $$

This is a fairly usable estimate, but it can be difficult to use because it blows up when the denominator becomes negative or too close to $0$. Looking at the formula, the denominator becomes problematic when (i) the data values themselves are small, so $\overline{X}$ is small, or (ii) the variance is high, so $\sum \left( X_{i} - \overline{X} \right)^{2}$ is large. Neither case particularly conflicts with statistical intuition, so while this is a flaw, it is a reasonable one.
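The following sketch (same hypothetical setup as in the simulation above; the names `k_hat`, `p_hat` and the seed are mine) computes the method-of-moments estimates from a simulated sample using the formulas just derived, and flags the unstable-denominator case discussed above:

```python
import numpy as np

rng = np.random.default_rng(42)
sample = rng.binomial(400, 0.3, size=50)    # stand-in data from B(k, p)

m1 = sample.mean()                          # first sample moment  m_1
m2 = np.mean(sample.astype(float) ** 2)     # second sample moment m_2

# Denominator of k_hat: the sample mean minus the (biased) sample variance.
denom = m1 - (m2 - m1 ** 2)
if denom <= 0:
    # A small mean or a large variance can make this negative or near zero,
    # in which case the estimate blows up and is not usable.
    print("unstable denominator:", denom)
else:
    k_hat = m1 ** 2 / denom                 # k_hat = m_1^2 / (m_1 - (m_2 - m_1^2))
    p_hat = m1 / k_hat                      # p_hat = X-bar / k_hat
    print(f"k_hat = {k_hat:.1f}, p_hat = {p_hat:.3f}")
```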
Casella. (2001). Statistical Inference (2nd Edition): p312. ↩︎