
Maximum Likelihood Estimator

Buildup

Consider a random variable $X$ with probability density function (pdf) $f(x; \theta)$ for a parameter $\theta \in \Theta$. A random sample $X_{1}, \cdots, X_{n}$ drawn independently and identically (iid) from the same distribution as $X$ has the same pdf $f(x; \theta)$, with realization $\mathbf{x} := \left( x_{1}, \cdots, x_{n} \right)$. The function $L$ defined as follows is called the likelihood function:
$$ L(\theta; \mathbf{x}) := \prod_{k=1}^{n} f\left(x_{k}; \theta\right) $$
As discussed below, since we are interested in the maximum value of this function, it is more convenient to take the logarithm, converting the product $\prod$ into the sum $\sum$, and work with $l$:
$$ l(\theta; \mathbf{x}) := \sum_{k=1}^{n} \log f\left(x_{k}; \theta\right) $$
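
As a concrete illustration, here is a minimal Python sketch of these two definitions, assuming a normal model via scipy.stats.norm; the sample values and the fixed scale are hypothetical choices for illustration:

```python
import numpy as np
from scipy.stats import norm

def likelihood(theta, x, sigma=1.0):
    """L(theta; x): product of pdf values over the sample."""
    return np.prod(norm.pdf(x, loc=theta, scale=sigma))

def log_likelihood(theta, x, sigma=1.0):
    """l(theta; x): sum of log-pdf values over the sample."""
    return np.sum(norm.logpdf(x, loc=theta, scale=sigma))

x = np.array([169.0, 171.0, 182.0])  # hypothetical sample
print(likelihood(174.0, x, sigma=6.0))      # L(174; x)
print(log_likelihood(174.0, x, sigma=6.0))  # l(174; x)
```

Beyond convenience, working with $l$ also avoids the numerical underflow that the raw product suffers when $n$ is large.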

Definition 1

The estimator $\hat{\theta} := \hat{\theta}\left(\mathbf{X}\right)$ that satisfies the following is called the Maximum Likelihood Estimator (MLE):
$$ \hat{\theta} = \argmax L\left(\theta; \mathbf{X}\right) $$


  • $\mathbf{X}$ is the random vector $\mathbf{X} := \left( X_{1}, \cdots, X_{n} \right)$.
  • $\argmax g$ is the argument that maximizes the function $g$, i.e., the point at which $g$ attains its largest value.
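
Numerically, the argmax is typically found by minimizing the negative log-likelihood. Here is a minimal sketch, again assuming a normal model with a hypothetical fixed scale, using scipy.optimize.minimize:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

x = np.array([169.0, 171.0, 182.0])  # hypothetical sample

def neg_log_likelihood(theta):
    # -l(theta; x) for N(theta, 6^2); sigma = 6 is an arbitrary illustrative choice
    return -np.sum(norm.logpdf(x, loc=theta[0], scale=6.0))

result = minimize(neg_log_likelihood, x0=np.array([170.0]))
print(result.x)  # approximately 174, the sample mean
```

Minimizing $-l$ rather than maximizing $L$ is the standard trick, since most optimization libraries only minimize.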

Explanation

Intuition

The English term “likelihood” is itself quite intuitive: it simply means “plausibility.”

For example, suppose the heights of three men picked at random off the street were measured as 169cm, 171cm, and 182cm, and assume that the heights of Korean males follow a normal distribution $N\left(\mu, \sigma^{2}\right)$. Since the probability density function of the normal distribution $f(x; \mu)$ attains its maximum at the mean $x = \mu$, the product $L\left(\theta; \mathbf{x}\right)$ is most likely to be largest when $\theta = \mu$.

Note that the main argument of the function $L$ is not the data $\mathbf{x}$ but $\theta$. In other words, it is useful to imagine not that the values of $L$ change as $\mathbf{x}$ is inserted into the pdf $f(x)$, but that $f_{\theta}$ itself slides left and right as $\theta$ varies.

Since we do not know much about the properties of $L$, we cannot confidently say that $L$ is largest at $\theta = 171$, but it is definitely not largest at $\theta = 182$. The terms likelihood and $\argmax$ may seem unfamiliar, but in essence, the maximum likelihood estimator is simply “the most plausible value.”
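
To make this concrete, one can evaluate the log-likelihood of the height sample at a few candidate values of $\theta$, assuming the same normal model with a hypothetical fixed $\sigma = 6$:

```python
import numpy as np
from scipy.stats import norm

x = np.array([169.0, 171.0, 182.0])  # the measured heights

for theta in [169.0, 171.0, 174.0, 182.0]:
    l = np.sum(norm.logpdf(x, loc=theta, scale=6.0))
    print(f"theta = {theta}: l = {l:.3f}")
# l is largest at the sample mean 174 and smallest at theta = 182,
# matching the intuition above.
```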

Formulas

If $L$ is differentiable, the maximum likelihood estimator satisfies the following estimating equation, obtained by setting the partial derivative of the log-likelihood to zero:
$$ \frac{\partial l(\theta)}{\partial \theta} = 0 $$
This is merely an extension of the familiar technique of finding the maximum of a function using derivatives. The notation may look unfamiliar and intimidating in textbooks, especially for statistics students, who often do not deal with differential equations after their freshman year of college. In reality, you will not have to solve differential equations here, and it is fine not to know about them, so don’t worry too much.
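
For instance, in the normal model above with $\sigma$ known, the estimating equation can be solved in closed form; the following is the standard derivation for the mean. The log-likelihood is
$$ l(\mu; \mathbf{x}) = -\frac{n}{2} \log\left(2\pi\sigma^{2}\right) - \frac{1}{2\sigma^{2}} \sum_{k=1}^{n} \left(x_{k} - \mu\right)^{2} $$
so the estimating equation becomes
$$ \frac{\partial l(\mu; \mathbf{x})}{\partial \mu} = \frac{1}{\sigma^{2}} \sum_{k=1}^{n} \left(x_{k} - \mu\right) = 0 \implies \hat{\mu} = \frac{1}{n} \sum_{k=1}^{n} x_{k} = \overline{X}$$
For the three heights above, this gives $\hat{\mu} = (169 + 171 + 182)/3 = 174$.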


  1. Hogg et al. (2013). Introduction to Mathematical Statistics (7th Edition): p. 209, 329.