Definition of Likelihood Ratio Test in Mathematical Statistics
Definition 1
$$ \begin{align*} H_{0} :& \theta \in \Theta_{0} \\ H_{1} :& \theta \in \Theta_{0}^{c} \end{align*} $$
For the hypothesis test described above, the statistic $\lambda$ is called the Likelihood Ratio test statistic. $$ \lambda \left( \mathbf{x} \right) := {{ \sup_{\Theta_{0}} L \left( \theta \mid \mathbf{x} \right) } \over { \sup_{\Theta} L \left( \theta \mid \mathbf{x} \right) }} $$
A hypothesis test that has a rejection region $\left\{ \mathbf{x} : \lambda \left( \mathbf{x} \right) \le c \right\}$ for a given $c \in [0,1]$ is called a Likelihood Ratio Test and is often abbreviated as LRT.
- $L$ is the likelihood function.
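The definition leaves the constant $c$ free. One standard way to pin it down (a well-known convention, not spelled out above) is to choose $c$ so that the test has a prescribed size $\alpha$: $$ \sup_{\theta \in \Theta_{0}} P_{\theta} \left( \lambda \left( \mathbf{X} \right) \le c \right) = \alpha $$ That is, the probability of rejecting is at most $\alpha$ whenever the null hypothesis is true.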
Explanation
In the definition of $\lambda$, the numerator takes the supremum over the null parameter space $\Theta_{0}$, while the denominator takes it over the entire parameter space $\Theta$. Since $\Theta_{0} \subseteq \Theta$, it naturally holds that $0 \le \lambda \left( \mathbf{x} \right) \le 1$, and the closer this ratio is to $0$, the less plausible the null hypothesis is in light of the data.
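To make the two suprema concrete, here is a minimal numerical sketch. Everything in it is an assumption for illustration: a hypothetical sample from $N(\theta, 1)$ and a composite null $H_{0} : \theta \le 0$, chosen so that both the numerator and the denominator genuinely require optimization.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Hypothetical setup: N(theta, 1) data, H0: theta <= 0 vs H1: theta > 0.
rng = np.random.default_rng(42)
x = rng.normal(loc=0.3, scale=1.0, size=30)  # hypothetical sample

def neg_loglik(theta):
    # Negative log-likelihood of N(theta, 1); constants kept for clarity.
    return 0.5 * np.sum((x - theta) ** 2) + 0.5 * len(x) * np.log(2 * np.pi)

# Denominator: supremum over the whole parameter space Theta = R.
res_full = minimize_scalar(neg_loglik)  # unrestricted MLE (equals x.mean())

# Numerator: supremum over Theta_0 = (-inf, 0]; -10 is an arbitrary finite
# lower bound for the optimizer, wide enough for this sample.
res_null = minimize_scalar(neg_loglik, bounds=(-10, 0), method="bounded")

# lambda(x) = exp(loglik_null - loglik_full), so 0 <= lambda <= 1.
lam = np.exp(res_full.fun - res_null.fun)
print(lam)  # near 1 if x.mean() <= 0; drops toward 0 as x.mean() grows
```

In this simple model the optima have closed forms ($\hat{\theta} = \bar{\mathbf{x}}$ for the denominator, $\min \left( \bar{\mathbf{x}}, 0 \right)$ for the numerator), so the numerical optimization is only there to mirror the definition step by step.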
Looking back at a first encounter with statistics, it seemed much cleaner to start from basic probability distribution theory and study the tests built on the t-distribution, F-distribution, and chi-squared distribution one at a time. The Likelihood Ratio Test does have its own motivation, but unlike those tests it makes sense even without any such buildup.
Example: Normal Distribution
For practical applications of the LRT, the suprema must actually be computed. The denominator is maximized over the entire parameter space $\Theta$, so the maximum likelihood estimator goes there, and the numerator is maximized under the null hypothesis. $$ \begin{align*} H_{0} :& \theta = \theta_{0} \\ H_{1} :& \theta \ne \theta_{0} \end{align*} $$ Consider this hypothesis test for a normal distribution $N \left( \theta , \sigma^{2} \right)$ with known variance $\sigma^{2} = 1$, given a random sample $X_{1} , \cdots , X_{n}$. Here the denominator uses the sample mean $\bar{\mathbf{x}}$, the maximum likelihood estimator of the population mean $\theta$, and the numerator plugs in $\theta_{0}$ directly, since the null parameter space is the singleton $\Theta_{0} = \left\{ \theta_{0} \right\}$. It can be derived as follows. $$ \begin{align*} \lambda \left( \mathbf{x} \right) =& {{ \sup_{\Theta_{0}} L \left( \theta \mid \mathbf{x} \right) } \over { \sup_{\Theta} L \left( \theta \mid \mathbf{x} \right) }} \\ =& {{ L \left( \theta_{0} \mid \mathbf{x} \right) } \over { L \left( \bar{\mathbf{x}} \mid \mathbf{x} \right) }} \\ =& {{ (2\pi)^{-n/2} \exp \left( - \sum \left( x_{k} - \theta_{0} \right)^{2} / 2 \right) } \over { (2\pi)^{-n/2} \exp \left( - \sum \left( x_{k} - \bar{\mathbf{x}} \right)^{2} / 2 \right) }} \\ =& \exp \left( -n \left( \bar{\mathbf{x}} - \theta_{0} \right)^{2} / 2 \right) \end{align*} $$ The last equality uses the identity $\sum \left( x_{k} - \theta_{0} \right)^{2} = \sum \left( x_{k} - \bar{\mathbf{x}} \right)^{2} + n \left( \bar{\mathbf{x}} - \theta_{0} \right)^{2}$. Note that $\lambda \left( \mathbf{x} \right)$ is exactly $1$ when $\bar{\mathbf{x}} = \theta_{0}$, and the larger the difference, the closer it gets to $0$. Of course, the mathematical definition of the LRT already guarantees that it ranges from $0$ to $1$, but seeing the calculated result makes the test intuitive: it comes down to how close the sample mean is to the population mean $\theta_{0}$ hypothesized under the null.
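As a quick sanity check of the closed form above, the following sketch compares the direct ratio of suprema with $\exp \left( -n \left( \bar{\mathbf{x}} - \theta_{0} \right)^{2} / 2 \right)$, again under the assumption $\sigma^{2} = 1$; the sample values and $\theta_{0}$ are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
theta0 = 0.0                                 # hypothesized mean under H0
x = rng.normal(loc=0.5, scale=1.0, size=25)  # hypothetical sample
n, xbar = len(x), x.mean()

def loglik(theta):
    # Log-likelihood of N(theta, 1) for the sample x.
    return -0.5 * np.sum((x - theta) ** 2) - 0.5 * n * np.log(2 * np.pi)

# Direct ratio of suprema: numerator at theta0, denominator at the MLE xbar.
lam_direct = np.exp(loglik(theta0) - loglik(xbar))

# Closed form derived above: lambda(x) = exp(-n (xbar - theta0)^2 / 2).
lam_closed = np.exp(-n * (xbar - theta0) ** 2 / 2)

assert np.isclose(lam_direct, lam_closed)
print(lam_direct, lam_closed)  # identical up to floating-point error
```

Rejecting when $\lambda \left( \mathbf{x} \right) \le c$ is then the same as rejecting when $\left| \bar{\mathbf{x}} - \theta_{0} \right|$ exceeds a corresponding threshold.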
Casella. (2001). Statistical Inference (2nd Edition): p375. ↩︎