
Proof of the Neyman-Pearson Lemma

Theorem

Hypothesis Testing: $$ \begin{align*} H_{0} :& \theta = \theta_{0} \\ H_{1} :& \theta = \theta_{1} \end{align*} $$

In the hypothesis test above, suppose the distributions under $\theta_{0}$ and $\theta_{1}$ have probability density functions or probability mass functions $f \left( \mathbf{x} | \theta_{0} \right)$ and $f \left( \mathbf{x} | \theta_{1} \right)$, respectively. Let $R$ be the rejection region and let $k \ge 0$ be some constant such that

  • (i): $f \left( \mathbf{x} | \theta_{1} \right) > k f \left( \mathbf{x} | \theta_{0} \right)$, then $\mathbf{x} \in R$
  • (ii): $f \left( \mathbf{x} | \theta_{1} \right) < k f \left( \mathbf{x} | \theta_{0} \right)$, then $\mathbf{x} \in R^{c}$
  • (iii): $\alpha = P_{\theta_{0}} \left( \mathbf{X} \in R \right)$

the following two propositions are equivalent:

  • The given hypothesis test is a most powerful level $\alpha$ test.
  • The given hypothesis test satisfies conditions (i)-(iii).

Explanation

The parameter space of the given hypothesis test is $\Theta = \left\{ \theta_{0}, \theta_{1} \right\}$, and note that the alternative hypothesis is $\theta \in \Theta_{0}^{c} \iff \theta = \theta_{1}$.

Power Function:

  1. For a parameter $\theta$ and rejection region $R$, the function $\beta (\theta) := P_{\theta} \left( \mathbf{X} \in R \right)$ is called the Power Function.
  2. If $\sup_{\theta \in \Theta_{0}} \beta (\theta) = \alpha$, the given hypothesis test is called a size $\alpha$ hypothesis test.
  3. If $\sup_{\theta \in \Theta_{0}} \beta (\theta) \le \alpha$, the given hypothesis test is called a level $\alpha$ hypothesis test.

A most powerful test at level $\alpha$ here is in fact a most powerful test at size $\alpha$, that is, it satisfies condition (iii). Conversely, since $\Theta_{0} = \left\{ \theta_{0} \right\}$ is a singleton, a size $\alpha$ test satisfies $$ \sup_{\theta \in \Theta_{0}} P_{\theta} \left( \mathbf{X} \in R \right) = P_{\theta_{0}} \left( \mathbf{X} \in R \right) = \alpha $$ and so is also a level $\alpha$ test.
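As a concrete illustration, consider a hypothetical example not taken from the text: a single observation $X \sim N(\theta, 1)$ with $\theta_{0} = 0$, $\theta_{1} = 1$. The likelihood ratio $f(x|\theta_{1})/f(x|\theta_{0}) = e^{x - 1/2}$ is increasing in $x$, so conditions (i)-(ii) force a rejection region of the form $R = \{ x > c \}$, and condition (iii) pins down $c$:

```python
import math
from statistics import NormalDist

# Hypothetical example (not from the text): one observation X ~ N(theta, 1),
# H0: theta = 0 vs H1: theta = 1.  The likelihood ratio
#     f(x | 1) / f(x | 0) = exp(x - 1/2)
# is increasing in x, so conditions (i)-(ii) give R = {x > c}, and
# condition (iii) fixes c so that P_{theta_0}(X in R) = alpha.
alpha = 0.05
c = NormalDist(0, 1).inv_cdf(1 - alpha)   # critical value: P_0(X > c) = alpha
k = math.exp(c - 0.5)                     # the constant k of the lemma at this cutoff

size = 1 - NormalDist(0, 1).cdf(c)        # beta(theta_0): exactly alpha
power = 1 - NormalDist(1, 1).cdf(c)       # beta(theta_1): power of the test
print(f"c = {c:.4f}, k = {k:.4f}, size = {size:.4f}, power = {power:.4f}")
```

Here $c \approx 1.645$ is the familiar one-sided normal cutoff; the value of $k$ is simply the likelihood ratio evaluated at that cutoff.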

Proof 1

Strategy: we prove only the continuous case, i.e., the case of a probability density function. For discrete random variables, simply change $\int$ to $\sum$. To simplify the proof, define a Test Function $\phi$ via the indicator function of the rejection region: $$ \phi \left( \mathbf{x} \right) := \chi_{R} \left( \mathbf{x} \right) = \begin{cases} 1 & , \text{if } \mathbf{x} \in R \\ 0 & , \text{if } \mathbf{x} \notin R \end{cases} $$
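The point of the test function is that its expectation under $\theta$ recovers the power function: $E_{\theta} \left[ \phi \left( \mathbf{X} \right) \right] = P_{\theta} \left( \mathbf{X} \in R \right) = \beta (\theta)$. A minimal numeric sketch, assuming the same hypothetical setup $X \sim N(\theta, 1)$ with $R = \{ x > c \}$:

```python
import random
from statistics import NormalDist

C = 1.6449  # hypothetical cutoff for R = {x > C}

def phi(x):
    """Test function: indicator of the rejection region R = {x > C}."""
    return 1 if x > C else 0

# Monte Carlo: the sample mean of phi(X) under theta estimates
# E_theta[phi(X)] = P_theta(X in R) = beta(theta).
random.seed(0)
theta = 1.0
sample = [random.gauss(theta, 1) for _ in range(200_000)]
mc_beta = sum(phi(x) for x in sample) / len(sample)
exact_beta = 1 - NormalDist(theta, 1).cdf(C)
print(mc_beta, exact_beta)   # both close to 0.26
```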

Throughout, let

  • $\phi$ be the test function of a test satisfying conditions (i)-(iii)
    • $\beta$ be the power function for $\phi$
  • $\phi '$ be the test function of any other level $\alpha$ test
    • $\beta '$ be the power function for $\phi '$

$\left( \impliedby \right)$

In conditions (i), (ii),

  • (i): $f \left( \mathbf{x} | \theta_{1} \right) > k f \left( \mathbf{x} | \theta_{0} \right)$ then $\mathbf{x} \in R \implies \phi \left( \mathbf{x} \right) = 1$
  • (ii): $f \left( \mathbf{x} | \theta_{1} \right) < k f \left( \mathbf{x} | \theta_{0} \right)$ then $\mathbf{x} \in R^{c} \implies \phi \left( \mathbf{x} \right) = 0$

Meanwhile, since $0 \le \phi ' \left( \mathbf{x} \right) \le 1$,

  • (A): $\mathbf{x} \in R \implies \phi \left( \mathbf{x} \right) - \phi ' \left( \mathbf{x} \right) \ge 0$
  • (B): $\mathbf{x} \notin R \implies \phi \left( \mathbf{x} \right) - \phi ' \left( \mathbf{x} \right) \le 0$

Therefore, in either case, (i) with (A) or (ii) with (B), the product of $\phi \left( \mathbf{x} \right) - \phi ' \left( \mathbf{x} \right)$ and $f \left( \mathbf{x} | \theta_{1} \right) - k f \left( \mathbf{x} | \theta_{0} \right)$ satisfies $$ \left[ \phi \left( \mathbf{x} \right) - \phi ' \left( \mathbf{x} \right) \right] \left[ f \left( \mathbf{x} | \theta_{1} \right) - k f \left( \mathbf{x} | \theta_{0} \right) \right] \ge 0 $$ Integrating over the entire sample space $\Omega$ and using $\int_{\Omega} \phi \left( \mathbf{x} \right) f \left( \mathbf{x} | \theta \right) d \mathbf{x} = \int_{R} f \left( \mathbf{x} | \theta \right) d \mathbf{x} = \beta \left( \theta \right)$ together with $\int_{\Omega} \phi ' \left( \mathbf{x} \right) f \left( \mathbf{x} | \theta \right) d \mathbf{x} = \beta ' \left( \theta \right)$ yields $$ \begin{align*} 0 \le & \int_{\Omega} \left[ \phi \left( \mathbf{x} \right) - \phi ' \left( \mathbf{x} \right) \right] \left[ f \left( \mathbf{x} | \theta_{1} \right) - k f \left( \mathbf{x} | \theta_{0} \right) \right] d \mathbf{x} \\ =& \int_{\Omega} \phi \left( \mathbf{x} \right) f \left( \mathbf{x} | \theta_{1} \right) - \phi ' \left( \mathbf{x} \right) f \left( \mathbf{x} | \theta_{1} \right) - k \phi \left( \mathbf{x} \right) f \left( \mathbf{x} | \theta_{0} \right) + k \phi ' \left( \mathbf{x} \right) f \left( \mathbf{x} | \theta_{0} \right) d \mathbf{x} \\ =& \beta \left( \theta_{1} \right) - \beta ' \left( \theta_{1} \right) - k \beta \left( \theta_{0} \right) + k \beta ' \left( \theta_{0} \right) \end{align*} $$ By definition, $\phi '$ is the test function of a level $\alpha$ test, so $\beta ' \left( \theta_{0} \right) \le \alpha$, while $\phi$ is the test function of a size $\alpha = \sup_{\theta \in \Theta_{0}} \beta \left( \theta \right)$ test, so $\beta \left( \theta_{0} \right) = \alpha$. Hence $$ \beta \left( \theta_{0} \right) - \beta ' \left( \theta_{0} \right) = \alpha - \beta ' \left( \theta_{0} \right) \ge 0 $$ and, since $k \ge 0$, $$ 0 \le \beta \left( \theta_{1} \right) - \beta ' \left( \theta_{1} \right) - k \left[ \beta \left( \theta_{0} \right) - \beta ' \left( \theta_{0} \right) \right] \le \beta \left( \theta_{1} \right) - \beta ' \left( \theta_{1} \right) $$ In summary,

  • $\beta ' \left( \theta_{1} \right) \le \beta \left( \theta_{1} \right)$,
  • $\phi '$ was the test function of an arbitrary level $\alpha$ test, and
  • $\theta_{1}$ is the only element of $\Theta_{0}^{c}$,

so it has been shown that a hypothesis test satisfying conditions (i)-(iii) is a most powerful test.

Most Powerful Test: Let $\mathcal{C}$ be a class of hypothesis tests as above.

A hypothesis test $A$ in $\mathcal{C}$ with power function $\beta (\theta)$ is called the (Uniformly) Most Powerful Test, UMP, if for every $\theta \in \Theta_{0}^{c}$ and for the power function $\beta ' (\theta)$ of every hypothesis test in $\mathcal{C}$, $$ \beta ' (\theta) \le \beta (\theta) $$
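The conclusion $\beta ' \left( \theta_{1} \right) \le \beta \left( \theta_{1} \right)$ can also be checked numerically. Under the same hypothetical setup as before ($X \sim N(\theta, 1)$, $H_{0}: \theta = 0$, $H_{1}: \theta = 1$), compare the likelihood-ratio test with a deliberately bad size $\alpha$ competitor that rejects for small $x$:

```python
from statistics import NormalDist

# Hypothetical setup: X ~ N(theta, 1), H0: theta = 0, H1: theta = 1.
alpha = 0.05
c_np = NormalDist(0, 1).inv_cdf(1 - alpha)   # NP test:    reject when x > c_np
c_bad = NormalDist(0, 1).inv_cdf(alpha)      # competitor: reject when x < c_bad

# Both tests have size alpha under theta_0 = 0 ...
size_np = 1 - NormalDist(0, 1).cdf(c_np)
size_bad = NormalDist(0, 1).cdf(c_bad)

# ... but the NP test dominates in power at theta_1 = 1, as the lemma asserts.
power_np = 1 - NormalDist(1, 1).cdf(c_np)    # beta(theta_1)  ~ 0.26
power_bad = NormalDist(1, 1).cdf(c_bad)      # beta'(theta_1) ~ 0.004
print(power_bad <= power_np)                 # True
```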


$\left( \implies \right)$

Now, let’s assume $\phi '$ is the test function for the level $\alpha$ most powerful test.

Since $\phi$ satisfies conditions (i)-(iii), the corresponding hypothesis test is also a most powerful test by the first direction, so the two power functions must agree at every $\theta \in \Theta_{0}^{c}$. That is, $\beta \left( \theta_{1} \right) = \beta ' \left( \theta_{1} \right)$, and $$ \begin{align*} 0 \le & \beta \left( \theta_{1} \right) - \beta ' \left( \theta_{1} \right) - k \left[ \beta \left( \theta_{0} \right) - \beta ' \left( \theta_{0} \right) \right] \\ =& 0 - k \left[ \beta \left( \theta_{0} \right) - \beta ' \left( \theta_{0} \right) \right] \end{align*} $$

  • Rearranging the inequality above gives $\beta \left( \theta_{0} \right) \le \beta ' \left( \theta_{0} \right)$, and since $\phi$ was the test function of a size $\alpha = \sup_{\theta \in \Theta_{0}} \beta (\theta)$ test, $\alpha \le \beta ' \left( \theta_{0} \right)$.
  • By assumption, $\phi '$ was the test function of a level $\alpha \ge \sup_{\theta \in \Theta_{0}} \beta ' (\theta)$ test, therefore $\beta ' \left( \theta_{0} \right) \le \alpha$.

Combining the two inequalities, $\beta ' \left( \theta_{0} \right) = \alpha$, so the test given by $\phi '$ is exactly a size $\alpha$ test, i.e., it satisfies condition (iii). Strictly speaking, conditions (i) and (ii) need only hold outside a set $A \subset \Omega$ with $\int_{A} f \left( \mathbf{x} | \theta \right) d \mathbf{x} = 0$, so such a null set $A$ must be admitted as an exception in the statement.
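The step above, that a most powerful level $\alpha$ test cannot have size strictly below $\alpha$, has a simple numeric illustration in the same hypothetical normal setup: shrinking the rejection region so the size drops below $\alpha$ strictly reduces power at $\theta_{1}$, so such a test cannot be most powerful.

```python
from statistics import NormalDist

# Hypothetical setup again: X ~ N(theta, 1), H0: theta = 0, H1: theta = 1.
alpha = 0.05
c_size = NormalDist(0, 1).inv_cdf(1 - alpha)       # size alpha exactly
c_small = NormalDist(0, 1).inv_cdf(1 - alpha / 2)  # size alpha/2: level alpha, not size alpha

power_size = 1 - NormalDist(1, 1).cdf(c_size)
power_small = 1 - NormalDist(1, 1).cdf(c_small)
print(power_small < power_size)   # True: an under-sized test gives up power
```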


  1. Casella. (2001). Statistical Inference (2nd Edition): pp. 388-389. ↩︎