
Proof of the Neyman-Pearson Lemma

Theorem

Hypothesis Testing: $$ \begin{align*} H_{0} :& \theta = \theta_{0} \\ H_{1} :& \theta = \theta_{1} \end{align*} $$

In the hypothesis test above, suppose the distributions under $\theta_{0}$ and $\theta_{1}$ have probability density functions or probability mass functions $f \left( \mathbf{x} | \theta_{0} \right)$ and $f \left( \mathbf{x} | \theta_{1} \right)$, respectively. Let $R$ be the rejection region and let $k \ge 0$ be some constant such that

  • (i): $f \left( \mathbf{x} | \theta_{1} \right) > k f \left( \mathbf{x} | \theta_{0} \right)$, then $\mathbf{x} \in R$
  • (ii): $f \left( \mathbf{x} | \theta_{1} \right) < k f \left( \mathbf{x} | \theta_{0} \right)$, then $\mathbf{x} \in R^{c}$
  • (iii): $\alpha = P_{\theta_{0}} \left( \mathbf{X} \in R \right)$

the following two propositions are equivalent:

  • The given hypothesis test is a most powerful level $\alpha$ test.
  • The given hypothesis test satisfies conditions (i)-(iii).

Explanation

The parameter space of the given hypothesis test is $\Theta = \left\{ \theta_{0}, \theta_{1} \right\}$, and note that the alternative hypothesis is $\theta \in \Theta_{0}^{c} \iff \theta = \theta_{1}$.

Power Function:

  1. For a parameter $\theta$ and rejection region $R$, the function $\beta (\theta) := P_{\theta} \left( \mathbf{X} \in R \right)$ is called the Power Function.
  2. If $\sup_{\theta \in \Theta_{0}} \beta (\theta) = \alpha$, the given hypothesis test is called a size $\alpha$ hypothesis test.
  3. If $\sup_{\theta \in \Theta_{0}} \beta (\theta) \le \alpha$, the given hypothesis test is called a level $\alpha$ hypothesis test.

A most powerful test at level $\alpha$ here is in fact a most powerful test at size $\alpha$, that is, it satisfies condition (iii). Conversely, since $\Theta_{0} = \left\{ \theta_{0} \right\}$ is a singleton, a size $\alpha$ test satisfies $$ \sup_{\theta \in \Theta_{0}} P_{\theta} \left( \mathbf{X} \in R \right) = P_{\theta_{0}} \left( \mathbf{X} \in R \right) = \alpha $$ and so is also a level $\alpha$ test.
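As a concrete illustration, consider a hypothetical example not taken from the text: a single observation $X \sim N(\theta, 1)$ with $\theta_{0} = 0$, $\theta_{1} = 1$. The likelihood ratio $f(x|\theta_{1})/f(x|\theta_{0}) = e^{x - 1/2}$ is increasing in $x$, so conditions (i)-(ii) force a rejection region of the form $R = \{ x > c \}$, and condition (iii) pins down $c$:

```python
import math
from statistics import NormalDist

# Hypothetical example (not from the text): one observation X ~ N(theta, 1),
# H0: theta = 0 vs H1: theta = 1.  The likelihood ratio
#     f(x | 1) / f(x | 0) = exp(x - 1/2)
# is increasing in x, so conditions (i)-(ii) give R = {x > c}, and
# condition (iii) fixes c so that P_{theta_0}(X in R) = alpha.
alpha = 0.05
c = NormalDist(0, 1).inv_cdf(1 - alpha)   # critical value: P_0(X > c) = alpha
k = math.exp(c - 0.5)                     # the constant k of the lemma at this cutoff

size = 1 - NormalDist(0, 1).cdf(c)        # beta(theta_0): exactly alpha
power = 1 - NormalDist(1, 1).cdf(c)       # beta(theta_1): power of the test
print(f"c = {c:.4f}, k = {k:.4f}, size = {size:.4f}, power = {power:.4f}")
```

Here $c \approx 1.645$ is the familiar one-sided normal cutoff; the value of $k$ is simply the likelihood ratio evaluated at that cutoff.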

Proof 1

Strategy: we prove only the continuous case, i.e., the case of a probability density function. For discrete random variables, simply change $\int$ to $\sum$. To simplify the proof, define a Test Function $\phi$ via the indicator function of the rejection region: $$ \phi \left( \mathbf{x} \right) := \chi_{R} \left( \mathbf{x} \right) = \begin{cases} 1 & , \text{if } \mathbf{x} \in R \\ 0 & , \text{if } \mathbf{x} \notin R \end{cases} $$
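The point of the test function is that its expectation under $\theta$ recovers the power function: $E_{\theta} \left[ \phi \left( \mathbf{X} \right) \right] = P_{\theta} \left( \mathbf{X} \in R \right) = \beta (\theta)$. A minimal numeric sketch, assuming the same hypothetical setup $X \sim N(\theta, 1)$ with $R = \{ x > c \}$:

```python
import random
from statistics import NormalDist

C = 1.6449  # hypothetical cutoff for R = {x > C}

def phi(x):
    """Test function: indicator of the rejection region R = {x > C}."""
    return 1 if x > C else 0

# Monte Carlo: the sample mean of phi(X) under theta estimates
# E_theta[phi(X)] = P_theta(X in R) = beta(theta).
random.seed(0)
theta = 1.0
sample = [random.gauss(theta, 1) for _ in range(200_000)]
mc_beta = sum(phi(x) for x in sample) / len(sample)
exact_beta = 1 - NormalDist(theta, 1).cdf(C)
print(mc_beta, exact_beta)   # both close to 0.26
```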

Throughout, let

  • $\phi$ be the test function of a test satisfying conditions (i)-(iii)
    • $\beta$ be the power function for $\phi$
  • $\phi '$ be the test function of any other level $\alpha$ test
    • $\beta '$ be the power function for $\phi '$

$\left( \impliedby \right)$

In conditions (i), (ii),

  • (i): $f \left( \mathbf{x} | \theta_{1} \right) > k f \left( \mathbf{x} | \theta_{0} \right)$ then $\mathbf{x} \in R \implies \phi \left( \mathbf{x} \right) = 1$
  • (ii): $f \left( \mathbf{x} | \theta_{1} \right) < k f \left( \mathbf{x} | \theta_{0} \right)$ then $\mathbf{x} \in R^{c} \implies \phi \left( \mathbf{x} \right) = 0$

Meanwhile, since $0 \le \phi ' \left( \mathbf{x} \right) \le 1$,

  • (A): $\mathbf{x} \in R \implies \phi \left( \mathbf{x} \right) - \phi ' \left( \mathbf{x} \right) \ge 0$
  • (B): $\mathbf{x} \notin R \implies \phi \left( \mathbf{x} \right) - \phi ' \left( \mathbf{x} \right) \le 0$

Therefore, in either case, (i) with (A) or (ii) with (B), the product of $\phi \left( \mathbf{x} \right) - \phi ' \left( \mathbf{x} \right)$ and $f \left( \mathbf{x} | \theta_{1} \right) - k f \left( \mathbf{x} | \theta_{0} \right)$ satisfies $$ \left[ \phi \left( \mathbf{x} \right) - \phi ' \left( \mathbf{x} \right) \right] \left[ f \left( \mathbf{x} | \theta_{1} \right) - k f \left( \mathbf{x} | \theta_{0} \right) \right] \ge 0 $$ Integrating over the entire sample space $\Omega$ and using $\int_{\Omega} \phi \left( \mathbf{x} \right) f \left( \mathbf{x} | \theta \right) d \mathbf{x} = \int_{R} f \left( \mathbf{x} | \theta \right) d \mathbf{x} = \beta \left( \theta \right)$ together with $\int_{\Omega} \phi ' \left( \mathbf{x} \right) f \left( \mathbf{x} | \theta \right) d \mathbf{x} = \beta ' \left( \theta \right)$ yields $$ \begin{align*} 0 \le & \int_{\Omega} \left[ \phi \left( \mathbf{x} \right) - \phi ' \left( \mathbf{x} \right) \right] \left[ f \left( \mathbf{x} | \theta_{1} \right) - k f \left( \mathbf{x} | \theta_{0} \right) \right] d \mathbf{x} \\ =& \int_{\Omega} \phi \left( \mathbf{x} \right) f \left( \mathbf{x} | \theta_{1} \right) - \phi ' \left( \mathbf{x} \right) f \left( \mathbf{x} | \theta_{1} \right) - k \phi \left( \mathbf{x} \right) f \left( \mathbf{x} | \theta_{0} \right) + k \phi ' \left( \mathbf{x} \right) f \left( \mathbf{x} | \theta_{0} \right) d \mathbf{x} \\ =& \beta \left( \theta_{1} \right) - \beta ' \left( \theta_{1} \right) - k \beta \left( \theta_{0} \right) + k \beta ' \left( \theta_{0} \right) \end{align*} $$ By definition, $\phi '$ is the test function of a level $\alpha$ test, so $\beta ' \left( \theta_{0} \right) \le \alpha$, while $\phi$ is the test function of a size $\alpha = \sup_{\theta \in \Theta_{0}} \beta \left( \theta \right)$ test, so $\beta \left( \theta_{0} \right) = \alpha$. Hence $$ \beta \left( \theta_{0} \right) - \beta ' \left( \theta_{0} \right) = \alpha - \beta ' \left( \theta_{0} \right) \ge 0 $$ and, since $k \ge 0$, $$ 0 \le \beta \left( \theta_{1} \right) - \beta ' \left( \theta_{1} \right) - k \left[ \beta \left( \theta_{0} \right) - \beta ' \left( \theta_{0} \right) \right] \le \beta \left( \theta_{1} \right) - \beta ' \left( \theta_{1} \right) $$ In summary,

  • $\beta ' \left( \theta_{1} \right) \le \beta \left( \theta_{1} \right)$,
  • $\phi '$ was the test function of an arbitrary level $\alpha$ test, and
  • $\theta_{1}$ is the only element of $\Theta_{0}^{c}$,

so it has been shown that a hypothesis test satisfying conditions (i)-(iii) is a most powerful test.

Most Powerful Test: Let $\mathcal{C}$ be a class of hypothesis tests as above.

A hypothesis test $A$ in $\mathcal{C}$ with power function $\beta (\theta)$ is called the (Uniformly) Most Powerful Test, UMP, if for every $\theta \in \Theta_{0}^{c}$ and for the power function $\beta ' (\theta)$ of every hypothesis test in $\mathcal{C}$, $$ \beta ' (\theta) \le \beta (\theta) $$
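The conclusion $\beta ' \left( \theta_{1} \right) \le \beta \left( \theta_{1} \right)$ can also be checked numerically. Under the same hypothetical setup as before ($X \sim N(\theta, 1)$, $H_{0}: \theta = 0$, $H_{1}: \theta = 1$), compare the likelihood-ratio test with a deliberately bad size $\alpha$ competitor that rejects for small $x$:

```python
from statistics import NormalDist

# Hypothetical setup: X ~ N(theta, 1), H0: theta = 0, H1: theta = 1.
alpha = 0.05
c_np = NormalDist(0, 1).inv_cdf(1 - alpha)   # NP test:    reject when x > c_np
c_bad = NormalDist(0, 1).inv_cdf(alpha)      # competitor: reject when x < c_bad

# Both tests have size alpha under theta_0 = 0 ...
size_np = 1 - NormalDist(0, 1).cdf(c_np)
size_bad = NormalDist(0, 1).cdf(c_bad)

# ... but the NP test dominates in power at theta_1 = 1, as the lemma asserts.
power_np = 1 - NormalDist(1, 1).cdf(c_np)    # beta(theta_1)  ~ 0.26
power_bad = NormalDist(1, 1).cdf(c_bad)      # beta'(theta_1) ~ 0.004
print(power_bad <= power_np)                 # True
```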


$\left( \implies \right)$

Now, let’s assume $\phi '$ is the test function for the level $\alpha$ most powerful test.

Since $\phi$ satisfies conditions (i)-(iii), the corresponding hypothesis test is also a most powerful test by the first direction, so the two power functions must agree at every $\theta \in \Theta_{0}^{c}$. That is, $\beta \left( \theta_{1} \right) = \beta ' \left( \theta_{1} \right)$, and $$ \begin{align*} 0 \le & \beta \left( \theta_{1} \right) - \beta ' \left( \theta_{1} \right) - k \left[ \beta \left( \theta_{0} \right) - \beta ' \left( \theta_{0} \right) \right] \\ =& 0 - k \left[ \beta \left( \theta_{0} \right) - \beta ' \left( \theta_{0} \right) \right] \end{align*} $$

  • Rearranging the inequality above gives $\beta \left( \theta_{0} \right) \le \beta ' \left( \theta_{0} \right)$, and since $\phi$ was the test function of a size $\alpha = \sup_{\theta \in \Theta_{0}} \beta (\theta)$ test, $\alpha \le \beta ' \left( \theta_{0} \right)$.
  • By assumption, $\phi '$ was the test function of a level $\alpha \ge \sup_{\theta \in \Theta_{0}} \beta ' (\theta)$ test, therefore $\beta ' \left( \theta_{0} \right) \le \alpha$.

Combining the two inequalities, $\beta ' \left( \theta_{0} \right) = \alpha$, so the test given by $\phi '$ is exactly a size $\alpha$ test, i.e., it satisfies condition (iii). Strictly speaking, conditions (i) and (ii) need only hold outside a set $A \subset \Omega$ with $\int_{A} f \left( \mathbf{x} | \theta \right) d \mathbf{x} = 0$, so such a null set $A$ must be admitted as an exception in the statement.
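The step above, that a most powerful level $\alpha$ test cannot have size strictly below $\alpha$, has a simple numeric illustration in the same hypothetical normal setup: shrinking the rejection region so the size drops below $\alpha$ strictly reduces power at $\theta_{1}$, so such a test cannot be most powerful.

```python
from statistics import NormalDist

# Hypothetical setup again: X ~ N(theta, 1), H0: theta = 0, H1: theta = 1.
alpha = 0.05
c_size = NormalDist(0, 1).inv_cdf(1 - alpha)       # size alpha exactly
c_small = NormalDist(0, 1).inv_cdf(1 - alpha / 2)  # size alpha/2: level alpha, not size alpha

power_size = 1 - NormalDist(1, 1).cdf(c_size)
power_small = 1 - NormalDist(1, 1).cdf(c_small)
print(power_small < power_size)   # True: an under-sized test gives up power
```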


  1. Casella. (2001). Statistical Inference (2nd Edition): pp. 388-389. ↩︎