Proof of the Neyman-Pearson Lemma 📂Mathematical Statistics

Theorem

Hypothesis Testing: $$ \begin{align*} H_{0} :& \theta = \theta_{0} \\ H_{1} :& \theta = \theta_{1} \end{align*} $$

In the hypothesis test above, let $\theta_{0}, \theta_{1}$ have probability density functions or probability mass functions $f \left( \mathbf{x} | \theta_{0} \right), f \left( \mathbf{x} | \theta_{1} \right)$, let the rejection region be $R$, and let $k \ge 0$ be some constant. Then, if

  • (i): $f \left( \mathbf{x} | \theta_{1} \right) > k f \left( \mathbf{x} | \theta_{0} \right)$ implies $\mathbf{x} \in R$
  • (ii): $f \left( \mathbf{x} | \theta_{1} \right) < k f \left( \mathbf{x} | \theta_{0} \right)$ implies $\mathbf{x} \in R^{c}$
  • (iii): $\alpha = P_{\theta_{0}} \left( \mathbf{X} \in R \right)$

the following two propositions are equivalent:

  • The hypothesis test satisfies conditions (i)-(iii).
  • The hypothesis test is a most powerful level $\alpha$ test.

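As a concrete sanity check, consider a hypothetical simple-vs-simple test (an illustration, not part of the theorem): a single observation $X \sim N(\theta, 1)$ with $H_{0}: \theta = 0$ against $H_{1}: \theta = 1$. The likelihood ratio $f(x|\theta_{1}) / f(x|\theta_{0}) = e^{x - 1/2}$ is increasing in $x$, so conditions (i)-(ii) force a rejection region of the form $R = \left\{ x > c \right\}$; the sketch below finds $c$ from condition (iii) with $\alpha = 0.05$ and recovers the corresponding $k$.

```python
import math

def norm_cdf(x, mean=0.0):
    # Standard normal CDF, shifted by `mean`.
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0)))

def norm_pdf(x, mean=0.0):
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)

# Hypothetical setup (not part of the theorem): one observation X ~ N(theta, 1),
# H0: theta = 0 vs H1: theta = 1. The likelihood ratio
# f(x|1)/f(x|0) = exp(x - 1/2) is increasing in x, so conditions (i)-(ii)
# force a rejection region of the form R = {x > c}.
alpha = 0.05

# Condition (iii): choose c with P_{theta_0}(X > c) = alpha, here by bisection.
lo, hi = -10.0, 10.0
for _ in range(200):
    mid = (lo + hi) / 2
    if 1.0 - norm_cdf(mid) > alpha:
        lo = mid
    else:
        hi = mid
c = (lo + hi) / 2

k = norm_pdf(c, 1.0) / norm_pdf(c, 0.0)  # likelihood ratio at the boundary
power = 1.0 - norm_cdf(c, 1.0)           # beta(theta_1)
print(c, k, power)
```

This gives $c \approx 1.645$ and $k = e^{c - 1/2} \approx 3.14$: rejecting when the likelihood ratio exceeds this $k$ is exactly rejecting when $x > c$.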

Explanation

The parameter space of the given hypothesis test is $\Theta = \left\{ \theta_{0}, \theta_{1} \right\}$, and note that the alternative hypothesis is $\theta \in \Theta_{0}^{c} \iff \theta = \theta_{1}$.

Power Function:

  1. For a rejection region $R$, the function $\beta (\theta) := P_{\theta} \left( \mathbf{X} \in R \right)$ of the parameter $\theta$ is called the power function.
  2. If $\sup_{\theta \in \Theta_{0}} \beta (\theta) = \alpha$, the given hypothesis test is called a size $\alpha$ hypothesis test.
  3. If $\sup_{\theta \in \Theta_{0}} \beta (\theta) \le \alpha$, the given hypothesis test is called a level $\alpha$ hypothesis test.
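Once $R$ is fixed, the power function can be evaluated directly. The snippet below is a hypothetical example (not from the text): $X \sim \operatorname{Binomial}(n, \theta)$ with $n = 10$ and $R = \left\{ x \ge 8 \right\}$, with $\beta$ evaluated at $\theta_{0} = 0.5$ and $\theta_{1} = 0.8$. Since $\Theta_{0} = \left\{ \theta_{0} \right\}$ is a singleton, $\beta (\theta_{0})$ is itself the supremum in definitions 2-3.

```python
from math import comb

def power_function(c, n, theta):
    # beta(theta) = P_theta(X in R) for R = {x >= c}, with X ~ Binomial(n, theta).
    return sum(comb(n, x) * theta**x * (1 - theta) ** (n - x) for x in range(c, n + 1))

# Hypothetical example: n = 10 trials, H0: theta = 0.5 vs H1: theta = 0.8,
# rejecting when X >= 8.
n, c = 10, 8
size = power_function(c, n, 0.5)   # over the singleton Theta_0, the sup is just beta(theta_0)
power = power_function(c, n, 0.8)  # beta(theta_1)
print(size, power)
```

Here the test has size $56/1024 \approx 0.0547$, so it is a level $\alpha$ test for any $\alpha \ge 0.0547$.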

Saying that the most powerful tests at level $\alpha$ are exactly the most powerful tests of size $\alpha$ means that they satisfy condition (iii). Since $\Theta_{0} = \left\{ \theta_{0} \right\}$ is a singleton, a size $\alpha$ test satisfies $$ \sup_{\theta \in \Theta_{0}} P_{\theta} \left( \mathbf{X} \in R \right) = P_{\theta_{0}} \left( \mathbf{X} \in R \right) = \alpha $$ and is therefore also a level $\alpha$ hypothesis test.

Proof 1

Strategy: we prove only the case of a probability density function, i.e., the continuous case. For discrete random variables, simply replace $\int$ with $\sum$. To simplify the proof, define a test function $\phi$ via the indicator function as follows. $$ \phi \left( \mathbf{x} \right) := \chi_{R} \left( \mathbf{x} \right) = \begin{cases} 1 & , \text{if } \mathbf{x} \in R \\ 0 & , \text{if } \mathbf{x} \notin R \end{cases} $$

Thus,

  • $\phi$ is the test function satisfying conditions (i)-(iii)
    • $\beta$ is the power function for $\phi$
  • $\phi '$ is the test function of any other level $\alpha$ test
    • $\beta '$ is the power function for $\phi '$
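The test function makes the identity $\beta (\theta) = \int_{\Omega} \phi \left( \mathbf{x} \right) f \left( \mathbf{x} | \theta \right) d \mathbf{x} = E_{\theta} \left[ \phi \left( \mathbf{X} \right) \right]$ available, which is what the integral manipulation in the proof relies on. A quick Monte Carlo sketch (again the hypothetical $N(\theta, 1)$ example with $R = \left\{ x > 1.6449 \right\}$, i.e. $\alpha \approx 0.05$) illustrates it:

```python
import random

random.seed(0)

# Hypothetical example: X ~ N(theta, 1) with rejection region R = {x > 1.6449},
# a size-0.05 test of H0: theta = 0 vs H1: theta = 1.
def phi(x):
    # Test function: the indicator of the rejection region R.
    return 1.0 if x > 1.6449 else 0.0

def beta_mc(theta, n=200_000):
    # beta(theta) = E_theta[phi(X)], estimated by Monte Carlo.
    return sum(phi(random.gauss(theta, 1.0)) for _ in range(n)) / n

print(beta_mc(0.0), beta_mc(1.0))
```

The two estimates land near $0.05$ and $0.26$, matching $\beta (\theta_{0}) = \alpha$ and the exact power $1 - \Phi(1.6449 - 1)$.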

$\left( \impliedby \right)$

In conditions (i), (ii),

  • (i): $f \left( \mathbf{x} | \theta_{1} \right) > k f \left( \mathbf{x} | \theta_{0} \right)$ implies $\mathbf{x} \in R \implies \phi \left( \mathbf{x} \right) = 1$
  • (ii): $f \left( \mathbf{x} | \theta_{1} \right) < k f \left( \mathbf{x} | \theta_{0} \right)$ implies $\mathbf{x} \in R^{c} \implies \phi \left( \mathbf{x} \right) = 0$

Meanwhile, since $0 \le \phi ' \left( \mathbf{x} \right) \le 1$,

  • (A): $\mathbf{x} \in R \implies \phi \left( \mathbf{x} \right) - \phi ' \left( \mathbf{x} \right) \ge 0$
  • (B): $\mathbf{x} \notin R \implies \phi \left( \mathbf{x} \right) - \phi ' \left( \mathbf{x} \right) \le 0$

Therefore, in either case — (i) with (A), or (ii) with (B) — the factors $\phi \left( \mathbf{x} \right) - \phi ' \left( \mathbf{x} \right)$ and $f \left( \mathbf{x} | \theta_{1} \right) - k f \left( \mathbf{x} | \theta_{0} \right)$ have the same sign, so their product satisfies the following inequality. $$ \left[ \phi \left( \mathbf{x} \right) - \phi ' \left( \mathbf{x} \right) \right] \left[ f \left( \mathbf{x} | \theta_{1} \right) - k f \left( \mathbf{x} | \theta_{0} \right) \right] \ge 0 $$ Integrating over the entire sample space $\int_{\Omega} \cdot \, d \mathbf{x}$ yields $$ \begin{align*} 0 \le & \int_{\Omega} \left[ \phi \left( \mathbf{x} \right) - \phi ' \left( \mathbf{x} \right) \right] \left[ f \left( \mathbf{x} | \theta_{1} \right) - k f \left( \mathbf{x} | \theta_{0} \right) \right] d \mathbf{x} \\ =& \int_{\Omega} \phi \left( \mathbf{x} \right) f \left( \mathbf{x} | \theta_{1} \right) - \phi ' \left( \mathbf{x} \right) f \left( \mathbf{x} | \theta_{1} \right) - k \phi \left( \mathbf{x} \right) f \left( \mathbf{x} | \theta_{0} \right) + k \phi ' \left( \mathbf{x} \right) f \left( \mathbf{x} | \theta_{0} \right) d \mathbf{x} \\ =& \beta \left( \theta_{1} \right) - \beta ' \left( \theta_{1} \right) - k \beta \left( \theta_{0} \right) + k \beta ' \left( \theta_{0} \right) \end{align*} $$ since $\int_{\Omega} \phi \left( \mathbf{x} \right) f \left( \mathbf{x} | \theta \right) d \mathbf{x} = \int_{R} f \left( \mathbf{x} | \theta \right) d \mathbf{x} = \beta (\theta)$ and likewise $\int_{\Omega} \phi ' \left( \mathbf{x} \right) f \left( \mathbf{x} | \theta \right) d \mathbf{x} = \beta ' (\theta)$. By definition, $\phi '$ came from a level $\alpha$ test, so $\beta ' \left( \theta_{0} \right) \le \alpha$, while $\phi$ came from a size $\alpha$ test, so $\beta \left( \theta_{0} \right) = \alpha$. Therefore $$ \beta \left( \theta_{0} \right) - \beta ' \left( \theta_{0} \right) = \alpha - \beta ' \left( \theta_{0} \right) \ge 0 $$ and since $k \ge 0$, $$ 0 \le \beta \left( \theta_{1} \right) - \beta ' \left( \theta_{1} \right) - k \left[ \beta \left( \theta_{0} \right) - \beta ' \left( \theta_{0} \right) \right] \le \beta \left( \theta_{1} \right) - \beta ' \left( \theta_{1} \right) $$ In summary, since

  • $\beta ' \left( \theta_{1} \right) \le \beta \left( \theta_{1} \right)$, and
  • $\phi '$ was the test function of an arbitrary level $\alpha$ test, and
  • $\theta_{1}$ is the only element of $\Theta_{0}^{c}$,

it has been shown that a hypothesis test satisfying conditions (i)-(iii) is a most powerful level $\alpha$ test.
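This conclusion can be seen numerically by pitting the Neyman-Pearson region against any other level $\alpha$ region. In the hypothetical setting $X \sim N(\theta, 1)$, $H_{0}: \theta = 0$ vs $H_{1}: \theta = 1$, $\alpha = 0.05$, the sketch below compares $R = \left\{ x > 1.6449 \right\}$ with the two-sided region $R' = \left\{ |x| > 1.96 \right\}$: both have size about $0.05$, but $\beta ' \left( \theta_{1} \right) \le \beta \left( \theta_{1} \right)$, as the proof guarantees.

```python
import math

def norm_cdf(x, mean=0.0):
    # Standard normal CDF, shifted by `mean`.
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0)))

# Hypothetical setup: one observation X ~ N(theta, 1),
# H0: theta = 0 vs H1: theta = 1, alpha = 0.05.
def beta_np(theta):
    # Power of the Neyman-Pearson region R = {x > 1.6449}.
    return 1.0 - norm_cdf(1.6449, theta)

def beta_two_sided(theta):
    # Power of a competitor level-0.05 region R' = {|x| > 1.96}.
    return norm_cdf(-1.96, theta) + (1.0 - norm_cdf(1.96, theta))

size_np, size_ts = beta_np(0.0), beta_two_sided(0.0)    # both approximately 0.05
power_np, power_ts = beta_np(1.0), beta_two_sided(1.0)  # beta(theta_1) vs beta'(theta_1)
print(power_np, power_ts)
```

The one-sided Neyman-Pearson test attains power about $0.26$ at $\theta_{1}$, against about $0.17$ for the two-sided competitor of the same size.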

Most Powerful Test: Let $\mathcal{C}$ be a class of hypothesis tests as described above.

A hypothesis test $A$ in $\mathcal{C}$ with power function $\beta (\theta)$ is called the (uniformly) most powerful test, UMP, if for every $\theta \in \Theta_{0}^{c}$ and every power function $\beta ' (\theta)$ of a hypothesis test in $\mathcal{C}$, $$ \beta ' (\theta) \le \beta (\theta) $$


$\left( \implies \right)$

Now, assume that $\phi '$ is the test function of a most powerful level $\alpha$ test.

Since $\phi$ satisfies conditions (i)-(iii), its hypothesis test is also a most powerful test, so the two power functions take the same value at every $\theta \in \Theta_{0}^{c}$. That is, $\beta \left( \theta_{1} \right) = \beta ' \left( \theta_{1} \right)$, and by the inequality from the first direction, $$ \begin{align*} 0 \le & \beta \left( \theta_{1} \right) - \beta ' \left( \theta_{1} \right) - k \left[ \beta \left( \theta_{0} \right) - \beta ' \left( \theta_{0} \right) \right] \\ =& 0 - k \left[ \beta \left( \theta_{0} \right) - \beta ' \left( \theta_{0} \right) \right] \end{align*} $$

  • Rearranging the inequality above gives $\beta \left( \theta_{0} \right) \le \beta ' \left( \theta_{0} \right)$, and since $\phi$ came from a size $\alpha = \sup_{\theta \in \Theta_{0}} \beta (\theta)$ hypothesis test, $\alpha \le \beta ' \left( \theta_{0} \right)$.
  • From the premise, $\phi '$ came from a level $\alpha \ge \sup_{\theta \in \Theta_{0}} \beta ' (\theta)$ hypothesis test, so $\beta ' \left( \theta_{0} \right) \le \alpha$.

Combining the two inequalities, $\beta ' \left( \theta_{0} \right) = \alpha$, so the hypothesis test of $\phi '$ is exactly a size $\alpha$ hypothesis test. However, conditions (i)-(ii) need only hold outside a set $A \subset \Omega$ satisfying $\int_{A} f \left( \mathbf{x} | \theta \right) d \mathbf{x} = 0$, so in the statement such a null set $A$ must be allowed as an exception.


  1. Casella. (2001). Statistical Inference(2nd Edition): p388~389. ↩︎