Hypothesis Testing for Population Mean

Hypothesis Testing 1

Suppose the population follows a distribution with mean $\mu$ and variance $\sigma^{2}$, written $\left( \mu , \sigma^{2} \right)$. When the sample is large, i.e., when the sample size satisfies $n > 30$, the hypothesis test for a candidate value $\mu_{0}$ of the population mean is as follows:

  • $H_{0}$: $\mu = \mu_{0}$. That is, the population mean is $\mu_{0}$.
  • $H_{1}$: $\mu \ne \mu_{0}$. That is, the population mean is not $\mu_{0}$.

Test Statistic

The test statistic is calculated slightly differently depending on whether the population standard deviation $\sigma$ is known or not.

  • When $\sigma$ is known: Use the population standard deviation $\sigma$ as is. $$ Z = {{ \overline{X} - \mu_{0} } \over { \sigma / \sqrt{n} }} $$
  • When $\sigma$ is unknown: Use the sample standard deviation $s$ in its place. $$ Z = {{ \overline{X} - \mu_{0} } \over { s / \sqrt{n} }} $$
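
The two formulas above differ only in the scale used in the denominator, which a short Python sketch can make concrete. The function name `z_statistic` and its parameters are illustrative and not part of the original text; the sketch assumes the sample standard deviation $s$ is computed with the usual $n - 1$ denominator.

```python
import math
import statistics


def z_statistic(sample, mu0, sigma=None):
    """Large-sample Z statistic for H0: mu = mu0.

    sample: observed values (assumed to be a large sample)
    mu0:    candidate value of the population mean
    sigma:  population standard deviation, or None if unknown
    """
    n = len(sample)
    x_bar = statistics.mean(sample)
    # When sigma is unknown, fall back to the sample standard
    # deviation s (statistics.stdev uses the n - 1 denominator).
    scale = sigma if sigma is not None else statistics.stdev(sample)
    return (x_bar - mu0) / (scale / math.sqrt(n))
```

Calling `z_statistic(sample, mu0, sigma=2.0)` corresponds to the known-$\sigma$ case, while `z_statistic(sample, mu0)` substitutes $s$.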

Explanation

Even if we expect the sample mean $\overline{x}$ to land near the population mean, we cannot simply assert $\overline{x} = \mu = \mu_{0}$ on that expectation alone. Statistics is not about averaging whatever data happens to be at hand and roughly trusting the result; it is about supporting such a claim statistically through hypothesis testing.

Derivation 2

Central Limit Theorem: If $\left\{ X_{k} \right\}_{k=1}^{n}$ are iid random variables with the probability distribution $\left( \mu, \sigma^2 \right)$, then as $n \to \infty$, $$ \sqrt{n} {{ \overline{X}_n - \mu } \over {\sigma}} \overset{D}{\to} N (0,1) $$

Since the population distribution is assumed to be $\left( \mu , \sigma^{2} \right)$ and the sample is large, under the null hypothesis the statistic
$$ Z = {{ \overline{X} - \mu_{0} } \over { \sigma / \sqrt{n} }} $$
approximately follows the standard normal distribution $N (0,1)$ by the central limit theorem. Likewise, because $s \approx \sigma$ for a large sample, it is acceptable to use $s$ in place of $\sigma$ when the population variance is unknown. Now let the random variable $Y$ follow the standard normal distribution, and let $z_{\alpha}$ denote the value satisfying $P \left( Y \ge z_{\alpha} \right) = \alpha$. Because the alternative hypothesis is two-sided, the significance level $\alpha$ is split evenly between the two tails, so rejecting $H_{0}$ at significance level $\alpha$ is equivalent to
$$ \left| Z \right| \ge z_{\alpha / 2} $$
This means that $\overline{X}$ is too far from $\mu_{0}$ to believe $\mu = \mu_{0}$ as the null hypothesis claims.
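
The rejection rule can be sketched in Python with the standard library's `statistics.NormalDist`. The function name `two_sided_z_test` is hypothetical, and the sketch assumes the usual two-sided convention of placing $\alpha / 2$ in each tail, so the critical value is the upper $\alpha / 2$ quantile of $N(0,1)$. The p-value it returns is an addition for convenience and is not discussed in the text.

```python
from statistics import NormalDist


def two_sided_z_test(z, alpha=0.05):
    """Decide H0: mu = mu0 against H1: mu != mu0 from a Z statistic.

    Returns (reject, critical_value, p_value).
    """
    std_normal = NormalDist()  # standard normal N(0, 1)
    # Critical value c with P(Y >= c) = alpha / 2 (assumed convention).
    critical = std_normal.inv_cdf(1 - alpha / 2)
    # Two-sided p-value: chance of a statistic at least as extreme as |z|.
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    return abs(z) >= critical, critical, p_value
```

With `alpha=0.05` the critical value is about 1.96, so a statistic such as `z = 2.5` leads to rejecting $H_{0}$, while `z = 1.0` does not.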

On the other hand, one might object that the derivation treats a large sample of $n > 30$ as if $n \to \infty$. This is best accepted as a convention: across statistics in general, a 'large sample' means roughly this size. The term big data has been used so often since the 2010s that counts in the thousands or even billions may no longer feel significant, but when the population is something like 'expensive, genetically controlled lab mice' or 'patients with a rare disease', a sample of this size can still feel like a large one.
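
One informal way to probe how well a moderate sample size stands in for $n \to \infty$ is a small simulation. The sketch below is only an illustration under stated assumptions: the population is taken to be Exponential(1), a skewed distribution with mean 1 and standard deviation 1, and the function name `clt_tail_fraction` and its defaults are made up for this example. It estimates how often the standardized sample mean falls beyond a cutoff when $H_{0}$ is true.

```python
import math
import random


def clt_tail_fraction(n=40, reps=5000, cutoff=1.96, seed=0):
    """Fraction of simulated standardized means with |Z| >= cutoff.

    The population is Exponential(1): skewed, with mean 1 and
    standard deviation 1 (an assumption made for this sketch).
    """
    rng = random.Random(seed)  # seeded for reproducibility
    mu, sigma = 1.0, 1.0
    hits = 0
    for _ in range(reps):
        # Draw one sample of size n and standardize its mean.
        x_bar = sum(rng.expovariate(1.0) for _ in range(n)) / n
        z = math.sqrt(n) * (x_bar - mu) / sigma
        if abs(z) >= cutoff:
            hits += 1
    return hits / reps
```

If the normal approximation is reasonable at this sample size, the returned fraction should lie in the neighborhood of 0.05 for the cutoff 1.96. How close it gets depends on the population's shape, so the result says nothing general about other distributions.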


  1. Mendenhall. (2012). Introduction to Probability and Statistics (13th Edition): p347. ↩︎

  2. Department of Statistics, Kyungpook National University. (2008). Statistics Using Excel (엑셀을 이용한 통계학): p204. ↩︎