Hypothesis Testing for Population Mean
Hypothesis Testing 1
Suppose the population has a distribution with mean $\mu$ and variance $\sigma^{2}$, written $\left( \mu , \sigma^{2} \right)$. When the sample is large, i.e., when the sample size satisfies $n \ge 30$, the hypothesis test for a candidate value $\mu_{0}$ of the population mean is as follows:
- $H_{0}$: $\mu = \mu_{0}$. That is, the population mean is $\mu_{0}$.
- $H_{1}$: $\mu \ne \mu_{0}$. That is, the population mean is not $\mu_{0}$.
Test Statistic
The test statistic is calculated slightly differently depending on whether the population standard deviation $\sigma$ is known or not.
- When $\sigma$ is known: Use the population standard deviation $\sigma$ directly. $$ Z = {{ \overline{X} - \mu_{0} } \over { \sigma / \sqrt{n} }} $$
- When $\sigma$ is unknown: Use the sample standard deviation $s$ in its place. $$ Z = {{ \overline{X} - \mu_{0} } \over { s / \sqrt{n} }} $$
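The two cases above differ only in the scale used in the denominator, which a short Python sketch can make concrete. The function name `z_statistic` and the data in the usage lines are hypothetical, chosen only for illustration; the sketch assumes the sample is given as a plain list of numbers.

```python
import math
import statistics


def z_statistic(sample, mu0, sigma=None):
    """Large-sample Z statistic for H0: mu = mu0.

    If sigma is None, the sample standard deviation s is used instead.
    """
    n = len(sample)
    xbar = statistics.fmean(sample)
    # statistics.stdev uses the n - 1 denominator (sample standard deviation).
    scale = sigma if sigma is not None else statistics.stdev(sample)
    return (xbar - mu0) / (scale / math.sqrt(n))


# Hypothetical data: 40 observations, candidate mean mu0 = 2.5.
data = [1, 2, 3, 4, 5] * 8
print(z_statistic(data, mu0=2.5, sigma=2.0))  # sigma known
print(z_statistic(data, mu0=2.5))             # sigma unknown, s is used
```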
Explanation
Even if the sample mean $\overline{x}$ comes out close to $\mu_{0}$, that alone does not let us assert that $\mu = \mu_{0}$. Statistics is not about averaging whatever data is at hand and roughly trusting the result; it’s about backing such a claim statistically through hypothesis testing.
Derivation 2
Central Limit Theorem: If $\left\{ X_{k} \right\}_{k=1}^{n}$ are iid random variables whose distribution has mean $\mu$ and variance $\sigma^2$, written $\left( \mu, \sigma^2 \right)$, then as $n \to \infty$, $$ \sqrt{n} {{ \overline{X}_n - \mu } \over {\sigma}} \overset{D}{\to} N (0,1) $$
Since the population distribution is assumed to be $\left( \mu , \sigma^{2} \right)$ and the sample is large, under the null hypothesis $$ Z = {{ \overline{X} - \mu_{0} } \over { \sigma / \sqrt{n} }} $$ approximately follows the standard normal distribution $N (0,1)$ by the central limit theorem. Likewise, because $s \approx \sigma$ for a large sample, it’s acceptable to use $s$ instead of $\sigma$ when the population variance is unknown. Let $Y$ be a random variable following the standard normal distribution, and let $z_{\alpha/2}$ satisfy $P \left( Y \ge z_{\alpha/2} \right) = \alpha/2$. Because the alternative hypothesis is two-sided, the significance level $\alpha$ is split equally between the two tails, so rejecting $H_{0}$ at significance level $\alpha$ is equivalent to: $$ \left| Z \right| \ge z_{\alpha/2} $$ This means $\overline{X}$ is too far from $\mu_{0}$ to believe $\mu = \mu_{0}$ as the null hypothesis claims.
■
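As a minimal sketch of the decision step, the following Python function compares $\left| Z \right|$ with a standard normal critical value. Since $H_{1}$ is two-sided, it uses the upper $\alpha/2$ quantile so that the two tails together have probability $\alpha$. The function name `two_sided_z_test` and the default $\alpha = 0.05$ are assumptions made for illustration; `statistics.NormalDist` requires Python 3.8 or later.

```python
from statistics import NormalDist


def two_sided_z_test(z, alpha=0.05):
    """Return (reject_h0, critical_value, p_value) for a two-sided Z test."""
    std_normal = NormalDist()  # standard normal N(0, 1)
    # Upper alpha/2 quantile: P(Y >= critical) = alpha / 2.
    critical = std_normal.inv_cdf(1 - alpha / 2)
    # Two-sided p-value: probability of a value at least as extreme as |z|.
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    return abs(z) >= critical, critical, p_value


# Hypothetical observed statistic.
reject, critical, p_value = two_sided_z_test(2.5)
print(reject, critical, p_value)
```

Rejecting when the p-value is at most $\alpha$ gives the same decision as comparing $\left| Z \right|$ with the critical value.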
One might question why, in the derivation, a large sample with $n \ge 30$ is treated as if $n \to \infty$. This is best accepted as a convention: in statistics, a ‘large sample’ generally means about this size. The term ‘big data’ has been used so often since the 2010s that sizes in the thousands, or even billions, might not feel significant, but when the population is something like ‘genetically controlled expensive lab mice’ or ‘rare disease patients’, a sample of 30 can well count as large.
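One rough way to see how reasonable the $n \ge 30$ convention is: simulate samples for which $H_{0}$ is true and count how often the two-sided test rejects at $\alpha = 0.05$. The sketch below is an illustration under assumptions, not a proof. The exponential population with rate 1 (mean 1, standard deviation 1) is an arbitrary skewed choice, $\sigma$ is treated as known, and the function name `rejection_rate` is hypothetical.

```python
import math
import random
from statistics import NormalDist


def rejection_rate(n, reps=5000, seed=0, alpha=0.05):
    """Fraction of simulated samples of size n that reject a true H0."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    mu, sigma = 1.0, 1.0       # mean and sd of the Exponential(rate=1) population
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(reps):
        xbar = sum(rng.expovariate(1.0) for _ in range(n)) / n
        z = (xbar - mu) / (sigma / math.sqrt(n))
        if abs(z) >= critical:
            rejections += 1
    return rejections / reps


# If the normal approximation is adequate, this should be close to alpha.
print(rejection_rate(30))
```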