

Normal Distribution

Definition

[Figure: probability density functions, generated as pdf.png by the code at the end.]

A continuous probability distribution $N \left( \mu, \sigma^{2} \right)$ with mean $\mu \in \mathbb{R}$ and variance $\sigma^{2} > 0$, defined by the following probability density function, is called the normal distribution.

$$ f(x) = {{ 1 } \over { \sqrt{2 \pi} \sigma }} \exp \left[ - {{ 1 } \over { 2 }} \left( {{ x - \mu } \over { \sigma }} \right)^{2} \right] \qquad , x \in \mathbb{R} $$

In particular, the normal distribution $N \left( 0, 1^{2} \right)$ with the following probability density function is referred to as the standard normal distribution.

$$ f(z) = {{ 1 } \over { \sqrt{2 \pi} }} \exp \left[ - {{ z^{2} } \over { 2 }} \right] $$
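As a quick sanity check (a minimal sketch assuming the Distributions.jl package), the density formula above can be compared against the library's built-in one; note that `Normal(μ, σ)` is parameterized by the standard deviation $\sigma$, not the variance $\sigma^{2}$.

```julia
using Distributions

μ, σ = 1.5, 2.0
f(x) = exp(-((x - μ) / σ)^2 / 2) / (√(2π) * σ)  # the pdf as defined above

f(-3.0) ≈ pdf(Normal(μ, σ), -3.0)  # true: the formula matches the library
```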

Explanation

Another name for the normal distribution is the Gaussian distribution. Historically, it became widely known when Gauss introduced the normal distribution through research on the method of least squares in 1809. We cannot definitively say that Gauss was the first to realize the essence of the normal distribution, but he is certainly deserving of having it named after him.

In 1794, at the tender age of seventeen, Gauss was inspired to find a method to determine true values from the measurements encountered in daily life and research. Gauss collected data by counting his footsteps on his regular path, plotted the data, and obtained a bell-shaped curve. This discovery predated the concept of a histogram, yet Gauss believed that concepts of the normal distribution and the method of least squares were already well-known techniques used by everyone.¹ One could only call this an overwhelming display of genius. Meanwhile, many calculations related to the normal distribution also employ the Gaussian integral.

Subsequently, the normal distribution has been extensively researched and has become an indispensable tool across the sciences. So familiar, in fact, that laypeople sometimes mistake statistics for nothing more than assuming data follows a normal distribution and computing means and variances. Such an underestimation would be unfortunate in someone pursuing statistics as a career, but for non-specialists that level of understanding may well be sufficient. Either way, such statements underscore the importance and power of the normal distribution.

Basic Properties

Moment Generating Function

  • [1]: $m(t) = \exp \left( \mu t + {{ \sigma^{2} t^{2} } \over { 2 }} \right) \qquad , t \in \mathbb{R}$
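As a hedged numerical check of [1] (a sketch assuming Distributions.jl and the Statistics standard library), the sample average of $\exp(tX)$ should be close to the closed form:

```julia
using Distributions, Statistics

μ, σ, t = 0.5, 1.2, 0.3
X = rand(Normal(μ, σ), 10^6)      # X ~ N(μ, σ²); Normal takes σ, not σ²
mean(exp.(t .* X))                # Monte Carlo estimate of E[exp(tX)]
exp(μ*t + σ^2 * t^2 / 2)          # closed form from [1]; both ≈ 1.2396
```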

Mean and Variance

  • [2]: If $X \sim N\left( \mu , \sigma^{2} \right)$, then $$ \begin{align*} E(X) =& \mu \\ \operatorname{Var} (X) =& \sigma^{2} \end{align*} $$

Sufficient Statistics and Maximum Likelihood Estimators

  • [3]: Suppose a random sample $\mathbf{X} := \left( X_{1} , \cdots , X_{n} \right)$ drawn from a normal distribution $N \left( \mu , \sigma^{2} \right)$ is given.

The sufficient statistic $T$ and the maximum likelihood estimator $\left( \hat{\mu}, \widehat{\sigma^{2}} \right)$ for $\left( \mu, \sigma^{2} \right)$ are as follows. $$ \begin{align*} T =& \left( \sum_{k} X_{k}, \sum_{k} X_{k}^{2} \right) \\ \left( \hat{\mu}, \widehat{\sigma^{2}} \right) =& \left( {{ 1 } \over { n }} \sum_{k} X_{k}, {{ 1 } \over { n }} \sum_{k} \left( X_{k} - \overline{X} \right)^{2} \right) \end{align*} $$
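A minimal sketch of [3], assuming Distributions.jl: the MLE is the sample mean together with the biased ($1/n$) sample variance, which is also what `fit_mle(Normal, x)` returns.

```julia
using Distributions, Statistics

x  = rand(Normal(3.0, 2.0), 10_000)
μ̂  = mean(x)                  # (1/n) Σ Xₖ
σ̂² = mean((x .- μ̂) .^ 2)      # (1/n) Σ (Xₖ - X̄)², not 1/(n-1)
d  = fit_mle(Normal, x)
(μ̂, σ̂²) .≈ (d.μ, d.σ^2)       # (true, true)
```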

Entropy

  • [4]: (When using the natural logarithm) The entropy of the normal distribution is as follows. $$ H = \ln \sqrt{2\pi e \sigma^{2}} $$
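A one-line check of [4], assuming Distributions.jl, whose `entropy` uses the natural logarithm:

```julia
using Distributions

σ = 2.0
log(√(2π * ℯ * σ^2)) ≈ entropy(Normal(0.0, σ))  # true
```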

Relative Entropy

  • [5]: The relative entropy (Kullback–Leibler divergence) between two normal distributions $N(\mu, \sigma^{2})$ and $N(\mu_{1}, \sigma_{1}^{2})$ is as follows.

$$ D_{\text{KL}}\big( N(\mu, \sigma^{2}) \| N(\mu_{1}, \sigma_{1}^{2}) \big) = \log \left( \dfrac{\sigma_{1}}{\sigma} \right) + \dfrac{\sigma^{2} + (\mu - \mu_{1})^{2}}{2\sigma_{1}^{2}} - \dfrac{1}{2} $$
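A sketch of [5], assuming only `rand` and `logpdf` from Distributions.jl: the closed form against the Monte Carlo estimate $E_{p}\left[ \log p(X) - \log q(X) \right]$.

```julia
using Distributions, Statistics

p, q   = Normal(0.0, 1.0), Normal(1.0, 2.0)
closed = log(q.σ / p.σ) + (p.σ^2 + (p.μ - q.μ)^2) / (2q.σ^2) - 1/2
x      = rand(p, 10^6)
monte  = mean(logpdf.(p, x) .- logpdf.(q, x))
isapprox(closed, monte; atol = 0.01)   # true; both ≈ 0.4431
```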

Theorems

The importance of the normal distribution can be adequately demonstrated by simply listing the theorems without needing lengthy explanations. Observe the following.

Central Limit Theorem

  • [a]: If $\left\{ X_{k} \right\}_{k=1}^{n}$ are iid random variables following a probability distribution with mean $\mu$ and variance $\sigma^2$, then as $n \to \infty$ $$ \sqrt{n} {{ \overline{X}_n - \mu } \over {\sigma}} \overset{D}{\to} N (0,1) $$
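A simulation sketch of [a], assuming Distributions.jl, with iid $\text{Exp}(1)$ samples (so $\mu = \sigma = 1$): the standardized sample means behave like $N(0,1)$.

```julia
using Distributions, Statistics

n, reps = 100, 10_000
z = [√n * (mean(rand(Exponential(1), n)) - 1) for _ in 1:reps]
mean(z), var(z)        # ≈ (0, 1)
quantile(z, 0.975)     # ≈ 1.96, the N(0,1) quantile
```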

Relation with the Chi-Square Distribution

  • [b]: If $X \sim N(\mu,\sigma ^2)$, then $$ V=\left( { X - \mu \over \sigma} \right) ^2 \sim \chi ^2 (1) $$
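A sketch of [b] under the same assumption: squaring standardized normal draws reproduces the moments and quantiles of $\chi^{2}(1)$.

```julia
using Distributions, Statistics

μ, σ = 1.0, 3.0
v = ((rand(Normal(μ, σ), 10^6) .- μ) ./ σ) .^ 2
mean(v), var(v)                                # ≈ (1, 2), matching χ²(1)
quantile(v, 0.95), quantile(Chisq(1), 0.95)    # both ≈ 3.84
```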

Standard Normal Distribution as the Limit Distribution of the Binomial Distribution

  • [c]: If $X_i \sim B(1,p)$ and $Y_n = X_1 + X_2 + \cdots + X_n$, then $Y_n \sim B(n,p)$ and $$ { { Y_n - np } \over {\sqrt{ np(1-p) } } }\overset{D}{\to} N(0,1) $$
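A sketch of [c], assuming Distributions.jl: standardized $B(n, p)$ counts are already close to $N(0,1)$ for large $n$.

```julia
using Distributions, Statistics

n, p = 10_000, 0.3
z = (rand(Binomial(n, p), 10^5) .- n * p) ./ √(n * p * (1 - p))
mean(z), var(z)    # ≈ (0, 1)
```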

Standard Normal Distribution as the Limit Distribution of the Poisson Distribution

  • [d]: If $X_{n} \sim \text{Poi} \left( n \right)$ and $\displaystyle Y_{n} := {{ X_{n} - n } \over { \sqrt{n} }}$, then $$ Y_{n} \overset{D}{\to} N(0,1) $$
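Likewise for [d], a sketch assuming Distributions.jl:

```julia
using Distributions, Statistics

n = 10_000
z = (rand(Poisson(n), 10^5) .- n) ./ √n
mean(z), var(z)    # ≈ (0, 1)
```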

Standard Normal Distribution as the Limit Distribution of the Student’s t-distribution

  • [e]: If $T_n \sim t(n)$, then $$ T_n \overset{D}{\to} N(0,1) $$
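A quantile sketch of [e], assuming Distributions.jl: the $97.5\%$ quantile of $t(n)$ falls toward the standard normal's as $n$ grows.

```julia
using Distributions

[quantile(TDist(n), 0.975) for n in (1, 10, 100, 1000)]
# ≈ 12.706, 2.228, 1.984, 1.962 → quantile(Normal(), 0.975) ≈ 1.960
```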

Derivation of the t-distribution from the Normal and Chi-Square Distributions

  • [f]: If two random variables $W,V$ are independent and $W \sim N(0,1)$, $V \sim \chi^{2} (r)$, then $$ T = { {W} \over {\sqrt{V/r} } } \sim t(r) $$
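A sketch of [f], assuming Distributions.jl: building $W / \sqrt{V/r}$ from independent draws empirically matches $t(r)$.

```julia
using Distributions, Statistics

r, N = 5, 10^6
t = rand(Normal(), N) ./ sqrt.(rand(Chisq(r), N) ./ r)
quantile(t, 0.975), quantile(TDist(r), 0.975)   # both ≈ 2.571
```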

Proofs

Strategy: Complete the square in the exponent so that the Gaussian integral applies, derive the moment generating function of the standard normal distribution first, and then obtain the moment generating function of the normal distribution by substitution.

Gaussian Integral: $$ \int_{-\infty}^{\infty} e^{-x^2} dx= \sqrt{\pi} $$

[1] ²

If $\displaystyle Z := {{ X - \mu } \over { \sigma }} \sim N(0,1)$, then its moment generating function is

$$ \begin{align*} m_{Z}(t) =& \int_{-\infty}^{\infty} \exp (tz) {{ 1 } \over { \sqrt{2 \pi} }} \exp \left[ - {{ 1 } \over { 2 }} z^{2} \right] dz \\ =& {{ 1 } \over { \sqrt{\pi} }} \int_{-\infty}^{\infty} {{ 1 } \over { \sqrt{2} }} \exp \left[ - {{ 1 } \over { 2 }} z^{2} + tz \right] dz \\ =& {{ 1 } \over { \sqrt{\pi} }} \int_{-\infty}^{\infty} {{ 1 } \over { \sqrt{2} }} \exp \left[ - {{ 1 } \over { 2 }} \left( z - t \right)^{2} + {{ t^{2} } \over { 2 }} \right] dz \\ =& {{ 1 } \over { \sqrt{\pi} }} \int_{-\infty}^{\infty} {{ 1 } \over { \sqrt{2} }} \exp \left[ - {{ 1 } \over { 2 }} \left( z - t \right)^{2} \right] \exp \left[ {{ t^{2} } \over { 2 }} \right] dz \\ =& \exp \left[ {{ t^{2} } \over { 2 }} \right] {{ 1 } \over { \sqrt{\pi} }} \int_{-\infty}^{\infty} {{ 1 } \over { \sqrt{2} }} \exp \left[ - w^{2} \right] \sqrt{2} dw \\ =& \exp \left[ {{ t^{2} } \over { 2 }} \right] \end{align*} $$

where the second-to-last equality uses the substitution $w = (z-t)/\sqrt{2}$, $dz = \sqrt{2} \, dw$, and the last uses the Gaussian integral.

Then the moment generating function of $X \sim N \left( \mu , \sigma^{2} \right)$ is

$$ \begin{align*} m_{X}(t) =& E \left[ \exp ( t X ) \right] \\ =& E \left[ \exp \left( t (\sigma Z + \mu) \right) \right] \\ =& \exp(\mu t) E \left[ \exp \left( t \sigma Z \right) \right] \\ =& \exp(\mu t) \exp \left( {{ t^{2} \sigma^{2} } \over { 2 }} \right) \\ =& \exp \left( \mu t + {{ \sigma^{2} t^{2} } \over { 2 }} \right) \end{align*} $$

[2]

Deduced directly from the moment generating function.

[3]

Deduced directly.

[4]

Deduced directly.

[a]

Shown using the moment generating function.

[b]

Derived directly from the probability density function. It uses the relationship between gamma functions, gamma distributions, and chi-square distributions.

[c]

Shown using the central limit theorem.

[d]

Shown using the moment generating function.

[e]

Not easy. Convergence of the probability density function is shown through Stirling’s approximation.

[f]

Simple yet complex. Directly deduced from the probability density function.

Code

The following is Julia code that draws the probability density functions of the Cauchy distribution, the t-distribution, and the standard normal distribution.

```julia
@time using LaTeXStrings
@time using Distributions
@time using Plots

cd(@__DIR__)    # save the figure next to this script

x = -4:0.1:4

# Heavy-tailed Cauchy versus t(ν) with growing ν and the standard normal
plot(x, pdf.(Cauchy(), x),
 color = :red,
 label = "Cauchy", size = (400,300))
plot!(x, pdf.(TDist(3), x),
 color = :orange,
 label = "t(3)", size = (400,300))
plot!(x, pdf.(TDist(30), x),
 color = :black, linestyle = :dash,
 label = "t(30)", size = (400,300))
plot!(x, pdf.(Normal(), x),
 color = :black,
 label = "Standard Normal", size = (400,300))

xlims!(-4,5); ylims!(0,0.5); title!(L"\mathrm{pdf\,of\, t}(\nu)")
png("pdf")    # writes pdf.png
```

  1. Hubert Mania. (2010). 뜨거운 몰입 (a biography of Gauss, who found absolute order in the cold world of numbers): pp. 69–72.

  2. Hogg et al. (2013). Introduction to Mathematical Statistics (7th Edition): pp. 171–172.