
Entropy of Normal Distribution

Theorem

The entropy of the normal distribution $N(\mu, \sigma^{2})$, using the natural logarithm, is as follows.

$$ H = \dfrac{1}{2} \ln (2\pi e \sigma^{2}) = \ln \sqrt{2\pi e \sigma^{2}} $$

The entropy of the multivariate normal distribution $N_{p}(\boldsymbol{\mu}, \Sigma)$ is as follows.

$$ H = \dfrac{1}{2}\ln \left[ (2 \pi e)^{p} \left| \Sigma \right| \right] = \dfrac{1}{2}\ln (\det (2\pi e \Sigma)) $$

Here $\left| \Sigma \right|$ is the determinant of the covariance matrix.
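The multivariate formula can be evaluated numerically. The following is a minimal sketch in plain Python, not a reference implementation: the helper names `cholesky_log_det` and `mvn_entropy` are hypothetical, and the covariance matrix is assumed to be a symmetric positive-definite list of lists. The log-determinant is taken from a Cholesky factor to avoid overflow in the determinant itself.

```python
import math

def cholesky_log_det(cov):
    """Return ln|cov| for a symmetric positive-definite matrix (list of lists)."""
    p = len(cov)
    L = [[0.0] * p for _ in range(p)]  # lower-triangular Cholesky factor
    for i in range(p):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                diag = cov[i][i] - s
                if diag <= 0.0:
                    raise ValueError("covariance matrix is not positive definite")
                L[i][j] = math.sqrt(diag)
            else:
                L[i][j] = (cov[i][j] - s) / L[j][j]
    # |cov| = (product of diagonal entries of L)^2
    return 2.0 * sum(math.log(L[i][i]) for i in range(p))

def mvn_entropy(cov):
    """Entropy in nats: H = (1/2) * (p * ln(2*pi*e) + ln|cov|)."""
    p = len(cov)
    return 0.5 * (p * math.log(2.0 * math.pi * math.e) + cholesky_log_det(cov))

# Hypothetical example covariance; the mean vector is not needed.
print(mvn_entropy([[2.0, 0.6], [0.6, 1.0]]))
```

With array libraries, a log-determinant routine such as NumPy's `np.linalg.slogdet` would serve the same purpose as the hand-written factorization.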

Description

The mean $\mu$ does not affect the entropy; only the variance does. With the natural logarithm, the entropy of the standard normal distribution $N(0,1)$ is $H = \ln \sqrt{2\pi e} \approx 1.4189385332046727$ (in nats). With the base-$2$ logarithm the formula keeps the same form, and its value (in bits) is

$$ H = \log_{2} \sqrt{2\pi e} \approx 2.047095585180641 $$
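The two values above can be reproduced with a few lines of Python. This is a small sketch; the function name `normal_entropy` and its `base` parameter are hypothetical, chosen only for illustration. Changing the base simply divides the natural-log entropy by $\ln(\text{base})$.

```python
import math

def normal_entropy(sigma, base=math.e):
    """Entropy of N(mu, sigma^2); mu is not a parameter because it has no effect."""
    h_nats = 0.5 * math.log(2.0 * math.pi * math.e * sigma ** 2)
    return h_nats / math.log(base)  # convert from nats to the requested base

print(normal_entropy(1.0))          # standard normal, natural logarithm (nats)
print(normal_entropy(1.0, base=2))  # standard normal, base-2 logarithm (bits)
```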

Proof

Univariate Normal Distribution

To show this, we use two facts: the density $p(x) = \dfrac{1}{\sqrt{2\pi\sigma^{2}}}\exp\left( - \dfrac{(x-\mu)^{2}}{2\sigma^{2}} \right)$ integrates to $1$, and the variance is $E[(X-\mu)^{2}] = \sigma^{2}$.

$$ \begin{align*} H &= - \int_{-\infty}^{\infty} p(x) \ln p(x) dx \\ &= - \int_{-\infty}^{\infty} p(x) \ln \left[ \dfrac{1}{\sqrt{2\pi\sigma^{2}}}\exp\left( - \dfrac{(x-\mu)^{2}}{2\sigma^{2}} \right) \right] dx \\ &= - \int_{-\infty}^{\infty} p(x) \ln \dfrac{1}{\sqrt{2\pi\sigma^{2}}} dx - \int_{-\infty}^{\infty} p(x) \ln \exp\left( - \dfrac{(x-\mu)^{2}}{2\sigma^{2}} \right) dx \\ &= -\ln \dfrac{1}{\sqrt{2\pi\sigma^{2}}} + \int_{-\infty}^{\infty} p(x) \dfrac{(x-\mu)^{2}}{2\sigma^{2}} dx \\ &= \ln \sqrt{2\pi\sigma^{2}} + \dfrac{1}{2\sigma^{2}}\int_{-\infty}^{\infty} p(x) (x-\mu)^{2} dx \\ &= \ln \sqrt{2\pi\sigma^{2}} + \dfrac{1}{2\sigma^{2}} E[(X-\mu)^{2}] \\ &= \ln \sqrt{2\pi\sigma^{2}} + \dfrac{1}{2\sigma^{2}}\sigma^{2} \\ &= \ln \sqrt{2\pi\sigma^{2}} + \dfrac{1}{2} \\ &= \ln \sqrt{2\pi\sigma^{2}} + \ln \sqrt{e} \\ &= \ln \sqrt{2\pi e \sigma^{2}} \end{align*} $$
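The derivation can be sanity-checked by integrating $-p(x) \ln p(x)$ numerically and comparing with the closed form. The sketch below uses a plain midpoint rule; the function names, the truncation of the integral to $\mu \pm 12\sigma$, and the step count are all assumptions made for illustration.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def entropy_numeric(mu, sigma, half_width=12.0, steps=20_000):
    """Midpoint-rule estimate of -integral p ln p over [mu - 12 sigma, mu + 12 sigma]."""
    a = mu - half_width * sigma
    b = mu + half_width * sigma
    h = (b - a) / steps
    total = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * h
        p = normal_pdf(x, mu, sigma)
        if p > 0.0:  # guard against log(0) if the density underflows
            total -= p * math.log(p) * h
    return total

def entropy_closed_form(sigma):
    """Closed form (1/2) ln(2 pi e sigma^2), in nats."""
    return 0.5 * math.log(2.0 * math.pi * math.e * sigma ** 2)

# Hypothetical parameters; changing mu should leave both values unchanged.
print(entropy_numeric(3.0, 2.0), entropy_closed_form(2.0))
```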

Multivariate Normal Distribution

Since the probability density function of the multivariate normal distribution is $p(\mathbf{x}) = \dfrac{1}{\sqrt{(2\pi)^{p} \left| \Sigma \right|}} \exp \left( -\dfrac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^{T} \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right)$,

$$ \begin{align*} H(p) &= -\int p(\mathbf{x}) \ln(p(\mathbf{x}))d \mathbf{x} \\ &= -\int p(\mathbf{x}) \ln \left[ \dfrac{1}{\sqrt{(2\pi)^{p} \left| \Sigma \right|}} \exp \left( -\dfrac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^{T} \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right) \right] d \mathbf{x} \\ &= -\int p(\mathbf{x}) \ln\left( \dfrac{1}{\sqrt{(2\pi)^{p} \left| \Sigma \right|}} \right)d \mathbf{x} + \dfrac{1}{2}\int p(\mathbf{x}) (\mathbf{x} - \boldsymbol{\mu})^{T} \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})d \mathbf{x} \\ &= -\ln\left( \dfrac{1}{\sqrt{(2\pi)^{p} \left| \Sigma \right|}} \right)\int p(\mathbf{x}) d \mathbf{x} + \dfrac{1}{2} E \left[ (\mathbf{x} - \boldsymbol{\mu})^{T} \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right] \\ &= -\ln\left( \dfrac{1}{\sqrt{(2\pi)^{p} \left| \Sigma \right|}} \right) + \dfrac{1}{2} E \left[ (\mathbf{x} - \boldsymbol{\mu})^{T} \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right] \end{align*} $$

The expectation in the second term is calculated as follows. The quadratic form is a scalar, so it equals its own trace; the trace is invariant under cyclic permutation, and it commutes with the expectation because both are linear.

$$ \begin{align*} E \left[ (\mathbf{x} - \boldsymbol{\mu})^{T} \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right] &= E \left[ \operatorname{tr} \left( (\mathbf{x} - \boldsymbol{\mu})^{T} \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right) \right] \\ &= E \left[ \operatorname{tr} \left( \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}) (\mathbf{x} - \boldsymbol{\mu})^{T} \right) \right] \\ &= \operatorname{tr} \left[ E \left( \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}) (\mathbf{x} - \boldsymbol{\mu})^{T} \right) \right] \\ &= \operatorname{tr} \left[ \Sigma^{-1} E \left( (\mathbf{x} - \boldsymbol{\mu}) (\mathbf{x} - \boldsymbol{\mu})^{T} \right) \right] \\ &= \operatorname{tr} \left[ \Sigma^{-1} \Sigma \right] \\ &= \operatorname{tr} \left[ I_{p\times p} \right] \\ &= p \end{align*} $$
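The identity $E \left[ (\mathbf{x} - \boldsymbol{\mu})^{T} \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right] = p$ can also be checked by simulation. The following Monte Carlo sketch is restricted to $p = 2$ so that the inverse and the Cholesky factor can be written out by hand; the function name, the sample size, the seed, and the example parameters are assumptions for illustration only.

```python
import math
import random

def mean_quadratic_form(mu, cov, n_samples=100_000, seed=0):
    """Estimate E[(x - mu)^T cov^{-1} (x - mu)] for a 2-D normal by sampling.

    Assumes cov is a symmetric positive-definite 2x2 matrix.
    """
    (a, b), (c, d) = cov
    det = a * d - b * c
    inv = ((d / det, -b / det), (-c / det, a / det))  # explicit 2x2 inverse
    # Cholesky factor L = [[l11, 0], [l21, l22]] with L L^T = cov
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(d - l21 * l21)
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x1 = mu[0] + l11 * z1             # x = mu + L z has covariance cov
        x2 = mu[1] + l21 * z1 + l22 * z2
        d1, d2 = x1 - mu[0], x2 - mu[1]
        total += d1 * (inv[0][0] * d1 + inv[0][1] * d2)
        total += d2 * (inv[1][0] * d1 + inv[1][1] * d2)
    return total / n_samples

# Hypothetical parameters; the estimate should be close to p = 2.
print(mean_quadratic_form((1.0, -2.0), ((2.0, 0.6), (0.6, 1.0))))
```

Because this is a sampling estimate, the result only approaches $2$ as the number of samples grows; it is not exact.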

Therefore, the entropy is as follows.

$$ \begin{align*} H(p) &= -\ln\left( \dfrac{1}{\sqrt{(2\pi)^{p} \left| \Sigma \right|}} \right) + \dfrac{1}{2} E \left[ (\mathbf{x} - \boldsymbol{\mu})^{T} \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right] \\ &= \dfrac{1}{2} \ln \left[ (2\pi)^{p} \left| \Sigma \right| \right] + \dfrac{1}{2}p \\ &= \dfrac{1}{2} \ln \left[ (2\pi)^{p} \left| \Sigma \right| \right] + \dfrac{1}{2}\ln e^{p} \\ &= \dfrac{1}{2} \ln \left[ (2\pi e)^{p} \left| \Sigma \right| \right] \end{align*} $$
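As a consistency check (a worked special case, not part of the proof itself), suppose the covariance matrix is diagonal, $\Sigma = \operatorname{diag}(\sigma_{1}^{2}, \dots, \sigma_{p}^{2})$, where the symbols $\sigma_{i}^{2}$ are introduced here only for illustration. Then $\left| \Sigma \right| = \prod_{i=1}^{p} \sigma_{i}^{2}$, and the multivariate entropy splits into a sum of univariate entropies:

$$ \begin{align*} H &= \dfrac{1}{2} \ln \left[ (2\pi e)^{p} \prod_{i=1}^{p} \sigma_{i}^{2} \right] \\ &= \sum_{i=1}^{p} \dfrac{1}{2} \ln (2\pi e \sigma_{i}^{2}) \end{align*} $$

In particular, for $p = 1$ this reduces to the univariate result $H = \dfrac{1}{2} \ln (2\pi e \sigma^{2})$.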