Entropy of Normal Distribution
Theorem

The entropy of the normal distribution $N(\mu, \sigma^{2})$ (when using natural logarithms) is as follows.
$$
H = \dfrac{1}{2} \ln (2\pi e \sigma^{2}) = \ln \sqrt{2\pi e \sigma^{2}}
$$
The entropy of the multivariate normal distribution $N_{p}(\boldsymbol{\mu}, \Sigma)$ is as follows.
$$
H = \dfrac{1}{2}\ln \left[ (2 \pi e)^{p} \left| \Sigma \right| \right] = \dfrac{1}{2}\ln (\det (2\pi e \Sigma))
$$
Here $\left| \Sigma \right|$ is the determinant of the covariance matrix.
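As a quick consistency check on the two formulas (a worked sketch, with $\sigma_{1}^{2}, \dots, \sigma_{p}^{2}$ introduced here only for illustration), the second equality in the multivariate formula follows from $\det(cA) = c^{p}\det(A)$ for a $p \times p$ matrix $A$. The case $p = 1$ recovers the univariate formula, and a diagonal covariance matrix gives the sum of the univariate entropies.
\begin{align*}
\det (2\pi e \Sigma) &= (2\pi e)^{p} \left| \Sigma \right| \\
p = 1,\ \Sigma = \sigma^{2} : \quad H &= \dfrac{1}{2} \ln \left[ (2\pi e)^{1} \sigma^{2} \right] = \ln \sqrt{2\pi e \sigma^{2}} \\
\Sigma = \operatorname{diag}(\sigma_{1}^{2}, \dots, \sigma_{p}^{2}) : \quad H &= \dfrac{1}{2} \ln \left[ (2\pi e)^{p} \prod_{i=1}^{p} \sigma_{i}^{2} \right] = \sum_{i=1}^{p} \dfrac{1}{2} \ln (2\pi e \sigma_{i}^{2})
\end{align*}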
Description

The mean $\mu$ does not affect the entropy. With natural logarithms, the entropy of the standard normal distribution $N(0,1)$ is $H = \ln \sqrt{2\pi e } \approx 1.4189385332046727$. With base-$2$ logarithms the formula keeps the same form, and its value is as follows.
$$
H = \log_{2} \sqrt{2\pi e } \approx 2.047095585180641
$$
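The two numerical values above can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the function name `normal_entropy` and its `base` parameter are illustrative choices, not part of the original text.

```python
import math

def normal_entropy(sigma, base=math.e):
    """Entropy of N(mu, sigma^2) in the given log base.

    The mean is not a parameter because it does not affect the entropy.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    # Closed form in nats: (1/2) ln(2 pi e sigma^2)
    nats = 0.5 * math.log(2 * math.pi * math.e * sigma ** 2)
    # Change of base: log_b(x) = ln(x) / ln(b)
    return nats / math.log(base)

h_nats = normal_entropy(1.0)          # standard normal, natural log
h_bits = normal_entropy(1.0, base=2)  # standard normal, log base 2
print(h_nats)
print(h_bits)
```

Changing the base only rescales the result by $1 / \ln b$, which is why the form of the formula stays the same.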
Proof

Univariate Normal Distribution

To show this, we use that the density $p(x) = \dfrac{1}{\sqrt{2\pi\sigma^{2}}}\exp\left( - \dfrac{(x-\mu)^{2}}{2\sigma^{2}} \right)$ integrates to $1$ and that its variance is $E[(X-\mu)^{2}] = \sigma^{2}$.
\begin{align*}
H
&= - \int_{-\infty}^{\infty} p(x) \ln p(x) dx \\
&= - \int_{-\infty}^{\infty} p(x) \ln \left[ \dfrac{1}{\sqrt{2\pi\sigma^{2}}}\exp\left( - \dfrac{(x-\mu)^{2}}{2\sigma^{2}} \right) \right] dx \\
&= - \int_{-\infty}^{\infty} p(x) \ln \dfrac{1}{\sqrt{2\pi\sigma^{2}}} dx - \int_{-\infty}^{\infty} p(x) \ln \exp\left( - \dfrac{(x-\mu)^{2}}{2\sigma^{2}} \right) dx \\
&= -\ln \dfrac{1}{\sqrt{2\pi\sigma^{2}}} + \int_{-\infty}^{\infty} p(x) \dfrac{(x-\mu)^{2}}{2\sigma^{2}} dx \\
&= \ln \sqrt{2\pi\sigma^{2}} + \dfrac{1}{2\sigma^{2}}\int_{-\infty}^{\infty} p(x) (x-\mu)^{2} dx \\
&= \ln \sqrt{2\pi\sigma^{2}} + \dfrac{1}{2\sigma^{2}} E[(X-\mu)^{2}] \\
&= \ln \sqrt{2\pi\sigma^{2}} + \dfrac{1}{2\sigma^{2}}\sigma^{2} \\
&= \ln \sqrt{2\pi\sigma^{2}} + \dfrac{1}{2} \\
&= \ln \sqrt{2\pi\sigma^{2}} + \ln \sqrt{e} \\
&= \ln \sqrt{2\pi e \sigma^{2}}
\end{align*}
■
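The closed form can also be checked numerically by approximating $-\int p(x) \ln p(x) dx$ directly. The following is a rough sketch, assuming a midpoint rule on the interval $\mu \pm 12\sigma$; the helper names, the interval width, and the number of subintervals are arbitrary choices made for this illustration.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def entropy_by_quadrature(mu, sigma, half_width=12.0, n=20000):
    """Midpoint-rule estimate of -integral(p ln p) over mu +/- half_width*sigma."""
    a = mu - half_width * sigma
    h = 2 * half_width * sigma / n  # width of each subinterval
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * h       # midpoint of the i-th subinterval
        p = normal_pdf(x, mu, sigma)
        if p > 0.0:                 # guard against log(0) from underflow
            total -= p * math.log(p) * h
    return total

mu, sigma = 3.0, 2.0
numeric = entropy_by_quadrature(mu, sigma)
closed_form = 0.5 * math.log(2 * math.pi * math.e * sigma ** 2)
print(numeric, closed_form)
```

The two printed values should agree to many decimal places, and shifting `mu` should leave both unchanged.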
Multivariate Normal Distribution

Since the probability density function of the multivariate normal distribution is $p(\mathbf{x}) = \dfrac{1}{\sqrt{(2\pi)^{p} \left| \Sigma \right|}} \exp \left( -\dfrac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^{T} \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right)$, the entropy expands as follows.
\begin{align*}
H(p)
&= -\int p(\mathbf{x}) \ln(p(\mathbf{x}))d \mathbf{x} \\
&= -\int p(\mathbf{x}) \ln \left[ \dfrac{1}{\sqrt{(2\pi)^{p} \left| \Sigma \right|}} \exp \left( -\dfrac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^{T} \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right) \right] d \mathbf{x} \\
&= -\int p(\mathbf{x}) \ln\left( \dfrac{1}{\sqrt{(2\pi)^{p} \left| \Sigma \right|}} \right)d \mathbf{x} + \dfrac{1}{2}\int p(\mathbf{x}) (\mathbf{x} - \boldsymbol{\mu})^{T} \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})d \mathbf{x} \\
&= -\ln\left( \dfrac{1}{\sqrt{(2\pi)^{p} \left| \Sigma \right|}} \right)\int p(\mathbf{x}) d \mathbf{x} + \dfrac{1}{2} E \left[ (\mathbf{x} - \boldsymbol{\mu})^{T} \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right] \\
&= -\ln\left( \dfrac{1}{\sqrt{(2\pi)^{p} \left| \Sigma \right|}} \right) + \dfrac{1}{2} E \left[ (\mathbf{x} - \boldsymbol{\mu})^{T} \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right]
\end{align*}
The second term is calculated as follows.
\begin{align*}
E \left[ (\mathbf{x} - \boldsymbol{\mu})^{T} \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right]
&= E \left[ \operatorname{tr} \left( (\mathbf{x} - \boldsymbol{\mu})^{T} \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right) \right] \\
&= E \left[ \operatorname{tr} \left( \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}) (\mathbf{x} - \boldsymbol{\mu})^{T} \right) \right] \\
&= \operatorname{tr} \left[ E \left( \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}) (\mathbf{x} - \boldsymbol{\mu})^{T} \right) \right] \\
&= \operatorname{tr} \left[ \Sigma^{-1} E \left( (\mathbf{x} - \boldsymbol{\mu}) (\mathbf{x} - \boldsymbol{\mu})^{T} \right) \right] \\
&= \operatorname{tr} \left[ \Sigma^{-1} \Sigma \right] \\
&= \operatorname{tr} \left[ I_{p\times p} \right] \\
&= p
\end{align*}
Therefore, the entropy is as follows.
\begin{align*}
H(p)
&= -\ln\left( \dfrac{1}{\sqrt{(2\pi)^{p} \left| \Sigma \right|}} \right) + \dfrac{1}{2} E \left[ (\mathbf{x} - \boldsymbol{\mu})^{T} \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right] \\
&= \dfrac{1}{2} \ln \left[ (2\pi)^{p} \left| \Sigma \right| \right] + \dfrac{1}{2}p \\
&= \dfrac{1}{2} \ln \left[ (2\pi)^{p} \left| \Sigma \right| \right] + \dfrac{1}{2}\ln e^{p} \\
&= \dfrac{1}{2} \ln \left[ (2\pi e)^{p} \left| \Sigma \right| \right]
\end{align*}
■
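To make the multivariate result concrete, the formula can be evaluated for $p = 2$, where the determinant is easy to write out by hand. This is an illustrative sketch only: the function names and the example covariance matrices are assumptions chosen for the demonstration, and the code is restricted to $2 \times 2$ matrices to stay within the standard library.

```python
import math

def mvn_entropy_2d(cov):
    """Entropy in nats of a bivariate normal with 2x2 covariance matrix cov."""
    (a, b), (c, d) = cov
    det = a * d - b * c  # determinant of a 2x2 matrix
    if a <= 0 or det <= 0:
        raise ValueError("covariance must be positive definite")
    p = 2
    # (1/2) ln[(2 pi e)^p |Sigma|]
    return 0.5 * math.log((2 * math.pi * math.e) ** p * det)

def normal_entropy(var):
    """Univariate entropy in nats, (1/2) ln(2 pi e sigma^2)."""
    return 0.5 * math.log(2 * math.pi * math.e * var)

# Diagonal covariance: independent components with variances 4 and 9.
h_indep = mvn_entropy_2d([[4.0, 0.0], [0.0, 9.0]])
# Same variances, but with covariance 3 between the components.
h_corr = mvn_entropy_2d([[4.0, 3.0], [3.0, 9.0]])

print(h_indep, normal_entropy(4.0) + normal_entropy(9.0))
print(h_corr)
```

For the diagonal matrix the result equals the sum of the two univariate entropies. In this example the correlated matrix has a smaller determinant ($27$ versus $36$), so its entropy is smaller.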