
In Bayesian estimation, the Bayes estimator under mean squared error loss is the mean of the posterior distribution. 📂Mathematical Statistics

Theorem

The Bayes estimator under mean squared error is the expectation of the posterior distribution.

$$ \begin{align*} E_{\Theta}[\Theta | X] &= \argmin_{\phi} \int (\theta - \phi(x))^{2} p(\theta | x) \mathrm{d}\theta \\ &= \argmin_{\phi} E_{\Theta} \left[(\Theta - \phi(X))^2 | X \right] \end{align*} $$

Explanation

A Bayes estimator is an estimator $\phi(X)$ for the parameter $\theta$ that minimizes the following integral.

$$ \phi(X) = \argmin_{\phi} \int \mathcal{L}(\theta, \phi(x)) p(\theta | x) \mathrm{d}\theta $$

The above theorem states that when the loss function is the squared error $\mathcal{L}(\theta, \phi(x)) = (\theta - \phi(x))^{2}$, the quantity that minimizes the expected squared error is the expectation of the posterior distribution.
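This minimization can be checked numerically. The sketch below uses a hypothetical $\operatorname{Beta}(3, 5)$ posterior (not taken from the original) and grid-searches the posterior expected squared loss over candidate estimates $\phi$; the minimizer lands on the posterior mean $3/8$:

```python
import numpy as np

# Hypothetical example: take the posterior p(theta | x) to be Beta(3, 5),
# whose mean is 3 / (3 + 5) = 0.375.
a, b = 3.0, 5.0
theta = np.linspace(0.0, 1.0, 10001)
dtheta = theta[1] - theta[0]

# Unnormalized Beta(3, 5) density, normalized numerically on the grid.
dens = theta ** (a - 1) * (1 - theta) ** (b - 1)
dens /= dens.sum() * dtheta

# Posterior expected squared loss for each candidate estimate phi.
phis = np.linspace(0.0, 1.0, 1001)
risks = np.array([((theta - phi) ** 2 * dens).sum() * dtheta for phi in phis])

phi_star = phis[np.argmin(risks)]              # minimizer of the expected loss
post_mean = (theta * dens).sum() * dtheta      # posterior mean by direct integration
print(phi_star, post_mean)                     # both close to 0.375
```

Any other posterior density on the grid behaves the same way: the grid minimizer of the expected squared loss coincides with the posterior mean up to discretization error.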

Proof

Analytical proof

To find the minimizing $\phi$, differentiate the integral above with respect to $\phi(x)$, set the derivative equal to $0$, and solve for $\phi(x)$.

$$ \begin{align*} & \dfrac{\mathrm{d} }{\mathrm{d} \phi(x)} \int (\theta - \phi(x))^{2} p(\theta | x) \mathrm{d}\theta \\ &= \int 2(\theta - \phi(x)) p(\theta | x) \mathrm{d}\theta \\ &= 2 \left( \int \theta p(\theta | x) \mathrm{d}\theta - \int \phi(x) p(\theta | x) \mathrm{d}\theta \right) \\ &= 0 \end{align*} $$

$$ \implies \int \phi(x) p(\theta | x) \mathrm{d}\theta = \int \theta p(\theta | x) \mathrm{d}\theta $$

Here, $\phi(x)$ on the left-hand side factors out of the integral, and what remains is the integral of the probability density function, which equals 1. The right-hand side is the expectation of the posterior distribution. Therefore we obtain the following.

$$ \phi(x) = \int \theta p(\theta | x) \mathrm{d}\theta = E_{\Theta} [\Theta | X] $$

Thus the Bayes estimator is the expectation of the posterior distribution.

$$ E_{\Theta}[\Theta | X] = \argmin_{\phi} \int (\theta - \phi(x))^{2} p(\theta | x) \mathrm{d}\theta = \argmin_{\phi} E_{\Theta} \left[(\Theta - \phi(X))^2 | X \right] $$
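As a concrete instance (a standard conjugate example, not part of the original argument), consider a Bernoulli likelihood with a $\operatorname{Beta}(\alpha, \beta)$ prior. After observing $n$ trials with $s$ successes, the posterior is $\operatorname{Beta}(\alpha + s, \beta + n - s)$, so the Bayes estimator under squared error is its mean.

$$ \phi(X) = E_{\Theta}[\Theta | X] = \frac{\alpha + s}{\alpha + \beta + n} $$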

Algebraic proof

The expression to be minimized is as follows.

$$ E_{\Theta}[(\Theta - \phi(X))^2 | X] $$

Adding and subtracting the posterior mean $\mu = E_{\Theta}[\Theta | X]$ inside the square gives:

$$ \begin{align*} &E_{\Theta}[(\Theta - \phi(X))^2 | X] \\ &= E_{\Theta}\left[ ((\Theta - \mu) + (\mu - \phi(X)))^2 | X \right] \\ &= E_{\Theta}\left[ (\Theta - \mu)^{2} + 2(\Theta - \mu)(\mu - \phi(X)) + (\mu - \phi(X))^{2} | X \right] \\ &= E_{\Theta}\left[ (\Theta - \mu)^{2} | X \right] + 2 E_{\Theta}\left[(\Theta - \mu)(\mu - \phi(X)) | X \right] + E_{\Theta}\left[(\mu - \phi(X))^{2} | X \right] \\ \end{align*} $$

Since $\mu$ is the mean of the posterior distribution, the first term is the posterior variance. The second term can be computed as follows.

$$ \begin{align*} & 2 E_{\Theta}\left[(\Theta - \mu)(\mu - \phi(X)) | X \right] \\ &= 2 (\mu - \phi(X)) E_{\Theta}\left[ (\Theta - \mu) | X \right] \\ &= 2 (\mu - \phi(X)) \left( E_{\Theta}\left[\Theta | X \right] - E_{\Theta}\left[ \mu | X \right] \right) \\ &= 2 (\mu - \phi(X)) \left( \mu - \mu \right) \\ &= 0 \end{align*} $$

Given $X$, the third term $(\mu - \phi(X))^{2}$ is a constant, so its conditional expectation is itself. Hence we obtain:

$$ E_{\Theta}[(\Theta - \phi(X))^2 | X] = \operatorname{Var} (\Theta | X) + (\mu - \phi(X))^2 $$

The first term, the posterior variance, does not depend on $\phi$, so the Bayes estimator is the $\phi$ that makes the second, squared term $0$. Therefore $\phi = \mu$, and since $\mu$ is the expectation of the posterior distribution, we obtain the following result.

$$ E_{\Theta}[\Theta | X] = \argmin_{\phi} E_{\Theta} \left[(\Theta - \phi(X))^2 | X \right] $$
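The decomposition used in the algebraic proof can also be verified by simulation. The sketch below assumes a hypothetical normal posterior $\Theta \mid X \sim N(2, 3^{2})$ and a fixed estimate $\phi = 5$ (both made up for illustration), and checks that the posterior expected squared loss equals the posterior variance plus $(\mu - \phi)^{2}$:

```python
import numpy as np

# Hypothetical posterior: Theta | X ~ N(mu, sigma^2) with mu = 2, sigma = 3,
# and a fixed candidate estimate phi = 5.
rng = np.random.default_rng(0)
mu, sigma, phi = 2.0, 3.0, 5.0
samples = rng.normal(mu, sigma, size=1_000_000)

# Left side: Monte Carlo estimate of E[(Theta - phi)^2 | X].
lhs = np.mean((samples - phi) ** 2)
# Right side: Var(Theta | X) + (mu - phi)^2 from the decomposition.
rhs = np.var(samples) + (mu - phi) ** 2
print(lhs, rhs)  # both approximately 9 + 9 = 18
```

Moving $\phi$ closer to $\mu$ shrinks only the $(\mu - \phi)^{2}$ term, which is exactly why the minimum is attained at the posterior mean.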