
Loss Functions and Risk Functions in Mathematical Statistics 📂Mathematical Statistics


Introduction¹

Let $X$ denote the random variables of a sample of size $n$, drawn from a distribution with probability density function $p(x; \theta)$, and denote an estimator of the parameter $\theta$ by $\phi(X)$.

Since the role of an estimator is to approximate the parameter $\theta$ well, it is natural to consider a function that measures the error between them. Because it measures a difference, it is reasonable that the function takes nonnegative values. Therefore, define a function $\mathcal{L}$ that represents the discrepancy between the two as follows.

Definition

A function $\mathcal{L}$ of the parameter $\theta$ and its estimator $\phi(X)$ is called the loss function.

$$ \mathcal{L} : (\theta, \phi(X)) \mapsto \mathcal{L}(\theta, \phi(X)) \in \mathbb{R}^{+} $$

Here $\mathbb{R}^{+} = \left\{ x \in \mathbb{R} : x \geq 0 \right\}$. The expectation of the loss function is called the risk function.

$$ R(\theta, \phi(X)) = E_{X}\left[ \mathcal{L}(\theta, \phi(X)) \right] = \int \mathcal{L}(\theta, \phi(x)) p(x; \theta) \mathrm{d}x $$
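As a concrete illustration (not from the source), the risk can be approximated by Monte Carlo: simulate many samples at a fixed $\theta$, apply the estimator, and average the loss. The sketch below assumes squared-error loss and $X_{i} \sim N(\theta, 1)$; the function name `risk` and the normal model are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def risk(estimator, theta, n=10, reps=100_000):
    """Monte Carlo approximation of R(theta, phi) = E_X[L(theta, phi(X))]
    under squared-error loss, with X_1, ..., X_n ~ N(theta, 1)."""
    x = rng.normal(loc=theta, scale=1.0, size=(reps, n))
    return np.mean((theta - estimator(x)) ** 2)

# Risk of the sample mean at theta = 2; theory gives Var(mean) = 1/n = 0.1.
r = risk(lambda x: x.mean(axis=1), theta=2.0)
```

With enough replications the estimate settles near the theoretical value $1/n$, which does not depend on $\theta$ in this model.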

Explanation

Once an estimator $\phi(X)$ is chosen, the loss $\mathcal{L}$ incurred against the parameter is determined; in this context the estimator is also called a decision function and, after the initial letter of decision, is denoted by $\delta(X)$.

$$ R(\theta, \delta) = E_{X} \left[ \mathcal{L}(\theta, \delta(X)) \right] = \int \mathcal{L}(\theta, \delta(x)) p(x; \theta) \mathrm{d}x $$

Note that the definition does not pin down the form of the loss function. It can be defined in many ways, but to serve well it should reflect the gap between the parameter and the estimator and be computationally convenient, among other considerations. Given a loss function, the risk function then becomes an important criterion for comparing estimators: it is desirable to choose an estimator that minimizes $R(\theta, \phi(X))$ for every $\theta$.
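To see why minimizing the risk for *every* $\theta$ is a delicate requirement, one can compare two estimators' risks at different parameter values. The sketch below (an illustration, not from the source) uses squared-error loss, $X_{i} \sim N(\theta, 1)$, and an arbitrary shrinkage factor of 0.9:

```python
import numpy as np

rng = np.random.default_rng(1)

def mc_risk(estimator, theta, n=10, reps=200_000):
    """Monte Carlo risk under squared-error loss, X_i ~ N(theta, 1)."""
    x = rng.normal(theta, 1.0, size=(reps, n))
    return np.mean((theta - estimator(x)) ** 2)

sample_mean = lambda x: x.mean(axis=1)
shrunk_mean = lambda x: 0.9 * x.mean(axis=1)  # biased toward zero

# Theory: R(theta, sample_mean) = 1/n = 0.1 for all theta,
#         R(theta, shrunk_mean) = 0.81/n + 0.01 * theta**2.
risks = {theta: (mc_risk(sample_mean, theta), mc_risk(shrunk_mean, theta))
         for theta in (0.0, 2.0)}
```

At $\theta = 0$ the shrunken mean wins ($0.081 < 0.1$), but at $\theta = 2$ it loses ($0.121 > 0.1$), so neither estimator has uniformly smaller risk here.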

From the frequentist perspective the parameter $\theta$ is a fixed value, whereas from the Bayesian perspective the parameter $\theta$ is treated as a random variable. In that case the risk function in Bayesian inference becomes the expectation with respect to the joint distribution.

$$ R(\Theta, \phi(X)) = E_{\Theta, X} \left[ \mathcal{L}(\Theta, \phi(X)) \right] = \int \int \mathcal{L}(\theta, \phi(x)) p(\theta, x) \mathrm{d}\theta \mathrm{d}x $$
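This double expectation can likewise be approximated by simulation: draw $\theta$ from the prior, then draw the data given $\theta$, and average the loss over both. The sketch below assumes an illustrative conjugate model, $\theta \sim N(0, 1)$ and $X_{i} \mid \theta \sim N(\theta, 1)$, which is not part of the source:

```python
import numpy as np

rng = np.random.default_rng(3)

# Bayes risk: average squared-error loss over BOTH theta ~ prior
# and X | theta. Assumed model: theta ~ N(0, 1), X_i | theta ~ N(theta, 1).
reps, n = 200_000, 10
theta = rng.normal(0.0, 1.0, size=reps)              # draw from the prior
x = rng.normal(theta[:, None], 1.0, size=(reps, n))  # then the data

phi = n * x.mean(axis=1) / (n + 1)   # posterior mean under this model
bayes_risk = np.mean((theta - phi) ** 2)   # theory: 1 / (n + 1)
```

Under this conjugate model the posterior mean attains Bayes risk $1/(n+1) \approx 0.0909$, the expected posterior variance.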

Mean squared error

The following risk function is called the mean squared error.

$$ R(\theta, \phi(X)) = E_{X} \left[ \big( \theta - \phi(X) \big)^{2} \right] = \int \big( \theta - \phi(x) \big)^{2} p(x; \theta)\mathrm{d}x $$

The meaning is literal: it is the mean of the squared errors.
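A standard fact about this risk is the decomposition $E\left[ \big( \theta - \phi(X) \big)^{2} \right] = \operatorname{Var}\big( \phi(X) \big) + \big( \theta - E[\phi(X)] \big)^{2}$, i.e. variance plus squared bias. The sketch below (not from the source) checks the identity numerically with a deliberately biased, illustrative estimator:

```python
import numpy as np

rng = np.random.default_rng(2)

# Check MSE = Var(phi) + bias^2 for phi = 0.8 * sample mean,
# with X_i ~ N(theta, 1) and theta = 1.5 (illustrative choices).
theta, n, reps = 1.5, 10, 200_000
x = rng.normal(theta, 1.0, size=(reps, n))
phi = 0.8 * x.mean(axis=1)

mse = np.mean((theta - phi) ** 2)
bias = np.mean(phi) - theta     # theory: -0.2 * theta = -0.3
variance = np.var(phi)          # theory: 0.64 / n = 0.064
# The identity also holds exactly for these empirical moments.
```

The identity explains why a slightly biased estimator can still have a smaller mean squared error when it trades bias for a large reduction in variance.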

  1. Hogg et al. (2013). Introduction to Mathematical Statistics (7th Edition): p376. ↩︎