Mathematical Proof of the Properties of Representative Values
Theorem
Suppose we are given data $X = \left\{ x_{1} , \cdots , x_{n} \right\}$.
- [0]: The $\theta$ that minimizes $\displaystyle h(\theta)=\sum_{i=1}^{n} {|x_i - \theta|}^{0}$ is $$ \argmin_{\theta} h \left( \theta \right) = \text{mode}(X) $$
- [1]: The $\theta$ that minimizes $\displaystyle h(\theta)=\sum_{i=1}^{n} {|x_i - \theta|}^{1}$ is $$ \argmin_{\theta} h \left( \theta \right) = \text{median}(X) $$
- [2]: The $\theta$ that minimizes $\displaystyle h(\theta)=\sum_{i=1}^{n} {|x_i - \theta|}^{2}$ is $$ \argmin_{\theta} h \left( \theta \right) = \text{mean}(X) $$
Explanation
In the terminology of linear algebra, the theorem can be stated as follows:
- [0]: The minimizer of the $l^{0}$-norm is the mode.
- [1]: The minimizer of the $l^{1}$-norm is the median.
- [2]: The minimizer of the $l^{2}$-norm is the mean.
These theorems provide mathematical justification for why these particular values are considered representative. In particular, case [2] implies that the mean is the representative value that minimizes the sum of squared deviations, which answers the question of why variance is defined the way it is.
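Before the proofs, the three claims can be sanity-checked numerically by brute force. The following is a minimal Python sketch with a made-up data set; since for $p=0$ only values actually present in the data can lower the count, that search is restricted to the data values themselves.

```python
import statistics

X = [2, 3, 3, 5, 8, 3, 7]  # hypothetical sample data

def h(theta, p):
    # sum of |x_i - theta|^p, with 0^0 taken as 0 (matching the convention in the text)
    return sum(0 if p == 0 and x == theta else abs(x - theta) ** p for x in X)

# p = 0: only theta equal to some data point can lower the count
best0 = min(set(X), key=lambda t: h(t, 0))

# p = 1 and p = 2: brute-force search over a fine grid spanning the data
grid = [min(X) + k * (max(X) - min(X)) / 10000 for k in range(10001)]
best1 = min(grid, key=lambda t: h(t, 1))
best2 = min(grid, key=lambda t: h(t, 2))

print(best0, statistics.mode(X))               # both 3 for this sample
print(round(best1, 2), statistics.median(X))   # both 3
print(round(best2, 2), statistics.mean(X))     # both about 4.43
```

The grid search recovers the mode, median, and mean up to the grid resolution, as the theorem predicts.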
Proof
Mode
Strategy: The $l^{0}$-norm counts not how far each $x_{i}$ is from $\theta$, but how many $x_{i}$ differ from $\theta$.
$$ \left| x_{i} - \theta \right|^{0} := \begin{cases} 1 & , \theta \ne x_{i} \\ 0 & , \theta = x_{i} \end{cases} $$ Therefore $\displaystyle h(\theta)=\sum_{i=1}^{n} {|x_i - \theta|}^{0} = 1 + 0 + 1 + \cdots + 1 + 1$ simply counts how many data points differ from $\theta$. Minimizing this count is the same as maximizing the number of data points equal to $\theta$, so the minimizing $\theta$ is the most frequent value, $\text{mode}(X)$.
■
Median
Strategy: Simplify using the definition of the absolute value. Pair the largest and smallest remaining data points so that their two absolute-value terms collapse into a constant; repeating this strips away terms until only the part that actually depends on $\theta$ remains to be minimized.
Let the data be arranged in increasing order as $x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$.
Part 1. $\theta \in [x_{(1)} , x_{(n)} ]$ If $\theta < x_{(1)}$, then every $x_{(i)}$ is larger than $\theta$, so $$ h(\theta)=\sum_{i=1}^{n} {\left( x_{(i)} - \theta \right) } > \sum_{i=1}^{n} { \left( x_{(i)} - x_{(1)} \right) } = h \left( x_{(1)} \right) $$ If $ x_{(n)} < \theta$, then every $x_{(i)}$ is smaller than $\theta$, so $$ h(\theta)=\sum_{i=1}^{n} { \left( \theta - x_{(i)} \right) } > \sum_{i=1}^{n} { \left( x_{(n)} - x_{(i)} \right) } = h \left( x_{(n)} \right) $$ In either case $h$ takes a strictly smaller value at an endpoint of the data, so any minimizer must satisfy $\theta \in [x_{(1)} , x_{(n)} ]$.
Part 2.
For $\theta_{0} \in [x_{(1)} , x_{(n)} ]$ $$ \begin{align*} h(\theta_{0}) =& \sum_{i=1}^{n} | x_{(i)} - \theta_{0} | \\ =& \sum_{i=2}^{n-1} | x_{(i)} - \theta_{0} | + ( \theta_{0} - x_{(1)} ) + ( x_{(n)} - \theta_{0} ) \\ =& \sum_{i=2}^{n-1} | x_{(i)} - \theta_{0} | + ( x_{(n)} - x_{(1)} ) \end{align*} $$
For $\theta_{1} \in [x_{(2)} , x_{(n-1)} ] \subset [x_{(1)} , x_{(n)} ]$ $$ \begin{align*} h(\theta_{1}) =& \sum_{i=1}^{n} | x_{(i)} - \theta_{1} | \\ =& \sum_{i=2}^{n-1} | x_{(i)} - \theta_{1} | + ( x_{(n)} - x_{(1)} ) \\ =& \sum_{i=3}^{n-2} | x_{(i)} - \theta_{1} | + ( x_{(n-1)} - x_{(2)} ) + ( x_{(n)} - x_{(1)} ) \end{align*} $$
This way, whenever a suitable $\theta_{k} \in [x_{(1+k)} , x_{(n-k)} ]$ is chosen, $( x_{(n-k)} - x_{(1+k)} )$ can be brought outside the sigma notation. Since these terms are determined by the data $X$, they are constant terms. For convenience, let’s express their sum as follows. $$ C_{k} : = \sum_{j=0}^{k} \left( x_{(n-j)} - x_{(j+1)} \right) $$
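This peel-off step can be checked numerically: as long as $\theta$ stays inside the inner interval, the two outermost absolute-value terms really do collapse into the constant $x_{(n)} - x_{(1)}$. A small sketch with made-up data:

```python
# Verify the pairing step: for theta in [x_(2), x_(n-1)],
# |x_(1) - theta| + |x_(n) - theta| equals the constant x_(n) - x_(1)
X = sorted([1, 4, 6, 9, 15])

def h(theta):
    return sum(abs(x - theta) for x in X)

def inner(theta):
    # sum over the inner points plus the peeled-off constant
    return sum(abs(x - theta) for x in X[1:-1]) + (X[-1] - X[0])

for theta in (4, 5, 7, 9):  # all inside [x_(2), x_(n-1)] = [4, 9]
    assert h(theta) == inner(theta)
print("identity holds on the inner interval")
```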
Part 3.
Case 3-1. $ n$ is odd
- According to Part 2, for $\theta \in [x_{(1+k)} , x_{(n-k)} ]$ $$ \begin{align*} h ( \theta ) =& \sum_{i=1}^{n} | x_{(i)} - \theta | \\ =& \sum_{i=k+2}^{n-k-1} | x_{(i)} - \theta | + C_{k} \\ =& \left| x_{\left( {{n+1} \over {2}} \right)} - \theta \right| + C_{{{n-1} \over {2}} - 1} \end{align*} $$ where the last line takes $k = {{n-1} \over {2}} - 1$ so that only the middle term remains. Therefore, the value that minimizes $h( \theta )$ is $\theta = x_{\left( {{n+1} \over {2}} \right)}$.
Case 3-2. $ n$ is even
- According to Part 2, for $\theta \in [x_{(1+k)} , x_{(n-k)} ]$ $$ \begin{align*} h ( \theta ) =& \sum_{i=1}^{n} | x_{(i)} - \theta | \\ =& \sum_{i=k+2}^{n-k-1} | x_{(i)} - \theta | + C_{k} \\ =& \left| x_{\left( {{n} \over {2}} \right)} - \theta \right| + \left| x_{\left( {{n} \over {2}} + 1 \right)} - \theta \right| + C_{{{n} \over {2}} - 2} \end{align*} $$ where the last line takes $k = {{n} \over {2}} - 2$ so that only the two middle terms remain. In this case, every $\displaystyle \theta \in \left[ x_{ \left( {{n} \over {2}} \right)} , x_{ \left( {{n} \over {2}} + 1 \right)} \right]$ minimizes $h ( \theta )$.
Hence, whether $n$ is odd or even, the $\theta$ that minimizes $h ( \theta)$ is the median of $X$. (For even $n$, the conventional median ${{1} \over {2}} \left( x_{(n/2)} + x_{(n/2+1)} \right)$ lies in the minimizing interval.)
■
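Case 3-2 can be illustrated numerically: for even $n$, $h$ is constant on the whole middle interval and larger everywhere outside it. A small sketch with made-up data:

```python
# Even-n check: h(theta) = sum |x_i - theta| is flat on the middle interval
X = [1, 2, 6, 10]  # sorted, n = 4; the middle interval is [x_(2), x_(3)] = [2, 6]

def h(theta):
    return sum(abs(x - theta) for x in X)

# every theta between 2 and 6 gives the same minimal value
vals = [h(t) for t in (2, 3.5, 5, 6)]
print(vals)           # all equal to 13
print(h(1), h(7))     # strictly larger outside the interval
```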
Mean
Strategy: Here $h$ is a differentiable convex function of $\theta$, so it suffices to find where the derivative vanishes.
$$ {{ d } \over { d \theta }} \sum_{i=1}^{n} \left( x_{i} - \theta \right)^{2} = -\sum_{i=1}^{n} 2 \left( x_{i} - \theta \right) = 0 $$ Since the second derivative is $2n > 0$, the $\theta$ that satisfies the above equation minimizes $\displaystyle h(\theta)=\sum_{i=1}^{n} {|x_i - \theta|}^{2}$, thus $$ -\sum_{i=1}^{n} 2 \left( x_{i} - \theta \right) = 0 \implies \sum_{i=1}^{n} x_{i} = n \theta \implies \theta = {{ 1 } \over { n }} \sum_{i=1}^{n} x_{i} $$
■
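As a quick numerical check of the closed-form minimizer (made-up data), the squared-error sum is smallest exactly at the sample mean:

```python
X = [2.0, 4.0, 5.0, 9.0]
mean = sum(X) / len(X)  # 5.0 for this sample

def h(theta):
    # sum of squared deviations from theta
    return sum((x - theta) ** 2 for x in X)

# perturbing theta in either direction strictly increases h
for eps in (0.5, 0.1, 0.01):
    assert h(mean) < h(mean + eps) and h(mean) < h(mean - eps)
print(mean, h(mean))  # 5.0 26.0
```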