Mathematical Proof of the Properties of Representative Values
Theorem
Suppose we are given data $X = \left\{ x_{1} , \cdots , x_{n} \right\}$.
- [0]: The $\theta$ that minimizes $\displaystyle h(\theta)=\sum_{i=1}^{n} {|x_i - \theta|}^{0}$ is $$ \argmin_{\theta} h \left( \theta \right) = \text{mode}(X) $$
- [1]: The $\theta$ that minimizes $\displaystyle h(\theta)=\sum_{i=1}^{n} {|x_i - \theta|}^{1}$ is $$ \argmin_{\theta} h \left( \theta \right) = \text{median}(X) $$
- [2]: The $\theta$ that minimizes $\displaystyle h(\theta)=\sum_{i=1}^{n} {|x_i - \theta|}^{2}$ is $$ \argmin_{\theta} h \left( \theta \right) = \text{mean}(X) $$
Explanation
In the terminology of linear algebra, the theorem can be stated as follows:
- [0]: The minimizer of the $l^{0}$-norm is the mode.
- [1]: The minimizer of the $l^{1}$-norm is the median.
- [2]: The minimizer of the $l^{2}$-norm is the mean.
These theorems provide mathematical justification for why these particular values are considered representative. In particular, case [2] implies that the mean is the representative value that minimizes the sum of squared deviations, which answers the question of why variance is defined the way it is.
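Before the proofs, the three claims can be sanity-checked numerically by brute force. The following is a minimal Python sketch with a made-up data set; since for $p=0$ only values actually present in the data can lower the count, that search is restricted to the data values themselves.

```python
import statistics

X = [2, 3, 3, 5, 8, 3, 7]  # hypothetical sample data

def h(theta, p):
    # sum of |x_i - theta|^p, with 0^0 taken as 0 (matching the convention in the text)
    return sum(0 if p == 0 and x == theta else abs(x - theta) ** p for x in X)

# p = 0: only theta equal to some data point can lower the count
best0 = min(set(X), key=lambda t: h(t, 0))

# p = 1 and p = 2: brute-force search over a fine grid spanning the data
grid = [min(X) + k * (max(X) - min(X)) / 10000 for k in range(10001)]
best1 = min(grid, key=lambda t: h(t, 1))
best2 = min(grid, key=lambda t: h(t, 2))

print(best0, statistics.mode(X))               # both 3 for this sample
print(round(best1, 2), statistics.median(X))   # both 3
print(round(best2, 2), statistics.mean(X))     # both about 4.43
```

The grid search recovers the mode, median, and mean up to the grid resolution, as the theorem predicts.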
Proof
Mode
Strategy: The $l^{0}$-norm counts not how far each $x_{i}$ is from $\theta$, but how many $x_{i}$ differ from $\theta$.
$$ \left| x_{i} - \theta \right|^{0} := \begin{cases} 1 & , \theta \ne x_{i} \\ 0 & , \theta = x_{i} \end{cases} $$ Therefore $\displaystyle h(\theta)=\sum_{i=1}^{n} {|x_i - \theta|}^{0} = 1 + 0 + 1 + \cdots + 1 + 1$ simply counts how many data points differ from $\theta$. Minimizing this count is the same as maximizing the number of data points equal to $\theta$, so the minimizing $\theta$ is the most frequent value, $\text{mode}(X)$.
■
Median
Strategy: Simplify using the definition of the absolute value. Pair the largest and smallest remaining data points so that their two absolute-value terms collapse into a constant; repeating this strips away terms until only the part that actually depends on $\theta$ remains to be minimized.
Let the data be arranged in increasing order as $x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$.
Part 1. $\theta \in [x_{(1)} , x_{(n)} ]$ If $\theta < x_{(1)}$, then every $x_{(i)}$ is larger than $\theta$, so $$ h(\theta)=\sum_{i=1}^{n} {\left( x_{(i)} - \theta \right) } > \sum_{i=1}^{n} { \left( x_{(i)} - x_{(1)} \right) } = h \left( x_{(1)} \right) $$ If $ x_{(n)} < \theta$, then every $x_{(i)}$ is smaller than $\theta$, so $$ h(\theta)=\sum_{i=1}^{n} { \left( \theta - x_{(i)} \right) } > \sum_{i=1}^{n} { \left( x_{(n)} - x_{(i)} \right) } = h \left( x_{(n)} \right) $$ In either case $h$ takes a strictly smaller value at an endpoint of the data, so any minimizer must satisfy $\theta \in [x_{(1)} , x_{(n)} ]$.
Part 2.
For $\theta_{0} \in [x_{(1)} , x_{(n)} ]$ $$ \begin{align*} h(\theta_{0}) =& \sum_{i=1}^{n} | x_{(i)} - \theta_{0} | \\ =& \sum_{i=2}^{n-1} | x_{(i)} - \theta_{0} | + ( \theta_{0} - x_{(1)} ) + ( x_{(n)} - \theta_{0} ) \\ =& \sum_{i=2}^{n-1} | x_{(i)} - \theta_{0} | + ( x_{(n)} - x_{(1)} ) \end{align*} $$
For $\theta_{1} \in [x_{(2)} , x_{(n-1)} ] \subset [x_{(1)} , x_{(n)} ]$ $$ \begin{align*} h(\theta_{1}) =& \sum_{i=1}^{n} | x_{(i)} - \theta_{1} | \\ =& \sum_{i=2}^{n-1} | x_{(i)} - \theta_{1} | + ( x_{(n)} - x_{(1)} ) \\ =& \sum_{i=3}^{n-2} | x_{(i)} - \theta_{1} | + ( x_{(n-1)} - x_{(2)} ) + ( x_{(n)} - x_{(1)} ) \end{align*} $$
This way, whenever a suitable $\theta_{k} \in [x_{(1+k)} , x_{(n-k)} ]$ is chosen, $( x_{(n-k)} - x_{(1+k)} )$ can be brought outside the sigma notation. Since these terms are determined by the data $X$, they are constant terms. For convenience, let’s express their sum as follows. $$ C_{k} : = \sum_{j=0}^{k} \left( x_{(n-j)} - x_{(j+1)} \right) $$
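This peel-off step can be checked numerically: as long as $\theta$ stays inside the inner interval, the two outermost absolute-value terms really do collapse into the constant $x_{(n)} - x_{(1)}$. A small sketch with made-up data:

```python
# Verify the pairing step: for theta in [x_(2), x_(n-1)],
# |x_(1) - theta| + |x_(n) - theta| equals the constant x_(n) - x_(1)
X = sorted([1, 4, 6, 9, 15])

def h(theta):
    return sum(abs(x - theta) for x in X)

def inner(theta):
    # sum over the inner points plus the peeled-off constant
    return sum(abs(x - theta) for x in X[1:-1]) + (X[-1] - X[0])

for theta in (4, 5, 7, 9):  # all inside [x_(2), x_(n-1)] = [4, 9]
    assert h(theta) == inner(theta)
print("identity holds on the inner interval")
```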
Part 3.
Case 3-1. $ n$ is odd
- According to Part 2, for $\theta \in [x_{(1+k)} , x_{(n-k)} ]$ $$ \begin{align*} h ( \theta ) =& \sum_{i=1}^{n} | x_{(i)} - \theta | \\ =& \sum_{i=k+2}^{n-k-1} | x_{(i)} - \theta | + C_{k} \\ =& \left| x_{\left( {{n+1} \over {2}} \right)} - \theta \right| + C_{{{n-1} \over {2}} - 1} \end{align*} $$ where the last line takes $k = {{n-1} \over {2}} - 1$ so that only the middle term remains. Therefore, the value that minimizes $h( \theta )$ is $\theta = x_{\left( {{n+1} \over {2}} \right)}$.
Case 3-2. $ n$ is even
- According to Part 2, for $\theta \in [x_{(1+k)} , x_{(n-k)} ]$ $$ \begin{align*} h ( \theta ) =& \sum_{i=1}^{n} | x_{(i)} - \theta | \\ =& \sum_{i=k+2}^{n-k-1} | x_{(i)} - \theta | + C_{k} \\ =& \left| x_{\left( {{n} \over {2}} \right)} - \theta \right| + \left| x_{\left( {{n} \over {2}} + 1 \right)} - \theta \right| + C_{{{n} \over {2}} - 2} \end{align*} $$ where the last line takes $k = {{n} \over {2}} - 2$ so that only the two middle terms remain. In this case, every $\displaystyle \theta \in \left[ x_{ \left( {{n} \over {2}} \right)} , x_{ \left( {{n} \over {2}} + 1 \right)} \right]$ minimizes $h ( \theta )$.
Hence, whether $n$ is odd or even, the $\theta$ that minimizes $h ( \theta)$ is the median of $X$. (For even $n$, the conventional median ${{1} \over {2}} \left( x_{(n/2)} + x_{(n/2+1)} \right)$ lies in the minimizing interval.)
■
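Case 3-2 can be illustrated numerically: for even $n$, $h$ is constant on the whole middle interval and larger everywhere outside it. A small sketch with made-up data:

```python
# Even-n check: h(theta) = sum |x_i - theta| is flat on the middle interval
X = [1, 2, 6, 10]  # sorted, n = 4; the middle interval is [x_(2), x_(3)] = [2, 6]

def h(theta):
    return sum(abs(x - theta) for x in X)

# every theta between 2 and 6 gives the same minimal value
vals = [h(t) for t in (2, 3.5, 5, 6)]
print(vals)           # all equal to 13
print(h(1), h(7))     # strictly larger outside the interval
```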
Mean
Strategy: Here $h$ is a differentiable convex function of $\theta$, so it suffices to find where the derivative vanishes.
$$ {{ d } \over { d \theta }} \sum_{i=1}^{n} \left( x_{i} - \theta \right)^{2} = -\sum_{i=1}^{n} 2 \left( x_{i} - \theta \right) = 0 $$ Since the second derivative is $2n > 0$, the $\theta$ that satisfies the above equation minimizes $\displaystyle h(\theta)=\sum_{i=1}^{n} {|x_i - \theta|}^{2}$, thus $$ -\sum_{i=1}^{n} 2 \left( x_{i} - \theta \right) = 0 \implies \sum_{i=1}^{n} x_{i} = n \theta \implies \theta = {{ 1 } \over { n }} \sum_{i=1}^{n} x_{i} $$
■
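As a quick numerical check of the closed-form minimizer (made-up data), the squared-error sum is smallest exactly at the sample mean:

```python
X = [2.0, 4.0, 5.0, 9.0]
mean = sum(X) / len(X)  # 5.0 for this sample

def h(theta):
    # sum of squared deviations from theta
    return sum((x - theta) ** 2 for x in X)

# perturbing theta in either direction strictly increases h
for eps in (0.5, 0.1, 0.01):
    assert h(mean) < h(mean + eps) and h(mean) < h(mean - eps)
print(mean, h(mean))  # 5.0 26.0
```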