Mathematical Proof of the Properties of Representative Values

Theorem

Suppose we are given data $X = \left\{ x_{1} , \cdots , x_{n} \right\}$.

  • [0]: The $\theta$ that minimizes $\displaystyle h(\theta)=\sum_{i=1}^{n} {|x_i - \theta|}^{0}$ is $\argmin_{\theta} h \left( \theta \right) = \text{mode}(X)$.
  • [1]: The $\theta$ that minimizes $\displaystyle h(\theta)=\sum_{i=1}^{n} {|x_i - \theta|}^{1}$ is $\argmin_{\theta} h \left( \theta \right) = \text{median}(X)$.
  • [2]: The $\theta$ that minimizes $\displaystyle h(\theta)=\sum_{i=1}^{n} {|x_i - \theta|}^{2}$ is $\argmin_{\theta} h \left( \theta \right) = \text{mean}(X)$.
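As a quick sanity check of the three statements, the sketch below evaluates $h$ on a small made-up data set and compares a brute-force minimizer with Python's `statistics` module. The data, the grid, and the helper names `h` and `argmin_on_grid` are illustrative assumptions, not part of the theorem.

```python
from statistics import mean, median, mode

def h(theta, data, p):
    """Sum of |x_i - theta|^p, with the convention |0|^0 = 0 when p = 0."""
    if p == 0:
        return sum(1 for x in data if x != theta)
    return sum(abs(x - theta) ** p for x in data)

def argmin_on_grid(data, p, grid):
    """Brute-force minimizer of h over a finite grid (first hit on ties)."""
    return min(grid, key=lambda t: h(t, data, p))

# Hypothetical data with a unique mode and an odd number of points.
data = [1, 2, 2, 3, 7]
grid = [i / 10 for i in range(0, 101)]  # 0.0, 0.1, ..., 10.0

best0 = argmin_on_grid(data, 0, grid)
best1 = argmin_on_grid(data, 1, grid)
best2 = argmin_on_grid(data, 2, grid)

print(best0, mode(data))    # minimizer for p = 0 next to the mode
print(best1, median(data))  # minimizer for p = 1 next to the median
print(best2, mean(data))    # minimizer for p = 2 next to the mean
```

A grid search can only find minimizers that lie on the grid, so this is a check on one example rather than a proof.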

Explanation

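In the language of linear algebra, with the deviation vector $\left( x_{1} - \theta , \cdots , x_{n} - \theta \right)$, the same statements read as follows:

  • [0]: The $\theta$ that minimizes the $l^{0}$-norm of the deviations is the mode.
  • [1]: The $\theta$ that minimizes the $l^{1}$-norm of the deviations is the median.
  • [2]: The $\theta$ that minimizes the $l^{2}$-norm of the deviations is the mean.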

These theorems give a mathematical justification for why these particular values are regarded as representative. In particular, case [2] says that the mean is the representative value that minimizes the sum of squared deviations, and that minimum divided by $n$ is exactly the variance, which offers one answer to the question ‘why is variance defined this way’.
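The link between case [2] and the variance can be checked numerically. The following is a minimal sketch on a hypothetical data set, using exact fractions so that the comparison does not depend on rounding; `pvariance` is the population variance from Python's `statistics` module.

```python
from fractions import Fraction
from statistics import mean, pvariance

# Hypothetical data, stored as exact fractions.
data = [Fraction(v) for v in (2, 5, 5, 10)]
n = len(data)

xbar = mean(data)                            # sample mean
min_h2 = sum((x - xbar) ** 2 for x in data)  # h(theta) for p = 2 at theta = mean

# The minimum of h divided by n coincides with the population variance.
print(min_h2 / n, pvariance(data))
```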

Proof

Mode

Strategy: The $l^{0}$-norm does not measure how far apart two values are, only whether they differ, so it counts the number of mismatches.


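$$ \left| x_{i} - \theta \right|^{0} := \begin{cases} 1 & , \theta \ne x_{i} \\ 0 & , \theta = x_{i} \end{cases} $$
With this convention, $\displaystyle h(\theta)=\sum_{i=1}^{n} {|x_i - \theta|}^{0} = 1 + 0 + 1 + \cdots + 1 + 1$ is the number of data points that differ from $\theta$, that is, $n$ minus the number of $x_{i}$ equal to $\theta$. It is smallest when $\theta$ is the most frequent value, so the $\theta$ that minimizes $h$ is $\text{mode}(X)$.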
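The counting argument can be illustrated with a short sketch. The sample and the helper name `h0` are assumptions made for the example; `collections.Counter` is used only to tally frequencies.

```python
from collections import Counter

data = [4, 1, 4, 2, 4, 2]  # hypothetical sample
n = len(data)
counts = Counter(data)

def h0(theta):
    """Number of data points different from theta (the p = 0 case)."""
    return sum(1 for x in data if x != theta)

# For each distinct value, h0(theta) equals n minus its frequency.
values = {theta: h0(theta) for theta in counts}
identity_holds = all(values[t] == n - counts[t] for t in counts)

best = min(values, key=values.get)              # minimizer of h0
most_common_value = counts.most_common(1)[0][0]  # the mode
print(values, best, most_common_value)
```

Any $\theta$ that does not occur in the data gives `h0(theta) == n`, the largest possible value.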

Median

Strategy: Start by expanding the absolute values according to their definition. Pair the smallest and largest data points: as long as $\theta$ lies between them, their two terms add up to a constant and $\theta$ drops out. Repeating this leaves only the middle term or terms, which are easy to minimize.


Denote the data sorted in ascending order by $x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$.

Part 1. $\theta \in [x_{(1)} , x_{(n)} ]$

Assuming $\theta < x_{(1)}$, all $x_{(i)}$ are larger than $\theta$, so
$$ h(\theta)=\sum_{i=1}^{n} {\left( x_{(i)} - \theta \right) } > \sum_{i=1}^{n} { \left( x_{(i)} - x_{(1)} \right) } = h \left( x_{(1)} \right) $$
Assuming $x_{(n)} < \theta$, all $x_{(i)}$ are smaller than $\theta$, so
$$ h(\theta)=\sum_{i=1}^{n} { \left( \theta - x_{(i)} \right) } > \sum_{i=1}^{n} { \left( x_{(n)} - x_{(i)} \right) } = h \left( x_{(n)} \right) $$
In both cases $\theta$ is beaten by an endpoint, so any minimizer must satisfy $\theta \in [x_{(1)} , x_{(n)} ]$.


Part 2.

For $\theta_{0} \in [x_{(1)} , x_{(n)} ]$,
$$ \begin{align*} h(\theta_{0}) =& \sum_{i=1}^{n} | x_{(i)} - \theta_{0} | \\ =& \sum_{i=2}^{n-1} | x_{(i)} - \theta_{0} | + ( \theta_{0} - x_{(1)} ) + ( x_{(n)} - \theta_{0} ) \\ =& \sum_{i=2}^{n-1} | x_{(i)} - \theta_{0} | + ( x_{(n)} - x_{(1)} ) \end{align*} $$

For $\theta_{1} \in [x_{(2)} , x_{(n-1)} ] \subset [x_{(1)} , x_{(n)} ]$,
$$ \begin{align*} h(\theta_{1}) =& \sum_{i=1}^{n} | x_{(i)} - \theta_{1} | \\ =& \sum_{i=2}^{n-1} | x_{(i)} - \theta_{1} | + ( x_{(n)} - x_{(1)} ) \\ =& \sum_{i=3}^{n-2} | x_{(i)} - \theta_{1} | + ( x_{(n-1)} - x_{(2)} ) + ( x_{(n)} - x_{(1)} ) \end{align*} $$

In this way, whenever $\theta_{k} \in [x_{(1+k)} , x_{(n-k)} ]$, the term $( x_{(n-k)} - x_{(1+k)} )$ can be pulled outside the sum. These terms are determined by the data $X$ alone, so they are constants. For convenience, write their sum as
$$ C_{k} := \sum_{j=0}^{k} \left( x_{(n-j)} - x_{(j+1)} \right) $$
so that $\displaystyle h(\theta_{k}) = \sum_{i=2+k}^{n-k-1} | x_{(i)} - \theta_{k} | + C_{k}$.
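The pairing step in Part 2 can be verified on a concrete case. The sketch below assumes a hypothetical data set with $n = 6$, takes $k = 1$, and compares $h(\theta)$ with the remaining inner sum plus $C_{k}$; the function names `h1` and `C` are chosen for this example only.

```python
def h1(theta, xs):
    """Sum of absolute deviations from theta."""
    return sum(abs(x - theta) for x in xs)

def C(k, xs):
    """C_k = sum_{j=0}^{k} (x_(n-j) - x_(j+1)) for sorted xs (1-based order statistics)."""
    n = len(xs)
    return sum(xs[n - j - 1] - xs[j] for j in range(k + 1))

xs = sorted([9, 1, 4, 6, 3, 12])  # hypothetical data: [1, 3, 4, 6, 9, 12]
n = len(xs)
k = 1
theta = 5  # any point of [x_(1+k), x_(n-k)] = [3, 9] would do

# Terms that remain inside the sum: i = 2+k, ..., n-k-1 (1-based indices).
inner = sum(abs(xs[i - 1] - theta) for i in range(2 + k, n - k))

print(h1(theta, xs), inner + C(k, xs))  # the two sides of the identity
```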


Part 3.

Case 3-1. $n$ is odd

  • According to Part 2, for $\theta \in [x_{(1+k)} , x_{(n-k)} ]$ and in particular for $k = {{n-1} \over {2}} - 1$, $$ \begin{align*} h ( \theta ) =& \sum_{i=1}^{n} | x_{(i)} - \theta | \\ =& \sum_{i=2+k}^{n-k-1} | x_{(i)} - \theta | + C_{k} \\ =& \left| x_{\left( {{n+1} \over {2}} \right)} - \theta \right| + C_{{{n-1} \over {2}} - 1} \end{align*} $$ Therefore, the value that minimizes $h( \theta )$ is $\theta = x_{\left( {{n+1} \over {2}} \right)}$.

Case 3-2. $n$ is even

  • According to Part 2, for $\theta \in [x_{(1+k)} , x_{(n-k)} ]$ and in particular for $k = {{n} \over {2}} - 2$, $$ \begin{align*} h ( \theta ) =& \sum_{i=1}^{n} | x_{(i)} - \theta | \\ =& \sum_{i=2+k}^{n-k-1} | x_{(i)} - \theta | + C_{k} \\ =& \left| x_{\left( {{n} \over {2}} \right)} - \theta \right| + \left| x_{\left( {{n} \over {2}} + 1 \right)} - \theta \right| + C_{{{n} \over {2}} - 2} \end{align*} $$ Between the two middle points, the two absolute values add up to the constant $x_{\left( {{n} \over {2}} + 1 \right)} - x_{\left( {{n} \over {2}} \right)}$, and outside that interval their sum is larger. In this case, every $\displaystyle \theta \in \left[ x_{ \left( {{n} \over {2}} \right)} , x_{ \left( {{n} \over {2}} + 1 \right)} \right]$ minimizes $h ( \theta )$.

In either case, whether $n$ is even or odd, the $\theta$ that minimizes $h ( \theta)$ is a median of $X$.
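The difference between the odd and even cases shows up clearly in a small experiment. This is a minimal sketch with two hypothetical data sets and a half-step candidate grid; `minimizers` is an illustrative helper that collects every grid point attaining the smallest value of $h$.

```python
from statistics import median

def h1(theta, xs):
    """Sum of absolute deviations from theta."""
    return sum(abs(x - theta) for x in xs)

odd = [1, 3, 4, 9, 12]      # n = 5, middle value x_(3) = 4
even = [1, 3, 4, 6, 9, 12]  # n = 6, middle pair x_(3) = 4 and x_(4) = 6

candidates = [i / 2 for i in range(0, 29)]  # 0.0, 0.5, ..., 14.0

def minimizers(xs):
    """All grid points where h1 attains its minimum over the grid."""
    best = min(h1(t, xs) for t in candidates)
    return [t for t in candidates if h1(t, xs) == best]

odd_min = minimizers(odd)    # a single point when n is odd
even_min = minimizers(even)  # a whole interval when n is even
print(odd_min, median(odd))
print(even_min, median(even))
```

For even $n$, `statistics.median` returns the midpoint of the two middle values, which is one of the many minimizers.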

Mean

Strategy: It can be easily derived through differentiation.


$$ {{ d } \over { d \theta }} \sum_{i=1}^{n} \left( x_{i} - \theta \right)^{2} = - \sum_{i=1}^{n} 2 \left( x_{i} - \theta \right) = 0 $$
Since the second derivative $2n$ is positive, the $\theta$ that satisfies the above equation minimizes $\displaystyle h(\theta)=\sum_{i=1}^{n} {|x_i - \theta|}^{2}$, thus
$$ \sum_{i=1}^{n} 2 \left( x_{i} - \theta \right) = 0 \implies \sum_{i=1}^{n} x_{i} = n \theta \implies \theta = {{ 1 } \over { n }} \sum_{i=1}^{n} x_{i} $$
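The derivative condition can be checked exactly with rational arithmetic. The sketch below uses a hypothetical data set; `h2` and `dh2` are illustrative names for $h$ and its derivative in the squared case. It also checks the identity $h(\theta) - h(\bar{x}) = n (\theta - \bar{x})^{2}$, which follows from expanding the squares and is another way to see that the mean is the unique minimizer.

```python
from fractions import Fraction

data = [1, 4, 4, 7, 9]  # hypothetical data
n = len(data)
xbar = Fraction(sum(data), n)  # the mean, kept exact

def h2(theta):
    """Sum of squared deviations from theta."""
    return sum((x - theta) ** 2 for x in data)

def dh2(theta):
    """Derivative of h2 with respect to theta: -2 * sum(x_i - theta)."""
    return -2 * sum(x - theta for x in data)

# The derivative vanishes at the mean.
print(xbar, dh2(xbar))

# Every other theta is worse by exactly n * (theta - xbar)^2.
gaps = {t: h2(t) - h2(xbar) for t in (0, 3, Fraction(11, 2), 8)}
print(gaps)
```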
