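Let $X = \{x_1, \dots, x_n\}$ be the data and let $\theta$ be a candidate representative value.
[0]: The $\theta$ that minimizes the $l_0$-norm loss $\sum_{i=1}^{n} |x_i - \theta|_0$ is the mode of $X$.
[1]: The $\theta$ that minimizes the $l_1$-norm loss $\sum_{i=1}^{n} |x_i - \theta|$ is the median of $X$.
[2]: The $\theta$ that minimizes the $l_2$-norm loss $\sum_{i=1}^{n} |x_i - \theta|^2$ is the mean of $X$.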
These theorems give a mathematical justification for why each of these values is considered representative. In particular, [2] says that the mean is the value that minimizes the sum of squared deviations, and that minimum divided by $n$ is exactly the variance, which helps answer the question "why is variance defined this way?".
Proof
Mode
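Strategy: The $l_0$-norm does not measure how far apart two values are; it only counts how many of them are unequal.
$$
|x_i - \theta|_0 := \begin{cases} 1, & \theta \ne x_i \\ 0, & \theta = x_i \end{cases}
$$
Therefore $h(\theta) = \sum_{i=1}^{n} |x_i - \theta|_0 = 1 + 0 + 1 + \cdots + 1 + 1$ equals $n$ minus the number of $x_i$ that coincide with $\theta$, so the $\theta$ that minimizes $h$ is the most frequent value, $\operatorname{mode}(X)$.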
■
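The counting argument above can be checked numerically. The following is a minimal sketch, assuming a small made-up sample; the helper name `l0_loss` and the data are illustrative and not part of the original proof.

```python
from collections import Counter


def l0_loss(data, theta):
    """h(theta): the number of observations that differ from theta."""
    return sum(1 for x in data if x != theta)


data = [3, 1, 3, 2, 3, 2, 5]  # illustrative sample (assumption)

# Any theta that does not occur in the data scores n, so it is enough
# to compare the values that actually occur.
losses = {theta: l0_loss(data, theta) for theta in set(data)}
best_theta = min(losses, key=losses.get)
mode_value = Counter(data).most_common(1)[0][0]

print(best_theta, mode_value)
```

For this sample the minimizer of the loss and the most frequent value coincide, as the proof predicts. If several values tie for the highest frequency, every one of them is a minimizer.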
Median
Strategy: Start by simplifying according to the definition of the absolute value. Pair the largest and smallest remaining data points so that $\theta$ cancels and each pair reduces to a constant term. What is left at the end is the only term that still depends on $\theta$, and it is easy to minimize.
Denote the sorted data by $x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$.
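Part 1. $\theta \in [x_{(1)}, x_{(n)}]$
Let $h(\theta) = \sum_{i=1}^{n} |x_{(i)} - \theta|$. Assuming $\theta < x_{(1)}$, all $x_{(i)}$ are larger than $\theta$, so
$$
h(\theta) = \sum_{i=1}^{n} \left( x_{(i)} - \theta \right) > \sum_{i=1}^{n} \left( x_{(i)} - x_{(1)} \right) = h\left( x_{(1)} \right)
$$
Assuming $x_{(n)} < \theta$, all $x_{(i)}$ are smaller than $\theta$, so
$$
h(\theta) = \sum_{i=1}^{n} \left( \theta - x_{(i)} \right) > \sum_{i=1}^{n} \left( x_{(n)} - x_{(i)} \right) = h\left( x_{(n)} \right)
$$
Therefore, whatever the minimizing $\theta$ turns out to be, it must satisfy $\theta \in [x_{(1)}, x_{(n)}]$.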
Part 2.
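For $\theta_0 \in [x_{(1)}, x_{(n)}]$,
$$
\begin{aligned}
h(\theta_0) =& \sum_{i=1}^{n} |x_{(i)} - \theta_0| \\
=& \sum_{i=2}^{n-1} |x_{(i)} - \theta_0| + \left( \theta_0 - x_{(1)} \right) + \left( x_{(n)} - \theta_0 \right) \\
=& \sum_{i=2}^{n-1} |x_{(i)} - \theta_0| + \left( x_{(n)} - x_{(1)} \right)
\end{aligned}
$$
For $\theta_1 \in [x_{(2)}, x_{(n-1)}] \subset [x_{(1)}, x_{(n)}]$,
$$
\begin{aligned}
h(\theta_1) =& \sum_{i=1}^{n} |x_{(i)} - \theta_1| \\
=& \sum_{i=2}^{n-1} |x_{(i)} - \theta_1| + \left( x_{(n)} - x_{(1)} \right) \\
=& \sum_{i=3}^{n-2} |x_{(i)} - \theta_1| + \left( x_{(n-1)} - x_{(2)} \right) + \left( x_{(n)} - x_{(1)} \right)
\end{aligned}
$$
In this way, whenever a suitable $\theta_k \in [x_{(1+k)}, x_{(n-k)}]$ is chosen, $\left( x_{(n-k)} - x_{(1+k)} \right)$ can be brought outside the summation. Since these terms are determined by the data $X$ alone, they are constants. For convenience, write their sum as follows.
$$
C_k := \sum_{j=0}^{k} \left( x_{(n-j)} - x_{(j+1)} \right)
$$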
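The peeling step of Part 2 can be verified on a concrete sample. This is a minimal sketch under stated assumptions: the helper names `l1_loss` and `peeled_constant`, the data, and the choices of `k` and `theta` are all illustrative. After the pairs $j = 0, \dots, k$ are removed, the terms that remain are $x_{(k+2)}, \dots, x_{(n-k-1)}$.

```python
def l1_loss(data, theta):
    """h(theta): the sum of absolute deviations from theta."""
    return sum(abs(x - theta) for x in data)


def peeled_constant(sorted_data, k):
    """C_k = sum_{j=0}^{k} (x_(n-j) - x_(j+1)), written with 0-based indices."""
    n = len(sorted_data)
    return sum(sorted_data[n - 1 - j] - sorted_data[j] for j in range(k + 1))


xs = sorted([7, 1, 4, 10, 2, 8, 5])   # [1, 2, 4, 5, 7, 8, 10]
k = 1
theta = 4.5                            # lies in [x_(k+1), x_(n-k)] = [2, 8]

inner = xs[k + 1 : len(xs) - k - 1]    # x_(k+2), ..., x_(n-k-1) = [4, 5, 7]
lhs = l1_loss(xs, theta)
rhs = l1_loss(inner, theta) + peeled_constant(xs, k)

print(lhs, rhs)
```

Both sides agree as long as `theta` stays inside the interval spanned by the pairs that were peeled off; outside that interval the paired terms no longer cancel.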
Part 3.
Case 3-1. n is odd
According to Part 2.
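$$
\begin{aligned}
h(\theta) =& \sum_{i=1}^{n} |x_{(i)} - \theta| \\
=& \sum_{i=k+2}^{n-k-1} |x_{(i)} - \theta| + C_k \\
=& \left| x_{\left( \frac{n+1}{2} \right)} - \theta \right| + C_{\frac{n-1}{2} - 1}
\end{aligned}
$$
where the last line takes $k = \frac{n-1}{2} - 1$, so that only the middle term remains. Therefore, the value that minimizes $h(\theta)$ is $\theta = x_{\left( \frac{n+1}{2} \right)}$.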
Case 3-2. n is even
According to Part 2.
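$$
\begin{aligned}
h(\theta) =& \sum_{i=1}^{n} |x_{(i)} - \theta| \\
=& \sum_{i=k+2}^{n-k-1} |x_{(i)} - \theta| + C_k \\
=& \left| x_{\left( \frac{n}{2} \right)} - \theta \right| + \left| x_{\left( \frac{n}{2} + 1 \right)} - \theta \right| + C_{\frac{n}{2} - 2}
\end{aligned}
$$
where the last line takes $k = \frac{n}{2} - 2$. For any $\theta \in \left[ x_{\left( \frac{n}{2} \right)}, x_{\left( \frac{n}{2} + 1 \right)} \right]$ the two remaining terms add up to the constant $x_{\left( \frac{n}{2} + 1 \right)} - x_{\left( \frac{n}{2} \right)}$, so every $\theta$ in this interval minimizes $h(\theta)$.
In conclusion, whether $n$ is even or odd, the $\theta$ that minimizes $h(\theta)$ is the median of $X$.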
■
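Both cases of Part 3 can be checked with a simple grid search. The sketch below assumes two small made-up samples and a grid with step 0.1; `l1_loss`, the data, and the grid are illustrative choices, not part of the original proof.

```python
import statistics


def l1_loss(data, theta):
    """h(theta): the sum of absolute deviations from theta."""
    return sum(abs(x - theta) for x in data)


odd = [7, 1, 4, 10, 2]        # sorted: 1, 2, 4, 7, 10
even = [7, 1, 4, 10, 2, 8]    # sorted: 1, 2, 4, 7, 8, 10

grid = [i / 10 for i in range(0, 111)]  # 0.0, 0.1, ..., 11.0

# Odd n: a single minimizer, the middle order statistic.
odd_best = min(grid, key=lambda t: l1_loss(odd, t))

# Even n: every grid point between the two middle order statistics ties.
even_min = min(l1_loss(even, t) for t in grid)
even_minimizers = [t for t in grid if abs(l1_loss(even, t) - even_min) < 1e-9]

print(odd_best, statistics.median(odd))
print(min(even_minimizers), max(even_minimizers), statistics.median(even))
```

For the even sample the loss is flat between the two middle data points, so the conventional median (their midpoint) is one minimizer among many.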
Mean
Strategy: It can be easily derived through differentiation.
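$$
\frac{d}{d\theta} \sum_{i=1}^{n} \left( x_i - \theta \right)^2 = -2 \sum_{i=1}^{n} \left( x_i - \theta \right) = 0
$$
Since $h(\theta) = \sum_{i=1}^{n} |x_i - \theta|^2$ is a convex quadratic in $\theta$ (its second derivative is $2n > 0$), the $\theta$ that satisfies the above equation minimizes $h$, thus
$$
\sum_{i=1}^{n} \left( x_i - \theta \right) = 0 \implies \sum_{i=1}^{n} x_i = n\theta \implies \theta = \frac{1}{n} \sum_{i=1}^{n} x_i
$$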
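The first-order condition can also be confirmed numerically. The following is a minimal sketch, assuming a small made-up sample; `l2_loss` and `l2_loss_derivative` are illustrative helper names. It also shows that the minimum loss divided by $n$ is the population variance.

```python
import statistics


def l2_loss(data, theta):
    """h(theta): the sum of squared deviations from theta."""
    return sum((x - theta) ** 2 for x in data)


def l2_loss_derivative(data, theta):
    """dh/dtheta = -2 * sum(x_i - theta)."""
    return -2 * sum(x - theta for x in data)


data = [2.0, 4.0, 4.0, 5.0, 10.0]   # illustrative sample (assumption)

mean_value = statistics.mean(data)
grad_at_mean = l2_loss_derivative(data, mean_value)
loss_at_mean = l2_loss(data, mean_value)
variance = loss_at_mean / len(data)  # population variance

print(mean_value, grad_at_mean, loss_at_mean, variance)
```

The derivative vanishes at the mean, and moving `theta` slightly in either direction increases the loss.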