Pythagorean Winning Percentage Derivation
Consider a team in some sports league. Assume that the team's score $S$ and the score it allows, $A$, are random variables that each follow a Weibull distribution,
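$$
\begin{aligned}
S &\sim \text{Weibull}(\alpha_S, \beta, \gamma) \\
A &\sim \text{Weibull}(\alpha_A, \beta, \gamma)
\end{aligned}
$$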
and that they are independent of each other. Then, for the shape parameter $\gamma > 0$, the team's expected winning percentage $p$ is
$$
p = \frac{\mu_S^{\gamma}}{\mu_S^{\gamma} + \mu_A^{\gamma}}
$$
Here, $\mu_S := E(S)$ and $\mu_A := E(A)$ denote the expected score and the expected score allowed, respectively.
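A minimal Python sketch of the formula follows. The function name `pythagorean_expectation` and the example inputs (5.0 points scored and 4.0 allowed per game, $\gamma = 2$) are illustrative assumptions, not part of the derivation.

```python
def pythagorean_expectation(mu_s: float, mu_a: float, gamma: float) -> float:
    """Expected winning percentage p = mu_s^gamma / (mu_s^gamma + mu_a^gamma)."""
    if mu_s <= 0 or mu_a <= 0 or gamma <= 0:
        raise ValueError("expected mu_s > 0, mu_a > 0 and gamma > 0")
    scored = mu_s ** gamma   # numerator term
    allowed = mu_a ** gamma  # opponent term in the denominator
    return scored / (scored + allowed)


# Hypothetical team: 5.0 points scored and 4.0 allowed per game, gamma = 2.
print(pythagorean_expectation(5.0, 4.0, 2.0))  # 25/41, roughly 0.61
```

Equal means always give exactly 0.5, whatever the value of $\gamma$.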
Derivation
Strategy: This is a statistical derivation of the Pythagorean winning percentage, obtained directly by integrating the joint probability density function of $S$ and $A$. Here $\Gamma$ denotes the gamma function.
Mean and variance of the Weibull distribution: The three-parameter Weibull distribution with scale parameter $\alpha > 0$, location parameter $\beta \in \mathbb{R}$, and shape parameter $\gamma > 0$ has the probability density function
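$$
f(x) = \frac{\gamma}{\alpha} \left( \frac{x - \beta}{\alpha} \right)^{\gamma - 1} e^{-\left( (x - \beta)/\alpha \right)^{\gamma}}, \quad x \ge \beta
$$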
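To make the density concrete, here is a sketch in plain Python using only the standard library. The helper names `weibull3_pdf` and `midpoint_integral` are hypothetical, and the check that the density integrates to roughly 1 uses a simple midpoint rule chosen for illustration, with $\gamma \ge 1$ so the integrand stays bounded near $x = \beta$.

```python
import math


def weibull3_pdf(x: float, alpha: float, beta: float, gamma: float) -> float:
    """Three-parameter Weibull density with scale alpha, location beta, shape gamma."""
    if x < beta:
        return 0.0  # no probability mass below the location parameter
    z = (x - beta) / alpha
    if z == 0.0 and gamma < 1:
        return math.inf  # the density diverges at x = beta when gamma < 1
    return (gamma / alpha) * z ** (gamma - 1) * math.exp(-(z ** gamma))


def midpoint_integral(f, a: float, b: float, n: int = 20_000) -> float:
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


# Sanity check: the density should integrate to about 1 over [beta, beta + 20 * alpha].
alpha, beta, gamma = 2.0, 1.0, 1.8  # hypothetical parameters
total = midpoint_integral(lambda x: weibull3_pdf(x, alpha, beta, gamma), beta, beta + 20 * alpha)
print(total)
```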
When $X \sim \text{Weibull}(\alpha, \beta, \gamma)$, its mean and variance are as follows.
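$$
\begin{aligned}
E(X) &= \alpha \Gamma\left(1 + \frac{1}{\gamma}\right) + \beta \\
\operatorname{Var}(X) &= \alpha^{2} \left[ \Gamma\left(1 + \frac{2}{\gamma}\right) - \left( \Gamma\left(1 + \frac{1}{\gamma}\right) \right)^{2} \right]
\end{aligned}
$$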
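The mean and variance formulas can be checked against simulation. This sketch relies on Python's standard `random.weibullvariate(alpha, beta)`, which takes the scale first and the shape second and has no location argument. Its `beta` is therefore the shape (the $\gamma$ of this article), not the location $\beta$, so the location is added by hand. Function names, the seed, and parameter values are illustrative assumptions.

```python
import math
import random


def weibull3_mean(alpha: float, beta: float, gamma: float) -> float:
    """E(X) = alpha * Gamma(1 + 1/gamma) + beta."""
    return alpha * math.gamma(1 + 1 / gamma) + beta


def weibull3_var(alpha: float, gamma: float) -> float:
    """Var(X) = alpha^2 * [Gamma(1 + 2/gamma) - Gamma(1 + 1/gamma)^2]; beta does not enter."""
    g1 = math.gamma(1 + 1 / gamma)
    return alpha ** 2 * (math.gamma(1 + 2 / gamma) - g1 ** 2)


# Monte Carlo comparison. Python's weibullvariate(scale, shape) has no location,
# so the location beta is added to every draw.
rng = random.Random(0)
alpha, beta, gamma = 2.0, 1.0, 1.5  # hypothetical parameters
samples = [rng.weibullvariate(alpha, gamma) + beta for _ in range(200_000)]
mean_hat = sum(samples) / len(samples)
var_hat = sum((s - mean_hat) ** 2 for s in samples) / (len(samples) - 1)
print(mean_hat, weibull3_mean(alpha, beta, gamma))
print(var_hat, weibull3_var(alpha, gamma))
```

For $\gamma = 1$ the formulas reduce to the shifted exponential case, with mean $\alpha + \beta$ and variance $\alpha^2$.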
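$$
\begin{aligned}
\mu_S = E(S) &= \alpha_S \Gamma\left(1 + \gamma^{-1}\right) + \beta \\
\mu_A = E(A) &= \alpha_A \Gamma\left(1 + \gamma^{-1}\right) + \beta
\end{aligned}
$$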
Since $\mu_S$ and $\mu_A$ are the population means of $S$ and $A$, solving these equations for the scale parameters $\alpha_S$ and $\alpha_A$ gives
$$
\begin{aligned}
\alpha_S &= \frac{\mu_S - \beta}{\Gamma\left(1 + \gamma^{-1}\right)} \\
\alpha_A &= \frac{\mu_A - \beta}{\Gamma\left(1 + \gamma^{-1}\right)}
\end{aligned}
$$
To simplify the derivation, define $\alpha$ as follows.
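$$
\frac{1}{\alpha^{\gamma}} = \frac{1}{\alpha_S^{\gamma}} + \frac{1}{\alpha_A^{\gamma}} = \frac{\alpha_S^{\gamma} + \alpha_A^{\gamma}}{\alpha_S^{\gamma} \alpha_A^{\gamma}}
$$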
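As a hedged sketch, the two steps above (recovering each scale parameter from its mean, then combining the two into $\alpha$) look like this in Python. `scale_from_mean`, `combined_scale`, and the sample means are hypothetical names and values.

```python
import math


def scale_from_mean(mu: float, beta: float, gamma: float) -> float:
    """Invert mu = alpha * Gamma(1 + 1/gamma) + beta for the scale alpha."""
    return (mu - beta) / math.gamma(1 + 1 / gamma)


def combined_scale(alpha_s: float, alpha_a: float, gamma: float) -> float:
    """alpha defined by 1/alpha^gamma = 1/alpha_s^gamma + 1/alpha_a^gamma."""
    a_s, a_a = alpha_s ** gamma, alpha_a ** gamma
    return (a_s * a_a / (a_s + a_a)) ** (1 / gamma)


# Hypothetical inputs: mean score 5.0, mean score allowed 4.0, beta = 0, gamma = 1.8.
gamma, beta = 1.8, 0.0
alpha_s = scale_from_mean(5.0, beta, gamma)
alpha_a = scale_from_mean(4.0, beta, gamma)
alpha = combined_scale(alpha_s, alpha_a, gamma)
# Both sides of the defining identity should agree up to rounding error.
print(1 / alpha ** gamma, 1 / alpha_s ** gamma + 1 / alpha_a ** gamma)
```

Because $1/\alpha^\gamma$ exceeds both $1/\alpha_S^\gamma$ and $1/\alpha_A^\gamma$, the combined $\alpha$ is always smaller than either scale.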
Now we compute the expected winning percentage. In most sports a team wins when it scores more than it allows, that is, on the event $S > A$, so the expected winning percentage is $P(S > A)$. Let $f_S$ and $f_A$ be the probability density functions of $S$ and $A$. Since $S$ and $A$ are independent, their joint probability density function is the product $f_S(x) f_A(y)$.
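$$
\begin{aligned}
P(S > A) &= \int_{\beta}^{\infty} \int_{\beta}^{x} f_S(x) f_A(y) \, dy \, dx \\
&= \int_{\beta}^{\infty} \int_{\beta}^{x} \frac{\gamma}{\alpha_S} \left( \frac{x - \beta}{\alpha_S} \right)^{\gamma - 1} e^{-((x - \beta)/\alpha_S)^{\gamma}} \frac{\gamma}{\alpha_A} \left( \frac{y - \beta}{\alpha_A} \right)^{\gamma - 1} e^{-((y - \beta)/\alpha_A)^{\gamma}} \, dy \, dx \\
&= \int_{0}^{\infty} \int_{0}^{x} \frac{\gamma}{\alpha_S} \left( \frac{x}{\alpha_S} \right)^{\gamma - 1} e^{-(x/\alpha_S)^{\gamma}} \frac{\gamma}{\alpha_A} \left( \frac{y}{\alpha_A} \right)^{\gamma - 1} e^{-(y/\alpha_A)^{\gamma}} \, dy \, dx \\
&= \int_{0}^{\infty} \frac{\gamma}{\alpha_S} \left( \frac{x}{\alpha_S} \right)^{\gamma - 1} e^{-(x/\alpha_S)^{\gamma}} \left[ \int_{0}^{x} \frac{\gamma}{\alpha_A} \left( \frac{y}{\alpha_A} \right)^{\gamma - 1} e^{-(y/\alpha_A)^{\gamma}} \, dy \right] dx \\
&= \int_{0}^{\infty} \frac{\gamma}{\alpha_S} \left( \frac{x}{\alpha_S} \right)^{\gamma - 1} e^{-(x/\alpha_S)^{\gamma}} \left[ 1 - e^{-(x/\alpha_A)^{\gamma}} \right] dx \\
&= 1 + \int_{0}^{\infty} \frac{\gamma}{\alpha_S} \left( \frac{x}{\alpha_S} \right)^{\gamma - 1} e^{-(x/\alpha_S)^{\gamma}} \left[ - e^{-(x/\alpha_A)^{\gamma}} \right] dx \\
&= 1 - \int_{0}^{\infty} \frac{\gamma}{\alpha_S} \left( \frac{x}{\alpha_S} \right)^{\gamma - 1} \exp\left( - x^{\gamma} \left( \frac{1}{\alpha_S^{\gamma}} + \frac{1}{\alpha_A^{\gamma}} \right) \right) dx \\
&= 1 - \int_{0}^{\infty} \frac{\gamma}{\alpha_S} \left( \frac{x}{\alpha_S} \right)^{\gamma - 1} \exp\left( - \left( \frac{x}{\alpha} \right)^{\gamma} \right) dx \\
&= 1 - \frac{\alpha^{\gamma}}{\alpha_S^{\gamma}} \int_{0}^{\infty} \frac{\gamma}{\alpha} \left( \frac{x}{\alpha} \right)^{\gamma - 1} e^{-(x/\alpha)^{\gamma}} \, dx \\
&= 1 - \frac{\alpha^{\gamma}}{\alpha_S^{\gamma}} \cdot 1 \\
&= 1 - \frac{1}{\alpha_S^{\gamma}} \cdot \frac{\alpha_S^{\gamma} \alpha_A^{\gamma}}{\alpha_S^{\gamma} + \alpha_A^{\gamma}} \\
&= 1 - \frac{\alpha_A^{\gamma}}{\alpha_S^{\gamma} + \alpha_A^{\gamma}} \\
&= \frac{\alpha_S^{\gamma}}{\alpha_S^{\gamma} + \alpha_A^{\gamma}} \\
&= \frac{(\mu_S - \beta)^{\gamma}}{(\mu_S - \beta)^{\gamma} + (\mu_A - \beta)^{\gamma}}
\end{aligned}
$$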
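One way to sanity-check the derivation numerically, sketched under illustrative parameter choices: after integrating out $y$, the problem reduces to $\int_\beta^\infty f_S(x) F_A(x)\,dx$, where $F_A$ is the cumulative distribution function of $A$. Evaluating that single integral with a midpoint rule should reproduce $\alpha_S^\gamma / (\alpha_S^\gamma + \alpha_A^\gamma)$ for any $\beta$. The helper names and the truncation at $\beta + 20\max(\alpha_S, \alpha_A)$ are assumptions of this sketch.

```python
import math


def weibull3_pdf(x, alpha, beta, gamma):
    """Three-parameter Weibull density (scale alpha, location beta, shape gamma)."""
    if x <= beta:
        return 0.0
    z = (x - beta) / alpha
    return (gamma / alpha) * z ** (gamma - 1) * math.exp(-(z ** gamma))


def weibull3_cdf(x, alpha, beta, gamma):
    """P(X <= x) = 1 - exp(-((x - beta)/alpha)^gamma) for x >= beta."""
    if x <= beta:
        return 0.0
    return 1.0 - math.exp(-(((x - beta) / alpha) ** gamma))


def win_probability_numeric(alpha_s, alpha_a, beta, gamma, n=40_000):
    """Midpoint-rule value of P(S > A) = integral of f_S(x) * F_A(x) over [beta, inf)."""
    upper = beta + 20 * max(alpha_s, alpha_a)  # tail beyond this is negligible for gamma >= 1
    h = (upper - beta) / n
    return h * sum(
        weibull3_pdf(x, alpha_s, beta, gamma) * weibull3_cdf(x, alpha_a, beta, gamma)
        for x in (beta + (i + 0.5) * h for i in range(n))
    )


def win_probability_closed_form(alpha_s, alpha_a, gamma):
    """alpha_S^gamma / (alpha_S^gamma + alpha_A^gamma), the closed form derived above."""
    return alpha_s ** gamma / (alpha_s ** gamma + alpha_a ** gamma)


alpha_s, alpha_a, beta, gamma = 3.0, 2.5, 0.5, 1.8  # hypothetical parameters
print(win_probability_numeric(alpha_s, alpha_a, beta, gamma))
print(win_probability_closed_form(alpha_s, alpha_a, gamma))
```

Changing `beta` leaves the numeric value unchanged, which matches the substitution step that shifts both variables by $\beta$.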
Here $\beta$ is the smallest value that scores and scores allowed can take. Since a team cannot score fewer than zero points, it is reasonable to set $\beta = 0$, which gives the following result.
$$
P(S > A) = \frac{\mu_S^{\gamma}}{\mu_S^{\gamma} + \mu_A^{\gamma}}
$$
■
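The final result can also be checked by simulation. This sketch assumes $\beta = 0$, draws independent Weibull scores with Python's `random.weibullvariate` (scale first, shape second; its second argument plays the role of this article's $\gamma$), chooses the scales so the means equal $\mu_S$ and $\mu_A$, and counts how often $S > A$. The function name, seed, and inputs are illustrative assumptions.

```python
import math
import random


def simulate_win_pct(mu_s: float, mu_a: float, gamma: float,
                     n_games: int = 200_000, seed: int = 0) -> float:
    """Fraction of simulated games with S > A, S and A independent Weibull, beta = 0."""
    rng = random.Random(seed)
    g1 = math.gamma(1 + 1 / gamma)
    # With beta = 0, mu = alpha * Gamma(1 + 1/gamma), so alpha = mu / Gamma(1 + 1/gamma).
    alpha_s, alpha_a = mu_s / g1, mu_a / g1
    # weibullvariate(scale, shape): the second argument is the shape gamma.
    wins = sum(
        rng.weibullvariate(alpha_s, gamma) > rng.weibullvariate(alpha_a, gamma)
        for _ in range(n_games)
    )
    return wins / n_games


mu_s, mu_a, gamma = 5.0, 4.0, 1.8  # hypothetical inputs
print(simulate_win_pct(mu_s, mu_a, gamma))
print(mu_s ** gamma / (mu_s ** gamma + mu_a ** gamma))
```

With 200,000 simulated games the standard error of the estimate is at most about 0.001, so agreement to roughly two decimal places is what one should expect.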