Pythagorean Winning Percentage
Formula 1
For a team in a given sports league, let $S$ be the team's total runs (or points) scored and $A$ its total runs allowed. The team's expected winning percentage $p$ for the season is $$ p = {{ S^{2} } \over { S^{2} + A^{2} }} = {{ 1 } \over { 1 + (A/S)^{2} }} $$
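As a quick numerical check of the formula, here is a minimal R sketch. The helper name `pythagorean_expectation` and the season totals (800 scored, 700 allowed) are made up for illustration and do not come from any real team.

```r
# Pythagorean expectation with the classic exponent 2.
# `pythagorean_expectation` is a hypothetical helper name used only in this sketch.
pythagorean_expectation <- function(scored, allowed) {
  stopifnot(all(scored > 0), all(allowed > 0))
  scored^2 / (scored^2 + allowed^2)
}

# Made-up season totals, for illustration only
p <- pythagorean_expectation(scored = 800, allowed = 700)
round(p, 3)  # about 0.566

# The second form of the formula, 1 / (1 + (A/S)^2), gives the same value
p_alt <- 1 / (1 + (700 / 800)^2)
all.equal(p, p_alt)  # TRUE
```

Note that only the ratio $A/S$ matters: a team that scores exactly as much as it allows gets $p = 0.5$ regardless of the totals.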
Explanation
The Pythagorean expectation, proposed by Bill James, is a nonlinear model that explains a team's season winning percentage using its runs scored and runs allowed as independent variables. It is obvious that scoring more leads to more wins and allowing more leads to more losses; quantifying that relationship, however, is another matter entirely.
The diagram above is a scatter plot of each team's scoring ratio (runs scored divided by runs allowed) against its winning percentage, for the 1982 through 2020 seasons of a certain baseball league. The points follow a curve that bends slightly away from a straight line. Bill James found an intuitive formula for this relationship, and it describes the actual data remarkably well. Steven Miller later gave the formula a statistical derivation.
Statistical Derivation
The name "Pythagorean" comes from the denominator $S^{2} + A^{2}$, which is reminiscent of the Pythagorean theorem. The exponent need not be 2, however: the formula generalizes to any positive exponent $\gamma$, and for Major League Baseball data since 1954, $\gamma \approx 1.85$ has been found to fit best. $$ p_{\gamma} = {{ S^{\gamma} } \over { S^{\gamma} + A^{\gamma} }} $$ Beyond this mathematical generalization, with suitable assumptions the formula can be adapted to other sports. For the NBA (basketball), a much larger exponent in the range $14 < \gamma < 17$ has been suggested, and for the NFL (American football), $\gamma \approx 2.4$.
For the derivation itself, see the post summarizing Steven Miller's paper.2
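One way to see where a value such as $\gamma \approx 1.85$ could come from is to note that the generalized formula is linear on the log-odds scale: since $p_{\gamma} / (1 - p_{\gamma}) = (S/A)^{\gamma}$, we have $\log \left( p_{\gamma} / (1 - p_{\gamma}) \right) = \gamma \log (S/A)$. The sketch below estimates $\gamma$ with a regression through the origin. This is only one possible estimation approach, not necessarily the one used in the sources cited here; the helper name `pythagorean_gamma` is hypothetical, and the data are synthetic and noise-free, generated from the model itself rather than taken from any league.

```r
# Generalized Pythagorean expectation; gamma = 2 recovers the classic formula.
# `pythagorean_gamma` is a hypothetical helper name used only in this sketch.
pythagorean_gamma <- function(scored, allowed, gamma = 2) {
  scored^gamma / (scored^gamma + allowed^gamma)
}

# Synthetic, noise-free data generated from the model itself (NOT real league data):
# scoring ratios S/A from 0.7 to 1.4, winning percentages computed with gamma = 1.85.
true_gamma <- 1.85
ratio <- seq(0.7, 1.4, by = 0.05)
win_pct <- pythagorean_gamma(ratio, 1, gamma = true_gamma)

# Because log(p / (1 - p)) = gamma * log(S / A), regressing the log-odds on the
# log ratio without an intercept gives an estimate of gamma.
fit <- lm(qlogis(win_pct) ~ 0 + log(ratio))
gamma_hat <- unname(coef(fit))
gamma_hat  # recovers 1.85 here only because the data contain no noise
```

With real season data the estimate would depend on the league and the years included, and a team with a winning percentage of exactly 0 or 1 would have to be handled separately because its log-odds is infinite.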
Code
The following R code reproduces the diagram in the Explanation section.
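library(ggplot2)

# Base URL of the post that hosts the CSV files
post_url = "https://freshrimpsushi.github.io/posts/2217/"
# Team pitching and team batting records, 1982-2020; [,-1] drops the first column
team_pitch = read.csv(paste0(post_url, "팀투구82_20.csv"), header = TRUE, encoding = 'UTF-8')[,-1]
team_hit = read.csv(paste0(post_url, "팀타격82_20.csv"), header = TRUE, encoding = 'UTF-8')[,-1]

# 팀승률: winning percentage = wins (승) / starts (선발)
# 득실점비: scoring ratio = runs scored (득점) / runs allowed (실점)
data = data.frame(
  팀승률 = team_pitch$승 / team_pitch$선발,
  득실점비 = team_hit$득점 / team_pitch$실점
)

# Scatter plot of scoring ratio (x) against winning percentage (y)
ggplot(data, aes(x = 득실점비, y = 팀승률)) +
  geom_point(alpha = 0.5, shape = 16) +
  theme_bw() + coord_fixed(ratio = 2)
# Save the last plot as a 480 x 480 pixel PNG
ggsave("득점비율vs팀승률.png", width = 480, height = 480, units = "px", dpi = 120)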