
Degrees of Freedom in Statistics 📂Data Science


Terminology

When computing a statistic, the number of independent data values that can vary is called the degrees of freedom.

Explanation

Why It’s Hard to Explain Degrees of Freedom

When you start studying statistics as a freshman, you're likely to find the concept of 'degrees of freedom' quite frustrating. Not only is it difficult and frequently used, but it's almost impossible to find a textbook that provides a clear definition. This post, too, doesn't precisely define degrees of freedom and only vaguely introduces it as a 'term,' using expressions that aren't mathematically rigorous, like 'when computing' or 'values can vary.'

The trouble is that there is a good reason for this. It's not that authors are lazy; the concept of degrees of freedom is 'absorbed' through experience rather than 'understood' by study. By your 2nd or 3rd year you start to get a sense of what degrees of freedom mean, and by the time you enter graduate school you can usually explain it, though reciting a definition is still challenging.

Firstly, the very expression 'degrees of freedom' evokes a positive emotion, which is itself a problem. Whether in fashion, open-world games, or democracy, more freedom is generally considered better. Moreover, when you first encounter degrees of freedom as a freshman, it is often calculated as 'the sample size is $n$, so subtract $1$, yielding $(n-1)$ degrees of freedom.' Without deeper reflection, one could easily assume that, just as having more samples is generally deemed better, degrees of freedom must carry some positive or negative connotation. However, when handled precisely as part of mathematical expressions, degrees of freedom are just a number.

Furthermore, the fact that degrees of freedom appear frequently and rather unexpectedly in many contexts is another issue. When studying analysis of variance or regression analysis, one suddenly encounters 'degrees of freedom' like $n-1$ or $n-p-1$, with little explanation of how they are calculated. Then, in mathematical statistics, degrees of freedom reappear as parameters of the t-distribution, the chi-squared distribution, and even as the two degrees of freedom of the F-distribution. At this stage, typically around the 2nd or 3rd year, you have only a vague understanding, yet asking about degrees of freedom feels awkward, not because you know nothing about it, but because questioning it now seems embarrassing, so you gloss over it.

Even if you understand the need for these numbers, calling them ‘degrees of freedom’ can seem almost meaningless. To explore why we need this term, let’s consider the following example.

Extreme Example: What if the Concept of Degrees of Freedom Didn’t Exist?

A good way to explain a seemingly useless concept is to show what 'violations' would be allowed if that concept did not exist. Let's put aside the mathematical explanations about statistics and enjoy an interesting thought experiment. Suppose we have the following sample $A$.

$$ A = \left\{ 13, 7, 17, 3 \right\} $$

In this case, the sample size is $n = 4$. Now, let's look at the sample $B$ brought by a junior claiming to have 'improved' the sample.

$$ B = \left\{ 13, 7, 17, 3, 14, 8, 18, 4 \right\} $$

The junior states this sample contains $8$ data points, twice as many as $A$. Moreover, they claim they can increase the sample size as much as they want, repeating the trick as $n \to \infty$, and apply every statistical technique available for samples. However, it's clear at a glance that this sample is crudely forged: the sample size was inflated simply by adding $1$ to the existing data.

At this point, we must not be deceived by the sheer number of values in $B$; instead we should focus on the essence of $A$. We understand that the data generated by the junior is merely a counterfeit of $A$:

$$ B = B(A) = A \cup (A+1) $$

The volume of a sample is not simply a count. It is reasonable to count only the genuinely uncontrollable, natural, put simply 'free', samples, and to call that count the degrees of freedom, under which no such 'violation' is allowed.
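To make the thought experiment concrete, here is a small Python sketch (using the sample values from the text) showing that every summary of $B$ is completely determined by $A$, so doubling the count adds no new information:

```python
A = [13, 7, 17, 3]

# The junior's "improved" sample is a deterministic function of A:
# B = B(A) = A ∪ (A + 1).
B = A + [a + 1 for a in A]

def mean(xs):
    return sum(xs) / len(xs)

# B looks twice as large...
print(len(A), len(B))            # 4 8
# ...but its mean is just mean(A) + 0.5, computable from A alone.
print(mean(B) == mean(A) + 0.5)  # True
```

No matter which statistic we compute on $B$, it could have been computed directly from $A$; the four extra values are not 'free.'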

Frequently Seen Example: $s^{2}$

Now, let's consider the sample variance $s^{2}$ that almost every text explaining degrees of freedom refers to. The sample variance is calculated as follows when the sample mean $\overline{x}$ is given.

$$ s^{2} = {{ 1 } \over { n-1 }} \sum_{k=1}^{n} \left( x_{k} - \overline{x} \right)^{2} $$

What matters here is that the constant $\overline{x} = \sum_{k} x_{k} / n$ is already given. Whichever $x_{k_{0}}$ you choose, that $x_{k_{0}}$ can be back-calculated as a function of the other data.

$$ x_{k_{0}} = x_{k_{0}} \left( \left\{ x_{k} : k \ne k_{0} \right\} \right) = n \overline{x} - \sum_{k \ne k_{0}} x_{k} $$

This is just like the junior's data taking the form $B = B(A)$ in the previous paragraph. In this case, the number of samples needed to determine the genuine $s^{2}$ isn't $n$ but $(n-1)$: under the constraint that the value of $\overline{x}$ is maintained, only the $(n-1)$ samples $\left\{ x_{k} : k \ne k_{0} \right\}$ can vary freely and alter the computed value of $s^{2}$, while $x_{k_{0}}$ is then fixed. Therefore, $(n-1)$ is called the degrees of freedom of $s^{2}$.
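This constraint is easy to verify numerically. The sketch below (plain Python, no external libraries, reusing the sample from the thought experiment) shows that once $\overline{x}$ is given, any one observation can be recovered from the other $n-1$, which is exactly why the divisor in $s^{2}$ counts only the freely varying values:

```python
x = [13, 7, 17, 3]
n = len(x)
xbar = sum(x) / n  # the constraint: the sample mean is already given

# Back-calculate x[0] from the remaining n-1 values:
# x_{k0} = n * xbar - (sum of the others).
recovered = n * xbar - sum(x[1:])
print(recovered == x[0])  # True

# Sample variance with the (n-1) divisor: degrees of freedom = n - 1.
s2 = sum((xk - xbar) ** 2 for xk in x) / (n - 1)
print(s2)
```

Libraries make the same choice explicit: for example, NumPy's `var` takes a `ddof` ("delta degrees of freedom") argument, and `np.var(x, ddof=1)` divides by $n-1$ to produce this $s^{2}$.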

See Also