
The Overestimation of Accuracy in Data Science

Definition

Suppose we have a model that distinguishes positives $P$ from negatives $N$ in a classification problem. Denote the count of positives correctly identified as positive by true positives $TP$, the count of negatives correctly identified as negative by true negatives $TN$, the count of positives incorrectly identified as negative by false negatives $FN$, and the count of negatives incorrectly identified as positive by false positives $FP$.

The following quantity is referred to as the model’s Accuracy. $$ \textrm{Accuracy} := {{ TP + TN } \over { P + N }} $$
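As a quick sanity check of the definition, here is a minimal Python sketch; the four counts are made-up illustrative values, not taken from any real model.

```python
# Hypothetical confusion counts for illustration only.
TP, TN, FP, FN = 40, 50, 5, 5

P = TP + FN  # number of actual positives
N = TN + FP  # number of actual negatives

accuracy = (TP + TN) / (P + N)
print(accuracy)  # 0.9
```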

Description

The performance of a classifier is most commonly summarized by its accuracy: the ratio of correct predictions, true positives plus true negatives, to the total number of samples $(P + N)$. It is the most basic indicator that comes to mind when measuring performance on a classification problem.

However, the problem with accuracy is that it can overestimate performance depending on the distribution of the data. For instance, imagine a data set of 1 million samples in which only 100 are positive. A model that simply predicts negative for every sample, without looking at the data at all, already achieves an accuracy of $99.99 \%$. What counts as good or bad performance varies by field and by what is being compared, but a model that errs only once in ten thousand cases would not usually be called poor. The problem is that we know this model has a fatal flaw: if it just labels everything negative, there is hardly any reason to call it a model at all.
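A minimal NumPy sketch of this scenario; the library choice and array names are assumptions for illustration, not part of the original example.

```python
import numpy as np

# Hypothetical imbalanced data set: 1,000,000 samples, only 100 positives.
y_true = np.zeros(1_000_000, dtype=int)
y_true[:100] = 1  # 100 positives (1), the rest negatives (0)

# A "model" that ignores the input and always predicts negative.
y_pred = np.zeros_like(y_true)

accuracy = (y_pred == y_true).mean()
print(f"{accuracy:.2%}")  # 99.99%
```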

A typical example of imbalanced data is weather forecasting. In Korea, it is said that simply forecasting no rain every day of the year already yields an accuracy of about $89 \%$1. These figures are not fabricated, but citing them to argue that ‘weather forecasts are accurate’ can backfire precisely because ‘accuracy is overestimated’ on such data.

In binary classification, accuracy is unlikely to fall below $50 \%$ in the first place: if a model does worse than that, simply inverting its predictions, swapping positive and negative, yields an accuracy above $50 \%$. Accuracy is certainly a valuable indicator in that it is easy to explain to the general public, not just experts, and it can convey a model’s merits without requiring a baseline for comparison. However, it is not a panacea, and one should keep in mind that accuracy is not the only way to measure a model’s performance.
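A small sketch of this ‘flip the predictions’ argument, assuming NumPy and randomly generated labels purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=10_000)  # hypothetical binary labels
y_pred = rng.integers(0, 2, size=10_000)  # a (bad) classifier's predictions

acc = (y_pred == y_true).mean()
if acc < 0.5:
    y_pred = 1 - y_pred  # invert every prediction: accuracy becomes 1 - acc
    acc = (y_pred == y_true).mean()

print(acc)  # always >= 0.5 after the optional flip
```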

See Also


  1. The shock upon revealing the actual accuracy of weather forecasts… Everyone is aghast, “Trusted the meteorological office and lost 100 million.” https://www.salgoonews.com/news/articleView.html?idxno=21129