
Curse of Dimensionality

Definition 1

In Euclidean space $\mathbb{R}^{d}$, the exponential increase in the amount of memory or computation required to solve a problem as the dimension $d$ increases is referred to as the curse of dimensionality.[^1]

Explanation

In many problems, increasing dimensionality means an increase in the number of variables considered. For instance, if the purity $c$ of a chemical substance depends on a model $f$ and the oxygen concentration $x$ as $c = f(x)$, it is relatively easy to find where $c$ is maximized by trying $n = 10$ values of $x$. However, if the carbon content $y$ is also considered, $c = f(x, y)$ must be evaluated over a plane of $n^{2} = 100$ points, and additionally accounting for the temperature $z$ requires $n^{3} = 1000$ evaluations. When the amount of computation grows explosively with the dimension like this and cannot be reduced without sacrificing the resolution $n$, and hence the accuracy, we say we are 'suffering from the curse of dimensionality.'
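As a minimal Python sketch of this exhaustive grid search: the function `purity` below is a made-up stand-in for the model $f$ (nothing about its form comes from the text); the only point is that the number of evaluations grows as $n^{d}$.

```python
import itertools
import numpy as np

def purity(point):
    # Hypothetical stand-in for the model f: any smooth function
    # of however many process variables are supplied.
    return -sum((v - 0.5) ** 2 for v in point)

n = 10                          # resolution per axis
grid_1d = np.linspace(0, 1, n)  # candidate values for each variable

for d in (1, 2, 3):
    # Exhaustive search over d variables costs n**d evaluations.
    best = max(itertools.product(grid_1d, repeat=d), key=purity)
    print(f"d={d}: {n**d} evaluations, argmax ≈ {tuple(round(v, 2) for v in best)}")
```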

In fields like machine learning, the curse of dimensionality also refers to the phenomenon where data become too sparse for learning as the dimension of the space grows. While this may at first seem unrelated to the previous definition, it is essentially the same issue: for there to be adequate data for 'proper learning' in a higher dimension, the amount of data required grows just as explosively.
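This sparsity can be made visible with a small experiment: fix the number of sample points and watch how isolated each point becomes as the dimension grows. The sample size and the dimensions below are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 200  # fixed budget of data points

for d in (1, 2, 10, 100):
    X = rng.uniform(size=(n_samples, d))      # uniform points in the unit cube [0, 1]^d
    diff = X[:, None, :] - X[None, :, :]      # pairwise differences
    dist = np.sqrt((diff ** 2).sum(axis=-1))  # pairwise Euclidean distances
    np.fill_diagonal(dist, np.inf)            # ignore each point's distance to itself
    print(f"d={d:>3}: mean nearest-neighbour distance = {dist.min(axis=1).mean():.3f}")
```

With the data budget held constant, the mean nearest-neighbour distance keeps growing with $d$: the same 200 points that densely cover a line are hopelessly sparse in $\mathbb{R}^{100}$.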

Methods to overcome the curse of dimensionality vary greatly depending on the problem and the data, but dimension reduction techniques such as Principal Component Analysis are commonly among the first to be considered.
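As a rough sketch of the PCA idea (the synthetic data and the choice of $k = 3$ are assumptions for illustration, not part of the text), one can reduce dimension by projecting centered data onto its top singular vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
# 500 points in R^50 that actually live near a 3-dimensional subspace.
latent = rng.normal(size=(500, 3))
X = latent @ rng.normal(size=(3, 50)) + 0.01 * rng.normal(size=(500, 50))

# PCA via SVD: center the data, then project onto the top-k right singular vectors.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = S ** 2 / (S ** 2).sum()
k = 3
X_reduced = Xc @ Vt[:k].T  # 500 x 3 representation

print(f"variance captured by top {k} components: {explained[:k].sum():.4f}")
```

When the data genuinely concentrate near a low-dimensional subspace, as here, almost all of the variance survives the projection, and downstream computation runs in $k$ dimensions instead of 50.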


[^1]: I. V. Oseledets and E. E. Tyrtyshnikov. "Breaking the Curse of Dimensionality, Or How to Use SVD in Many Dimensions." SIAM Journal on Scientific Computing 31(5), 2009, 3744–3759. https://doi.org/10.1137/090748330