
Curse of Dimensionality

Definition 1

In Euclidean space $\mathbb{R}^{d}$, the exponential increase in the amount of memory or computation required to solve a problem as the dimension $d$ increases is referred to as the curse of dimensionality.[^1]

Explanation

In many problems, increasing dimensionality means an increase in the number of variables considered. For instance, if the purity $c$ of a chemical substance depends on a model $f$ and the oxygen concentration $x$ as $c = f(x)$, it is relatively easy to find where $c$ is maximized by trying $n = 10$ values of $x$. However, if the carbon content $y$ is also considered, $c = f(x, y)$ must be evaluated over a plane of $n^{2} = 100$ points, and additionally accounting for the temperature $z$ requires $n^{3} = 1000$ evaluations. When the amount of computation grows explosively with the dimension like this and cannot be reduced without sacrificing the resolution $n$, and hence the accuracy, we say we are 'suffering from the curse of dimensionality.'
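As a minimal Python sketch of this exhaustive grid search: the function `purity` below is a made-up stand-in for the model $f$ (nothing about its form comes from the text); the only point is that the number of evaluations grows as $n^{d}$.

```python
import itertools
import numpy as np

def purity(point):
    # Hypothetical stand-in for the model f: any smooth function
    # of however many process variables are supplied.
    return -sum((v - 0.5) ** 2 for v in point)

n = 10                          # resolution per axis
grid_1d = np.linspace(0, 1, n)  # candidate values for each variable

for d in (1, 2, 3):
    # Exhaustive search over d variables costs n**d evaluations.
    best = max(itertools.product(grid_1d, repeat=d), key=purity)
    print(f"d={d}: {n**d} evaluations, argmax ≈ {tuple(round(v, 2) for v in best)}")
```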

In fields like machine learning, the curse of dimensionality also refers to the phenomenon where data become too sparse for learning as the dimension of the space grows. While this may at first seem unrelated to the previous definition, it is essentially the same issue: for there to be adequate data for 'proper learning' in a higher dimension, the amount of data required grows just as explosively.
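This sparsity can be made visible with a small experiment: fix the number of sample points and watch how isolated each point becomes as the dimension grows. The sample size and the dimensions below are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 200  # fixed budget of data points

for d in (1, 2, 10, 100):
    X = rng.uniform(size=(n_samples, d))      # uniform points in the unit cube [0, 1]^d
    diff = X[:, None, :] - X[None, :, :]      # pairwise differences
    dist = np.sqrt((diff ** 2).sum(axis=-1))  # pairwise Euclidean distances
    np.fill_diagonal(dist, np.inf)            # ignore each point's distance to itself
    print(f"d={d:>3}: mean nearest-neighbour distance = {dist.min(axis=1).mean():.3f}")
```

With the data budget held constant, the mean nearest-neighbour distance keeps growing with $d$: the same 200 points that densely cover a line are hopelessly sparse in $\mathbb{R}^{100}$.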

Methods to overcome the curse of dimensionality vary greatly depending on the problem and the data, but dimension reduction techniques such as Principal Component Analysis are commonly among the first to be considered.
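As a rough sketch of the PCA idea (the synthetic data and the choice of $k = 3$ are assumptions for illustration, not part of the text), one can reduce dimension by projecting centered data onto its top singular vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
# 500 points in R^50 that actually live near a 3-dimensional subspace.
latent = rng.normal(size=(500, 3))
X = latent @ rng.normal(size=(3, 50)) + 0.01 * rng.normal(size=(500, 50))

# PCA via SVD: center the data, then project onto the top-k right singular vectors.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = S ** 2 / (S ** 2).sum()
k = 3
X_reduced = Xc @ Vt[:k].T  # 500 x 3 representation

print(f"variance captured by top {k} components: {explained[:k].sum():.4f}")
```

When the data genuinely concentrate near a low-dimensional subspace, as here, almost all of the variance survives the projection, and downstream computation runs in $k$ dimensions instead of 50.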


[^1]: I. V. Oseledets and E. E. Tyrtyshnikov. "Breaking the Curse of Dimensionality, Or How to Use SVD in Many Dimensions." SIAM Journal on Scientific Computing 31(5), 2009, 3744–3759. https://doi.org/10.1137/090748330