What is Overfitting and Regularization in Machine Learning?
Overfitting
The phenomenon where the training loss keeps decreasing but the test loss (or validation loss) stops decreasing, or even increases, is called overfitting.
Explanation
There is also a term called underfitting, which roughly means the opposite; frankly, though, it is a fairly meaningless term and not often used in practice.
A crucial point in machine learning is that a function trained on the available data must also work well on new data. Hence the term generalization performance, which refers to a model's performance on unseen data. To borrow an analogy from entrance exams: a student who scores perfectly on mock exams but performs poorly on the actual college entrance exam has, in effect, overfit to the mock exam questions. On the other hand, a student who scores well on mock exams and similarly well on the actual exam has good generalization performance.
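To see the definition in action, here is a minimal sketch (assuming NumPy; the target function, sample sizes, and polynomial degree are all illustrative choices). A degree-$9$ polynomial has enough capacity to pass through all $10$ training points, so the training error collapses toward zero while the test error stays large.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    return np.sin(2 * np.pi * x)

# 10 noisy training points, 100 noisy test points from the same function
x_train = rng.uniform(0, 1, 10)
y_train = f(x_train) + rng.normal(0, 0.2, 10)
x_test = rng.uniform(0, 1, 100)
y_test = f(x_test) + rng.normal(0, 0.2, 100)

# A degree-9 polynomial can pass through all 10 training points
coeffs = np.polyfit(x_train, y_train, deg=9)

train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)

print(f"train MSE: {train_mse:.4f}")  # near zero: the noise was memorized
print(f"test  MSE: {test_mse:.4f}")   # much larger: poor generalization
```

Lowering the degree, gathering more data, or applying one of the regularizers below would narrow the gap between the two errors.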
Regularization
Any method that modifies a learning algorithm with the aim of reducing the test loss (not the training loss) is called regularization.[^1]
Goodfellow defines regularization as “any modification we make to a learning algorithm that is intended to reduce its generalization error but not its training error.”
In other words, all methods for preventing overfitting are collectively called regularization. The first one usually encountered in machine learning or deep learning studies is dropout.
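Since dropout is the usual first encounter, here is a minimal sketch of the idea in its inverted-dropout form (assuming NumPy; the function name and defaults are illustrative). During training, each unit is zeroed with probability $p$ and the survivors are rescaled so the expected activation is unchanged, which prevents the network from relying too heavily on any single unit.

```python
import numpy as np

def dropout(activations, p=0.5, training=True, rng=None):
    """Inverted dropout: during training, zero each unit with probability p
    and scale the survivors by 1/(1-p) so the expected activation is unchanged."""
    if not training or p == 0.0:
        return activations  # at test time dropout is a no-op
    rng = rng or np.random.default_rng()
    mask = rng.random(activations.shape) >= p
    return activations * mask / (1.0 - p)

h = np.ones((2, 4))       # a toy hidden-layer activation
print(dropout(h, p=0.5))  # about half the units zeroed, the rest scaled by 2
```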
Types
- $\ell_{1}$ regularization
- $\ell_{2}$ regularization (see the sketch after this list)
- Weight decay
- Early stopping
- Dropout
- Batch normalization
- Label smoothing
- Data augmentation
- Flooding
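To make one entry from the list concrete, here is a minimal ridge-regression sketch of $\ell_{2}$ regularization (assuming NumPy; `lam`, the data, and all names are illustrative). Adding the penalty $\lambda \| w \|^{2}$ to the squared-error loss shrinks the weights toward zero, limiting the model's freedom to fit noise.

```python
import numpy as np

def ridge_fit(X, y, lam=0.1):
    """Closed-form ridge regression: argmin_w ||Xw - y||^2 + lam * ||w||^2,
    i.e. w = (X^T X + lam * I)^{-1} X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
w_true = np.array([1.0, -2.0, 0.0, 0.0, 3.0])
y = X @ w_true + rng.normal(0, 0.1, 50)

w_ols = ridge_fit(X, y, lam=0.0)   # ordinary least squares (no penalty)
w_l2 = ridge_fit(X, y, lam=10.0)   # penalized: weights shrink toward zero
print(np.linalg.norm(w_ols), np.linalg.norm(w_l2))  # second norm is smaller
```

With plain gradient descent, the same penalty is equivalent to weight decay: each update multiplies the weights by a factor slightly below $1$ before applying the data gradient.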
See Also
- Standardization: Usually refers to the process in statistics of adjusting the mean of the data to $0$ and the variance to $1$ (contrasted with normalization in the sketch after this list).
- Normalization: Typically refers to the process of placing data within a specific range.
- Regularization: Usually refers to the process to prevent overfitting in machine learning.
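To make the contrast between the first two terms concrete, a quick sketch (assuming NumPy; the data is illustrative):

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])

# Standardization: shift and scale to mean 0, variance 1
z = (x - x.mean()) / x.std()

# (Min-max) normalization: map the values into the range [0, 1]
m = (x - x.min()) / (x.max() - x.min())

print(z.mean(), z.std())  # ~0.0, 1.0
print(m.min(), m.max())   # 0.0, 1.0
```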
[^1]: Ian Goodfellow, Yoshua Bengio, and Aaron Courville. (2016). Deep Learning. MIT Press.