
Paper Review: Do We Need Zero Training Loss After Achieving Zero Training Error?

Paper Review

Flooding is a regularization technique introduced in Do We Need Zero Training Loss After Achieving Zero Training Error?, presented at ICML 2020. According to the authors, the root cause of overfitting is a training loss that is driven excessively low, as illustrated below.

1.png

Thus, the core idea of the paper is that keeping the training loss from falling below a certain value during training, as illustrated below, can reduce the test loss.

2.png

This act of setting a minimum value for the training loss is termed Flooding. The authors claim that the flooding technique can reduce test loss while being remarkably simple. If the original loss is referred to as $L$, then the loss with flooding applied, $\tilde{L}$, is as follows.

$$ \tilde{L}(\boldsymbol{\theta}) = \left| L(\boldsymbol{\theta})-b \right| +b,\quad b>0 $$

$\boldsymbol{\theta}$ represents the model’s parameters, and the term $b$ is referred to as the flooding level. The gradient of $\tilde{L}$ points in the same direction as the gradient of $L(\boldsymbol{\theta})$ when $L(\boldsymbol{\theta})>b$, and in the opposite direction when $L(\boldsymbol{\theta})<b$, so training descends toward $b$ from above and climbs back toward $b$ from below. A short code sketch is given below.
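For reference, here is a minimal PyTorch-style sketch of how this loss could be implemented. The wrapper class `FloodedLoss`, the use of cross-entropy as the base loss, and the flood level of 0.05 are illustrative assumptions for this example, not details taken from the paper.

```python
import torch
import torch.nn as nn


class FloodedLoss(nn.Module):
    """Wraps a base loss with flooding: L_tilde = |L - b| + b."""

    def __init__(self, base_loss: nn.Module, flood_level: float):
        super().__init__()
        self.base_loss = base_loss
        self.b = flood_level

    def forward(self, logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        loss = self.base_loss(logits, targets)
        # Above the flood level the gradient is unchanged (ordinary descent);
        # below it the sign flips, pushing the training loss back up toward b.
        return (loss - self.b).abs() + self.b


# Hypothetical usage: cross-entropy base loss and a flood level of 0.05.
criterion = FloodedLoss(nn.CrossEntropyLoss(), flood_level=0.05)
logits = torch.randn(8, 10, requires_grad=True)
targets = torch.randint(0, 10, (8,))
loss = criterion(logits, targets)
loss.backward()
```

In this sketch, flooding is applied to the averaged mini-batch loss, so the sign of the gradient flips for the whole batch whenever its loss drops below $b$. Below are the experimental results reported in the paper.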

3.png

$W$ denotes weight decay, $E$ early stopping, and $F$ flooding, with a check marking that the technique was applied. The red highlights mark the best performance. Noticeably more of the highlighted results fall in the right half of the table, where flooding was applied. Below is a graph of experiments on CIFAR-10 with the flooding level varied.

4.png

The red line represents the test loss with flooding, and the orange line the test loss without flooding. The orange line keeps rising as the training loss decreases, while the red line does not, suggesting that flooding prevented overfitting.