
SiLU or Swish Function in Machine Learning

Definition 1 2

The SiLU or Swish function is defined as follows.
$$ \operatorname{SiLU}(x) = x \cdot \sigma(x) $$
Here, $\sigma$ is a particular case of the sigmoid function, namely the logistic function $\sigma(x) = \left( 1 + e^{-x} \right)^{-1}$.
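
As a quick reference, here is a minimal NumPy sketch of the definition above; the function names `sigmoid` and `silu` and the sample inputs are illustrative choices, not taken from the cited papers.

```python
import numpy as np

def sigmoid(x):
    # Logistic function: sigma(x) = 1 / (1 + exp(-x))
    return 1.0 / (1.0 + np.exp(-x))

def silu(x):
    # SiLU(x) = x * sigma(x)
    return x * sigmoid(x)

x = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])
print(silu(x))  # approx [-0.072, -0.269, 0.0, 0.731, 3.928]
```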

Explanation


The SiLU resembles the ReLU in shape, but unlike the ReLU it is smooth and it is not a monotonic function. The logistic function suffers from the vanishing gradient problem, where its derivative stays near $0$ for inputs of large magnitude, and the ReLU suffers from the dying ReLU problem, where the gradient is exactly $0$ for inputs below $0$, so the affected units stop updating. The SiLU naturally avoids both of these issues: it is smooth everywhere and its derivative remains nonzero for negative inputs near $0$, as the sketch below illustrates.
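
To make the gradient comparison above concrete, the following sketch (again assuming NumPy; `silu_grad` and `relu_grad` are hypothetical helper names) evaluates the derivative $\operatorname{SiLU}'(x) = \sigma(x)\left(1 + x(1 - \sigma(x))\right)$, obtained from the product rule, alongside the derivative of the ReLU at a few points on either side of $0$.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def silu_grad(x):
    # d/dx [x * sigma(x)] = sigma(x) * (1 + x * (1 - sigma(x)))
    s = sigmoid(x)
    return s * (1.0 + x * (1.0 - s))

def relu_grad(x):
    # ReLU'(x) = 1 for x > 0, 0 otherwise
    return (x > 0).astype(float)

x = np.array([-2.0, -1.0, -0.5, 0.5, 2.0])
print(relu_grad(x))  # [0. 0. 0. 1. 1.] -- exactly zero on the negative side
print(silu_grad(x))  # approx [-0.09, 0.07, 0.26, 0.74, 1.09] -- nonzero at all of these points
```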


  1. Elfwing, S., Uchibe, E., & Doya, K. (2018). Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks, 107, 3-11. https://doi.org/10.48550/arXiv.1702.03118

  2. Ramachandran, P., Zoph, B., & Le, Q. V. (2017). Searching for activation functions. arXiv preprint arXiv:1710.05941. https://doi.org/10.48550/arXiv.1710.05941