
SiLU or Swish Function in Machine Learning

Definition 1 2

The SiLU or Swish function is defined as follows.
$$ \operatorname{SiLU}(x) = x \cdot \sigma(x) $$
Here, $\sigma$ is a particular case of the sigmoid function, namely the logistic function $\sigma(x) = \left( 1 + e^{-x} \right)^{-1}$.
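
As a quick reference, here is a minimal NumPy sketch of the definition above; the function names `sigmoid` and `silu` and the sample inputs are illustrative choices, not taken from the cited papers.

```python
import numpy as np

def sigmoid(x):
    # Logistic function: sigma(x) = 1 / (1 + exp(-x))
    return 1.0 / (1.0 + np.exp(-x))

def silu(x):
    # SiLU(x) = x * sigma(x)
    return x * sigmoid(x)

x = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])
print(silu(x))  # approx [-0.072, -0.269, 0.0, 0.731, 3.928]
```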

Explanation


The SiLU resembles the ReLU in shape, but unlike the ReLU it is smooth and it is not a monotonic function. The logistic function suffers from the vanishing gradient problem, where its derivative stays near $0$ for inputs of large magnitude, and the ReLU suffers from the dying ReLU problem, where the gradient is exactly $0$ for inputs below $0$, so the affected units stop updating. The SiLU naturally avoids both of these issues: it is smooth everywhere and its derivative remains nonzero for negative inputs near $0$, as the sketch below illustrates.
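
To make the gradient comparison above concrete, the following sketch (again assuming NumPy; `silu_grad` and `relu_grad` are hypothetical helper names) evaluates the derivative $\operatorname{SiLU}'(x) = \sigma(x)\left(1 + x(1 - \sigma(x))\right)$, obtained from the product rule, alongside the derivative of the ReLU at a few points on either side of $0$.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def silu_grad(x):
    # d/dx [x * sigma(x)] = sigma(x) * (1 + x * (1 - sigma(x)))
    s = sigmoid(x)
    return s * (1.0 + x * (1.0 - s))

def relu_grad(x):
    # ReLU'(x) = 1 for x > 0, 0 otherwise
    return (x > 0).astype(float)

x = np.array([-2.0, -1.0, -0.5, 0.5, 2.0])
print(relu_grad(x))  # [0. 0. 0. 1. 1.] -- exactly zero on the negative side
print(silu_grad(x))  # approx [-0.09, 0.07, 0.26, 0.74, 1.09] -- nonzero at all of these points
```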


  1. Elfwing, S., Uchibe, E., & Doya, K. (2018). Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks, 107, 3-11. https://doi.org/10.48550/arXiv.1702.03118

  2. Ramachandran, P., Zoph, B., & Le, Q. V. (2017). Searching for activation functions. arXiv preprint arXiv:1710.05941. https://doi.org/10.48550/arXiv.1710.05941