SiLU or Swish Function in Machine Learning
Definition [1][2]
The SiLU (Sigmoid-weighted Linear Unit), or Swish, function is defined as follows:

$$\mathrm{SiLU}(x) = x \cdot \sigma(x)$$

Here, $\sigma$ is a particular case of the sigmoid function, specifically the logistic function:

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

Swish is the generalization $\mathrm{Swish}(x) = x \cdot \sigma(\beta x)$ with a (possibly trainable) parameter $\beta$; it reduces to SiLU when $\beta = 1$.
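For reference, a minimal sketch in Python with NumPy is shown below; the names `sigmoid`, `silu`, and `swish` and the `beta` parameter are illustrative choices, not identifiers from the cited papers.

```python
import numpy as np

def sigmoid(x):
    """Logistic function: 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def silu(x):
    """SiLU: x * sigmoid(x)."""
    return x * sigmoid(x)

def swish(x, beta=1.0):
    """Swish: x * sigmoid(beta * x); beta = 1 recovers SiLU."""
    return x * sigmoid(beta * x)

# Sample values: smooth, slightly negative for x < 0, close to x for large x.
x = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])
print(silu(x))
```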
Explanation
The SiLU resembles ReLU in shape, but unlike ReLU it is smooth and not monotonic. The logistic function suffers from the vanishing-gradient problem: its derivative approaches 0 as the input saturates, so learning stalls. ReLU suffers from the dying ReLU problem: for inputs below 0 its output and gradient are exactly 0, so a unit stuck in that region stops learning. The SiLU function naturally avoids both of these issues.
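To see why, differentiate $x \cdot \sigma(x)$ with the product rule, using $\sigma'(x) = \sigma(x)(1 - \sigma(x))$:

$$\frac{d}{dx}\,\mathrm{SiLU}(x) = \sigma(x) + x\,\sigma(x)\bigl(1 - \sigma(x)\bigr) = \sigma(x)\bigl(1 + x\,(1 - \sigma(x))\bigr)$$

This derivative is smooth everywhere and nonzero for negative inputs; for example, at $x = -2$ it is roughly $0.119 \times (1 - 2 \times 0.881) \approx -0.09$, small but nonzero, and its negative sign reflects the non-monotonicity.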
[1] Elfwing, S., Uchibe, E., & Doya, K. (2018). Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks, 107, 3-11. https://doi.org/10.48550/arXiv.1702.03118
[2] Ramachandran, P., Zoph, B., & Le, Q. V. (2017). Searching for activation functions. arXiv preprint arXiv:1710.05941. https://doi.org/10.48550/arXiv.1710.05941