
What is Skip Connection in Artificial Neural Networks?

Definition

Let $\mathbf{W}$ be a weight matrix, $\mathbf{x}$ the input, and $\sigma$ a nonlinear activation function. Define the layer $L_{\mathbf{W}}$ as follows.

$$ L_{\mathbf{W}}(\mathbf{x}) := \sigma(\mathbf{W} \mathbf{x}) $$

A function of the following form, obtained by adding the identity function to the layer, is called a skip connection.

$$ L_{\mathbf{W}} + I : \mathbf{x} \mapsto \sigma(\mathbf{W} \mathbf{x}) + \mathbf{x} $$
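The definition above can be sketched in a few lines of NumPy. This is a minimal illustration, not a library implementation; the names `layer` and `skip_layer` and the choice of ReLU for $\sigma$ are my own assumptions.

```python
import numpy as np

def sigma(x):
    # ReLU, standing in for the nonlinear activation (any nonlinearity works)
    return np.maximum(x, 0.0)

def layer(W, x):
    # Plain layer: L_W(x) = sigma(W x)
    return sigma(W @ x)

def skip_layer(W, x):
    # Skip connection: (L_W + I)(x) = sigma(W x) + x
    return layer(W, x) + x

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 4))  # square, so sigma(W x) and x have the same shape
x = rng.standard_normal(4)

print(skip_layer(W, x))  # equals layer(W, x) + x by construction
```

Note that the addition requires $\sigma(\mathbf{W}\mathbf{x})$ and $\mathbf{x}$ to have the same dimension, which is why $\mathbf{W}$ is taken to be square here.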

Explanation

Typically, the input $\mathbf{x}$ is multiplied by the weight $\mathbf{W}$, the product is passed through the activation function $\sigma$, and the result $\sigma(\mathbf{W} \mathbf{x})$ becomes the input to the next layer. A skip connection instead delivers to the next layer the value obtained by adding the input back onto this result. Simply put, it can be expressed as follows.

$$ \begin{align*} &\text{normal layer: } \\ &\qquad L_{\mathbf{W}_{2}} \circ L_{\mathbf{W}_{1}}(\mathbf{x}) = \sigma(\mathbf{W}_{2}\sigma(\mathbf{W}_{1} \mathbf{x})) \\ \\ &\text{skip connection: } \\ &\qquad (L_{\mathbf{W}_{2}} + I) \circ (L_{\mathbf{W}_{1}} + I)(\mathbf{x}) = \sigma(\mathbf{W}_{2}(\sigma(\mathbf{W}_{1} \mathbf{x}) + \mathbf{x})) + (\sigma(\mathbf{W}_{1} \mathbf{x}) + \mathbf{x}) \end{align*} $$
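The two-layer composition above can be checked numerically. Below is a small NumPy sketch (with tanh assumed as the activation) verifying that composing two skip layers reproduces the expanded formula $\sigma(\mathbf{W}_{2}(\sigma(\mathbf{W}_{1} \mathbf{x}) + \mathbf{x})) + (\sigma(\mathbf{W}_{1} \mathbf{x}) + \mathbf{x})$.

```python
import numpy as np

def sigma(x):
    return np.tanh(x)  # assumed activation

def L(W):
    # The layer L_W as a function of x
    return lambda x: sigma(W @ x)

def skip(f):
    # Wrap a layer f as f + I
    return lambda x: f(x) + x

rng = np.random.default_rng(1)
W1, W2 = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))
x = rng.standard_normal(3)

plain = L(W2)(L(W1)(x))          # sigma(W2 sigma(W1 x)), the normal composition
h = sigma(W1 @ x) + x            # output of the first skip layer
expanded = sigma(W2 @ h) + h     # the expanded skip-connection formula

composed = skip(L(W2))(skip(L(W1))(x))
print(np.allclose(composed, expanded))  # True
```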

Strictly speaking, the term skip connection is not limited to the definition above. Any mechanism that passes the input (or an earlier intermediate value) directly to a later layer is called a skip connection. U-Net is a neural network that applies a form of skip connection to an autoencoder, concatenating the encoder's output onto the decoder's input.
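The U-Net-style variant can be contrasted with the additive one in a toy sketch: instead of adding the earlier value, it is concatenated, so the feature width grows rather than staying fixed. The function names here are hypothetical.

```python
import numpy as np

def sigma(x):
    return np.maximum(x, 0.0)  # ReLU, assumed

def additive_skip(W, x):
    # ResNet-style: shapes must match, output width stays the same
    return sigma(W @ x) + x

def concat_skip(h, x):
    # U-Net-style (toy version): the earlier feature x is stacked
    # onto the later feature h; the next layer sees the doubled width
    return np.concatenate([h, x])

x = np.ones(4)
h = sigma(np.eye(4) @ x)          # some later feature of the same length
print(additive_skip(np.eye(4), x).shape)  # (4,) — width preserved
print(concat_skip(h, x).shape)            # (8,) — width doubled
```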

Generally, stacking too many layers degrades a neural network's performance, but ResNet was designed so that even very deep networks train well by making appropriate use of skip connections.