Fully Connected Layer (Linear Layer, Dense Layer)
Definition
Let us call a linear transformation $f : \mathbb{R}^{n} \to \mathbb{R}^{m}$ a layer. Consider $W \in \mathbb{R}^{m \times n}$ as the matrix representation of $f$. When $W$ consists only of components that are not $0$, $f$ is called a fully connected layer.
Explanation
A fully connected layer is the most fundamental layer in an artificial neural network. In deep learning, many layers are linear maps, but the term linear layer typically refers specifically to a fully connected layer. The term dense layer is also commonly used.
It is called fully connected because each component of the output depends on all of the input variables, and it is often depicted as a diagram in which every input node is connected to every output node.
Specifically, a fully connected layer is a function of the following form:

$$ f(\mathbf{x}) = W \mathbf{x} = \begin{bmatrix} w_{11} & \cdots & w_{1n} \\ \vdots & \ddots & \vdots \\ w_{m1} & \cdots & w_{mn} \end{bmatrix} \begin{bmatrix} x_{1} \\ \vdots \\ x_{n} \end{bmatrix}, \qquad w_{ij} \neq 0 $$
In practice, when constructing a neural network, a bias $\mathbf{b} \in \mathbb{R}^{m}$ is added, turning the layer into an affine transformation:

$$ f(\mathbf{x}) = W \mathbf{x} + \mathbf{b} $$
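As a concrete illustration, the affine form above can be sketched in NumPy. The function name `dense` and the dimensions below are illustrative choices, not part of the definition:

```python
import numpy as np

# A fully connected (dense) layer as an affine map: f(x) = W x + b.
# Shapes: W is (m, n), x is (n,), b is (m,).
rng = np.random.default_rng(0)
n, m = 4, 3
W = rng.normal(size=(m, n))
b = rng.normal(size=m)

def dense(x, W, b):
    """Apply the affine transformation W x + b to the input vector x."""
    return W @ x + b

x = rng.normal(size=n)
y = dense(x, W, b)
print(y.shape)  # (3,)
```

Every entry of `y` is a weighted sum over all entries of `x`, which is exactly the "fully connected" property.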
Of course, even in this case, if we let $W^{\prime} = \begin{bmatrix} W & \mathbf{b} \end{bmatrix}$ and $\mathbf{x}^{\prime} = \begin{bmatrix} \mathbf{x} \\ 1 \end{bmatrix}$, we can still regard it as a linear transformation:

$$ f(\mathbf{x}) = W^{\prime} \mathbf{x}^{\prime} = W \mathbf{x} + \mathbf{b} $$
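A quick numerical check of this augmentation trick, sketched with NumPy (the variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 4, 3
W = rng.normal(size=(m, n))
b = rng.normal(size=m)
x = rng.normal(size=n)

# Augment: W' = [W | b] and x' = [x; 1], so that W' x' = W x + b.
W_aug = np.hstack([W, b[:, None]])   # shape (m, n + 1)
x_aug = np.append(x, 1.0)            # shape (n + 1,)

assert np.allclose(W_aug @ x_aug, W @ x + b)
```

The affine map in $n$ dimensions is thus a linear map in $n + 1$ dimensions, which is why libraries can treat both cases with a single matrix multiplication.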
The essence of machine learning and deep learning is to accurately approximate a desired function using the structure known as an artificial neural network. Early deep learning used, as approximators, functions built by composing fully connected layers with activation functions several times (i.e., deeply). This is known as a (multilayer) perceptron. In contemporary practice, convolutional layers are predominantly used for image-related problems, but linear layers are still employed as the final layer in classification tasks.
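A multilayer perceptron of this kind can be sketched in a few lines of NumPy. This is a minimal illustration under assumed choices (ReLU activation, randomly initialized weights, layer sizes picked arbitrarily), not a trained model:

```python
import numpy as np

def relu(z):
    """Elementwise ReLU activation."""
    return np.maximum(z, 0.0)

def mlp(x, params):
    """Compose affine layers with ReLU; the final layer is left linear,
    as is typical for classification logits."""
    *hidden, last = params
    for W, b in hidden:
        x = relu(W @ x + b)
    W, b = last
    return W @ x + b

rng = np.random.default_rng(2)
sizes = [4, 8, 8, 3]  # input dim 4, two hidden layers, 3 output classes
params = [(rng.normal(size=(m, n)), rng.normal(size=m))
          for n, m in zip(sizes[:-1], sizes[1:])]

logits = mlp(rng.normal(size=4), params)
print(logits.shape)  # (3,)
```

Each `(W, b)` pair is one fully connected layer; stacking them with nonlinear activations is what makes the composition "deep".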