What is an Artificial Neural Network?
Definition
An artificial neural network (ANN) is a network that mimics the nervous system of living organisms.
Mathematical Definition
- In deep learning, a vector-valued function $W : \mathbb{R}^{n} \to \mathbb{R}^{m}$ is referred to as a layer.
- In deep learning, a nonlinear scalar function $\sigma : \mathbb{R} \to \mathbb{R}$ is referred to as an activation function.
- The composition $\sigma \circ W$ of layers and activation functions, with $\sigma$ applied componentwise, is called an artificial neural network.
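As a minimal sketch of this definition (the choice of NumPy, of an affine layer, and of ReLU as the activation are my own assumptions, not part of the definition above):

```python
import numpy as np

def layer(x, W, b):
    """A layer R^n -> R^m, here taken to be affine: x -> Wx + b."""
    return W @ x + b

def sigma(z):
    """A nonlinear activation function, applied componentwise (here: ReLU)."""
    return np.maximum(z, 0.0)

# The network is the composition sigma ∘ W.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 2))  # maps R^2 -> R^3
b = rng.normal(size=3)
x = np.array([1.0, -2.0])

y = sigma(layer(x, W, b))
print(y)
```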
Motivation
The nervous system is composed of neurons. A nerve cell body receives stimuli through its dendrites and transmits electrical signals through its axon. Many organisms, including humans, have evolved these simple neuronal connections to suit their environments. As a result, the nervous system is capable of complex tasks such as detecting light, moving legs, remembering, or imagining.
An artificial neural network mimics this structure, with the nerve cell body as a node and the axon as a link. Each node, like a nerve cell body, receives and transmits information, performing computations that can yield meaningful results.
Example
As a simple example, consider the problem of understanding the correlation between the data $Y := \begin{bmatrix} 5 \\ 7 \\ 9 \end{bmatrix}$ and $X := \begin{bmatrix} 2.2 \\ 3.1 \\ 3.9 \end{bmatrix}$.
Since this problem is quite easy, one can guess without difficulty that there is a linear correlation like $Y \approx {\color{red}2} X + \color{blue}{1}$.
If we solve this problem through simple regression analysis $Y \gets X$, it becomes the problem of finding the least-squares solution $( \color{blue} {\beta_{0} } , {\color{red}\beta_{1}} )$ when represented with a design matrix. $$ \begin{bmatrix} 5 \\ 7 \\ 9 \end{bmatrix} = \begin{bmatrix} 1 & 2.2 \\ 1 & 3.1 \\ 1 & 3.9 \end{bmatrix} \begin{bmatrix} \color{blue} {\beta_{0} } \\ {\color{red}\beta_{1}} \end{bmatrix} $$
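For reference, a minimal sketch of solving this least-squares problem with NumPy (the use of `np.linalg.lstsq` is my choice; any least-squares solver would do):

```python
import numpy as np

# Design matrix: a column of ones for the intercept, then X.
A = np.array([[1.0, 2.2],
              [1.0, 3.1],
              [1.0, 3.9]])
Y = np.array([5.0, 7.0, 9.0])

# Least-squares solution (beta_0, beta_1) of A @ beta ≈ Y.
beta, *_ = np.linalg.lstsq(A, Y, rcond=None)
print(beta)  # roughly [-0.21, 2.35], close to the guessed (1, 2)
```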
On the other hand, the artificial neural network for this problem can be configured as follows:
First, let’s assume a relationship of the form $Y = {\color{red}w} X + \color{blue}{b}$. Here, $\color{red}{w}$ is called the Weight, and $\color{blue}{b}$ is called the Bias. The node receiving the given data $\begin{bmatrix} 2.2 \\ 3.1 \\ 3.9 \end{bmatrix}$ first computes $\begin{bmatrix} {\color{red}w_{1}} 2.2 +\color{blue}{b_{1}} \\ {\color{red}w_{1}} 3.1 +\color{blue}{b_{1}} \\ {\color{red}w_{1}} 3.9 +\color{blue}{b_{1}} \end{bmatrix}$ using randomly initialized $( {\color{red}w_{1}} , \color{blue}{b_{1}} )$ and passes the result to node $Y$.
If these rough guesses are unsatisfactory, the weights are updated again and again until satisfactory results are obtained, as in the sketch below.
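The text does not specify how the weights are updated; a common choice is gradient descent on the mean squared error, sketched here under that assumption (the learning rate and iteration count are arbitrary):

```python
import numpy as np

X = np.array([2.2, 3.1, 3.9])
Y = np.array([5.0, 7.0, 9.0])

# Randomly initialized weight w_1 and bias b_1, as in the text.
rng = np.random.default_rng(0)
w, b = rng.normal(), rng.normal()
lr = 0.01  # learning rate

for _ in range(10_000):
    error = (w * X + b) - Y             # current guesses minus targets
    w -= lr * 2.0 * np.mean(error * X)  # gradient of MSE with respect to w
    b -= lr * 2.0 * np.mean(error)      # gradient of MSE with respect to b

print(w, b)  # converges toward the least-squares solution, roughly (2.35, -0.21)
```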
In this sense, artificial neural networks can be seen as implementing the concept of Machine Learning, where machines learn on their own, and this process has evolved into Deep Learning, which is more complex yet efficient.
Theoretical Aspect
Those familiar with statistics or mathematics often express a strong aversion to these techniques because of their lack of theoretical foundations. There are no known conditions that guarantee errors are minimized or that learning proceeds optimally, and often even the reasons why particular functions are used are unknown. But if a new paper’s technique improves benchmark performance, there is little to argue with.
Some scholars have attempted a rigorous mathematical treatment of these techniques, but the sad reality is that by the time such research makes progress, industry has already moved on, rendering the effort outdated. From the perspective of someone studying theory, it can seem hardly worthwhile.
Nevertheless, these techniques cannot be dismissed: the results are simply too good to distrust. Deep learning is an irresistible temptation in data science. Even if the trend passes quickly, the performance gains make it overwhelmingly worth learning, and while the field may not be as rigorous as mathematics, it is laying its own theoretical foundations, so it does no harm to keep an open mind.