
What is an Artificial Neural Network? 📂Machine Learning

Definition

A network that mimics the nervous system of living organisms is called an artificial neural network.

Mathematical Definition

  • For a scalar function $\sigma : \mathbb{R} \to \mathbb{R}$, the notation $\overline{\sigma}$ is defined as follows:

$$\overline{\sigma}(\mathbf{x}) = \begin{bmatrix} \sigma(x_{1}) \\ \sigma(x_{2}) \\ \vdots \\ \sigma(x_{n}) \end{bmatrix} \qquad \text{where } \mathbf{x} = \begin{bmatrix} x_{1} \\ x_{2} \\ \vdots \\ x_{n} \end{bmatrix}$$

In deep learning, a linear vector function $L : \mathbb{R}^{n} \to \mathbb{R}^{m}$ is referred to as a layer. A non-linear scalar function $\sigma : \mathbb{R} \to \mathbb{R}$ is referred to as an activation function.

The repeated composition of layers and activation functions is known as an artificial neural network.

$$f(\mathbf{x}) = L_{N} \circ \overline{\sigma} \circ L_{N-1} \circ \overline{\sigma} \circ \cdots \circ \overline{\sigma} \circ L_{1}(\mathbf{x})$$

Mathematical Description

An artificial neural network can be simply described as a composition of linear and non-linear functions. If $N = 1$, then $f$ is referred to as a (single-layer) perceptron. If $N$ is sufficiently large, then $f$ is called a deep neural network. When it is clear from context that we are discussing an artificial neural network, it is often simply referred to as a neural network or just a network. In Korean, it is commonly called 신경망.

The first layer $L_{1}$ is called the input layer, and the last layer $L_{N}$ is called the output layer. The intermediate layers $L_{2}, \dots, L_{N-1}$ are called hidden layers. $\mathbf{x}$ is commonly referred to as the input vector, and $f(\mathbf{x})$ as the output vector. In practice, it is common to add a constant vector $\mathbf{b}$ to the layer $L_{i}$; this is referred to as the bias.

$$\overline{\sigma}(L_{i} (\mathbf{x}) + \mathbf{b})$$

Therefore, the layer can be generalized and redefined as follows:

Let $L_{i} : \mathbb{R}^{n} \to \mathbb{R}^{m}$ be a linear transformation and let $\mathbf{b} \in \mathbb{R}^{m}$ be a fixed vector. In deep learning, an affine transformation $\mathbf{x} \mapsto L_{i}(\mathbf{x}) + \mathbf{b}$ is defined as a layer.

Let $W_{\ell}$ be the matrix representation of $L_{\ell}$. The components of $W_{\ell}$ are referred to as weights. Thus, the function value (output) of an artificial neural network can be written as repeated matrix multiplication:

$$f(\mathbf{x}) = W_{N} \overline{\sigma}(W_{N-1} \overline{\sigma}(\cdots \overline{\sigma}(W_{1} \mathbf{x}))) \tag{1}$$
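As a concrete illustration, here is a minimal NumPy sketch of evaluating $(1)$; the names `sigma_bar` and `forward` are illustrative, and $\tanh$ merely stands in for an arbitrary activation function:

```python
import numpy as np

def sigma_bar(x):
    # Elementwise application of a scalar activation; tanh is one common choice.
    return np.tanh(x)

def forward(x, weights):
    # Evaluate f(x) = W_N sigma_bar(W_{N-1} sigma_bar(... sigma_bar(W_1 x))).
    for W in weights[:-1]:
        x = sigma_bar(W @ x)  # layer, then activation
    return weights[-1] @ x    # the last layer has no activation

# Example: a network R^3 -> R^4 -> R^2 with random weights
rng = np.random.default_rng(0)
weights = [rng.standard_normal((4, 3)), rng.standard_normal((2, 4))]
print(forward(np.array([1.0, 2.0, 3.0]), weights))
```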

If there is a bias, it can be represented as follows:

$$f(\mathbf{x}) = W_{N} \overline{\sigma}(W_{N-1} \overline{\sigma}(\cdots \overline{\sigma}(W_{1} \mathbf{x} + \mathbf{b}_{1}) + \mathbf{b}_{2}) + \cdots + \mathbf{b}_{N-1}) + \mathbf{b}_{N}$$

Saying that it “can be represented” is about all there is to it; the notation is so cumbersome that it is rarely written this way. However, an affine transformation can be expressed as a single matrix multiplication with a slight trick: redefining $W_{\ell} \equiv \big[ W_{\ell}\ |\ \mathbf{b}_{\ell} \big]$ and $\mathbf{x} \equiv \big[ \mathbf{x} \quad 1 \big]^{\mathsf{T}}$ recovers the simple form of $(1)$:

$$f(\mathbf{x}) = W_{N} \overline{\sigma}(W_{N-1} \overline{\sigma}(\cdots \overline{\sigma}(W_{1} \mathbf{x})))$$
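A quick numerical check of this trick for a single layer (a sketch; the variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((2, 3))  # weight matrix of a layer
b = rng.standard_normal(2)       # bias of the same layer
x = rng.standard_normal(3)       # input vector

W_aug = np.hstack([W, b[:, None]])  # W redefined as [W | b]
x_aug = np.append(x, 1.0)           # x redefined as [x 1]^T

# The single matrix product reproduces the affine map W x + b.
assert np.allclose(W_aug @ x_aug, W @ x + b)
```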

The set of (components of) all weights and biases is referred to as the parameters of $f$, commonly denoted by $\Theta$ or $\theta$:

$$\Theta = \theta = \left\{ \text{all elements of } W_{\ell} \text{ and } \mathbf{b}_{\ell} \text{ for } \ell = 1, 2, \dots, N \right\}$$

or

$$\Theta = \theta = \left\{ W_{\ell}, \mathbf{b}_{\ell} : \ell = 1, 2, \dots, N \right\}$$

Biological Motivation

The nervous system is composed of connections between neurons. A typical neuron receives stimuli through its dendrites and transmits electrical signals along its axon. Many organisms, including humans, have evolved this simple connection of neurons to adapt to their environment. As a result, the nervous system can perform complex tasks such as detecting light, moving limbs, remembering, and imagining.


An artificial neural network is a network that mimics this structure, with nodes playing the role of cell bodies and links playing the role of axons. Each node receives information and performs a calculation to produce a meaningful output, much like a neuron.

Example

As a simple example, consider the problem of identifying the correlation of $Y := \begin{bmatrix} 5 \\ 7 \\ 9 \end{bmatrix}$ with respect to $X := \begin{bmatrix} 2.2 \\ 3.1 \\ 3.9 \end{bmatrix}$.

Since this problem is straightforward, it is easy to surmise a linear correlation like $Y \approx {\color{red}2} X + \color{blue}{1}$.

Solving this with simple linear regression $Y \gets X$ becomes the problem of finding the least squares solution $( \color{blue}{\beta_{0}}, {\color{red}\beta_{1}} )$ when expressed with a design matrix:

$$\begin{bmatrix} 5 \\ 7 \\ 9 \end{bmatrix} = \begin{bmatrix} 1 & 2.2 \\ 1 & 3.1 \\ 1 & 3.9 \end{bmatrix} \begin{bmatrix} \color{blue}{\beta_{0}} \\ {\color{red}\beta_{1}} \end{bmatrix}$$
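In NumPy, this least squares problem can be solved directly; a minimal sketch:

```python
import numpy as np

A = np.array([[1, 2.2],
              [1, 3.1],
              [1, 3.9]])         # design matrix [1 | X]
Y = np.array([5.0, 7.0, 9.0])

# Least squares solution of Y = A @ beta
beta, *_ = np.linalg.lstsq(A, Y, rcond=None)
print(beta)  # [beta_0, beta_1] ≈ [-0.21, 2.35]
```

Note that the exact least squares fit $(\beta_{0}, \beta_{1}) \approx (-0.21, 2.35)$ differs somewhat from the eyeballed guess $2X + 1$, though both describe the data well.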

An artificial neural network for this problem can be structured as a single input node $X$ connected to a single output node $Y$.


We start by assuming a relation like $Y = {\color{red}w} X + \color{blue}{b}$. Here, $\color{red}{w}$ is called the weight, and $\color{blue}{b}$ is called the bias. The node $X$, having received the given data $\begin{bmatrix} 2.2 \\ 3.1 \\ 3.9 \end{bmatrix}$ as input, first computes $\begin{bmatrix} {\color{red}w_{1}} 2.2 +\color{blue}{b_{1}} \\ {\color{red}w_{1}} 3.1 +\color{blue}{b_{1}} \\ {\color{red}w_{1}} 3.9 +\color{blue}{b_{1}} \end{bmatrix}$ with a blindly chosen random $( {\color{red}w_{1}} , \color{blue}{b_{1}} )$ and passes it on to the node $Y$.


If this randomly picked value is not satisfactory, the weights are updated with better ones, over and over, until a satisfactory result is achieved.
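A minimal sketch of this update loop, assuming plain gradient descent on the mean squared error (the learning rate and iteration count are arbitrary choices, not from the original text):

```python
import numpy as np

X = np.array([2.2, 3.1, 3.9])
Y = np.array([5.0, 7.0, 9.0])

rng = np.random.default_rng(0)
w, b = rng.standard_normal(), rng.standard_normal()  # blindly random start

lr = 0.05  # learning rate
for _ in range(5000):
    error = (w * X + b) - Y           # residuals of the current guess
    w -= lr * 2 * np.mean(error * X)  # gradient of the MSE with respect to w
    b -= lr * 2 * np.mean(error)      # gradient of the MSE with respect to b

print(w, b)  # approaches the least squares solution (w, b) ≈ (2.35, -0.21)
```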


In this sense, an artificial neural network implements the concept of machine learning, in which the machine learns on its own; deep learning carries out this process in a more sophisticated and efficient manner.

Theoretical Aspect

Learners with a background in statistics or mathematics often express strong resistance to these techniques because of their lack of theoretical foundation. It is not known under what conditions minimizing the error actually optimizes learning, and there is often no explanation for why particular functions are used. However, regardless of personal opinions, if a new technique in a new paper improves performance on benchmarks, it cannot be ignored.

There may well be scholars attempting a mathematical approach to these individual elements. However, by the time substantial progress is made, academia and industry have usually moved on and regard the approach as outdated. From a theoretician's perspective, it is not a worthwhile challenge.

Nevertheless, one cannot disregard these techniques because their performance is outstanding despite the uncertainty about their reliability. Deep learning represents an irresistible temptation in data science. Even if it might be a passing trend, its performance is impressively superior, making it worthwhile to learn. Although not as rigorous as mathematics, each field is developing its theoretical basis, so keeping an open mind could be beneficial.

See Also