What is an Artificial Neural Network?
Definition
An artificial neural network (ANN) is a network that mimics the nervous system of living organisms.
Mathematical Definition
- In deep learning, a vector-valued function $W : \mathbb{R}^{n} \to \mathbb{R}^{m}$ is referred to as a layer.
- In deep learning, a nonlinear scalar function $\sigma : \mathbb{R} \to \mathbb{R}$ is referred to as an activation function.
- The composition $\sigma \circ W$ of layers and activation functions, with $\sigma$ applied componentwise, is called an artificial neural network.
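As a minimal sketch of this definition (the choice of NumPy, of an affine layer, and of ReLU as the activation are my own assumptions, not part of the definition above):

```python
import numpy as np

def layer(x, W, b):
    """A layer R^n -> R^m, here taken to be affine: x -> Wx + b."""
    return W @ x + b

def sigma(z):
    """A nonlinear activation function, applied componentwise (here: ReLU)."""
    return np.maximum(z, 0.0)

# The network is the composition sigma ∘ W.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 2))  # maps R^2 -> R^3
b = rng.normal(size=3)
x = np.array([1.0, -2.0])

y = sigma(layer(x, W, b))
print(y)
```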
Motivation
The nervous system is composed of neurons. A nerve cell body receives stimuli through its dendrites and transmits electrical signals through its axon. Many organisms, including humans, have evolved these simple neuronal connections to suit their environments. As a result, the nervous system is capable of complex tasks such as detecting light, moving legs, remembering, or imagining.
An artificial neural network mimics this structure, with the nerve cell body as a node and the axon as a link. Each node, like a nerve cell body, receives and transmits information, performing computations that can yield meaningful results.
Example
As a simple example, consider the problem of understanding the correlation between the data $Y := \begin{bmatrix} 5 \\ 7 \\ 9 \end{bmatrix}$ and $X := \begin{bmatrix} 2.2 \\ 3.1 \\ 3.9 \end{bmatrix}$.
Since this problem is quite easy, one can guess without difficulty that there is a linear correlation like $Y \approx {\color{red}2} X + \color{blue}{1}$.
If we solve this problem through simple regression analysis $Y \gets X$, it becomes the problem of finding the least-squares solution $( \color{blue} {\beta_{0} } , {\color{red}\beta_{1}} )$ when represented with a design matrix. $$ \begin{bmatrix} 5 \\ 7 \\ 9 \end{bmatrix} = \begin{bmatrix} 1 & 2.2 \\ 1 & 3.1 \\ 1 & 3.9 \end{bmatrix} \begin{bmatrix} \color{blue} {\beta_{0} } \\ {\color{red}\beta_{1}} \end{bmatrix} $$
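For reference, a minimal sketch of solving this least-squares problem with NumPy (the use of `np.linalg.lstsq` is my choice; any least-squares solver would do):

```python
import numpy as np

# Design matrix: a column of ones for the intercept, then X.
A = np.array([[1.0, 2.2],
              [1.0, 3.1],
              [1.0, 3.9]])
Y = np.array([5.0, 7.0, 9.0])

# Least-squares solution (beta_0, beta_1) of A @ beta ≈ Y.
beta, *_ = np.linalg.lstsq(A, Y, rcond=None)
print(beta)  # roughly [-0.21, 2.35], close to the guessed (1, 2)
```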
On the other hand, the artificial neural network for this problem can be configured as follows:
First, let’s assume a relationship of the form $Y = {\color{red}w} X + \color{blue}{b}$. Here, $\color{red}{w}$ is called the Weight, and $\color{blue}{b}$ is called the Bias. The node receiving the given data $\begin{bmatrix} 2.2 \\ 3.1 \\ 3.9 \end{bmatrix}$ first computes $\begin{bmatrix} {\color{red}w_{1}} 2.2 +\color{blue}{b_{1}} \\ {\color{red}w_{1}} 3.1 +\color{blue}{b_{1}} \\ {\color{red}w_{1}} 3.9 +\color{blue}{b_{1}} \end{bmatrix}$ using randomly initialized $( {\color{red}w_{1}} , \color{blue}{b_{1}} )$ and passes the result to node $Y$.
If these rough guesses are unsatisfactory, the weights are updated again and again until satisfactory results are obtained, as in the sketch below.
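The text does not specify how the weights are updated; a common choice is gradient descent on the mean squared error, sketched here under that assumption (the learning rate and iteration count are arbitrary):

```python
import numpy as np

X = np.array([2.2, 3.1, 3.9])
Y = np.array([5.0, 7.0, 9.0])

# Randomly initialized weight w_1 and bias b_1, as in the text.
rng = np.random.default_rng(0)
w, b = rng.normal(), rng.normal()
lr = 0.01  # learning rate

for _ in range(10_000):
    error = (w * X + b) - Y             # current guesses minus targets
    w -= lr * 2.0 * np.mean(error * X)  # gradient of MSE with respect to w
    b -= lr * 2.0 * np.mean(error)      # gradient of MSE with respect to b

print(w, b)  # converges toward the least-squares solution, roughly (2.35, -0.21)
```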
In this sense, artificial neural networks can be seen as implementing the concept of Machine Learning, where machines learn on their own, and this process has evolved into Deep Learning, which is more complex yet efficient.
Theoretical Aspect
Those familiar with statistics or mathematics often express a strong aversion to these techniques because of their lack of theoretical foundations. There are no known conditions that guarantee errors are minimized or that learning proceeds optimally, and often even the reasons why particular functions are used are unknown. But if a new paper’s technique improves benchmark performance, there is little to argue with.
Some scholars have attempted a rigorous mathematical treatment of these techniques, but the sad reality is that by the time such research makes progress, industry has already moved on, rendering the effort outdated. From the perspective of someone studying theory, it can seem hardly worthwhile.
Nevertheless, these techniques cannot be dismissed: the results are simply too good to distrust. Deep learning is an irresistible temptation in data science. Even if the trend passes quickly, the performance gains make it overwhelmingly worth learning, and while the field may not be as rigorous as mathematics, it is laying its own theoretical foundations, so it does no harm to keep an open mind.