Perceptron Definition
Definition
A perceptron is defined as the composition of a linear function $f(x) = wx + b$ and a unit step function $H$.
$$ \text{Perceptron} := H \circ f (x) = H(wx + b) $$
In the case of a multivariable function, $f(\mathbf{x}) = \mathbf{w}\cdot \mathbf{x} + b = w_{1}x_{1} + \cdots + w_{n}x_{n} + b$ and,
$$ \text{Perceptron} := H \circ f (\mathbf{x}) = H(\mathbf{w} \cdot \mathbf{x} + b) $$
This is also referred to as a single layer perceptron.
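As a concrete illustration of this definition, here is a minimal sketch in Python; the function names `H` and `perceptron` and the example values of $\mathbf{w}$, $b$, and $\mathbf{x}$ are chosen purely for illustration.

```python
import numpy as np

def H(z):
    """Unit step function: 1 if z > 0, else 0."""
    return np.where(z > 0, 1, 0)

def perceptron(x, w, b):
    """Composition H(w . x + b): a linear function followed by the unit step."""
    return H(np.dot(w, x) + b)

# Illustrative values: w = (2, -1), b = 0.5, x = (1, 3)
w = np.array([2.0, -1.0])
b = 0.5
x = np.array([1.0, 3.0])
print(perceptron(x, w, b))  # w . x + b = 2 - 3 + 0.5 = -0.5 < 0, so the output is 0
```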
Description
The perceptron was devised by Rosenblatt in 1957 and is one of the earliest models of supervised learning. In machine learning it is usually described in terms of weights, biases, and activation functions, but it is simply the composition of a linear function and a step function: the weights are the coefficients of the linear terms, and the bias is the constant term.
The unit step function can be defined in whichever form is convenient.
$$ H(x) = \begin{cases} 1 & x \gt 0 \\ 0 & x \lt 0\end{cases} \quad \text{or} \quad H(x) = \begin{cases} 1 & x \gt 0 \\ -1 & x \lt 0\end{cases} $$
Furthermore, $H$ need not be the unit step function; any other function that serves as an activation function works as well. Generically speaking, then, a perceptron is the composition of a linear function and an activation function.
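The two step conventions above and the freedom to swap in another activation can be sketched as follows; this is a hedged example, and the names `step_01`, `step_pm1`, and the choice of the logistic sigmoid as the alternative activation are illustrative, not taken from the text.

```python
import numpy as np

# Two common conventions for the unit step (the value at exactly 0 is a matter of
# convention, since the definitions above leave it unspecified).
def step_01(z):
    return np.where(z > 0, 1, 0)    # outputs in {0, 1}

def step_pm1(z):
    return np.where(z > 0, 1, -1)   # outputs in {1, -1}

# Any other activation can take the place of H, e.g. the logistic sigmoid.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def perceptron(x, w, b, activation=step_pm1):
    """Composition of the linear function w . x + b and a chosen activation."""
    return activation(np.dot(w, x) + b)
```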
The input is classified according to whether the value $\mathbf{w}\cdot\mathbf{x}+b$ is greater or less than $0$. In the two-variable case, the decision boundary can therefore be rearranged as follows.
$$ \begin{align*} && w_{1}x_{1} +w_{2}x_{2}+b &= 0 \\ \implies && w_{2}x_{2} &= -w_{1}x_{1}-b \\ \implies && x_{2} &= -\frac{w_{1}}{w_{2}}x_{1}-\frac{b}{w_{2}} \end{align*} $$
This is precisely the equation of a line in 2D space. When the input data has dimension $N$, the perceptron can be written as follows.
$$ \begin{align*} \hat{y} &= \varphi \left( \begin{bmatrix} b & w_{1} & \dots & w_{N} \end{bmatrix} \begin{bmatrix} 1 & x_{1} & \dots & x_{N} \end{bmatrix}^{T} \right) \\ &= \varphi \left( \mathbf{w}^{T} \mathbf{x} \right) \end{align*} $$
where $\mathbf{w} = \begin{bmatrix} b & w_{1} & \dots & w_{N} \end{bmatrix}^{T}$ and $\mathbf{x} = \begin{bmatrix} 1 & x_{1} & \dots & x_{N} \end{bmatrix}^{T}$. Therefore, the learning goal of the perceptron is to draw a line that can distinguish between the two classes of data, as shown in the figure below.
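Numerically, the rearrangement into a line and the augmented form $\varphi(\mathbf{w}^{T}\mathbf{x})$ can be checked as in the sketch below; the values $w_{1}=2$, $w_{2}=1$, $b=-1$ and the test point are illustrative, not taken from the text.

```python
import numpy as np

# Illustrative 2D weights and bias: decision boundary w1*x1 + w2*x2 + b = 0
w1, w2, b = 2.0, 1.0, -1.0

# Rearranged as a line x2 = -(w1/w2)*x1 - b/w2
slope = -w1 / w2        # -2.0
intercept = -b / w2     #  1.0

# Augmented form: fold the bias into the weight vector and prepend 1 to the input.
w_aug = np.array([b, w1, w2])            # w = (b, w1, ..., wN)^T
x = np.array([0.5, 0.25])                # an arbitrary input point
x_aug = np.concatenate(([1.0], x))       # x = (1, x1, ..., xN)^T

# Both expressions give the same value of the linear function.
assert np.isclose(w_aug @ x_aug, w1 * x[0] + w2 * x[1] + b)
```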
Then, the weight vector $\mathbf{w}$ is represented as shown in the following figure.
As the figure shows, the angle between $\mathbf{w}$ and every blue point is less than 90 degrees, so the dot product is positive, while the angle with every green point is greater than 90 degrees, so the dot product is negative. Note that such a $\mathbf{w}$ is not unique. When a line that separates the data in this way exists, as in the figure below [1], the data are said to be linearly separable.
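The sign argument can likewise be checked on a small hand-made example; the weight vector and the points below are illustrative stand-ins for the blue and green points in the figure.

```python
import numpy as np

# Illustrative augmented weight vector (b, w1, w2) and augmented points (1, x1, x2).
w = np.array([-0.5, 1.0, 1.0])

blue  = np.array([[1.0, 1.0, 1.0], [1.0, 2.0, 0.5]])    # points on the positive side
green = np.array([[1.0, -1.0, -0.5], [1.0, 0.0, 0.0]])  # points on the negative side

print(blue @ w)   # [1.5 2. ]   all positive: angle with w is less than 90 degrees
print(green @ w)  # [-2.  -0.5] all negative: angle with w is more than 90 degrees
```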
Learning Method
The learning method is simple. Suppose the $n$th input $\mathbf{x}(n)$ should be classified as $-1$, but the result is $\hat{y}(n)=\varphi \left( \mathbf{w}^{T}(n)\mathbf{x}(n) \right)=1$. This means that with the current weights $\mathbf{w}^{T}(n)\mathbf{x}(n)$ is positive, and for the perceptron to predict correctly, $\mathbf{w}^{T}(n)\mathbf{x}(n)$ needs to become negative. Therefore, some value should be subtracted from $\mathbf{w}^{T}(n)\mathbf{x}(n)$. Making the update depend on the input data $\mathbf{x}(n)$, this can be written as follows.

$$ \begin{align*} && \mathbf{w}^{T}(n+1)\mathbf{x}(n) &= \mathbf{w}^{T}(n)\mathbf{x}(n)-\eta\mathbf{x}^{T}(n)\mathbf{x}(n) \\ \implies && \mathbf{w}(n+1) &= \mathbf{w}(n) -\eta \mathbf{x}(n) \end{align*} $$

Here, $\eta>0$ is a constant called the learning rate. By the same reasoning, in the opposite case, where an $\mathbf{x}(n)$ labeled $+1$ is predicted to be $-1$, the weights are updated as follows.

$$ \mathbf{w}(n+1) = \mathbf{w}(n) +\eta \mathbf{x}(n) $$

The two cases can be written as a single rule.

$$ \mathbf{w}(n+1) = \mathbf{w}(n) +\eta y(n) \mathbf{x}(n) $$

Here, $y(n)$ is the label (correct answer) of $\mathbf{x}(n)$. The actual process of perceptron learning for 2D and 3D data can be seen in the following animations.
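As a complement to the animations, the update rule $\mathbf{w}(n+1) = \mathbf{w}(n) + \eta y(n)\mathbf{x}(n)$ can be put into a minimal training-loop sketch; the toy data, learning rate, and epoch cap below are illustrative choices, and the loop is only guaranteed to terminate when the data are linearly separable.

```python
import numpy as np

def train_perceptron(X, y, eta=0.1, epochs=100):
    """Perceptron learning rule: w(n+1) = w(n) + eta * y(n) * x(n) on misclassified samples.

    X : (num_samples, num_features) inputs; y : labels in {-1, +1}.
    The bias is handled by augmenting each input with a leading 1.
    """
    X_aug = np.hstack([np.ones((X.shape[0], 1)), X])   # x = (1, x1, ..., xN)
    w = np.zeros(X_aug.shape[1])                       # w = (b, w1, ..., wN)
    for _ in range(epochs):
        errors = 0
        for x_n, y_n in zip(X_aug, y):
            y_hat = 1 if w @ x_n > 0 else -1
            if y_hat != y_n:                           # update only on mistakes
                w += eta * y_n * x_n
                errors += 1
        if errors == 0:                                # all points classified correctly
            break
    return w

# Toy linearly separable data (illustrative).
X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -1.0], [-2.0, 1.0]])
y = np.array([1, 1, -1, -1])
w = train_perceptron(X, y)
print(w)  # learned (b, w1, w2)
```

Updating only on misclassified samples is equivalent to applying the rule whenever $\hat{y}(n) \neq y(n)$; correctly classified points leave $\mathbf{w}$ unchanged.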
[1] Simon Haykin. (2009). Neural Networks and Learning Machines (3rd Edition).