
Perceptron Definition

Definition

A perceptron is defined as the composition of a linear function $f(x) = wx + b$ and a unit step function $H$.

$$ \text{Perceptron} := H \circ f (x) = H(wx + b) $$

In the case of a multivariable function, $f(\mathbf{x}) = \mathbf{w} \cdot \mathbf{x} + b = w_{1}x_{1} + \cdots + w_{n}x_{n} + b$ and,

$$ \text{Perceptron} := H \circ f (\mathbf{x}) = H(\mathbf{w} \cdot \mathbf{x} + b) $$

This is also referred to as a single-layer perceptron.
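
As a concrete illustration of this composition (not part of the original definition), here is a minimal sketch in Python with NumPy; the names `heaviside_step` and `perceptron` and the numerical values are chosen only for this example.

```python
import numpy as np

def heaviside_step(z):
    """Unit step function H: returns 1 if z > 0, otherwise 0."""
    return 1 if z > 0 else 0

def perceptron(x, w, b):
    """Single-layer perceptron: the composition H(w . x + b)."""
    return heaviside_step(np.dot(w, x) + b)

# Example: two inputs, fixed weights and bias
x = np.array([0.5, -1.0])
w = np.array([2.0, 1.0])
b = 0.3
print(perceptron(x, w, b))  # w . x + b = 1.0 - 1.0 + 0.3 = 0.3 > 0, so the output is 1
```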

Description

The perceptron was devised by Rosenblatt in 1957 and is the first model of supervised learning. In machine learning it is often explained with seemingly complicated terms such as weights, biases, and activation functions, but it is simply the composition of a linear function and a step function. The weights are the coefficients of the linear terms, and the bias is the constant term.

The unit step function may be defined in whichever of the following forms is convenient.

$$ H(x) = \begin{cases} 1 & x \gt 0 \\ 0 & x \lt 0 \end{cases} \quad \text{or} \quad H(x) = \begin{cases} 1 & x \gt 0 \\ -1 & x \lt 0 \end{cases} $$

Furthermore, $H$ need not be the unit step function; any other function that can serve as an activation function works just as well. Generally speaking, then, a perceptron is the composition of a linear function and an activation function.

Data are classified according to whether this value is greater than or less than $0$. Setting the linear part equal to $0$ and rearranging gives the following.

$$ \begin{align*} && w_{1}x_{1} + w_{2}x_{2} + b &= 0 \\ \implies && w_{2}x_{2} &= -w_{1}x_{1} - b \\ \implies && x_{2} &= -\frac{w_{1}}{w_{2}}x_{1} - \frac{b}{w_{2}} \end{align*} $$

This is just the equation of a line in 2D space. When the input data is of dimension $N$, the perceptron can be represented as follows.

$$ \begin{align*} \hat{y} &= \varphi \left( \begin{bmatrix} b & w_{1} & \dots & w_{N} \end{bmatrix} \begin{bmatrix} 1 & x_{1} & \dots & x_{N} \end{bmatrix}^{T} \right) \\ &= \varphi \left( \mathbf{w}^{T} \mathbf{x} \right) \end{align*} $$

Here, $\mathbf{w} = \begin{bmatrix} b & w_{1} & \dots & w_{N} \end{bmatrix}^{T}$ and $\mathbf{x} = \begin{bmatrix} 1 & x_{1} & \dots & x_{N} \end{bmatrix}^{T}$, so the bias is absorbed into the weight vector. Therefore, the learning goal of the perceptron is to draw a line that can distinguish between the two types of data, as shown in the figure below.

[Figure: 1.PNG]
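
The following short sketch (with made-up numbers, continuing the example above) shows that prepending a $1$ to the input and the bias to the weights gives the same value as computing $\mathbf{w} \cdot \mathbf{x} + b$ directly.

```python
import numpy as np

x = np.array([0.5, -1.0])              # original input of dimension N = 2
w = np.array([2.0, 1.0])               # weights
b = 0.3                                # bias

x_aug = np.concatenate(([1.0], x))     # [1, x_1, ..., x_N]
w_aug = np.concatenate(([b], w))       # [b, w_1, ..., w_N]

print(np.dot(w, x) + b)                # 0.3
print(np.dot(w_aug, x_aug))            # 0.3 -- the same value, now written as w^T x
```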

Then, the weight vector $\mathbf{w}$ can be drawn as in the following figure.

[Figure: 3.PNG]

As shown in the figure, the angle between $\mathbf{w}$ and every blue dot is within 90 degrees, so the dot product is positive, while the angle with every green dot is more than 90 degrees, so the dot product is negative. Of course, such a $\mathbf{w}$ is not unique. When a line separating the data in this way can be drawn, as in the figure below¹, the data are said to be linearly separable.

[Figure: 2.PNG]
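
To make the geometry concrete, here is a small sketch under assumed, illustrative values of $\mathbf{w}$ and $b$ and hypothetical data points: it computes the boundary line $x_{2} = -\frac{w_{1}}{w_{2}}x_{1} - \frac{b}{w_{2}}$ and checks that points on either side give dot products of opposite sign.

```python
import numpy as np

# A hypothetical separating weight vector and bias (illustrative values only)
w = np.array([1.0, 1.0])
b = -1.0

# Decision boundary in the plane: x2 = -(w1/w2) * x1 - b/w2
slope, intercept = -w[0] / w[1], -b / w[1]
print(f"boundary: x2 = {slope} * x1 + {intercept}")   # x2 = -1.0 * x1 + 1.0

# Points on either side of the line get opposite signs of w . x + b
blue  = np.array([[1.0, 1.0], [2.0, 0.5]])    # above the line
green = np.array([[0.0, 0.0], [0.5, -0.5]])   # below the line
print(blue @ w + b)    # positive values -> classified as +1
print(green @ w + b)   # negative values -> classified as -1 (or 0)
```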

Learning Method

The learning method is simple. Suppose the $n$th input data $\mathbf{x}(n)$ should be classified as $-1$, but the result is $\hat{y}(n) = \varphi \left( \mathbf{w}^{T}(n)\mathbf{x}(n) \right) = 1$. This means that for the current weights, $\mathbf{w}^{T}(n)\mathbf{x}(n)$ is positive, and for the perceptron to predict correctly, the value of $\mathbf{w}^{T}(n)\mathbf{x}(n)$ needs to become negative. Therefore, some value should be subtracted from $\mathbf{w}^{T}(n)\mathbf{x}(n)$. Making the size of the update depend on the input data $\mathbf{x}(n)$, this can be written as follows.

$$ \begin{align*} && \mathbf{w}^{T}(n+1)\mathbf{x}(n) &= \mathbf{w}^{T}(n)\mathbf{x}(n) - \eta \mathbf{x}^{T}(n)\mathbf{x}(n) \\ \implies && \mathbf{w}(n+1) &= \mathbf{w}(n) - \eta \mathbf{x}(n) \end{align*} $$

Here, $\eta > 0$ is a constant called the learning rate. By the same reasoning, in the opposite case where $\mathbf{x}$ with label $+1$ is predicted to be $-1$, the weights are updated as follows.

$$ \mathbf{w}(n+1) = \mathbf{w}(n) + \eta \mathbf{x} $$

The two cases can be expressed at once as follows.

$$ \mathbf{w}(n+1) = \mathbf{w}(n) + \eta y(n) \mathbf{x}(n) $$

Here, $y(n)$ is the label (correct answer) of $\mathbf{x}(n)$. The actual process of perceptron learning on 2D and 3D data can be seen in the following animations.

[Animation: perceptron3.gif]

[Animation: perceptron3d10.gif]
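
Below is a minimal sketch of this learning rule, not the original article's code: it applies $\mathbf{w}(n+1) = \mathbf{w}(n) + \eta\, y(n)\, \mathbf{x}(n)$ only to misclassified samples, using the augmented vectors from above. The function name `train_perceptron`, the toy data, and the parameter values are assumptions for illustration.

```python
import numpy as np

def sign(z):
    """Step function mapping to the labels {+1, -1}."""
    return 1 if z > 0 else -1

def train_perceptron(X, y, eta=0.1, epochs=100):
    """Perceptron learning rule: w(n+1) = w(n) + eta * y(n) * x(n) on misclassified samples.
    X has shape (num_samples, N); a leading 1 is prepended to each row so w[0] acts as the bias."""
    X_aug = np.hstack([np.ones((X.shape[0], 1)), X])
    w = np.zeros(X_aug.shape[1])
    for _ in range(epochs):
        errors = 0
        for x_n, y_n in zip(X_aug, y):
            if sign(np.dot(w, x_n)) != y_n:   # misclassified sample
                w = w + eta * y_n * x_n       # update rule
                errors += 1
        if errors == 0:                       # no mistakes left: the data are separated
            break
    return w

# Toy linearly separable data (illustrative)
X = np.array([[2.0, 2.0], [1.5, 2.5], [-1.0, -1.5], [-2.0, -0.5]])
y = np.array([1, 1, -1, -1])
w = train_perceptron(X, y)
print(w, [sign(np.dot(w, np.concatenate(([1.0], x)))) for x in X])
```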


  1. Simon Haykin. (2009). Neural Networks and Learning Machines (3rd Edition). ↩︎