
Gradient Descent in Mathematics

Definition 1

A scalar function $\varphi : \mathbb{R}^{n} \to \mathbb{R}$ is called a Cost Function. An algorithm that, at $\mathbf{x} = \mathbf{x}_{n}$, finds an $\mathbf{x}_{n+1}$ satisfying $\varphi ( \mathbf{x}_{n+1} ) < \varphi ( \mathbf{x}_{n} )$ in order to minimize the cost function $\varphi ( \mathbf{x} )$ is called a Descent Method.

Explanation

Let’s consider building a house as an example of a cost function $\varphi$. The resources required for building a house include wood, stone, steel, glass, labor, real estate, and so on, all of which ultimately boil down to the question of ‘how much money needs to be spent’. In this scenario, $\varphi$ is the function that maps the vector of resources to a scalar cost, and a natural question is ‘what is the minimum cost’. Meanwhile, in fields like machine learning, the cost function is also referred to as a loss function, and the question becomes ‘when is the difference between the actual values and the predicted values minimized’.

No matter the problem, it can be abstractly summarized as ‘finding the minimum value of a scalar function’. The descent method solves this by finding a local minimum of the scalar function on an $n$-dimensional manifold. It is important to note that it finds a local minimum, not necessarily the global minimum. If $\varphi$ is convex, any local minimum is also the global minimum, but in general the descent method can only guarantee finding local minima.
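As a small illustration of this caveat, the sketch below runs a plain gradient descent loop (the update rule itself is introduced in the next section) on the non-convex function $f(x) = x^{4} - 3x^{2} + x$, which has two local minima. The function, starting points, step size, and iteration count are chosen here purely for illustration and are not part of the original example.

library(numDeriv)

# f is non-convex: it has a local minimum near x ≈ 1.13
# and its global minimum near x ≈ -1.30
f <- function(x) x^4 - 3*x^2 + x

descend <- function(f, x0, alpha = 0.01, iters = 1000) {
  for (i in 1:iters) {
    x0 <- x0 - alpha * grad(f, x0)
  }
  return(x0)
}

descend(f, -2)  # converges to the global minimum, x ≈ -1.30
descend(f,  2)  # converges to the other local minimum, x ≈ 1.13

Which minimum the iteration reaches depends entirely on the starting point; neither run ‘knows’ whether the point it found is the global minimum.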

Gradient Descent

Among the various descent methods, gradient descent is the most widely used. It finds a local minimum by using the gradient of the cost function. For practical applications of this method, see Gradient Descent in Machine Learning.

For an appropriate $\alpha>0$, the iteration $\mathbf{x}_{n+1} := \mathbf{x}_{n} - \alpha \nabla \varphi ( \mathbf{x}_{n} )$ is called Gradient Descent.

$- \nabla \varphi ( \mathbf{x}_{n} )$ points in the direction in which $\varphi$ decreases most steeply, so if $\alpha$ is chosen suitably, the sequence $\left\{ \mathbf{x}_{n} \right\}$ can be expected to converge to a local minimum.
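As a concrete illustration, take $\varphi (x, y) = (x-2)^{2} + (y-3)^{2}$, the same cost function used in the code further down, so that $\nabla \varphi (x, y) = \left( 2(x-2), 2(y-3) \right)$. The starting point $(0,0)$ and step size $\alpha = 0.1$ below are chosen only for this worked example. One step of the update gives

$$ \mathbf{x}_{1} = \mathbf{x}_{0} - \alpha \nabla \varphi ( \mathbf{x}_{0} ) = (0, 0) - 0.1 \cdot (-4, -6) = (0.4, 0.6), $$

and the cost drops from $\varphi (0, 0) = 13$ to $\varphi (0.4, 0.6) = 8.32$; repeating the step moves the iterate steadily toward the minimizer $(2, 3)$.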

However, while this method is simple and easy to implement, it is not necessarily fast compared to other methods in terms of convergence speed, and it is a greedy algorithm: it chooses the locally steepest decrease at each step, which is not necessarily the best strategy globally.

Code

Below is an implementation of gradient descent in R, using the numDeriv package to compute the gradient.

library(numDeriv)  # provides grad() for numerical gradients

# Gradient descent: repeat x0 <- x0 - alpha * grad(f, x0)
# until |f(x0)| is small (this stopping rule assumes the minimum value of f is 0)
optimizer <- function(f, x0, alpha = 0.01) {
  while (abs(f(x0)) > 10^(-8)) {
    x0 <- x0 - alpha * grad(f, x0)
  }
  return(x0)
}

# Cost function z(x, y) = (x - 2)^2 + (y - 3)^2, minimized at (2, 3)
z <- function(v) {
  return((v[1] - 2)^2 + (v[2] - 3)^2)
}

optimizer(z, c(0, 0))       # returns a point close to (2, 3)
z(c(1.999945, 2.999918))    # cost at the computed minimizer is nearly 0
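One caveat: the stopping rule $|f(x_{0})| < 10^{-8}$ above terminates only because the minimum value of $z$ happens to be $0$; for a cost function whose minimum is nonzero, the loop would never stop. A safer, still minimal, variant is to stop when the gradient norm is small. The tolerance, iteration cap, and the name optimizer2 below are arbitrary choices for this sketch, not part of the original code.

library(numDeriv)

optimizer2 <- function(f, x0, alpha = 0.01, tol = 1e-8, max_iter = 1e5) {
  for (i in 1:max_iter) {
    g <- grad(f, x0)
    if (sqrt(sum(g^2)) < tol) break  # stop at a (numerically) stationary point
    x0 <- x0 - alpha * g
  }
  return(x0)
}

optimizer2(z, c(0, 0))  # using z defined above; also returns a point near (2, 3)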

The result of executing the above code is as follows:

(Image: console output of running the code above)

This is a simple example of finding the point that minimizes $z(x,y) = (x-2)^{2} + (y-3)^{2}$. On the surface shown below, the point where $z$ attains its minimum is $(2,3)$.

(Image: surface plot of $z(x,y) = (x-2)^{2} + (y-3)^{2}$ with its minimum at $(2,3)$)

  1. Atkinson. (1989). An Introduction to Numerical Analysis (2nd Edition): p113. ↩︎