logo

Levenshtein Algorithm 📂Algorithm

Levenshtein Algorithm

Algorithm

Input

Let’s denote the string as $A,B$, $A=[a_{i}]=(a_{1}, a_{2} , \cdots, a_{n})$, and $B=[b_{j}]=(b_{1}, b_{2} , \cdots, b_{m})$.


Step 1. Initialization

Create the matrix $M_{(n+1) \times (m+1)} = [m_{x y }]$ and assign $M_{11} ← 0$. Then fill the $1$ row and $1$ column as follows: $$ M_{(i+1) 1} ← i \\ M_{ 1 (j+1)} ← j $$


Step 2. Dynamic Programming

for $i = 1, 2, \cdots , n$ and $j=1,2, \cdots , m$

  if $a_{i}==b_{j}$

    $M_{i,j} ← M_{(i-1)(j-1)}$

  else

    $M_{i,j} ← \min \left\{ M_{(i-1)(j)}, M_{(i)(j-1)}, M_{(i-1)(j-1)}\right\} + 1 $


Output

The minimum edit distance between $A$ and $B$ is $m_{nm}$.

Explanation

Edit distance is a measure of similarity between two strings, showing how many times one needs to change to convert $A$ to $B$. Among these, Levenshtein distance allows three edits: insertion, deletion, and substitution, but does not allow transposition.

Example

For instance, with the strings ‘cats’ and ‘facts’, the modifications are:

  1. (substitution) cats → fats
  2. (insertion) fats → facts

Thus, the total modifications required equals $2$.

However, such edit distances can be measured as more extended as inefficient methods could be used, such as cats → ats → fats → facts. The Levenshtein algorithm provides a way to calculate this distance in the smallest possible length.

Code

R

20180407\_081358.png

This screenshot is an example solved with the following R code.

LED<-function(A,B)
{
  A<-strsplit(A,'')[[1]]
  B<-strsplit(B,'')[[1]]
  lA<-length(A)
  lB<-length(B)
  
  M<-matrix(NA,ncol=lA+1,nrow=lB+1,dimnames = list(c('',B),c('',A)))
  M[1,]<-0:lA
  M[,1]<-0:lB
  
  for(i in (1:lB)+1)
  {
    for(j in (1:lA)+1)
    {
      if (B[i-1]==A[j-1])
      {
        M[i,j]<-M[i-1,j-1]
      }
      else
      {
        M[i,j]<-min(M[i-1,j-1],M[i,j-1],M[i-1,j])+1
      }
    }
  }
 
  return(list(distance=c(M[lB+1,lA+1]),matrix=M))
}
 
LED("cats","facts")

Python

The following is the same code written in Python.

def ED(a,b) :
    a = ":".join(a)
    A = a.split(":")
    a = len(A)
    b = ":".join(b)
    B = b.split(":")
    b = len(B)
    M = [[j for i in range(a+1)] for j in range(b+1)]
    M[0] = [i for i in range(a+1)]
    for i in (range(1,b+1)) :
        for j in (range(1,a+1)) :
            if B[i-1]==A[j-1] :
                M[i][j] = M[i-1][j-1]
            else :
                M[i][j] = min(M[i-1][j-1],M[i][j-1],M[i-1][j]) + 1
    return M[b][a]