Levenshtein Algorithm
Algorithm
Input
Let’s denote the string as , , and .
Step 1. Initialization
Create the matrix and assign . Then fill the row and column as follows:
Step 2. Dynamic Programming
for and
if
else
Output
The minimum edit distance between and is .
Explanation
Edit distance is a measure of similarity between two strings, showing how many times one needs to change to convert to . Among these, Levenshtein distance allows three edits: insertion, deletion, and substitution, but does not allow transposition.
Example
For instance, with the strings ‘cats’ and ‘facts’, the modifications are:
- (substitution) cats → fats
- (insertion) fats → facts
Thus, the total modifications required equals .
However, such edit distances can be measured as more extended as inefficient methods could be used, such as cats → ats → fats → facts. The Levenshtein algorithm provides a way to calculate this distance in the smallest possible length.
Code
R
This screenshot is an example solved with the following R code.
LED<-function(A,B)
{
A<-strsplit(A,'')[[1]]
B<-strsplit(B,'')[[1]]
lA<-length(A)
lB<-length(B)
M<-matrix(NA,ncol=lA+1,nrow=lB+1,dimnames = list(c('',B),c('',A)))
M[1,]<-0:lA
M[,1]<-0:lB
for(i in (1:lB)+1)
{
for(j in (1:lA)+1)
{
if (B[i-1]==A[j-1])
{
M[i,j]<-M[i-1,j-1]
}
else
{
M[i,j]<-min(M[i-1,j-1],M[i,j-1],M[i-1,j])+1
}
}
}
return(list(distance=c(M[lB+1,lA+1]),matrix=M))
}
LED("cats","facts")
Python
The following is the same code written in Python.
def ED(a,b) :
a = ":".join(a)
A = a.split(":")
a = len(A)
b = ":".join(b)
B = b.split(":")
b = len(B)
M = [[j for i in range(a+1)] for j in range(b+1)]
M[0] = [i for i in range(a+1)]
for i in (range(1,b+1)) :
for j in (range(1,a+1)) :
if B[i-1]==A[j-1] :
M[i][j] = M[i-1][j-1]
else :
M[i][j] = min(M[i-1][j-1],M[i][j-1],M[i-1][j]) + 1
return M[b][a]