Levenshtein Algorithm
Algorithm
Input
Let’s denote the string as $A,B$, $A=[a_{i}]=(a_{1}, a_{2} , \cdots, a_{n})$, and $B=[b_{j}]=(b_{1}, b_{2} , \cdots, b_{m})$.
Step 1. Initialization
Create the matrix $M_{(n+1) \times (m+1)} = [m_{x y }]$ and assign $M_{11} ← 0$. Then fill the $1$ row and $1$ column as follows: $$ M_{(i+1) 1} ← i \\ M_{ 1 (j+1)} ← j $$
Step 2. Dynamic Programming
for $i = 1, 2, \cdots , n$ and $j=1,2, \cdots , m$
if $a_{i}==b_{j}$
$M_{i,j} ← M_{(i-1)(j-1)}$
else
$M_{i,j} ← \min \left\{ M_{(i-1)(j)}, M_{(i)(j-1)}, M_{(i-1)(j-1)}\right\} + 1 $
Output
The minimum edit distance between $A$ and $B$ is $m_{nm}$.
Explanation
Edit distance is a measure of similarity between two strings, showing how many times one needs to change to convert $A$ to $B$. Among these, Levenshtein distance allows three edits: insertion, deletion, and substitution, but does not allow transposition.
Example
For instance, with the strings ‘cats’ and ‘facts’, the modifications are:
- (substitution) cats → fats
- (insertion) fats → facts
Thus, the total modifications required equals $2$.
However, such edit distances can be measured as more extended as inefficient methods could be used, such as cats → ats → fats → facts. The Levenshtein algorithm provides a way to calculate this distance in the smallest possible length.
Code
R
This screenshot is an example solved with the following R code.
LED<-function(A,B)
{
A<-strsplit(A,'')[[1]]
B<-strsplit(B,'')[[1]]
lA<-length(A)
lB<-length(B)
M<-matrix(NA,ncol=lA+1,nrow=lB+1,dimnames = list(c('',B),c('',A)))
M[1,]<-0:lA
M[,1]<-0:lB
for(i in (1:lB)+1)
{
for(j in (1:lA)+1)
{
if (B[i-1]==A[j-1])
{
M[i,j]<-M[i-1,j-1]
}
else
{
M[i,j]<-min(M[i-1,j-1],M[i,j-1],M[i-1,j])+1
}
}
}
return(list(distance=c(M[lB+1,lA+1]),matrix=M))
}
LED("cats","facts")
Python
The following is the same code written in Python.
def ED(a,b) :
a = ":".join(a)
A = a.split(":")
a = len(A)
b = ":".join(b)
B = b.split(":")
b = len(B)
M = [[j for i in range(a+1)] for j in range(b+1)]
M[0] = [i for i in range(a+1)]
for i in (range(1,b+1)) :
for j in (range(1,a+1)) :
if B[i-1]==A[j-1] :
M[i][j] = M[i-1][j-1]
else :
M[i][j] = min(M[i-1][j-1],M[i][j-1],M[i-1][j]) + 1
return M[b][a]