Sunday, September 2, 2007

Compute the similarity between two words using Levenshtein distance algorithm

 

Compute the similarity between two words using Levenshtein distance algorithm

In order to check similarties between two strings many methods have been implemented. Levenshtein method is famous, and invented by Russian scientist called Vladimir Levenshtein,in 1965. 'It is implemented in PHP languge 'It is faster than any other methods algorithm but maybe less accurate ' It needs two input strings then it will return the distance between the two strings. ' The larger the number, the bigger the difference. 'The Levenshtein distance algorithm has been used in: 'Spell checking 'Speech recognition 'DNA analysis 'Plagiarism detection
 
//**************************************
//
// Name: Compute the similarity between
// two words using Levenshtein distance alg
// orithm
// Description:In order to check similar
// ties between two strings many methods ha
// ve been implemented. Levenshtein method
// is famous, and invented by Russian scien
// tist called Vladimir Levenshtein,in 1965
// .
'It is implemented in PHP languge
'It is faster than any other methods algorithm but maybe less accurate
' It needs two input strings then it will return the distance between the two strings.
' The larger the number, the bigger the difference.
'The Levenshtein distance algorithm has been used in:
'Spell checking
'Speech recognition
'DNA analysis
'Plagiarism detection
I did not see this in PSC so I post it. I translated it from C# to VB .NET
// By: Salem Al Shekaili
//
// Inputs:'For example If s = "sale" and
// t = "sale", then Levenshtein_distance(s,
// t) = 0, because no transformations are n
// eeded. The strings are already identical
// .
'If s is "sale" and t is "nale", Levenshtein_distance(s,t) = 1, because one substitution (change "s" to "n") is sufficient to transform s into t.
'read more information here http://www.merriampark.com/ld.htm
'and see http://www.let.rug.nl/~kleiweg/lev/ for demo
//
// Returns:Numerical
//
// Side Effects:Nothing
//
//This code is copyrighted and has // limited warranties.Please see http://
// www.Planet-Source-Code.com/vb/scripts/Sh
// owCode.asp?txtCodeId=5858&lngWId=10 //for details. //**************************************
//

Public Function Levenshtein_distance(ByVal s As String, ByVal t As String) As Integer
Dim i As Integer ' iterates through s
Dim j As Integer ' iterates through t
Dim s_i As String ' ith character of s
Dim t_j As String ' jth character of t
Dim cost As Integer ' cost
' Step 1
Dim n As Integer = s.Length
'length of s
Dim m As Integer = t.Length
'length of t
If n = 0 Then
Return m
End If
If m = 0 Then
Return n
End If
Dim d(0 To n, 0 To m) As Integer
' Step 2
For i = 0 To n
d(i, 0) = i
Next i
For j = 0 To m
d(0, j) = j
Next j
' Step 3
For i = 1 To n
s_i = s.Substring(i - 1, 1)
' Step 4
For j = 1 To m
t_j = t.Substring(j - 1, 1)
' Step 5
If s_i = t_j Then
cost = 0
Else
cost = 1
End If
' Step 6
d(i, j) = System.Math.Min(System.Math.Min((d((i - 1), j) + 1), (d(i, (j - 1)) + 1)), (d((i - 1), (j - 1)) + cost))
Next j
Next i
' Step 7
Return d(n, m)
End Function