Edit distance

Problem:

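Given two strings A and B, find the minimum number of edit operations needed to transform A into B. The allowed operations are substituting one character for another, inserting a character, and deleting a character.
For example, transforming A (kitten) into B (sitting) takes three operations:
sitten (k→s) substitution
sittin (e→i) substitution
sitting (→g) insertion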
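
As a quick sanity check, the three edits in the example can be replayed step by step. This is only an illustrative sketch: the class name KittenToSitting and the method transform are made up for this example, and the character positions are hard-coded for the input "kitten".

```java
class KittenToSitting {
    // Replays the three edits from the example on the given string.
    // The positions assume the input is "kitten" (hypothetical helper).
    static String transform(String start) {
        StringBuilder sb = new StringBuilder(start);
        sb.setCharAt(0, 's'); // kitten -> sitten (substitute k with s)
        sb.setCharAt(4, 'i'); // sitten -> sittin (substitute e with i)
        sb.append('g');       // sittin -> sitting (insert g at the end)
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(transform("kitten"));
    }
}
```

Replaying the edits only shows that three operations are enough. Showing that no shorter sequence exists is what the dynamic programming formulation is for.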

Approach:

Let i be an index into string A and j an index into string B, and let d[i, j] denote the minimum number of edit operations between the prefixes A[1, ... , i] and B[1, ... , j]. Then the following hold:

1. d[0, j] = j;

2. d[i, 0] = i;

3. d[i, j] = d[i-1, j - 1] if A[i] == B[j]

4. d[i, j] = min(d[i-1, j - 1], d[i, j - 1], d[i-1, j]) + 1  if A[i] != B[j]
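
The four rules can be transcribed almost literally into a recursive function. The following is a minimal sketch under a few assumptions: the names EditDistanceMemo, distance and d are chosen for illustration, a memo table is added so that each (i, j) pair is computed only once, and Java's 0-based charAt(i - 1) stands in for the 1-based A[i].

```java
import java.util.Arrays;

class EditDistanceMemo {
    // Entry point: edit distance between a and b (illustrative name).
    static int distance(String a, String b) {
        int[][] memo = new int[a.length() + 1][b.length() + 1];
        for (int[] row : memo) {
            Arrays.fill(row, -1); // -1 marks "not computed yet"
        }
        return d(a, b, a.length(), b.length(), memo);
    }

    // d(i, j): distance between the first i chars of a and the first j chars of b.
    private static int d(String a, String b, int i, int j, int[][] memo) {
        if (i == 0) return j; // rule 1: d[0, j] = j
        if (j == 0) return i; // rule 2: d[i, 0] = i
        if (memo[i][j] >= 0) return memo[i][j];

        int result;
        if (a.charAt(i - 1) == b.charAt(j - 1)) {
            // rule 3: the last characters match, so no operation is needed
            result = d(a, b, i - 1, j - 1, memo);
        } else {
            // rule 4: substitution, insertion or deletion, whichever is cheapest
            result = 1 + Math.min(d(a, b, i - 1, j - 1, memo),
                         Math.min(d(a, b, i, j - 1, memo),
                                  d(a, b, i - 1, j, memo)));
        }
        memo[i][j] = result;
        return result;
    }
}
```

With these definitions, distance("kitten", "sitting") evaluates to 3, matching the example. This version works top-down from d[m, n]; the same table can also be filled in starting from the base cases.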

So the minimum number of edit operations can be found by filling in the table bottom-up, starting from the base cases. In pseudocode:

int LevenshteinDistance(char s[1..m], char t[1..n])
{
  // for all i and j, d[i,j] will hold the Levenshtein distance between
  // the first i characters of s and the first j characters of t;
  // note that d has (m+1)x(n+1) values
  declare int d[0..m, 0..n]

  for i from 0 to m
    d[i, 0] := i // the distance of any first string to an empty second string
  for j from 0 to n
    d[0, j] := j // the distance of any second string to an empty first string

  for j from 1 to n
  {
    for i from 1 to m
    {
      if s[i] = t[j] then  
        d[i, j] := d[i-1, j-1]       // no operation required
      else
        d[i, j] := minimum
                   (
                     d[i-1, j] + 1,  // a deletion
                     d[i, j-1] + 1,  // an insertion
                     d[i-1, j-1] + 1 // a substitution
                   )
    }
  }

  return d[m,n]
}

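public class Solution {
    public static void main(String[] args) {
        System.out.println(minDistance("a", "ab"));
    }

    public static int minDistance(String word1, String word2) {
        // If either string is empty, the distance is the length of the other.
        if (word1.length() == 0) return word2.length();
        if (word2.length() == 0) return word1.length();

        // distance[i][j] is the edit distance between the first i characters
        // of word1 and the first j characters of word2.
        int[][] distance = new int[word1.length() + 1][word2.length() + 1];

        // Turning a prefix of word1 into the empty string takes i deletions.
        for (int i = 0; i <= word1.length(); i++) {
            distance[i][0] = i;
        }

        // Turning the empty string into a prefix of word2 takes j insertions.
        for (int j = 0; j <= word2.length(); j++) {
            distance[0][j] = j;
        }

        for (int i = 1; i <= word1.length(); i++) {
            for (int j = 1; j <= word2.length(); j++) {
                if (word1.charAt(i - 1) == word2.charAt(j - 1)) {
                    // Characters match: no operation required.
                    distance[i][j] = distance[i-1][j-1];
                } else {
                    // Cheapest of deletion, insertion and substitution, plus one.
                    distance[i][j] = Math.min(distance[i-1][j],
                            Math.min(distance[i][j-1], distance[i-1][j-1])) + 1;
                }
            }
        }
        return distance[word1.length()][word2.length()];
    }
}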

Reference: http://en.wikipedia.org/wiki/Levenshtein_distance

