字符串编辑距离(Edit Distance)分析和源代码

1. 字符串编辑距离定义

    定义:字符串编辑距离指的是字符串s1变为s2最少需要多少次替换,插入,删除操作。

                

               Ed(s1,s2) = minimum # of operations (insertion, deletion,substitution) to change s1 to s2
     
    Example:

  s1: Tom Hanks

  s2: Ton Hank

  ed(s1, s2) = 2


2.  字符串编辑距离求解

     动态规划求解,时间复杂度O(mn)

A = a1 a2...am
B = b1 b2...bn
Di,j = edit distance of a1 a2 ... ai and b1 b2... bj.

Di,0 = i,     0<=i <= m

D0,j = j,     0 <= j<= n

Di,j = min{Di-1, j + 1,

                  Di, j-1 + 1,

                  Di-1, j-1 + ti, j},           1 <= i <=m, 1 <=j <= n

      where ti, j = 0 if ai = bj and ti, j = 1 if ai != bj.

2. 源程序

int getMin(int a, int b, int c)
{
	int min1 = (a<=b)?a:b;
	int min2 =(min1<=c)?min1:c;
	return min2;
}

int getED(const char*str1, const char*str2) {
	int m = strlen(str1);
	int n = strlen(str2);
	int **dis = new int*[m+1];
	for(int i=0; i<m+1; i++) {
		dis[i] = new int[n+1];
	}
	for(int i=0; i<=m; i++) {
		dis[i][0]=i;
	}
	for(int j=0; j<=n; j++) {
		dis[0][j]=j;
	}
	for(int i=1; i<=m; i++) {
		for(int j=1; j<=n; j++) {	
			int tij=(str1[i-1]==str2[j-1])?0:1;		
			dis[i][j] = getMin(dis[i-1][j]+1, dis[i][j-1]+1, dis[i-1][j-1]+tij);
		}
	}
	return dis[m][n];
}



你可能感兴趣的:(c,IM,distance)