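Classic Algorithm Problems 07 - Minimum Edit Distance

This post looks at a close relative of the longest common subsequence problem: measuring string similarity by computing the minimum edit distance. It is a very practical algorithm, used in areas such as DNA sequence comparison and web page clustering.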

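1: Concept

Given two strings A and B, we can turn A into B (or B into A) using the basic operations of insertion, deletion and replacement. The minimum number of operations needed is called the "edit distance".

Take the following two strings: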

dcga
edcb

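After working through the possible operations, the edit distance turns out to be 3. Can you see how? One way: insert 'e' at the front, delete 'g', and replace 'a' with 'b'.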

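2: Explanation

Let A and B be two strings. We want to transform string A into string B using the fewest character operations. The allowed operations are:

  • (1) delete a character;
  • (2) insert a character;
  • (3) replace a character with another character.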
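
As a minimal sketch of these three operations, the helper class below applies one insertion, one deletion and one replacement to turn "dcga" into "edcb", which matches the distance of 3 from the earlier example. The class name EditOps and its methods are hypothetical and only serve as an illustration.

```java
// Hypothetical helper (not from the original post): the three edit operations.
class EditOps {
    // Insert character c at position pos.
    static void insert(StringBuilder sb, int pos, char c) {
        sb.insert(pos, c);
    }

    // Delete the character at position pos.
    static void delete(StringBuilder sb, int pos) {
        sb.deleteCharAt(pos);
    }

    // Replace the character at position pos with c.
    static void replace(StringBuilder sb, int pos, char c) {
        sb.setCharAt(pos, c);
    }

    // Turns "dcga" into "edcb" in three steps.
    static String demo() {
        StringBuilder sb = new StringBuilder("dcga");
        insert(sb, 0, 'e');   // "edcga"
        delete(sb, 3);        // "edca" (removes 'g')
        replace(sb, 3, 'b');  // "edcb"
        return sb.toString();
    }
}
```

Three operations are enough here; the dynamic programming approach is what shows that no shorter sequence exists.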
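Algorithm:

First initialize the first row and the first column (d[i][0] = i and d[0][j] = j). Then compute every other value d[i][j] as follows, with s1 and s2 indexed from 1:

d[i][j] = min(d[i-1][j]+1, d[i][j-1]+1, d[i-1][j-1]+(s1[i]==s2[j]?0:1));

The value in the last row and last column is the minimum edit distance.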
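
The table-filling step can be sketched as follows. This is an illustrative sketch and the names EditTable and build are assumptions. Because Java strings are 0-indexed, the code reads s1.charAt(i - 1) where the formula writes s1[i].

```java
// Illustrative sketch (class and method names are assumptions, not from the post).
class EditTable {
    // Builds the full (m+1) x (n+1) table d, where d[i][j] is the edit distance
    // between the first i characters of s1 and the first j characters of s2.
    static int[][] build(String s1, String s2) {
        int m = s1.length(), n = s2.length();
        int[][] d = new int[m + 1][n + 1];
        for (int i = 0; i <= m; i++) d[i][0] = i; // first column: delete i characters
        for (int j = 0; j <= n; j++) d[0][j] = j; // first row: insert j characters
        for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                // Replacement costs 0 when the two characters already match.
                int cost = s1.charAt(i - 1) == s2.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                   d[i - 1][j - 1] + cost);
            }
        }
        return d;
    }

    public static void main(String[] args) {
        // Print the table row by row for the example strings.
        for (int[] row : build("dcga", "edcb")) {
            System.out.println(java.util.Arrays.toString(row));
        }
    }
}
```

For "dcga" and "edcb" the bottom-right cell of the table is 3.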

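Analysis

dp[i][j] is the minimum number of edits needed to make the first i characters of a equal to the first j characters of b. dp[i][j] comes from one of three states:

    1. Deletion: dp[i-1][j]+1;
    2. Insertion: dp[i][j-1]+1;
    3. Replacement: if(a[i]==b[j]) dp[i][j]=dp[i-1][j-1], else dp[i][j]=dp[i-1][j-1]+1;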

Another way to write it:

①: when Xi = Yj, C[i,j] = C[i-1,j-1];

②: when Xi != Yj, C[i,j] = Min{C[i-1,j-1], C[i-1,j], C[i,j-1]} + 1
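
This two-case form can also be evaluated top-down. The sketch below uses memoized recursion, which is a different technique from the bottom-up table and is shown only to illustrate the recurrence. The class EditMemo and its method names are assumptions.

```java
// Illustrative sketch: memoized recursion over the two-case recurrence.
class EditMemo {
    private final String x;
    private final String y;
    private final int[][] memo; // -1 marks "not computed yet"

    EditMemo(String x, String y) {
        this.x = x;
        this.y = y;
        memo = new int[x.length() + 1][y.length() + 1];
        for (int[] row : memo) java.util.Arrays.fill(row, -1);
    }

    // C[i,j]: edit distance between the first i chars of x and the first j chars of y.
    int c(int i, int j) {
        if (i == 0) return j; // only insertions remain
        if (j == 0) return i; // only deletions remain
        if (memo[i][j] >= 0) return memo[i][j];
        int result;
        if (x.charAt(i - 1) == y.charAt(j - 1)) {
            result = c(i - 1, j - 1);                       // case ①
        } else {
            result = Math.min(c(i - 1, j - 1),
                     Math.min(c(i - 1, j), c(i, j - 1))) + 1; // case ②
        }
        memo[i][j] = result;
        return result;
    }

    static int distance(String x, String y) {
        return new EditMemo(x, y).c(x.length(), y.length());
    }
}
```

The recursion depth grows with the combined length of the two strings, so the bottom-up table is usually the safer choice for long inputs.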

3: Code

/**
 * Created by xuming on 2016/6/16.
 */
public class minEditDistance {

    public static void main(String[] args) {
        String str1 = "dcga";
        String str2 = "edcb";
        int dis = getEditDistance(str1, str2);
        System.out.println("str1:" + str1 + ";str2:" + str2 + "; the distance is :" + dis);
    }

    public static int getEditDistance(String str1, String str2) {
        // matrix[i][j] = edit distance between the first i chars of str1
        // and the first j chars of str2
        int[][] matrix = new int[str1.length() + 1][str2.length() + 1];
        // init boundary: matching a prefix of length i against the empty string costs i
        for (int i = 0; i <= str1.length(); i++) {
            matrix[i][0] = i;
        }
        for (int j = 0; j <= str2.length(); j++) {
            matrix[0][j] = j;
        }

        // rows: prefixes of str1
        for (int i = 1; i <= str1.length(); i++) {
            // columns: prefixes of str2
            for (int j = 1; j <= str2.length(); j++) {
                if (str1.charAt(i - 1) == str2.charAt(j - 1)) {
                    // equal characters: no extra cost
                    matrix[i][j] = matrix[i - 1][j - 1];
                } else {
                    // min of the cell above (delete) and the cell to the left (insert)
                    int temp = Math.min(matrix[i - 1][j], matrix[i][j - 1]);
                    // compare with the diagonal cell (replace), then add one operation
                    matrix[i][j] = Math.min(temp, matrix[i - 1][j - 1]) + 1;
                }
            }
        }
        return matrix[str1.length()][str2.length()];
    }
}
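
Each row of the table depends only on the row above it, so the full matrix is not strictly required. The following is a minimal sketch that keeps two rows instead. It is an optional variation, not the author's code, and the class name EditDistanceTwoRows is an assumption.

```java
// Illustrative sketch: same recurrence, but only two rows are kept in memory.
class EditDistanceTwoRows {
    static int distance(String a, String b) {
        int n = b.length();
        int[] prev = new int[n + 1]; // row i - 1
        int[] curr = new int[n + 1]; // row i
        for (int j = 0; j <= n; j++) {
            prev[j] = j; // row 0: insert j characters
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // column 0: delete i characters
            for (int j = 1; j <= n; j++) {
                if (a.charAt(i - 1) == b.charAt(j - 1)) {
                    curr[j] = prev[j - 1];
                } else {
                    curr[j] = Math.min(prev[j - 1], Math.min(prev[j], curr[j - 1])) + 1;
                }
            }
            // Swap the rows so that prev holds the row just computed.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[n];
    }
}
```

Memory use drops from O(m·n) to O(n), at the cost of no longer being able to trace back which operations were chosen.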

4: Result

Running the program above prints:

str1:dcga;str2:edcb; the distance is :3

5: Appendix (a similar problem)

Time Limit: 1000MS Memory Limit: 65536K

Description

Let x and y be two strings over some finite alphabet A. We would like to transform x into y allowing only operations given below:

  • Deletion: a letter in x is missing in y at a corresponding position.
  • Insertion: a letter in y is missing in x at a corresponding position.
  • Change: letters at corresponding positions are distinct.

Certainly, we would like to minimize the number of all possible operations.

Illustration

A G T A A G T * A G G C
| | |       |   |   | |
A G T * C * T G A C G C

Deletion: * in the bottom line
Insertion: * in the top line
Change: when the letters at the top and bottom are distinct

This tells us that to transform x = AGTCTGACGC into y = AGTAAGTAGGC we would be required to perform 5 operations (2 changes, 2 deletions and 1 insertion). If we want to minimize the number of operations, we should do it like

A G T A A G T A G G C
| | |     |   |   | |
A G T C T G * A C G C

and 4 moves would be required (3 changes and 1 deletion).

In this problem we would always consider strings x and y to be fixed, such that the number of letters in x is m and the number of letters in y is n where n ≥ m.

Assign 1 as the cost of an operation performed. Otherwise, assign 0 if there is no operation performed.

Write a program that would minimize the number of possible operations to transform any string x into a string y.

Input

The input consists of the strings x and y prefixed by their respective lengths, which are within 1000.

Output

An integer representing the minimum number of possible operations to transform any string x into a string y.

Sample Input

10 AGTCTGACGC 
11 AGTAAGTAGGC

Sample Output

4

Source

Manila 2006
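
A possible solution sketch for this problem reuses the same recurrence. The class name Agtc and the helper solve are assumptions, and the loop in main assumes the input may contain several test cases until end of input, which the statement above does not confirm.

```java
// Illustrative sketch for the problem above (names are assumptions).
class Agtc {
    // Standard edit distance table, as in section 3.
    static int distance(String x, String y) {
        int m = x.length(), n = y.length();
        int[][] d = new int[m + 1][n + 1];
        for (int i = 0; i <= m; i++) d[i][0] = i;
        for (int j = 0; j <= n; j++) d[0][j] = j;
        for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                int cost = x.charAt(i - 1) == y.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                   d[i - 1][j - 1] + cost);
            }
        }
        return d[m][n];
    }

    // Parses one case of the form "m x n y" and returns the answer.
    static int solve(String input) {
        java.util.Scanner in = new java.util.Scanner(input);
        in.nextInt();          // length of x (implied by the string itself)
        String x = in.next();
        in.nextInt();          // length of y
        String y = in.next();
        return distance(x, y);
    }

    public static void main(String[] args) {
        java.util.Scanner in = new java.util.Scanner(System.in);
        // Assumption: keep reading cases until the input runs out.
        while (in.hasNextInt()) {
            in.nextInt();
            String x = in.next();
            in.nextInt();
            String y = in.next();
            System.out.println(distance(x, y));
        }
    }
}
```

With the sample input above, solve returns 4, which agrees with the sample output.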
