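Two strings have an edit distance of 1 when a single operation (inserting, deleting, or substituting one character) is enough to make them equal. More generally, the edit distance is the minimum number of such operations needed to turn one string into the other.
The edit distance can be computed recursively.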
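As a sketch of what the recursive code computes, the recursion can be written as a recurrence. The notation is an assumption introduced here for illustration: $m$ and $n$ are the lengths of $a$ and $b$, and $d(i,j)$ is the edit distance between the suffix of $a$ starting at index $i$ and the suffix of $b$ starting at index $j$ (0-based).

```latex
d(i,j) =
\begin{cases}
  n - j, & i = m \\
  m - i, & j = n \\
  d(i+1,\, j+1), & a_i = b_j \\
  1 + \min\bigl( d(i,\, j+1),\; d(i+1,\, j),\; d(i+1,\, j+1) \bigr), & \text{otherwise}
\end{cases}
```

The distance between the full strings is $d(0,0)$. The three terms inside the minimum correspond to inserting, deleting, and substituting a character, respectively.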
- package stringsimilarity;
- public class SimilarityFactory {
- public static int getDistance(String a, String b){
- if(a.length()==0){
- return b.length();
- }
- if(b.length()==0){
- return a.length();
- }
- if(a.charAt(0)==b.charAt(0)){
- return getDistance(a.substring(1),b.substring(1));
- }else{
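- int s1= getDistance(a,b.substring(1)); // insert b's first character in front of a
- int s2= getDistance(a.substring(1),b); // delete a's first character
- int s3= getDistance(a.substring(1),b.substring(1)); // substitute a's first character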
- int min = s1;
- if(s2<min)
- min= s2;
- if(s3<min)
- min= s3;
- return min+1;
- }
- }
- public static double getSimilarity(String a, String b){
- int d = getDistance(a,b);
- return 1.0/(d+1);
- }
- }
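The program above can be improved in two ways.
1. Instead of passing strings, pass a reference to a char array together with a start index, so that no new substring is created on each recursive call.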
- public static double getSimilarity2(String a, String b){
- //int d = getDistance(a,b);
- int d = getDistance(a.toCharArray(),0,b.toCharArray(),0);
- return 1.0/(d+1);
- }
- // Edit distance between the suffixes a[beginA..] and b[beginB..]
- public static int getDistance(char[] a, int beginA, char[] b, int beginB){
- if(a.length - beginA ==0){
- return b.length - beginB;
- }
- if(b.length - beginB ==0){
- return a.length - beginA;
- }
- if(a[beginA]==b[beginB]){
- return getDistance(a, beginA+1, b, beginB+1);
- }else{
- int s1= getDistance(a, beginA, b, beginB+1);
- int s2= getDistance(a, beginA+1, b, beginB);
- int s3= getDistance(a, beginA+1, b, beginB+1);
- int min = s1;
- if(s2<min)
- min= s2;
- if(s3<min)
- min= s3;
- return min+1;
- }
- }
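2. Some recursive calls are repeated during the recursion; recording intermediate results avoids recomputing them.
The results can be stored in a HashMap or in an array.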
- public static double getSimilarity4(String a, String b){
- int[][] dis = new int[a.length()+1][b.length()+1];
- for(int i = 0;i<dis.length;i++){
- for (int j = 0; j < dis[0].length; j++)
- dis[i][j]=-1;
- }
- int d = getDistance3(a.toCharArray(),0,b.toCharArray(),0, dis);
- return 1.0/(d+1);
- }
- // dis[i][j] caches the distance between a[i..] and b[j..]; -1 means not computed yet
- public static int getDistance3(char[] a, int beginA, char[] b, int beginB, int[][] dis){
- if(a.length - beginA ==0){
- return b.length - beginB;
- }
- if(b.length - beginB ==0){
- return a.length - beginA;
- }
- if(a[beginA]==b[beginB]){
- int s = dis[beginA+1][beginB+1];
- if(s == -1){
- s = getDistance3(a, beginA+1, b, beginB+1,dis);
- dis[beginA+1][beginB+1] = s;
- }
- return s;
- }else{
- int s1 = dis[beginA][beginB+1];
- if(s1 == -1){
- s1= getDistance3(a, beginA, b, beginB+1,dis);
- dis[beginA][beginB+1] = s1;
- }
- int s2 = dis[beginA+1][beginB];
- if(s2 == -1){
- s2= getDistance3(a, beginA+1, b, beginB,dis);
- dis[beginA+1][beginB] = s2;
- }
- int s3 = dis[beginA+1][beginB+1];
- if(s3 == -1){
- s3= getDistance3(a, beginA+1, b, beginB+1,dis);
- dis[beginA+1][beginB+1] = s3;
- }
- int min = s1;
- if(s2<min)
- min= s2;
- if(s3<min)
- min= s3;
- return min+1;
- }
- }
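The text mentions a HashMap as an alternative to the array, but only the array version is listed. A minimal sketch of a HashMap variant might look like the following. The class name `HashMapEditDistance`, its method names, and the packing of the two indices into a single `long` key are assumptions made for this illustration; this is not the `getSimilarity3` referenced in the test.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration: memoized edit distance with a HashMap cache.
final class HashMapEditDistance {

    // Edit distance between the full strings a and b.
    static int distance(String a, String b) {
        return distance(a.toCharArray(), 0, b.toCharArray(), 0,
                new HashMap<Long, Integer>());
    }

    // Edit distance between the suffixes a[i..] and b[j..].
    private static int distance(char[] a, int i, char[] b, int j,
                                Map<Long, Integer> memo) {
        if (i == a.length) return b.length - j;
        if (j == b.length) return a.length - i;

        // Pack (i, j) into one long so it can serve as the map key.
        long key = ((long) i << 32) | j;
        Integer cached = memo.get(key);
        if (cached != null) return cached;

        int result;
        if (a[i] == b[j]) {
            result = distance(a, i + 1, b, j + 1, memo);
        } else {
            int insert = distance(a, i, b, j + 1, memo);
            int delete = distance(a, i + 1, b, j, memo);
            int replace = distance(a, i + 1, b, j + 1, memo);
            result = 1 + Math.min(insert, Math.min(delete, replace));
        }
        memo.put(key, result);
        return result;
    }

    // Same similarity measure as in the listings above: 1 / (distance + 1).
    static double similarity(String a, String b) {
        return 1.0 / (distance(a, b) + 1);
    }

    public static void main(String[] args) {
        System.out.println(distance("appidimiologist", "epidemiologist")); // prints 3
        System.out.println(similarity("traveling", "travelling"));         // prints 0.5
    }
}
```

Here the callee stores its own result, so the cache lookup appears once rather than before each recursive call. A map only holds the states that are actually visited, whereas the array is allocated for every index pair up front but avoids boxing and hashing.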
Test:
- package stringsimilarity;
- import junit.framework.TestCase;
- public class TestSimilarity extends TestCase {
- public void testSimilarity1(){// insert one character
- String a = "traveling";
- String b = "travelling";
- double s = SimilarityFactory.getSimilarity(a, b);
- System.out.println(s);
- assertEquals(0.5, s, 1e-9); // expected value first, then actual, then tolerance
- }
- public void testSimilarity2(){// delete one character
- String a = "about";
- String b = "bout";
- double s = SimilarityFactory.getSimilarity(a, b);
- System.out.println(s);
- assertEquals(0.5, s, 1e-9);
- }
- public void testSimilarity3(){// substitute one character
- String a = "flying";
- String b = "flyong";
- double s = SimilarityFactory.getSimilarity(a, b);
- System.out.println(s);
- assertEquals(0.5, s, 1e-9);
- }
- public void testSimilarity4(){// timing comparison on a longer pair (edit distance 3)
- long start = System.currentTimeMillis();
- String a = "appidimiologist";
- String b = "epidemiologist";
- //double s = SimilarityFactory.getSimilarity(a, b);// strings as parameters: 21.734s
- //double s = SimilarityFactory.getSimilarity2(a, b);// char array references: 6.141s
- //double s = SimilarityFactory.getSimilarity3(a, b);// HashMap stores the computed values
- double s = SimilarityFactory.getSimilarity4(a, b);// array stores the computed values: 0.015s
- System.out.println("result:"+s);
- assertEquals(0.25, s, 1e-9);
- long end = System.currentTimeMillis();
- System.out.println("Total time:"+ (end - start)/1000.0);
- }
- }
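With these changes the running time improves dramatically: for the pair in testSimilarity4, the timings recorded in the test comments drop from 21.734 s when passing strings, to 6.141 s when passing char arrays with start indices, to 0.015 s when the computed values are cached in an array.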
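One way to see where the gain from caching comes from is to count recursive calls with and without it. The sketch below is an illustration under assumptions, not part of the original program: the class `CallCountDemo`, its counters, and the sample pair "kitten"/"sitting" are all introduced here. Unlike `getDistance3`, the lookup is done inside the callee so that every call is counted.

```java
import java.util.Arrays;

// Hypothetical illustration: compare the number of recursive calls
// made by the plain recursion and by the array-cached recursion.
final class CallCountDemo {
    static long plainCalls;
    static long memoCalls;

    // Plain recursion on char arrays with start indices.
    static int plain(char[] a, int i, char[] b, int j) {
        plainCalls++;
        if (i == a.length) return b.length - j;
        if (j == b.length) return a.length - i;
        if (a[i] == b[j]) return plain(a, i + 1, b, j + 1);
        return 1 + Math.min(plain(a, i, b, j + 1),
                Math.min(plain(a, i + 1, b, j), plain(a, i + 1, b, j + 1)));
    }

    // Same recursion, but dis[i][j] caches the result for (i, j); -1 = unknown.
    static int memo(char[] a, int i, char[] b, int j, int[][] dis) {
        memoCalls++;
        if (i == a.length) return b.length - j;
        if (j == b.length) return a.length - i;
        if (dis[i][j] != -1) return dis[i][j];
        int r;
        if (a[i] == b[j]) {
            r = memo(a, i + 1, b, j + 1, dis);
        } else {
            r = 1 + Math.min(memo(a, i, b, j + 1, dis),
                    Math.min(memo(a, i + 1, b, j, dis), memo(a, i + 1, b, j + 1, dis)));
        }
        dis[i][j] = r;
        return r;
    }

    // Returns {plain distance, plain calls, cached distance, cached calls}.
    static long[] compare(String s, String t) {
        char[] a = s.toCharArray();
        char[] b = t.toCharArray();
        plainCalls = 0;
        memoCalls = 0;
        int[][] dis = new int[a.length + 1][b.length + 1];
        for (int[] row : dis) Arrays.fill(row, -1);
        int d1 = plain(a, 0, b, 0);
        int d2 = memo(a, 0, b, 0, dis);
        return new long[] {d1, plainCalls, d2, memoCalls};
    }

    public static void main(String[] args) {
        long[] r = compare("kitten", "sitting");
        System.out.println("plain:  distance " + r[0] + ", calls " + r[1]);
        System.out.println("cached: distance " + r[2] + ", calls " + r[3]);
    }
}
```

With the cache, each index pair is evaluated at most once and triggers at most three further calls, so the call count is bounded by roughly three times the product of the two lengths. Without it, the number of calls can grow exponentially with the string lengths.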