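Java: comparing the similarity of two strings

The org.springframework.beans package contains the class org.springframework.beans.PropertyMatches, whose calculateStringDistance method computes the edit distance between two strings: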

/**
	 * Calculate the distance between the given two Strings
	 * according to the Levenshtein algorithm.
	 * @param s1 the first String
	 * @param s2 the second String
	 * @return the distance value
	 */
	private static int calculateStringDistance(String s1, String s2) {
		if (s1.isEmpty()) {
			return s2.length();
		}
		if (s2.isEmpty()) {
			return s1.length();
		}
		int d[][] = new int[s1.length() + 1][s2.length() + 1];

		for (int i = 0; i <= s1.length(); i++) {
			d[i][0] = i;
		}
		for (int j = 0; j <= s2.length(); j++) {
			d[0][j] = j;
		}

		for (int i = 1; i <= s1.length(); i++) {
			char s_i = s1.charAt(i - 1);
			for (int j = 1; j <= s2.length(); j++) {
				int cost;
				char t_j = s2.charAt(j - 1);
				if (s_i == t_j) {
					cost = 0;
				}
				else {
					cost = 1;
				}
				d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
						d[i - 1][j - 1] + cost);
			}
		}

		return d[s1.length()][s2.length()];
	}

 

For two strings $a$ and $b$ with lengths $|a|$ and $|b|$, the Levenshtein distance $\operatorname{lev}_{a,b}(|a|,|b|)$ is defined by:

$$
\operatorname{lev}_{a,b}(i,j)=
\begin{cases}
\max(i,j) & \text{if } \min(i,j)=0 \\
\min\left\{\begin{matrix}
\operatorname{lev}_{a,b}(i-1,j)+1 \\
\operatorname{lev}_{a,b}(i,j-1)+1 \\
\operatorname{lev}_{a,b}(i-1,j-1)+1_{(a_{i}\neq b_{j})}
\end{matrix}\right. & \text{otherwise}
\end{cases}
$$

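Here the indicator $1_{(a_{i}\neq b_{j})}$ is 0 when $a_{i}=b_{j}$ and 1 otherwise, and $\operatorname{lev}_{a,b}(i,j)$ is the edit distance between the first $i$ characters of $a$ and the first $j$ characters of $b$.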

The similarity $Sim_{a,b}$ of $a$ and $b$ is then:

$$
Sim_{a,b}=1-\frac{\operatorname{lev}_{a,b}(|a|,|b|)}{\max(|a|,|b|)}
$$
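As a quick sanity check of the definition, here is a worked example using the classic pair "kitten" and "sitting" (the strings are chosen only for illustration and do not come from the Spring source). Turning one into the other takes three single-character edits: substitute k with s, substitute e with i, and insert g at the end.

$$
|a|=6,\quad |b|=7,\quad \operatorname{lev}_{a,b}(6,7)=3
$$

$$
Sim_{a,b}=1-\frac{3}{\max(6,7)}=1-\frac{3}{7}=\frac{4}{7}\approx 0.571
$$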

 

float similarity = 1 - (float) calculateStringDistance(s1, s2) / Math.max(s1.length(), s2.length()); // cast to float to avoid integer division

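The result, similarity, ranges from 0 (nothing in common) to 1 (identical strings). Note that calculateStringDistance is declared private in PropertyMatches, so it has to be copied into your own class before it can be called this way.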
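Putting the pieces together, the following is a minimal, self-contained sketch rather than Spring's own code. The class name StringSimilarity and the method names levenshtein and similarity are hypothetical. It keeps only two rows of the distance table instead of the full matrix used above, and it assumes that two empty strings should count as identical (similarity 1), a case the formula itself leaves undefined because the denominator is 0.

```java
final class StringSimilarity {

    private StringSimilarity() {
    }

    // Levenshtein distance, keeping only the previous and current row
    // of the table instead of the full (|s1|+1) x (|s2|+1) matrix.
    static int levenshtein(String s1, String s2) {
        int[] prev = new int[s2.length() + 1];
        int[] curr = new int[s2.length() + 1];
        for (int j = 0; j <= s2.length(); j++) {
            prev[j] = j; // distance from the empty prefix of s1
        }
        for (int i = 1; i <= s1.length(); i++) {
            curr[0] = i; // distance to the empty prefix of s2
            for (int j = 1; j <= s2.length(); j++) {
                int cost = (s1.charAt(i - 1) == s2.charAt(j - 1)) ? 0 : 1;
                curr[j] = Math.min(Math.min(prev[j] + 1, curr[j - 1] + 1),
                        prev[j - 1] + cost);
            }
            int[] tmp = prev; // swap rows for the next iteration
            prev = curr;
            curr = tmp;
        }
        return prev[s2.length()];
    }

    // Similarity in [0, 1]: 1 - distance / max(length).
    static float similarity(String s1, String s2) {
        int maxLength = Math.max(s1.length(), s2.length());
        if (maxLength == 0) {
            return 1.0f; // assumption: two empty strings are identical
        }
        // Cast before dividing so the division is not done in int arithmetic.
        return 1.0f - (float) levenshtein(s1, s2) / maxLength;
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("kitten", "sitting")); // prints 3
        System.out.println(similarity("kitten", "sitting"));
    }
}
```

The guard for maxLength == 0 is a design choice: without it, the float division would be 0/0 and the result would be NaN.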
