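Knuth-Morris-Pratt Algorithm
KMP: single-pattern matching; it determines whether s1 is a substring of s2.
*** These notes were compiled from many KMP write-ups and then moved to my CSDN blog. I did not record the articles I originally consulted, so I can't provide reference links. Sorry.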
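Principle:
A helper function next() lets the matcher skip unnecessary comparisons against the target string, which is where the speedup comes from.
Time complexity: O(m+n), where m and n are the lengths of the pattern and the target string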
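Main idea:
After a mismatch, the algorithm does not simply start a new round of checks from the next character of the target string. Instead, it uses information computed before matching begins to skip the checks that are unnecessary.
* Useful information: the prefix function next
let P = length of the part already matched, e.g. for ababa, P = 5
L = len(special string), where the special string is the longest string that is both a proper suffix of the matched part (not equal to it) and a prefix of it; for ababa, special string = aba, L = 3
so the effective shift is S = P - L = 5 - 3 = 2.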
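The arithmetic above can be checked with a small brute-force sketch. It is not the KMP preprocessing itself, only an illustration of S = P - L; the class name BorderShift and both method names are made up for this example.

```java
final class BorderShift {
    // Length L of the longest proper prefix of s that is also a suffix of s.
    // Brute force on purpose: it tries every candidate length, longest first.
    static int longestBorder(String s) {
        for (int len = s.length() - 1; len > 0; len--) {
            String suffix = s.substring(s.length() - len);
            if (s.startsWith(suffix)) {
                return len;
            }
        }
        return 0;
    }

    // Effective shift S = P - L, where P is the length of the matched part.
    static int shift(String matched) {
        return matched.length() - longestBorder(matched);
    }

    public static void main(String[] args) {
        System.out.println(longestBorder("ababa")); // 3, the special string is "aba"
        System.out.println(shift("ababa"));         // 2
    }
}
```

For a matched part with no such special string, such as cdf, L is 0 and the shift equals the whole matched length.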
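A very detailed worked example of the KMP algorithm is available here (that article initializes the next array with 0 while this post initializes it with -1; this does not affect understanding, it only changes how next is used):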
http://www.ruanyifeng.com/blog/2013/05/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm.html
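To relate the two conventions: assuming the -1-based entry stores the index of the last character of the special string (or -1 when there is none), the 0-based entry is the length of that string, so the two differ by exactly one. A minimal sketch of the conversion, with the hypothetical names NextTableConvert and toLengthTable:

```java
import java.util.Arrays;

final class NextTableConvert {
    // Converts a -1-based table (index of the last character of the special
    // string, or -1) into a 0-based table (length of the special string).
    static int[] toLengthTable(int[] next) {
        int[] lengths = new int[next.length];
        for (int j = 0; j < next.length; j++) {
            lengths[j] = next[j] + 1;
        }
        return lengths;
    }

    public static void main(String[] args) {
        int[] next = {-1, -1, 0, 1, 2}; // -1-based table for "ababa"
        System.out.println(Arrays.toString(toLengthTable(next))); // [0, 0, 1, 2, 3]
    }
}
```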
Code:
The core of KMP: building the next array that records the jump states
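    // e.g. "cdf":   [-1, -1, -1]
    // e.g. "ababa": [-1, -1, 0, 1, 2]
    // a[j] is the index of the last character of the longest proper prefix of
    // sub[0..j] that is also a suffix of it, or -1 if there is none.
    public int[] next(String sub){
        int[] a = new int[sub.length()];
        if(a.length == 0) return a;   // empty pattern: nothing to compute
        char[] c = sub.toCharArray();
        int i, j;
        // initialize the first entry, then start from the second
        a[0] = -1;
        for(j = 1; j < c.length; j++){
            i = a[j-1];
            // fall back to shorter candidates until c[j] can extend one
            while(i >= 0 && c[j] != c[i+1]) i = a[i];
            if(c[j] == c[i+1]) a[j] = i+1;
            else a[j] = -1;
        }
        return a;
    }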
Matching method:
    // Returns the index of the first occurrence of sub in str, or -1 if absent.
    public int pattern(String str, String sub){
        int[] next = next(sub);
        char[] ch1 = str.toCharArray();
        char[] ch2 = sub.toCharArray();   // the pattern (the original copied str here)
        if(ch2.length == 0) return 0;     // empty pattern matches at index 0
        int i = 0, j = 0;                 // i -> ch1, j -> ch2
        while(i < ch1.length){
            if(ch1[i] == ch2[j]){
                // the characters match
                if(j == ch2.length - 1){
                    return (i - ch2.length + 1);
                }
                i++;
                j++;
            }else if(j == 0){
                // mismatch on the first character of the pattern
                i++;
            }else{
                // skip the characters that are already known to match;
                // ch1[i] still needs to be compared with ch2[j_new]
                j = next[j-1] + 1;
            }
        }
        return -1;
    }
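One way to sanity-check the matcher is to compare its result with String.indexOf on a few inputs. The sketch below carries its own compact static copies of the two methods so it can run standalone; the class name KmpCheck, the helper agreesWithIndexOf, and the choice to return 0 for an empty pattern (as indexOf does) are assumptions made for this example.

```java
final class KmpCheck {
    // Compact copy of the -1-based next table described above.
    static int[] next(String sub) {
        int[] a = new int[sub.length()];
        if (a.length == 0) return a;
        a[0] = -1;
        for (int j = 1; j < a.length; j++) {
            int i = a[j - 1];
            while (i >= 0 && sub.charAt(j) != sub.charAt(i + 1)) i = a[i];
            a[j] = (sub.charAt(j) == sub.charAt(i + 1)) ? i + 1 : -1;
        }
        return a;
    }

    // Index of the first occurrence of sub in str, or -1 if there is none.
    static int pattern(String str, String sub) {
        if (sub.isEmpty()) return 0; // assumption: same convention as indexOf
        int[] next = next(sub);
        int i = 0, j = 0;
        while (i < str.length()) {
            if (str.charAt(i) == sub.charAt(j)) {
                if (j == sub.length() - 1) return i - j;
                i++;
                j++;
            } else if (j == 0) {
                i++;
            } else {
                j = next[j - 1] + 1;
            }
        }
        return -1;
    }

    // True when the KMP result equals the standard library's answer.
    static boolean agreesWithIndexOf(String str, String sub) {
        return pattern(str, sub) == str.indexOf(sub);
    }

    public static void main(String[] args) {
        String[][] cases = {
            {"abababca", "ababca"}, {"aaaaab", "aab"}, {"abc", "cdf"}, {"ab", "abc"}
        };
        for (String[] c : cases) {
            System.out.println(c[0] + " / " + c[1] + " -> " + pattern(c[0], c[1])
                    + ", agrees with indexOf: " + agreesWithIndexOf(c[0], c[1]));
        }
    }
}
```

The first case exercises the jump branch: after matching abab, the mismatch on c moves j back to 2 instead of restarting the pattern from 0.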