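First, here is the blog post by a real expert that I read when I was learning KMP (orz orz):
从头到尾彻底理解KMP
I think that post already explains everything in great detail, and I hope you can stick with it to the end.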
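Steps
① Find the length of the longest common prefix and suffix
For P = p0 p1 ... pj-1 pj, look for the longest prefix of P that is equal to a suffix of P (the whole string itself does not count). If p0 p1 ... pk-1 pk = pj-k pj-k+1 ... pj-1 pj, then the part of the pattern ending at pj has a common prefix and suffix of length k+1, and the largest such k+1 is the value we want.
For example, if the given pattern is "abab", the maximum common prefix/suffix length for each of its prefixes is:
Pattern character:                 a  b  a  b
Max common prefix/suffix length:   0  0  1  2
For instance, the string aba has the common prefix/suffix a of length 1, while the string abab has the common prefix/suffix ab of length 2 (the length is k + 1, and here k + 1 = 2).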
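Step ① can be sketched in a few lines of C++. This is only a minimal sketch: the name longestBorderLengths is hypothetical (it does not come from the referenced post), and it computes the table directly instead of going through the next array.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: for every prefix p[0..i], the length of its longest
// proper prefix that is also a suffix of that prefix.
std::vector<int> longestBorderLengths(const std::string& p) {
    std::vector<int> len(p.size(), 0);
    int k = 0;  // common prefix/suffix length of the previous prefix
    for (std::size_t i = 1; i < p.size(); ++i) {
        // Fall back to shorter candidates until p[i] can extend one of them.
        while (k > 0 && p[i] != p[k]) k = len[k - 1];
        if (p[i] == p[k]) ++k;  // extend the candidate by one character
        len[i] = k;
    }
    return len;
}
```

For "abab" this returns 0 0 1 2, the same values as in the example above.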
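② Compute the next array
The next array considers the longest common prefix and suffix of the part before the current character (the current character itself is excluded). So once step ① has produced the maximum common prefix/suffix lengths, a small transformation is enough: shift all the values from step ① one position to the right and set the first value to -1:
Pattern character:   a   b   a   b
next:               -1   0   0   1
For example, in aba the string ab before the 3rd character a has a common prefix/suffix of length 0, so the next value of the 3rd character a is 0. In abab the string aba before the 4th character b has the common prefix/suffix a of length 1, so the next value of the 4th character b is 1 (the common prefix/suffix length is k, and here k = 1).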
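The "shift right by one and put -1 in front" rule is easy to write down directly. A minimal sketch, assuming the step ① values are already available in a vector; the function name nextFromBorders is hypothetical:

```cpp
#include <vector>

// Hypothetical helper: turns the step-1 table (max common prefix/suffix
// length for each prefix) into the next array by shifting it right by one
// position and setting the first entry to -1.
std::vector<int> nextFromBorders(const std::vector<int>& borders) {
    std::vector<int> next(borders.size());
    for (std::size_t i = 0; i < borders.size(); ++i) {
        next[i] = (i == 0) ? -1 : borders[i - 1];
    }
    return next;
}
```

Feeding it 0 0 1 2 (the values for "abab") gives -1 0 0 1.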
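③ Match using the next array
On a mismatch, set j = next[j]; the pattern then moves right by j - next[j] positions. In other words, suppose the segment pj-k pj-k+1 ... pj-1 of the pattern has matched si-k si-k+1 ... si-1 of the text, but pj fails to match si. Because next[j] = k, the part of the pattern before pj has a longest common prefix and suffix of length k, that is, p0 p1 ... pk-1 = pj-k pj-k+1 ... pj-1. Setting j = next[j] therefore moves the pattern right by j - next[j] positions, so that its prefix p0 p1 ... pk-1 lines up with si-k si-k+1 ... si-1 of the text, and matching continues by comparing pk with si.
To sum up, the next array of KMP tells us:
when a character of the pattern fails to match a character of the text, which position of the pattern to jump to next.
If the character at position j of the pattern fails to match the character at position i of the text, the next comparison is between the character at next[j] and the same text character at position i, which amounts to moving the pattern right by j - next[j] positions.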
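To watch the j - next[j] shifts happen, here is a small tracing sketch. The names buildNext and searchWithTrace are hypothetical, and recording the shifts in a vector is only for illustration; a real search would not need it.

```cpp
#include <string>
#include <vector>

// Builds the next array described in step 2 (next[0] = -1).
std::vector<int> buildNext(const std::string& p) {
    std::vector<int> next(p.size(), -1);
    int k = -1;
    std::size_t j = 0;
    while (j + 1 < p.size()) {
        if (k == -1 || p[j] == p[k]) {
            ++j;
            ++k;
            next[j] = k;
        } else {
            k = next[k];
        }
    }
    return next;
}

// Returns the 0-based index of the first match of p in s, or -1.
// Every mismatch appends the shift j - next[j] to `shifts`.
int searchWithTrace(const std::string& s, const std::string& p,
                    std::vector<int>& shifts) {
    const std::vector<int> next = buildNext(p);
    const int n = static_cast<int>(s.size());
    const int m = static_cast<int>(p.size());
    int i = 0, j = 0;
    while (i < n && j < m) {
        if (j == -1 || s[i] == p[j]) {
            ++i;
            ++j;
        } else {
            shifts.push_back(j - next[j]);  // how far the pattern slides right
            j = next[j];
        }
    }
    return j == m ? i - j : -1;
}
```

Searching for "abab" in "abacabab" records the shifts 2, 1, 1 and returns 4; the shifts add up to the match position because each one slides the pattern's starting point along the text.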
Here are the templates.
Computing the next array:
    #include <string.h>

    void GetNext(char* p, int next[])
    {
        int pLen = strlen(p);
        next[0] = -1;
        int k = -1;
        int j = 0;
        while (j < pLen - 1)
        {
            // p[k] is the end of the prefix, p[j] is the end of the suffix
            if (k == -1 || p[j] == p[k])
            {
                ++k;
                ++j;
                next[j] = k;
            }
            else
            {
                k = next[k];
            }
        }
    }
An improved way to compute the next array:
    #include <string.h>

    // Optimized computation of the next array
    void GetNextval(char* p, int next[])
    {
        int pLen = strlen(p);
        next[0] = -1;
        int k = -1;
        int j = 0;
        while (j < pLen - 1)
        {
            // p[k] is the end of the prefix, p[j] is the end of the suffix
            if (k == -1 || p[j] == p[k])
            {
                ++j;
                ++k;
                // Compared with the previous version, only the 4 lines below changed
                if (p[j] != p[k])
                    next[j] = k;   // previously this was the only line
                else
                    // p[j] == p[next[j]] must not happen, because jumping there would
                    // fail on the same character again; so jump one step further: next[k]
                    next[j] = next[k];
            }
            else
            {
                k = next[k];
            }
        }
    }
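To see what the optimization changes, the improved values can also be derived from an ordinary next array afterwards. This is a sketch of that relationship rather than the template above; the function name nextvalFromNext is hypothetical, and it assumes next has the same length as the pattern.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: derives the optimized array from a plain next array.
// If p[j] equals p[next[j]], jumping to next[j] would fail on the same
// character again, so reuse the already optimized value of next[j] instead.
std::vector<int> nextvalFromNext(const std::string& p,
                                 const std::vector<int>& next) {
    std::vector<int> nextval(next.size());
    for (std::size_t j = 0; j < next.size(); ++j) {
        const int k = next[j];
        if (k >= 0 && p[j] == p[k]) {
            nextval[j] = nextval[k];  // k < j, so nextval[k] is already final
        } else {
            nextval[j] = k;
        }
    }
    return nextval;
}
```

For "abab" the plain array -1 0 0 1 becomes -1 0 -1 0, and for "aaaa" the plain array -1 0 1 2 becomes -1 -1 -1 -1, so a mismatch on a run of equal characters no longer walks back one step at a time.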
The KMP algorithm:
    #include <string.h>

    // next[] must already be filled in by GetNext(p, next) or GetNextval(p, next)
    int KmpSearch(char* s, char* p, int next[])
    {
        int i = 0;
        int j = 0;
        int sLen = strlen(s);
        int pLen = strlen(p);
        while (i < sLen && j < pLen)
        {
            // ① If j == -1, or the current characters match (S[i] == P[j]), do i++ and j++
            if (j == -1 || s[i] == p[j])
            {
                i++;
                j++;
            }
            else
            {
                // ② If j != -1 and the current characters differ (S[i] != P[j]),
                //    keep i unchanged and set j = next[j]
                j = next[j];
            }
        }
        if (j == pLen)
            return i - j;   // 0-based position of the first match
        else
            return -1;
    }
The template I usually use myself (feel free to skip it)
    #include <string.h>

    int Next[1000010];
    char str1[1000010];
    char str2[1000010];

    void getnext(char *str)
    {
        int len=strlen(str);
        int j=0;
        int k=-1;
        Next[0]=-1;
        while(j<len)   // runs up to len, so Next[len] is filled in as well
        {
            if(k==-1||str[j]==str[k])
            {
                j++;
                k++;
                if(str[j]!=str[k])
                {
                    Next[j]=k;
                }
                else
                    Next[j]=Next[k];
            }
            else
                k=Next[k];
        }
    }

    // str1 is the text, str2 is the pattern
    int KMP(char *str1,char *str2)
    {
        int len1=strlen(str1);
        int len2=strlen(str2);
        int i=0;
        int j=0;
        while(i<len1&&j<len2)   // stop as soon as the whole pattern has matched
        {
            if(j==-1||str1[i]==str2[j])
            {
                i++;
                j++;
            }
            else
            {
                j=Next[j];
            }
        }
        if(j==len2)
            return i-j;
        else
            return -1;
    }
Think the template is too long? Have a look at this approach:
https://www.cnblogs.com/eternhope/p/9481643.html
Here are a few introductory problems:
HDU-1711 Number Sequence
http://acm.hdu.edu.cn/showproblem.php?pid=1711
Problem Description
Input
Output
Sample Input
2
13 5
1 2 1 2 3 1 2 3 1 3 2 1 2
1 2 3 1 3
13 5
1 2 1 2 3 1 2 3 1 3 2 1 2
1 2 3 2 1
Sample Output
6
-1
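Task: find the position of the first occurrence of sequence B in sequence A (per the sample, positions are counted from 1, and -1 means B does not occur).
A template problem; just plug in the template.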
    #include <string.h>
    #include <string>
    #include <set>
    // ... (the remaining header names and the rest of this listing are missing)
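For reference, the core of this problem can be sketched as a single function over integer sequences. This is not the original solution: the name firstOccurrence, the use of std::vector, and returning -1 for an empty pattern are all assumptions made for this sketch. It uses the plain next array from step ②.

```cpp
#include <vector>

// Hypothetical helper: 1-based position of the first occurrence of b in a,
// or -1 if b does not occur (an empty b is treated as "not found" here).
int firstOccurrence(const std::vector<int>& a, const std::vector<int>& b) {
    const int n = static_cast<int>(a.size());
    const int m = static_cast<int>(b.size());
    if (m == 0 || m > n) return -1;

    // Plain next array of the pattern b, with next[0] = -1.
    std::vector<int> next(m, -1);
    int j = 0, k = -1;
    while (j < m - 1) {
        if (k == -1 || b[j] == b[k]) {
            ++j;
            ++k;
            next[j] = k;
        } else {
            k = next[k];
        }
    }

    // Standard KMP scan; i never moves backwards.
    int i = 0;
    j = 0;
    while (i < n && j < m) {
        if (j == -1 || a[i] == b[j]) {
            ++i;
            ++j;
        } else {
            j = next[j];
        }
    }
    return j == m ? i - j + 1 : -1;  // +1 because the judge counts from 1
}
```

On the two sample cases it gives 6 and -1. A full submission would read each test case into two vectors and print the result of firstOccurrence for it.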
POJ-3461 Oulipo
http://poj.org/problem?id=3461
Description
The French author Georges Perec (1936–1982) once wrote a book, La disparition, without the letter 'e'. He was a member of the Oulipo group. A quote from the book:
Tout avait l’air normal, mais tout s’affirmait faux. Tout avait l’air normal, d’abord, puis surgissait l’inhumain, l’affolant. Il aurait voulu savoir où s’articulait l’association qui l’unissait au roman : sur son tapis, assaillant à tout instant son imagination, l’intuition d’un tabou, la vision d’un mal obscur, d’un quoi vacant, d’un non-dit : la vision, l’avision d’un oubli commandant tout, où s’abolissait la raison : tout avait l’air normal mais…
Perec would probably have scored high (or rather, low) in the following contest. People are asked to write a perhaps even meaningful text on some subject with as few occurrences of a given “word” as possible. Our task is to provide the jury with a program that counts these occurrences, in order to obtain a ranking of the competitors. These competitors often write very long texts with nonsense meaning; a sequence of 500,000 consecutive 'T's is not unusual. And they never use spaces.
So we want to quickly find out how often a word, i.e., a given string, occurs in a text. More formally: given the alphabet {'A', 'B', 'C', …, 'Z'} and two finite strings over that alphabet, a word W and a text T, count the number of occurrences of W in T. All the consecutive characters of W must exactly match consecutive characters of T. Occurrences may overlap.
Input
The first line of the input file contains a single number: the number of test cases to follow. Each test case has the following format:
- One line with the word W, a string over {'A', 'B', 'C', …, 'Z'}, with 1 ≤ |W| ≤ 10,000 (here |W| denotes the length of the string W).
- One line with the text T, a string over {'A', 'B', 'C', …, 'Z'}, with |W| ≤ |T| ≤ 1,000,000.
Output
For every test case in the input file, the output should contain a single number, on a single line: the number of occurrences of the word W in the text T.
Sample Input
3
BAPC
BAPC
AZA
AZAZAZA
VERDI
AVERDXIVYERDIAN
Sample Output
1
3
0
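Task: count how many times the word A occurs in the text B (occurrences may overlap).
Standard KMP returns the position of the first occurrence; here we need the number of occurrences, so a small change is enough.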
    #include <stdio.h>
    #include <string.h>
    #include <string>
    using namespace std;
    int Next[1000010];
    char str1[1000010];
    char str2[1000010];

    void getnext(char *str)
    {
        int len=strlen(str);
        int j=0;
        int k=-1;
        Next[0]=-1;
        while(j<len)   // runs up to len, so Next[len] is available after a full match
        {
            if(k==-1||str[j]==str[k])
            {
                j++;
                k++;
                if(str[j]!=str[k])
                {
                    Next[j]=k;
                }
                else
                    Next[j]=Next[k];
            }
            else
                k=Next[k];
        }
    }

    // str1 is the text, str2 is the word to count
    int KMP(char *str1,char *str2)
    {
        int ans=0;
        int len1=strlen(str1);
        int len2=strlen(str2);
        int i=0,j=0;
        while(i<len1)
        {
            if(j==-1||str1[i]==str2[j])
            {
                i++;
                j++;
            }
            else
            {
                j=Next[j];
            }
            if(j==len2)
            {
                ans++;
                j=Next[j];   // keep the longest overlap and continue
    //          i=i-j+1;     // using these two lines instead of the line above gets TLE
    //          j=0;
            }
        }
        return ans;
    }

    int main()
    {
        int t;
        scanf("%d",&t);

        while(t--)
        {
            scanf("%s %s",str1,str2);   // str1 = word W, str2 = text T
            getnext(str1);
            printf("%d\n",KMP(str2,str1));
        }
        return 0;
    }
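When changing a template like this, it helps to cross-check the KMP count against a brute-force count on small inputs. Below is a minimal sketch of such a check; countNaive and countKmp are hypothetical names, std::string is used instead of the global char arrays, and countKmp uses the plain (unoptimized) next array extended by one entry so that next[m] exists after a full match.

```cpp
#include <string>
#include <vector>

// Brute force: try every start position. O(n * m), only for checking.
int countNaive(const std::string& text, const std::string& word) {
    if (word.empty() || word.size() > text.size()) return 0;
    int count = 0;
    for (std::size_t s = 0; s + word.size() <= text.size(); ++s) {
        if (text.compare(s, word.size(), word) == 0) ++count;
    }
    return count;
}

// KMP count of possibly overlapping occurrences.
int countKmp(const std::string& text, const std::string& word) {
    const int n = static_cast<int>(text.size());
    const int m = static_cast<int>(word.size());
    if (m == 0) return 0;

    // next has m + 1 entries; next[m] is used to continue after a full match.
    std::vector<int> next(m + 1, -1);
    int j = 0, k = -1;
    while (j < m) {
        if (k == -1 || word[j] == word[k]) {
            ++j;
            ++k;
            next[j] = k;
        } else {
            k = next[k];
        }
    }

    int count = 0, i = 0;
    j = 0;
    while (i < n) {
        if (j == -1 || text[i] == word[j]) {
            ++i;
            ++j;
        } else {
            j = next[j];
        }
        if (j == m) {  // full match: count it and keep the longest overlap
            ++count;
            j = next[j];
        }
    }
    return count;
}
```

Both functions give 1, 3 and 0 on the three sample cases. The brute-force version is far too slow for the real limits, which is presumably why restarting from i - j + 1 with j = 0 times out.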
That's it for now; I'll come back and fill in the rest later.