KMP Matching (Template)

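First, here is the blog post by an expert that I read when I was first learning KMP: orz orz

从头到尾彻底理解KMP ("Thoroughly Understanding KMP from Start to Finish")

I think that post already explains everything in great detail, and I hope you can stick with it to the end.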


 

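Steps

① Find the length of the longest common prefix/suffix
For P = p_0 p_1 ... p_{j-1} p_j, find the longest proper prefix of the pattern P that is equal to a suffix of P. If p_0 p_1 ... p_{k-1} p_k = p_{j-k} p_{j-k+1} ... p_{j-1} p_j, then the pattern up to and including p_j has a common prefix/suffix of maximum length k + 1.

For example, if the pattern is "abab", the maximum common prefix/suffix length of each of its prefixes is:

Pattern character:               a   b   a   b
Max common prefix/suffix length: 0   0   1   2

For instance, the string aba has a common prefix/suffix of length 1, namely a; the string abab has a common prefix/suffix of length 2, namely ab (the length is k + 1, and here k + 1 = 2).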
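Step ① can be sketched in code as follows. This is only a minimal illustration, not the template from this post: the function name `prefixSuffixLengths` and the use of `std::string` and `std::vector` are my own assumptions.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of the original templates): for every
// prefix p[0..j] of the pattern, return the length of its longest proper
// prefix that is also a suffix.
std::vector<int> prefixSuffixLengths(const std::string& p) {
    std::vector<int> len(p.size(), 0);
    int k = 0;  // length of the current longest common prefix/suffix
    for (std::size_t j = 1; j < p.size(); ++j) {
        // Fall back to shorter candidates until p[k] can extend one of them
        // (or no candidate is left).
        while (k > 0 && p[j] != p[k]) k = len[k - 1];
        if (p[j] == p[k]) ++k;
        len[j] = k;
    }
    return len;
}
```

Calling `prefixSuffixLengths("abab")` yields 0, 0, 1, 2, which agrees with the lengths quoted above for aba (1) and abab (2).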

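② Compute the next array
The next array describes the longest common prefix/suffix of the part of the pattern before the current character. So once step ① has produced the maximum common prefix/suffix lengths, only a small transformation is needed: shift all the values from step ① one position to the right and set the first value to -1:

Pattern character:  a   b   a   b
next value:        -1   0   0   1

For example, in aba the string ab before the third character a has a common prefix/suffix of length 0, so the next value of the third character a is 0. In abab the string aba before the fourth character b has a common prefix/suffix of length 1, namely a, so the next value of the fourth character b is 1 (the common prefix/suffix length is k, and here k = 1).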
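The "shift right by one and put -1 in front" transformation of step ② is small enough to write out directly. The sketch below assumes the lengths from step ① are already available in a vector; the name `shiftToNext` is hypothetical.

```cpp
#include <vector>

// Hypothetical helper: turn the step-1 lengths into a next array by
// shifting every value one position to the right and setting next[0] = -1.
std::vector<int> shiftToNext(const std::vector<int>& lengths) {
    std::vector<int> next(lengths.size());
    for (std::size_t j = 0; j < lengths.size(); ++j) {
        // next[j] looks only at the characters before position j.
        next[j] = (j == 0) ? -1 : lengths[j - 1];
    }
    return next;
}
```

For the lengths 0, 0, 1, 2 of "abab" this gives -1, 0, 0, 1. Note that the last length (2) is dropped by the shift, since no next value looks at the whole pattern.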

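③ Match using the next array
On a mismatch, set j = next[j]; the pattern then moves right by j - next[j] positions. In other words, suppose the pattern segment p_{j-k} p_{j-k+1} ... p_{j-1} has matched the text segment s_{i-k} s_{i-k+1} ... s_{i-1}, but p_j fails to match s_i. Because next[j] = k, the part of the pattern before p_j has a longest common prefix/suffix of length k, that is, p_0 p_1 ... p_{k-1} = p_{j-k} p_{j-k+1} ... p_{j-1}. Setting j = next[j] therefore moves the pattern right by j - next[j] positions, so that the prefix p_0 p_1 ... p_{k-1} lines up with the text segment s_{i-k} s_{i-k+1} ... s_{i-1}, and matching continues by comparing p_k with s_i.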

 

 

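In summary, the next array of KMP tells us:

when a character of the pattern fails to match a character of the text, which position of the pattern to jump to next.

If the character at position j of the pattern fails to match the character at position i of the text, the next comparison is between the character at position next[j] of the pattern and the same text character at position i, which is equivalent to moving the pattern right by j - next[j] positions.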
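To see the shift j - next[j] in action, the sketch below records the shift applied at every mismatch while searching. It is a tracing aid of my own, not part of the templates in this post; the name `mismatchShifts` and the `std::string` interface are assumptions.

```cpp
#include <string>
#include <vector>

// Hypothetical tracing helper: run the KMP search of pattern p in text s
// and return the shift (j - next[j]) applied at each mismatch, stopping at
// the first full match or at the end of the text.
std::vector<int> mismatchShifts(const std::string& s, const std::string& p) {
    const int sLen = static_cast<int>(s.size());
    const int pLen = static_cast<int>(p.size());
    std::vector<int> shifts;
    if (pLen == 0) return shifts;

    // Build the plain (unoptimized) next array with next[0] = -1.
    std::vector<int> next(pLen);
    next[0] = -1;
    int k = -1;
    int j = 0;
    while (j < pLen - 1) {
        if (k == -1 || p[j] == p[k]) {
            ++j;
            ++k;
            next[j] = k;
        } else {
            k = next[k];
        }
    }

    // Search; i never moves backwards, only j is reset on a mismatch.
    int i = 0;
    j = 0;
    while (i < sLen && j < pLen) {
        if (j == -1 || s[i] == p[j]) {
            ++i;
            ++j;
        } else {
            shifts.push_back(j - next[j]);  // how far the pattern slides
            j = next[j];
        }
    }
    return shifts;
}
```

For the text "abacabab" and the pattern "abab", the mismatch at text index 3 produces the shifts 2, 1, 1 in turn. They add up to 4, which is exactly the index at which the pattern finally matches.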

 

 

 


 

The templates follow:

Computing the next array:

#include <string.h>

// Fill next[0..pLen-1] for pattern p; next needs room for strlen(p) ints.
void GetNext(const char* p, int next[])
{
    int pLen = strlen(p);
    next[0] = -1;
    int k = -1;
    int j = 0;
    while (j < pLen - 1)
    {
        // p[k] is the last character of the prefix, p[j] of the suffix
        if (k == -1 || p[j] == p[k])
        {
            ++k;
            ++j;
            next[j] = k;
        }
        else
        {
            k = next[k];
        }
    }
}

An improved version of the next array:

#include <string.h>

// Optimized computation of the next array
void GetNextval(const char* p, int next[])
{
    int pLen = strlen(p);
    next[0] = -1;
    int k = -1;
    int j = 0;
    while (j < pLen - 1)
    {
        // p[k] is the last character of the prefix, p[j] of the suffix
        if (k == -1 || p[j] == p[k])
        {
            ++j;
            ++k;
            // Compared with GetNext, only the following 4 lines change
            if (p[j] != p[k])
                next[j] = k;   // previously this was the only line
            else
                // p[j] == p[next[j]] must not happen: a jump to k would
                // fail on the same character, so reuse next[k] instead
                next[j] = next[k];
        }
        else
        {
            k = next[k];
        }
    }
}
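To compare the two variants side by side, here is a small sketch that computes either array depending on a flag. It follows the same loop as the two templates above, but the function name `buildNext`, the `optimized` flag, and the `std::string`/`std::vector` interface are my own assumptions.

```cpp
#include <string>
#include <vector>

// Hypothetical wrapper: compute the plain next array (optimized == false)
// or the improved one (optimized == true) for pattern p.
std::vector<int> buildNext(const std::string& p, bool optimized) {
    const int pLen = static_cast<int>(p.size());
    std::vector<int> next(pLen);
    if (pLen == 0) return next;
    next[0] = -1;
    int k = -1;
    int j = 0;
    while (j < pLen - 1) {
        if (k == -1 || p[j] == p[k]) {
            ++j;
            ++k;
            if (optimized && p[j] == p[k]) {
                // Jumping to k would compare the same character again,
                // so reuse the jump target already stored for k.
                next[j] = next[k];
            } else {
                next[j] = k;
            }
        } else {
            k = next[k];
        }
    }
    return next;
}
```

For "abab" the plain array is -1, 0, 0, 1 and the improved one is -1, 0, -1, 0. For "aaaa" the difference is larger: -1, 0, 1, 2 versus -1, -1, -1, -1, so a mismatch on any character skips all the useless retries at once.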

The KMP algorithm:

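#include <string.h>

// Return the index of the first occurrence of p in s, or -1 if there is none.
// next must already have been filled by GetNext(p, next) or GetNextval(p, next).
int KmpSearch(const char* s, const char* p, const int next[])
{
    int i = 0;
    int j = 0;
    int sLen = strlen(s);
    int pLen = strlen(p);
    while (i < sLen && j < pLen)
    {
        // (1) If j == -1, or the current characters match (s[i] == p[j]),
        //     advance both i and j
        if (j == -1 || s[i] == p[j])
        {
            i++;
            j++;
        }
        else
        {
            // (2) If j != -1 and the current characters differ (s[i] != p[j]),
            //     keep i unchanged and set j = next[j],
            //     where next[j] is the next value stored for position j
            j = next[j];
        }
    }
    if (j == pLen)
        return i - j;
    else
        return -1;
}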
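Putting the pieces together, a self-contained search could look like the sketch below. It is one possible arrangement under my own assumptions (the name `kmpFind`, `std::string` arguments, and a locally allocated next array), not the only way to combine the templates.

```cpp
#include <string>
#include <vector>

// Hypothetical all-in-one version: return the index of the first
// occurrence of p in s, or -1 if p does not occur.
// An empty pattern is treated as matching at index 0 (my own convention).
int kmpFind(const std::string& s, const std::string& p) {
    const int sLen = static_cast<int>(s.size());
    const int pLen = static_cast<int>(p.size());
    if (pLen == 0) return 0;

    // Optimized next array, allocated per call.
    std::vector<int> next(pLen);
    next[0] = -1;
    int k = -1;
    int j = 0;
    while (j < pLen - 1) {
        if (k == -1 || p[j] == p[k]) {
            ++j;
            ++k;
            next[j] = (p[j] != p[k]) ? k : next[k];
        } else {
            k = next[k];
        }
    }

    // Matching phase: i only moves forward.
    int i = 0;
    j = 0;
    while (i < sLen && j < pLen) {
        if (j == -1 || s[i] == p[j]) {
            ++i;
            ++j;
        } else {
            j = next[j];
        }
    }
    return (j == pLen) ? i - j : -1;
}
```

For example, `kmpFind("BBC ABCDAB ABCDABCDABDE", "ABCDABD")` returns 15, and a pattern that does not occur returns -1. Allocating the next array inside the function keeps the sketch self-contained; in a contest setting a global array, as in the templates here, avoids the repeated allocation.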

 

 

The template I usually use (feel free to skip)

#include <string.h>

int Next[1000010];
char str1[1000010];
char str2[1000010];

// Build the optimized next array of pattern str into the global Next[].
// The loop runs until j == len, so Next[len] is filled in as well.
void getnext(char *str)
{
    int len=strlen(str);
    int j=0;
    int k=-1;
    Next[0]=-1;
    while(j<len)
    {
        if(k==-1||str[j]==str[k])
        {
            j++;
            k++;
            if(str[j]!=str[k])
            {
                Next[j]=k;
            }
            else
                Next[j]=Next[k];
        }
        else
            k=Next[k];
    }
}

// Return the index of the first occurrence of str2 in str1, or -1.
// Call getnext(str2) first.
int KMP(char *str1,char *str2)
{
    int len1=strlen(str1);
    int len2=strlen(str2);
    int i=0;
    int j=0;
    // Stop as soon as the whole pattern has matched; without the j<len2
    // check, a match that ends before the end of the text is lost.
    while(i<len1&&j<len2)
    {
        if(j==-1||str1[i]==str2[j])
        {
            i++;
            j++;
        }
        else
        {
            j=Next[j];
        }

    }
    if(j==len2)
        return  i-j;
    else
        return -1;
}

 

 

Think the template is too long? Take a look at this approach:

https://www.cnblogs.com/eternhope/p/9481643.html

 

Here are some introductory practice problems:


 HDU-1711 Number Sequence

http://acm.hdu.edu.cn/showproblem.php?pid=1711

Problem Description

Given two sequences of numbers : a[1], a[2], ...... , a[N], and b[1], b[2], ...... , b[M] (1 <= M <= 10000, 1 <= N <= 1000000). Your task is to find a number K which make a[K] = b[1], a[K + 1] = b[2], ...... , a[K + M - 1] = b[M]. If there are more than one K exist, output the smallest one.

Input

The first line of input is a number T which indicate the number of cases. Each case contains three lines. The first line is two numbers N and M (1 <= M <= 10000, 1 <= N <= 1000000). The second line contains N integers which indicate a[1], a[2], ...... , a[N]. The third line contains M integers which indicate b[1], b[2], ...... , b[M]. All integers are in the range of [-1000000, 1000000].

Output

For each test case, you should output one line which only contain K described above. If no such K exists, output -1 instead.

Sample Input

2
13 5
1 2 1 2 3 1 2 3 1 3 2 1 2
1 2 3 1 3
13 5
1 2 1 2 3 1 2 3 1 3 2 1 2
1 2 3 2 1

Sample Output

6
-1

 

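Task: find the position of the first occurrence of sequence B in sequence A.

This is a template problem; just apply the template directly.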

#include <stdio.h>
#include <string.h>
#include <string>
#include <set>
const int INF=0x3f3f3f3f;
typedef long long LL;
const int mod=1e9+7;
const int maxn=1e6+10;
const int maxm=1e4+10;
using namespace std;

int Next[maxm];
int n,m;
int A1[maxn];   // text, 1-based
int A2[maxm];   // pattern, 1-based

// Optimized next array for the 1-based pattern A2[1..m]; 0 plays the role of -1.
void Get_Next()
{
    int j=1;
    int k=0;
    Next[1]=0;
    while(j<m)
    {
        if(k==0||A2[j]==A2[k])
        {
            j++;
            k++;
            if(A2[j]!=A2[k])
                Next[j]=k;
            else
                Next[j]=Next[k];
        }
        else
            k=Next[k];
    }
}

// Return the 1-based position of the first occurrence of A2 in A1, or -1.
int KMP()
{
    int i=1;
    int j=1;
    while(i<=n&&j<=m)
    {
        if(j==0||A1[i]==A2[j])
        {
            i++;
            j++;
        }
        else
            j=Next[j];
    }
    if(j==m+1)
        return i-j+1;
    else
        return -1;
}

int main()
{
    int T;
    scanf("%d",&T);
    while(T--)
    {
        scanf("%d %d",&n,&m);
        for(int i=1;i<=n;i++)
        {
            scanf("%d",&A1[i]);
        }
        for(int i=1;i<=m;i++)
        {
            scanf("%d",&A2[i]);
        }
        Get_Next();
        printf("%d\n",KMP());
    }
    return 0;
}

 

 

 

POJ-3461 Oulipo

http://poj.org/problem?id=3461

Description

The French author Georges Perec (1936–1982) once wrote a book, La disparition, without the letter 'e'. He was a member of the Oulipo group. A quote from the book:

Tout avait Pair normal, mais tout s’affirmait faux. Tout avait Fair normal, d’abord, puis surgissait l’inhumain, l’affolant. Il aurait voulu savoir où s’articulait l’association qui l’unissait au roman : stir son tapis, assaillant à tout instant son imagination, l’intuition d’un tabou, la vision d’un mal obscur, d’un quoi vacant, d’un non-dit : la vision, l’avision d’un oubli commandant tout, où s’abolissait la raison : tout avait l’air normal mais…

Perec would probably have scored high (or rather, low) in the following contest. People are asked to write a perhaps even meaningful text on some subject with as few occurrences of a given “word” as possible. Our task is to provide the jury with a program that counts these occurrences, in order to obtain a ranking of the competitors. These competitors often write very long texts with nonsense meaning; a sequence of 500,000 consecutive 'T's is not unusual. And they never use spaces.

So we want to quickly find out how often a word, i.e., a given string, occurs in a text. More formally: given the alphabet {'A', 'B', 'C', …, 'Z'} and two finite strings over that alphabet, a word W and a text T, count the number of occurrences of W in T. All the consecutive characters of W must exactly match consecutive characters of T. Occurrences may overlap.

Input

The first line of the input file contains a single number: the number of test cases to follow. Each test case has the following format:

  • One line with the word W, a string over {'A', 'B', 'C', …, 'Z'}, with 1 ≤ |W| ≤ 10,000 (here |W| denotes the length of the string W).
  • One line with the text T, a string over {'A', 'B', 'C', …, 'Z'}, with |W| ≤ |T| ≤ 1,000,000.

Output

For every test case in the input file, the output should contain a single number, on a single line: the number of occurrences of the word W in the text T.

Sample Input

3
BAPC
BAPC
AZA
AZAZAZA
VERDI
AVERDXIVYERDIAN

Sample Output

1
3
0

 

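Task: count how many times string A occurs in string B.

Standard KMP returns the position of the first occurrence; here we count occurrences instead, so a small modification is enough.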

#include <stdio.h>
#include <string.h>
#include <string>
using namespace std;
int Next[1000010];
char str1[1000010];
char str2[1000010];

// Build the optimized next array of pattern str into the global Next[].
// The loop runs until j == len, so Next[len] is available after a full match.
void getnext(char *str)
{
    int len=strlen(str);
    int j=0;
    int k=-1;
    Next[0]=-1;
    while(j<len)
    {
        if(k==-1||str[j]==str[k])
        {
            j++;
            k++;
            if(str[j]!=str[k])
            {
                Next[j]=k;
            }
            else
                Next[j]=Next[k];
        }
        else
            k=Next[k];
    }
}

// Count the (possibly overlapping) occurrences of str2 in str1.
int KMP(char *str1,char *str2)
{
    int ans=0;
    int len1=strlen(str1);
    int len2=strlen(str2);
    int i=0,j=0;
    while(i<len1)
    {
        if(j==-1||str1[i]==str2[j])
        {
            i++;
            j++;
        }
        else
        {
            j=Next[j];
        }
        if(j==len2)
        {
            ans++;
            j=Next[j];
//            i=i-j+1; // using these two lines instead of the line above gets Time Limit Exceeded
//            j=0;
        }
    }
    return ans;
}

int main()
{
    int t;
    scanf("%d",&t);

    while(t--)
    {
        scanf("%s %s",str1,str2);
        getnext(str1);
        printf("%d\n",KMP(str2,str1));
    }
    return 0;
}
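The counting idea can also be isolated into a small function that is easy to test on the sample data. The sketch below uses the plain (unoptimized) next array extended by one entry, so that next[m] is the fallback after a full match; the name `countOccurrences` and the `std::string` interface are my own assumptions.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: count the (possibly overlapping) occurrences of
// word in text.
int countOccurrences(const std::string& text, const std::string& word) {
    const int n = static_cast<int>(text.size());
    const int m = static_cast<int>(word.size());
    if (m == 0) return 0;  // my own convention for an empty word

    // next[0..m]; next[m] is the longest common prefix/suffix of the
    // whole word, used to continue after a full match.
    std::vector<int> next(m + 1);
    next[0] = -1;
    int j = 0;
    int k = -1;
    while (j < m) {
        if (k == -1 || word[j] == word[k]) {
            ++j;
            ++k;
            next[j] = k;
        } else {
            k = next[k];
        }
    }

    int count = 0;
    int i = 0;
    j = 0;
    while (i < n) {
        if (j == -1 || text[i] == word[j]) {
            ++i;
            ++j;
        } else {
            j = next[j];
        }
        if (j == m) {
            ++count;
            j = next[j];  // keep the overlap instead of restarting
        }
    }
    return count;
}
```

On the sample input this gives 1 for BAPC in BAPC, 3 for AZA in AZAZAZA, and 0 for VERDI in AVERDXIVYERDIAN. Falling back to next[m] keeps i moving forward only; restarting the comparison from scratch after every match can degrade to roughly |W| × |T| steps on inputs such as a long run of identical letters, which is presumably why the commented-out variant exceeds the time limit.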

 

 

 

That's all for now; I'll fill in the rest later.

 

Reposted from: https://www.cnblogs.com/jiamian/p/11243519.html
