Problem source: http://poj.org/problem?id=3461
Time Limit: 1000MS | Memory Limit: 65536K
Total Submissions: 49522 | Accepted: 19665
Description
The French author Georges Perec (1936–1982) once wrote a book, La disparition, without the letter 'e'. He was a member of the Oulipo group. A quote from the book:
Tout avait l’air normal, mais tout s’affirmait faux. Tout avait l’air normal, d’abord, puis surgissait l’inhumain, l’affolant. Il aurait voulu savoir où s’articulait l’association qui l’unissait au roman : sur son tapis, assaillant à tout instant son imagination, l’intuition d’un tabou, la vision d’un mal obscur, d’un quoi vacant, d’un non-dit : la vision, l’avision d’un oubli commandant tout, où s’abolissait la raison : tout avait l’air normal mais…
Perec would probably have scored high (or rather, low) in the following contest. People are asked to write a perhaps even meaningful text on some subject with as few occurrences of a given “word” as possible. Our task is to provide the jury with a program that counts these occurrences, in order to obtain a ranking of the competitors. These competitors often write very long texts with nonsense meaning; a sequence of 500,000 consecutive 'T's is not unusual. And they never use spaces.
So we want to quickly find out how often a word, i.e., a given string, occurs in a text. More formally: given the alphabet {'A', 'B', 'C', …, 'Z'} and two finite strings over that alphabet, a word W and a text T, count the number of occurrences of W in T. All the consecutive characters of W must exactly match consecutive characters of T. Occurrences may overlap.
Input
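The first line of the input file contains a single number: the number of test cases to follow. Each test case has the following format:
One line with the word W, a string over {'A', 'B', 'C', …, 'Z'}, with 1 ≤ |W| ≤ 10,000 (here |W| denotes the length of the string W).
One line with the text T, a string over {'A', 'B', 'C', …, 'Z'}, with |W| ≤ |T| ≤ 1,000,000.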
Output
For every test case in the input file, the output should contain a single number, on a single line: the number of occurrences of the word W in the text T.
Sample Input
3
BAPC
BAPC
AZA
AZAZAZA
VERDI
AVERDXIVYERDIAN
Sample Output
1
3
0
------------------------------------------------------------
Given a pattern string WORD and a target string TEXT, count the number of times WORD occurs in TEXT.
------------------------------------------------------------
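This is an application of the KMP algorithm. Let m be the length of WORD and n the length of TEXT. The brute-force approach takes O(nm) time in the worst case; KMP brings this down to O(n+m).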
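For comparison, the brute-force baseline mentioned above could look roughly like the sketch below. The function name count_brute_force and the use of std::string are assumptions made for illustration and are not part of the submitted solution.

```cpp
#include <cstddef>
#include <string>

// Hypothetical baseline (not the submitted solution): try every start
// position in text and compare up to m characters, so the worst case
// costs O(n*m) character comparisons.
int count_brute_force(const std::string& word, const std::string& text)
{
    const std::size_t m = word.size(), n = text.size();
    if (m == 0 || m > n) return 0;
    int cnt = 0;
    for (std::size_t start = 0; start + m <= n; ++start)
    {
        std::size_t k = 0;
        while (k < m && text[start + k] == word[k]) ++k;
        if (k == m) ++cnt;   // overlapping occurrences are all counted
    }
    return cnt;
}
```

With a text of 500,000 identical letters and a long word that differs only in its last letter, nearly every start position costs about m comparisons, which is the case KMP is meant to avoid.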
The overall idea of KMP is that the pointer scanning TEXT never backtracks, which guarantees a complexity that is linear in n.
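To keep the TEXT pointer j from backtracking, the WORD pointer i has to backtrack instead. The question is where i should go each time WORD[i] and TEXT[j] fail to match. The answer is k, the maximum length of a common prefix and suffix of the string WORD[0:(i-1)]. For example, the longest common prefix-suffix of "aba" is "a", that of "abaab" is "ab", and that of "aaaa" is "aaa" (note that the prefix and the suffix may not be the whole string itself). Since the suffix WORD[(i-k):(i-1)] == TEXT[(j-k):(j-1)], the prefix WORD[0:(k-1)] == TEXT[(j-k):(j-1)] as well, so it only remains to go on comparing WORD[k] with TEXT[j].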
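The three examples above can be checked with a deliberately naive sketch that tries every candidate length from longest to shortest. The helper name longest_border is hypothetical, and this quadratic version is only an illustration of the definition, not how the solution computes it.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: length of the longest proper prefix of s that is
// also a suffix of s. Naive O(|s|^2) check, for illustration only.
int longest_border(const std::string& s)
{
    const std::size_t len = s.size();
    if (len == 0) return 0;
    for (std::size_t k = len - 1; k >= 1; --k)
    {
        // Compare the first k characters with the last k characters.
        if (s.compare(0, k, s, len - k, k) == 0)
            return static_cast<int>(k);
    }
    return 0;
}
```

Under this definition longest_border("aba") is 1, longest_border("abaab") is 2 and longest_border("aaaa") is 3, matching "a", "ab" and "aaa".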
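Store this longest common prefix-suffix information for WORD in an array next, where next[i] is the length of the longest common prefix-suffix of WORD[0:(i-1)], and define next[0] = -1 to signal that when the first character of WORD does not match TEXT[j], the TEXT pointer j has to move forward. The KMP program therefore has two parts: one computes the next array, and the other scans TEXT and counts the matches.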
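Scanning TEXT and counting the matches:
int kmp()
{
    // Compute both lengths once; calling strlen in the loop condition
    // may rescan text on every iteration and break the O(n) bound.
    int cnt = 0, i = 0, j = 0;
    int word_len = strlen(word), text_len = strlen(text);
    while (j < text_len)
    {
        if (i == -1 || word[i] == text[j])
        {
            // Match (or i == -1: no prefix left to try), advance both pointers.
            i++;
            j++;
        }
        else
        {
            // Mismatch: fall back in word only; j never moves backwards.
            i = next[i];
        }
        if (i == word_len)
        {
            // Full match ending at text[j-1]; fall back so that
            // overlapping occurrences are still found.
            cnt++;
            i = next[i];
        }
    }
    return cnt;
}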
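This part runs in O(n), provided the length of text is computed once rather than in every loop test. i grows only together with j++ (by one each time), which happens exactly n times, while every i = next[i] decreases i by at least one and i never drops below -1. The total number of fallbacks therefore cannot exceed the total number of j++ steps, so the while loop runs at most 2n times.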
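The 2n bound can be observed empirically with an instrumented variant of the matcher. The sketch below is an assumption-laden illustration: the struct KmpStats, the function kmp_with_stats and the local table name fail are all hypothetical, std::string replaces the global char arrays, and word is assumed to be non-empty.

```cpp
#include <string>
#include <vector>

// Hypothetical result type: number of matches and number of loop iterations.
struct KmpStats { int matches; long long iterations; };

// Hypothetical instrumented matcher; assumes word is non-empty.
KmpStats kmp_with_stats(const std::string& word, const std::string& text)
{
    const int m = static_cast<int>(word.size());
    const int n = static_cast<int>(text.size());

    // Same table as next[] in the text, built locally under another name.
    std::vector<int> fail(m + 1, 0);
    fail[0] = -1;
    for (int i = -1, j = 0; j < m; )
    {
        if (i == -1 || word[i] == word[j]) fail[++j] = ++i;
        else i = fail[i];
    }

    KmpStats stats = {0, 0};
    int i = 0, j = 0;
    while (j < n)
    {
        ++stats.iterations;                       // one pass of the while loop
        if (i == -1 || word[i] == text[j]) { ++i; ++j; }
        else i = fail[i];
        if (i == m) { ++stats.matches; i = fail[i]; }
    }
    return stats;
}
```

For the word "A" and the text "BBBB" every character costs one fallback and one advance, so the iteration count reaches the bound of 2n = 8 exactly.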
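Computing the next array:
void cal_next() // builds the longest common prefix-suffix table of word; next[i] is that length for word[0:i-1]
{
    next[0] = -1;
    // Compute the length once instead of calling strlen on every loop test.
    int i = -1, j = 0, word_len = strlen(word);
    while (j < word_len)
    {
        if (i == -1 || word[i] == word[j])
        {
            // word[0:i] is both a prefix and a suffix of word[0:j].
            next[++j] = ++i;
        }
        else
        {
            // Try the next shorter common prefix-suffix.
            i = next[i];
        }
    }
}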
This part runs in O(m), by the same analysis.
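To make the table concrete, here is a self-contained sketch of the same construction that returns the table instead of filling a global array. The name build_next and the use of std::vector and std::string are assumptions for illustration.

```cpp
#include <string>
#include <vector>

// Hypothetical standalone version of the table construction described above.
// Entry k of the result is the longest common prefix-suffix length of
// word[0:k-1], with entry 0 set to -1 as in the text.
std::vector<int> build_next(const std::string& word)
{
    const int m = static_cast<int>(word.size());
    std::vector<int> nxt(m + 1, 0);
    nxt[0] = -1;
    int i = -1, j = 0;
    while (j < m)
    {
        if (i == -1 || word[i] == word[j]) nxt[++j] = ++i;
        else i = nxt[i];
    }
    return nxt;
}
```

For the sample word "AZA" this yields -1, 0, 0, 1, and for "ABAAB" it yields -1, 0, 0, 1, 1, 2; the last entry in each case is the longest common prefix-suffix length of the whole word.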
------------------------------------------------------------
#include <cstdio>
#include <cstring>
const int WMAX = 10005, TMAX = 1000005;
char word[WMAX] = {};
char text[TMAX] = {};
int next[WMAX] = {};
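void cal_next() // builds the longest common prefix-suffix table of word; next[i] is that length for word[0:i-1]
{
    next[0] = -1;
    // Compute the length once instead of calling strlen on every loop test.
    int i = -1, j = 0, word_len = strlen(word);
    while (j < word_len)
    {
        if (i == -1 || word[i] == word[j])
        {
            next[++j] = ++i;
        }
        else
        {
            i = next[i];
        }
    }
}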
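int kmp()
{
    // Compute both lengths once; text can hold up to 1,000,000 characters,
    // so calling strlen in the loop condition may be far too slow.
    int cnt = 0, i = 0, j = 0;
    int word_len = strlen(word), text_len = strlen(text);
    while (j < text_len)
    {
        if (i == -1 || word[i] == text[j])
        {
            i++;
            j++;
        }
        else
        {
            i = next[i];
        }
        if (i == word_len)
        {
            // Full match; fall back so overlapping occurrences are counted.
            cnt++;
            i = next[i];
        }
    }
    return cnt;
}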
int main()
{
#ifndef ONLINE_JUDGE
    freopen("3461.txt", "r", stdin);
#endif
    int t;
    scanf("%d", &t);
    while (t--)
    {
        scanf("%s%s", word, text);
        cal_next();
        printf("%d\n", kmp());
    }
    return 0;
}