POJ 3461 Oulipo (String Matching, KMP Algorithm)

Problem source: http://poj.org/problem?id=3461

Oulipo

Time Limit: 1000MS
Memory Limit: 65536K
Total Submissions: 49522
Accepted: 19665

Description

The French author Georges Perec (1936–1982) once wrote a book, La disparition, without the letter 'e'. He was a member of the Oulipo group. A quote from the book:

Tout avait l’air normal, mais tout s’affirmait faux. Tout avait l’air normal, d’abord, puis surgissait l’inhumain, l’affolant. Il aurait voulu savoir où s’articulait l’association qui l’unissait au roman : sur son tapis, assaillant à tout instant son imagination, l’intuition d’un tabou, la vision d’un mal obscur, d’un quoi vacant, d’un non-dit : la vision, l’avision d’un oubli commandant tout, où s’abolissait la raison : tout avait l’air normal mais…

Perec would probably have scored high (or rather, low) in the following contest. People are asked to write a perhaps even meaningful text on some subject with as few occurrences of a given “word” as possible. Our task is to provide the jury with a program that counts these occurrences, in order to obtain a ranking of the competitors. These competitors often write very long texts with nonsense meaning; a sequence of 500,000 consecutive 'T's is not unusual. And they never use spaces.

So we want to quickly find out how often a word, i.e., a given string, occurs in a text. More formally: given the alphabet {'A', 'B', 'C', …, 'Z'} and two finite strings over that alphabet, a word W and a text T, count the number of occurrences of W in T. All the consecutive characters of W must exactly match consecutive characters of T. Occurrences may overlap.

Input

The first line of the input file contains a single number: the number of test cases to follow. Each test case has the following format:

  • One line with the word W, a string over {'A', 'B', 'C', …, 'Z'}, with 1 ≤ |W| ≤ 10,000 (here |W| denotes the length of the string W).
  • One line with the text T, a string over {'A', 'B', 'C', …, 'Z'}, with |W| ≤ |T| ≤ 1,000,000.

Output

For every test case in the input file, the output should contain a single number, on a single line: the number of occurrences of the word W in the text T.

Sample Input

3
BAPC
BAPC
AZA
AZAZAZA
VERDI
AVERDXIVYERDIAN

Sample Output

1
3
0

------------------------------------------------------------

Problem

Given a pattern string WORD and a target string TEXT, count the number of times WORD occurs in TEXT.

------------------------------------------------------------

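Approach

This is a direct application of the KMP algorithm. Let m be the length of WORD and n the length of TEXT. The brute-force method runs in O(nm) time; KMP reduces this to O(n+m).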
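For comparison, here is a minimal sketch of the brute-force baseline mentioned above. The function name count_naive and the use of std::string are assumptions made for illustration; they do not appear in the original solution. It tries every start position and compares character by character, which is what gives the O(nm) worst case.

```cpp
#include <cstddef>
#include <string>

// Brute-force baseline (hypothetical helper, not part of the original solution):
// try every start position in text and compare against word character by character.
// Worst case is O(n*m), e.g. word = "AAAB" against a long run of 'A's.
int count_naive(const std::string& word, const std::string& text)
{
	if (word.empty() || word.size() > text.size())
		return 0;

	int cnt = 0;
	for (std::size_t start = 0; start + word.size() <= text.size(); ++start)
	{
		std::size_t k = 0;
		while (k < word.size() && text[start + k] == word[k])
			++k;
		if (k == word.size())
			++cnt;	// every start position is tried, so overlapping occurrences are counted
	}
	return cnt;
}
```

With |T| up to 1,000,000 and |W| up to 10,000, this baseline can be far too slow on adversarial input, which is the motivation for KMP.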

The core idea of KMP is that the pointer scanning TEXT never moves backward, which guarantees a running time linear in n.

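Since the TEXT pointer j must not move backward, only the WORD pointer i can fall back. The question is where i should go after WORD[i] and TEXT[j] fail to match. The answer is position k, where k is the length of the longest common prefix and suffix of WORD[0:(i-1)], the part of WORD matched so far. For example, for "aba" the longest common prefix and suffix is "a", for "abaab" it is "ab", and for "aaaa" it is "aaa" (neither the prefix nor the suffix may be the whole string). Because the suffix WORD[(i-k):(i-1)] equals TEXT[(j-k):(j-1)], the prefix WORD[0:(k-1)] also equals TEXT[(j-k):(j-1)], so matching can resume by comparing WORD[k] with TEXT[j].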

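The longest common prefix-suffix lengths of WORD are stored in the array next: next[i] is the length of the longest common prefix and suffix of WORD[0:(i-1)]. We also define next[0] = -1, which signals that the first character of WORD does not match TEXT[j] and the TEXT pointer j must advance. The KMP program therefore has two parts: one computes the next array, and the other scans TEXT and counts matches.

Scanning TEXT and counting matches: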
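To make the definition concrete, the sketch below computes the table directly from its definition, by trying every candidate length. The names longest_border and next_by_definition are hypothetical helpers introduced here for illustration; this is deliberately the slow way, useful only for checking small cases by hand.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Length of the longest proper prefix of s that is also a suffix of s.
// Tries candidate lengths from longest to shortest (slow, but mirrors the definition).
int longest_border(const std::string& s)
{
	for (std::size_t len = s.empty() ? 0 : s.size() - 1; len > 0; --len)
	{
		if (s.compare(0, len, s, s.size() - len, len) == 0)
			return static_cast<int>(len);
	}
	return 0;
}

// table[0] = -1; table[i] = longest_border(word[0..i-1]) for 1 <= i <= m.
std::vector<int> next_by_definition(const std::string& word)
{
	std::vector<int> table(word.size() + 1);
	table[0] = -1;
	for (std::size_t i = 1; i <= word.size(); ++i)
		table[i] = longest_border(word.substr(0, i));
	return table;
}
```

For the three examples in the text, longest_border gives 1 for "aba", 2 for "abaab" and 3 for "aaaa". For the sample word "AZA" the table is -1, 0, 0, 1.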

int kmp()
{
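	int cnt = 0, i = 0, j = 0;
	int word_len = strlen(word), text_len = strlen(text);	// measure once; strlen in the loop condition may rescan text every iteration
	while (j < text_len)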
	{
		if (i == -1 || word[i] == text[j])
		{
			i++;
			j++;
		}
		else
		{
			i = next[i];
		}
		if (i == word_len)		// full match ending at text[j-1]
		{
			cnt++;
			i = next[i];		// fall back so overlapping occurrences are still found
		}
	}
	return cnt;
}

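This part runs in O(n). Each j++ raises i by exactly 1, while each i = next[i] lowers i by at least 1, and i never drops below -1. The total number of fallbacks therefore cannot exceed the number of j++ steps, so the while loop runs at most 2n times.

Computing the next array: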
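The bound can be checked empirically. The sketch below repeats the matching loop above but returns the number of while-loop iterations instead of the match count. The function kmp_iterations and its std::string interface are assumptions for illustration only, and a non-empty word is assumed.

```cpp
#include <string>
#include <vector>

// Hypothetical instrumented variant of the matching loop: returns how many
// times the while loop body runs. Assumes word is non-empty.
long long kmp_iterations(const std::string& word, const std::string& text)
{
	const int m = static_cast<int>(word.size());
	const int n = static_cast<int>(text.size());

	// Build the fallback table with the same recurrence as the next array.
	std::vector<int> nxt(m + 1);
	nxt[0] = -1;
	for (int i = -1, j = 0; j < m; )
	{
		if (i == -1 || word[i] == word[j])
			nxt[++j] = ++i;
		else
			i = nxt[i];
	}

	long long iterations = 0;
	int i = 0, j = 0;
	while (j < n)
	{
		++iterations;
		if (i == -1 || word[i] == text[j])
		{
			++i;
			++j;
		}
		else
		{
			i = nxt[i];
		}
		if (i == m)
			i = nxt[i];	// full match: fall back and keep scanning
	}
	return iterations;
}
```

For any input the returned value should lie between n and 2n, for example when matching "AAAB" against a run of 1000 'A's.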

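void cal_next()			// compute the longest common prefix-suffix lengths of word; next[i]: that length for word[0:i-1]
{
	next[0] = -1;
	int i = -1, j = 0, word_len = strlen(word);	// measure once instead of calling strlen every iteration
	while (j < word_len)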
	{
		if (i == -1 || word[i] == word[j])
		{
			next[++j] = ++i;
		}
		else
		{
			i = next[i];
		}
	}
}

This part runs in O(m), by the same analysis.

------------------------------------------------------------

Code

#include <cstdio>
#include <cstring>

const int WMAX = 10005, TMAX = 1000005;
char word[WMAX] = {};
char text[TMAX] = {};
int next[WMAX] = {};

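void cal_next()			// compute the longest common prefix-suffix lengths of word; next[i]: that length for word[0:i-1]
{
	next[0] = -1;
	int i = -1, j = 0, word_len = strlen(word);	// measure once instead of calling strlen every iteration
	while (j < word_len)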
	{
		if (i == -1 || word[i] == word[j])
		{
			next[++j] = ++i;
		}
		else
		{
			i = next[i];
		}
	}
}

int kmp()
{
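	int cnt = 0, i = 0, j = 0;
	int word_len = strlen(word), text_len = strlen(text);	// measure once; strlen in the loop condition may rescan text every iteration
	while (j < text_len)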
	{
		if (i == -1 || word[i] == text[j])
		{
			i++;
			j++;
		}
		else
		{
			i = next[i];
		}
		if (i == word_len)		// full match ending at text[j-1]
		{
			cnt++;
			i = next[i];		// fall back so overlapping occurrences are still found
		}
	}
	return cnt;
}

int main()
{
#ifndef ONLINE_JUDGE
	freopen("3461.txt", "r", stdin);
#endif
	int t;
	scanf("%d", &t);
	while (t--)
	{
		scanf("%s%s", word, text);
		cal_next();
		printf("%d\n", kmp());
	}
	return 0;
}

 
