POJ3693 Maximum repetition substring: suffix array

Maximum repetition substring

Time Limit: 1000MS Memory Limit: 65536K

Total Submissions: 4671 Accepted: 1381


Description

The repetition number of a string is defined as the maximum number R such that the string can be partitioned into R same consecutive substrings. For example, the repetition number of "ababab" is 3 and "ababa" is 1.
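To make the definition concrete, here is a minimal brute-force sketch of the repetition number of a whole string. The function name `repetition_number` is made up for illustration and is not part of the problem or of the solutions further down.

```cpp
#include <cassert>
#include <string>

// Repetition number of a whole string: the largest R such that s is
// R copies of one block. The shortest valid block gives the largest R.
int repetition_number(const std::string& s) {
    assert(!s.empty());
    const int n = static_cast<int>(s.size());
    for (int k = 1; k <= n; ++k) {        // k = candidate block length
        if (n % k != 0) continue;         // the block must tile s exactly
        bool periodic = true;
        for (int i = k; i < n; ++i) {
            if (s[i] != s[i - k]) { periodic = false; break; }
        }
        if (periodic) return n / k;
    }
    return 1;  // not reached: k == n always succeeds
}
```

With this sketch, "ababab" gives 3 and "ababa" gives 1, matching the statement.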

Given a string containing lowercase letters, you are to find a substring of it with maximum repetition number.

Input

The input consists of multiple test cases. Each test case contains exactly one line, which gives a non-empty string consisting of lowercase letters. The length of the string will not be greater than 100,000.

The last test case is followed by a line containing a '#'.

Output

For each test case, print a line containing the test case number (beginning with 1) followed by the substring of maximum repetition number. If there are multiple substrings of maximum repetition number, print the lexicographically smallest one.

Sample Input
ccabababc
daabbccaa
#

Sample Output
Case 1: ababab
Case 2: aa

Source

2008 Asia Hefei Regional Contest Online by USTC

--------------------

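I have been learning suffix arrays lately, and this problem took me a long time...

At first I glanced at someone else's code and thought it was plain brute force, enumerating the start position and the length of the repeated substring, so I promptly got TLE...

Later I noticed that when enumerating start positions, they only tried multiples of the length. I did the math, and that makes it O(n log n), which felt like magic...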
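A rough sketch of that count, assuming each start position is handled in O(1) (for example with an RMQ-based LCP query) and ignoring any extra backward scan: for block length k only the positions 0, k, 2k, ... are tried, at most n/k of them, so the total is a harmonic sum.

```latex
\[
\sum_{k=1}^{\lfloor n/2 \rfloor} \frac{n}{k}
\;\le\; n \sum_{k=1}^{n} \frac{1}{k}
\;=\; n H_n
\;\le\; n\,(1 + \ln n)
\;=\; O(n \log n)
\]
```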

Also, the test data for this problem is weak, so even an incorrect algorithm gets accepted, for example:

int main(){
	int i, j, k, t, n, ans, pos, len, cas;
	cas = 0;
	while(scanf("%s", str) != EOF && str[0] != '#'){
		for (i = 0; str[i]; i++){
			s.r[i] = str[i] - 'a' + 1;
		}
		s.r[i] = 0;
		s.n = i;
		
		s.getsa(30);
		s.getheight();
		s.initRMQ();

		ans = len = 1;
		pos = 0;
		for (i = 1; i < s.n; i++){
			if (ans == 1 && s.r[i] < s.r[pos]) pos = i;
			t = s.height[i];
			for (j = i; j < s.n && t && s.height[j] >= t; j++){
				k = s.sa[i - 1] - s.sa[j];
				if (k < 0) k = -k;
//				printf("%d - %d, len = %d, t = %d\n", s.sa[i - 1], s.sa[j], k, t);
				if ((t + k) / k > ans){
					ans = (t + k) / k;
					if (s.sa[i - 1] < s.sa[j])
						pos = s.sa[i - 1];
					else pos = s.sa[j];
					len = k;
				}
			}
		}
		printf("Case %d: ", ++cas);
		for (i = 0; i < ans * len; i++)
			printf("%c", s.r[i + pos] + 'a' - 1);
		printf("\n");
	}
	return 0;
}
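The rough idea:

Let the repetition start at i with block length k, and let j = i + k. Then lcp(i, j) / k + 1 is the repetition count (the code computes it as (t + k) / k).
suffix[i] and suffix[j] must be very similar.
Since lcp(i, j) = min{height[rank[i]+1], ..., height[rank[j]]} (assuming rank[i] < rank[j]),
lcp(i, j) is at most height[rank[i]+1].
The code relies on the assumption that lcp(i, j) always equals height[rank[i]+1].
Of course that is wrong!!!
Counterexample: abcabcpabcabd (suffix 0 and suffix 3 share the prefix abc, but suffix 7 sorts between them, so the scan never pairs them and misses abcabc)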

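A reliable solution: first enumerate the block length k. Imagine cutting the original string into n/k segments of length k each. If the substring we are looking for has length L and repeats L/k times, it must contain at least L/k segment boundaries; put differently, the segments it fully covers must all be equal. So for each k we only enumerate the boundaries i = p*k, p = 0, 1, 2, ... If lcp(i, i+k) > 0, walk backward from i to recover the part that the boundary cut off, taking care to keep the lexicographically smallest candidate; conveniently, rank already gives the lexicographic order.

For each k there are only n/k boundaries, so the enumeration is O(n log n) overall. There is also the backward walk, of course. In practice it costs little on this data, although on highly repetitive input (a long run of a single letter, say) the unbounded walk in the code below becomes quadratic. If you are really worried about it, you can build a "prefix array" (reverse the string and build its suffix array) plus an RMQ over rank, which makes the backward walk fast. What a hassle... Hats off to the "senior sister" (学姐)...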
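The boundary enumeration can be sketched on its own, without the suffix array. The version below is only an illustration under simplifying assumptions: it uses naive character comparison in place of the RMQ-based lcp (so it is not O(n log n)), it returns only the maximum repetition count and ignores the lexicographic tie-break, and the name `max_repetitions` is made up.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Maximum repetition count over all substrings of s, found by checking
// only the segment boundaries p = 0, k, 2k, ... for each block length k.
int max_repetitions(const std::string& s) {
    assert(!s.empty());
    const int n = static_cast<int>(s.size());
    int best = 1;
    for (int k = 1; k <= n / 2; ++k) {
        for (int p = 0; p + k < n; p += k) {
            // Forward: how far s[p..] and s[p+k..] agree (the lcp).
            int fwd = 0;
            while (p + k + fwd < n && s[p + fwd] == s[p + k + fwd]) ++fwd;
            // Backward: recover the part cut off by the boundary.
            // Fewer than k steps suffice; a longer match is already
            // covered by the previous boundary p - k.
            int bwd = 0;
            while (bwd < k - 1 && p - 1 - bwd >= 0 &&
                   s[p - 1 - bwd] == s[p + k - 1 - bwd]) ++bwd;
            // The region [p - bwd, p + k + fwd) has period k.
            best = std::max(best, (fwd + bwd) / k + 1);
        }
    }
    return best;
}
```

Capping the backward walk at k - 1 steps is a design choice of this sketch, not something the solution below does.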

int main(){
	int i, j, k, p, t, n, ans, pos, len, cas;
	cas = 0;
	while(scanf("%s", str) != EOF && str[0] != '#'){
		ans = len = 1;
		pos = 0;
		for (i = 0; str[i]; i++){
			s.r[i] = str[i] - 'a' + 1;
			if (s.r[pos] > s.r[i]) pos = i;
		}
		s.r[i] = 0;
		s.n = i;
		
		s.getsa(30);
		s.getheight();
		s.initRMQ();

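		// Enumerate the block length k, then every segment boundary p = 0, k, 2k, ...
		for (k = 1; k <= s.n / 2; k++){
			for (p = 0; p + k < s.n; p += k){
				i = p;
				j = i + k;
				t = s.lcp(i, j);	// common prefix length of suffix i and suffix i + k
				// Walk backward while the characters k apart still match;
				// each step moves the start one position left, so t grows by one.
				for (; i >= 0 && j >= 0 && s.r[i] == s.r[j]; i--, j--, t++){
//				printf("%d - %d, len = %d, t = %d\n", i, j, k, t);
					// (t + k) / k is the repetition count; on ties prefer the smaller rank.
					if (t >= k && ((t + k) / k > ans
						|| ((t + k) / k == ans 
						    && s.rank[i] < s.rank[pos]))){
						ans = (t + k) / k;
						pos = i;
						len = k;
					}
				}
			}
		}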
		printf("Case %d: ", ++cas);
		for (i = 0; i < ans * len; i++)
			printf("%c", s.r[i + pos] + 'a' - 1);
		printf("\n");
	}
	return 0;
}

Here are a few test cases

abababjklpabababjklq

bbabba

abcabcpabcabd

babbabb

ba

ab

zzbaba

aabzbz

As for the answers, work them out yourself... the strings are short
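For checking hand-computed answers on short strings like these, a brute-force reference can help. The sketch below is an assumption-laden helper, not the solution above: it tries every start and every block length, compares candidate substrings directly, and is roughly cubic, so it is only meant for tiny inputs. The name `brute_force_answer` is made up.

```cpp
#include <cassert>
#include <string>

// Lexicographically smallest substring with the maximum repetition
// number, by exhaustive search. Only suitable for short strings.
std::string brute_force_answer(const std::string& s) {
    assert(!s.empty());
    const int n = static_cast<int>(s.size());
    int best_reps = 0;
    std::string best;
    for (int i = 0; i < n; ++i) {              // start position
        for (int k = 1; i + k <= n; ++k) {     // block length
            // Naive lcp of suffix i and suffix i + k.
            int len = 0;
            while (i + k + len < n && s[i + len] == s[i + k + len]) ++len;
            const int reps = len / k + 1;
            const std::string cand = s.substr(i, reps * k);
            if (reps > best_reps || (reps == best_reps && cand < best)) {
                best_reps = reps;
                best = cand;
            }
        }
    }
    return best;
}
```

On the two samples from the statement it returns "ababab" and "aa".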
