POJ 2774 (An Application of Suffix Arrays)

Long Long Message
Time Limit: 4000MS   Memory Limit: 131072K
Total Submissions: 25926   Accepted: 10560
Case Time Limit: 1000MS

Description

The little cat is majoring in physics in the capital of Byterland. A piece of sad news comes to him these days: his mother is getting ill. Being worried about spending so much on railway tickets (Byterland is such a big country, and he has to spend 16 hours on the train to his hometown), he decided only to send SMS with his mother.

The little cat lives in an unrich family, so he frequently comes to the mobile service center, to check how much money he has spent on SMS. Yesterday, the computer of service center was broken, and printed two very long messages. The brilliant little cat soon found out: 

1. All characters in messages are lowercase Latin letters, without punctuations and spaces. 
2. All SMS has been appended to each other – (i+1)-th SMS comes directly after the i-th one – that is why those two messages are quite long. 
3. His own SMS has been appended together, but possibly a great many redundancy characters appear leftwards and rightwards due to the broken computer. 
E.g: if his SMS is “motheriloveyou”, either long message printed by that machine, would possibly be one of “hahamotheriloveyou”, “motheriloveyoureally”, “motheriloveyouornot”, “bbbmotheriloveyouaaa”, etc. 
4. For these broken issues, the little cat has printed his original text twice (so there appears two very long messages). Even though the original text remains the same in two printed messages, the redundancy characters on both sides would be possibly different. 

You are given those two very long messages, and you have to output the length of the longest possible original text written by the little cat. 

Background: 
The SMS in Byterland mobile service are charging in dollars-per-byte. That is why the little cat is worrying about how long could the longest original text be. 

Why ask you to write a program? There are four reasons: 
1. The little cat is so busy these days with physics lessons; 
2. The little cat wants to keep what he said to his mother secret; 
3. POJ is such a great Online Judge; 
4. The little cat wants to earn some money from POJ, and try to persuade his mother to see the doctor :( 

Input

Two strings with lowercase letters on two of the input lines individually. Number of characters in each one will never exceed 100000.

Output

A single line with a single integer number – what is the maximum length of the original text written by the little cat.

Sample Input

yeshowmuchiloveyoumydearmotherreallyicannotbelieveit
yeaphowmuchiloveyoumydearmother

Sample Output

27

Source

POJ Monthly--2006.03.26,Zeyuan Zhu,"Dedicate to my great beloved mother."





Problem: find the longest common substring of two strings.



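Solution: this is my first time using a suffix array. The idea is to concatenate the two strings into a single string, with a separator character between them. First build the suffix array; from it, compute the LCP (longest common prefix) of every pair of suffixes that are adjacent in sorted order, which gives the height array. The largest LCP value that satisfies the problem's condition is the answer.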
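The steps above can be sketched on small inputs with a deliberately naive suffix sort. This is only an illustration, not the implementation used in this post: the names `naiveSuffixArray` and `longestCommonSubstringNaive` are hypothetical, the separator `'#'` is an assumption (any character that occurs in neither input would do), and sorting whole suffixes is far too slow for the real limits.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Naive suffix array: sort the start positions by comparing whole suffixes.
// Roughly O(n^2 log n), so suitable for small illustrative inputs only.
std::vector<int> naiveSuffixArray(const std::string& s) {
    std::vector<int> sa(s.size());
    for (size_t i = 0; i < s.size(); ++i) sa[i] = static_cast<int>(i);
    std::sort(sa.begin(), sa.end(), [&](int a, int b) {
        return s.compare(a, std::string::npos, s, b, std::string::npos) < 0;
    });
    return sa;
}

// Longest common substring via "join, sort suffixes, scan adjacent pairs".
// Assumption: '#' occurs in neither a nor b.
int longestCommonSubstringNaive(const std::string& a, const std::string& b) {
    const std::string s = a + '#' + b;
    const int n = static_cast<int>(s.size());
    const int lenA = static_cast<int>(a.size());  // index of the separator
    const std::vector<int> sa = naiveSuffixArray(s);
    int best = 0;
    for (int r = 1; r < n; ++r) {
        const int p = sa[r - 1], q = sa[r];
        // Skip the separator suffix and pairs that start in the same string.
        if (p == lenA || q == lenA || (p < lenA) == (q < lenA)) continue;
        int k = 0;  // LCP of the two adjacent suffixes, computed directly
        while (p + k < n && q + k < n && s[p + k] == s[q + k]) ++k;
        best = std::max(best, k);
    }
    return best;
}
```

For example, `longestCommonSubstringNaive("banana", "ananas")` evaluates to 5, for the common substring "anana".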


http://blog.sina.com.cn/s/blog_6635898a0102duef.html

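So is the maximum of all height values the answer? Not necessarily! The two suffixes may lie in the same string, so height[i] qualifies only when suffix(sa[i-1]) and suffix(sa[i]) are not two suffixes of the same original string. The maximum among the qualifying values is the answer. Let |A| and |B| be the lengths of strings A and B. With a linear-time construction such as DC3, building the suffix array and height array of the new string takes O(|A|+|B|) time, and finding the maximum height over rank-adjacent suffixes that come from different strings also takes O(|A|+|B|), so the whole method runs in O(|A|+|B|). That matches the lower bound of reading the input, which shows that this is a very good algorithm. Note that the code below builds the suffix array by prefix doubling, so it actually runs in O((|A|+|B|) log(|A|+|B|)), which is still fast enough for this problem.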
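Why it is enough to look only at rank-adjacent suffixes follows from a standard property of the height array, which the quoted text does not spell out. For ranks i < j:

```latex
\mathrm{LCP}\bigl(\mathrm{suffix}(sa[i]),\ \mathrm{suffix}(sa[j])\bigr)
  \;=\; \min_{i < k \le j} \mathit{height}[k]
```

So if two suffixes from different strings sit at ranks i < j and share a nonzero common prefix, every height[k] with i < k <= j is at least that long. Somewhere between rank i and rank j the origin must switch from one string to the other, and the adjacent pair at that switch has a height at least as large as the LCP of the original pair. Scanning adjacent pairs therefore never misses the maximum.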



#include<cstdio>
#include<cstring>
#include<ctime>
#include<algorithm>
#include<utility>
using namespace std;  
  
#define N int(5e5)  
#define inf int(0x3f3f3f3f)  
#define mod int(1e9+7)  
typedef long long LL;  
  
  
#ifdef CDZSC  
#define debug(...) fprintf(stderr, __VA_ARGS__)  
#else  
#define debug(...)   
#endif  
  
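char s[N];//concatenated text, with characters remapped to small integers
int sa[N],t[N],t2[N],c[N];//sa: suffix array; t,t2: rank/key buffers; c: counting-sort buckets

//Builds the suffix array of s[0..n-1] by prefix doubling, O(n log n).
//n: length of the string, including the trailing 0 sentinel (the unique smallest character)
//m: radix for the counting sort; every character value must be smaller than m
void build_sa(int n,int m)
{
	int i,*x=t,*y=t2;
	//initial pass: counting sort of the suffixes by their first character
	for(i=0;i<m;i++)c[i]=0;
	for(i=0;i<n;i++)c[x[i]=s[i]]++;
	for(i=1;i<m;i++)c[i]+=c[i-1];
	for(i=n-1;i>=0;i--)sa[--c[x[i]]]=i;
	for(int k=1;k<=n;k<<=1)
	{
		int p=0;
		//order by the second key (rank of the suffix k positions later):
		//suffixes with no second key come first, the rest follow in sa order
		for(i=n-k;i<n;i++)y[p++]=i;
		for(i=0;i<n;i++)if(sa[i]>=k)y[p++]=sa[i]-k;
		
		//stable counting sort by the first key
		for(i=0;i<m;i++)c[i]=0;
		for(i=0;i<n;i++)c[x[y[i]]]++;
		for(i=1;i<m;i++)c[i]+=c[i-1];
		for(i=n-1;i>=0;i--)sa[--c[x[y[i]]]]=y[i];
		swap(x,y);
		//recompute the ranks; suffixes with equal key pairs share a rank
		p=1;x[sa[0]]=0;
		for(i=1;i<n;i++)
			x[sa[i]]=y[sa[i-1]]==y[sa[i]]&&y[sa[i-1]+k]==y[sa[i]+k]?p-1:p++;
		if(p>=n)break;//all ranks are distinct, so the order is final
		m=p;//the next round sorts ranks in [0,p)
	}
}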

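int Rank[N],height[N];
//Computes height[r] = LCP of suffix(sa[r-1]) and suffix(sa[r]) for r>=1 (Kasai's algorithm), O(n).
//n: length of the string, including the trailing 0 sentinel
void gethight(int n)
{
	int k=0;
	for(int i=0;i<n;i++)Rank[sa[i]]=i;
	height[0]=0;
	for(int i=0;i<n;i++)
	{
		if(Rank[i]==0){k=0;continue;}//the smallest suffix has no predecessor in sa
		if(k)k--;
		int j=sa[Rank[i]-1];
		while(s[i+k]==s[j+k])k++;//the unique sentinel forces a mismatch before either suffix ends
		height[Rank[i]]=k;
	}
}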

char s1[N],s2[N];
int main()  
{  
#ifdef CDZSC  
    freopen("i.txt", "r", stdin);  
    //freopen("o.txt","w",stdout);  
    int _time_jc = clock();  
#endif  
	while(scanf("%s%s",s1,s2)==2)
	{
		int pos=0;
		for(int i=0;s1[i];i++)s[pos++]=s1[i]-'a'+1;//letters map to 1..26
		s[pos++]=28;//separator: value 28, larger than any letter code, appears exactly once
		for(int i=0;s2[i];i++)s[pos++]=s2[i]-'a'+1;
		s[pos++]=0;//sentinel: unique smallest character, needed by build_sa and gethight
		//s now holds the two strings joined by the separator, followed by the sentinel
		build_sa(pos,29);//build the suffix array
		gethight(pos);//compute the height array (LCP of rank-adjacent suffixes)
		int ans=0,len=strlen(s1);//len is also the index of the separator in s
		for(int i=1;i<pos;i++)
		{
			if(height[i]>ans)
			{
				//height[i] counts only if the two adjacent suffixes start in different strings
				if(sa[i-1]<len&&len<sa[i])
					ans=height[i];
				if(sa[i]<len&&len<sa[i-1])
					ans=height[i];
			}
		}
		printf("%d\n",ans);
	}
#ifdef CDZSC  
    debug("time: %d\n", int(clock() - _time_jc));  
#endif  
    return 0;  
}  
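When testing a suffix-array solution like the one above, it can help to compare it against a much simpler reference on small random inputs. The sketch below uses a different technique, a plain O(|A|·|B|) dynamic program over common suffixes, and is far too slow for the full limits. The name `lcsBruteForce` is hypothetical and not part of the original solution.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Reference answer by dynamic programming: O(|A|*|B|) time, O(|B|) memory.
// After processing row i, prev[j] is the length of the longest common
// suffix of a[0..i) and b[0..j); the answer is the largest such value.
int lcsBruteForce(const std::string& a, const std::string& b) {
    std::vector<int> prev(b.size() + 1, 0), cur(b.size() + 1, 0);
    int best = 0;
    for (size_t i = 1; i <= a.size(); ++i) {
        for (size_t j = 1; j <= b.size(); ++j) {
            // Extend the match ending at (i-1, j-1), or reset on a mismatch.
            cur[j] = (a[i - 1] == b[j - 1]) ? prev[j - 1] + 1 : 0;
            best = std::max(best, cur[j]);
        }
        std::swap(prev, cur);  // index 0 is never written, so it stays 0
    }
    return best;
}
```

On the sample input from the problem statement this reference also yields 27, the length of "howmuchiloveyoumydearmother".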




