POJ-3294: Suffix array, longest substring shared by more than k/2 strings (grouping by height)

Life Forms

Time Limit: 5000MS   Memory Limit: 65536K
Total Submissions: 18828   Accepted: 5546

Description

You may have wondered why most extraterrestrial life forms resemble humans, differing by superficial traits such as height, colour, wrinkles, ears, eyebrows and the like. A few bear no human resemblance; these typically have geometric or amorphous shapes like cubes, oil slicks or clouds of dust.

The answer is given in the 146th episode of Star Trek - The Next Generation, titled The Chase. It turns out that in the vast majority of the quadrant's life forms ended up with a large fragment of common DNA.

Given the DNA sequences of several life forms represented as strings of letters, you are to find the longest substring that is shared by more than half of them.

Input

Standard input contains several test cases. Each test case begins with 1 ≤ n ≤ 100, the number of life forms. n lines follow; each contains a string of lower case letters representing the DNA sequence of a life form. Each DNA sequence contains at least one and not more than 1000 letters. A line containing 0 follows the last test case.

Output

For each test case, output the longest string or strings shared by more than half of the life forms. If there are many, output all of them in alphabetical order. If there is no solution with at least one letter, output "?". Leave an empty line between test cases.

Sample Input

3
abcdefg
bcdefgh
cdefghi
3
xxx
yyy
zzz
0

Sample Output

bcdefg
cdefgh

?

Problem: given k strings, find the longest substring that appears in more than k/2 of them.
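For cross-checking answers on small inputs, a brute-force reference can help. The sketch below is not the suffix-array method used in this post: it tries every length from longest to shortest and counts how many distinct strings contain each substring. The name bruteForce is hypothetical, and it is far too slow for the real limits (100 strings of 1000 letters).

```cpp
#include <algorithm>
#include <map>
#include <set>
#include <string>
#include <vector>

// Brute-force reference, NOT the suffix-array solution.
// Returns the longest substrings found in more than half of strs,
// in alphabetical order; an empty vector corresponds to the "?" output.
std::vector<std::string> bruteForce(const std::vector<std::string>& strs) {
    size_t maxLen = 0;
    for (const auto& s : strs) maxLen = std::max(maxLen, s.size());
    for (size_t len = maxLen; len >= 1; --len) {
        // where[sub] = set of indices of the strings that contain sub
        std::map<std::string, std::set<int>> where;
        for (int i = 0; i < (int)strs.size(); ++i)
            for (size_t p = 0; p + len <= strs[i].size(); ++p)
                where[strs[i].substr(p, len)].insert(i);
        std::vector<std::string> out;
        for (const auto& kv : where)
            if (kv.second.size() * 2 > strs.size()) out.push_back(kv.first);
        if (!out.empty()) return out;  // std::map iterates keys in sorted order
    }
    return {};
}
```

Calling bruteForce on the two sample cases gives a quick sanity check against the expected output above.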

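Analysis: concatenate all the strings, run the suffix-array template to get height, then binary search on the answer length mid. For each mid, split the height array into groups: consecutive entries with height[i] >= mid belong to the same group, and an entry with height[i] < mid ends the group. Inside a group, mark the source string of each suffix in vis[]. If the suffixes of a group come from more than k/2 distinct strings, record the group in ans[]. ans[0] stores the number of answer strings.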
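The two preparatory steps in the analysis can be sketched as follows. This is a minimal sketch under assumptions: the post does not show how the strings are joined, so I assume the common trick of putting a distinct separator after each string so that no common prefix crosses a string boundary. The names Joined, joinStrings and longestFeasible are hypothetical.

```cpp
#include <string>
#include <vector>

// Hypothetical container for the concatenated input.
struct Joined {
    std::vector<int> text;   // letters mapped to 1..26, separators to 27 + i
    std::vector<int> owner;  // owner[p] = index of the string covering position p, -1 for separators
};

// Assumed joining scheme: each string is followed by its own unique separator.
Joined joinStrings(const std::vector<std::string>& strs) {
    Joined j;
    for (int i = 0; i < (int)strs.size(); ++i) {
        for (char ch : strs[i]) {
            j.text.push_back(ch - 'a' + 1);
            j.owner.push_back(i);
        }
        j.text.push_back(27 + i);  // larger than any letter, different for every string
        j.owner.push_back(-1);
    }
    return j;
}

// Binary search on the answer: largest len in [1, hi] with ok(len) true, or 0 if none.
// Assumes ok is monotone (true for short lengths, false beyond some point).
template <class Pred>
int longestFeasible(int hi, Pred ok) {
    int lo = 1, best = 0;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (ok(mid)) { best = mid; lo = mid + 1; }
        else hi = mid - 1;
    }
    return best;
}
```

The predicate passed to longestFeasible would be the grouping check on height; monotonicity holds because any substring shared by more than k/2 strings also has shorter prefixes shared by them.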

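Reflection: I have been working through suffix-array problems lately. The previous one asked for the longest common substring of n strings; this one asks for the longest substring that appears in more than k/2 strings, and the two solutions turned out to be very similar. What held me up longest was where to put the statement if(kase>k/2) ans[++d]=sa[i-1];. If it sits in the height[i]>=mid branch, the same substring is recorded again every time height[i] stays at or above mid, which is why my output contained many duplicates (in my tests "aa" was printed over and over). It belongs in the else branch (height[i]<mid), where a group ends, so that each group is recorded exactly once.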
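The "record once, when the group ends" rule can be sketched in isolation. This is an illustrative rewrite, not the original function: the name groupsCoveringMajority and its parameters are hypothetical, height[i] is assumed to be the longest common prefix of suffixes sa[i-1] and sa[i], and owner maps a text position to its source string (-1 for separators).

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Returns one start position per group of suffixes that share a prefix of
// length >= mid and come from more than k/2 distinct strings.
std::vector<int> groupsCoveringMajority(const std::vector<int>& sa,
                                        const std::vector<int>& height,
                                        const std::vector<int>& owner,
                                        int k, int mid) {
    assert(sa.size() == height.size());
    std::vector<int> starts;
    std::vector<char> seen(k, 0);
    int distinct = 0;
    int n = (int)sa.size();
    for (int i = 1; i <= n; ++i) {
        // i == n is treated as a group boundary so the last group is not lost.
        bool inGroup = (i < n) && height[i] >= mid;
        if (inGroup) {
            for (int p : {sa[i - 1], sa[i]}) {
                int o = owner[p];
                if (o >= 0 && !seen[o]) { seen[o] = 1; ++distinct; }
            }
        } else {
            // The group has just ended: record it once, then reset the marks.
            if (distinct * 2 > k) starts.push_back(sa[i - 1]);
            distinct = 0;
            std::fill(seen.begin(), seen.end(), 0);
        }
    }
    return starts;
}
```

Treating i == n as a boundary plays the same role as the extra check after the loop in the listing below. Recording inside the inGroup branch instead would push one entry per suffix pair, which reproduces the duplicate output described above.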

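Another point concerns the multi-case input loop. In while(~scanf("%d",&num),num), the comma operator discards the result of scanf, so the loop stops only when num is 0. The input must therefore end with a 0; otherwise the loop never terminates and keeps printing. while(scanf("%d",&num)==1&&num) is the safer choice.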
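To see why checking the return value matters, here is a small sketch that applies the same condition to in-memory strings with sscanf instead of reading stdin. The helper name hasNextCase is hypothetical and only illustrates the loop condition.

```cpp
#include <cstdio>

// Parses one case header from a text line and reports whether another test
// case follows. Mirrors the condition scanf("%d",&num)==1 && num.
bool hasNextCase(const char* line, int& num) {
    // sscanf returns the number of converted fields, or EOF on empty input,
    // so "== 1" rejects both end of input and malformed lines.
    return std::sscanf(line, "%d", &num) == 1 && num != 0;
}
```

With this condition, an empty line, a non-numeric line, and a line containing 0 all end the loop, whereas the comma-operator version would only stop on 0.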

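I got RE (Runtime Error) and OLE (Output Limit Exceeded) several times on this problem. Suffix-array code seems to RE at the slightest slip; it has happened to me more than once.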

The code follows the two solutions linked below; I could not solve it on my own. Solution 1  Solution 2

 

#include <cstdio>
#include <cstring>
#include <algorithm>
using namespace std;
const int maxn=100105;
int sa[maxn],Rank[maxn],c[maxn],height[maxn];
int t1[maxn],t2[maxn];
int s[maxn];
char str[1010];
int id[maxn];
int vis[110];
int ans[maxn];
bool cmp(int *r,int a,int b,int l)
{
    return r[a]==r[b]&&r[a+l]==r[b+l];
}
void da(int *r,int *sa,int n,int m)
{
    int i,j,p,*x=t1,*y=t2;
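    for(i=0;i<m;i++) c[i]=0;
    for(i=0;i<n;i++) c[x[i]=r[i]]++;
    for(i=1;i<m;i++) c[i]+=c[i-1];
    for(i=n-1;i>=0;i--) sa[--c[x[i]]]=i;

    for(j=1;j<=n;j<<=1)
    {
        p=0;
        for(i=n-j;i<n;i++) y[p++]=i;
        for(i=0;i<n;i++) if(sa[i]>=j) y[p++]=sa[i]-j;

        for(i=0;i<m;i++) c[i]=0;
        for(i=0;i<n;i++) c[x[y[i]]]++;
        for(i=1;i<m;i++) c[i]+=c[i-1];
        for(i=n-1;i>=0;i--) sa[--c[x[y[i]]]]=y[i];

        swap(x,y);
        p=1;
        x[sa[0]]=0;
        for(i=1;i<n;i++) x[sa[i]]=cmp(y,sa[i-1],sa[i],j)?p-1:p++;
        if(p>=n) break;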
        m=p;
    }
    int k=0;
    n--;
    for(int i=0;i<=n;i++) Rank[sa[i]]=i;
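    for(i=0;i<n;i++)
    {
        if(k) k--;
        j=sa[Rank[i]-1];
        while(r[i+k]==r[j+k]) k++;
        height[Rank[i]]=k;
    }
}
/// Assumed reconstruction: the name check, its parameter list and the loop
/// bounds are inferred from the variables used in the body below.
bool check(int n,int k,int mid)
{
    int d=0,kase=0;
    memset(vis,0,sizeof(vis));
    for(int i=1;i<=n;i++)
    {
        if(height[i]>=mid)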
        {
            if(!vis[id[sa[i]]]) kase++,vis[id[sa[i]]]=1;
            if(!vis[id[sa[i-1]]]) kase++,vis[id[sa[i-1]]]=1;
        }
        else
        {
            if(kase>k/2) ans[++d]=sa[i-1];/// must be in the else branch
            kase=0;
            memset(vis,0,sizeof(vis));
        }
    }
    /// Also needed after the for loop: height may stay >= mid until the loop ends, and that last group must be recorded too
    if(kase>k/2) ans[++d]=sa[n];
    if(d)
    {
        ans[0]=d;/// ans[0] stores the number of longest common substrings
        return true;
    }
    return false;
}
int main()
{
//    freopen("in.txt","r",stdin);
    int num;
    int flag=0;
    while(scanf("%d",&num)==1&&num)///!!!
    {
        if(flag) printf("\n");
        else flag=1;
        memset(ans,0,sizeof(ans));
        int n=0;
        int len;
        for(int i=0;i

 
