hdu 3518 Boring counting (suffix array)

Boring counting

Time Limit: 2000/1000 MS (Java/Others)    Memory Limit: 65536/32768 K (Java/Others)

Problem Description
035 now faces a tough problem. His English teacher gives him a string consisting of n lowercase letters, and he must figure out how many substrings appear at least twice, where the appearances may not overlap each other.
Take aaaa as an example. "a" appears four times, and "aa" appears twice without overlapping. However, "aaa" cannot appear more than once without overlapping, since we can only get "aaa" from [0-2] (string positions begin at 0) and [1-3], and the intervals [0-2] and [1-3] overlap. So "aaa" is not counted. Therefore, the answer is 2 ("a" and "aa").
 

Input
The input consists of several test cases and ends with a line containing "#". Each test case contains a string of lowercase letters whose length n does not exceed 1000 (n <= 1000).
 

Output
For each test case, output an integer ans, the answer for that test case. You had better use int64 to avoid unnecessary trouble.
 

Sample Input
aaaa
ababcabb
aaaaaa
#
 

Sample Output
2
3
3
 
Problem: count how many distinct substrings appear at least twice in the string, with occurrences that do not overlap.
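Analysis: if a non-overlapping repeated substring of length L exists, then one also exists for every smaller length L' < L (take the length-L' prefix of each occurrence), so feasibility is monotone in the length.
Since LCP(sa[i], sa[j]) = RMQ(height[i+1..j]), i.e. the minimum of height over that range, if there is a k with height[k] < L, then for all i, j with i < k <= j we have LCP(sa[i], sa[j]) < L.
In other words, two suffixes whose common prefix has length at least L never straddle a valley k with height[k] < L, so these valleys split the suffix array into contiguous segments.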
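
The identity LCP(sa[i], sa[j]) = min(height[i+1..j]) can be checked on small strings with a minimal sketch like the one below. It is not the construction used in this post: it builds the suffix array by plain sorting, uses no sentinel (so ranks run from 0 to n-1), and the helper names naive_sa, direct_lcp, naive_height and lcp_by_height are made up for illustration.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Naive suffix array: sort suffix start positions by comparing the suffixes directly.
std::vector<int> naive_sa(const std::string& s) {
    std::vector<int> sa(s.size());
    for (size_t i = 0; i < s.size(); ++i) sa[i] = (int)i;
    std::sort(sa.begin(), sa.end(), [&](int a, int b) {
        return s.compare(a, std::string::npos, s, b, std::string::npos) < 0;
    });
    return sa;
}

// Longest common prefix of the suffixes starting at a and b, by direct comparison.
int direct_lcp(const std::string& s, int a, int b) {
    int n = (int)s.size(), k = 0;
    while (a + k < n && b + k < n && s[a + k] == s[b + k]) ++k;
    return k;
}

// height[i] = LCP of the suffixes ranked i-1 and i; height[0] = 0.
std::vector<int> naive_height(const std::string& s, const std::vector<int>& sa) {
    std::vector<int> h(sa.size(), 0);
    for (size_t i = 1; i < sa.size(); ++i) h[i] = direct_lcp(s, sa[i - 1], sa[i]);
    return h;
}

// LCP of the suffixes ranked i < j, computed as the minimum of height[i+1..j].
int lcp_by_height(const std::vector<int>& h, int i, int j) {
    return *std::min_element(h.begin() + i + 1, h.begin() + j + 1);
}
```

For "aaaa" this gives sa = 3 2 1 0 and height = 0 1 2 3, so the suffixes ranked 0 and 3 ("a" and "aaaa") have LCP min(1, 2, 3) = 1. The linear scan in lcp_by_height stands in for a real RMQ structure.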
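Solution: enumerate the substring length L.
For each L, scan the height array and, within every maximal run of consecutive entries with height >= L, find the smallest and largest suffix start positions l and r.
If l + L <= r, the two occurrences do not overlap, so add 1 to the answer.
All suffixes within one run of height >= L share a common prefix of length at least L, so each run corresponds to exactly one substring of length L and nothing is counted twice; any other repeated substring of length L lies in a different run of height >= L.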
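
The run-scanning step can be sketched in isolation as follows. This is only an illustration under simplifying assumptions: the suffix array is built by plain sorting rather than prefix doubling, there is no sentinel, and the function name count_by_height_runs is hypothetical.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Counts distinct substrings that occur at least twice without overlapping,
// by scanning runs of height >= len for every candidate length len.
long long count_by_height_runs(const std::string& s) {
    const int n = (int)s.size();

    // Naive suffix array (sorting suffixes directly), enough for a sketch.
    std::vector<int> sa(n);
    for (int i = 0; i < n; ++i) sa[i] = i;
    std::sort(sa.begin(), sa.end(), [&](int a, int b) {
        return s.compare(a, std::string::npos, s, b, std::string::npos) < 0;
    });

    // height[i] = LCP of the suffixes ranked i-1 and i; height[0] = 0.
    std::vector<int> height(n, 0);
    for (int i = 1; i < n; ++i) {
        int a = sa[i - 1], b = sa[i], k = 0;
        while (a + k < n && b + k < n && s[a + k] == s[b + k]) ++k;
        height[i] = k;
    }

    long long ans = 0;
    for (int len = 1; 2 * len <= n; ++len) {      // two disjoint copies need 2*len <= n
        int lo = n, hi = -1;                      // hi == -1 means no run is open
        for (int i = 1; i <= n; ++i) {            // i == n only closes the last run
            if (i < n && height[i] >= len) {
                lo = std::min({lo, sa[i - 1], sa[i]});
                hi = std::max({hi, sa[i - 1], sa[i]});
            } else {
                if (hi != -1 && lo + len <= hi) ++ans;  // leftmost and rightmost copies are disjoint
                lo = n;
                hi = -1;
            }
        }
    }
    return ans;
}
```

Calling count_by_height_runs on the three sample strings aaaa, ababcabb and aaaaaa returns 2, 3 and 3. Only the smallest and largest start positions in a run matter, because if any two occurrences are disjoint then the outermost two are as well.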

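#include<cstdio>
#include<cstring>
// Deliberately no "using namespace std;": the global pointer `rank` declared below
// would be ambiguous with std::rank under C++11 and later.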
const int N = 1010;
char str[N];
int *rank, r[N], sa[N], height[N];
int wa[N], wb[N], wm[N];
bool comp(int *r, int a, int b, int l)
{
    return r[a] == r[b] && r[a+l] == r[b+l];
}
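// Builds the suffix array of r[0..n-1] by prefix doubling with counting sort.
// Values of r must lie in [0, m); on return, rank[i] is the rank of suffix i.
void get_sa(int *r, int *sa, int n, int m)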
{
    int *x = wa, *y = wb, *t, i, j, p;
    for(i = 0; i < m; ++i) wm[i] = 0;
    for(i = 0; i < n; ++i) wm[x[i] = r[i]]++;
    for(i = 1; i < m; ++i) wm[i] += wm[i-1];
    for(i = n-1; i >= 0; --i) sa[--wm[x[i]]] = i;
    for(i = 0, j = 1, p = 0; p < n; j <<= 1, m = p) {
        for(p = 0, i = n - j; i < n; ++i) y[p++] = i;
        for(i = 0; i < n; ++i) if(sa[i] >= j) y[p++] = sa[i] - j;
        for(i = 0; i < m; ++i) wm[i] = 0;
        for(i = 0; i < n; ++i) wm[x[y[i]]]++;
        for(i = 1; i < m; ++i) wm[i] += wm[i-1];
        for(i = n-1; i >= 0; --i) sa[--wm[x[y[i]]]] = y[i];
        for(t = x, x = y, y = t, i = p = 1, x[sa[0]] = 0; i < n; ++i) {
            x[sa[i]] = comp(y, sa[i], sa[i-1], j) ? p-1 : p++;
        }
    }
    rank = x;
}
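// Computes height[rank[i]] = LCP of suffix i and the suffix just before it in sa.
// Requires rank[i] >= 1 for every i < n, which the sentinel stored at r[n] ensures.
void get_height(int *r, int *sa, int n)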
{
    for(int i = 0, j = 0, k = 0; i < n; height[rank[i++]] = k) {
        for(k ? --k : 0, j = sa[rank[i]-1]; r[i+k] == r[j+k]; ++k);
    }
}
int main()
{
    while(~scanf("%s",str)) {
        if(str[0] == '#') break;
        int len = strlen(str);
        for(int i = 0; i < len; i++)
            r[i] = str[i];
        r[len] = 0; // sentinel: must be smaller than every value that can occur
        get_sa(r, sa, len+1, 256);
        get_height(r, sa, len);
        int ans = 0, minid, maxid;
        for(int i = 1; i <= (len+1)/2; ++i) { // half is enough: a substring longer than (len+1)/2 cannot occur twice
            minid = 1010, maxid = -1;
            for(int j = 1; j <= len; ++j) {
                if(height[j] >= i) {
                    if(sa[j-1] < minid) minid = sa[j-1];
                    if(sa[j-1] > maxid) maxid = sa[j-1];
                    if(sa[j] < minid) minid = sa[j];
                    if(sa[j] > maxid) maxid = sa[j];
                }
                else {
                    if(maxid != -1 && minid + i <= maxid) ans++;
                    minid = 1010, maxid = -1;
                }
            }
            if(maxid != -1 && minid + i <= maxid) ans++;
        }
        printf("%d\n", ans);
    }
    return 0;
}
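
To cross-check the answers on small inputs, a brute-force counter that follows the problem statement literally can help. The sketch below is an illustrative reference only; the name brute_force_count is made up, and the approach is far too slow to serve as a solution at n = 1000.

```cpp
#include <set>
#include <string>

// Counts distinct substrings whose first and last occurrences do not overlap.
long long brute_force_count(const std::string& s) {
    const size_t n = s.size();
    long long ans = 0;
    for (size_t len = 1; 2 * len <= n; ++len) {
        std::set<std::string> seen;               // substrings of this length already handled
        for (size_t i = 0; i + len <= n; ++i) {
            std::string sub = s.substr(i, len);
            if (!seen.insert(sub).second) continue;   // each distinct substring counts once
            size_t first = s.find(sub);           // leftmost occurrence
            size_t last = s.rfind(sub);           // rightmost occurrence
            if (first + len <= last) ++ans;       // the two outermost copies are disjoint
        }
    }
    return ans;
}
```

On the sample strings aaaa, ababcabb and aaaaaa it returns 2, 3 and 3, matching the sample output.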

