SCU - 4438 Censor (Hashing)

frog is now an editor who censors so-called sensitive words.
She has a long text p. Her job is relatively simple: just find the first occurrence of the sensitive word w and remove it.
frog repeats this over and over again. Help her do the tedious work.

Input
The input consists of multiple tests. For each test:
The first line contains 1 string w. The second line contains 1 string p. (1 ≤ length of w, p ≤ 5⋅10^6; w and p consist of only lowercase letters)

Output
For each test, write 1 string which denotes the censored text.

Sample Input

abc
aaabcbc
b
bbb
abc
ab

Sample Output

a

ab

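Checking whether two strings are equal requires comparing them character by character. If each string could be mapped to a unique number, comparing two strings would reduce to comparing their mapped values.
Much like binary notation, we can assign a weight to each position of the string and sum each letter multiplied by the weight of its position, which yields a unique code for the string. The problem is that for a very long string the code becomes too large to store or compute with.
Fortunately, we can take a remainder: after reducing the code modulo some number, the values of different strings are still different with high probability, so the remainder can be used as the code instead.
So here we use unsigned long long as the type for hash values, because its arithmetic is automatically reduced modulo 2^64.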
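
As a small illustration of that wraparound, here is a minimal sketch. The names step and check_wraparound are made up for this example and do not appear in the solution; the only assumption is the usual C++ rule that unsigned arithmetic is reduced modulo 2^N for an N-bit type.

```cpp
#include <cassert>
#include <climits>

typedef unsigned long long ULL;

// One multiply-and-add step of a polynomial hash. Unsigned overflow is
// well defined in C++: the result is reduced modulo 2^N, where N is the
// width of the type (64 bits on common platforms).
ULL step(ULL h, ULL base, ULL c) {
    return h * base + c;
}

// Checks the wraparound on two small cases (hypothetical helper).
void check_wraparound() {
    ULL x = ULLONG_MAX;                   // all bits set, i.e. 2^N - 1
    assert(x + 1 == 0);                   // wraps to 0, no undefined behavior
    assert(step(ULLONG_MAX, 2, 3) == 1);  // (2^N - 1) * 2 + 3 = 2^(N+1) + 1, which is 1 mod 2^N
}
```

Calling check_wraparound() from main should pass silently. Signed types do not have this guarantee, which is why the hash is stored in an unsigned type.
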
Here we record the string in the following form:
$$Hash[n] = S[1] \cdot base^{n} + S[2] \cdot base^{n-1} + \cdots + S[n] \cdot base$$

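base is best chosen as a prime, which makes it more likely that different strings still get different values after the reduction.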

Next, how to query a substring. To query the string in the interval [l, r], we need its hash value
$$Hash = S[l] \cdot base^{r-l+1} + S[l+1] \cdot base^{r-l} + \cdots + S[r] \cdot base$$
It is not hard to see that
$$Hash = Hash[r] - Hash[l-1] \cdot base^{r-l+1}$$
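
The substring formula can be tried out with a short sketch. PrefixHash is a hypothetical helper type, not part of the original solution. It assumes lowercase input mapped to 0..25 and uses the recurrence h[i] = h[i-1] * base + S[i], so its exponents are one lower than in the formula for Hash[n]; the substring identity is the same.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

typedef unsigned long long ULL;

// Hypothetical helper: prefix hashes of one string, modulo 2^64.
struct PrefixHash {
    std::vector<ULL> h;   // h[i] = hash of the first i characters, h[0] = 0
    std::vector<ULL> pw;  // pw[i] = base^i

    PrefixHash(const std::string& s, ULL base)
        : h(s.size() + 1, 0), pw(s.size() + 1, 1) {
        for (std::size_t i = 1; i <= s.size(); ++i) {
            pw[i] = pw[i - 1] * base;
            h[i] = h[i - 1] * base + (ULL)(s[i - 1] - 'a');
        }
    }

    // Hash of the substring at 1-indexed positions [l, r]:
    // Hash[r] - Hash[l-1] * base^(r-l+1)
    ULL get(std::size_t l, std::size_t r) const {
        assert(1 <= l && l <= r && r + 1 <= h.size());
        return h[r] - h[l - 1] * pw[r - l + 1];
    }
};
```

For example, with PrefixHash t("aaabcbc", 1000000007ULL) and PrefixHash w("abc", 1000000007ULL), t.get(3, 5) and w.get(1, 3) both describe "abc" and compare equal. Equal hashes only suggest equal strings; a collision is unlikely but possible.
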

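Finally, the problem itself. Every occurrence of string A has to be removed from B, so we take characters from the left end of B one at a time and append them to a string C. C never contains A, so after appending one character the only place A can appear is the suffix of C of length A.len that ends at the new last character. If that suffix equals A, delete it from C and move on to the next character; otherwise keep the new character and continue.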
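
The full program below reads from standard input. To check the stack idea in isolation, the same steps can be sketched as a function on std::string. The name censor and the use of std::vector are choices made for this sketch, not the author's code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

typedef unsigned long long ULL;

// Hypothetical function: w is the sensitive word, p is the text.
std::string censor(const std::string& w, const std::string& p) {
    const ULL base = 1000000007ULL;  // same base as the program below
    const std::size_t m = w.size();
    assert(m >= 1);                  // the problem guarantees non-empty strings

    ULL hw = 0, top = 1;             // hw = hash(w), top = base^m
    for (char c : w) {
        hw = hw * base + (ULL)(c - 'a');
        top *= base;
    }

    std::string out;                 // kept characters, used as a stack
    std::vector<ULL> h(1, 0);        // h[k] = hash of the first k kept characters
    for (char c : p) {
        out.push_back(c);
        h.push_back(h.back() * base + (ULL)(c - 'a'));
        std::size_t k = out.size();
        // Compare the hash of the last m kept characters with hash(w).
        if (k >= m && h[k] - h[k - m] * top == hw) {
            out.resize(k - m);       // pop the matched suffix
            h.resize(k - m + 1);     // and its prefix hashes
        }
    }
    return out;
}
```

On the three samples, censor("abc", "aaabcbc") gives "a", censor("b", "bbb") gives the empty string, and censor("abc", "ab") gives "ab". Popping the prefix hashes together with the characters is what lets the next comparison see the text as if the removed word had never been there.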


// headers the code below actually uses (scanf/printf, strlen)
#include <cstdio>
#include <cstring>

#define INF 0x3f3f3f3f
#define ll long long
#define Pair pair
#define re return

#define getLen(name,index) name[index].size()
#define mem(a,b) memset(a,b,sizeof(a))
#define Make(a,b) make_pair(a,b)
#define Push(num) push_back(num)
#define rep(index,star,finish) for(int index=star;index<finish;index++)
using namespace std;
typedef unsigned long long ULL;
const int maxn=5e6+5;
const ULL base=1e9+7;
ULL p[maxn];
ULL Hash[maxn];

char A[maxn],B[maxn];
char ans[maxn];
inline ULL getSection(int r,int l);
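int main(){
    // p[i] = base^i; unsigned overflow reduces everything modulo 2^64
    p[0]=1;
    rep(i,1,maxn)
        p[i]=p[i-1]*base;

    // A is the sensitive word w (0-indexed), B is the text p (1-indexed)
    while(~scanf("%s%s",A,B+1)){
        int lenA=strlen(A),lenB=strlen(B+1);

        // hash of the whole word A
        ULL ha=0;
        rep(i,0,lenA){
            ha=ha*base+A[i]-'a';
        }

        // Hash[0] is the hash of the empty prefix; every other entry is
        // written before it is read, so the array need not be cleared
        Hash[0]=0;
        int pos=1;  // next free slot in ans, which is used as a 1-indexed stack
        rep(i,1,lenB+1){
            ans[pos]=B[i];
            Hash[pos]=Hash[pos-1]*base+B[i]-'a';

            // if the last lenA kept characters hash to ha, pop them
            if(pos>=lenA && getSection(pos,pos-lenA+1)==ha){
                pos-=lenA;
            }
            pos++;
        }

        // ans[1..pos-1] is the censored text
        rep(i,1,pos)
            printf("%c",ans[i]);
        printf("\n");

    }
    re 0;
}
// hash of ans[l..r]: Hash[r] - Hash[l-1] * base^(r-l+1)
inline ULL getSection(int r,int l){
    re Hash[r]-Hash[l-1]*p[r+1-l];
}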
