【hdu 2222】Keywords Search: Problem Summary, Solution & Code (C++)

Time Limit: 2000/1000 MS (Java/Others)
Memory Limit: 131072/131072 K (Java/Others)

Problem Description
In modern times, search engines such as Google and Baidu have become part of everyday life.
Wiskey also wants to bring this feature to his image retrieval system.
Every image has a long description; when users type some keywords to find an image, the system matches the keywords against the image descriptions and shows the image that matches the most keywords.
To simplify the problem, you are given the description of one image and some keywords, and you should tell me how many keywords are matched.

Input
The first line contains one integer, the number of test cases that follow.
Each case starts with an integer N, the number of keywords, followed by N keywords. (N <= 10000)
Each keyword contains only the characters 'a'-'z' and is at most 50 characters long.
The last line is the description, which is at most 1000000 characters long.

Output
Print how many keywords are contained in the description.

Sample Input
1
5
she
he
say
shr
her
yasherhs

Sample Output
3

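Problem summary:
First the number of test cases is given. Each case then gives n strings, followed by an article (that is, one very long string). The question is how many of the n strings appear in the article.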
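
To make the task concrete, here is a minimal brute-force sketch that simply searches the description for each keyword in turn. The function name countMatchedNaive is made up for illustration and is not part of the original solution. It assumes duplicate keywords are counted once per copy, which is how the code in this post treats them. It is far too slow for the real limits and is only meant as a reference for checking small cases.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): brute-force reference.
// Returns how many keywords occur at least once in the description.
// Assumption: a keyword listed twice is counted twice.
int countMatchedNaive(const std::vector<std::string>& keywords,
                      const std::string& description) {
    int matched = 0;
    for (const std::string& kw : keywords) {
        // std::string::find returns npos when kw does not occur.
        if (description.find(kw) != std::string::npos) ++matched;
    }
    return matched;
}
```

On the sample, the keywords she, he and her occur in yasherhs while say and shr do not, so the function returns 3.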

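Solution:
This is multi-pattern matching, and it is both the introductory problem and the template problem for the AC automaton (Aho-Corasick). If you know KMP, you know that it builds a fail pointer, i.e. the place to jump to when a match fails; the AC automaton implements the same fail pointer on top of a trie. I am not good enough to write up the construction in detail myself, so I recommend studying it here: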
http://blog.csdn.net/morgan_xww/article/details/7831074/
http://blog.csdn.net/mobius_strip/article/details/22549517

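I think these two articles explain it quite well, although their code style is only passable. Personally I find the style of the following one slightly better. The ideas are the same, so once you understand the approach you can read all of them.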
http://www.cnblogs.com/kuangbin/p/3164106.html
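
For readers who want a reminder of what the KMP fail pointer mentioned above looks like, here is a minimal sketch of the classic failure function for a single pattern. The name buildFail is made up for illustration. The AC automaton applies the same "longest proper suffix that is also a prefix" idea to trie nodes instead of positions in one string.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): KMP failure function.
// fail[i] = length of the longest proper prefix of p[0..i]
//           that is also a suffix of p[0..i].
std::vector<int> buildFail(const std::string& p) {
    std::vector<int> fail(p.size(), 0);
    std::size_t k = 0;  // length of the currently matched prefix
    for (std::size_t i = 1; i < p.size(); ++i) {
        // On mismatch, fall back to the next shorter candidate prefix.
        while (k > 0 && p[i] != p[k]) k = static_cast<std::size_t>(fail[k - 1]);
        if (p[i] == p[k]) ++k;
        fail[i] = static_cast<int>(k);
    }
    return fail;
}
```

For example, buildFail("abab") yields 0, 0, 1, 2.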

Code:

#include<iostream>
#include<algorithm>
#include<stdio.h>
#include<queue>
#include<string.h>
using namespace std;
char t[1000005],s[500010];
queue<int>q;
struct node{
    int ch[26];
    void init() // reset all child pointers to 0
    {
        for (int w=0;w<=25;w++)
        ch[w]=0;
    }
}tr[500010]; // standard trie node array
int T,n,ans,tot,fail[500010],flag[500010],vis[500010];
// flag[i]: number of keywords that end at trie node i
// fail[i]: fail pointer of node i
void add(char x[])
{
    int now=0;
    int len=strlen(x);
    for (int i=0;i<len;i++)
    {
        int tmp=x[i]-'a';
        if (!tr[now].ch[tmp]) 
        {
            tot++,tr[now].ch[tmp]=tot;
            // there are multiple test cases, so reset flag, fail, vis and the children of every newly created node
            flag[tot]=0;
            fail[tot]=0;
            vis[tot]=0;
            tr[tot].init();
        }
        now=tr[now].ch[tmp];
    }
    flag[now]++;
}
inline void bfs()
{
    for (int i=0;i<=25;i++)
    if (tr[0].ch[i]) q.push(tr[0].ch[i]);
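    // push the first layer up front, because the fail of every first-layer node must be 0
    // if they were processed by the loop below instead, fail[c] = tr[fail[0]].ch[i] would point at c itself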
    while(!q.empty())
    {
        int now=q.front();q.pop();
        for (int i=0;i<26;i++)
        if (tr[now].ch[i])
        {
            fail[tr[now].ch[i]]=tr[fail[now]].ch[i];
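            // this is the assignment mentioned above; try simulating it by hand
            // note the trie optimization here: read the blogs linked above and think about why no while() loop that keeps following fail is needed to check whether ch[i] exists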
            q.push(tr[now].ch[i]);
        }
        else
        tr[now].ch[i]=tr[fail[now]].ch[i];
        // if the child does not exist, a lookup would have to go to tr[fail[now]].ch[i] anyway, so we link that edge directly; matching then never has to check whether a node exists
    }
}
inline void atm()
{
    int now=0;
    int len=strlen(t);
    for (int i=0;i<len;i++)
    {
        int nex=t[i]-'a';
        now=tr[now].ch[nex];
        int tmp=now;
        // vis[x] records whether node x has already been counted
        while(tmp&&!vis[tmp]) // count the keywords that end at the current text position
        {
            ans+=flag[tmp];
            flag[tmp]=0;
            vis[tmp]=1;
            tmp=fail[tmp];
        }
    }
}
int main()
{
    scanf("%d",&T);
    while(T--)
    {
        tot=0;ans=0;
        tr[0].init();
        scanf("%d",&n);
        for (int i=1;i<=n;i++)
        {
            scanf("%s",s);
            add(s);
        }
        bfs();
        scanf("%s",t);
        atm();
        printf("%d\n",ans);
    }
    return 0;
}

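The vis array saves time by preventing the match from walking back along the fail chain to the same node again and again. For example, if you match the text
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa………aaaaaaaaaaa against the keyword
aaaaaaaaaaaaaaa…………aaaaaaaaaaaaaaaaaaaa, then without vis every text position walks the whole fail chain and the match degrades to quadratic-style brute force (roughly text length times keyword length).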
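
The effect can be measured with a small experiment. The sketch below rebuilds the same kind of automaton with std::vector and counts how many times the inner fail-chain loop runs, with and without the vis check. The function name countFailSteps and its parameters are made up for illustration. It assumes lowercase input only and is not a replacement for the solution above.

```cpp
#include <array>
#include <cassert>
#include <queue>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): counts iterations of the
// inner fail-chain loop while matching `text`. Assumes only 'a'-'z' appear.
long long countFailSteps(const std::vector<std::string>& keywords,
                         const std::string& text, bool useVis) {
    typedef std::array<int, 26> Node;
    const Node emptyNode = Node();           // all children 0 (0 = root / none)
    std::vector<Node> ch(1, emptyNode);
    for (const std::string& kw : keywords) { // insert keywords into the trie
        int now = 0;
        for (char c : kw) {
            int x = c - 'a';
            if (!ch[now][x]) {
                ch[now][x] = static_cast<int>(ch.size());
                ch.push_back(emptyNode);
            }
            now = ch[now][x];
        }
    }
    std::vector<int> fail(ch.size(), 0);
    std::queue<int> q;
    for (int i = 0; i < 26; ++i)             // first layer: fail stays 0
        if (ch[0][i]) q.push(ch[0][i]);
    while (!q.empty()) {                     // BFS, filling missing edges
        int now = q.front(); q.pop();
        for (int i = 0; i < 26; ++i) {
            int child = ch[now][i];
            if (child) { fail[child] = ch[fail[now]][i]; q.push(child); }
            else ch[now][i] = ch[fail[now]][i];
        }
    }
    std::vector<char> vis(ch.size(), 0);
    long long steps = 0;
    int now = 0;
    for (char c : text) {
        now = ch[now][c - 'a'];
        // Same walk as in atm(); the vis test is applied only when useVis is true.
        for (int tmp = now; tmp && !(useVis && vis[tmp]); tmp = fail[tmp]) {
            ++steps;
            vis[tmp] = 1;
        }
    }
    return steps;
}
```

With the keyword aaaa and the text aaaaaaaa, the loop body runs 26 times without vis and only 4 times with it, once per trie node. The gap grows with both the text length and the keyword length.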
