AC automaton + DP (CodeForces - 86C)

"Multidimensional spaces are completely out of style these days, unlike genetics problems" — thought physicist Woll and changed his subject of study to bioinformatics. Analysing results of sequencing he faced the following problem concerning DNA sequences. We will further think of a DNA sequence as an arbitrary string of uppercase letters "A", "C", "G" and "T" (of course, this is a simplified interpretation).

Let w be a long DNA sequence and s1, s2, ..., sm — collection of short DNA sequences. Let us say that the collection filters w iff w can be covered with the sequences from the collection. Certainly, substrings corresponding to the different positions of the string may intersect or even cover each other. More formally: denote by |w| the length of w, let symbols of w be numbered from 1 to |w|. Then for each position i in w there exist pair of indices l, r (1 ≤ l ≤ i ≤ r ≤ |w|) such that the substring w[l ... r] equals one of the elements s1, s2, ..., sm of the collection.

Woll wants to calculate the number of DNA sequences of a given length filtered by a given collection, but he doesn't know how to deal with it. Help him! Your task is to find the number of different DNA sequences of length n filtered by the collection {si}.

Answer may appear very large, so output it modulo 1000000009.

Input

First line contains two integer numbers n and m (1 ≤ n ≤ 1000, 1 ≤ m ≤ 10): the length of the string and the number of sequences in the collection, respectively.

Next m lines contain the collection sequences si, one per line. Each si is a nonempty string of length not greater than 10. All the strings consist of uppercase letters "A", "C", "G", "T". The collection may contain identical strings.

Output

Output should contain a single integer: the number of strings filtered by the collection modulo 1000000009 (10^9 + 9).

Example
Input
2 1
A
Output
1
Input
6 2
CAT
TACT
Output
2
Note

In the first sample, a string has to be filtered by "A". Clearly, there is only one such string: "AA".

In the second sample, there exist exactly two different strings satisfying the condition (see the pictures below).
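The covering condition from the statement can be checked directly for small n. The sketch below is only a brute-force reference, not the intended solution; `isFiltered` and `countBrute` are hypothetical helper names introduced here for illustration, and the enumeration over all 4^n strings is only feasible for very small n.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns true if every position of w lies inside some occurrence of a pattern.
bool isFiltered(const std::string& w, const std::vector<std::string>& pats) {
    std::vector<bool> covered(w.size(), false);
    for (const std::string& p : pats) {
        // Try every start position l where p still fits inside w.
        for (std::size_t l = 0; l + p.size() <= w.size(); ++l) {
            if (w.compare(l, p.size(), p) == 0) {
                for (std::size_t t = l; t < l + p.size(); ++t) covered[t] = true;
            }
        }
    }
    for (bool c : covered) {
        if (!c) return false;
    }
    return true;
}

// Enumerates all 4^n strings over {A, C, G, T} and counts the filtered ones.
long long countBrute(int n, const std::vector<std::string>& pats) {
    const char alphabet[4] = {'A', 'C', 'G', 'T'};
    long long total = 1;
    for (int i = 0; i < n; ++i) total *= 4;
    long long count = 0;
    std::string w(n, 'A');
    for (long long code = 0; code < total; ++code) {
        // Decode the counter as a base-4 number, one digit per character.
        long long x = code;
        for (int i = 0; i < n; ++i) {
            w[i] = alphabet[x % 4];
            x /= 4;
        }
        if (isFiltered(w, pats)) ++count;
    }
    return count;
}
```

Calling `countBrute(2, {"A"})` and `countBrute(6, {"CAT", "TACT"})` reproduces the two sample answers, 1 and 2.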

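Problem summary: in short, you have to count the strings of length n in which every character is covered by at least one occurrence of one of the m given patterns. Occurrences may overlap.

Solution: I really sweated over this one. It took me two evenings and n rewrites, which shows that my grasp of the AC automaton is still shaky.

Let me walk through my thought process, which is to say, the process of me copying the editorial.

First, since this is string matching with several patterns, it is easy (??) to think of the AC automaton (Aho-Corasick).

Then, since the problem sits in the DP set, we solve it with DP (basically, almost all of these count-the-ways problems are DP...).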

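Following the interval-DP template, first let dp[i][j] be the number of ways after matching up to position i with choice j. That does not work, so we change j to a node of the AC automaton,

and the definition becomes: dp[i][j] is the number of strings of length i that end at node j of the AC automaton.

This is still not enough, because occurrences of the patterns may overlap, so the state also has to remember how much of the tail is still uncovered.

So we add a third dimension: dp[i][j][k] is the number of strings of length i that end at node j and whose last k characters are not yet covered by any pattern. A pattern matched later may still cover them.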

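From here the transitions are easy to derive. Let j' = ch[j][c] be the node reached from j by appending character c, and let val[j'] be the length of the longest pattern that is a suffix of the string at j'.

dp[i][j][k] -> dp[i+1][j'][0] if val[j'] > k: a pattern ending here covers the new character together with all k pending ones, so the pending count drops to 0.

dp[i][j][k] -> dp[i+1][j'][k+1] otherwise: nothing long enough matched, so one more character is pending. k never needs to exceed the longest pattern length, since such a tail can never be covered.

The answer is the sum of dp[n][j][0] over all nodes j of the AC automaton.

Base case: dp[0][0][0]=1;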
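To make the meaning of k concrete, here is a small sketch that follows a single string and reports k after each character. It is an illustration under simplifying assumptions, not part of the solution: `uncoveredTrace` is a hypothetical name, and the longest pattern ending at each position is found by direct comparison instead of the automaton's val[] array. Unlike the DP, it also keeps counting when k grows past the longest pattern length.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// For each prefix of w, returns k: the number of trailing characters that are
// not covered by any pattern occurrence ending inside that prefix.
std::vector<int> uncoveredTrace(const std::string& w,
                                const std::vector<std::string>& pats) {
    std::vector<int> trace;
    int k = 0;
    for (std::size_t i = 0; i < w.size(); ++i) {
        // best = length of the longest pattern that is a suffix of w[0..i];
        // this plays the role of val[node] in the automaton.
        int best = 0;
        for (const std::string& p : pats) {
            if (p.size() <= i + 1 &&
                w.compare(i + 1 - p.size(), p.size(), p) == 0) {
                best = std::max(best, static_cast<int>(p.size()));
            }
        }
        // The new character makes k + 1 pending characters; a pattern of
        // length > k covers all of them, otherwise the tail grows by one.
        if (best > k) k = 0;
        else k = k + 1;
        trace.push_back(k);
    }
    return trace;
}
```

For w = "CATACT" with patterns "CAT" and "TACT" the trace is 1, 2, 0, 1, 2, 0: "CAT" clears the first three characters, and "TACT" (length 4 > 2) clears the rest. A string is filtered exactly when the last entry of its trace is 0.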

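This solves the problem in O(100nm) time: n positions, at most 100 automaton nodes, at most 10 values of k (m ≤ 10 patterns of length at most 10), and 4 outgoing characters per state.

Here is the code: smelly, but not long.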

#include <bits/stdc++.h>
#define int long long
#define N 1100
using namespace std;
const int mod =1e9+9;
int f[N][N][11],n,m,ch[N][4],val[N],fail[N],sz,root=0,mx=0;
void add(int &x,int y)
{
	x+=y;
	if(x<0) x+=mod;
	while(x>=mod) x-=mod;
}

int idx(char c)
{
	if(c=='A') return 0;
	if(c=='C') return 1;
	return c=='T'?3:2;
}

int newnode()
{
	for(int i=0;i<4;i++) ch[sz][i]=0;
	val[sz++]=0;return sz-1;
}

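// Insert one pattern into the trie; val[u] stores the length of the pattern ending at u.
void insert(char *s,int len)
{
	int u=0;
	for(int i=0;i<len;i++)
	{
		int c=idx(s[i]);
		if(ch[u][c]==0) ch[u][c]=newnode();
		u=ch[u][c];
	}
	val[u]=len;
}

// BFS over the trie: fill the fail links and complete the transition table.
void build()
{
	queue<int> q;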
	for(int i=0;i<4;i++)
	{
		if(ch[root][i]==0) ch[root][i]=root;
		else fail[ch[root][i]]=root,q.push(ch[root][i]);
	}
	while(!q.empty())
	{
		int u=q.front();q.pop();
		for(int i=0;i<4;i++)
		{
			int &v=ch[u][i];
			if(v==0) v=ch[fail[u]][i];
			else fail[v]=ch[fail[u]][i],q.push(v),val[v]=max(val[v],val[fail[v]]);
		}
	}
	
}

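void dp()
{
	f[0][0][0]=1;
	for(int i=0;i<n;i++)
		for(int j=0;j<sz;j++)
			for(int k=0;k<=mx;k++)
			{
				int v=f[i][j][k];
				if(v==0) continue;
				for(int c=0;c<4;c++)
				{
					int now=ch[j][c];
					// a pattern longer than k ends here: everything pending is covered
					if(val[now]>k) add(f[i+1][now][0],v);
					// otherwise one more character is pending
					else if(k<mx) add(f[i+1][now][k+1],v);
				}
			}
	int ans=0;
	for(int j=0;j<sz;j++) add(ans,f[n][j][0]);
	printf("%lld\n",ans);
}

char str[20];
signed main()
{
	newnode(); // allocate the root (node 0) before inserting patterns
	cin>>n>>m;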
	for(int i=1,len;i<=m;i++) 
		scanf("%s",str),len=strlen(str),mx=max(mx,len),insert(str,len);
	build();
	dp();
	return 0;	
}
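The table f[N][N][11] above holds 1100 × 1100 × 11 long long values, roughly 100 MB. Since layer i+1 depends only on layer i, a rolling two-layer table is enough. The following is a minimal sketch of that variant, not the code above: `countFiltered` is a hypothetical function name, and the automaton is stored in vectors sized to the actual number of nodes. The transitions are the same as in dp().

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <queue>
#include <string>
#include <vector>

// Counts strings of length n filtered by pats, modulo 1e9+9.
long long countFiltered(int n, const std::vector<std::string>& pats) {
    const long long MOD = 1000000009LL;
    auto idx = [](char c) { return c == 'A' ? 0 : c == 'C' ? 1 : c == 'G' ? 2 : 3; };

    // Trie with node 0 as root; val[u] = longest pattern that is a suffix of u.
    std::vector<std::array<int, 4>> ch(1, std::array<int, 4>{0, 0, 0, 0});
    std::vector<int> val(1, 0), fail(1, 0);
    int mx = 0;
    for (const std::string& p : pats) {
        int u = 0;
        for (char c : p) {
            int t = idx(c);
            if (ch[u][t] == 0) {
                ch[u][t] = static_cast<int>(ch.size());
                ch.push_back(std::array<int, 4>{0, 0, 0, 0});
                val.push_back(0);
                fail.push_back(0);
            }
            u = ch[u][t];
        }
        val[u] = std::max(val[u], static_cast<int>(p.size()));
        mx = std::max(mx, static_cast<int>(p.size()));
    }

    // BFS: set fail links, propagate val along them, fill missing transitions.
    std::queue<int> q;
    for (int t = 0; t < 4; ++t) {
        if (ch[0][t] != 0) q.push(ch[0][t]);
    }
    while (!q.empty()) {
        int u = q.front();
        q.pop();
        for (int t = 0; t < 4; ++t) {
            int v = ch[u][t];
            if (v == 0) {
                ch[u][t] = ch[fail[u]][t];
            } else {
                fail[v] = ch[fail[u]][t];
                val[v] = std::max(val[v], val[fail[v]]);
                q.push(v);
            }
        }
    }

    // Rolling DP: cur[j][k] = strings of the current length at node j with
    // k uncovered trailing characters (k < mx; longer tails can never be covered).
    const int S = static_cast<int>(ch.size());
    std::vector<std::vector<long long>> cur(S, std::vector<long long>(mx, 0)), nxt;
    cur[0][0] = 1;
    for (int i = 0; i < n; ++i) {
        nxt.assign(S, std::vector<long long>(mx, 0));
        for (int j = 0; j < S; ++j) {
            for (int k = 0; k < mx; ++k) {
                long long v = cur[j][k];
                if (v == 0) continue;
                for (int t = 0; t < 4; ++t) {
                    int now = ch[j][t];
                    if (val[now] > k) nxt[now][0] = (nxt[now][0] + v) % MOD;
                    else if (k + 1 < mx) nxt[now][k + 1] = (nxt[now][k + 1] + v) % MOD;
                }
            }
        }
        cur.swap(nxt);
    }
    long long ans = 0;
    for (int j = 0; j < S; ++j) ans = (ans + cur[j][0]) % MOD;
    return ans;
}
```

On the two samples, `countFiltered(2, {"A"})` gives 1 and `countFiltered(6, {"CAT", "TACT"})` gives 2. The two layers hold at most about 100 × 10 values each, so memory is no longer a concern.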

 
