[AC Automaton] DNA Repair

DNA repair
Time Limit: 2000MS   Memory Limit: 65536K
Total Submissions: 4955   Accepted: 2301

Description

Biologists have finally invented techniques for repairing DNA that contains segments causing various inherited diseases. For the sake of simplicity, a DNA is represented as a string containing the characters 'A', 'G', 'C' and 'T'. The repairing technique is simply to change some characters so as to eliminate all disease-causing segments. For example, we can repair the DNA "AAGCAG" to "AGGCAC", eliminating the disease-causing segments "AAG", "AGC" and "CAG" by changing two characters. Note that the repaired DNA may still contain only the characters 'A', 'G', 'C' and 'T'.

You are to help the biologists repair a DNA by changing the least number of characters.

Input

The input consists of multiple test cases. Each test case starts with a line containing one integer N (1 ≤ N ≤ 50), the number of DNA segments causing inherited diseases.
The following N lines give N non-empty strings of length not greater than 20 containing only characters in "AGCT", which are the DNA segments causing inherited diseases.
The last line of the test case is a non-empty string of length not greater than 1000 containing only characters in "AGCT", which is the DNA to be repaired.

The last test case is followed by a line containing a single zero.

Output

For each test case, print a line containing the test case number (beginning with 1) followed by the number of characters which need to be changed. If it's impossible to repair the given DNA, print -1.

Sample Input

2
AAA
AAG
AAAG    
2
A
TG
TGAATG
4
A
G
C
T
AGT
0

Sample Output

Case 1: 1
Case 2: 4
Case 3: -1

Source

2008 Asia Hefei Regional Contest Online by USTC

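This problem deepened my understanding of the AC automaton (AC_automation), especially of where the fail pointers point. The solution makes full use of the fail pointers' strengths and saves a lot of space.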


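The obvious first idea is to build a tree with 4^maxl nodes, since that represents every possible state (whether or not each step is changed), and then run dynamic programming on that tree. This works in principle, but the memory cost is unaffordable for a simple reason: it degenerates into exhaustive search.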
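To put rough numbers on this, the sketch below compares the two state counts, assuming maxl means the length of the longest disease segment (at most 20) and N the number of segments (at most 50). The function names `naive_states` and `trie_node_bound` are made up for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Number of distinct strings of length `maxl` over {A, G, C, T}: 4^maxl.
std::uint64_t naive_states(int maxl) {
    assert(maxl >= 0 && maxl < 32); // 4^32 no longer fits in 64 bits
    return std::uint64_t(1) << (2 * maxl);
}

// Upper bound on trie nodes: the root plus one node per pattern character.
int trie_node_bound(int n, int maxl) {
    return n * maxl + 1;
}
```

With the problem's limits, `naive_states(20)` is 4^20 = 1,099,511,627,776, while `trie_node_bound(50, 20)` is 1001, which fits the arrays of size 1010 declared in the listing.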

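Two key ideas:

1. If a parent node is a successful match (the end of a disease segment), every child node also contains a match, because the parent is a prefix of the child. Both parent and child are therefore invalid states, and this property is passed downward in a top-down traversal. This conclusion prunes a large share of useless enumeration.

2. The role of the fail pointer is to let matching continue on a suffix when a match fails. So the state reached through a fail pointer is a valid state (unless it is itself marked dangerous), and it is exactly where matching has to resume. Using the fail pointers we can therefore fill in every missing next edge, where taking an edge costs 1 whenever it means changing the current DNA character, and in doing so we represent all states. Because nodes are reused, the space needed is very small, bounded by O(N*maxl).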
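The two ideas above can be sketched end to end as follows. This is a minimal illustration and not the post's own listing: it uses 0-based nodes, expands dangerous nodes as well (they are simply never entered by the DP), and keeps only two DP rows. The names `min_changes`, `code_of` and `kEmpty` are assumptions introduced here.

```cpp
#include <algorithm>
#include <array>
#include <queue>
#include <string>
#include <vector>

// Hypothetical helper: maps a nucleotide to 0..3 (anything else is treated as 'T').
static int code_of(char ch) {
    switch (ch) {
        case 'A': return 0;
        case 'G': return 1;
        case 'C': return 2;
        default:  return 3;
    }
}

// Returns the minimum number of changed characters, or -1 if no repair exists.
int min_changes(const std::vector<std::string>& patterns, const std::string& dna) {
    const std::array<int, 4> kEmpty = {{-1, -1, -1, -1}};
    std::vector<std::array<int, 4>> go(1, kEmpty); // node 0 is the root
    std::vector<char> bad(1, 0);                   // 1 = state contains a disease segment
    std::vector<int> fail(1, 0);

    // Build the trie of disease segments.
    for (const std::string& p : patterns) {
        int u = 0;
        for (char ch : p) {
            int c = code_of(ch);
            if (go[u][c] < 0) {
                go[u][c] = static_cast<int>(go.size());
                go.push_back(kEmpty);
                bad.push_back(0);
                fail.push_back(0);
            }
            u = go[u][c];
        }
        bad[u] = 1;
    }

    // BFS: set fail pointers and fill every missing edge from the fail state.
    std::queue<int> q;
    for (int c = 0; c < 4; ++c) {
        if (go[0][c] < 0) go[0][c] = 0;
        else q.push(go[0][c]); // depth-1 nodes fail to the root
    }
    while (!q.empty()) {
        int u = q.front();
        q.pop();
        bad[u] = bad[u] || bad[fail[u]]; // a dangerous suffix makes u dangerous
        for (int c = 0; c < 4; ++c) {
            int v = go[u][c];
            if (v < 0) {
                go[u][c] = go[fail[u]][c];
            } else {
                fail[v] = go[fail[u]][c];
                q.push(v);
            }
        }
    }

    // DP over DNA positions: cur[u] = fewest changes that end in safe state u.
    const int INF = 1 << 29;
    const int m = static_cast<int>(go.size());
    std::vector<int> cur(m, INF), nxt(m, INF);
    cur[0] = 0;
    for (char ch : dna) {
        int want = code_of(ch);
        std::fill(nxt.begin(), nxt.end(), INF);
        for (int u = 0; u < m; ++u) {
            if (cur[u] == INF) continue;
            for (int c = 0; c < 4; ++c) {
                int v = go[u][c];
                if (bad[v]) continue; // never step into a dangerous state
                nxt[v] = std::min(nxt[v], cur[u] + (c != want ? 1 : 0));
            }
        }
        cur.swap(nxt);
    }
    int best = *std::min_element(cur.begin(), cur.end());
    return best >= INF ? -1 : best;
}
```

On the three sample cases, `min_changes({"AAA", "AAG"}, "AAAG")`, `min_changes({"A", "TG"}, "TGAATG")` and `min_changes({"A", "G", "C", "T"}, "AGT")` give 1, 4 and -1 respectively. The work is proportional to the DNA length times the number of automaton nodes times 4, which is about 4 million steps at the problem's limits.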


#include <cstdio>
#include <cstring>
#define TOIND(a) ((a)=='A'?1:(a)=='T'?2:(a)=='C'?3:4)

bool danger[1010];
int f[1010][1010];
int next[1010][10];
int fail[1010];
char str[1010];
char pattern[1010];
int poolsize = 1;
int que[200000];
const int root = 1;

void init()
{
	memset(danger,0,sizeof danger);
	memset(f,0x3f,sizeof f);
	memset(fail,0,sizeof fail);
	memset(next,0,sizeof next);
	poolsize = 1;
}

void reads(char* ss,int &ll)
{
	scanf("%s",ss+1);
	ll = 0;
	while (ss[++ll])
		ss[ll] = TOIND(ss[ll]);
	ll --;
}

void insert()
{
	int len = 0;
	reads(pattern,len);

	int u = root;
	for (int i=1;i<=len;i++)
	{
		if (next[u][pattern[i]])
			u = next[u][pattern[i]];
		else
		{
			poolsize ++;
			u = next[u][pattern[i]] = poolsize;
		}
		if (danger[u])
			break;
	}
	danger[u] = true;
}

void build_ac_automation()
{
	int l = 0;
	int r = 0;

	for (int i=1;i<=4;i++)
	{
		if (next[root][i])
		{
			fail[next[root][i]] = root;
			r ++;
			que[r] = next[root][i];
		}
		else
			next[root][i] = root;
	}

	while (l < r)
	{
		l ++;
		int u = que[l];
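		danger[u] |= danger[fail[u]]; // a dangerous suffix makes u dangerous too
		if (danger[u]) continue; // never expand a dangerous state
		for (int i=1;i<=4;i++)
		{
			if (!next[u][i])
				next[u][i] = next[fail[u]][i]; // missing edge: reuse the fail state's transition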
			else
			{
				fail[next[u][i]] = next[fail[u]][i];
				r ++;
				que[r] = next[u][i];
			}
		}
	}
}

void updatamin(int& a,int b)
{
	if (b < a)
		a = b;
}

int main()
{
	freopen("dna.in","r",stdin);
	freopen("dna.out","w",stdout);
	int n=0,T=0;
	while (1)
	{
		init();
		T ++;
		scanf("%d",&n);
		if (n == 0) break;
		
		//vvvvvvvvvvvvvvvvvvvvv
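		// NOTE: the original listing is cut off inside this loop header.
		// Everything from here to the end is an assumed reconstruction,
		// not the author's code.
		for (int i=1;i<=n;i++)
			insert();
		build_ac_automation();

		int len = 0;
		reads(str,len);

		// f[i][j]: fewest changes so that the first i characters end in state j
		f[0][root] = 0;
		for (int i=0;i<len;i++)
			for (int j=1;j<=poolsize;j++)
			{
				if (danger[j] || f[i][j] == 0x3f3f3f3f) continue;
				for (int c=1;c<=4;c++)
				{
					int v = next[j][c];
					if (danger[v]) continue;
					updatamin(f[i+1][v], f[i][j] + (str[i+1] != c));
				}
			}

		int ans = 0x3f3f3f3f;
		for (int j=1;j<=poolsize;j++)
			if (!danger[j])
				updatamin(ans, f[len][j]);
		printf("Case %d: %d\n", T, ans == 0x3f3f3f3f ? -1 : ans);
	}
	return 0;
}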

