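Finding the longest palindromic substring of a string

For example, the longest palindromic substring of the string "babcbabcbaccba" is "abcbabcba". A palindrome is a string that reads the same from the front and from the back. So how do we solve this problem?

The first idea that comes to mind is brute force: run a check, _is_palindrome, on every substring of the given string: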

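#include <string>
using namespace std;

// Returns true if s reads the same forwards and backwards.
bool _is_palindrome(string s) {
	int n = s.length();
	int i = 0;
	int j = n - 1;
	while(i < j) {
		if(s[i] != s[j]) {
			return false;
		}
		// Move both ends toward the middle.
		i++;
		j--;
	}
	return true;
}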


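A string of length n, such as babcbabcbaccba, has O(n^2) substrings, and each check takes O(n) time, so the total time complexity is O(n^3). For long strings this approach is impractical.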
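
The brute-force idea can be sketched as follows. This is only a minimal illustration, not the code from this post: the names is_palindrome_range and longest_palindrome_brute are made up here, and the sketch checks index ranges instead of copying every substring.

```cpp
#include <string>

// Hypothetical helper: true if s[i..j] (inclusive) reads the same both ways.
static bool is_palindrome_range(const std::string& s, int i, int j) {
    while (i < j) {
        if (s[i] != s[j]) return false;
        ++i;
        --j;
    }
    return true;
}

// Hypothetical brute force: try all O(n^2) ranges, each checked in O(n) time.
static std::string longest_palindrome_brute(const std::string& s) {
    int n = static_cast<int>(s.length());
    int best_start = 0, best_len = 0;
    for (int i = 0; i < n; ++i) {
        for (int j = i; j < n; ++j) {
            int len = j - i + 1;
            // Only ranges longer than the current best are worth checking.
            if (len > best_len && is_palindrome_range(s, i, j)) {
                best_start = i;
                best_len = len;
            }
        }
    }
    return s.substr(best_start, best_len);
}
```

Calling longest_palindrome_brute("babcbabcbaccba") should give back "abcbabcba", the example from the beginning of the post.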


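Can we do better? Look at the structure of the palindrome abcbabcba: its length is odd, the middle letter is a, the first letters to the left and right of that a are both b, the second are both c, and so on until the last pair, which are both a. Here a letter serves as the center.

What about abcbaabcba? Its length is even, so where do we start comparing the characters on both sides? We start from the gap between the two a's in the middle. Here the center is an empty position, which we call a virtual node: imagine a line between the two a's and compare the characters at equal distances from it.

Given an arbitrary string, we cannot tell in advance whether its longest palindrome has odd or even length, so both cases must be tried. Altogether there are 2n-1 centers to check: the n letters plus the n-1 virtual nodes between adjacent letters. For each center, compare the characters at equal distances on both sides until a pair differs, then record the length and starting position of that palindrome. The code is as follows: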

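// Expands outward from s[l..r] while the two ends match and returns the
// palindrome found. Call with l == r for a letter center and r == l + 1
// for a virtual center between two letters.
string expand_palindrome(string s, int l, int r) {
	int n = s.length();
	while(l >= 0 && r < n && s[l] == s[r]) {
		l--;
		r++;
	}
	// The loop stops one step past each end, so the palindrome is s[l+1..r-1].
	return s.substr(l + 1, r - l - 1);
}

string get_longest_palin(string s) {
	int i;
	int n = s.length();
	string longest = s.substr(0, 1);
	string t;
	for(i = 0; i < n; i++) {
		// Odd length: center on the letter s[i].
		t = expand_palindrome(s, i, i);
		if(t.length() > longest.length()) {
			longest = t;
		}
		// Even length: center on the gap between s[i] and s[i + 1].
		t = expand_palindrome(s, i, i + 1);
		if(t.length() > longest.length()) {
			longest = t;
		}
	}
	return longest;
}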

This algorithm runs in O(n^2) time and needs O(1) extra space besides the substrings it copies.
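
Since expand_palindrome returns a new substring for every center, the O(1) space bound is easier to see in a variant that tracks only a start index and a length. The following is a sketch under that assumption; the names expand_len and longest_palindrome_centers are illustrative and do not come from the code above.

```cpp
#include <string>

// Length of the longest palindrome obtained by expanding from s[l..r].
// Call with l == r for a letter center, r == l + 1 for a virtual center.
static int expand_len(const std::string& s, int l, int r) {
    int n = static_cast<int>(s.length());
    while (l >= 0 && r < n && s[l] == s[r]) {
        --l;
        ++r;
    }
    return r - l - 1;  // the loop stops one step past each end
}

static std::string longest_palindrome_centers(const std::string& s) {
    int n = static_cast<int>(s.length());
    int best_start = 0, best_len = 0;
    for (int i = 0; i < n; ++i) {
        int odd = expand_len(s, i, i);       // center on s[i]
        int even = expand_len(s, i, i + 1);  // center between s[i] and s[i+1]
        int len = odd > even ? odd : even;
        if (len > best_len) {
            best_len = len;
            // Works for both parities thanks to integer division.
            best_start = i - (len - 1) / 2;
        }
    }
    return s.substr(best_start, best_len);  // the only copy that is made
}
```

Only one substring is built, at the very end, so the extra memory used during the search is a handful of integers.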


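There is also an algorithm with O(n) time complexity and O(n) space complexity. I don't think I can explain it well myself, and I worry that the original link may stop working some day, so I have copied the explanation into my own blog: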

An O(N) Solution (Manacher’s Algorithm):
First, we transform the input string, S, to another string T by inserting a special character ‘#’ in between letters. The reason for doing so will become clear shortly.

For example: S = “abaaba”, T = “#a#b#a#a#b#a#”.

To find the longest palindromic substring, we need to expand around each T[i] such that T[i-d] … T[i+d] forms a palindrome. You should immediately see that d is the length of the palindrome itself (with the ‘#’ characters removed) centered at T[i].

We store the intermediate results in an array P, where P[ i ] equals the length of the palindrome centered at T[i]. The length of the longest palindromic substring is then the maximum element in P.

Using the above example, we populate P as below (from left to right):

T = # a # b # a # a # b # a #
P = 0 1 0 3 0 1 6 1 0 3 0 1 0

Looking at P, we immediately see that the longest palindrome is “abaaba”, as indicated by P[ 6 ] = 6.
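
To make the table concrete, here is a small sketch that builds T and fills P by plain expansion around every position. It is not the linear-time algorithm yet, and the function names insert_separators and naive_radii are assumptions made for this illustration.

```cpp
#include <string>
#include <vector>

// Builds T by putting '#' before, between and after the letters of s.
static std::string insert_separators(const std::string& s) {
    std::string t = "#";
    for (char c : s) {
        t += c;
        t += '#';
    }
    return t;
}

// P[i] = largest d such that T[i-d .. i+d] is a palindrome.
// Naive version: every position is expanded from scratch, O(n^2) overall.
static std::vector<int> naive_radii(const std::string& t) {
    int n = static_cast<int>(t.length());
    std::vector<int> p(n, 0);
    for (int i = 0; i < n; ++i) {
        int d = 0;
        while (i - d - 1 >= 0 && i + d + 1 < n &&
               t[i - d - 1] == t[i + d + 1]) {
            ++d;
        }
        p[i] = d;
    }
    return p;
}
```

For S = “abaaba” this should reproduce the row P = 0 1 0 3 0 1 6 1 0 3 0 1 0 shown above.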

Did you notice that by inserting special characters (#) in between letters, palindromes of both odd and even lengths are handled gracefully? (Please note: this is to demonstrate the idea more easily and is not necessarily needed to code the algorithm.)

Now, imagine that you draw a vertical line at the center of the palindrome “abaaba”. Did you notice that the numbers in P are symmetric around this center? That’s not all: try another palindrome, “aba”, and the numbers show the same symmetric property. Is this a coincidence? The answer is yes and no. It holds only under a certain condition, but it is still great progress, since it lets us avoid recomputing part of the P[ i ]’s.

Let us move on to a slightly more sophisticated example with some overlapping palindromes, where S = “babcbabcbaccba”.


The image above shows T transformed from S = “babcbabcbaccba”. Assume that you have reached a state where table P is partially completed. The solid vertical line indicates the center (C) of the palindrome “abcbabcba”. The two dotted vertical lines indicate its left (L) and right (R) edges respectively. You are at index i, and its mirrored index around C is i’. How would you calculate P[ i ] efficiently?

Assume that we have arrived at index i = 13 and need to calculate P[ 13 ] (indicated by the question mark ?). We first look at its mirrored index around the palindrome’s center C, which is i’ = 9.

The two solid green lines above indicate the regions covered by the two palindromes centered at i and i’. Since P[ i' ] = P[ 9 ] = 1, P[ i ] must also be 1, due to the symmetric property of a palindrome around its center. In fact, all three elements after C follow the symmetric property (that is, P[ 12 ] = P[ 10 ] = 0, P[ 13 ] = P[ 9 ] = 1, P[ 14 ] = P[ 8 ] = 0).


Now we are at index i = 15, and its mirrored index around C is i’ = 7. Is P[ 15 ] = P[ 7 ] = 7?

If we followed the symmetric property, the value of P[ i ] would be the same as P[ i' ] = 7. But this is wrong. If we expand around the center at T[15], we get the palindrome “a#b#c#b#a”, which is shorter than what its symmetric counterpart indicates. Why?

Colored lines are overlaid around the centers at index i and i’. Solid green lines show the region that must match on both sides due to the symmetric property around C. Solid red lines show the region that might not match on both sides. Dotted green lines show the region that crosses over the center.

It is clear that the two substrings in the region indicated by the two solid green lines must match exactly. Areas across the center (indicated by dotted green lines) must also be symmetric. Notice carefully that P[ i' ] is 7 and the palindrome at i’ expands all the way across the left edge (L) of the palindrome centered at C (indicated by the solid red lines), so that part is no longer covered by the symmetric property. All we know is P[ i ] ≥ 5, and to find the real value of P[ i ] we have to do character matching by expanding past the right edge (R). In this case, since T[ 21 ] ≠ T[ 1 ], we conclude that P[ i ] = 5.

Let’s summarize the key part of this algorithm as below:

if P[ i' ] ≤ R – i,
then P[ i ] ← P[ i' ]
else P[ i ] ≥ R – i. (In this case we have to expand past the right edge (R) to find P[ i ].)

See how elegant it is? If you are able to grasp the above summary fully, you have already obtained the essence of this algorithm, which is also the hardest part.
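
The mirror rule can be checked on the example with a short sketch. It assumes T has no sentinel characters, so index 0 is the first ‘#’, which gives C = 11 and R = 20 for the palindrome “abcbabcba”. The names radii and mirror_lower_bound are hypothetical and only serve this illustration.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Naive palindrome radii of t, used here only to check the rule.
static std::vector<int> radii(const std::string& t) {
    int n = static_cast<int>(t.length());
    std::vector<int> p(n, 0);
    for (int i = 0; i < n; ++i) {
        while (i - p[i] - 1 >= 0 && i + p[i] + 1 < n &&
               t[i - p[i] - 1] == t[i + p[i] + 1]) {
            ++p[i];
        }
    }
    return p;
}

// Starting value for P[i], given a known palindrome with center c and right
// edge r (requires c < i < r) and the values of p to the left of i.
// The true P[i] is never smaller than this.
static int mirror_lower_bound(const std::vector<int>& p, int c, int r, int i) {
    int i_mirror = 2 * c - i;  // i' in the text
    return std::min(p[i_mirror], r - i);
}
```

With p = radii("#b#a#b#c#b#a#b#c#b#a#c#c#b#a#"), the bound for i = 13 is min(P[ 9 ], 7) = 1 and the bound for i = 15 is min(P[ 7 ], 5) = 5, which in both cases matches the actual value.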

The final part is to determine when we should move the position of C together with R to the right, which is easy:

If the palindrome centered at i does expand past R, we update C to i (the center of this new palindrome), and extend R to the new palindrome’s right edge.

In each step, there are two possibilities. If P[ i' ] ≤ R – i, we set P[ i ] to P[ i' ], which takes exactly one step. Otherwise we attempt to change the palindrome’s center to i by expanding it starting at the right edge, R. Extending R (the inner while loop) takes at most a total of N steps, and positioning and testing each center takes a total of N steps too. Therefore, this algorithm is guaranteed to finish in at most 2*N steps, giving a linear time solution.

#include <algorithm>
#include <string>
#include <vector>
using namespace std;

// Builds T = "^#s0#s1#...#$". The sentinels '^' and '$' are assumed not to
// occur in s, so the expansion stops at both ends without bounds checks.
string pre_process(string s) {
	int n = s.length();
	if(n == 0) {
		return "^$";
	}
	string ret = "^";
	int i;
	for(i = 0; i < n; i++) {
		ret += "#" + s.substr(i, 1);
	}
	ret += "#$";
	return ret;
}

string get_longest_palindrome(string s) {
	string t = pre_process(s);
	int n = t.length();
	vector<int> p(n, 0);	// p[i]: length of the palindrome centered at t[i]
	int id = 0;	// center C of the palindrome that reaches furthest right
	int mx = 0;	// right edge R of that palindrome
	int i;
	for(i = 1; i < n - 1; i++) {
		int i_mirror = 2 * id - i;
		p[i] = mx > i ? min(p[i_mirror], mx - i) : 0;
		// Try to expand the palindrome centered at i.
		while(t[i - 1 - p[i]] == t[i + 1 + p[i]]) p[i]++;
		// If it reaches past R, it becomes the new reference palindrome.
		if(i + p[i] > mx) {
			mx = i + p[i];
			id = i;
		}
	}
	// Find the maximum element of p.
	int max_len = 0;
	int current_id = 0;
	for(i = 1; i < n - 1; i++) {
		if(p[i] > max_len) {
			max_len = p[i];
			current_id = i;
		}
	}
	return s.substr((current_id - 1 - max_len)/2, max_len);
}
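
The claim of at most 2*N steps can be observed by counting the character comparisons of the inner loop. The sketch below is an instrumented variant written for this purpose, not the code of the original article: count_comparisons and its out-parameter t_len are hypothetical, and the input is assumed not to contain the sentinel characters '^' and '$'.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Runs Manacher's algorithm on s and returns how many character comparisons
// the expansion loop performed. The length of T is written to *t_len.
static long count_comparisons(const std::string& s, std::size_t* t_len) {
    std::string t = "^";
    for (char c : s) {
        t += '#';
        t += c;
    }
    t += "#$";
    *t_len = t.length();

    int n = static_cast<int>(t.length());
    std::vector<int> p(n, 0);
    int center = 0, right = 0;
    long comparisons = 0;
    for (int i = 1; i < n - 1; ++i) {
        p[i] = right > i ? std::min(p[2 * center - i], right - i) : 0;
        while (true) {
            ++comparisons;  // one comparison, successful or not
            if (t[i - 1 - p[i]] != t[i + 1 + p[i]]) break;
            ++p[i];
        }
        if (i + p[i] > right) {
            center = i;
            right = i + p[i];
        }
    }
    return comparisons;
}
```

Every successful comparison pushes the right edge further, and every index fails exactly once, so the count should stay below twice the length of T even for an input such as a thousand copies of the same letter.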


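There is also an explanation from a Chinese article:

First, a very clever trick turns every possible palindromic substring, of odd or even length, into one of odd length: insert a special symbol on both sides of every character. For example, abba becomes #a#b#b#a# and aba becomes #a#b#a#. To reduce the coding effort further, another special character can be added at the start of the string, so that the left boundary needs no special handling, for example $#a#b#a#.

Take the string 12212321 as an example. After the step above it becomes S[] = "$#1#2#2#1#2#3#2#1#";

Then an array P[i] records how far the longest palindrome centered at S[i] extends to the left or right (counting S[i] itself). For example, S and P correspond as follows: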

S  #  1  #  2  #  2  #  1  #  2  #  3  #  2  #  1  #
P  1  2  1  2  5  2  1  4  1  2  1  6  1  2  1  2  1
(P.S. As you can see, P[i]-1 is exactly the total length of the corresponding palindrome in the original string.)
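
The table can be reproduced with a short sketch that expands around every position. It is only an illustration of this convention, where P[i] counts the center itself: the names add_guards and naive_p are made up here, and '$' is assumed not to occur in the input.

```cpp
#include <string>
#include <vector>

// Builds "$#c1#c2#...#cn#" from s.
static std::string add_guards(const std::string& s) {
    std::string t = "$#";
    for (char c : s) {
        t += c;
        t += '#';
    }
    return t;
}

// P[i] = number of positions the palindrome centered at t[i] covers to one
// side, counting t[i] itself. Index 0 (the '$' guard) is left at 0.
static std::vector<int> naive_p(const std::string& t) {
    int n = static_cast<int>(t.length());
    std::vector<int> p(n, 0);
    for (int i = 1; i < n; ++i) {
        int k = 1;
        while (i - k >= 0 && i + k < n && t[i - k] == t[i + k]) {
            ++k;
        }
        p[i] = k;
    }
    return p;
}
```

For 12212321 the largest entry is 6, at the character 3, and 6 - 1 = 5 is the length of the palindrome 12321.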

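So how do we compute P[i]? The algorithm introduces two auxiliary variables, id and mx (one would actually be enough, but two are clearer). id is the center of the palindrome found so far whose right boundary reaches furthest, and mx is id + P[id], that is, the boundary of that palindrome.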

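This leads to a remarkable conclusion, which is the key point of the whole algorithm: if mx > i, then P[i] >= MIN(P[2 * id - i], mx - i). This is the line that had me stuck for a very long time. It is actually much easier to understand when written out in a longer form:
// Let j = 2 * id - i, i.e. j is the mirror image of i with respect to id.
if (mx - i > P[j]) 
    P[i] = P[j];
else /* P[j] >= mx - i */
    P[i] = mx - i; // P[i] >= mx - i: start from the minimum, then keep matching to update it.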

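Of course the code alone is still not clear enough; it is easier to understand with the help of figures.

When mx - i > P[j], the palindrome centered at S[j] is contained in the palindrome centered at S[id]. Since i and j are symmetric, the palindrome centered at S[i] must also be contained in the palindrome centered at S[id], so P[i] = P[j] must hold; see the figure below.


When P[j] >= mx - i, the palindrome centered at S[j] is not necessarily contained in the palindrome centered at S[id]. By symmetry, however, the two parts enclosed by the green boxes in the figure below are identical, which means that the palindrome centered at S[i] extends to the right at least as far as mx, that is, P[i] >= mx - i. Whether the part beyond mx is symmetric can only be found out by matching character by character.


When mx <= i, nothing more can be assumed about P[i]; we can only set P[i] = 1 and then start matching.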

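So the code looks like this:
// Read the input and preprocess it into the C string s (s[0] is the '$' guard).
int i, mx = 0, id = 0;
int p[1000];	// assumes the preprocessed string is shorter than 1000 characters
memset(p, 0, sizeof(p));
for (i = 1; s[i] != '\0'; i++) {
    p[i] = mx > i ? min(p[2*id-i], mx-i) : 1;
    // The '$' at s[0] and the terminating '\0' stop the expansion at both ends.
    while (s[i + p[i]] == s[i - p[i]]) p[i]++;
    if (i + p[i] > mx) {
        mx = i + p[i];
        id = i;
    }
}
// Find the largest p[i].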
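
The fragment above can be wrapped into a complete function roughly as follows. This is a sketch, not the code of the quoted article: the name longest_palindrome_guarded is made up, std::string replaces the fixed-size array, an explicit bounds check replaces the '\0' terminator, and the input is assumed not to contain '$'.

```cpp
#include <algorithm>
#include <string>
#include <vector>

static std::string longest_palindrome_guarded(const std::string& s) {
    // Preprocess: "$#c1#c2#...#cn#".
    std::string t = "$#";
    for (char c : s) {
        t += c;
        t += '#';
    }
    int n = static_cast<int>(t.length());
    std::vector<int> p(n, 0);
    int mx = 0, id = 0;
    int best = 0;  // index in t of the center with the largest p so far
    for (int i = 1; i < n; ++i) {
        p[i] = mx > i ? std::min(p[2 * id - i], mx - i) : 1;
        // '$' at t[0] stops the left side; the bounds check stops the right.
        while (i + p[i] < n && t[i + p[i]] == t[i - p[i]]) {
            ++p[i];
        }
        if (i + p[i] > mx) {
            mx = i + p[i];
            id = i;
        }
        if (p[i] > p[best]) {
            best = i;
        }
    }
    // p[best] - 1 is the palindrome length in s, starting at (best - p[best]) / 2.
    return s.substr((best - p[best]) / 2, p[best] - 1);
}
```

For the running example 12212321 this should return 12321, in line with the maximum P value of 6.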

