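Definition
A sequence T obtained from a sequence S by deleting zero or more characters, without reordering the remaining ones, is called a subsequence of S.
Among all common subsequences of two sequences X and Y, the longest one is defined as the longest common subsequence (LCS) of X and Y.
For example, the longest common subsequence of 12455 and 3455677 is 455.
Note the difference from the longest common substring, whose characters must be contiguous.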
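The difference between a subsequence and a substring can be illustrated with a small sketch. The class name SubsequenceCheck and the method isSubsequence are hypothetical names chosen for this illustration; the check is an ordinary two-pointer scan, not something taken from the problem statement.

```java
class SubsequenceCheck {
    // Returns true if t can be obtained from s by deleting zero or more characters.
    static boolean isSubsequence(String t, String s) {
        int i = 0; // index of the next character of t that still has to be matched
        for (int j = 0; j < s.length() && i < t.length(); j++) {
            if (s.charAt(j) == t.charAt(i)) {
                i++;
            }
        }
        return i == t.length();
    }

    public static void main(String[] args) {
        // "145" skips the '2' of "12455", so it is a subsequence but not a substring.
        System.out.println(isSubsequence("145", "12455"));
        System.out.println("12455".contains("145"));
    }
}
```

Here String.contains stands in for the substring test: it requires the characters to be adjacent, while isSubsequence only requires them to appear in the same order.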
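Algorithm
(1) Brute-force enumeration (impractical: a sequence of length m has 2^m subsequences)
(2) Dynamic programming
Given two sequences X and Y of lengths m and n, let Xi denote the first i characters of X and Yj the first j characters of Y; write xi and yj for the individual characters.
LCS(X, Y) is the longest common subsequence of X and Y. If either prefix is empty, the LCS is empty.
If xm = yn (the last characters are equal), then the last character zk of LCS(X, Y) satisfies zk = xm = yn,
that is, LCS(Xm, Yn) = LCS(Xm-1, Yn-1) + xm.
If xm != yn (the last characters differ), then LCS(Xm, Yn) is either LCS(Xm-1, Yn) or LCS(Xm, Yn-1), whichever is longer,
that is, LCS(Xm, Yn) = max{LCS(Xm-1, Yn), LCS(Xm, Yn-1)}.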
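The recurrence above is stated on sequences rather than on lengths, so it can also be used to recover the LCS itself. The following is a minimal sketch under that reading: it fills a table of LCS lengths and then walks back from the bottom-right corner. The class name LcsReconstruct and the method lcs are hypothetical names introduced for this illustration, and when several LCSs of equal length exist the sketch returns just one of them.

```java
class LcsReconstruct {
    // Returns one longest common subsequence of x and y.
    static String lcs(String x, String y) {
        int m = x.length(), n = y.length();
        // len[i][j] = length of the LCS of the first i chars of x and the first j chars of y.
        int[][] len = new int[m + 1][n + 1];
        for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                if (x.charAt(i - 1) == y.charAt(j - 1)) {
                    len[i][j] = len[i - 1][j - 1] + 1;
                } else {
                    len[i][j] = Math.max(len[i - 1][j], len[i][j - 1]);
                }
            }
        }
        // Walk back from (m, n), collecting matched characters in reverse order.
        StringBuilder sb = new StringBuilder();
        int i = m, j = n;
        while (i > 0 && j > 0) {
            if (x.charAt(i - 1) == y.charAt(j - 1)) {
                sb.append(x.charAt(i - 1));
                i--;
                j--;
            } else if (len[i - 1][j] >= len[i][j - 1]) {
                i--;
            } else {
                j--;
            }
        }
        return sb.reverse().toString();
    }

    public static void main(String[] args) {
        System.out.println(lcs("12455", "3455677")); // prints 455
    }
}
```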
Related problem
Delete Operation for Two Strings
Given two words word1 and word2, find the minimum number of steps required to make word1 and word2 the same, where in each step you can delete one character in either string.
Example 1:
Input: "sea", "eat"
Output: 2
Explanation: You need one step to make "sea" to "ea" and another step to make "eat" to "ea".
Note:
The length of given words won't exceed 500.
Characters in given words can only be lower-case letters.
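This problem reduces to LCS: every character outside the LCS has to be deleted from its string. If the two sequences s1 and s2 have lengths m and n, the answer is m + n - 2 * LCS(s1, s2), where LCS(s1, s2) here means the length of the LCS. For "sea" and "eat" the LCS is "ea", so the answer is 3 + 3 - 2 * 2 = 2.
What remains is to compute LCS(s1, s2), that is, to implement the LCS algorithm described above.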
Approach #1 Using Longest Common Subsequence [Time Limit Exceeded] (recursive implementation)
public class Solution {
    public int minDistance(String s1, String s2) {
        // Every character outside the LCS must be deleted from its string.
        return s1.length() + s2.length() - 2 * lcs(s1, s2, s1.length(), s2.length());
    }

    // Length of the LCS of the first m characters of s1 and the first n characters of s2.
    public int lcs(String s1, String s2, int m, int n) {
        if (m == 0 || n == 0)
            return 0;
        if (s1.charAt(m - 1) == s2.charAt(n - 1))
            return 1 + lcs(s1, s2, m - 1, n - 1);
        else
            return Math.max(lcs(s1, s2, m, n - 1), lcs(s1, s2, m - 1, n));
    }
}
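This approach takes exponential time: each call makes at most two further calls and each of those shrinks m + n by at least one, so the number of calls is bounded by O(2^(m+n)). The recursion stack uses O(m+n) space, which is the same order as O(max(m,n)). Many subproblems are computed more than once, so the algorithm can be improved.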
Approach #2 Longest Common Subsequence with Memoization [Accepted] (trade space for time, still recursive)
import java.util.Arrays;

public class Solution {
    public int minDistance(String s1, String s2) {
        // memo[i][j] caches the LCS length of the first i chars of s1 and the first j chars of s2.
        // -1 marks "not computed yet", so that cached results equal to 0 are also reused.
        int[][] memo = new int[s1.length() + 1][s2.length() + 1];
        for (int[] row : memo)
            Arrays.fill(row, -1);
        return s1.length() + s2.length() - 2 * lcs(s1, s2, s1.length(), s2.length(), memo);
    }

    public int lcs(String s1, String s2, int m, int n, int[][] memo) {
        if (m == 0 || n == 0)
            return 0;
        if (memo[m][n] >= 0)
            return memo[m][n];
        if (s1.charAt(m - 1) == s2.charAt(n - 1))
            memo[m][n] = 1 + lcs(s1, s2, m - 1, n - 1, memo);
        else
            memo[m][n] = Math.max(lcs(s1, s2, m, n - 1, memo), lcs(s1, s2, m - 1, n, memo));
        return memo[m][n];
    }
}
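This approach runs in O(m*n) time and O(m*n) space. It simply uses a two-dimensional array to record values that have already been computed so that they are not computed again, but it still relies on recursion.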
Approach #3 Using Longest Common Subsequence - Dynamic Programming [Accepted] (trade space for time; each cell LCS(Xi, Yj) is computed from its neighbouring cells)
public class Solution {
    public int minDistance(String s1, String s2) {
        // dp[i][j] = LCS length of the first i chars of s1 and the first j chars of s2.
        int[][] dp = new int[s1.length() + 1][s2.length() + 1];
        for (int i = 0; i <= s1.length(); i++) {
            for (int j = 0; j <= s2.length(); j++) {
                // Row 0 and column 0 stay 0: an empty prefix has an empty LCS.
                if (i == 0 || j == 0)
                    continue;
                if (s1.charAt(i - 1) == s2.charAt(j - 1))
                    dp[i][j] = 1 + dp[i - 1][j - 1];
                else
                    dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]);
            }
        }
        return s1.length() + s2.length() - 2 * dp[s1.length()][s2.length()];
    }
}
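This approach runs in O(m*n) time and O(m*n) space. The difference from Approach #2 is that the two-dimensional array is not filled through recursion; every value is computed directly by scanning row by row and column by column.
The space complexity can be reduced further. Each row depends only on the current row and the previous row, so two one-dimensional arrays are enough, which lowers the space complexity to O(n).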
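The two-row idea can be sketched as follows. This is one possible way to write it, not code from the original solutions: the class name TwoRowLcs and the helper lcsLength are hypothetical names, and swapping the two array references after each row is an implementation choice made here to avoid copying.

```java
class TwoRowLcs {
    // Length of the LCS of s1 and s2, keeping only two rows of the DP table.
    static int lcsLength(String s1, String s2) {
        int n = s2.length();
        int[] prev = new int[n + 1]; // row i - 1 of the full table
        int[] curr = new int[n + 1]; // row i, being filled in
        for (int i = 1; i <= s1.length(); i++) {
            for (int j = 1; j <= n; j++) {
                if (s1.charAt(i - 1) == s2.charAt(j - 1)) {
                    curr[j] = prev[j - 1] + 1;
                } else {
                    curr[j] = Math.max(prev[j], curr[j - 1]);
                }
            }
            // The finished row becomes the previous row; the old one is overwritten next round.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[n];
    }

    static int minDistance(String s1, String s2) {
        return s1.length() + s2.length() - 2 * lcsLength(s1, s2);
    }

    public static void main(String[] args) {
        System.out.println(minDistance("sea", "eat")); // prints 2
    }
}
```

Index 0 of both arrays is never written, so it keeps the value 0 that represents the empty prefix of s2. Passing the shorter string as s2 would keep the arrays as small as possible.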