Question:
You're given a large string T and a stream of smaller strings S1, S2, S3, ... For each Si, determine whether it is a subsequence of T.
|T| < 10 000 000
|Si| < 100
alphabet is 'a' - 'z'
Example:
T = abcdefg
S1 = abc yes
S2 = ag yes
S3 = ga no
S4 = aa no
Analysis:
A common approach is to scan T once with two pointers. Start from the first letter of Si and walk through T from the beginning until that letter is found. Then continue with the second letter of Si, resuming the scan of T at the position just after the previous match. Si is a subsequence of T if every letter of Si is matched before T runs out.

    public static boolean isExisted(String dest, String input) {
        if (dest == null || input == null || dest.length() < input.length())
            return false;
        int j = 0; // next position in dest to examine
        for (int i = 0; i < input.length(); i++) {
            // advance j until dest[j] matches input[i] or dest is exhausted
            while (j < dest.length() && dest.charAt(j) != input.charAt(i)) {
                j++;
            }
            if (j == dest.length()) {
                return false; // dest ran out before input[i] was matched
            }
            j++; // consume the matched letter
        }
        return true; // every letter of input was matched in order
    }

The complexity is O(m) per query, where m is the length of T: the pointer j only moves forward, so T is scanned at most once. For k queries the total cost is O(km).
However, if there are many Si, rescanning T for every query is wasteful, so we should find another way to decrease the per-query cost by pre-processing T.
The idea of pre-processing is to build a table that records the nearest position of each letter in T after a specified position p. For instance, if T = "abcdefg" and positions are 1-based (a is at 1, b is at 2, ..., g is at 7), the question becomes: how do we quickly find the first "g" after position 2? (It is 7.)
To solve the problem, we create a table position[256][|T| + 1], where position[x][y] is the nearest position of letter x strictly after position y, or -1 if x does not occur after y. Column 0 means "before the first letter". For instance, for T = "abcdefg", the position table is:
x\y |  0 |  1 |  2 |  3 |  4 |  5 |  6 |  7 |
a   |  1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 |
b   |  2 |  2 | -1 | -1 | -1 | -1 | -1 | -1 |
c   |  3 |  3 |  3 | -1 | -1 | -1 | -1 | -1 |
d   |  4 |  4 |  4 |  4 | -1 | -1 | -1 | -1 |
e   |  5 |  5 |  5 |  5 |  5 | -1 | -1 | -1 |
f   |  6 |  6 |  6 |  6 |  6 |  6 | -1 | -1 |
g   |  7 |  7 |  7 |  7 |  7 |  7 |  7 | -1 |
…   |
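Each entry of the table can be cross-checked against a direct scan of T. The sketch below is only an illustration of the definition, not the pre-processing itself: the class name PositionTableCheck and the helper nextPosition are hypothetical, and it assumes 1-based positions with -1 meaning "no occurrence after y". It relies on the standard String.indexOf(int ch, int fromIndex).

```java
final class PositionTableCheck {
    // Nearest 1-based position of ch strictly after 1-based position y, or -1 if none.
    // The 0-based index y is the 1-based position y + 1, i.e. the first candidate after y.
    public static int nextPosition(String t, char ch, int y) {
        int idx = t.indexOf(ch, y);
        return idx < 0 ? -1 : idx + 1;
    }

    public static void main(String[] args) {
        String t = "abcdefg";
        // Print one row per letter, one column per position 0..|T|.
        for (char ch = 'a'; ch <= 'g'; ch++) {
            StringBuilder row = new StringBuilder().append(ch);
            for (int y = 0; y <= t.length(); y++) {
                row.append(" | ").append(nextPosition(t, ch, y));
            }
            System.out.println(row);
        }
    }
}
```

Each lookup here still scans T, which is exactly the cost the pre-computed table is meant to remove.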
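    public class Subsequence {
        // position[ch][y] = nearest 1-based position of ch strictly after position y, or -1 if none
        public static int[][] getPosition(String dest) {
            if (dest == null)
                return null;
            int n = dest.length();
            int[][] position = new int[256][n + 1];
            // nothing occurs after the last position
            for (int ch = 0; ch < 256; ch++) {
                position[ch][n] = -1;
            }
            // fill right to left: column y equals column y + 1,
            // except for the letter that sits at position y + 1
            for (int y = n - 1; y >= 0; y--) {
                for (int ch = 0; ch < 256; ch++) {
                    position[ch][y] = position[ch][y + 1];
                }
                position[dest.charAt(y)][y] = y + 1;
            }
            return position;
        }

        // position is the table built once by getPosition(T); it is reused for every query
        public static boolean isExisted(int[][] position, String input) {
            if (position == null || input == null)
                return false;
            int index = 0; // position of the previous match, 0 before the first letter
            for (int i = 0; i < input.length(); i++) {
                index = position[input.charAt(i)][index];
                if (index == -1)
                    return false; // no occurrence of this letter after the previous match
            }
            return true;
        }
    }

The table is built once in O(256 · |T|) time, and each query then costs O(|Si|) because every letter of Si is resolved by a single table lookup. The table must not be rebuilt per query, otherwise the pre-processing gains nothing over the plain scan.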
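One practical concern with the full table is memory: with |T| close to 10 000 000, an int table with 256 rows needs roughly 10 GB, and even 26 rows need roughly 1 GB. A different technique, swapped in here as a possible alternative rather than the method described above, stores for each letter the sorted list of indices where it occurs in T and binary-searches that list for every letter of Si. The sketch below assumes the input contains only 'a' - 'z'; the class name IndexedSubsequence and its method contains are hypothetical.

```java
import java.util.Arrays;

final class IndexedSubsequence {
    // occurrences[c] holds the 0-based indices of letter ('a' + c) in T, in increasing order.
    private final int[][] occurrences = new int[26][];

    IndexedSubsequence(String t) {
        // First pass: count each letter so the arrays can be sized exactly.
        int[] counts = new int[26];
        for (int i = 0; i < t.length(); i++) {
            counts[t.charAt(i) - 'a']++;
        }
        for (int c = 0; c < 26; c++) {
            occurrences[c] = new int[counts[c]];
        }
        // Second pass: record indices; scanning left to right keeps each array sorted.
        int[] filled = new int[26];
        for (int i = 0; i < t.length(); i++) {
            int c = t.charAt(i) - 'a';
            occurrences[c][filled[c]++] = i;
        }
    }

    boolean contains(String s) {
        int from = 0; // smallest index of T that the next letter is allowed to use
        for (int i = 0; i < s.length(); i++) {
            int[] occ = occurrences[s.charAt(i) - 'a'];
            int k = Arrays.binarySearch(occ, from);
            if (k < 0) {
                k = -k - 1; // insertion point: first occurrence with index >= from
            }
            if (k == occ.length) {
                return false; // the letter does not occur at or after 'from'
            }
            from = occ[k] + 1;
        }
        return true;
    }

    public static void main(String[] args) {
        IndexedSubsequence index = new IndexedSubsequence("abcdefg");
        for (String s : new String[] {"abc", "ag", "ga", "aa"}) {
            System.out.println(s + " " + (index.contains(s) ? "yes" : "no"));
        }
    }
}
```

This keeps one int per character of T (about 40 MB for 10 000 000 characters) at the price of O(|Si| log |T|) per query instead of O(|Si|).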