Determine whether S is a subsequence of T

Question:

You're given a large string T and a stream of smaller strings S1, S2, S3, ... For each Si, determine whether it is a subsequence of T.

|T| < 10 000 000
|Si| < 100
alphabet is 'a' - 'z'

Example:
T = abcdefg
S1 = abc        yes
S2 = ag          yes
S3 = ga          no
S4 = aa          no

Analysis:

A common approach is to scan T once per query. Start with the first letter of Si and walk through T from the beginning until that letter is found. If it is found, continue with the second letter of Si, but resume the scan of T at the position just after the previous match. If T runs out before every letter of Si has been matched, Si is not a subsequence.

public static boolean isExisted(String dest, String input) {
	if (dest == null || input == null || dest.length() < input.length()) return false;
	int j = 0;
	for (int i = 0; i < input.length(); i++) {
		// scan dest from j for the next occurrence of input.charAt(i)
		while (j < dest.length() && dest.charAt(j) != input.charAt(i)) {
			j++;
		}
		// dest is exhausted, so this letter has no match
		if (j == dest.length()) return false;
		// step past the matched letter
		j++;
	}
	return true;
}
The complexity is O(m + n) per query, where m and n are the lengths of T and Si, respectively, because j never moves backward. For k queries the total is O(k(m + n)).
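
As a quick check of this scan against the example above, the sketch below runs one query and also counts how many letters of T it inspects. The class name ScanDemo, the method scan, and the counter inspected are assumptions made for illustration; they are not part of the original code.

```java
class ScanDemo {
	// Number of letters of T inspected by the most recent call to scan
	// (illustrative counter, not part of the original code).
	static int inspected;

	// Two-pointer scan: returns true if s is a subsequence of t.
	static boolean scan(String t, String s) {
		inspected = 0;
		int j = 0;
		for (int i = 0; i < s.length(); i++) {
			// advance j to the next occurrence of s.charAt(i)
			while (j < t.length() && t.charAt(j) != s.charAt(i)) {
				j++;
				inspected++;
			}
			if (j == t.length()) return false; // T is exhausted
			j++; // step past the matched letter
			inspected++;
		}
		return true;
	}
}
```

Calling ScanDemo.scan("abcdefg", s) for abc, ag, ga and aa gives true, true, false and false. The counter always equals the final value of j, so it can never exceed |T|, however the letters of Si are arranged.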

However, if there are many Si, scanning T (up to 10 000 000 letters) for every query is too slow, so we should find another way to reduce the cost by pre-processing T.

The idea of pre-processing is to build a table that records the nearest position of each letter in T after a given position p. For instance, if T = "abcdefg", and the position of a is 1, b is 2, ..., and g is 7, how do we quickly find the first "g" after position 2? (It is 7.)

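To solve the problem, we create a table position[256][|T| + 1], where position[x][y] is the nearest position of letter x strictly after position y, or -1 if x does not occur after y. (Since the alphabet is 'a' - 'z', 26 rows would be enough.) For instance, for T = "abcdefg", the position table is: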

    0   1   2   3   4   5   6   7
a   1  -1  -1  -1  -1  -1  -1  -1
b   2   2  -1  -1  -1  -1  -1  -1
c   3   3   3  -1  -1  -1  -1  -1
d   4   4   4   4  -1  -1  -1  -1
e   5   5   5   5   5  -1  -1  -1
f   6   6   6   6   6   6  -1  -1
g   7   7   7   7   7   7   7  -1
               
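So, based on the table, the nearest position of e after position 1 is 5, and there is no e after position 7 because position['e'][7] = -1. To test Si, start at position 0 and, for each letter of Si in turn, jump to position[letter][current position]. If any jump returns -1, Si is not a subsequence of T.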
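
Before building the table for the largest allowed T, it may help to estimate its size. The sketch below is a back-of-the-envelope calculation that assumes 4 bytes per Java int and ignores per-array overhead; the class name TableSize and the method bytes are made up for illustration.

```java
class TableSize {
	// Approximate number of bytes for an int table with `rows` rows
	// and (tLength + 1) columns.
	// Assumption: 4 bytes per int, array object headers ignored.
	static long bytes(int rows, int tLength) {
		return (long) rows * (tLength + 1L) * Integer.BYTES;
	}
}
```

Under these assumptions, TableSize.bytes(256, 10_000_000) is 10 240 001 024 bytes (about 10 GB), while TableSize.bytes(26, 10_000_000) is 1 040 000 104 bytes (about 1 GB), so limiting the rows to the 26 letters of the alphabet matters at this scale.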

public class Subsequence {

	// Build the table once for T. position[x][y] is the 1-based position of the
	// first x strictly after position y, or -1 if there is none.
	// Runs in O(256 * |T|) time instead of rescanning T for every column.
	public static int[][] getPosition(String dest) {
		if (dest == null) return null;
		int n = dest.length();
		int[][] position = new int[256][n + 1];
		// nothing can be found after the last position
		for (int ch = 0; ch < 256; ch++) {
			position[ch][n] = -1;
		}
		// fill columns from right to left: column y equals column y + 1,
		// except that the letter at position y + 1 is found at y + 1
		for (int y = n - 1; y >= 0; y--) {
			for (int ch = 0; ch < 256; ch++) {
				position[ch][y] = position[ch][y + 1];
			}
			position[dest.charAt(y)][y] = y + 1;
		}
		return position;
	}

	// position must be the table returned by getPosition(T); it is passed in
	// so that T is pre-processed once and reused for every Si.
	public static boolean isExisted(int[][] position, String input) {
		if (position == null || input == null) return false;
		int index = 0;
		for (int i = 0; i < input.length(); i++) {
			// jump to the first occurrence of this letter after index
			index = position[input.charAt(i)][index];
			if (index == -1) return false;
		}
		return true;
	}
}
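
If the full table is too large for the available memory, one commonly used alternative (not the method described above) is to store, for each letter, the sorted list of positions where it occurs in T, and to binary-search that list for the first position after the current one. This needs O(|T|) memory and O(|Si| log |T|) time per query. The sketch below is a minimal version under that assumption; the class name PositionIndex and its methods are hypothetical, and it assumes every letter is in 'a' - 'z'.

```java
import java.util.Arrays;

class PositionIndex {
	// occurrences[c] holds the 1-based positions of letter ('a' + c) in T,
	// in increasing order. Assumes T contains only 'a' - 'z'.
	private final int[][] occurrences = new int[26][];

	PositionIndex(String t) {
		// first pass: count each letter so the arrays can be sized exactly
		int[] count = new int[26];
		for (int i = 0; i < t.length(); i++) count[t.charAt(i) - 'a']++;
		for (int c = 0; c < 26; c++) occurrences[c] = new int[count[c]];
		// second pass: record positions left to right, so each list is sorted
		int[] filled = new int[26];
		for (int i = 0; i < t.length(); i++) {
			int c = t.charAt(i) - 'a';
			occurrences[c][filled[c]++] = i + 1;
		}
	}

	// First position of letter ch strictly after position p, or -1 if none.
	int next(char ch, int p) {
		int[] list = occurrences[ch - 'a'];
		int k = Arrays.binarySearch(list, p + 1);
		if (k < 0) k = -k - 1; // not found: k becomes the insertion point
		return k < list.length ? list[k] : -1;
	}

	// Same walk as the table lookup, with next() replacing position[x][y].
	boolean isSubsequence(String s) {
		int p = 0;
		for (int i = 0; i < s.length(); i++) {
			p = next(s.charAt(i), p);
			if (p == -1) return false;
		}
		return true;
	}
}
```

For T = "abcdefg", new PositionIndex(T).next('e', 1) returns 5 and next('e', 7) returns -1, matching the table above. The trade-off is a logarithmic factor per letter of Si in exchange for memory proportional to |T| rather than to the alphabet size times |T|.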

blog.csdn.net/beiyetengqing

 
