I think we should first understand the general solution to the string-matching problem.
The general solutions are the Nondeterministic Finite Automaton (NFA) and the Deterministic Finite Automaton (DFA).
If you don't know these two or have forgotten them, don't panic: spend a little time on compiler construction and you will learn a lot.
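The difference between them is that a DFA is a restricted form of NFA: from each state there is exactly one transition per input character, so matching follows a single path, whereas simulating an NFA must track a whole set of active states. That is why a DFA runs faster than an NFA.
But note that converting an NFA into a DFA takes extra time; the subset construction can produce exponentially many states in the worst case.
Since we perform only one match per query, building an NFA and querying it directly is generally cheaper than building the NFA, converting it into a DFA, and then querying the DFA.
See the NFA solution for more implementation details.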
Let $n$ be the length of string s and $m$ the length of string p.
Building the NFA needs $O(m)$ time and $O(m)$ space.
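Querying an answer needs $O(n \cdot m)$ time in the worst case, since each of the $n$ characters of s may update up to $m$ active states, and $O(m)$ space for the current set of active states, provided the per-character state sets are reclaimed promptly (for example, by the GC).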
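To make these costs concrete, here is a minimal sketch of an NFA simulation. It is not the referenced NFA solution; the class name `NfaMatcher` and the token/state layout are illustrative assumptions. It assumes the usual rules (`.` matches any single character, `x*` matches zero or more `x`) and a valid pattern in which `*` never comes first. Each pattern token becomes one state, and the active states are kept in a boolean array of size $m + 1$, so each input character costs $O(m)$ work and the array uses $O(m)$ space.

```java
import java.util.ArrayList;
import java.util.List;

class NfaMatcher {
    // Hypothetical sketch: one NFA state per pattern token (a character or '.', plus a "starred" flag).
    // State t means "tokens 0..t-1 have been consumed"; state m (= number of tokens) is accepting.
    static boolean isMatch(String s, String p) {
        List<Character> chars = new ArrayList<>();
        List<Boolean> starred = new ArrayList<>();
        for (int k = 0; k < p.length(); k++) {
            boolean star = k + 1 < p.length() && p.charAt(k + 1) == '*';
            chars.add(p.charAt(k));
            starred.add(star);
            if (star) {
                k++; // skip the '*' that belongs to this token
            }
        }
        int m = chars.size();
        boolean[] active = new boolean[m + 1];
        active[0] = true;
        closure(active, starred);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            boolean[] next = new boolean[m + 1];
            for (int t = 0; t < m; t++) {
                if (!active[t]) {
                    continue;
                }
                char pc = chars.get(t);
                if (pc == '.' || pc == c) {
                    // a starred token loops on itself; a plain token advances to the next state
                    next[starred.get(t) ? t : t + 1] = true;
                }
            }
            closure(next, starred);
            active = next;
        }
        return active[m];
    }

    // Epsilon closure: a starred token may be skipped (zero repetitions).
    // Edges only go forward, so one left-to-right pass propagates chains like "a*b*".
    private static void closure(boolean[] states, List<Boolean> starred) {
        for (int t = 0; t < starred.size(); t++) {
            if (states[t] && starred.get(t)) {
                states[t + 1] = true;
            }
        }
    }
}
```

Turning this into a DFA would mean precomputing `next` for every reachable set of active states (the subset construction), which is the extra build cost that a single query does not pay back.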
With the NFA idea in mind, we can derive a simpler solution, since building complex data structures is rarely worth the effort in interview coding.
(but they are still very important in daily development!!)
Consider how the NFA processes the input:
Since each state of the NFA corresponds to a character (token) of the pattern p, we do not need to construct the NFA explicitly.
Every character of p can be used directly as an NFA state.
So we can write the following DP equation:
If $P_j$ is a repeatable character (ends with *), then $f_{i,j} = f_{i,j-1} \;||\; \big(\mathrm{isCharacterMatch}(S_i, P_j) \;\&\&\; (f_{i-1,j-1} \;||\; f_{i-1,j})\big)$
If $P_j$ is not a repeatable character, then $f_{i,j} = \mathrm{isCharacterMatch}(S_i, P_j) \;\&\&\; f_{i-1,j-1}$
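Here $f_{i,j}$ is the result of running the NFA built from $P[0 \ldots j]$ on $S[0 \ldots i]$, where $P_j$ is the $j$-th pattern token (a character, optionally followed by *). An index of $-1$ denotes an empty prefix: $f_{-1,-1}$ is true, $f_{i,-1}$ is false for $i \ge 0$, and $f_{-1,j} = f_{-1,j-1}$ if $P_j$ is repeatable and false otherwise.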
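The recurrence can also be evaluated bottom-up. This is a minimal sketch, assuming valid patterns in which `*` never comes first; `DpMatcher` is a hypothetical name. Unlike the token-based $f_{i,j}$, the table is indexed by raw characters, like the recursive code: `dp[i][j]` says whether the first `i` characters of s match the first `j` characters of p, and a `*` at `p[j-1]` refers to the token `p[j-2]*`.

```java
class DpMatcher {
    // Hypothetical sketch: dp[i][j] == "s[0..i-1] matches p[0..j-1]" (row/column 0 = empty prefix).
    static boolean isMatch(String s, String p) {
        int n = s.length(), m = p.length();
        boolean[][] dp = new boolean[n + 1][m + 1];
        dp[0][0] = true; // empty string matches empty pattern
        for (int j = 2; j <= m; j++) {
            // the empty string matches only a chain of "x*" tokens
            dp[0][j] = p.charAt(j - 1) == '*' && dp[0][j - 2];
        }
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                char pc = p.charAt(j - 1);
                if (pc == '*') {
                    // repeatable token p[j-2]*: skip it, or consume s[i-1] (mirrors the recurrence)
                    dp[i][j] = dp[i][j - 2]
                            || (charMatch(s.charAt(i - 1), p.charAt(j - 2))
                                && (dp[i - 1][j - 2] || dp[i - 1][j]));
                } else {
                    dp[i][j] = charMatch(s.charAt(i - 1), pc) && dp[i - 1][j - 1];
                }
            }
        }
        return dp[n][m];
    }

    private static boolean charMatch(char chS, char chP) {
        return chP == '.' || chS == chP;
    }
}
```

Each row of `dp` depends only on the previous row and on earlier entries of the same row, and it plays the same role as the NFA's set of active states after reading `i` characters, so keeping just two rows would bring the space down to $O(m)$.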
This leads to the following simpler DP solution:
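class Solution {
    private char[] s, p;
    // memo[i + 1][j + 1] caches isMatch(i, j); null means "not computed yet"
    private Boolean[][] memo;
    public boolean isMatch(String text, String pattern) {
        s = text.toCharArray();
        p = pattern.toCharArray();
        memo = new Boolean[s.length + 1][p.length + 1];
        return isMatch(s.length - 1, p.length - 1);
    }
    // Does s[0..i] match p[0..j]? An index of -1 means an empty prefix.
    private boolean isMatch(int i, int j) {
        if (j < 0) {
            return i < 0; // the empty pattern matches only the empty string
        }
        if (memo[i + 1][j + 1] != null) {
            return memo[i + 1][j + 1];
        }
        boolean result;
        if (i < 0) {
            // empty string: only "x*" tokens can be skipped
            result = '*' == p[j] && isMatch(i, j - 2);
        } else if ('*' == p[j]) {
            // repeatable token p[realJ]*
            int realJ = j - 1;
            result = isMatch(i, realJ - 1)
                    || isCharacterMatch(s[i], p[realJ]) && (isMatch(i - 1, realJ - 1) || isMatch(i - 1, j));
        } else {
            result = isCharacterMatch(s[i], p[j]) && isMatch(i - 1, j - 1);
        }
        memo[i + 1][j + 1] = result;
        return result;
    }
    private boolean isCharacterMatch(char chS, char chP) {
        return '.' == chP || chS == chP;
    }
}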
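With memoization, this needs $O(n \cdot m)$ time and space in the worst case: there are $O(n \cdot m)$ distinct $(i, j)$ pairs, and each is computed once in $O(1)$ time.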