Regular Expression Matching
Given an input string (s) and a pattern (p), implement regular expression matching with support for '.' and '*'.
'.' Matches any single character.
'*' Matches zero or more of the preceding element.
The matching should cover the entire input string (not partial).
Note:
s could be empty and contains only lowercase letters a-z.
p could be empty and contains only lowercase letters a-z, and characters like . or *.
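The only special matching characters in this problem are . and *, and every * is always preceded by a-z or .
Besides the examples given in the problem statement, here are a few more test cases that my own attempts failed on: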
Input1: s = "ab", p = ".*c"
Input2: s = "aaa", p = "a*a"
Input3: s = "aaa", p = "ab*a*c*a"
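The expected answers for these three inputs can be cross-checked with Python's built-in re module, because the pattern syntax in this problem (lowercase letters, '.' and '*') is a subset of what re accepts. This is only a sanity-check sketch; the helper name expected is hypothetical and not part of the original notes.

```python
import re

def expected(s, p):
    # re.fullmatch requires the whole string to match, as the problem does.
    return re.fullmatch(p, s) is not None

cases = [("ab", ".*c"), ("aaa", "a*a"), ("aaa", "ab*a*c*a")]
for s, p in cases:
    print(s, p, expected(s, p))
```

Only the first input should be rejected; the other two are valid matches.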
class Solution:
    def isMatch(self, s: str, p: str) -> bool:
        l1, l2 = len(s), len(p)
        i = j = 0
        while i < l1 and j < l2:
            if p[j] == "*":
                # If the character before *, p[j-1], equals s[i] or is ".", keep j and advance i,
                # i.e. * greedily matches as many of the preceding character as possible
                if j >= 1 and (p[j-1] == "." or p[j-1] == s[i]):
                    i += 1
                # Otherwise, advance j
                else:
                    j += 1
            # If p[j] is not "*" and p[j] equals s[i] or is ".", advance both i and j
            elif p[j] == "." or p[j] == s[i]:
                i += 1
                j += 1
            # If neither case applies, check whether p[j+1] is "*"; if so, skip two pattern
            # characters (Matches zero of the preceding element.)
            elif j + 2 < l2 and p[j+1] == "*":
                j += 2
            # If none of the three cases applies, there is no match
            else:
                return False
        # Return True if s has been fully consumed
        return True if i == l1 else False
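This was my first hand-written attempt. I later adjusted the final return True / return False condition several times based on the failing test cases, but because the underlying idea was wrong, it never turned into a correct solution. The problems with this attempt are:
- * either matches zero of the preceding element or matches as many as possible of the preceding character.
- With the final check return True if (i==l1 and j==l2) else False, the problem is that when the last character of the pattern is *, e.g. s="aaa", p="a*", j stays stuck at the position of *. This particular case can be worked around with return True if (i==l1 and (j==l2 or p[l2-1]=="*")) else False.
Take Input3: s = "aaa", p = "ab*a*c*a" to analyze the first problem: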
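a == a, so check whether "aa" matches "b*a*c*a"
a != b and b is followed by *, so check whether "aa" matches "a*c*a"
The problem is that "aa" gets fully consumed by "a*", leaving "c*a" in the pattern
This is because in my solution "*" matches zero or as many as possible of the preceding element
But in fact "a*" can match just one "a", "c*" can match zero characters, and the final "a" can match the last "a" in the text
So the return value for Input3 should be True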
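The failure can be reproduced with a compact sketch of the greedy scan. It assumes the relaxed final check from the second item above; the function name greedy_match is hypothetical and the code is a condensed restatement of the first attempt, not a fix.

```python
def greedy_match(s, p):
    """Greedy scan: '*' takes zero or as many characters as it can."""
    i = j = 0
    while i < len(s) and j < len(p):
        if p[j] == "*":
            # Keep consuming text while the starred element still matches.
            if j >= 1 and p[j - 1] in (".", s[i]):
                i += 1
            else:
                j += 1
        elif p[j] in (".", s[i]):
            i += 1
            j += 1
        elif j + 2 < len(p) and p[j + 1] == "*":
            # Skip "x*" entirely (zero occurrences).
            j += 2
        else:
            return False
    # Relaxed end check: allow j to be stuck before the end if p ends with '*'.
    return i == len(s) and (j == len(p) or p[-1] == "*")

print(greedy_match("aaa", "ab*a*c*a"))  # False, although the correct answer is True
print(greedy_match("aaa", "a*a"))       # False, although the correct answer is True
```

In both cases the star swallows every remaining "a", so the scan ends with pattern characters still unmatched and never reconsiders a shorter repetition.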
Without a Kleene star, our solution would look like this:
def match(text, pattern):
    if not pattern:
        return not text
    first_match = bool(text) and pattern[0] in {text[0], '.'}
    return first_match and match(text[1:], pattern[1:])
If a star is present in the pattern, it will be in the second position pattern[1]. Then, we may ignore this part of the pattern, or delete a matching character in the text. If we have a match on the remaining strings after any of these operations, then the initial inputs matched.
class Solution(object):
    def isMatch(self, text, pattern):
        if not pattern:
            return not text
        first_match = bool(text) and pattern[0] in {text[0], '.'}
        if len(pattern) >= 2 and pattern[1] == '*':
            # First check whether self.isMatch(text, pattern[2:]) is True
            # If it is, return True
            # Otherwise check whether first_match and self.isMatch(text[1:], pattern) is True
            return (self.isMatch(text, pattern[2:]) or first_match and self.isMatch(text[1:], pattern))
        else:
            return first_match and self.isMatch(text[1:], pattern[1:])
This is the solution published on the official site. One thing worth pointing out:
For A or B
- if A is True, then result is True, B's bool value would be ignored
- if A is False, B's bool value would be checked
For A and B
- if A is False, then result is False, B's bool value would be ignored
- if A is True, B's bool value would be checked
So (self.isMatch(text, pattern[2:]) or first_match and self.isMatch(text[1:], pattern)) (A or B and C)
- If A is True, result is True, B and C's bool value would be ignored
- If A is False, (B and C)'s bool value would be checked
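The evaluation order described above can be observed directly. In the sketch below, probe is a hypothetical helper that records which operands were actually evaluated; and binds tighter than or, so A or B and C is read as A or (B and C).

```python
calls = []

def probe(name, value):
    # Record that this operand was evaluated, then return its value.
    calls.append(name)
    return value

# A is True: B and C are never evaluated.
r1 = probe("A", True) or probe("B", True) and probe("C", True)
first = list(calls)

calls.clear()
# A is False: B is evaluated; B is False, so C is skipped.
r2 = probe("A", False) or probe("B", False) and probe("C", True)
second = list(calls)

print(r1, first)
print(r2, second)
```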
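So the official approach is: if the second character is *, first check whether matching zero of the preceding element works. If it does not, check whether first_match is True; if so, consume one matching character from the text and check again. In other words, this approach tries in turn whether * should match 0, 1, ... of the preceding character, rather than matching either 0 or as many as possible of the preceding character as my solution did.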
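To make the "try 0, then 1, then 2, ..." order explicit, the recursion on the star can be unrolled into a loop over the repetition count k. This is an illustrative rewrite under my own naming (is_match_counts is hypothetical), intended to behave the same as the official recursive solution rather than replace it.

```python
def is_match_counts(text, pattern):
    if not pattern:
        return not text
    if len(pattern) >= 2 and pattern[1] == '*':
        k = 0  # number of text characters given to "x*" so far
        while True:
            # Try letting "x*" match exactly k characters.
            if is_match_counts(text[k:], pattern[2:]):
                return True
            # Extend to k + 1 only if the next character fits the starred element.
            if k < len(text) and pattern[0] in (text[k], '.'):
                k += 1
            else:
                return False
    first_match = bool(text) and pattern[0] in (text[0], '.')
    return first_match and is_match_counts(text[1:], pattern[1:])

print(is_match_counts("aaa", "ab*a*c*a"))
```

For Input3 the search succeeds once "a*" is tried with a single "a", which is exactly the option the greedy scan never considers.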
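When matching from front to back, every step has to check whether a * follows, and also has to guard against going out of bounds. Matching from back to front avoids this: whenever we encounter *, there must be a character before it.
If the last character of p is *, check len(s) > 0 and (p[-2] == '.' or p[-2] == s[-1])
If that holds, first try consuming s[-1], then check whether s[:-1] matches p; if it does, return true
Otherwise, regardless of whether p[-2] equals s[-1], do not consume s[-1] (skip the * and the character before it)
If the last character of p is not *, compare s[-1] and p[-1] directly; if they differ return false, otherwise continue matching toward the front
Now consider the exit conditions
If p has been fully consumed, return true if s has also been fully consumed, and false otherwise
If s has been fully consumed, p may still have characters left; as long as a * remains, keep going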
class Solution(object):
    def isMatch(self, s, p):
        if not p:
            return not s
        if p[-1] == '*':
            # Either consume s[-1] with the starred element and keep p, or drop the trailing "x*"
            return (len(s) > 0 and (p[-2] == '.' or p[-2] == s[-1]) and self.isMatch(s[:-1], p)) or self.isMatch(s, p[:-2])
        else:
            return len(s) > 0 and (p[-1] == '.' or p[-1] == s[-1]) and self.isMatch(s[:-1], p[:-1])
class Solution(object):
    def isMatch(self, text, pattern):
        memo = {}
        def dp(i, j):
            if (i, j) not in memo:
                if j == len(pattern):
                    ans = i == len(text)
                else:
                    first_match = i < len(text) and pattern[j] in {text[i], '.'}
                    if j+1 < len(pattern) and pattern[j+1] == '*':
                        ans = dp(i, j+2) or first_match and dp(i+1, j)
                    else:
                        ans = first_match and dp(i+1, j+1)
                memo[i, j] = ans
            return memo[i, j]
        return dp(0, 0)
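The dictionary memo acts as a memory function (memoization), so that the same entry is never computed more than once. Without it, the top-down variation keeps calling downward and may request the same entry many times.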
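The same caching can also be delegated to functools.lru_cache from the standard library instead of a hand-written dictionary. This is an alternative sketch, not the official solution; the function name is_match is my own.

```python
from functools import lru_cache

def is_match(text, pattern):
    @lru_cache(maxsize=None)  # caches results keyed by (i, j)
    def dp(i, j):
        if j == len(pattern):
            return i == len(text)
        first_match = i < len(text) and pattern[j] in (text[i], '.')
        if j + 1 < len(pattern) and pattern[j + 1] == '*':
            # Skip "x*", or consume one character and stay on "x*".
            return dp(i, j + 2) or (first_match and dp(i + 1, j))
        return first_match and dp(i + 1, j + 1)

    return dp(0, 0)

print(is_match("aaa", "ab*a*c*a"))
```

Defining dp inside is_match gives each call its own cache, so results for one text and pattern pair never leak into another.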
class Solution(object):
    def isMatch(self, s, p):
        """
        :type s: str
        :type p: str
        :rtype: bool
        """
        lp = len(p)
        ls = len(s)
        dp = [[0] * (lp+1) for i in range(ls+1)]
        dp[0][0] = 1
        for j in range(1, lp):
            if p[j] == '*' and dp[0][j-1] == 1:
                dp[0][j+1] = 1
        for i in range(ls):
            for j in range(lp):
                if p[j] == s[i] or p[j] == '.':
                    dp[i+1][j+1] = dp[i][j]
                if p[j] == '*':
                    if p[j-1] == s[i] or p[j-1] == '.':
                        # x* matches zero occurrences, exactly one s[i], or s[i] as one more repetition
                        dp[i+1][j+1] = dp[i+1][j-1] or dp[i+1][j] or dp[i][j+1]
                    else:
                        dp[i+1][j+1] = dp[i+1][j-1]
        return dp[ls][lp] == 1
The recurrence, where dp[i+1][j+1] means that s[:i+1] matches p[:j+1]:
If p[j] == s[i] or p[j] == '.': dp[i+1][j+1] = dp[i][j]
If p[j] == '*', there are two sub-conditions:
- if p[j-1] != s[i] and p[j-1] != '.': dp[i+1][j+1] = dp[i+1][j-1] // in this case, a* only counts as empty
- if p[j-1] == s[i] or p[j-1] == '.', any of the following may hold:
dp[i+1][j+1] = dp[i+1][j-1] // in this case, s[i]* counts as empty
dp[i+1][j+1] = dp[i+1][j] // in this case, s[i]* counts as single s[i]
dp[i+1][j+1] = dp[i][j+1] // in this case, s[i]* counts as multiple s[i]
so the three are combined: dp[i+1][j+1] = dp[i+1][j-1] or dp[i+1][j] or dp[i][j+1]
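To see the recurrence at work, the table can be built and printed row by row. The sketch below follows the same indexing; build_table is a hypothetical helper, and the sample input s = "aab", p = "c*a*b" is chosen only for illustration.

```python
def build_table(s, p):
    ls, lp = len(s), len(p)
    dp = [[False] * (lp + 1) for _ in range(ls + 1)]
    dp[0][0] = True
    # Empty text: only prefixes made of "x*" pairs can match it.
    for j in range(1, lp):
        if p[j] == '*' and dp[0][j - 1]:
            dp[0][j + 1] = True
    for i in range(ls):
        for j in range(lp):
            if p[j] == s[i] or p[j] == '.':
                dp[i + 1][j + 1] = dp[i][j]
            elif p[j] == '*':
                if p[j - 1] == s[i] or p[j - 1] == '.':
                    # empty, single, or one more repetition
                    dp[i + 1][j + 1] = dp[i + 1][j - 1] or dp[i + 1][j] or dp[i][j + 1]
                else:
                    dp[i + 1][j + 1] = dp[i + 1][j - 1]
    return dp

# One row per prefix of s, one column per prefix of p.
for row in build_table("aab", "c*a*b"):
    print("".join("T" if cell else "." for cell in row))
```

The bottom-right cell is the answer for the full strings.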
class Solution(object):
    def isMatch(self, s, p):
        """
        :type s: str
        :type p: str
        :rtype: bool
        """
        lp = len(p)
        ls = len(s)
        dp = [[False] * (lp+1) for i in range(ls+1)]
        dp[0][0] = True
        for i in range(ls+1):
            for j in range(1, lp+1):
                if p[j-1] == '*':
                    # s[i-1] repeat 0 times or (1 or more) times
                    dp[i][j] = dp[i][j - 2] or (i > 0 and (s[i - 1] == p[j - 2] or p[j - 2] == '.') and dp[i - 1][j])
                else:
                    dp[i][j] = i > 0 and dp[i - 1][j - 1] and (s[i - 1] == p[j - 1] or p[j - 1] == '.')
        return dp[ls][lp]
class Solution(object):
    def isMatch(self, text, pattern):
        dp = [[False] * (len(pattern) + 1) for _ in range(len(text) + 1)]
        dp[-1][-1] = True
        for i in range(len(text), -1, -1):
            for j in range(len(pattern) - 1, -1, -1):
                first_match = i < len(text) and pattern[j] in {text[i], '.'}
                if j+1 < len(pattern) and pattern[j+1] == '*':
                    dp[i][j] = dp[i][j+2] or first_match and dp[i+1][j]
                else:
                    dp[i][j] = first_match and dp[i+1][j+1]
        return dp[0][0]
See the official site: the recursive solution has a high time complexity, and the DP (dynamic programming) solution trades space for time.
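A rough way to see the trade-off is to count how much work each approach does on a pattern with several stars that cannot match. The sketch below is only an illustration: the two counting helpers and the sample input are my own choices, and the comparison counts recursive calls against distinct memoized states rather than measuring time.

```python
def count_plain_calls(text, pattern):
    calls = 0

    def match(t, p):
        nonlocal calls
        calls += 1
        if not p:
            return not t
        first = bool(t) and p[0] in (t[0], '.')
        if len(p) >= 2 and p[1] == '*':
            return match(t, p[2:]) or (first and match(t[1:], p))
        return first and match(t[1:], p[1:])

    match(text, pattern)
    return calls

def count_memo_states(text, pattern):
    memo = {}

    def dp(i, j):
        if (i, j) not in memo:
            if j == len(pattern):
                ans = i == len(text)
            else:
                first = i < len(text) and pattern[j] in (text[i], '.')
                if j + 1 < len(pattern) and pattern[j + 1] == '*':
                    ans = dp(i, j + 2) or (first and dp(i + 1, j))
                else:
                    ans = first and dp(i + 1, j + 1)
            memo[i, j] = ans
        return memo[i, j]

    dp(0, 0)
    return len(memo)  # each (i, j) state is computed once and stored

text, pattern = "a" * 12, "a*a*a*a*b"
print(count_plain_calls(text, pattern), count_memo_states(text, pattern))
```

The memoized version stores at most (len(text) + 1) * (len(pattern) + 1) entries, which is the extra space it pays to avoid repeating the same subproblems.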
Official solution and complexity analysis
LeetCode solutions (10): Regular Expression Matching, solving regex matching with DP