LeetCode10 Python Regular Expression Matching

LeetCode10 Python Regular Expression Matching: Summary of Solutions

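  • Problem
  • Solutions
    • Incorrect solution:
    • Correct solution 1: Recursion method
      • Version 1
      • Version 2: backtracking
    • Correct solution 2: Dynamic Programming
      • Top-Down Variation
      • Bottom-Up Variation
        • Version 1
        • Version 2
        • Version 3
  • Complexity analysis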

Problem

Regular Expression Matching

Given an input string (s) and a pattern (p), implement regular expression matching with support for '.' and '*'.

'.' Matches any single character.
'*' Matches zero or more of the preceding element.
The matching should cover the entire input string (not partial).

Note:

s could be empty and contains only lowercase letters a-z.
p could be empty and contains only lowercase letters a-z, and characters like . or *.

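In this problem the only special matching characters are . and *, and every * is preceded by a character for it to repeat (a letter a-z or .).
Besides the examples given in the problem statement, here are a few test cases my own attempts failed on while solving it: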

Input1: s = "ab", p = ".*c" (expected: false)
Input2: s = "aaa", p = "a*a" (expected: true)
Input3: s = "aaa", p = "ab*a*c*a" (expected: true)
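Because s contains only a-z and p only a-z, '.' and '*', Python's built-in re module reads such a pattern with the same meaning, so re.fullmatch can act as a reference oracle when checking expected answers. This is only a testing aid under the problem's constraints, not a solution to the exercise; the helper name reference_is_match is made up for this sketch.

```python
import re

def reference_is_match(s: str, p: str) -> bool:
    # re.fullmatch anchors the match at both ends, which matches the
    # "cover the entire input string" requirement. This relies on the
    # problem's constraints: p holds only a-z, '.', '*', and every '*'
    # follows a valid character, so re interprets p exactly as LeetCode does.
    return re.fullmatch(p, s) is not None

cases = [("ab", ".*c"), ("aaa", "a*a"), ("aaa", "ab*a*c*a")]
for s, p in cases:
    print(s, p, reference_is_match(s, p))
```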

Solutions

Incorrect solution:

class Solution:
    def isMatch(self, s: str, p: str) -> bool:
        l1,l2 = len(s),len(p)
        i = j = 0
        while (i < l1 and j < l2):
            if (p[j]=="*"):
                # If the char before *, p[j-1], equals s[i] or is ".", keep j and advance i,
                # i.e. * greedily matches as many of the preceding character as possible
                if (j>=1 and (p[j-1]=="." or p[j-1]==s[i])): 
                    i += 1
                # Otherwise, advance j
                else:
                    j += 1
            # If p[j] is not "*" and p[j] == s[i] or p[j] is ".", advance both i and j
            elif (p[j]=="." or p[j]==s[i]):
                i += 1
                j += 1
            # If neither case applies, check whether p[j+1] is "*"; if so, skip two pattern
            # characters (matches zero of the preceding element)
            elif (j+2<l2 and p[j+1]=="*"):
                j += 2 
            # If none of the above applies, there is no match
            else:
                return False
        # Return True if s has been fully consumed
        return True if i==l1 else False

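This was my first self-written attempt. Based on the failing test cases I later revised the if statement that decides between return True and return False several times, but because the idea itself was flawed, it could never become a correct solution. Its problems are:

  1. When it meets a *, it only considers two options: match zero of the preceding element, or greedily match as many of the preceding character as possible. Any count in between is never tried.
  2. The final return is also wrong: sometimes the text is fully consumed while characters remain in the pattern. Changing it to return True if (i==l1 and j==l2) else False introduces another problem: if the last pattern character is *, e.g. s="aaa", p="a*", j stays stuck at the position of the *. Changing it to return True if (i==l1 and (j==l2 or p[l2-1]=="*")) else False handles that case, but it is only a patch: it wrongly accepts s="a", p="abc*", where the leftover "bc*" cannot match an empty string.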
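To make these failures concrete, here is the incorrect greedy solution with the patched return line from point 2, run against re.fullmatch as a reference oracle (re is used here only for checking). The function name greedy_is_match is hypothetical. The greedy version still rejects Input2 and Input3, and the patched return wrongly accepts s = "a", p = "abc*".

```python
import re

def greedy_is_match(s: str, p: str) -> bool:
    # The incorrect greedy matcher from above, with the patched return line.
    l1, l2 = len(s), len(p)
    i = j = 0
    while i < l1 and j < l2:
        if p[j] == "*":
            # Greedy: keep consuming s while it matches the char before '*'
            if j >= 1 and (p[j - 1] == "." or p[j - 1] == s[i]):
                i += 1
            else:
                j += 1
        elif p[j] == "." or p[j] == s[i]:
            i += 1
            j += 1
        elif j + 2 < l2 and p[j + 1] == "*":
            # Skip "x*" when x does not match s[i]
            j += 2
        else:
            return False
    # Patched return: still wrong when the leftover pattern is not all "x*" pairs
    return i == l1 and (j == l2 or p[l2 - 1] == "*")

for s, p in [("aaa", "a*a"), ("aaa", "ab*a*c*a"), ("a", "abc*")]:
    print(s, p, "greedy:", greedy_is_match(s, p),
          "expected:", re.fullmatch(p, s) is not None)
```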

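Let's use Input3: s = "aaa", p = "ab*a*c*a" to analyze the first problem:

a == a, so check whether "aa" matches "b*a*c*a"
a != b and b is followed by *, so check whether "aa" matches "a*c*a"
Here is the problem: "a*" consumes all of "aa", leaving "c*a" in the pattern with nothing left in the text
This happens because in my solution "*" matches either zero or as many as possible of the preceding element
But if "a*" matches only one "a", "c*" matches zero characters, and the final "a" matches the last "a" of the text,
so the expected return value for Input3 is True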

Correct solution 1: Recursion method

Version 1

Without a Kleene star, our solution would look like this:

def match(text, pattern):
    if not pattern: return not text
    first_match = bool(text) and pattern[0] in {text[0], '.'}
    return first_match and match(text[1:], pattern[1:])

If a star is present in the pattern, it will be in the second position pattern[1]. Then, we may ignore this part of the pattern, or delete a matching character in the text. If we have a match on the remaining strings after any of these operations, then the initial inputs matched.

class Solution(object):
    def isMatch(self, text, pattern):
        if not pattern:
            return not text

        first_match = bool(text) and pattern[0] in {text[0], '.'}

        if len(pattern) >= 2 and pattern[1] == '*':
            # First evaluate self.isMatch(text, pattern[2:]) (x* matches zero characters)
            # If it is True, return True
            # Otherwise evaluate first_match and self.isMatch(text[1:], pattern) (x* absorbs text[0])
            return (self.isMatch(text, pattern[2:]) or first_match and self.isMatch(text[1:], pattern))
        else:
            return first_match and self.isMatch(text[1:], pattern[1:])

This is the solution posted on the official LeetCode site. One point worth noting is Python's short-circuit evaluation:

For A or B
	- if A is True, the result is True and B is never evaluated
	- if A is False, B is evaluated and becomes the result
For A and B
	- if A is False, the result is False and B is never evaluated
	- if A is True, B is evaluated and becomes the result

So self.isMatch(text, pattern[2:]) or first_match and self.isMatch(text[1:], pattern) has the form A or B and C. Since and binds more tightly than or, this is A or (B and C):
	- If A is True, the result is True, and neither B nor C is evaluated
	- If A is False, (B and C) is evaluated; if B (first_match) is False, C (the recursive call) is skipped as well
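A small sketch that makes the evaluation order visible: each operand is wrapped in a hypothetical helper check that logs its name before returning its value, so the log shows exactly which operands of A or B and C were evaluated.

```python
calls = []

def check(name: str, value: bool) -> bool:
    # Record that this operand was evaluated, then return its value
    calls.append(name)
    return value

# A is True: B and C are never evaluated
result = check("A", True) or check("B", True) and check("C", True)
print(result, calls)   # True ['A']

calls.clear()
# A is False, B is False: C is skipped because (B and C) is already False
result = check("A", False) or check("B", False) and check("C", True)
print(result, calls)   # False ['A', 'B']
```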

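So the official idea is: if the second character is *, first check whether matching zero of the preceding element works; if not, check whether first_match is True, and if so, consume one matching character from text and check again with the same pattern. In effect it tries * matching 0, 1, 2, and so on of the preceding character in turn, instead of only 0 or as many as possible as in my solution.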
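To watch the 0, 1, 2, ... order on Input3, here is a sketch of the same recursion with one addition: every call appends its (text, pattern) subproblem to a list (the function name is_match_traced is hypothetical). In the trace, "a*" first tries matching zero a's, reaching ("aa", "c*a"), which fails; it then absorbs one "a" and reaches ("a", "c*a"), which succeeds.

```python
def is_match_traced(text: str, pattern: str, calls: list) -> bool:
    # Same logic as the official recursive solution, but each call records
    # the (text, pattern) subproblem it is asked to solve.
    calls.append((text, pattern))
    if not pattern:
        return not text
    first_match = bool(text) and pattern[0] in {text[0], '.'}
    if len(pattern) >= 2 and pattern[1] == '*':
        # Try "x*" matching zero characters first, then absorb one more, and so on
        return (is_match_traced(text, pattern[2:], calls)
                or first_match and is_match_traced(text[1:], pattern, calls))
    return first_match and is_match_traced(text[1:], pattern[1:], calls)

calls = []
print(is_match_traced("aaa", "ab*a*c*a", calls))
for sub in calls:
    print(sub)
```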

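Version 2: backtracking

Matching from front to back means checking every time whether the next pattern character is *, and guarding against out-of-bounds indices. Matching from back to front avoids this: whenever we meet a *, there is guaranteed to be a character in front of it.

If the last character of p is *, check len(s) and (p[-2] == '.' or p[-2] == s[-1]).
If this holds, first try consuming the character s[-1], then check whether s[:-1] matches p; if it does, return True.
Otherwise, whether or not p[-2] matches s[-1], leave s[-1] unconsumed and drop the * together with the character before it (check s against p[:-2]).

If the last character of p is not *, check directly whether s[-1] matches p[-1] (equal, or p[-1] is '.'). If not, return False; if so, keep matching toward the front.

Next, the exit conditions:
If p is exhausted: return True if s is also exhausted, otherwise False.
If s is exhausted, p may still have characters left; as long as they can still be removed as x* pairs, the recursion keeps going.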

class Solution(object):
    def isMatch(self, s, p):
        if not p: return not s
        if p[-1] == '*':
            # x* absorbs s[-1] (and may absorb more), or x* matches nothing
            return (bool(s) and (p[-2] == '.' or p[-2] == s[-1]) and self.isMatch(s[:-1], p)) or self.isMatch(s, p[:-2])
        else:
            # The last characters must match one-to-one
            return bool(s) and (p[-1] == '.' or p[-1] == s[-1]) and self.isMatch(s[:-1], p[:-1])
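Each s[:-1] / p[:-2] slice above copies a string. A sketch of the same backward recursion on prefix lengths instead of slices (an index-based variant not in the original post; the name is_match_backward is hypothetical). It is still plain recursion, so without memoization the worst case stays exponential.

```python
def is_match_backward(s: str, p: str) -> bool:
    # match(i, j) answers: does the prefix s[:i] match the prefix p[:j]?
    # This mirrors the slice-based version, working from the end of each string.
    def match(i: int, j: int) -> bool:
        if j == 0:
            return i == 0
        if p[j - 1] == '*':
            # Either "x*" absorbs s[i-1] (the star stays for more), or it matches nothing
            absorb = i > 0 and p[j - 2] in (s[i - 1], '.') and match(i - 1, j)
            return absorb or match(i, j - 2)
        # Plain character or '.': the last characters must match one-to-one
        return i > 0 and p[j - 1] in (s[i - 1], '.') and match(i - 1, j - 1)

    return match(len(s), len(p))

print(is_match_backward("aaa", "ab*a*c*a"))
```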

Correct solution 2: Dynamic Programming

Top-Down Variation

class Solution(object):
    def isMatch(self, text, pattern):
        memo = {}
        def dp(i, j):
            if (i, j) not in memo:
                if j == len(pattern):
                    ans = i == len(text)
                else:
                    first_match = i < len(text) and pattern[j] in {text[i], '.'}
                    if j+1 < len(pattern) and pattern[j+1] == '*':
                        ans = dp(i, j+2) or first_match and dp(i+1, j)
                    else:
                        ans = first_match and dp(i+1, j+1)

                memo[i, j] = ans
            return memo[i, j]

        return dp(0, 0)

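The dictionary memo serves as a memory function (memoization), so no entry (i, j) is computed more than once. Without it, the top-down variation keeps calling downward from the top, and the same entry could be called many times.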
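The same memoization can also be delegated to the standard library: functools.lru_cache(maxsize=None) caches every (i, j) result, taking the place of the hand-written memo dictionary. A sketch, with the hypothetical function name is_match_cached:

```python
from functools import lru_cache

def is_match_cached(text: str, pattern: str) -> bool:
    # lru_cache(maxsize=None) stores every (i, j) result, playing the role of
    # the memo dictionary in the Top-Down Variation above.
    @lru_cache(maxsize=None)
    def dp(i: int, j: int) -> bool:
        if j == len(pattern):
            return i == len(text)
        first_match = i < len(text) and pattern[j] in {text[i], '.'}
        if j + 1 < len(pattern) and pattern[j + 1] == '*':
            # x* matches zero characters, or absorbs text[i] and stays
            return dp(i, j + 2) or (first_match and dp(i + 1, j))
        return first_match and dp(i + 1, j + 1)

    return dp(0, 0)

print(is_match_cached("aab", "c*a*b"))
```

Defining dp inside the function keeps a fresh cache per call, so results for one (text, pattern) pair never leak into another.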

Bottom-Up Variation

Version 1

class Solution(object):
    def isMatch(self, s, p):
        """
        :type s: str
        :type p: str
        :rtype: bool
        """
         
        lp = len(p)
        ls = len(s)
        dp = [[0] * (lp+1) for i in range(ls+1)]
        dp[0][0] = 1 
        for j in range(1,lp):
            if p[j] == '*' and dp[0][j-1]==1:
                dp[0][j+1] = 1
        for i in range(ls):
            for j in range(lp):
                if p[j] == s[i] or p[j] == '.':
                    dp[i+1][j+1] = dp[i][j] 
                if p[j] == '*':
                    if p[j-1] == s[i] or p[j-1] == '.':
                        # x = p[j-1] matches s[i]: x* counts as zero x, one x, or multiple x
                        dp[i+1][j+1] = dp[i+1][j-1] or dp[i+1][j] or dp[i][j+1]
                    else:
                        dp[i+1][j+1] = dp[i+1][j-1]
        return dp[ls][lp]==1
If p[j] == s[i] or p[j] == '.': dp[i+1][j+1] = dp[i][j]
If p[j] == '*', there are two sub-cases:
	if p[j-1] != s[i] and p[j-1] != '.': dp[i+1][j+1] = dp[i+1][j-1]  // in this case, x* (x = p[j-1]) only counts as empty
	if p[j-1] == s[i] or p[j-1] == '.':
		dp[i+1][j+1] = dp[i+1][j-1]  // in this case, x* counts as empty
		dp[i+1][j+1] = dp[i+1][j]    // in this case, x* counts as a single x
		dp[i+1][j+1] = dp[i][j+1]    // in this case, x* counts as multiple x
		so dp[i+1][j+1] = dp[i+1][j-1] or dp[i+1][j] or dp[i][j+1]
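To see the recurrence above fill a table, here is a sketch that builds the full dp table of Version 1 (with booleans instead of 0/1) and prints it for s = "aab", p = "c*a*b". The helper name dp_table and the printout format are hypothetical. Each row is a prefix of s; T marks dp[i][j] = True.

```python
def dp_table(s: str, p: str) -> list:
    # dp[i][j] is True when s[:i] matches p[:j]; indices follow Version 1.
    ls, lp = len(s), len(p)
    dp = [[False] * (lp + 1) for _ in range(ls + 1)]
    dp[0][0] = True
    for j in range(1, lp):
        # "x*" can match the empty string if p[:j-1] already does
        if p[j] == '*' and dp[0][j - 1]:
            dp[0][j + 1] = True
    for i in range(ls):
        for j in range(lp):
            if p[j] == s[i] or p[j] == '.':
                dp[i + 1][j + 1] = dp[i][j]
            elif p[j] == '*':
                if p[j - 1] == s[i] or p[j - 1] == '.':
                    # zero x, one x, or multiple x
                    dp[i + 1][j + 1] = dp[i + 1][j - 1] or dp[i + 1][j] or dp[i][j + 1]
                else:
                    # x* can only match the empty string here
                    dp[i + 1][j + 1] = dp[i + 1][j - 1]
    return dp

s, p = "aab", "c*a*b"
table = dp_table(s, p)
print("      " + " ".join(["_"] + list(p)))
for i, row in enumerate(table):
    # Row label is the prefix s[:i]; T = match, . = no match
    print(repr(s[:i]).ljust(6), " ".join("T" if cell else "." for cell in row))
```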

Version 2

class Solution(object):
    def isMatch(self, s, p):
        """
        :type s: str
        :type p: str
        :rtype: bool
        """
         
        lp = len(p)
        ls = len(s)
        dp = [[False] * (lp+1) for i in range(ls+1)]
        dp[0][0] = True
        for i in range(ls+1):
            for j in range(1,lp+1):
                if p[j-1] == '*':
                    # p[j-2]* repeats 0 times, or s[i-1] matches p[j-2] and the star absorbs it (1 or more times)
                    dp[i][j] = dp[i][j - 2] or (i > 0 and (s[i - 1] == p[j - 2] or p[j - 2] == '.') and dp[i - 1][j])
                else:
                    dp[i][j] = i > 0 and dp[i - 1][j - 1] and (s[i - 1] == p[j - 1] or p[j - 1] == '.')
        return dp[ls][lp]

Version 3

class Solution(object):
    def isMatch(self, text, pattern):
        dp = [[False] * (len(pattern) + 1) for _ in range(len(text) + 1)]

        dp[-1][-1] = True
        for i in range(len(text), -1, -1):
            for j in range(len(pattern) - 1, -1, -1):
                first_match = i < len(text) and pattern[j] in {text[i], '.'}
                if j+1 < len(pattern) and pattern[j+1] == '*':
                    dp[i][j] = dp[i][j+2] or first_match and dp[i+1][j]
                else:
                    dp[i][j] = first_match and dp[i+1][j+1]

        return dp[0][0]
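In Version 2, row i of dp only reads row i-1 and row i itself, so the table can be cut down to two rows. A sketch of that space optimization, which is a variant not in the original post (the name is_match_two_rows is hypothetical); time stays O(len(s)·len(p)) while extra space drops to O(len(p)).

```python
def is_match_two_rows(s: str, p: str) -> bool:
    # Same recurrence as Version 2, keeping only the previous row
    # (prev = dp[i-1]) and the current row (cur = dp[i]).
    lp = len(p)
    prev = [False] * (lp + 1)
    for i in range(len(s) + 1):
        cur = [False] * (lp + 1)
        cur[0] = (i == 0)  # the empty pattern matches only the empty string
        for j in range(1, lp + 1):
            if p[j - 1] == '*':
                # p[j-2]* repeats 0 times, or absorbs s[i-1] and may repeat again
                cur[j] = cur[j - 2] or (
                    i > 0 and p[j - 2] in (s[i - 1], '.') and prev[j])
            else:
                cur[j] = i > 0 and prev[j - 1] and p[j - 1] in (s[i - 1], '.')
        prev = cur
    return prev[lp]

print(is_match_two_rows("aab", "c*a*b"))
```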

Complexity analysis

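See the official solution page for details. The plain recursive solutions have high time complexity (exponential in the worst case, because the same subproblems are solved over and over), while the DP solutions trade space for time: each of the O(len(s)·len(p)) subproblems is solved once, giving O(len(s)·len(p)) time and space.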
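A rough illustration of that gap (not a benchmark): count how many calls the plain recursion makes versus how many distinct (i, j) entries the top-down DP computes, on an input that can never match, s = "a"*12 and p = "a*"*6 + "b". The function names and the chosen input are hypothetical; the recursion must try every way of splitting the a's among the six "a*" groups, while the memo holds at most 13 × 14 entries.

```python
def count_naive_calls(text: str, pattern: str) -> int:
    # Plain recursion (Version 1 of solution 1); counts every call made.
    calls = 0

    def match(t: str, p: str) -> bool:
        nonlocal calls
        calls += 1
        if not p:
            return not t
        first = bool(t) and p[0] in {t[0], '.'}
        if len(p) >= 2 and p[1] == '*':
            return match(t, p[2:]) or (first and match(t[1:], p))
        return first and match(t[1:], p[1:])

    match(text, pattern)
    return calls

def count_memo_entries(text: str, pattern: str) -> int:
    # Top-down DP; returns how many distinct (i, j) entries were computed.
    memo = {}

    def dp(i: int, j: int) -> bool:
        if (i, j) not in memo:
            if j == len(pattern):
                ans = i == len(text)
            else:
                first = i < len(text) and pattern[j] in {text[i], '.'}
                if j + 1 < len(pattern) and pattern[j + 1] == '*':
                    ans = dp(i, j + 2) or (first and dp(i + 1, j))
                else:
                    ans = first and dp(i + 1, j + 1)
            memo[i, j] = ans
        return memo[i, j]

    dp(0, 0)
    return len(memo)

text, pattern = "a" * 12, "a*" * 6 + "b"
print(count_naive_calls(text, pattern), count_memo_entries(text, pattern))
```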

Official solution and complexity analysis
LeetCode solutions (10): Regular Expression Matching, solving regex matching with DP
