Elegant Python - Implementing glob-style patterns

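Mention wildcards and most people immediately think of * and ?. Wildcards make patterns far more expressive, and many Linux commands support them. This is what is known as a glob-style pattern.
Even the redis KEYS command supports glob patterns.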

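The glob I want to implement supports the following features:
  • An asterisk * matches zero or more arbitrary characters
  • ? matches exactly one arbitrary character
  • [characters] matches any one character inside the brackets. For example, [abc] matches a, or b, or c.
  • [!characters] matches any one character that is not inside the brackets
  • [character-character] matches any one character within the range between the two characters, such as [a-z] or [0-9]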
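As a quick reference for these rules, Python's standard library module fnmatch accepts the same five constructs, so it can serve as a baseline to compare against. The sketch below only illustrates the expected result for each feature. It is not the implementation developed in this post, and the sample strings and the names samples and results are made up for illustration.

```python
from fnmatch import fnmatchcase

# (string, pattern) pairs chosen for illustration, one per feature.
samples = [
    ("report.txt", "*.txt"),   # * matches zero or more characters
    ("a1", "a?"),              # ? matches exactly one character
    ("b", "[abc]"),            # any one character inside the brackets
    ("d", "[!abc]"),           # any one character not inside the brackets
    ("7", "[0-9]"),            # any one character within the range
    ("a", "a?"),               # ? cannot match the empty string
]

# fnmatchcase(name, pattern) is the case-sensitive variant of fnmatch.
results = [fnmatchcase(s, p) for s, p in samples]

for pair, ok in zip(samples, results):
    print(pair, ok)
```

Only the last pair fails to match, because ? needs a character to consume.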

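Implementing this is actually fairly simple: scan the string s and the pattern p from left to right, and if both reach the end, the match succeeds.
The main difficulty is matching *. Because * can match zero or more characters, it requires trial and backtracking. The approach here is to save the position of the *; if the rest of the pattern hits a dead end, rewind to the saved * and let it absorb one more character of s before retrying.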
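To isolate the backtracking idea, here is a minimal sketch that handles only literals, ? and *, leaving brackets out entirely. The function name match_star_only and the variable names are hypothetical and chosen for this illustration. The star is first tried as matching nothing, and each dead end makes it absorb one more character.

```python
def match_star_only(s, p):
    """Match s against p, where p may contain only literals, '?' and '*'."""
    s_i = p_i = 0
    star = None    # index in p of the most recent '*'
    resume = 0     # index in s where that '*' started matching
    while s_i < len(s):
        if p_i < len(p) and p[p_i] == '*':
            # Remember the star and try it as an empty match first.
            star, resume = p_i, s_i
            p_i += 1
        elif p_i < len(p) and p[p_i] in ('?', s[s_i]):
            # Ordinary single-character match.
            s_i += 1
            p_i += 1
        elif star is not None:
            # Dead end: let the last star absorb one more character.
            resume += 1
            s_i, p_i = resume, star + 1
        else:
            return False
    # s is exhausted; only trailing stars may remain in p.
    while p_i < len(p) and p[p_i] == '*':
        p_i += 1
    return p_i == len(p)


print(match_star_only("cxyzbazba", "c*ba"))
print(match_star_only("aab", "c*a*b"))
```

Only the most recent * is remembered, which is enough: if a later * cannot make the rest match, widening an earlier one would not help either.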

As for expanding the square brackets, keeping an include variable and an exclude variable makes things clear.
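The include/exclude idea can be sketched on a single group in isolation. The helpers expand_group and accepts below are hypothetical names for this illustration, and the sketch assumes, as shell globs do, that a leading ! negates the whole group.

```python
def expand_group(body):
    """Expand the text between '[' and ']' into (include, exclude) sets."""
    negate = body.startswith('!')
    if negate:
        body = body[1:]
    chars = set()
    i = 0
    while i < len(body):
        if i + 2 < len(body) and body[i + 1] == '-':
            # A range such as a-z: add every character between the two ends.
            chars.update(chr(x) for x in range(ord(body[i]), ord(body[i + 2]) + 1))
            i += 3
        else:
            chars.add(body[i])
            i += 1
    # A negated group fills exclude; a normal group fills include.
    return (set(), chars) if negate else (chars, set())


def accepts(ch, include, exclude):
    """Return True if the group described by the two sets accepts ch."""
    if exclude:
        return ch not in exclude
    return ch in include


print(expand_group("a-c"))
print(accepts("e", *expand_group("!e")))
```

Expanding a range into a set up front trades a little memory for a constant-time membership test during matching.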


Here is the code.

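#coding=utf-8
def build_expand(p):
    # Pre-expand every [...] group in pattern p, keyed by the index of its '['.
    ptr2include = {}  # '[' index -> characters a plain group accepts
    ptr2exclude = {}  # '[' index -> characters a [!...] group rejects
    ptr2next = {}     # '[' index -> index just past the closing ']'
    len_p = len(p)
    pPtr = 0
    while pPtr < len_p:
        if p[pPtr] == '[':
            start = pPtr
            pPtr += 1
            # A leading '!' negates the whole group.
            negate = pPtr < len_p and p[pPtr] == '!'
            if negate:
                pPtr += 1
            chars = set()
            while pPtr < len_p and p[pPtr] != ']':
                if pPtr + 2 < len_p and p[pPtr+1] == '-' and p[pPtr+2] != ']':
                    # Range such as a-z: add every character between the two ends.
                    chars.update(chr(x) for x in range(ord(p[pPtr]), ord(p[pPtr+2]) + 1))
                    pPtr += 3
                else:
                    chars.add(p[pPtr])
                    pPtr += 1
            if pPtr >= len_p:
                raise ValueError("unterminated '[' in pattern: %r" % p)
            if negate:
                ptr2exclude[start] = chars
            else:
                ptr2include[start] = chars
            ptr2next[start] = pPtr + 1
        else:
            pPtr += 1
    return ptr2include, ptr2exclude, ptr2next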

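def isMatch(s, p):
    # Return True if the whole string s matches the glob pattern p.
    len_s = len(s); len_p = len(p)
    sPtr = pPtr = ss = 0
    star = None  # index in p of the most recent '*', if any
    ptr2include, ptr2exclude, ptr2next = build_expand(p)
    while sPtr < len_s:
        c = p[pPtr] if pPtr < len_p else None
        # Check the special characters first, so that a '*' or '[' in s
        # is never mistaken for a literal match against the pattern.
        if c == '*':
            # Remember the star and where in s it was met; try it as empty first.
            star = pPtr; pPtr += 1; ss = sPtr
            continue
        if c == '[':
            if ((pPtr in ptr2include and s[sPtr] in ptr2include[pPtr]) or
                    (pPtr in ptr2exclude and s[sPtr] not in ptr2exclude[pPtr])):
                sPtr += 1
                pPtr = ptr2next[pPtr]
                continue
        elif c is not None and (c == '?' or c == s[sPtr]):
            sPtr += 1; pPtr += 1
            continue
        if star is not None:
            # Dead end: rewind to just after the last '*' and let it absorb one more character.
            pPtr = star + 1; ss += 1; sPtr = ss
            continue
        return False
    # s is exhausted; only trailing '*' may remain in p.
    while pPtr < len_p and p[pPtr] == '*':
        pPtr += 1
    return pPtr == len_p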

if __name__ == '__main__':
    params = [
            ("aa","a"),
            ("aa","aa"),
            ("aaa","aa"),
            ("aa", "*"),
            ("aa", "a*"),
            ("ab", "?*"),
            ("aab", "c*a*b"),
            ("cab", "c*a*b"),
            ("cxyzbazba", "c*ba"),
            ('abc','ab[a-c]'),
            ('abd','ab[a-c]'),
            ('abe','ab[cde]'),
            ('abe','ab[!e]'),
            ('abe','ab[!c]'),
        ]

    for p in params:
        print(p, isMatch(*p))

The output is:

('aa', 'a') False
('aa', 'aa') True
('aaa', 'aa') False
('aa', '*') True
('aa', 'a*') True
('ab', '?*') True
('aab', 'c*a*b') False
('cab', 'c*a*b') True
('cxyzbazba', 'c*ba') True
('abc', 'ab[a-c]') True
('abd', 'ab[a-c]') False
('abe', 'ab[cde]') True
('abe', 'ab[!e]') False
('abe', 'ab[!c]') True


