The basic re functions are:
re.search()
re.findall()
re.sub()
re.compile()
re.split()
Usage details follow:
1. Simple matching in plain Python
>>> a = 'www'
... s = 'www.baidu.com'
... print(a in s)
True
2. Finding a match with a regular expression
>>> import re
>>> a = 'www'
>>> s = 'www.baidu.com'
>>> print(re.search(a, s))
<_sre.SRE_Match object at 0x106bf69f0>
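The match object printed above carries both the matched text and its position. A minimal sketch of how one might inspect it (the variable names are illustrative and not from the original):

```python
import re

# Search for the literal text "www" anywhere in the string.
match = re.search(r"www", "www.baidu.com")

# re.search returns a match object on success and None on failure,
# so check the result before using it.
if match is not None:
    matched_text = match.group()   # the text that matched
    matched_span = match.span()    # (start, end) indexes of the match
    print(matched_text, matched_span)
```

Checking for None first avoids an AttributeError when the pattern is not found.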
3. Matching several alternatives with []
>>> a = r"r[au]n" # the r prefix makes this a raw string, so backslashes reach the regex engine unchanged; [au] matches both run and ran
... print(re.search(a, 'dog runs to cat'))
<_sre.SRE_Match object at 0x106bf6d30>
4. Matching more alternatives with ranges such as [0-9a-z]
>>> a = r"r[au]n"
... print(re.search(a, 'dog runs to cat'))
... print(re.search(r"r[A-Z]n",'dog runs to cat'))
... print(re.search(r"r[a-z]n",'dog runs to cat'))
... print(re.search(r"r[0-9a-z]n",'dog runs to cat')) # note how 0-9 and a-z are combined inside one class
<_sre.SRE_Match object at 0x1065972a0>
None
<_sre.SRE_Match object at 0x1065972a0>
<_sre.SRE_Match object at 0x1065972a0>
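To see which text each character class actually picks up, the results can be collected as strings instead of match objects. This is a sketch only; first_match is a hypothetical helper introduced for illustration:

```python
import re

def first_match(pattern, text):
    """Hypothetical helper: return the first matched text, or None."""
    m = re.search(pattern, text)
    return m.group() if m else None

text = "dog runs to cat"
lower = first_match(r"r[a-z]n", text)        # lowercase range matches "run"
upper = first_match(r"r[A-Z]n", text)        # no uppercase letter between r and n
mixed = first_match(r"r[0-9a-z]n", "r5n")    # digits and lowercase in one class
print(lower, upper, mixed)
```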
5. Digits
\d: any digit; \D: any character that is not a digit
>>> print(re.search(r"r\dn", "run r5n").span())
... print(re.search(r"r\Dn", "run r5n").span())
(4, 7)
(0, 3)
6. Whitespace
\s: any whitespace character; \S: any non-whitespace character
Whitespace characters include the space, \n, \t, \r, \f and \v
>>> print(re.search(r"r\sn", "run r\tn").span())
... print(re.search(r"r\Sn", "r5n"))
(4, 7)
<_sre.SRE_Match object at 0x105781c60>
7. Letters, digits and "_"
\w: [a-zA-Z0-9_] for ASCII text; \W: anything that \w does not match
>>> print(re.search(r"r\wn", "r3n r\tn").span())
... print(re.search(r"r\Wn", "r5n"))
(0, 3)
None
>>> print(re.search(r"r\wn", "rUn r\tn").span())
... print(re.search(r"r\Wn", "r-n"))
(0, 3)
<_sre.SRE_Match object at 0x1066fe850>
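The three shorthand classes from sections 5 to 7 can be compared on one string. The sample text below is made up for illustration, and the sketch uses re.findall and the + quantifier, both of which are covered later in this article:

```python
import re

# Illustrative sample containing letters, digits, "_", spaces and a tab.
sample = "user_42 logged in\tat 09:30"

digit_runs = re.findall(r"\d+", sample)   # runs of digits
word_runs = re.findall(r"\w+", sample)    # runs of letters, digits and "_"
blanks = re.findall(r"\s", sample)        # each single whitespace character
print(digit_runs)
print(word_runs)
print(blanks)
```

Note that "user_42" stays in one piece because "_" counts as a \w character.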
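8. Word boundaries
\b: matches the empty position at a word boundary, that is, between a \w character and a \W character (or at the start or end of the string); it consumes no characters
\B: matches any position that is not a word boundary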
>>> print(re.search(r"r \bn", "r n r\tn").span())
... print(re.search(r"rn \B l", "rn  l ").span()) # \B succeeds only between the two spaces, where there is no word boundary
(0, 3)
(0, 5)
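A common use of \b is matching a whole word only. The following sketch uses a made-up sentence and is not taken from the original examples:

```python
import re

text = "cat concat cats"

# \bcat\b requires a boundary on both sides, so only the standalone word matches.
whole_words = re.findall(r"\bcat\b", text)

# \Bcat requires that "cat" does NOT start at a word boundary,
# so only the "cat" inside "concat" qualifies.
inner_spans = [m.span() for m in re.finditer(r"\Bcat", text)]
print(whole_words, inner_spans)
```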
9. Special characters and any character
Use a double backslash \\ to match a literal \
Use . to match any character except \n
>>> print(re.search(r"r\\m", r"ram r\m").span())
(4, 7)
>>> print(re.search(r"r.m", "r\tm").span())
(0, 3)
>>> print(re.search(r"r.m", "r\nm").span())
Traceback (most recent call last):
File "", line 1, in
AttributeError: 'NoneType' object has no attribute 'span'
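The error above occurs because . does not match \n by default, so re.search returns None. If matching across a newline is actually wanted, the standard re.DOTALL flag (not covered in the text above) changes that behaviour. A minimal sketch:

```python
import re

text = "r\nm"

# Default behaviour: "." skips newlines, so there is no match.
default_result = re.search(r"r.m", text)

# With re.DOTALL, "." also matches the newline character.
dotall_result = re.search(r"r.m", text, flags=re.DOTALL)
dotall_span = dotall_result.span() if dotall_result else None
print(default_result, dotall_span)
```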
10. Start and end of a line
Start of line ^: r"^this"
End of line $: r"this$"
>>> print(re.search(r"^this", "this is an apple").span())
(0, 4)
>>> print(re.search(r"this$", "this is an apple this").span())
(17, 21)
11. Optional matches
Use the question mark ? to make the preceding item optional
>>> print(re.search(r"th(is)?", "th is an apple this").span())
(0, 2)
>>> print(re.search(r"th(is)?", "this is an apple this").span())
(0, 4)
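When an optional group does not take part in the match, its group value is None. A short sketch reusing the pattern above (the variable names are illustrative):

```python
import re

without_is = re.search(r"th(is)?", "th is an apple")
with_is = re.search(r"th(is)?", "this is an apple")

# group() is the whole match; group(1) is the optional (is) part.
short_whole, short_group = without_is.group(), without_is.group(1)
long_whole, long_group = with_is.group(), with_is.group(1)
print(short_whole, short_group)
print(long_whole, long_group)
```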
12. Multiline matching
To match at the start of every line, use ^ together with flags=re.M, where M stands for MULTILINE
>>> s = """
... dog runs to cat
... you run to dog
... """
... print(re.search(r"^you", s))
... print(re.search(r"^you", s, flags=re.M).span())
None
(17, 20)
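The effect of re.M is easiest to see when collecting the first word of every line. The text below is a made-up example, and re.findall is covered in section 17:

```python
import re

lines = "dog runs to cat\nyou run to dog\ndog sleeps"

# Without re.M, ^ only anchors at the very start of the string.
first_word_only = re.findall(r"^\w+", lines)

# With re.M, ^ anchors at the start of every line.
first_word_per_line = re.findall(r"^\w+", lines, flags=re.M)
print(first_word_only, first_word_per_line)
```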
13. Zero or more times
The * character means the preceding character may be matched zero or more times
>>> print(re.search(r"this*", "thissssss an apple this").span())
(0, 9)
>>> print(re.search(r"this*", "thi an apple this").span())
(0, 3)
14. One or more times
The + character means the preceding character must be matched at least once
>>> print(re.search(r"this+", "thi an apple"))
None
>>> print(re.search(r"this+", "thisss an apple").span())
(0, 6)
15. Bounded repetition
Use {n,m}, written without spaces, to match between n and m times
>>> print(re.search(r"this{4,10}", "thisss an apple"))
None
>>> print(re.search(r"this{2,10}", "thisss an apple").span())
(0, 6)
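The quantifiers from sections 13 to 15 can be put side by side. This is a sketch under the assumption that only the matched text matters; matched_or_none is a hypothetical helper, not part of re:

```python
import re

def matched_or_none(pattern, text):
    """Hypothetical helper: the matched text, or None if nothing matches."""
    m = re.search(pattern, text)
    return m.group() if m else None

star = matched_or_none(r"this*", "thi an apple")          # s may be absent
plus = matched_or_none(r"this+", "thi an apple")          # s is required at least once
exact = matched_or_none(r"this{2}", "thisss an apple")    # exactly two s
ranged = matched_or_none(r"this{2,3}", "thisss an apple") # two to three s, as many as possible
print(star, plus, exact, ranged)
```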
16. group
Usage 1: by index
>>> s = re.search(r"(\d), Date:(.+)", "ID:12345, Date:Feb/9/2018")
>>> print(s.group())
5, Date:Feb/9/2018
>>> print(s.group(1))
5
>>> print(s.group(2))
Feb/9/2018
Usage 2: by group name
>>> s = re.search(r"(?P<idddd>\d+), Date:(?P<datee>.+)", "ID:12345, Date:Feb/9/2018")
>>> print(s.group('idddd'))
12345
>>> print(s.group('datee'))
Feb/9/2018
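Named groups can also be read all at once with the match object's groupdict() method. In this sketch the group names record_id and date are chosen for illustration and differ from the names used above:

```python
import re

# Each (?P<name>...) group captures text under the given name.
match = re.search(
    r"ID:(?P<record_id>\d+), Date:(?P<date>.+)",
    "ID:12345, Date:Feb/9/2018",
)

# groupdict() returns every named group as a dictionary.
fields = match.groupdict()
print(fields)
```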
17. Finding all matches
findall can be combined with [] and |
>>> print(re.findall(r"th[ia]s", "thasss an apple this"))
['thas', 'this']
>>> print(re.findall(r"th[ia]s|an", "thasss an apple this"))
['thas', 'an', 'this']
18. Substitution
re.sub()
>>> print(re.sub(r"dog", "cat", "dog run to cat"))
cat run to cat
>>> print(re.sub(r"dog", "cat", "dogs run to cat"))
cats run to cat
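As the second example shows, a plain pattern also replaces "dog" inside "dogs". Two possible refinements, sketched with made-up sentences: the count argument limits the number of replacements, and \b restricts the match to whole words.

```python
import re

# count=1 replaces only the first occurrence.
first_only = re.sub(r"dog", "cat", "dog dog dog", count=1)

# \b on both sides leaves "dogs" alone and replaces only the word "dog".
whole_word = re.sub(r"\bdog\b", "cat", "dogs chase a dog")
print(first_only)
print(whole_word)
```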
19. Splitting
re.split(): note how the pattern is written; the delimiter characters go inside []
>>> print(re.split(r"[;,\.]", "cat, dogs ;run. to cat"))
['cat', ' dogs ', 'run', ' to cat']
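The pieces above keep the spaces that surrounded each delimiter. One way to drop them, as a sketch, is to let the pattern absorb optional whitespace on both sides of the delimiter:

```python
import re

# \s* on each side consumes any whitespace next to the delimiter,
# so the resulting pieces come out already trimmed.
pieces = re.split(r"\s*[;,.]\s*", "cat, dogs ;run. to cat")
print(pieces)
```

Inside [] the dot is already literal, so it does not need a backslash.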
20. compile (precompile a pattern for reuse)
>>> compiled_re = re.compile(r"r[ua]n")
... print(compiled_re.search("dog ran to cat"))
<_sre.SRE_Match object at 0x1066c4168>
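A compiled pattern is mainly useful when the same expression is applied many times. A minimal sketch with made-up sentences, showing that the compiled object offers the same search, findall and sub methods:

```python
import re

# Compile once, then reuse the pattern object.
run_pattern = re.compile(r"r[ua]n")

sentences = ["dog ran to cat", "dog runs to cat", "dog sat"]
hits = [s for s in sentences if run_pattern.search(s)]

all_forms = run_pattern.findall("ran run rin")
replaced = run_pattern.sub("walk", "dog ran")
print(hits, all_forms, replaced)
```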
Cheat sheet: