Python Regular Expressions

The basic re functions are:

re.search()
re.findall()
re.sub()
re.compile()
re.split()

Usage details follow:

1. Simple matching in plain Python

>>> a = 'www'
>>> s = 'www.baidu.com'
>>> print(a in s)
True

2. Finding a match with a regular expression

>>> import re
>>> a = 'www'
>>> s = 'www.baidu.com'
>>> print(re.search(a, s))
<_sre.SRE_Match object at 0x106bf69f0>
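re.search returns a match object on success and None on failure, so it is usually worth checking the result before calling methods on it. A minimal sketch of that check (the helper name find_span is hypothetical, introduced only for this illustration):

```python
import re

def find_span(pattern, text):
    """Return the (start, end) span of the first match, or None if nothing matches."""
    match = re.search(pattern, text)
    if match is None:
        # No match: re.search gives back None rather than raising.
        return None
    return match.span()

print(find_span('www', 'www.baidu.com'))  # (0, 3)
print(find_span('xyz', 'www.baidu.com'))  # None
```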

3. Matching several alternatives with []

>>> a = r"r[au]n"  # the r prefix makes this a raw string, so backslashes reach re unchanged; [au] matches both run and ran
>>> print(re.search(a, 'dog runs to cat'))
<_sre.SRE_Match object at 0x106bf6d30>

4. Matching more alternatives with ranges such as [0-9a-z]

>>> a = r"r[au]n"
>>> print(re.search(a, 'dog runs to cat'))
<_sre.SRE_Match object at 0x1065972a0>
>>> print(re.search(r"r[A-Z]n", 'dog runs to cat'))
None
>>> print(re.search(r"r[a-z]n", 'dog runs to cat'))
<_sre.SRE_Match object at 0x1065972a0>
>>> print(re.search(r"r[0-9a-z]n", 'dog runs to cat'))  # note how 0-9 and a-z are combined in one class
<_sre.SRE_Match object at 0x1065972a0>

5. Digits

\d: any digit; \D: any non-digit

>>> print(re.search(r"r\dn", "run r5n").span())
(4, 7)
>>> print(re.search(r"r\Dn", "run r5n").span())
(0, 3)

6. Whitespace

\s: any whitespace character; \S: any non-whitespace character
Whitespace characters include: space, \n \t \r \f \v

>>> print(re.search(r"r\sn", "run r\tn").span())
(4, 7)
>>> print(re.search(r"r\Sn", "r5n"))
<_sre.SRE_Match object at 0x105781c60>

7. Letters, digits, and "_"

\w: equivalent to [a-zA-Z0-9_] for ASCII text; \W: any character not matched by \w

>>> print(re.search(r"r\wn", "r3n r\tn").span())
(0, 3)
>>> print(re.search(r"r\Wn", "r5n"))
None

>>> print(re.search(r"r\wn", "rUn r\tn").span())
(0, 3)
>>> print(re.search(r"r\Wn", "r-n"))
<_sre.SRE_Match object at 0x1066fe850>
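The underscore is easy to forget when reading \w. A small sketch that lists which characters of a sample string fall on each side (the sample string and variable names are made up for this illustration):

```python
import re

sample = "a_1-? Z"
word_chars = re.findall(r"\w", sample)      # letters, digits, and the underscore
non_word_chars = re.findall(r"\W", sample)  # everything else, including the space

print(word_chars)      # ['a', '_', '1', 'Z']
print(non_word_chars)  # ['-', '?', ' ']
```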

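8. Word boundaries

\b: matches the empty position at a word boundary, that is, between a \w character and a \W character, or between a \w character and the start or end of the string;
\B: matches any position that is not a word boundary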

>>> print(re.search(r"r \bn", "r n r\tn").span())
(0, 3)
>>> print(re.search(r"rn \B l", "rn  l ").span())  # \B succeeds here because it sits between two spaces (two non-word characters)
(0, 5)
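A more common use of \b is matching whole words only. The following sketch (sample text and variable names are invented for this illustration) counts standalone occurrences of cat and uses \B to find a cat that starts inside a longer word:

```python
import re

text = "cat concat cats cat."

# \bcat\b: "cat" with a word boundary on both sides, so "concat" and "cats" are skipped.
whole_words = [m.span() for m in re.finditer(r"\bcat\b", text)]

# \Bcat: "cat" that does NOT start at a word boundary, i.e. inside a longer word.
inside_words = [m.span() for m in re.finditer(r"\Bcat", text)]

print(whole_words)   # [(0, 3), (16, 19)]
print(inside_words)  # [(7, 10)]
```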

9. Special characters and the any-character dot

Use a double backslash \\ to match a literal \
Use . to match any character except \n

>>> print(re.search(r"r\\m", "ram r\\m").span())
(4, 7)
>>> print(re.search(r"r.m", "r\tm").span())
(0, 3)
>>> print(re.search(r"r.m", "r\nm").span())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'span'
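The AttributeError above comes from calling .span() on None, because . does not match \n by default. As a hedged sketch of two ways to deal with that: check the result before using it, or pass the standard re.DOTALL flag so that . also matches newlines (variable names are illustrative only):

```python
import re

# Default behaviour: "." refuses to match the newline, so the search fails.
default_match = re.search(r"r.m", "r\nm")
print(default_match)  # None

# With re.DOTALL, "." matches any character including "\n".
dotall_match = re.search(r"r.m", "r\nm", flags=re.DOTALL)
if dotall_match is not None:
    print(dotall_match.span())  # (0, 3)
```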

10. Start and end of string

Start, ^: r"^this"
End, $: r"this$"

>>> print(re.search(r"^this", "this is an apple").span())
(0, 4)
>>> print(re.search(r"this$", "this is an apple this").span())
(17, 21)

11. Optional (zero or one)

Use the question mark ?

>>> print(re.search(r"th(is)?", "th is an apple this").span())
(0, 2)

>>> print(re.search(r"th(is)?", "this is an apple this").span())
(0, 4)

12. Multiline matching

To match at the start of each line:
use ^ together with flags=re.M, where M stands for MULTILINE

>>> s = """
... dog runs to cat
... you run to dog
... """
>>> print(re.search(r"^you", s))
None
>>> print(re.search(r"^you", s, flags=re.M).span())
(17, 20)
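With re.M both ^ and $ work per line, which pairs naturally with re.findall. A small sketch on the same text (the variable names first_words and last_words are made up for this illustration):

```python
import re

s = """
dog runs to cat
you run to dog
"""

# ^ anchors at the start of every line, $ at the end of every line.
first_words = re.findall(r"^\w+", s, flags=re.M)
last_words = re.findall(r"\w+$", s, flags=re.M)

print(first_words)  # ['dog', 'you']
print(last_words)   # ['cat', 'dog']
```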

13. Zero or more times

A character followed by * may occur zero or more times

>>> print(re.search(r"this*", "thissssss an apple this").span())
(0, 9)

>>> print(re.search(r"this*", "thi an apple this").span())
(0, 3)

14. One or more times

A character followed by + must occur at least once

>>> print(re.search(r"this+", "thi an apple"))
None

>>> print(re.search(r"this+", "thisss an apple").span())
(0, 6)

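15. A chosen number of repetitions

Use {n,m} (written without a space after the comma) to allow between n and m repetitions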

>>> print(re.search(r"this{4,10}", "thisss an apple"))
None

>>> print(re.search(r"this{2,10}", "thisss an apple").span())
(0, 6)
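To compare the quantifiers from sections 11 and 13 to 15 side by side, the following sketch checks which candidate strings each pattern accepts in full. It uses re.fullmatch (available since Python 3.4) so that the whole string has to match; the candidate list and variable names are invented for this illustration:

```python
import re

patterns = {
    "?": r"this?",          # zero or one "s"
    "*": r"this*",          # zero or more "s"
    "+": r"this+",          # one or more "s"
    "{2,3}": r"this{2,3}",  # two or three "s"
}
candidates = ["thi", "this", "thiss", "thissss"]

accepted = {
    label: [c for c in candidates if re.fullmatch(pattern, c)]
    for label, pattern in patterns.items()
}
for label, matches in accepted.items():
    print(label, matches)
```

In each pattern the quantifier applies only to the final s, not to the whole word this.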

16. group
Usage 1: by index

>>> s = re.search(r"(\d+), Date:(.+)", "ID:12345, Date:Feb/9/2018")

>>> print(s.group())
12345, Date:Feb/9/2018

>>> print(s.group(1))
12345

>>> print(s.group(2))
Feb/9/2018

Usage 2: by group name

>>> s = re.search(r"(?P<idddd>\d+), Date:(?P<datee>.+)", "ID:12345, Date:Feb/9/2018")

>>> print(s.group('idddd'))
12345

>>> print(s.group('datee'))
Feb/9/2018
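When a pattern has several groups, the match object can also hand them back together. A small sketch using the standard groups() and groupdict() methods (the group names id and date are chosen for this illustration):

```python
import re

m = re.search(r"ID:(?P<id>\d+), Date:(?P<date>.+)", "ID:12345, Date:Feb/9/2018")

all_groups = m.groups()       # every group, in order, as a tuple
named_groups = m.groupdict()  # only the named groups, as a dict

print(all_groups)    # ('12345', 'Feb/9/2018')
print(named_groups)  # {'id': '12345', 'date': 'Feb/9/2018'}
```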

17. Finding all matches

findall can be combined with [] and |

>>> print(re.findall(r"th[ia]s", "thasss an apple this"))
['thas', 'this']

>>> print(re.findall(r"th[ia]s|an", "thasss an apple this"))
['thas', 'an', 'this']
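One detail worth knowing, sketched below: when the pattern contains a capturing group, findall returns the captured text rather than the whole match. A non-capturing group (?:...) keeps the whole match. The variable names are illustrative only:

```python
import re

text = "thasss an apple this"

# Capturing group: findall returns only what the group captured.
captured = re.findall(r"th([ia])s", text)

# Non-capturing group: findall returns the full matched text.
whole = re.findall(r"th(?:is|as)", text)

print(captured)  # ['a', 'i']
print(whole)     # ['thas', 'this']
```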

18. Substitution

re.sub()

>>> print(re.sub(r"dog", "cat", "dog run to cat"))
cat run to cat

>>> print(re.sub(r"dog", "cat", "dogs run to cat"))
cats run to cat
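As the second example shows, a bare dog pattern also rewrites the dog inside dogs. If that is not wanted, a word boundary restricts the replacement to whole words. The sketch below also shows the standard count argument and a backreference in the replacement string; the sample sentences are made up for this illustration:

```python
import re

# \b on both sides: only the standalone word "dog" is replaced.
whole_word = re.sub(r"\bdog\b", "cat", "dogs and a dog")

# count limits how many replacements are made, left to right.
first_two = re.sub(r"dog", "cat", "dog dog dog", count=2)

# \1 and \2 in the replacement refer to the captured groups.
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", "hello world")

print(whole_word)  # dogs and a cat
print(first_two)   # cat cat dog
print(swapped)     # world hello
```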

19. Splitting

re.split(): note how the pattern is written; the delimiter characters go inside []

>>> print(re.split(r"[;,\.]", "cat, dogs ;run. to cat"))
['cat', ' dogs ', 'run', ' to cat']
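The split above leaves stray spaces around some pieces. One possible way to drop them, sketched here as a variation rather than the only approach, is to let the pattern absorb optional whitespace on both sides of the delimiter:

```python
import re

# \s* on each side swallows any whitespace next to the delimiter.
parts = re.split(r"\s*[;,.]\s*", "cat, dogs ;run. to cat")

print(parts)  # ['cat', 'dogs', 'run', 'to cat']
```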

20. compile (precompiling a pattern)

>>> compiled_re = re.compile(r"r[ua]n")
>>> print(compiled_re.search("dog ran to cat"))
<_sre.SRE_Match object at 0x1066c4168>
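A compiled pattern is mostly useful when the same expression is applied many times. The compiled object offers the same search, findall, and sub methods as the module-level functions. A minimal sketch (the sentences and variable names are invented for this illustration):

```python
import re

run_re = re.compile(r"r[ua]n")
sentences = ["dog ran to cat", "dog runs to cat", "dog sat"]

# Reuse one compiled pattern across several strings.
hits = [s for s in sentences if run_re.search(s)]
found = run_re.findall("run ran rin")
replaced = run_re.sub("walk", "dog ran to cat")

print(hits)      # ['dog ran to cat', 'dog runs to cat']
print(found)     # ['run', 'ran']
print(replaced)  # dog walk to cat
```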
