Python regular expressions (re)
thinkando
https://www.jianshu.com/p/39fb062abbe0
2017.12.03 23:47
>>> import re
>>> result = re.match("hello", "hello.cn")
>>> print(result.group())
hello
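As a minimal sketch of how the call above behaves when the pattern is not found: re.match only tries to match at the start of the string and returns None on failure. The helper name match_prefix is hypothetical and not part of the re module.

```python
import re

def match_prefix(pattern, text):
    """Return the text matched at the start of `text`, or None if there is no match."""
    m = re.match(pattern, text)
    # re.match returns None on failure, so check the result before calling .group()
    return m.group() if m else None

print(match_prefix("hello", "hello.cn"))   # hello
print(match_prefix("hello", "say hello"))  # None: the match must begin at index 0
```

Checking the result first avoids the AttributeError that appears whenever .group() is called on None.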
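3. Single-character matching
Character    Function
.            Matches any single character (except \n)
[ ]          Matches any one of the characters listed inside the brackets
\d           Matches a digit, i.e. 0-9
\D           Matches a non-digit
\s           Matches whitespace, e.g. a space, a tab, or a newline
\S           Matches non-whitespace
\w           Matches a word character, i.e. a-z, A-Z, 0-9, _
\W           Matches a non-word character
For str patterns in Python 3, \d, \s, and \w also cover the corresponding Unicode characters unless the re.ASCII flag is set.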
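The table can be checked one character at a time. The sketch below is only an illustration; the names samples, classes, and classify are made up for it. It uses re.fullmatch so that each class has to match the whole one-character string.

```python
import re

samples = ["7", "x", "_", " ", "\t", "-"]
classes = {"digit": r"\d", "word": r"\w", "space": r"\s"}

def classify(ch):
    """Return the names of the classes in `classes` that match the single character `ch`."""
    return [name for name, pat in classes.items() if re.fullmatch(pat, ch)]

for ch in samples:
    print(repr(ch), classify(ch))

# "." matches any character except a newline (unless re.DOTALL is passed)
print(re.match(".", "\n"))  # None
```

Note that a digit is also a word character, so "7" is reported under both \d and \w, while "-" belongs to none of the three classes.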
Example 1: .
>>> import re
>>> ret = re.match(".", "a")  # matches any single character
>>> ret.group()
'a'
Example 2: [ ]
>>> import re
>>> ret = re.match("h", "hello Python")  # regular expressions are case-sensitive
>>> ret.group()
'h'
>>> ret = re.match("H", "Hello Python")
>>> ret.group()
'H'
>>> ret = re.match("[hH]", "hello Python")  # accept either lowercase or uppercase h
>>> ret.group()
'h'
>>> ret = re.match("[hH]", "Hello Python")
>>> ret.group()
'H'
>>> ret = re.match("[0123456789]", "7Hello Python")  # matching a digit, first way
>>> ret.group()
'7'
>>> ret = re.match("[0-9]", "7Hello Python")  # matching a digit, second way
>>> ret.group()
'7'
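Example 3: \d
>>> import re
>>> ret = re.match("嫦娥1号", "嫦娥1号发射成功")  # the text reads "Chang'e 1 launched successfully"
>>> print(ret.group())
嫦娥1号
>>> ret = re.match(r"嫦娥\d号", "嫦娥1号发射成功")  # \d stands in for the digit
>>> print(ret.group())
嫦娥1号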
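4. Raw strings
In Python, prefixing a string literal with r makes it a raw string, in which backslashes are kept as written instead of starting escape sequences.
Without it, every backslash in a pattern has to be doubled, and if the path is long you will come to hate backslashes.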
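>>> mm = "c:\\a\\b\\c"
>>> mm
'c:\\a\\b\\c'
>>> print(mm)
c:\a\b\c
>>> re.match("c:\\\\", mm).group()  # four backslashes in the literal give two in the pattern, i.e. one literal backslash
'c:\\'
>>> ret = re.match("c:\\\\", mm).group()
>>> print(ret)
c:\
>>> ret = re.match("c:\\\\a", mm).group()
>>> print(ret)
c:\a
>>> ret = re.match(r"c:\\a", mm).group()  # the raw string needs only two backslashes
>>> print(ret)
c:\a
>>> ret = re.match(r"c:\a", mm).group()  # in a pattern, \a means the bell character, not a backslash followed by a
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'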
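The two spellings of the same pattern can be compared directly. This is a small sketch with made-up variable names (path, plain, raw); it also shows re.escape, which builds a pattern that matches a piece of text literally and so avoids counting backslashes by hand.

```python
import re

path = "c:\\a\\b\\c"   # the actual text is c:\a\b\c

plain = "c:\\\\a"      # normal literal: 4 backslashes in the source, 2 in the pattern
raw = r"c:\\a"         # raw literal: 2 backslashes in the source, 2 in the pattern

print(plain == raw)                 # True: both literals are the same pattern
print(re.match(raw, path).group())  # c:\a

# re.escape produces a pattern that matches the given text literally
prefix = "c:\\a"
print(re.match(re.escape(prefix), path).group() == prefix)  # True
```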
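5. Multi-character matching
Character    Function
*            Matches the preceding character zero or more times
+            Matches the preceding character one or more times
?            Matches the preceding character zero or one time
{m}          Matches the preceding character exactly m times
{m,n}        Matches the preceding character from m to n times
>>> ret = re.match("[A-Z][a-z]*", "Aabcdef")
>>> ret.group()
'Aabcdef'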
Check whether a variable name is valid
>>> ret = re.match(r"[a-zA-Z_]+[\w_]*", "name1")
>>> ret.group()
'name1'
>>> ret = re.match(r"[a-zA-Z_]+[\w_]*", "_name")
>>> ret.group()
'_name'
>>> ret = re.match(r"[a-zA-Z_]+[\w_]*", "2_name")  # a name cannot start with a digit
>>> ret.group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
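One caveat with the check above is that re.match only inspects the beginning of the string, so a name such as name-1 would still produce a match on its valid prefix. A possible stricter variant, sketched here with the hypothetical names NAME_RE and is_valid_name, uses fullmatch so that the whole string has to fit. It does not exclude Python keywords.

```python
import re

# one letter or underscore, then any number of word characters
NAME_RE = re.compile(r"[A-Za-z_]\w*")

def is_valid_name(name):
    """True if the whole string looks like an identifier (keywords are not excluded)."""
    return NAME_RE.fullmatch(name) is not None

for candidate in ["name1", "_name", "2_name", "name-1"]:
    print(candidate, is_valid_name(candidate))

# re.match alone accepts the valid prefix of an invalid name
print(re.match(r"[a-zA-Z_]+[\w_]*", "name-1").group())  # name
```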
Match a number between 0 and 99
>>> ret = re.match("[1-9]?[0-9]", "7")
>>> ret.group()
'7'
>>> ret = re.match("[1-9]?[0-9]", "33")
>>> ret.group()
'33'
>>> ret = re.match("[1-9]?[0-9]", "09")  # only the leading 0 is matched
>>> ret.group()
'0'
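6. Boundary matching
Character    Function
^            Matches the start of the string
$            Matches the end of the string
\b           Matches a word boundary
\B           Matches a position that is not a word boundary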
Example 1: match 163.com e-mail addresses
>>> ret = re.match(r"[\w]{4,20}@163\.com", "[email protected]")
>>> ret.group()
'[email protected]'
>>> ret = re.match(r"[\w]{4,20}@163\.com", "[email protected]")
>>> ret.group()
'[email protected]'
>>> ret = re.match(r"[\w]{4,20}@163\.com$", "[email protected]")  # $ requires the string to end right after .com
>>> ret.group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
Example 2: \b
>>> re.match(r".*\bver\b", "ho ver abc").group()
'ho ver'
>>> re.match(r".*\bver\b", "ho verabc").group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
>>> re.match(r".*\bver\b", "hover abc").group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
Example 3: \B
>>> re.match(r".*\Bver\B", "hoverabc").group()
'hover'
>>> re.match(r".*\Bver\B", "ho verabc").group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
>>> re.match(r".*\Bver\B", "hover abc").group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
>>> re.match(r".*\Bver\B", "ho ver abc").group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
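The four test strings from the \b and \B examples can be put into one piece of text to see which occurrence of ver each pattern picks. This is just a sketch; the variable names are made up for it.

```python
import re

text = "ho ver abc, hover abc, ho verabc, hoverabc"

# \bver\b: "ver" as a whole word, with a boundary on both sides
whole_word = re.findall(r"\bver\b", text)
# \Bver\B: "ver" strictly inside a word, with no boundary on either side
inside_word = re.findall(r"\Bver\B", text)

print(len(whole_word))   # 1 (only the standalone "ver" in "ho ver abc")
print(len(inside_word))  # 1 (only the "ver" in "hoverabc")
```

The occurrences in "hover abc" and "ho verabc" have a boundary on exactly one side, so neither pattern accepts them.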
>>> import re
>>> ret = re.match(r"[1-9]?\d", "8")
>>> ret.group()
'8'
>>> ret = re.match(r"[1-9]?\d", "78")
>>> ret.group()
'78'
>>> ret = re.match(r"[1-9]?\d", "08")  # only the leading 0 is matched
>>> ret.group()
'0'
>>> ret = re.match(r"[1-9]?\d$", "08")  # $ rejects the leftover 8
>>> ret.group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
>>> ret = re.match(r"[1-9]?\d$|100", "8")  # | adds 100 as a second alternative
>>> ret.group()
'8'
>>> ret = re.match(r"[1-9]?\d$|100", "78")
>>> ret.group()
'78'
>>> ret = re.match(r"[1-9]?\d$|100", "08")
>>> ret.group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
>>> ret = re.match(r"[1-9]?\d$|100", "100")
>>> ret.group()
'100'
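In the pattern above, | has the lowest precedence, so $ belongs only to the first alternative and the 100 branch is not anchored. A possible way to require that the whole string is a number from 0 to 100 is re.fullmatch; the names RANGE_RE and is_0_to_100 below are hypothetical.

```python
import re

RANGE_RE = re.compile(r"[1-9]?\d|100")

def is_0_to_100(text):
    """True only if the whole string is an integer from 0 to 100 without a leading zero."""
    return RANGE_RE.fullmatch(text) is not None

for s in ["0", "8", "78", "100", "08", "101", "100abc"]:
    print(s, is_0_to_100(s))

# the unanchored 100 branch of the original pattern accepts trailing text
print(re.match(r"[1-9]?\d$|100", "100abc").group())  # 100
```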
Match 163, 126, and qq e-mail addresses
>>> ret = re.match(r"\w{4,20}@163\.com", "[email protected]")
>>> ret.group()
'[email protected]'
>>> ret = re.match(r"\w{4,20}@(163|126|qq)\.com", "[email protected]")
>>> ret.group()
'[email protected]'
>>> ret = re.match(r"\w{4,20}@(163|126|qq)\.com", "[email protected]")
>>> ret.group()
'[email protected]'
>>> ret = re.match(r"\w{4,20}@(163|126|qq)\.com", "[email protected]")
>>> ret.group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
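A detail worth checking in patterns like these is the dot before com: an unescaped . matches any character, while \. matches only a literal dot. The sketch below uses hypothetical addresses (xiaoWang@... is made up for illustration) and the made-up names loose and strict.

```python
import re

loose = re.compile(r"\w{4,20}@(163|126|qq).com")      # "." matches any character
strict = re.compile(r"\w{4,20}@(163|126|qq)\.com$")   # literal dot, anchored at the end

addr_ok = "xiaoWang@163.com"      # hypothetical address
addr_typo = "xiaoWang@163xcom"    # hypothetical malformed address
addr_other = "xiaoWang@gmail.com" # hypothetical address on another domain

print(loose.match(addr_typo) is not None)   # True: the loose pattern lets "x" stand in for the dot
print(strict.match(addr_typo))              # None
print(strict.match(addr_ok).group(1))       # 163
print(strict.match(addr_other))             # None
```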
Extracting groups with group()
>>> ret = re.match(r"([^-]*)-(\d+)", "010-12345678")
>>> ret.group()
'010-12345678'
>>> ret.group(1)
'010'
>>> ret.group(2)
'12345678'
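Besides group(n), a match object offers a few related accessors. The following sketch reuses the phone-number pattern; the variable names area and number are made up.

```python
import re

m = re.match(r"([^-]*)-(\d+)", "010-12345678")

# groups() returns all captured groups as a tuple, convenient for unpacking
area, number = m.groups()
print(area, number)   # 010 12345678

print(m.group(0))     # 010-12345678 (group 0 is the whole match)
print(m.group(1, 2))  # ('010', '12345678')
print(m.span(2))      # (4, 12): start and end index of group 2
```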
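Match hh enclosed in a pair of tags (the tag names below are illustrative)
>>> ret = re.match(r"<[a-zA-Z]*>\w*</[a-zA-Z]*>", "<html>hh</html>")
>>> ret.group()
'<html>hh</html>'
>>> ret = re.match(r"<[a-zA-Z]*>\w*</[a-zA-Z]*>", "<html>hh</body>")  # a mismatched closing tag is still accepted
>>> ret.group()
'<html>hh</body>'
>>> ret = re.match(r"<([a-zA-Z]*)>\w*</\1>", "<html>hh</html>")  # \1 must repeat what group 1 captured
>>> ret.group()
'<html>hh</html>'
>>> ret = re.match(r"<([a-zA-Z]*)>\w*</\1>", "<html>hh</body>")  # the two tags differ, so there is no match
>>> ret.group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'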
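Match www.hello.cn enclosed in two nested tags (the tag names below are illustrative)
>>> ret = re.match(r"<(\w*)><(\w*)>.*</\2></\1>", "<html><h1>www.hello.cn</h1></html>")
>>> ret.group()
'<html><h1>www.hello.cn</h1></html>'
>>> ret = re.match(r"<(\w*)><(\w*)>.*</\2></\1>", "<html><h1>www.hello.cn</h2></html>")  # inner tags differ
>>> ret.group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'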
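Match www.hello.cn (method 2: named groups; the tag names below are illustrative)
>>> ret = re.match(r"<(?P<name1>\w*)><(?P<name2>\w*)>.*</(?P=name2)></(?P=name1)>", "<html><h1>www.itcast.cn</h1></html>")
>>> ret.group()
'<html><h1>www.itcast.cn</h1></html>'
>>> ret = re.match(r"<(?P<name1>\w*)><(?P<name2>\w*)>.*</(?P=name2)></(?P=name1)>", "<html><h1>www.hello.cn</h2></html>")  # inner tags differ
>>> ret.group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'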
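Named groups can also be read back by name. The sketch below is an illustration under assumed inputs: the group names outer, inner, and text and the html/h1 tags are made up, not taken from the article.

```python
import re

TAG_RE = re.compile(
    r"<(?P<outer>\w*)><(?P<inner>\w*)>(?P<text>.*)</(?P=inner)></(?P=outer)>"
)

m = TAG_RE.match("<html><h1>www.hello.cn</h1></html>")
print(m.group("text"))  # www.hello.cn
print(m.groupdict())    # {'outer': 'html', 'inner': 'h1', 'text': 'www.hello.cn'}

# (?P=inner) must repeat the text captured by (?P<inner>...), so this fails
print(TAG_RE.match("<html><h1>www.hello.cn</h2></html>"))  # None
```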
8.1 search
>>> ret = re.search(r"\d+", "read count: 9999")
>>> print(ret.group())
9999
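The difference between search and the match function used so far can be seen on the same kind of input. A minimal sketch, with a made-up sample string:

```python
import re

text = "read count: 9999"

# match only tries at the start of the string, so it fails here
print(re.match(r"\d+", text))            # None

# search scans forward and returns the first match anywhere in the string
found = re.search(r"\d+", text)
print(found.group())                     # 9999
print(found.span())                      # (12, 16)

# fullmatch requires the entire string to match
print(re.fullmatch(r"\d+", text))        # None
```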
8.2 findall
Collect the read counts of the python, c, and c++ articles
>>> ret = re.findall(r"\d+", "python = 9999, c = 7890, c++ = 12345")
>>> print(ret)
['9999', '7890', '12345']
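findall returns only the matched strings. If each count should stay attached to its language name, one option is to capture both parts and build a dictionary with re.finditer. This is a sketch; the pattern and the name counts are assumptions made for the example.

```python
import re

text = "python = 9999, c = 7890, c++ = 12345"

# [\w+]+ accepts word characters and "+", so "c++" is captured as a name
pattern = r"([\w+]+) = (\d+)"

# with two groups, findall returns a list of tuples
print(re.findall(pattern, text))
# [('python', '9999'), ('c', '7890'), ('c++', '12345')]

# finditer yields match objects, which makes conversion to int straightforward
counts = {m.group(1): int(m.group(2)) for m in re.finditer(pattern, text)}
print(counts)  # {'python': 9999, 'c': 7890, 'c++': 12345}
```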
8.3 sub: replace the matched data
Add 1 to the matched read count
# coding=utf-8
import re
ret = re.sub(r"\d+", '998', "python = 997")
print(ret)
python = 998
Method 2
# coding=utf-8
import re

def add(temp):
    # temp is the match object; take the matched digits, add 1, and return a string
    strNum = temp.group()
    num = int(strNum) + 1
    return str(num)

ret = re.sub(r"\d+", add, "python = 997")
print(ret)
ret = re.sub(r"\d+", add, "python = 99")
print(ret)
python = 998
python = 100
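A few variations on the same idea, given as a sketch: the function name bump and its step parameter are hypothetical, while the \1 and \2 references in the replacement string and re.subn are standard features of the re module.

```python
import re

def bump(text, step=1):
    """Add `step` to every run of digits in `text`."""
    return re.sub(r"\d+", lambda m: str(int(m.group()) + step), text)

print(bump("python = 997, c = 99"))  # python = 998, c = 100

# the replacement string can refer to captured groups with \1, \2, ...
print(re.sub(r"(\w+) = (\d+)", r"\2 reads for \1", "python = 997"))  # 997 reads for python

# subn also reports how many replacements were made
new_text, n = re.subn(r"\d+", "0", "python = 997, c = 99")
print(new_text, n)  # python = 0, c = 0 2
```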
8.4 split: split the string at each match and return a list
# coding=utf-8
import re
ret = re.split(r":| ", "info:xiaoZhang 33 shandong")  # split on a colon or a space
print(ret)
['info', 'xiaoZhang', '33', 'shandong']
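Two further options of re.split may be useful here, shown as a short sketch on the same input: maxsplit limits the number of cuts, and a capturing group in the pattern keeps the separators in the result.

```python
import re

line = "info:xiaoZhang 33 shandong"

# split on a colon or a space
print(re.split(r":| ", line))
# ['info', 'xiaoZhang', '33', 'shandong']

# maxsplit=1 cuts only at the first separator
print(re.split(r"[: ]", line, maxsplit=1))
# ['info', 'xiaoZhang 33 shandong']

# a capturing group keeps the separators in the returned list
print(re.split(r"([: ])", line))
# ['info', ':', 'xiaoZhang', ' ', '33', ' ', 'shandong']
```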
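9. Greedy and non-greedy matching
In Python, quantifiers are greedy by default (a few other languages may default to non-greedy): they always try to match as many characters as possible. A non-greedy quantifier does the opposite and always tries to match as few characters as possible.
Appending ? to "*", "?", "+", or "{m,n}" turns a greedy quantifier into a non-greedy one.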
s = "This is a number 234-235-22-423"
r = re.match(r"(.+)(\d+-\d+-\d+-\d+)", s)   # greedy .+ leaves group 2 only the one digit it needs
print(r.group(1))
print(r.group(2))
r = re.match(r"(.+?)(\d+-\d+-\d+-\d+)", s)  # non-greedy .+? stops as early as possible
print(r.group(1))
print(r.group(2))
This is a number 23
4-235-22-423
This is a number 
234-235-22-423
>>> re.match(r"aa(\d+)", "aa2343ddd").group(1)
'2343'
>>> re.match(r"aa(\d+?)", "aa2343ddd").group(1)
'2'
>>> re.match(r"aa(\d+)ddd", "aa2343ddd").group(1)
'2343'
>>> re.match(r"aa(\d+?)ddd", "aa2343ddd").group(1)  # ddd forces \d+? to extend over all the digits
'2343'
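A common place where the difference matters is text with repeated delimiters. The sketch below uses a made-up string with two pairs of b tags to compare .* and .*?.

```python
import re

html = "<b>one</b> and <b>two</b>"

# greedy: .* runs to the last </b>, swallowing everything in between
greedy = re.findall(r"<b>(.*)</b>", html)
print(greedy)  # ['one</b> and <b>two']

# non-greedy: .*? stops at the first </b> that allows a match
lazy = re.findall(r"<b>(.*?)</b>", html)
print(lazy)    # ['one', 'two']
```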