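The figure below is a reference for the regular expression metacharacters and syntax supported by Python:
Regular expressions are typically used to find matching strings in text. In Python, quantifiers are greedy by default (in a few languages the default may be non-greedy): they always try to match as many characters as possible. Non-greedy quantifiers do the opposite and always try to match as few characters as possible. For example, the regular expression "ab*" searched against "abbbc" finds "abbb", whereas the non-greedy form "ab*?" finds only "a".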
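The greedy and non-greedy behaviour described above can be checked with a minimal sketch. The variable names are illustrative only.

```python
import re

text = "abbbc"

# Greedy: b* consumes as many b's as possible
greedy = re.search(r"ab*", text).group()

# Non-greedy: b*? consumes as few b's as possible (here, none)
lazy = re.search(r"ab*?", text).group()

print(greedy)  # abbb
print(lazy)    # a
```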
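2. The re module
2.1 Getting started with re
Python supports regular expressions through the re module. The usual workflow is to compile the string form of a regular expression into a Pattern instance, use that Pattern to process text and obtain a match result (a Match instance), and finally use the Match instance to extract information and carry out further operations.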
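# encoding: UTF-8
import re
# Compile the regular expression into a Pattern object
pattern = re.compile(r'hello')
# Match the text against the Pattern; match() returns None when there is no match
match = pattern.match('hello world!')
if match:
    # Use the Match object to retrieve the matched text
    print(match.group())
### Output ###
# hello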
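1. **?:**
Purpose: control which substring is extracted. With a plain (capturing) group, findall returns only the group's text; with a non-capturing group (?:...), it returns the whole match.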
>>> re.findall(r'^.*(ing|ly|ed|ious|ies|ive|es|s|ment)$','processing')
>['ing']
>>> re.findall(r'^.*(?:ing|ly|ed|ious|ies|ive|es|s|ment)$','processing')
>['processing']
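As a hedged illustration of the same point, the sketch below also wraps the stem in its own capturing group so that findall returns both parts. The suffix list is taken from the example above; the lazy `.*?` and the names `suffixes` and `stem_and_suffix` are assumptions added for illustration.

```python
import re

suffixes = r"ing|ly|ed|ious|ies|ive|es|s|ment"

# One capturing group: findall returns only the captured suffix
only_suffix = re.findall(r"^.*(" + suffixes + r")$", "processing")
print(only_suffix)      # ['ing']

# Non-capturing group (?:...): findall returns the whole match
whole_word = re.findall(r"^.*(?:" + suffixes + r")$", "processing")
print(whole_word)       # ['processing']

# Two capturing groups: findall returns a (stem, suffix) tuple
stem_and_suffix = re.findall(r"^(.*?)(" + suffixes + r")$", "processing")
print(stem_and_suffix)  # [('process', 'ing')]
```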
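2. Searching for several words in a text: find every phrase of the form "a X man" and print the middle word X. The angle-bracket token syntax is that of NLTK's Text.findall, so moby must be an nltk.Text built from word tokens.
>>> import nltk
>>> moby = nltk.Text("a monied man a nervous man".split())
>>> moby.findall(r"<a> (<.*>) <man>")
> monied; nervous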
Searching for expressions of the form "x and other ys" (words is assumed to be an nltk.Text):
>>> words.findall(r"<\w*> <and> <other> <\w*s>")
> speed and other activities; water and other liquids;
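For a plain string, the same "x and other ys" search can be sketched with the standard re module alone. The sample sentence and the names `sample` and `phrases` are made up for illustration.

```python
import re

# Made-up sample text for illustration
sample = "She likes speed and other activities, plus water and other liquids."

# First \w+ is the "x" word; \w+s is a plural-looking "ys" word
phrases = re.findall(r"\b\w+ and other \w+s\b", sample)
print(phrases)  # ['speed and other activities', 'water and other liquids']
```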
https://blog.csdn.net/lilongsy/article/details/78505309
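Name | Meaning | Syntax | Example |
---|---|---|---|
Positive lookahead (zero-width positive lookahead assertion) | Matches the position immediately before exp | (?=exp) | Searching I'm singing while you're dancing. with \b\w+(?=ing\b) matches sing and danc |
Negative lookahead (zero-width negative lookahead assertion) | Matches a position that is not followed by exp | (?!exp) | \d{3}(?!\d) matches three digits that are not followed by another digit; \b((?!abc)\w)+\b matches words that do not contain the consecutive string abc |
Positive lookbehind (zero-width positive lookbehind assertion) | Matches the position immediately after exp | (?<=exp) | Searching reading a book with (?<=\bre)\w+\b yields ading |
Negative lookbehind (zero-width negative lookbehind assertion) | Matches a position that is not preceded by exp | (?<!exp) | (?<!… |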
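The lookaround forms in the table can be checked with a short sketch. The first three patterns come from the table. The negative-lookbehind pattern, the sample strings "1234 567" and "cost $30 for 12 items", and all variable names are assumptions chosen for illustration.

```python
import re

# Positive lookahead (from the table): stems of words ending in "ing"
stems = re.findall(r"\b\w+(?=ing\b)", "I'm singing while you're dancing.")
print(stems)          # ['sing', 'danc']

# Negative lookahead (pattern from the table, sample string assumed):
# three digits not followed by another digit
triples = re.findall(r"\d{3}(?!\d)", "1234 567")
print(triples)        # ['234', '567']

# Positive lookbehind (from the table): rest of a word that starts with "re"
tails = re.findall(r"(?<=\bre)\w+\b", "reading a book")
print(tails)          # ['ading']

# Negative lookbehind (pattern and sample assumed): numbers not preceded by "$"
plain_numbers = re.findall(r"(?<!\$)\b\d+", "cost $30 for 12 items")
print(plain_numbers)  # ['12']
```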
Code example:
import re

text = "I play on playground. It is the best ground."

# Positive lookahead: "play" only when it is followed by "ground"
positivelookaheadobjpattern = re.findall(r'play(?=ground)', text, re.M | re.I)
print("Positive lookahead: " + str(positivelookaheadobjpattern))
# Output: Positive lookahead: ['play']
positivelookaheadobj = re.search(r'play(?=ground)', text, re.M | re.I)
print("Positive lookahead character index: " + str(positivelookaheadobj.span()))
# Output: Positive lookahead character index: (10, 14)

# Negative lookahead: "play" only when it is not followed by "ground"
negativelookaheadobjpattern = re.findall(r'play(?!ground)', text, re.M | re.I)
print("Negative lookahead: " + str(negativelookaheadobjpattern))
# Output: Negative lookahead: ['play']
negativelookaheadobj = re.search(r'play(?!ground)', text, re.M | re.I)
print("Negative lookahead character index: " + str(negativelookaheadobj.span()))
# Output: Negative lookahead character index: (2, 6)

# Positive lookbehind: "ground" only when it is preceded by "play"
positivelookbehindobjpattern = re.findall(r'(?<=play)ground', text, re.M | re.I)
print("Positive lookbehind: " + str(positivelookbehindobjpattern))
# Output: Positive lookbehind: ['ground']
positivelookbehindobj = re.search(r'(?<=play)ground', text, re.M | re.I)
print("Positive lookbehind character index: " + str(positivelookbehindobj.span()))
# Output: Positive lookbehind character index: (14, 20)
# Negative lookbehind: "ground" only when it is not preceded by "play"
negativelookbehindobjpattern = re.findall(r'(?<!play)ground', text, re.M | re.I)
print("Negative lookbehind: " + str(negativelookbehindobjpattern))
# Output: Negative lookbehind: ['ground']
negativelookbehindobj = re.search(r'(?<!play)ground', text, re.M | re.I)
print("Negative lookbehind character index: " + str(negativelookbehindobj.span()))
# Output: Negative lookbehind character index: (37, 43)