Python re Regular Expressions Explained

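The figure below is a reference for the regular-expression metacharacters and syntax supported by Python:
[Figure 1: Python regular-expression metacharacter and syntax reference]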

1.1 Greedy and non-greedy quantifiers

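Regular expressions are typically used to find matching strings in text. In Python, quantifiers are greedy by default (a few languages default to non-greedy), which means they always try to match as many characters as possible. Non-greedy quantifiers do the opposite and try to match as few characters as possible. For example, the regular expression "ab*" applied to "abbbc" finds "abbb", while the non-greedy quantifier "ab*?" finds "a".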
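A minimal Python 3 sketch of the example above using `re.search`. The second pair of calls uses a made-up HTML-like string to show a case where the difference usually matters:

```python
import re

text = "abbbc"

# Greedy: * consumes as many 'b' characters as it can
greedy = re.search(r"ab*", text).group()
# Non-greedy: *? consumes as few 'b' characters as it can (here, none)
lazy = re.search(r"ab*?", text).group()
print(greedy)  # abbb
print(lazy)    # a

# The difference matters when a match could end at several places
html = "<b>bold</b>"
print(re.findall(r"<.*>", html))   # ['<b>bold</b>']
print(re.findall(r"<.*?>", html))  # ['<b>', '</b>']
```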

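2. The re module
2.1 Getting started with re
Python supports regular expressions through the re module. The usual workflow is: first compile the string form of the regular expression into a Pattern instance, then use the Pattern instance to process text and obtain a match result (a Match instance), and finally use the Match instance to extract information and carry out further operations.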

# encoding: UTF-8
import re

# Compile the regular expression into a Pattern object
pattern = re.compile(r'hello')

# Match the text with the Pattern; returns None if there is no match
match = pattern.match('hello world!')

if match:
    # Use the Match object to get the matched text
    print(match.group())

### Output ###
# hello
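To illustrate the last step of that workflow, the sketch below (pattern and input chosen for illustration) reads a few more fields from a Match object and contrasts `match()`, which only tries position 0, with `search()`, which scans forward:

```python
import re

pattern = re.compile(r'(\w+) (\w+)')
m = pattern.match('hello world!')

if m:
    print(m.group())    # hello world   (whole match, same as group(0))
    print(m.group(1))   # hello
    print(m.groups())   # ('hello', 'world')
    print(m.span())     # (0, 11)

# match() only tries at position 0; search() scans the whole string
word = re.compile(r'world')
print(word.match('hello world!'))           # None
print(word.search('hello world!').span())   # (6, 11)
```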

Special operations

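1. The **?:** operation: choosing which substring to extract. When a pattern contains a capturing group (...), re.findall returns only the text captured by that group; writing the group as (?:...) groups the alternatives without capturing them, so the whole match is returned.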

>>> re.findall(r'^.*(ing|ly|ed|ious|ies|ive|es|s|ment)$','processing')
['ing']
>>> re.findall(r'^.*(?:ing|ly|ed|ious|ies|ive|es|s|ment)$','processing')
['processing']
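When a pattern has more than one capturing group, re.findall returns one tuple per match, so grouping also controls what comes back. A small sketch reusing the suffix list above on the word "processes" (chosen for illustration) also shows how the greedy/non-greedy choice from section 1.1 changes where the stem ends:

```python
import re

suffixes = r'(ing|ly|ed|ious|ies|ive|es|s|ment)'

# Two capturing groups: findall returns a (stem, suffix) tuple per match
greedy = re.findall(r'^(.*)' + suffixes + r'$', 'processes')
# Non-greedy stem: (.*?) stops as early as possible, leaving more for the suffix
lazy = re.findall(r'^(.*?)' + suffixes + r'$', 'processes')
print(greedy)  # [('processe', 's')]
print(lazy)    # [('process', 'es')]
```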

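2. Searching for multiple words in a text: find every instance of "a ... man" with the pattern "<a> (<.*>) <man>". These examples use NLTK's nltk.Text.findall rather than re.findall: each <...> matches exactly one token, and when the pattern contains a group only the grouped tokens are printed.

>>> import nltk
>>> moby = nltk.Text(["a", "monied", "man", "a", "nervous", "man"])
>>> moby.findall(r"<a> (<.*>) <man>")
monied; nervous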
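The angle-bracket token syntax belongs to NLTK, not to re. For an untokenized string, a rough plain-re equivalent might look like this (the sentence is invented for illustration, and \w+ only approximates "one token"):

```python
import re

text = "a monied man and a nervous man"
# \b stops "a" from matching inside other words such as "and" or "man";
# the group captures the single word between "a" and "man"
adjectives = re.findall(r"\ba (\w+) man\b", text)
print(adjectives)  # ['monied', 'nervous']
```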

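Searching for expressions of the form "x and other ys":

>>> words = nltk.Text("speed and other activities , water and other liquids".split())
>>> words.findall(r"<\w*> <and> <other> <\w*s>")
speed and other activities; water and other liquids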
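Likewise, a plain-re sketch of the "x and other ys" search over an untokenized, made-up string, capturing both the x word and the plural y word:

```python
import re

text = "speed and other activities, water and other liquids"
# Group 1: the word before "and other"; group 2: a word ending in "s"
pairs = re.findall(r"\b(\w+) and other (\w+s)\b", text)
print(pairs)  # [('speed', 'activities'), ('water', 'liquids')]
```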

Lookaround assertions

https://blog.csdn.net/lilongsy/article/details/78505309

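| Meaning | Syntax | Example |
|---|---|---|
| Positive lookahead (zero-width positive lookahead assertion): matches the position before exp | (?=exp) | \b\w+(?=ing\b) applied to "I'm singing while you're dancing." matches "sing" and "danc" |
| Negative lookahead (zero-width negative lookahead assertion): matches a position not followed by exp | (?!exp) | \d{3}(?!\d) matches three digits that are not followed by another digit; \b((?!abc)\w)+\b matches words that do not contain the substring abc |
| Positive lookbehind (zero-width positive lookbehind assertion): matches the position after exp | (?<=exp) | (?<=\bre)\w+\b applied to "reading a book" finds "ading" |
| Negative lookbehind (zero-width negative lookbehind assertion): matches a position not preceded by exp | (?<!exp) | (?<!play)ground applied to "I play on playground. It is the best ground." matches only the final "ground" |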
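The table's examples can be checked directly with re.findall. One change in this sketch: the "no abc" pattern uses a non-capturing group (?:...), because with the table's capturing group findall would return only the last character of each word. The digit string "12345 678" and the word list "abcde xyz zabc" are made-up inputs:

```python
import re

# (?=exp): word stems followed by "ing" at a word boundary
ahead = re.findall(r"\b\w+(?=ing\b)", "I'm singing while you're dancing.")
# (?!exp): three digits not followed by another digit
no_digit_after = re.findall(r"\d{3}(?!\d)", "12345 678")
# (?!exp) inside a repeated group: whole words that never contain "abc"
no_abc = re.findall(r"\b(?:(?!abc)\w)+\b", "abcde xyz zabc")
# (?<=exp): the rest of a word that starts with "re"
behind = re.findall(r"(?<=\bre)\w+\b", "reading a book")
# (?<!exp): "ground" not preceded by "play"
not_behind = re.findall(r"(?<!play)ground",
                        "I play on playground. It is the best ground.")

print(ahead)           # ['sing', 'danc']
print(no_digit_after)  # ['345', '678']
print(no_abc)          # ['xyz']
print(behind)          # ['ading']
print(not_behind)      # ['ground']
```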

Code example:

    import re

    text = "I play on playground. It is the best ground."

    # Positive lookahead: "play" only when it is followed by "ground"
    positivelookaheadobjpattern = re.findall(r'play(?=ground)', text, re.M | re.I)
    print("Positive lookahead: " + str(positivelookaheadobjpattern))
    # Positive lookahead: ['play']
    positivelookaheadobj = re.search(r'play(?=ground)', text, re.M | re.I)
    print("Positive lookahead character index: " + str(positivelookaheadobj.span()))
    # Positive lookahead character index: (10, 14)

    # Negative lookahead: "play" only when it is NOT followed by "ground"
    negativelookaheadobjpattern = re.findall(r'play(?!ground)', text, re.M | re.I)
    print("Negative lookahead: " + str(negativelookaheadobjpattern))
    # Negative lookahead: ['play']
    negativelookaheadobj = re.search(r'play(?!ground)', text, re.M | re.I)
    print("Negative lookahead character index: " + str(negativelookaheadobj.span()))
    # Negative lookahead character index: (2, 6)

    # Positive lookbehind: "ground" only when it is preceded by "play"
    positivelookbehindobjpattern = re.findall(r'(?<=play)ground', text, re.M | re.I)
    print("Positive lookbehind: " + str(positivelookbehindobjpattern))
    # Positive lookbehind: ['ground']
    positivelookbehindobj = re.search(r'(?<=play)ground', text, re.M | re.I)
    print("Positive lookbehind character index: " + str(positivelookbehindobj.span()))
    # Positive lookbehind character index: (14, 20)

    # Negative lookbehind: "ground" only when it is NOT preceded by "play"
    negativelookbehindobjpattern = re.findall(r'(?<!play)ground', text, re.M | re.I)
    print("Negative lookbehind: " + str(negativelookbehindobjpattern))
    # Negative lookbehind: ['ground']
    negativelookbehindobj = re.search(r'(?<!play)ground', text, re.M | re.I)
    print("Negative lookbehind character index: " + str(negativelookbehindobj.span()))
    # Negative lookbehind character index: (37, 43)
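Two further properties of lookarounds are sketched below. First, they are zero-width, so re.sub can use them to insert text at positions without consuming any characters (the thousands-separator pattern is a common idiom, not from the original). Second, Python's re module only accepts lookbehind patterns of fixed width:

```python
import re

# Matches only empty positions: preceded by a digit and followed by
# a multiple of three digits up to the end, so commas are inserted
with_commas = re.sub(r"(?<=\d)(?=(\d{3})+$)", ",", "1234567")
print(with_commas)  # 1,234,567

# A variable-width lookbehind such as (?<=a+) is rejected by re
try:
    re.compile(r"(?<=a+)b")
except re.error as exc:
    print("re.error:", exc)
```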
