Reference: http://blog.jobbole.com/74844/
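1. To use regular expressions in Python, first import the re module.
>>> import re
Using \: to match a metacharacter (. * + \ $ ^ and so on) literally, put a \ in front of it. A - is only special inside a character class such as [a-z]. Because patterns contain so many backslashes, they are usually written as raw strings, which the following session illustrates.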
>>> string='this is a\nstring'
>>> print(string)
this is a
string
>>> string1=r'this is a\nstring' # the r prefix makes a raw string: backslashes stay literal characters and are not treated as escapes
>>> print(string1)
this is a\nstring
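To make the escaping rule concrete, here is a small sketch that compares an unescaped dot with an escaped one. The sample strings and variable names are made up for illustration; re.escape is the standard-library helper that adds the backslashes for you.

```python
import re

# An unescaped '.' is a metacharacter: it matches any character except a newline.
loose = re.findall(r'a.c', 'abc a.c')

# An escaped '\.' matches only a literal dot.
strict = re.findall(r'a\.c', 'abc a.c')

# re.escape() backslash-escapes the special characters in a string,
# so the '+' below is matched literally instead of meaning "one or more".
escaped_pattern = re.escape('1+1=2')
literal_hit = re.search(escaped_pattern, 'we know 1+1=2') is not None

print(loose)
print(strict)
print(literal_hit)
```

Writing the patterns as raw strings (r'...') keeps Python from interpreting the backslashes before the re module sees them.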
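2. Next, a look at several commonly used functions in re:
re.match(regex, string): returns a match object only if the regex matches at the beginning of the string being searched; otherwise it returns None.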
>>> import re
>>> re.match(r'hello','hello world hello boys!')
<_sre.SRE_Match object at 0x7fcaa1b0c308> # returns a match object
>>> re.match(r'world','hello world hello boys!')
>>>
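Since a failed re.match returns None, calling .group() on the result directly would raise an AttributeError. A minimal sketch of the usual guard, reusing the sample sentence from above (the variable names are only illustrative):

```python
import re

text = 'hello world hello boys!'

# re.match only tries the pattern at position 0 of the string.
at_start = re.match(r'hello', text)
not_at_start = re.match(r'world', text)

# Test the result against None before calling methods on it.
start_result = at_start.group(0) if at_start is not None else None
other_result = not_at_start.group(0) if not_at_start is not None else None

print(start_result)
print(other_result)
```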
re.search(regex, string): not limited to the beginning of the string; it scans the whole string, but returns only the first match it finds.
>>> match_obj = re.search(r'world','hello world ,hello world!') # returns a match object
>>> match_obj.group(0)
'world'
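The match object returned by re.search also knows where the match was found. The following sketch, using the same sample string, reads the position with start() and end():

```python
import re

text = 'hello world ,hello world!'
first = re.search(r'world', text)

# start() and end() delimit the slice of the text covered by the first match.
first_span = (first.start(), first.end())
matched_slice = text[first.start():first.end()]

print(first_span)
print(matched_slice)
```

Only the first occurrence is reported, even though 'world' appears twice in the string.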
re.findall(regex, string): finds every non-overlapping match of the regex and returns them as a list.
>>> print(re.findall(r'world','hello world ,hello world!'))
['world', 'world']
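findall gives back plain strings, so the positions of the matches are lost. If you need them, re.finditer is one option: it yields a match object per hit. A short sketch on the same string (the list name positions is just an illustrative choice):

```python
import re

text = 'hello world ,hello world!'

# re.finditer yields one match object for every non-overlapping match.
positions = [(m.start(), m.group(0)) for m in re.finditer(r'world', text)]

print(positions)
```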
3. A closer look at the group function used above
>>> contactInfo = 'Doe, John: 555-1212'
>>> match = re.search(r'(\w+), (\w+): (\S+)', contactInfo)
>>> match.group(0)
'Doe, John: 555-1212'
>>> match.group(1)
'Doe'
>>> match.group(2)
'John'
>>> match.group(3)
'555-1212'
>>> re.findall(r'(\w+), (\w+): (\S+)', contactInfo)
[('Doe', 'John', '555-1212')]
>>> re.findall(r'\w+, \w+: \S+', contactInfo)
['Doe, John: 555-1212']
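As the output shows, findall() treats groups differently: when the pattern contains groups it returns a list of tuples holding only the group contents, and when it contains none it returns the whole matched strings.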
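Two related features may help here, sketched below on the same contact string. Named groups (?P<name>...) let you read groups by name, and non-capturing groups (?:...) group without capturing, so findall returns the whole match again. The group names last, first and phone are my own choices for illustration.

```python
import re

contact_info = 'Doe, John: 555-1212'

# Named groups: each (?P<name>...) can be read by name as well as by number.
match = re.search(r'(?P<last>\w+), (?P<first>\w+): (?P<phone>\S+)', contact_info)
all_groups = match.groups()      # every group as a tuple
by_name = match.groupdict()      # every named group as a dict

# Non-capturing groups do not count as groups, so findall returns full matches.
whole = re.findall(r'(?:\w+), (?:\w+): (?:\S+)', contact_info)

print(all_groups)
print(by_name['phone'])
print(whole)
```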
4. Now an example that matches phone numbers
import re

# Regexes for a landline number and a mobile phone number
print("the landline regex: ")
landline = '0538-1234567'
# 4-digit area code, optional hyphen, 7-digit number
r_landline = r'\d{4}-?\d{7}'
MatchObject_landline = re.match(r_landline, landline)
if MatchObject_landline is None:
    print("match fail")
else:
    print("match success")

print("the phone num regex: ")
phone_num = '+8618811112222'
# literal '+', 2-digit country code, 11-digit number
r_phone_num = r'\+\d{2}\d{11}'
MatchObject_phone = re.match(r_phone_num, phone_num)
if MatchObject_phone is None:
    print("match fail")
else:
    print("match success")
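One caveat with the script above: re.match only anchors the pattern at the start, so a string with extra trailing digits still counts as a match. If the whole string has to be a phone number, re.fullmatch (Python 3.4 and later) is one way to enforce it. The sketch below assumes that stricter requirement; the helper name is_landline and the test strings are hypothetical.

```python
import re

# Same landline pattern as above, compiled once for reuse.
LANDLINE = re.compile(r'\d{4}-?\d{7}')

def is_landline(text):
    """Return True only if the entire string is a landline number."""
    return LANDLINE.fullmatch(text) is not None

too_long = '0538-123456789'

# match() accepts the string because its first 12 characters fit the pattern.
prefix_only = LANDLINE.match(too_long) is not None

print(is_landline('0538-1234567'))
print(is_landline(too_long))
print(prefix_only)
```

On older Python versions, ending the pattern with $ and using match gives a similar effect.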
import re
# Before '@': 3 to 10 word characters; after '@': one or more word characters; the address ends with '.com', '.cn' or '.org'
email = r'\w{3,10}@\w+(\.com|\.cn|\.org)'
match_object = re.match(email, '[email protected]')  # returns a match object, or None if nothing matches
if match_object is not None:
    print(match_object.group())  # print the matched text
else:
    print("match fail")
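As a rough sketch of how the e-mail check could be wrapped for reuse, the function below compiles a variant of the pattern (with a trailing $ so nothing may follow the domain suffix) and returns None when the address does not fit. The function name check_email and the example.* addresses are made up for illustration, and the pattern is far simpler than real e-mail syntax.

```python
import re

# 3 to 10 word characters, '@', one or more word characters,
# then '.com', '.cn' or '.org' at the end of the string.
EMAIL = re.compile(r'\w{3,10}@\w+\.(?:com|cn|org)$')

def check_email(address):
    """Return the matched text, or None if the address does not fit the pattern."""
    match = EMAIL.match(address)
    return match.group() if match is not None else None

print(check_email('alice@example.com'))   # fits the pattern
print(check_email('ab@example.com'))      # local part shorter than 3 characters
print(check_email('alice@example.net'))   # suffix not in the allowed list
```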