split: splitting a string, often combined with strip to trim whitespace from the pieces
val='a,b,guido'
val.split(',')
['a', 'b', 'guido']
pieces=[ x.strip() for x in val.split(',')]
pieces
['a', 'b', 'guido']
'::'.join(pieces)
'a::b::guido'
first,second,third=pieces
first+'::'+second+'::'+third
'a::b::guido'
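The val used above has no padding around its commas, so strip has nothing to remove there. A minimal sketch with a padded string (padded is a made-up value, not from these notes) may show more clearly why split and strip are usually paired, and that join reverses the split:

```python
# Hypothetical input with stray whitespace around the commas
padded = 'a , b,   guido '

# split alone keeps the surrounding whitespace in each piece
raw_pieces = padded.split(',')
print(raw_pieces)    # ['a ', ' b', '   guido ']

# strip each piece to remove leading and trailing whitespace
clean_pieces = [x.strip() for x in raw_pieces]
print(clean_pieces)  # ['a', 'b', 'guido']

# join glues the pieces back together with the chosen delimiter
joined = '::'.join(clean_pieces)
print(joined)        # a::b::guido
```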
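in, find, index: locating a substring
The difference between find and index: if the substring is not found, index raises an exception (ValueError) instead of returning -1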
'guido' in val
True
val.index(',')
1
val.find(':')
-1
val.find(',')
1
val.index(':')
Traceback (most recent call last):
  ...
ValueError: substring not found
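Because the two methods report a missing substring differently, the calling code has to check differently too. The following is only a sketch of the two styles, reusing the val from above; the variable names are illustrative:

```python
val = 'a,b,guido'

# find signals "not found" with -1, so test the return value
pos = val.find(':')
found_with_find = pos != -1

# index signals "not found" by raising ValueError, so catch it
try:
    pos_index = val.index(':')
except ValueError:
    pos_index = None

print(found_with_find, pos_index)  # False None
```

One reason to prefer index when the substring is expected to exist: a forgotten check on find does not fail loudly, since -1 is a valid index (val[-1] is the last character).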
count returns the number of occurrences of a substring
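The notes give no example for count, so here is a small sketch using the same val as above; the 'aaaa' case is an added illustration that count only tallies non-overlapping occurrences:

```python
val = 'a,b,guido'

n_commas = val.count(',')   # two commas in val
n_colons = val.count(':')   # no colon, so 0 (no exception is raised)

# occurrences are counted without overlap
n_pairs = 'aaaa'.count('aa')

print(n_commas, n_colons, n_pairs)  # 2 0 2
```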
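replace substitutes one pattern for another; it is also commonly used to delete a pattern by passing an empty string
val.replace(',','::')
'a::b::guido'
val.replace(',','')
'abguido'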
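Two details are easy to miss here: strings are immutable, so replace returns a new string, and an optional third argument caps the number of replacements. A short sketch, with illustrative variable names:

```python
val = 'a,b,guido'

# replace returns a new string; val itself is unchanged
swapped = val.replace(',', '::')
deleted = val.replace(',', '')

# the optional third argument limits how many occurrences are replaced
first_only = val.replace(',', '::', 1)

print(val)         # a,b,guido
print(swapped)     # a::b::guido
print(deleted)     # abguido
print(first_only)  # a::b,guido
```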
Regular expressions
The functions in the re module fall into three broad categories: pattern matching, substitution, and splitting
import re
text="foo bar \t baz \tqux"
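re.split(r'\s+',text)
['foo', 'bar', 'baz', 'qux']
When re.split is called, the regular expression is first compiled, and then its split method is called on text. You can compile the regex yourself with re.compile to get a reusable regex object
regex=re.compile(r'\s+')
regex.split(text)
['foo', 'bar', 'baz', 'qux']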
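A compiled regex object is most useful when the same pattern is applied to many strings. A minimal sketch, where the lines list is made-up input:

```python
import re

# Compile once, then reuse the same regex object on many strings
whitespace = re.compile(r'\s+')

lines = ['foo bar \t baz', 'one\ttwo  three', 'single']  # hypothetical inputs
tokens = [whitespace.split(line) for line in lines]

print(tokens)
# [['foo', 'bar', 'baz'], ['one', 'two', 'three'], ['single']]
```

The pattern is written as a raw string (r'\s+') so that Python does not try to interpret the backslash itself.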
To get a list of everything the pattern matched, use findall
regex.findall(text)
[' ', ' \t ', ' \t']
findall returns all matches in the string
search returns only the first match
match only matches at the beginning of the string
text="""
Dave dave@google.com
Steve steve@gmail.com
Rob rob@gmail.com
Ryan ryan@yahoo.com
"""
pattern = r'[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}'
regex=re.compile(pattern, flags=re.IGNORECASE)
regex.findall(text)
['dave@google.com', 'steve@gmail.com', 'rob@gmail.com', 'ryan@yahoo.com']
m=regex.search(text)
m
text[m.start():m.end()]
print (regex.match(text))
None
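The difference between the three methods may be easier to see side by side on a shorter string. This sketch uses a made-up sample and a simple digit pattern rather than the email regex:

```python
import re

number = re.compile(r'\d+')
sample = 'abc 12 def 345'  # hypothetical sample string

# findall: every match, as a list of strings
all_numbers = number.findall(sample)

# search: only the first match, as a match object
first = number.search(sample)

# match: anchored at position 0, so it fails when the string starts with 'abc'
at_start = number.match(sample)
at_start_ok = number.match('12 def 345')

print(all_numbers)                                # ['12', '345']
print(first.group(), first.start(), first.end())  # 12 4 6
print(at_start)                                   # None
print(at_start_ok.group())                        # 12
```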
sub returns a new string in which every match of the pattern is replaced by the given string
print(regex.sub('REDACTED',text))
Dave REDACTED
Steve REDACTED
Rob REDACTED
Ryan REDACTED
groups returns a tuple of the parts of the match captured by the parenthesized groups in the pattern
pattern = r'([A-Z0-9._%+-]+)@([A-Z0-9.-]+)\.([A-Z]{2,4})'
regex=re.compile(pattern,flags=re.IGNORECASE)
m=regex.match('wesm@bright.net')
m.groups()
('wesm', 'bright', 'net')
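Besides groups, a match object also gives access to individual groups through group(n), where group(0) is the whole match. A minimal sketch with the same grouped pattern and a placeholder address (user@example.com is hypothetical):

```python
import re

pattern = r'([A-Z0-9._%+-]+)@([A-Z0-9.-]+)\.([A-Z]{2,4})'
regex = re.compile(pattern, flags=re.IGNORECASE)

m = regex.match('user@example.com')  # hypothetical address

whole = m.group(0)   # the entire matched text
first = m.group(1)   # the first parenthesized group only
parts = m.groups()   # all groups as a tuple

# the tuple unpacks naturally into named variables
username, domain, suffix = parts

print(whole)   # user@example.com
print(first)   # user
print(parts)   # ('user', 'example', 'com')
```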
When the pattern has groups, findall returns a list of tuples
regex.findall(text)
[('dave', 'google', 'com'),
('steve', 'gmail', 'com'),
('rob', 'gmail', 'com'),
('ryan', 'yahoo', 'com')]
print(regex.sub(r'Username: \1, Domain: \2, Suffix: \3',text))
Dave Username: dave, Domain: google, Suffix: com
Steve Username: steve, Domain: gmail, Suffix: com
Rob Username: rob, Domain: gmail, Suffix: com
Ryan Username: ryan, Domain: yahoo, Suffix: com
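In the replacement string, \1, \2 and so on refer to the text captured by the corresponding group, so sub can also reorder parts of a match. A small sketch on made-up data (the date pattern and log string are illustrative, not from these notes):

```python
import re

# Three groups: year, month, day
date = re.compile(r'(\d{4})-(\d{2})-(\d{2})')
log = 'start 2024-01-15, end 2024-02-01'  # hypothetical input

# Rewrite each YYYY-MM-DD match as DD/MM/YYYY using backreferences
swapped = date.sub(r'\3/\2/\1', log)

print(swapped)  # start 15/01/2024, end 01/02/2024
```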