Python/Pandas (14): Strings and Regular Expressions

split is often combined with strip to trim whitespace from the resulting pieces

val='a,b,guido'

val.split(',')

['a', 'b', 'guido']

pieces=[ x.strip() for x in val.split(',')]

pieces

['a', 'b', 'guido']

'::'.join(pieces)

'a::b::guido'

first,second,third=pieces

first+'::'+second+'::'+third

'a::b::guido'
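The sample string above contains no spaces, so strip has nothing to remove. A minimal sketch with a padded string (raw is a made-up value used only for illustration) shows why the two calls are usually paired:

```python
# Hypothetical input with uneven whitespace around the commas.
raw = 'a, b,  guido '

# split alone keeps the surrounding whitespace in each piece.
unstripped = raw.split(',')

# strip removes leading and trailing whitespace from every piece.
clean = [x.strip() for x in raw.split(',')]

print(unstripped)
print(clean)
```

Only the stripped list joins cleanly with '::'.join(clean).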

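in, find, index: locating substrings

The difference between find and index: when the substring is not found, index raises an exception (ValueError) instead of returning -1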

'guido' in val

True

val.index(',')

1

val.find(':')

-1

val.find(',')

1

val.index(':')

ValueError Traceback (most recent call last)

----> 1 val.index(':')

ValueError: substring not found
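If the exception is not wanted, one option is to catch it. The sketch below assumes a helper named locate (a hypothetical name, not part of the standard library) that turns the ValueError into None:

```python
val = 'a,b,guido'

def locate(text, sub):
    """Return the position of sub in text, or None when it is absent."""
    try:
        return text.index(sub)   # raises ValueError if sub is missing
    except ValueError:
        return None

print(locate(val, ','))
print(locate(val, ':'))
print(val.find(':'))   # find signals "not found" with -1 instead
```

Returning None rather than -1 avoids accidentally using -1 as a valid index into the string.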

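count returns the number of occurrences of a substring

replace substitutes one pattern for another; passing an empty string as the replacement is a common way to delete the pattern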

val.replace(',','::')

val.replace(',','')
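The calls above are shown without their results. A short sketch on the same val string, with variable names chosen here for illustration:

```python
val = 'a,b,guido'

n_commas = val.count(',')         # number of non-overlapping occurrences
joined = val.replace(',', '::')   # substitute every comma
removed = val.replace(',', '')    # delete every comma

print(n_commas)
print(joined)
print(removed)
```

Strings are immutable, so replace returns a new string and val itself is unchanged.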

Regular expressions

The functions in the re module fall into three categories: pattern matching, substitution, and splitting

import re

text="foo bar \t baz \tqux"

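re.split(r'\s+',text)

When you call re.split(r'\s+',text), the regular expression is first compiled, and then its split method is called on text. You can compile the regex yourself with re.compile to obtain a reusable regex object

regex=re.compile(r'\s+')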

regex.split(text)
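As a rough illustration of the reuse mentioned above, the sketch below compiles the pattern once and applies it to two strings; the second string and the variable names are made up for this example:

```python
import re

text = "foo bar \t baz \tqux"

# Compile once; the resulting pattern object can be reused on many strings.
whitespace = re.compile(r'\s+')

parts = whitespace.split(text)
other = whitespace.split("one  two\nthree")

print(parts)
print(other)
```

The raw-string prefix r keeps the backslash in \s from being treated as a string escape.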

To get a list of all the substrings that matched the pattern, use findall

regex.findall(text)

[' ', ' \t ', ' \t']

findall returns all matches in the string

search returns only the first match

match only matches at the beginning of the string
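A small sketch of the three behaviours side by side; the digit pattern and sample strings are assumptions chosen to keep the example short:

```python
import re

digits = re.compile(r'\d+')
s = 'abc 123 def 456'

all_matches = digits.findall(s)         # every match, as a list of strings
first = digits.search(s)                # first match anywhere in the string
at_start = digits.match(s)              # None: s does not begin with digits
at_start_ok = digits.match('789 xyz')   # succeeds: the string begins with digits

print(all_matches)
print(first.group(), first.span())
print(at_start)
print(at_start_ok.group())
```

Because search and match return None on failure, check the result before calling group on it.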

text="""

Dave dave@google.com

Steve steve@gmail.com

Rob rob@gmail.com

Ryan ryan@yahoo.com

"""

pattern = r'[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}'

regex=re.compile(pattern, flags=re.IGNORECASE)

regex.findall(text)

['dave@google.com', 'steve@gmail.com', 'rob@gmail.com', 'ryan@yahoo.com']

m=regex.search(text)

m

text[m.start():m.end()]

'dave@google.com'

print(regex.match(text))

None

sub returns a new string in which every match of the pattern is replaced by the given string

print(regex.sub('REDACTED',text))

Dave REDACTED

Steve REDACTED

Rob REDACTED

Ryan REDACTED
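sub replaces every match by default. The sketch below, using a shortened two-line sample rather than the full text above, also shows the optional count argument, which limits how many matches are replaced:

```python
import re

# Shortened sample for illustration; not the full text used above.
sample = "Dave dave@google.com\nSteve steve@gmail.com\n"
email = re.compile(r'[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}',
                   flags=re.IGNORECASE)

redacted = email.sub('REDACTED', sample)              # replace every match
first_only = email.sub('REDACTED', sample, count=1)   # replace only the first

print(redacted)
print(first_only)
```

The input string is not modified; sub always returns a new string.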

groups returns a tuple of the parts captured by the parenthesized groups in the pattern

pattern = r'([A-Z0-9._%+-]+)@([A-Z0-9.-]+)\.([A-Z]{2,4})'

regex=re.compile(pattern,flags=re.IGNORECASE)

m=regex.match('wesm@bright.net')

m.groups()

('wesm', 'bright', 'net')
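Individual groups can also be read one at a time. A minimal sketch using the same pattern and address; the variable names are chosen here for illustration:

```python
import re

pattern = r'([A-Z0-9._%+-]+)@([A-Z0-9.-]+)\.([A-Z]{2,4})'
regex = re.compile(pattern, flags=re.IGNORECASE)

m = regex.match('wesm@bright.net')

username, domain, suffix = m.groups()   # unpack all captured groups
whole = m.group(0)                      # group 0 is the entire match
second = m.group(2)                     # captured groups are numbered from 1

print(username, domain, suffix)
print(whole)
print(second)
```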

When the pattern contains groups, findall returns a list of tuples

regex.findall(text)

[('dave', 'google', 'com'),

('steve', 'gmail', 'com'),

('rob', 'gmail', 'com'),

('ryan', 'yahoo', 'com')]

sub can also refer to the groups of each match in the replacement string, using special symbols such as \1 and \2

print(regex.sub(r'Username: \1, Domain:\2, Suffix:\3',text))

Dave Username: dave, Domain:google, Suffix:com

Steve Username: steve, Domain:gmail, Suffix:com

Rob Username: rob, Domain:gmail, Suffix:com

Ryan Username: ryan, Domain:yahoo, Suffix:com
