Extracting phone numbers and email addresses from text with Python regular expressions

Code:

#! python3

import pyperclip,re

phoneregex = re.compile(r'''
(\d{3}|\(\d{3}\))?              # area code
(\s|-|\.)?                       # separator
(\d{3})                          # first 3 digits
(\s|-|\.)                        # separator
(\d{4})                          # last 4 digits
(\s*(ext|x|ext\.)\s*(\d{2,5}))? # extension
''', re.VERBOSE)

emailregex = re.compile(r'''(
[a-zA-Z0-9._%+-]+               #username
@                               #@symbol
[a-zA-Z0-9.-]+                  #domain name
(\.[a-zA-Z]{2,4})               #dot-something
)''',re.VERBOSE)

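# Read the text to search from the clipboard.
text = str(pyperclip.paste())
matches = []
print(phoneregex.findall(text))  # debug: every phone match as a tuple of groups
for groups in phoneregex.findall(text):
    print(groups)  # debug: the groups of one match
    # Indices 0, 2, 4 are area code, first 3 digits, last 4 digits;
    # the area code is optional, so skip it when it is empty.
    phonenum = '-'.join(part for part in (groups[0], groups[2], groups[4]) if part)
    if groups[7] != '':  # index 7 holds the extension digits
        phonenum += ' x' + groups[7]
    matches.append(phonenum)
for groups in emailregex.findall(text):
    matches.append(groups[0])  # index 0 is the whole email address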

if len(matches)>0:
    pyperclip.copy('\n'.join(matches))
    print('copied to clipboard:')
    print('\n'.join(matches))
else:
    print('no phone numbers or email addresses found.')

Output:

[('800', '-', '420', '-', '7240', '', '', ''), ('415', '-', '863', '-', '9900', '', '', ''), ('415', '-', '863', '-', '9950', '', '', '')]
('800', '-', '420', '-', '7240', '', '', '')
('415', '-', '863', '-', '9900', '', '', '')
('415', '-', '863', '-', '9950', '', '', '')
copied to clipboard:
800-420-7240
415-863-9900
415-863-9950
[email protected]
[email protected]
[email protected]
[email protected]
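
Tuples like the ones above can be reproduced without the clipboard. The following is only a minimal sketch: `sample_text` is a made-up string standing in for the clipboard contents, and the pattern is essentially the phone pattern from the script, with the literal dot in `ext\.` escaped.

```python
import re

# Phone pattern without an outer group, so each match yields 8 groups.
phone_regex = re.compile(r'''
(\d{3}|\(\d{3}\))?               # area code
(\s|-|\.)?                       # separator
(\d{3})                          # first 3 digits
(\s|-|\.)                        # separator
(\d{4})                          # last 4 digits
(\s*(ext|x|ext\.)\s*(\d{2,5}))?  # extension
''', re.VERBOSE)

# Hypothetical sample text (an assumption, not taken from the original run).
sample_text = 'Call 800-420-7240 or 415-863-9900 x123 today.'

found = phone_regex.findall(sample_text)
for groups in found:
    print(groups)
```

The second number carries an extension, so its last three groups should be filled in (`' x123'`, `'x'`, `'123'`) rather than empty strings.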

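Notes:

In the book, the phone pattern opens with an extra parenthesis right after r''', so the whole pattern is one outer group and findall returns the entire match as the first item of each tuple. With that outer group, every index is shifted by one compared with the code above. The large extension group works the same way: findall first returns what the outer parentheses matched, and then what the two inner parentheses matched.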
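
The group ordering can be checked with a small sketch. The toy pattern and the sample string below are assumptions made for illustration, not the pattern from the book; the only difference between the two regexes is the outer pair of parentheses.

```python
import re

# The same toy pattern twice: once bare, once wrapped in an outer group.
bare = re.compile(r'(\d{3})-(\d{4})( x(\d+))?')
wrapped = re.compile(r'((\d{3})-(\d{4})( x(\d+))?)')

sample = '555-1234 x42'

bare_groups = bare.findall(sample)[0]
wrapped_groups = wrapped.findall(sample)[0]

print(bare_groups)     # ('555', '1234', ' x42', '42')
print(wrapped_groups)  # ('555-1234 x42', '555', '1234', ' x42', '42')
```

Groups are numbered by the position of their opening parenthesis, which is why the outer group comes first and the nested extension group (`' x42'`) comes before its inner group (`'42'`).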

