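References:
(1) Matching URLs or web addresses with regular expressions: https://blog.csdn.net/qq_38819293/article/details/81570751
(2) Commonly used regular expressions for validating web addresses:
https://blog.csdn.net/guo_qiangqiang/article/details/89286302
(3) [Python regular expressions] Extracting emails, URLs, mobile numbers, and IP addresses with Python regular expressions:
https://blog.csdn.net/u013421629/article/details/82918060
(4) A regular expression that perfectly and precisely matches all kinds of URLs:
https://blog.csdn.net/qq569699973/article/details/94636893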
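#Android's built-in URL matching for text views is imperfect, and so are most patterns found online. After many rounds of revision, the following regular expression was worked out to match URLs accurately: it covers addresses that begin with http, https, www, wap and similar prefixes (in any mix of upper and lower case), including ones that carry parameters (URL-encoded or encrypted). It is claimed to come close to WeChat's link detection in accuracy while staying fast and stable.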
Regular expression:
([hH][tT]{2}[pP]://|[hH][tT]{2}[pP][sS]://|[wW]{3}\.|[wW][aA][pP]\.|[fF][tT][pP]\.|[fF][iI][lL][eE]\.)[-A-Za-z0-9+&@#/%?=~_|!:,.;]+[-A-Za-z0-9+&@#/%=~_|]
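As a rough illustration, the pattern can be compiled once in Python and applied with `findall`. The names `URL_PATTERN`, `find_urls` and the sample text are made up for this sketch. Two adjustments are assumptions of the sketch rather than part of the quoted source: the dots after `www`, `wap`, `ftp` and `file` are escaped so that they match only a literal dot, and the leading group is non-capturing so that `findall` returns whole matches.

```python
import re

# Dots after www/wap/ftp/file are escaped (an assumption of this sketch)
# so that they match a literal "." rather than any character.
# The prefix group is non-capturing so findall() returns the full match.
URL_PATTERN = re.compile(
    r"(?:[hH][tT]{2}[pP]://|[hH][tT]{2}[pP][sS]://|[wW]{3}\.|[wW][aA][pP]\."
    r"|[fF][tT][pP]\.|[fF][iI][lL][eE]\.)"
    r"[-A-Za-z0-9+&@#/%?=~_|!:,.;]+[-A-Za-z0-9+&@#/%=~_|]"
)


def find_urls(text):
    """Return every substring of text that the pattern recognises as a URL."""
    return URL_PATTERN.findall(text)


# Hypothetical sample text: mixed-case scheme, query string, trailing punctuation.
sample = "Visit HTTPS://Example.com/a?b=1, or www.example.org."
print(find_urls(sample))
```

The final character class excludes `,` and `.`, which is why the trailing comma and full stop in the sample are left out of the matches.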
Common patterns:
Regular expression
(http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?
Matches
http://regxlib.com/Default.aspx | http://electronics.cnet.com/electronics/0-6342366-8-8994967-1.html
Does not match
www.yahoo.com
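To check the listed examples, the pattern can be tried in Python roughly as follows. The helper name `is_http_url` is hypothetical, and `fullmatch` is used on the assumption that the whole string should be a URL, since the pattern itself has no `^` or `$` anchors.

```python
import re

# The pattern is copied verbatim from the entry above.
HTTP_URL = re.compile(
    r"(http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+"
    r"([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?"
)


def is_http_url(candidate):
    """True when the entire candidate string is matched by the pattern."""
    return HTTP_URL.fullmatch(candidate) is not None


for candidate in (
    "http://regxlib.com/Default.aspx",
    "http://electronics.cnet.com/electronics/0-6342366-8-8994967-1.html",
    "www.yahoo.com",  # no scheme, so the pattern rejects it
):
    print(candidate, is_http_url(candidate))
```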
Regular expression
^\\{2}[\w-]+\\(([\w-][\w-\s]*[\w-]+[$$]?$)|([\w-][$$]?$))
Matches
\\server\service | \\server\my service | \\serv_001\service$
Does not match
\\my server\service | \\server\ service | \\server$\service
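This UNC share pattern does not compile unchanged in Python, because the `re` module treats `[\w-\s]` as a bad character range. A minimal sketch of an adaptation follows: it moves the hyphen to the end of the class (`[\w\s-]`) and writes `[$$]` as `[$]`, which is assumed to keep the intended meaning. The helper name `is_unc_share` is hypothetical.

```python
import re

# Adapted for Python's re module (an assumption of this sketch):
# "[\w-\s]" becomes "[\w\s-]" and "[$$]" becomes "[$]".
UNC_SHARE = re.compile(
    r"^\\{2}[\w-]+\\(([\w-][\w\s-]*[\w-]+[$]?$)|([\w-][$]?$))"
)


def is_unc_share(path):
    """True when path looks like \\\\server\\share according to the pattern."""
    return UNC_SHARE.search(path) is not None


should_match = [r"\\server\service", r"\\server\my service", r"\\serv_001\service$"]
should_not_match = [r"\\my server\service", r"\\server\ service", r"\\server$\service"]

for path in should_match + should_not_match:
    print(path, is_unc_share(path))
```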
Regular expression
^(http|https|ftp)\://([a-zA-Z0-9\.\-]+(\:[a-zA-Z0-9\.&%\$\-]+)*@)?((25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9])\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[0-9])|([a-zA-Z0-9\-]+\.)*[a-zA-Z0-9\-]+\.[a-zA-Z]{2,4})(\:[0-9]+)?(/[^/][a-zA-Z0-9\.\,\?\'\\/\+&%\$#\=~_\-@]*)*$
Matches
http://www.sysrage.net | https://64.81.85.161/site/file.php?cow=moo's | ftp://user:[email protected]:123
Does not match
sysrage.net
Regular expression
^([a-zA-Z]\:|\\\\[^\/\\:*?"<>|]+\\[^\/\\:*?"<>|]+)(\\[^\/\\:*?"<>|]+)+(\.[^\/\\:*?"<>|]+)$
Matches
c:\Test.txt | \\server\shared\Test.txt | \\server\shared\Test.t
Does not match
c:\Test | \\server\shared | \\server\shared\Test.?
Regular expression
^(http|https|ftp)\://([a-zA-Z0-9\.\-]+(\:[a-zA-Z0-9\.&%\$\-]+)*@)*((25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9])\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[0-9])|localhost|([a-zA-Z0-9\-]+\.)*[a-zA-Z0-9\-]+\.(com|edu|gov|int|mil|net|org|biz|arpa|info|name|pro|aero|coop|museum|[a-zA-Z]{2}))(\:[0-9]+)*(/($|[a-zA-Z0-9\.\,\?\'\\\+&%\$#\=~_\-]+))*$
Matches
http://site.com/dir/file.php?var=moo | https://localhost | ftp://user:[email protected]:21/file/dir
Does not match
site.com | http://site.com/dir//
Regular expression
^([a-zA-Z]\:)(\\[^\\/:*?<>"|]*(?<![ ]))*(\.[a-zA-Z]{2,6})$
Matches
C:\di___r\fi_sysle.txt | c:\dir\filename.txt
Does not match
c:\dir\file?name.txt
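The negative lookbehind `(?<![ ])` in this pattern rejects any path component that ends with a space. A small sketch of how that plays out in Python is shown below. The helper name `is_windows_file_path` and the third test path (a folder name with a trailing space) are made up for illustration.

```python
import re

# Pattern copied from the entry above; a single-quoted raw string is used
# because the pattern itself contains a double quote.
WINDOWS_FILE = re.compile(
    r'^([a-zA-Z]\:)(\\[^\\/:*?<>"|]*(?<![ ]))*(\.[a-zA-Z]{2,6})$'
)


def is_windows_file_path(path):
    """True when path is a drive-letter path ending in a 2-6 letter extension."""
    return WINDOWS_FILE.match(path) is not None


print(is_windows_file_path(r"C:\di___r\fi_sysle.txt"))  # listed as a match
print(is_windows_file_path(r"c:\dir\file?name.txt"))    # "?" is a forbidden character
print(is_windows_file_path(r"c:\dir \file.txt"))        # hypothetical: folder name ends with a space
```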
Regular expression
^([a-zA-Z0-9]([a-zA-Z0-9\-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,6}$
Matches
regexlib.com | this.is.a.museum | 3com.com
Does not match
notadomain-.com | helloworld.c | .oops.org
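In this pattern each label is one alphanumeric character, optionally followed by up to 61 characters and a closing alphanumeric, which caps a label at 63 characters and forbids a leading or trailing hyphen. The sketch below tries the listed examples and that length limit in Python. The helper name `is_domain` and the generated 63- and 64-character labels are illustrative assumptions.

```python
import re

# Pattern copied verbatim from the entry above.
DOMAIN = re.compile(
    r"^([a-zA-Z0-9]([a-zA-Z0-9\-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,6}$"
)


def is_domain(name):
    """True when name is a dotted sequence of valid labels plus a 2-6 letter TLD."""
    return DOMAIN.match(name) is not None


for name in ("regexlib.com", "this.is.a.museum", "3com.com",
             "notadomain-.com", "helloworld.c", ".oops.org"):
    print(name, is_domain(name))

# Label length limit: 1 + 61 + 1 = 63 characters at most.
print(is_domain("a" * 63 + ".com"))
print(is_domain("a" * 64 + ".com"))
```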
Regular expression
^(((ht|f)tp(s?))\://)?(www.|[a-zA-Z].)[a-zA-Z0-9\-\.]+\.(com|edu|gov|mil|net|org|biz|info|name|museum|us|ca|uk)(\:[0-9]+)*(/($|[a-zA-Z0-9\.\,\;\?\'\\\+&%\$#\=~_\-]+))*$
Matches
www.blah.com:8103 | www.blah.com/blah.asp?sort=ASC | www.blah.com/blah.htm#blah
Does not match
www.state.ga | http://www.blah.ru
Regular expression
\b(([\w-]+://?|www[.])[^\s()<>]+(?:[\w\d]+[\w\d]+|([^[:punct:]\s]|/)))
Matches
http://foo.com/blah_blah | http://foo.com/blah_blah/ | (Something like http://foo.com/blah_blah) | http://foo.com/blah_blah_(wikipedia) | (Something like http://foo.com/blah_blah_(wikipedia)) | http://foo.com/blah_blah. | http://foo.com/blah_blah/. | <http://foo.com/blah_blah> | <http://foo.com/blah_blah/> | http://foo.com/blah_blah, | http://www.example.com/wpstyle/?p=364. | http://?df.ws/123 | rdar://1234 | rdar:/1234 | http://userid:[email protected]:8080 | http://[email protected] | http://[email protected]:8080 | http://userid:[email protected]
Does not match
no_ws.example.com | no_proto_or_ws.com | /relative_resource.php
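Extracting phone numbers, emails and similar items with regular expressions (Python):
#To pull emails, URLs, mobile numbers, IP addresses and the like out of free text, regular expressions are the tool of choice.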
import re


# Extract email addresses from text
def get_findAll_emails(text):
    """
    :param text: input text
    :return: list of email addresses
    """
    # re.IGNORECASE lets the lowercase character classes match uppercase letters too
    emails = re.findall(r"[a-z0-9\.\-+_]+@[a-z0-9\.\-+_]+\.[a-z]+", text, re.IGNORECASE)
    return emails


# Extract mobile numbers from text
def get_findAll_mobiles(text):
    """
    :param text: input text
    :return: list of mobile numbers
    """
    # 11 digits starting with 1; the lookarounds stop a match from
    # starting or ending inside a longer run of digits
    mobiles = re.findall(r"(?<!\d)1\d{10}(?!\d)", text)
    return mobiles


# Extract URLs from text
def get_findAll_urls(text):
    """
    :param text: input text
    :return: list of URLs
    """
    # First alternative: URLs with an http(s) scheme.
    # Second alternative: scheme-less hosts such as www.example.com/path
    # (the first dot is escaped so that it matches a literal "." only).
    # No capturing groups are used, so findall() returns whole matches.
    urls = re.findall(
        r"http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*,]|(?:%[0-9a-fA-F][0-9a-fA-F]))+"
        r"|[a-zA-Z]+\.\w+\.+[a-zA-Z0-9\/_]+",
        text,
    )
    return urls


# Extract IPv4 addresses from text
def get_findAll_ips(text):
    """
    :param text: input text
    :return: list of IP addresses
    """
    ips = re.findall(r"\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b", text)
    return ips


if __name__ == '__main__':
    content = "Please 42.121.252.58:443 contact 127.0.0.1 15988455173 us 18720071239 https://blog.csdn.net/u013421629/ at https://www.yiibai.com/ [email protected] for further information [email protected] You can also give feedback at [email protected]"
    emails = get_findAll_emails(text=content)
    print(emails)
    mobiles = get_findAll_mobiles(text=content)
    print(mobiles)
    urls = get_findAll_urls(text=content)
    print(urls)
    ips = get_findAll_ips(text=content)
    print(ips)
Output:
['[email protected]', '[email protected]', '[email protected]']
['15988455173', '18720071239']
['https://blog.csdn.net/u013421629/', 'https://www.yiibai.com/']
['42.121.252.58', '127.0.0.1']
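As an alternative to encoding the 0-255 range check inside the IP pattern itself, one possible approach is to let a loose regex collect candidates and have the standard-library `ipaddress` module validate them. This is a swapped-in technique, not the method used in the script above. The names `CANDIDATE` and `find_valid_ipv4` and the sample text are assumptions of this sketch.

```python
import ipaddress
import re

# Loose candidate pattern: four dot-separated groups of 1-3 digits.
# Range checking is deliberately left to the ipaddress module.
CANDIDATE = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")


def find_valid_ipv4(text):
    """Return the candidates in text that ipaddress accepts as IPv4 addresses."""
    valid = []
    for candidate in CANDIDATE.findall(text):
        try:
            ipaddress.IPv4Address(candidate)
        except ValueError:
            # Raised for out-of-range octets such as 999
            continue
        valid.append(candidate)
    return valid


print(find_valid_ipv4("Please 42.121.252.58:443 contact 127.0.0.1 or 999.1.1.1"))
```

Here `999.1.1.1` is picked up by the loose pattern but dropped by the validation step, so only the first two addresses are returned.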