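re.findall is often used together with split() and replace().
Basic usage of re.findall: findall(pattern, string, flags=0)
returns a list of all non-overlapping matches found in the string. The r prefix on a pattern marks a raw string, so backslashes are not processed as Python escape sequences.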
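What each list item looks like depends on how many capture groups the pattern has. The following is a minimal sketch of the three cases; the date string is made up for illustration.

```python
import re

# Hypothetical sample text, used only for illustration.
text = "2020-01-31, 2020-02-01"

# No capture group: each item is the whole match.
whole = re.findall(r"\d{4}-\d{2}-\d{2}", text)

# One capture group: each item is just that group.
years = re.findall(r"(\d{4})-\d{2}-\d{2}", text)

# Several capture groups: each item is a tuple of the groups.
parts = re.findall(r"(\d{4})-(\d{2})-(\d{2})", text)

print(whole)  # ['2020-01-31', '2020-02-01']
print(years)  # ['2020', '2020']
print(parts)  # [('2020', '01', '31'), ('2020', '02', '01')]
```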
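Example 1: note that by default "." does not match "\n", so when the string contains a newline the pattern cannot reach the characters after it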
import re
text="123\n456"
print(re.findall("1(.*)3",text)[0])
print(re.findall("1(.*)4",text)[0])
# The second print raises IndexError: list index out of range, because findall returns []
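If the match is supposed to cross the newline, one option is the re.DOTALL flag, which lets "." match "\n" as well. The sketch below reuses the string from Example 1 and also guards against an empty result instead of indexing blindly.

```python
import re

text = "123\n456"

# Default behaviour: "." stops at the newline, so nothing matches.
default_matches = re.findall(r"1(.*)4", text)
print(default_matches)  # []

# re.DOTALL lets "." match "\n" too.
dotall_matches = re.findall(r"1(.*)4", text, flags=re.DOTALL)
print(dotall_matches)   # ['23\n']

# Check for an empty list before taking the first element.
first = dotall_matches[0] if dotall_matches else None
```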
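Example 2: a greedy pattern returns a list with only one element
import re
# The <p> tags are assumed here; a holds three paragraphs of HTML
a = "<p>do you like dancing?</p><p>please go with me</p><p>please go with me</p>"
b = re.findall(r"<p>(.*)</p>", a)
# (.*) matches a string of any length and is greedy, so it runs to the last </p>
print(b)
# ['do you like dancing?</p><p>please go with me</p><p>please go with me']
Example 3: to get one element per paragraph, add a question mark to make the group non-greedy; this returns the text of each paragraph, similar to calling soup.select("p") on a BeautifulSoup object
b = re.findall(r"<p>(.*?)</p>", a)
print(b)
# ['do you like dancing?', 'please go with me', 'please go with me']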
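Example 4: ^ anchors the match at the start of the string, so the pattern below returns "https" only if the string begins with it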
regular_v2 = re.findall(r"^https","https://docs.python.org/3/whatsnew/3.6.html")
print (regular_v2)
# ['https']
Example 5: $ anchors the match at the end of the string, so this checks whether the string ends with "html"
regular_v3 = re.findall(r"html$","https://docs.python.org/3/whatsnew/3.6.html")
print (regular_v3)
# ['html']
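By default ^ and $ only look at the very start and end of the whole string. When the text holds several lines, the re.MULTILINE flag makes them match at every line boundary. A small sketch, using a made-up list of URLs (the example.com and example.org entries are placeholders):

```python
import re

# Hypothetical input: one URL per line.
urls = (
    "https://docs.python.org/3/whatsnew/3.6.html\n"
    "http://example.com/index.htm\n"
    "https://example.org/a.html"
)

# Without re.MULTILINE, ^ matches only at the start of the string.
start_only = re.findall(r"^https", urls)
print(start_only)   # ['https']

# With re.MULTILINE, ^ and $ also match at the start and end of each line.
line_starts = re.findall(r"^https", urls, flags=re.MULTILINE)
line_ends = re.findall(r"html$", urls, flags=re.MULTILINE)
print(line_starts)  # ['https', 'https']
print(line_ends)    # ['html', 'html']
```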
Example 6: [...] matches any single character listed inside the brackets
regular_v4 = re.findall(r"[tw]h","https://docs.python.org/3/whatsnew/3.6.html")
print (regular_v4)
# ['th', 'wh']
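Characters inside brackets are not separated by commas; a comma placed there becomes a member of the class itself. The sketch below shows the difference on a made-up string, along with ranges and negation, which use the same bracket syntax.

```python
import re

# Hypothetical sample containing ",h" to expose the difference.
sample = "a,h th wh"

# The comma is an ordinary member of the class, so ",h" matches too.
with_comma = re.findall(r"[t,w]h", sample)
print(with_comma)     # [',h', 'th', 'wh']

# List only the characters that should match.
without_comma = re.findall(r"[tw]h", sample)
print(without_comma)  # ['th', 'wh']

# Ranges and negation also work inside brackets.
in_range = re.findall(r"[a-c]", "abcd")
not_in_range = re.findall(r"[^a-c]", "abcd")
print(in_range)       # ['a', 'b', 'c']
print(not_in_range)   # ['d']
```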
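If the keyword has several characters and may appear with a space inside it (for example "我们" written as "我 们"),
first use replace to normalize every occurrence to "我们", then call re.findall("我们(.*)others",text)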
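The normalize-then-search step described above can be sketched as follows. The sample text is made up, and the second option (allowing optional whitespace in the pattern itself) is an alternative not mentioned in the original notes.

```python
import re

# Hypothetical text: the keyword appears with a space inside it.
text = "我 们 went home others"

# Option 1: normalize with replace first, then search.
normalized = text.replace("我 们", "我们")
after_replace = re.findall(r"我们(.*)others", normalized)
print(after_replace)   # [' went home ']

# Option 2 (an alternative): let the pattern tolerate optional whitespace.
with_optional_space = re.findall(r"我\s*们(.*)others", text)
print(with_optional_space)  # [' went home ']
```

Option 1 keeps the pattern simple; option 2 avoids modifying the text when the original spacing matters elsewhere.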
Example 7: \d and \D
\d is the regex shorthand for a single digit from 0 to 9; findall returns every match in a list
regular_v5 = re.findall(r"\d","https://docs.python.org/3/whatsnew/3.6.html")
regular_v6 = re.findall(r"\d\d\d","https://docs.python.org/3/whatsnew/3.6.html/1234")
print (regular_v5)
# ['3', '3', '6']
print (regular_v6)
# ['123']
Lowercase \d matches the digits 0-9; uppercase \D matches the opposite, returning every character that is not a digit
regular_v7 = re.findall(r"\D","https://docs.python.org/3/whatsnew/3.6.html")
print (regular_v7)
# ['h', 't', 't', 'p', 's', ':', '/', '/', 'd', 'o', 'c', 's', '.', 'p', 'y', 't', 'h', 'o', 'n', '.', 'o', 'r', 'g', '/', '/', 'w', 'h', 'a', 't', 's', 'n', 'e', 'w', '/', '.', '.', 'h', 't', 'm', 'l']
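Because \d matches one digit at a time, multi-digit numbers come back split into single characters. Adding a quantifier keeps each number in one piece; the sketch below reuses the URL from regular_v6.

```python
import re

url = "https://docs.python.org/3/whatsnew/3.6.html/1234"

# \d+ grabs each run of consecutive digits as one item.
runs = re.findall(r"\d+", url)
print(runs)     # ['3', '3', '6', '1234']

# \d{3} is shorthand for \d\d\d.
triples = re.findall(r"\d{3}", url)
print(triples)  # ['123']

# The items are strings; convert them before doing arithmetic.
numbers = [int(n) for n in runs]
print(max(numbers))  # 1234
```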
Example 8: \w and \W
\w matches a word character: lowercase a to z, uppercase A to Z, the digits 0 to 9, and the underscore
regular_v8 = re.findall(r"\w","https://docs.python.org/3/whatsnew/3.6.html")
print (regular_v8)
#['h', 't', 't', 'p', 's', 'd', 'o', 'c', 's', 'p', 'y', 't', 'h', 'o', 'n', 'o', 'r', 'g', '3', 'w', 'h', 'a', 't', 's', 'n', 'e', 'w', '3', '6', 'h', 't', 'm', 'l']
\W matches everything \w does not, that is, punctuation and other special symbols
regular_v9 = re.findall(r"\W","https://docs.python.org/3/whatsnew/3.6.html")
print (regular_v9)
# [':', '/', '/', '.', '.', '/', '/', '/', '.', '.']
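In practice \w is usually combined with + to pull out whole tokens. For str patterns in Python 3 it also matches non-ASCII letters unless the re.ASCII flag is set. A short sketch; the "snake_case 中文" string is a made-up sample.

```python
import re

url = "https://docs.python.org/3/whatsnew/3.6.html"

# \w+ groups consecutive word characters into tokens.
tokens = re.findall(r"\w+", url)
print(tokens)
# ['https', 'docs', 'python', 'org', '3', 'whatsnew', '3', '6', 'html']

# Hypothetical sample: \w accepts "_" and non-ASCII letters as well.
mixed = "snake_case 中文"
unicode_words = re.findall(r"\w+", mixed)
print(unicode_words)  # ['snake_case', '中文']

# re.ASCII restricts \w to [a-zA-Z0-9_].
ascii_words = re.findall(r"\w+", mixed, flags=re.ASCII)
print(ascii_words)    # ['snake_case']
```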
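Example 9: extracting the largest number from a web page
For example: take the confirmed and suspected case counts from the National Health Commission's latest update on the novel coronavirus pneumonia outbreak
The third occurrence of "确诊病例" on the page is not followed by a number, so the pattern requires a digit right after the label. The \d must sit inside the parentheses; otherwise the digit it matches is left out of the captured result
# text holds the page text; (\d+) captures the run of digits between the label and "例"
Diagnosed = re.findall(r"确诊病例(\d+)例",text)
Diagnosed = [int(i) for i in Diagnosed]
Diagnosed = max(Diagnosed)
suspected = re.findall(r"疑似病例(\d+)例", text)
suspected = [int(i) for i in suspected]
suspected = max(suspected)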
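To try the idea without fetching the page, here is a self-contained sketch. The sample sentence and its numbers are invented for illustration and are not real figures; max_count is a hypothetical helper name. It returns None when nothing matches, since max() on an empty list raises ValueError.

```python
import re

# Hypothetical excerpt with made-up numbers; the real text would come from the page.
text = "累计确诊病例120例,新增确诊病例35例,确诊病例中重症18例,疑似病例300例"

def max_count(label, page_text):
    """Return the largest number that directly follows label, or None if none is found."""
    pattern = re.escape(label) + r"(\d+)例"
    counts = [int(n) for n in re.findall(pattern, page_text)]
    return max(counts) if counts else None

diagnosed = max_count("确诊病例", text)
suspected = max_count("疑似病例", text)
print(diagnosed)  # 120
print(suspected)  # 300
```

The third "确诊病例" is followed by "中" rather than a digit, so it is skipped, which is the behavior the note above relies on.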
Reference:
https://www.cnblogs.com/xieshengsen/p/6727064.html