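Regular expressions: re.findall()
findall(pattern, string[, flags])
Returns all non-overlapping matches of pattern in string as a list, including empty matches. If the pattern contains one capturing group, the list holds the text matched by that group. If it contains more than one group, each item in the list is a tuple with one string per group.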
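The three return shapes described above can be sketched as follows. The sample string and variable names are made up for illustration and are not part of the original examples.

```python
import re

text = "a1 b22 c333"  # hypothetical sample input

# No capturing group: each item is the whole match.
no_group = re.findall(r"[a-z]\d+", text)

# One capturing group: each item is the text of that group only.
one_group = re.findall(r"[a-z](\d+)", text)

# Two capturing groups: each item is a tuple with one string per group.
two_groups = re.findall(r"([a-z])(\d+)", text)

# A pattern that can match the empty string also yields empty matches.
empty_matches = re.findall(r"\d*", "a1")

print(no_group)       # ['a1', 'b22', 'c333']
print(one_group)      # ['1', '22', '333']
print(two_groups)     # [('a', '1'), ('b', '22'), ('c', '333')]
print(empty_matches)  # contains '' entries as well as '1'
```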
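Flags:
Flag | Description |
---|---|
A or ASCII | Make \w, \W, \b, \B, \d, \D, \s and \S match ASCII characters only (Python 3 only) |
I or IGNORECASE | Perform case-insensitive matching |
L or LOCALE | Make \w, \W, \b and \B depend on the current locale |
M or MULTILINE | Make ^ and $ match at the start and end of every line as well as at the start and end of the whole string (by default, ^ and $ apply only to the start and end of the whole string) |
S or DOTALL | Make the dot (.) match any character, including a newline |
U or UNICODE | Make \w, \W, \b and \B follow the Unicode character properties database (only needed in Python 2; Python 3 uses Unicode by default for str patterns) |
X or VERBOSE | Ignore unescaped whitespace and comments in the pattern string |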
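A minimal sketch of how some of these flags change the result of findall. The input strings are invented for illustration; flags are combined with the | operator.

```python
import re

text = "Alert\nalert(1)\nALERT"  # hypothetical three-line input

# IGNORECASE: letters match regardless of case.
ignore_case = re.findall(r"alert", text, re.IGNORECASE)

# MULTILINE: ^ matches at the start of every line, not only of the string.
line_starts = re.findall(r"^a\w+", text, re.MULTILINE | re.IGNORECASE)

# DOTALL: . also matches the newline character.
without_dotall = re.findall(r"Alert.alert", text)
with_dotall = re.findall(r"Alert.alert", text, re.DOTALL)

# VERBOSE: unescaped whitespace and # comments in the pattern are ignored.
verbose = re.findall(
    r"""
    alert   # the function name
    \(      # a literal opening parenthesis
    \d+     # one or more digits
    \)      # a literal closing parenthesis
    """,
    text,
    re.VERBOSE,
)

# ASCII: \w no longer matches non-ASCII letters such as the accented e.
unicode_word = re.findall(r"\w+", "caf\u00e9")
ascii_word = re.findall(r"\w+", "caf\u00e9", re.ASCII)

print(ignore_case)     # ['Alert', 'alert', 'ALERT']
print(line_starts)     # ['Alert', 'alert', 'ALERT']
print(without_dotall)  # []
print(with_dotall)     # ['Alert\nalert']
print(verbose)         # ['alert(1)']
print(ascii_word)      # ['caf']
```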
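Note on the return format: findall finds every substring of the string that the regular expression matches and returns them as a list. If nothing matches, it returns an empty list.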
Example:
import re
print('4',re.findall("[<>,\'\"/]", '<,<>/', re.IGNORECASE)) # matches any one of the six characters < > , ' " /
4 ['<', ',', '<', '>', '/']
url1='/0/252/report/reportServlet?action=4&url=http://127.0.0.1&file=wait_trace.raq&columns=0&srcType=file&width=-1&height=-1&cachedId=A_2&t_i_m_e=&frame=stu_saveAs_frame--%3E%3C/sCrIpT%3E%3CsCrIpT%3Ealert(42873)%3C/sCrIpT%3E'
print('1',re.findall("(alert)|(script=)(%3c)|(%3e)|(%20)|(onerror)|(onload)|(eval)|(src=)|(prompt)",url1,re.IGNORECASE))
print('2',re.findall("(alert)|(script=)|(%3c)|(%3e)|(%20)|(onerror)|(onload)|(eval)|(src=)|(prompt)",url1,re.IGNORECASE))
1 [('', '', '', '%3E', '', '', '', '', '', ''), ('', '', '', '%3E', '', '', '', '', '', ''), ('', '', '', '%3E', '', '', '', '', '', ''), ('alert', '', '', '', '', '', '', '', '', ''), ('', '', '', '%3E', '', '', '', '', '', '')]
2 [('', '', '', '%3E', '', '', '', '', '', ''), ('', '', '%3C', '', '', '', '', '', '', ''), ('', '', '', '%3E', '', '', '', '', '', ''), ('', '', '%3C', '', '', '', '', '', '', ''), ('', '', '', '%3E', '', '', '', '', '', ''), ('alert', '', '', '', '', '', '', '', '', ''), ('', '', '%3C', '', '', '', '', '', '', ''), ('', '', '', '%3E', '', '', '', '', '', '')]
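Note: each match adds one tuple to the list, in the order the matches appear in the string; groups that did not take part in a match are filled with empty strings. In pattern 1 the | between (script=) and (%3c) is missing, so %3c can only match directly after script=, which is why output 1 contains no '%3C' entries.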
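When only the matched text is of interest, the empty-string padding can be avoided. The sketch below shows two options, using a shortened, hypothetical URL fragment rather than url1: non-capturing groups, or re.finditer with group(0).

```python
import re

# Shortened, hypothetical input for illustration.
url = "frame--%3E%3C/sCrIpT%3Ealert(1)"

# Non-capturing groups (?:...) keep the alternation but define no groups,
# so findall returns plain strings instead of padded tuples.
plain = re.findall(r"(?:alert)|(?:%3c)|(?:%3e)", url, re.IGNORECASE)

# Alternatively, keep the capturing groups and read the whole match
# from each match object produced by finditer.
pattern = re.compile(r"(alert)|(%3c)|(%3e)", re.IGNORECASE)
whole = [m.group(0) for m in pattern.finditer(url)]

# lastindex is the number of the group that matched (1 = alert, 2 = %3c, 3 = %3e).
which_group = [m.lastindex for m in pattern.finditer(url)]

print(plain)        # ['%3E', '%3C', '%3E', 'alert']
print(whole)        # ['%3E', '%3C', '%3E', 'alert']
print(which_group)  # [3, 2, 3, 1]
```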
url3='script=%3cscriptoioio%3c%3e'
url4='script=%3cscript=oioio%3c%3e'
print('3',re.findall("(alert)|(script=)(%3c)|(%3e)",url3,re.IGNORECASE))
print('4',re.findall("(alert)|(script=)|(%3c)|(%3e)",url3,re.IGNORECASE))
print('5',re.findall("(alert)|(script=)(%3c)|(%3e)",url4,re.IGNORECASE))
print('6',re.findall("(alert)|(script=)|(%3c)|(%3e)",url4,re.IGNORECASE))
3 [('', 'script=', '%3c', ''), ('', '', '', '%3e')]
4 [('', 'script=', '', ''), ('', '', '%3c', ''), ('', '', '%3c', ''), ('', '', '', '%3e')]
5 [('', 'script=', '%3c', ''), ('', '', '', '%3e')]
6 [('', 'script=', '', ''), ('', '', '%3c', ''), ('', 'script=', '', ''), ('', '', '%3c', ''), ('', '', '', '%3e')]
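Note: (script=)(%3c) requires script=%3c to appear as one contiguous sequence, and script= and %3c each occupy their own slot in the tuple. The pattern therefore has four groups, so each tuple in output 5, [('', 'script=', '%3c', ''), ('', '', '', '%3e')], contains four elements. The second script= in url4 is not followed by %3c, so it produces no match.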
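The tuple length follows from the number of capturing groups in the pattern, which a compiled pattern exposes as its groups attribute. A small sketch, with test strings made up for illustration:

```python
import re

pattern = re.compile(r"(alert)|(script=)(%3c)|(%3e)", re.IGNORECASE)

# The pattern defines four capturing groups, so every tuple has four slots.
group_count = pattern.groups

# "script=" immediately followed by "%3c": groups 2 and 3 are both filled.
adjacent = pattern.findall("script=%3c")

# "script=" followed by something else: the second alternative fails, and a
# lone "%3c" is not an alternative on its own, so nothing matches.
separated = pattern.findall("script=x%3c")

print(group_count)  # 4
print(adjacent)     # [('', 'script=', '%3c', '')]
print(separated)    # []
```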