Matching newlines with Python regular expressions

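By default, the . in a Python regular expression does not match a newline. So what do you do with a JavaScript string like the one below, which spans several lines?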

js2py, used below, is a library that runs JavaScript from Python. Here it evaluates the extracted snippet to piece together the real URL.

import re
import js2py

txt = '''
(new Image()).src = 'https://weixin.sogou.com/approve?uuid=' + 'b9be9b04-7bcd-4a70-b412-70e1eb33fd1c' + '&token=' + '0177FFA5CCF44B442226BA55C2563A922371B60D5DF19CE0' + '&from=inner';

    setTimeout(function () {
        var url = '';
        url += 'http://mp.w';
        url += 'eixin.qq.co';
        url += 'm/s?src=11&';
        url += 'timestamp=1';
        url += '576115412&v';
        url += 'er=2029&sig';
        url += 'nature=3OfX';
        url += 'g*vTl0xc6Uv';
        url += 'afcTMAEg9B8';
        url += 'Ed0UQLlh744';
        url += '19o9uA1j0KFuh1W99OnNadkNegwwNkr5B7kI4g7k9vQzqb-BPoSoEESUUcMlerw99vocCRWur0Fp9fVATo*2aTRYiUo&new=1';
        url.replace("@", "");
        window.location.replace(url)
    },100);      
'''

# `.*?` is used here, but by default `.` does not match newlines
url_var = re.search(r'(var url.*?url\.replace\("@", ""\);)', txt).group(1)
url_rendered = js2py.eval_js(url_var)
print(url_rendered)

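Run as written, the code above raises an error: the . cannot cross the line breaks, so re.search returns None, and calling .group(1) on None raises AttributeError.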
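The failure is easy to reproduce on a smaller input. The sketch below uses a hypothetical two-line string, sample (not part of the original script), to show that re.search finds nothing when . would have to cross a newline, and that .group(1) then fails.

```python
import re

# Hypothetical two-line sample standing in for the longer JavaScript text.
sample = "var url = '';\nurl.replace(\"@\", \"\");"

# Without re.DOTALL, `.` stops at the newline, so the pattern cannot match.
match = re.search(r'(var url.*?url\.replace\("@", ""\);)', sample)
print(match)  # None

try:
    match.group(1)
except AttributeError as exc:
    # Calling .group() on None is what produces the error in the script above.
    error_message = str(exc)
    print(error_message)
```

Checking the result of re.search for None before calling .group() is one way to turn this into a clearer error message.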

The first fix is to use [\s\S]*? in place of .*?. The character class [\s\S] matches any character at all, newlines included.

import re
import js2py

txt = '''
(new Image()).src = 'https://weixin.sogou.com/approve?uuid=' + 'b9be9b04-7bcd-4a70-b412-70e1eb33fd1c' + '&token=' + '0177FFA5CCF44B442226BA55C2563A922371B60D5DF19CE0' + '&from=inner';

    setTimeout(function () {
        var url = '';
        url += 'http://mp.w';
        url += 'eixin.qq.co';
        url += 'm/s?src=11&';
        url += 'timestamp=1';
        url += '576115412&v';
        url += 'er=2029&sig';
        url += 'nature=3OfX';
        url += 'g*vTl0xc6Uv';
        url += 'afcTMAEg9B8';
        url += 'Ed0UQLlh744';
        url += '19o9uA1j0KFuh1W99OnNadkNegwwNkr5B7kI4g7k9vQzqb-BPoSoEESUUcMlerw99vocCRWur0Fp9fVATo*2aTRYiUo&new=1';
        url.replace("@", "");
        window.location.replace(url)
    },100);      
'''

# `[\s\S]*?` is used here, so the match can span newlines
url_var = re.search(r'(var url[\s\S]*?url\.replace\("@", ""\);)', txt).group(1)
url_rendered = js2py.eval_js(url_var)
print(url_rendered)
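To see the difference in isolation, here is a minimal sketch that runs both patterns against a short hypothetical string (sample is an illustration, not taken from the script above).

```python
import re

# Hypothetical three-line sample.
sample = "start\nmiddle\nend"

# `.` refuses to cross the newlines, so this search finds nothing.
dot_match = re.search(r'start.*?end', sample)

# `[\s\S]` is "whitespace or non-whitespace", i.e. any character at all.
any_match = re.search(r'start[\s\S]*?end', sample)

print(dot_match)                     # None
print(any_match.group(0) == sample)  # True
```

Because [\s\S] is an ordinary character class, it can be used in just one part of a pattern while . keeps its default meaning elsewhere.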

The second fix is to set re.DOTALL, which makes . match newlines as well:

import re

txt = '''
(new Image()).src = 'https://weixin.sogou.com/approve?uuid=' + 'b9be9b04-7bcd-4a70-b412-70e1eb33fd1c' + '&token=' + '0177FFA5CCF44B442226BA55C2563A922371B60D5DF19CE0' + '&from=inner';

    setTimeout(function () {
        var url = '';
        url += 'http://mp.w';
        url += 'eixin.qq.co';
        url += 'm/s?src=11&';
        url += 'timestamp=1';
        url += '576115412&v';
        url += 'er=2029&sig';
        url += 'nature=3OfX';
        url += 'g*vTl0xc6Uv';
        url += 'afcTMAEg9B8';
        url += 'Ed0UQLlh744';
        url += '19o9uA1j0KFuh1W99OnNadkNegwwNkr5B7kI4g7k9vQzqb-BPoSoEESUUcMlerw99vocCRWur0Fp9fVATo*2aTRYiUo&new=1';
        url.replace("@", "");
        window.location.replace(url)
    },100);      
'''

pattern = re.compile(r'(var url.*?url\.replace\("@", ""\);)', re.DOTALL)
res = pattern.search(txt).group(1)
print(res)
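The same flag can be written in a few equivalent ways. The sketch below, again on a hypothetical short string, compares re.DOTALL, its short alias re.S, and the inline form (?s) placed at the start of the pattern.

```python
import re

# Hypothetical three-line sample.
sample = "start\nmiddle\nend"

# Flag passed as an argument.
with_flag = re.search(r'start.*?end', sample, re.DOTALL)

# re.S is the short alias for re.DOTALL.
with_alias = re.search(r'start.*?end', sample, re.S)

# Inline flag: (?s) at the start of the pattern turns DOTALL on for all of it.
with_inline = re.search(r'(?s)start.*?end', sample)

for m in (with_flag, with_alias, with_inline):
    print(repr(m.group(0)))
```

Note that the flag changes the meaning of every . in the pattern, whereas [\s\S] only affects the spot where it is written.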
