近期在采集一个网站的时候遇到一部分的页面是使用JS代码来填充数据的,代码如下:
案件级别:var s = getDictLabel([{"id":"4191c4842b3842749dd467655f90b1fa","isNewRecord":false,"remarks":"案件级别","createDate":"2016-11-29 15:19:34","updateDate":"2016-11-29 15:19:34","value":"1","label":"一般事件","type":"case_grade","description":"案件级别","sort":1,"parentId":"0"},{"id":"5e24fd12f3384f38ab10898013fe25d7","isNewRecord":false,"createDate":"2016-11-29 15:19:35","updateDate":"2016-11-29 15:19:35","value":"2","label":"紧急事件","type":"case_grade","description":"案件级别","sort":2,"parentId":"0"},{"id":"f793a78eb44c405b9fd007359df09579","isNewRecord":false,"createDate":"2016-11-29 15:19:36","updateDate":"2016-11-29 15:19:36","value":"3","label":"一般重复","type":"case_grade","description":"案件级别","sort":3,"parentId":"0"},{"id":"3f2605e3f6e648f688b79da5f730aedb","isNewRecord":false,"createDate":"2016-11-29 15:19:34","updateDate":"2016-11-29 15:19:34","value":"4","label":"紧急重复","type":"case_grade","description":"案件级别","sort":4,"parentId":"0"}], 1, '', true);
document.write(s);
受理时间: 2018-08-05 13:39:22对上下文的源码进行分析,找到 getDictLabel 这个JavaScript的函数实现代码如下:
function getDictLabel(data, value, defaultValue){
for (var i=0; i
var row = data[i];
if (row.value == value){
return row.label;
}
}
return defaultValue;
}
通过一番了解,可以获取到 td 标签中间的 script 对应的JS代码,通过执行JS代码获取对应的数据。找到了一个Python的库 pyexecjs 可以实现,可以通过如下代码进行安装:
pip install PyExecJS
核心代码如下:
import execjs
def exec_js_function(js):
# 编译JS代码
ctx = execjs.compile("""
function getDictLabel(data, value, defaultValue){
for (var i = 0; i < data.length; i++){
var row = data[i];
if (row.value == value){
return row.label;
}
}
return defaultValue;
}
""")
# 删除一些无关的字符
jscode = js.replace('document.write(s);', '').replace(', true);', ')').replace('var s =', '')
# 执行代码
return ctx.eval(jscode)
标签: JS Python
顶一下
(0)
0%
踩一下
(0)
0%