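When writing crawlers in Python day to day, the following approaches are typically used:
Approach 1 is the usual choice when scraping static pages that have no protection at all. Approach 2 is the fastest way to extract data, but an endpoint does not always return the JSON string we would like; it may instead return variable data defined in a snippet of JS code. Approach 3 is more of a headache. If we could run JS code from within Python, most of these troubles would go away. This article introduces a Python package that runs JS code directly from Python code and hands back the functions, data, or results of the JS code that we are interested in.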
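To make the second case concrete, here is a minimal sketch of why the standard `json` module alone is not enough when an endpoint returns a JS variable definition. The `payload` string is a made-up, shortened sample, and `try_parse_as_json` is a hypothetical helper used only for illustration.

```python
import json

# Hypothetical, shortened response body: a JS variable definition, not JSON.
payload = 'var db={chars:["a","b"],record:"10481"}'


def try_parse_as_json(text):
    """Return the parsed object, or None if the text is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


# The "var db=" prefix alone already makes this invalid JSON.
print(try_parse_as_json(payload))  # None

# Stripping the prefix is still not enough: keys such as chars are unquoted.
body = payload.split("=", 1)[1]
print(try_parse_as_json(body))  # None
```

Both attempts fail, which is the gap a JS interpreter running inside Python is meant to fill.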
js2py is one of the better libraries among the many that can run JS code in Python.
Install js2py as follows:
pip install js2py
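js2py has two key entry points:
Method | Explanation | Example |
js2py.eval_js(js_string) | Runs a string containing JS code (for example, the contents of a JS file) and returns the result | js_string='var a=10'; js2py.eval_js(js_string)  # outputs 10 |
js2py.EvalJs() | Creates an EvalJs object. Its execute() method runs a piece of JS code without returning a result and keeps the resulting variables and objects on the EvalJs object for later use; its eval() method runs a piece of JS code and returns the result | js_string='var a=10'; js_obj=js2py.EvalJs(); js_obj.execute(js_string); js_obj.a  # outputs 10 |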
import js2py
string='var db={chars:["a","b","c","d","f","g","h","j","k","m","n","p","q","r","s","t","w","x","y","z"],datas:[["005970","国泰消费优选股票","GTXFYXGP","1.9082","1.9082","1.8657","1.8657","0.0425","2.28","开放申购","开放赎回","","1","0","1","","1","0.15%","0.15%","1","1.50%"],["004069","南方中证全指证券公司ETF联接A","NFZZQZZQGSETFLJA","1.1438","1.1438","1.1212","1.1212","0.0226","2.02","开放申购","开放赎回","","1","0","2","","1","0.12%","0.12%","1","1.20%"]],count:["9981","3745","1653","4583"],record:"10481",pages:"5241",curpage:"1",indexsy:[-0.04,-0.34,-0.03,],showday:["2021-03-05","2021-03-04"]}'
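mydict=js2py.eval_js(string) # the object defined in string is returned as a dict-like js2py wrapper and assigned to mydict; call mydict.to_dict() for a plain Python dict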
string='''
function add(a, b) {
return a + b
}
'''
myadd=js2py.eval_js(string) # returns the add function and assigns it to myadd, which can then be called directly, e.g. myadd(1, 2)
import js2py
js_obj=js2py.EvalJs()
string='''
var a=10
function func(a,b){
return a*b
}
'''
js_obj.execute(string)
js_obj.a # outputs 10
js_obj.func # the func function
js_obj.func(3,4) # outputs 12
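Of course, Python has other libraries for running JS code, but in the author's experience js2py fits Python best overall, and the JS source can be used almost unchanged, so this is the library recommended here.
The library has many other methods and uses, but the scenarios above are the most common ones. Readers who need more or are simply curious can explore further; that said, mastering the methods above should cover roughly 90% of everyday needs.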