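Disclaimer: this article is written purely out of personal interest and is not intended for any other use. If it infringes on your rights, contact me and I will remove it.
For a simple crawler, anyone with basic Python can request the data, process it a little and save it. In my view, what really separates crawler skill levels is JS reverse engineering. To narrow the gap between me and the experts, I went and studied JavaScript specifically (see my notes via the link). So far I have worked through quite a few sites that use JS encryption. I have gone from having no idea where to start when I ran into JS encryption to actively enjoying such sites, and I have grown a lot along the way, though I am still far behind the experts. This post is purely a record of my own methods and tricks. It should be of some help to beginners; if it is not for you, please be kind.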
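Target site: Baidu Translate (fanyi.baidu.com)
Translate anything, and capture the request with the Chrome developer tools. That is all we need.
This gives us the request URL and the submitted form parameters:
So where do we start? The simplest approach is to write the code directly: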
# -*- coding: utf-8 -*-
'''
@Author :Jason
When copying this code, remember to replace the cookie; mine...
'''
import json

import requests


def baidu_translate():
    url = "https://fanyi.baidu.com/v2transapi"
    headers = {
        "user-agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36",
        # Paste the cookie string captured in the developer tools here
        "cookie": "your cookie",
    }
    # sign and token are copied verbatim from the captured request
    data = {
        "from": "zh",
        "to": "en",
        "query": "今天",
        "transtype": "translang",
        "simple_means_flag": "3",
        "sign": "537032.840441",
        "token": "0821115fdc50d35f8637589add7b065d",
    }
    res = requests.post(url, data=data, headers=headers)
    translate = json.loads(res.text)
    print('query:', "今天", '\n', 'translate_result:', translate["trans_result"]["data"][0]["dst"])


if __name__ == "__main__":
    baidu_translate()
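Result:
This basically meets our needs already, but today's topic is JS debugging, so...
Pick a distinctive part of the request URL, such as v2transapi, and search for it in the Sources panel.
Double-click the JS file to open it, then click the {} button to pretty-print the JS.
After formatting, click a line number on the left to set a breakpoint.
Send the translation request again. Execution pauses at the breakpoint, where you can inspect the parameters (generated values, passed values and so on):
Work upward step by step: it is an ajax request submitted via POST, and some of its parameters come from the code above. The submitted parameter is p; further up, p is a JSON object whose sign and token fields are not yet set. Stepping into y(a) shows that it is actually the e(a) function above.
The initial value of i:
Copy that code to a local file and debug it with execjs. First, the basic usage of execjs: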
import execjs

a = execjs.get().name
print(a)  # Node.js (V8)

# Compile a JS snippet into a context object
ctx = execjs.compile('''
function add(x, y) {
    return (x + y);
}
''')
print(ctx.call("add", 1, 2))  # 3
Put simply, it lets Python and JavaScript work together:
bd.js
function e(r) {
    // i is the seed value observed in the debugger
    var i = "537032.840441";
    // stub so that the window[l] lookup below does not throw outside a browser
    var window = {};
    var o = r.match(/[\uD800-\uDBFF][\uDC00-\uDFFF]/g);
    if (null === o) {
        // inputs longer than 30 characters are cut down to head + middle + tail
        var t = r.length;
        t > 30 && (r = "" + r.substr(0, 10) + r.substr(Math.floor(t / 2) - 5, 10) + r.substr(-10, 10))
    } else {
        // The site's helper a(...) is not defined in this file; split("") already
        // returns an array, so it is passed to push.apply directly.
        for (var e = r.split(/[\uD800-\uDBFF][\uDC00-\uDFFF]/), C = 0, h = e.length, f = []; h > C; C++)
            "" !== e[C] && f.push.apply(f, e[C].split("")),
            C !== h - 1 && f.push(o[C]);
        var g = f.length;
        g > 30 && (r = f.slice(0, 10).join("") + f.slice(Math.floor(g / 2) - 5, Math.floor(g / 2) + 5).join("") + f.slice(-10).join(""))
    }
    // l evaluates to "gtk"
    var u = void 0
      , l = "" + String.fromCharCode(103) + String.fromCharCode(116) + String.fromCharCode(107);
    u = null !== i ? i : (i = window[l] || "") || "";
    // S collects the UTF-8 bytes of r
    for (var d = u.split("."), m = Number(d[0]) || 0, s = Number(d[1]) || 0, S = [], c = 0, v = 0; v < r.length; v++) {
        var A = r.charCodeAt(v);
        128 > A ? S[c++] = A : (2048 > A ? S[c++] = A >> 6 | 192 : (55296 === (64512 & A) && v + 1 < r.length && 56320 === (64512 & r.charCodeAt(v + 1)) ? (A = 65536 + ((1023 & A) << 10) + (1023 & r.charCodeAt(++v)),
        S[c++] = A >> 18 | 240,
        S[c++] = A >> 12 & 63 | 128) : S[c++] = A >> 12 | 224,
        S[c++] = A >> 6 & 63 | 128),
        S[c++] = 63 & A | 128)
    }
    // F evaluates to "+-a^+6" and D to "+-3^+b+-f"
    for (var p = m, F = "" + String.fromCharCode(43) + String.fromCharCode(45) + String.fromCharCode(97) + ("" + String.fromCharCode(94) + String.fromCharCode(43) + String.fromCharCode(54)), D = "" + String.fromCharCode(43) + String.fromCharCode(45) + String.fromCharCode(51) + ("" + String.fromCharCode(94) + String.fromCharCode(43) + String.fromCharCode(98)) + ("" + String.fromCharCode(43) + String.fromCharCode(45) + String.fromCharCode(102)), b = 0; b < S.length; b++)
        p += S[b],
        p = n(p, F);
    return p = n(p, D),
    p ^= s,
    0 > p && (p = (2147483647 & p) + 2147483648),
    p %= 1e6,
    p.toString() + "." + (p ^ m)
}
/* function n() is called by e() above */
function n(r, o) {
    for (var t = 0; t < o.length - 2; t += 3) {
        var a = o.charAt(t + 2);
        a = a >= "a" ? a.charCodeAt(0) - 87 : Number(a),
        a = "+" === o.charAt(t + 1) ? r >>> a : r << a,
        r = "+" === o.charAt(t) ? r + a & 4294967295 : r ^ a
    }
    return r
}
Call this JS file to generate the sign value:
import execjs

with open(r"./bd.js", "r", encoding="utf-8") as f:
    ctx = execjs.compile(f.read())

sign = ctx.call("e", "今天")
print(sign)
The generated sign value:
The sign value we need:
As you can see, the two are identical.
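If a JavaScript runtime is not available, the same computation can also be reproduced in plain Python. The following is a minimal sketch of a port of e() and n() from bd.js, not code taken from the site. The names to_int32, mix and baidu_sign are hypothetical, the seed is passed in as an argument instead of being hard-coded as i, and the seed string in the usage line is only an illustrative value. It assumes the input contains no unpaired surrogates, in which case the byte loop in e() reduces to ordinary UTF-8 encoding.

```python
def to_int32(x: int) -> int:
    """Wrap a Python int to a signed 32-bit value, like JavaScript's ToInt32."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x >= 0x80000000 else x


def mix(r: int, ops: str) -> int:
    """Port of n(r, o): apply the 3-character operations encoded in ops."""
    for t in range(0, len(ops) - 2, 3):
        c = ops[t + 2]
        # 'a'..'f' mean 10..15, digits mean themselves
        shift = ord(c) - 87 if c >= "a" else int(c)
        if ops[t + 1] == "+":
            a = (r & 0xFFFFFFFF) >> shift   # JS: r >>> shift
        else:
            a = to_int32(r << shift)        # JS: r << shift
        if ops[t] == "+":
            r = to_int32(r + a)             # JS: r + a & 4294967295
        else:
            r = to_int32(r ^ a)             # JS: r ^ a
    return r


def baidu_sign(query: str, gtk: str) -> str:
    """Port of e(r); gtk is the 'digits.digits' seed that bd.js stores in i."""
    n = len(query)
    if n > 30:
        # keep the first 10, the middle 10 and the last 10 characters
        query = query[:10] + query[n // 2 - 5:n // 2 + 5] + query[-10:]
    head, _, tail = gtk.partition(".")
    m, s = int(head or 0), int(tail or 0)
    p = m
    for byte in query.encode("utf-8"):
        p = mix(p + byte, "+-a^+6")
    p = mix(p, "+-3^+b+-f")
    p = to_int32(p ^ s)
    if p < 0:
        p = (p & 0x7FFFFFFF) + 0x80000000
    p %= 1000000
    return f"{p}.{p ^ m}"


# Illustrative seed, not taken from this article
print(baidu_sign("hello", "320305.131321201"))  # 54706.276099
```

Comparing the output of this port with the output of ctx.call("e", ...) for a few inputs, using the same seed in both, is a quick way to check that the two agree before dropping the execjs dependency.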
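The other parameters can be worked out in the same way, so I will not go through them one by one. Here we use the sign parameter directly: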
# -*- coding: utf-8 -*-
"""
Purpose: Baidu Translate
Feature: translation between Chinese and English
"""
import json
import re

import execjs
import requests


def baidu_translate():
    url = "https://fanyi.baidu.com/v2transapi"
    headers = {
        "user-agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36",
        "cookie": "your cookie",
    }
    zh_en = re.compile('[\u4e00-\u9fa5]+')  # detects Chinese characters in the input
    # Compile bd.js once instead of on every loop iteration
    with open(r"./bd.js", "r", encoding="utf-8") as f:
        ctx = execjs.compile(f.read())
    while True:
        words = input("Enter the text to translate (## to quit): ")
        if words == "##":
            print("Exited successfully!")
            break
        if zh_en.search(words):
            fr = "zh"
            to = "en"
        else:
            fr = "en"
            to = "zh"
        sign = ctx.call("e", words)
        data = {
            "from": fr,
            "to": to,
            "query": words,
            "transtype": "translang",
            "simple_means_flag": "3",
            "sign": sign,
            "token": "0821115fdc50d35f8637589add7b065d",
        }
        res = requests.post(url, data=data, headers=headers)
        translate = json.loads(res.text)
        try:
            print('query:', words, '\r\n', 'translate_result:', translate["trans_result"]["data"][0]["dst"])
        except (KeyError, IndexError, TypeError):
            print("The response looks wrong, please check")


if __name__ == "__main__":
    baidu_translate()
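The lookup translate["trans_result"]["data"][0]["dst"] fails whenever the server does not return a translation, so it can help to isolate it in a small helper that reports what actually came back. This is only a sketch: extract_translation is a hypothetical name and both sample payloads are made up for illustration. Only the trans_result / data / dst path comes from the script above.

```python
def extract_translation(payload: dict) -> str:
    """Return the translated text, or raise ValueError with the raw payload."""
    try:
        return payload["trans_result"]["data"][0]["dst"]
    except (KeyError, IndexError, TypeError) as exc:
        # Surface the whole payload so the caller can see what the server sent
        raise ValueError(f"unexpected response shape: {payload!r}") from exc


# Made-up payloads, shaped like the path used in the script
ok_payload = {"trans_result": {"data": [{"src": "今天", "dst": "today"}]}}
bad_payload = {"trans_result": {"data": []}}

print(extract_translation(ok_payload))
try:
    extract_translation(bad_payload)
except ValueError as err:
    print("failed:", err)
```

Raising instead of printing keeps the decision about what to do next (retry, refresh the cookie, stop) with the caller.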
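Reverse-engineering workflow:
(1) Pick a distinctive keyword from the encrypted parameters or the request URL; just avoid generic ones such as http, : or www.
(2) In the Sources panel, press Ctrl + Shift + F (or open the search feature some other way) and enter your keyword to locate the JS code.
(3) Set a breakpoint: click the line number once; a blue marker means the breakpoint is set.
(4) Send the request again. At the breakpoint you can generally see how your value is generated, which functions are called, and so on.
(5) Copy the relevant original JS code (functions, variables and so on) into a separate .js file.
(6) Call it from Python with execjs first, to check whether it reproduces the expected behavior.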
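Step (1) can be illustrated with a few lines of Python. This is only a sketch of the idea: pick_search_keywords and the COMMON_WORDS set are hypothetical, and in practice choosing a keyword is a judgment call made by eye.

```python
from urllib.parse import urlparse

# Words too generic to be useful as a search term (an illustrative list)
COMMON_WORDS = {"http", "https", "www", "com", "cn", "api", "index"}


def pick_search_keywords(url: str) -> list:
    """Return the path segments of url that are not generic words."""
    segments = [seg for seg in urlparse(url).path.split("/") if seg]
    return [seg for seg in segments if seg.lower() not in COMMON_WORDS]


print(pick_search_keywords("https://fanyi.baidu.com/v2transapi"))  # ['v2transapi']
```

Only the path is considered here because the scheme and host appear in almost every script on the page, which makes them poor search terms.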