Python Web Scraping: Two Ways to Crack the JS-Encrypted Parameters of Youdao Dictionary, with a Summary of the Approach

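The full code is at the end.
Approach 1 summary: capture the request and locate the four encrypted parameters
'salt': salt,
'sign': sign,
'ts': ts,
'bv': bv,
Work out how these four values are generated (the JS uses MD5, a timestamp, and a random number).
Send the request with these four values, together with the Cookie, Referer and User-Agent headers.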

var r = function(e) {
    var t = n.md5(navigator.appVersion)
      , r = "" + (new Date).getTime()
      , i = r + parseInt(10 * Math.random(), 10);
    return {
        ts: r,
        bv: t,
        salt: i,
        sign: n.md5("fanyideskweb" + e + i + "n%A-rKaT5fb[Gy?;N5@Tj")
    }
};

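From this we can see that salt is i, and i is r followed by a random integer from 0 to 9, where r is the millisecond timestamp as a string.
ts is r, the timestamp.

navigator.appVersion is, in Chrome, the User-Agent string without its leading "Mozilla/" prefix, so bv is the MD5 of that string.
Setting a breakpoint shows that e is the request parameter i (for example i='hello'), that is, the text to translate.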
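
The relationships read off the JS above can be sketched in Python as follows. This is a minimal sketch rather than the site's own code: build_params, app_version, now_ms and digit are hypothetical names introduced here so that the output can be made deterministic, and the secret string is simply the one visible in the JS snippet, which may change whenever the site is updated.

```python
import hashlib
import random
import time

CLIENT = "fanyideskweb"
SECRET = "n%A-rKaT5fb[Gy?;N5@Tj"  # copied from the JS above; may go stale


def md5_hex(text):
    """Hex MD5 digest of a UTF-8 encoded string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def app_version(user_agent):
    """Approximate Chrome's navigator.appVersion: the UA without 'Mozilla/'."""
    prefix = "Mozilla/"
    if user_agent.startswith(prefix):
        return user_agent[len(prefix):]
    return user_agent


def build_params(text, user_agent, now_ms=None, digit=None):
    """Build ts, bv, salt and sign; now_ms and digit can be fixed for testing."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    if digit is None:
        digit = random.randint(0, 9)  # same range as parseInt(10 * Math.random(), 10)
    ts = str(now_ms)
    salt = ts + str(digit)
    return {
        "ts": ts,
        "bv": md5_hex(app_version(user_agent)),
        "salt": salt,
        "sign": md5_hex(CLIENT + text + salt + SECRET),
    }


if __name__ == "__main__":
    ua = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
    print(build_params("hello", ua, now_ms=1566611355042, digit=7))
```

Passing now_ms and digit explicitly makes it easy to compare the result against values captured from the browser for the same timestamp and random digit.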

Approach 2 summary: run the JS file with execjs

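import execjs

# youdao.js is expected to contain the signing function taken from the site's JS
with open('youdao.js', 'r', encoding='utf-8') as f:
    js = f.read()

ctx = execjs.compile(js)
# 'function' is a placeholder: use the name of the function defined in youdao.js
result = ctx.call('function', 'apple')
print(result)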

Code

import json
import random
import time
import hashlib
import requests


headers = {
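    'Accept': 'application/json, text/javascript, */*; q=0.01',
    'Accept-Encoding': 'gzip, deflate',
    'Accept-Language': 'zh-CN,zh;q=0.9',
    'Connection': 'keep-alive',
    # Content-Length is left out on purpose: requests computes it from the body,
    # and a hard-coded value would be wrong for most inputs.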
    'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'Cookie': 'JSESSIONID=abcaVFNZrEtSV583CqbZw; _ntes_nnid=6927f0d06e9bbabddaa221862ff48a3c,1566610413099; OUTFOX_SEARCH_USER_ID_NCOO=1609575443.4062839; [email protected]; ___rl__test__cookies=1566611355038',
    'Host': 'fanyi.youdao.com',
    'Origin': 'http://fanyi.youdao.com',
    'Referer': 'http://fanyi.youdao.com/',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36',
    'X-Requested-With': 'XMLHttpRequest',
}


# Work out salt, ts and sign


def make_md5(string):
    """Return the hex MD5 digest of a string."""
    string = string.encode("utf-8")
    md5 = hashlib.md5(string).hexdigest()
    return md5


def get_content(e):
    ts = str(int(time.time() * 1000))
    salt = ts + str(random.randint(0, 9))  # ts plus one random digit, e.g. 15666113550427
    # navigator.appVersion: the User-Agent without the leading "Mozilla/"
    ua = "5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36"
    bv = make_md5(ua)

    # JS: sign = md5("fanyideskweb" + e + i + "n%A-rKaT5fb[Gy?;N5@Tj"), where i is salt
    sign = "fanyideskweb" + e + salt + "n%A-rKaT5fb[Gy?;N5@Tj"
    sign = make_md5(sign)
    # print(salt)
    url = "http://fanyi.youdao.com/translate_o?smartresult=dict&smartresult=rule"
    data = {
        'i': e,
        'from': 'AUTO',
        'to': 'AUTO',
        'smartresult': 'dict',
        'client': 'fanyideskweb',
        'salt': salt,
        'sign': sign,
        'ts': ts,
        'bv': bv,
        'doctype': 'json',
        'version': '2.1',
        'keyfrom': 'fanyi.web',
        'action': 'FY_BY_CLICKBUTTION',
    }

    # print(data)
    response = requests.post(url=url, headers=headers, data=data).text
    return response

def handle_content(result):
    """
    Parse the JSON response and print the translated text.
    :return:
    """
    # print(result)
    result = json.loads(result)
    print(result["translateResult"][0][0]["tgt"])

def main():
    while True:
        input_content = input("Enter the text to translate: ")
        result = get_content(input_content)
        handle_content(result)


if __name__ == '__main__':
    main()
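
handle_content above indexes straight into translateResult, so a rejected request surfaces only as a KeyError. A slightly more defensive parser might look like the sketch below. The name extract_translation is hypothetical, the sample payload is made up for illustration, and both the errorCode field and the reading of the nested lists as paragraphs of segments are assumptions about the response format rather than something this article shows.

```python
import json


def extract_translation(raw):
    """Join every 'tgt' fragment found under translateResult."""
    payload = json.loads(raw)
    if "translateResult" not in payload:
        # Assumption: a rejected request carries an errorCode instead of a result.
        raise ValueError(
            "no translateResult in response, errorCode=%r" % (payload.get("errorCode"),)
        )
    lines = []
    for paragraph in payload["translateResult"]:
        # Assumption: each inner list holds the segments of one paragraph.
        lines.append("".join(segment["tgt"] for segment in paragraph))
    return "\n".join(lines)


if __name__ == "__main__":
    sample = '{"translateResult": [[{"src": "hello", "tgt": "你好"}]], "errorCode": 0}'
    print(extract_translation(sample))
```

Raising ValueError with the error code keeps the failure visible when the sign or cookies are no longer accepted, which is the most likely way this kind of scraper breaks.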

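I am short on time, so for now I am sharing only the first method; when I have time I will finish the code for the second method and share it as well.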
