Getting past Youdao Translate's anti-crawler checks

1. Find the API endpoint in the browser DevTools Network tab

[Screenshot 1]

2. Analyze the request parameters

[Screenshot 2]
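Copying the Form Data from the Network tab ("view source") gives a raw URL-encoded body. The helper below is a minimal sketch using only the standard library that turns such a body into a dict so the parameters can be read and compared. The function name is hypothetical and the sample body is made up and shortened.

```python
from typing import Dict
from urllib.parse import parse_qsl


def parse_form_body(raw_body: str) -> Dict[str, str]:
    """Turn a raw application/x-www-form-urlencoded body into a dict.

    parse_qsl decodes %XX escapes and "+" (as a space); keep_blank_values
    keeps parameters that were sent with an empty value.
    """
    return dict(parse_qsl(raw_body, keep_blank_values=True))


if __name__ == "__main__":
    # Hypothetical, shortened body as copied from "view source" of Form Data.
    body = "i=%E4%BD%A0%E5%A5%BD&from=AUTO&to=AUTO&client=fanyideskweb&doctype=json"
    for name, value in parse_form_body(body).items():
        print(f"{name}: {value}")
```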

3. Refresh the page several times to separate the fixed parameters from the ones that change:

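i: the text to translate

from: source language

to: target language

smartresult: "smart result" option, fixed value

client: client identifier, fixed value

salt: salt used in signing, to be determined

sign: signature string, to be determined

ts: timestamp in milliseconds

bv: an MD5 value of unknown origin, fixed value

doctype: document type, fixed value

version: version, fixed value

keyfrom: key source, fixed value

action: action type, fixed value

typoResult: whether to report typos, fixed value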
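The comparison in step 3 can also be done in code by diffing several captured bodies. This sketch assumes each capture has already been parsed into a dict (for example with urllib.parse.parse_qsl); the function name and sample values are hypothetical. If the same text is translated on every refresh, i will show up as fixed even though it is really an input.

```python
from typing import Dict, List, Set, Tuple


def split_fixed_and_varying(captures: List[Dict[str, str]]) -> Tuple[Set[str], Set[str]]:
    """Split parameter names into (fixed, varying) across several captured requests.

    A name is fixed if every capture contains it with the same value;
    otherwise (different values, or missing from some capture) it is varying.
    """
    all_keys: Set[str] = set().union(*captures)
    fixed: Set[str] = set()
    varying: Set[str] = set()
    for key in all_keys:
        values = {capture.get(key) for capture in captures}
        if len(values) == 1 and None not in values:
            fixed.add(key)
        else:
            varying.add(key)
    return fixed, varying


if __name__ == "__main__":
    # Hypothetical captures from two refreshes, values shortened.
    first = {"i": "hello", "client": "fanyideskweb", "ts": "1600000000000", "salt": "16000000000003"}
    second = {"i": "hello", "client": "fanyideskweb", "ts": "1600000004321", "salt": "16000000043218"}
    fixed, varying = split_fixed_and_varying([first, second])
    print("fixed:", sorted(fixed))      # fixed: ['client', 'i']
    print("varying:", sorted(varying))  # varying: ['salt', 'ts']
```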

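4. Search the page's scripts for sign. It turns up in http://shared.ydstatic.com/fanyi/newweb/v1.0.18/scripts/newweb/fanyi.min.js, where the helper r builds ts, bv, salt and sign and recordUpdate uses it to build a request. The relevant code is: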

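var r = function(e) {                                  // e: the text being translated
    var t = n.md5(navigator.appVersion)                // bv: MD5 of navigator.appVersion
      , r = "" + (new Date).getTime()                  // ts: millisecond timestamp as a string
      , i = r + parseInt(10 * Math.random(), 10);      // salt: ts followed by one random digit 0-9
    return {
        ts: r,
        bv: t,
        salt: i,
        sign: n.md5("fanyideskweb" + e + i + "97_3(jkMYg@T[KZQmqjTK")  // MD5 of client + text + salt + secret
    }
};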

t.recordUpdate = function(e) {
    var t = e.i
      , i = r(t);
    n.ajax({
        type: "POST",
        contentType: "application/x-www-form-urlencoded; charset=UTF-8",
        url: "/bettertranslation",
        data: {
            i: e.i,
            client: "fanyideskweb",
            salt: i.salt,
            sign: i.sign,
            ts: i.ts,
            bv: i.bv,
            tgt: e.tgt,
            modifiedTgt: e.modifiedTgt,
            from: e.from,
            to: e.to
        },
        success: function(e) {},
        error: function(e) {}
    })
}
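The secret at the end of the sign expression and the v1.0.18 path belong to one release of the script and may change later. Below is a hedged sketch for re-finding the secret in a newer copy of fanyi.min.js, assuming the expression keeps the shape shown above; minified variable names are matched loosely, the pattern and function name are hypothetical, and downloading the script is left out.

```python
import re
from typing import Optional

# Assumption: the sign call still looks like md5("fanyideskweb" + <var> + <var> + "<secret>"),
# with or without spaces around "+"; the variable names themselves may change.
SIGN_PATTERN = re.compile(
    r'md5\(\s*"fanyideskweb"\s*\+\s*\w+\s*\+\s*\w+\s*\+\s*"([^"]+)"\s*\)'
)


def extract_sign_secret(js_source: str) -> Optional[str]:
    """Return the trailing string constant of the sign md5 call, or None if not found."""
    match = SIGN_PATTERN.search(js_source)
    return match.group(1) if match else None


if __name__ == "__main__":
    snippet = 'sign: n.md5("fanyideskweb" + e + i + "97_3(jkMYg@T[KZQmqjTK")'
    print(extract_sign_secret(snippet))  # 97_3(jkMYg@T[KZQmqjTK
```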

5. Work out how the changing parameters are generated

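i: the text to translate, up to its first 5000 characters

salt: the current millisecond timestamp followed by one random digit from 0 to 9

sign: the MD5 of "fanyideskweb" + i + salt + "97_3(jkMYg@T[KZQmqjTK"

ts: the current millisecond timestamp

bv: the MD5 of navigator.appVersion, so it stays fixed for a given browser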
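The recipe above can be written as a pure function, which makes it easy to check: passing a fixed timestamp and digit gives reproducible output. This is a sketch, not Youdao's code, and the function names are hypothetical. It assumes (as holds for Chrome) that navigator.appVersion equals the User-Agent string without its leading "Mozilla/", so bv can be derived from the same User-Agent that is sent in the headers.

```python
import hashlib
import random
import time
from typing import Dict, Optional

# Secret copied from the fanyi.min.js excerpt; it is tied to that release.
SIGN_SECRET = "97_3(jkMYg@T[KZQmqjTK"


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def app_version_from_user_agent(user_agent: str) -> str:
    """Approximate Chrome's navigator.appVersion: the UA minus a leading "Mozilla/"."""
    prefix = "Mozilla/"
    return user_agent[len(prefix):] if user_agent.startswith(prefix) else user_agent


def build_sign_params(text: str, user_agent: str,
                      ts_ms: Optional[int] = None,
                      digit: Optional[int] = None) -> Dict[str, str]:
    """Rebuild ts, bv, salt and sign following the recipe in step 5.

    ts_ms and digit default to "now" and a random digit 0-9, mirroring
    (new Date).getTime() and parseInt(10 * Math.random(), 10); passing
    them explicitly makes the result reproducible.
    """
    ts = str(int(time.time() * 1000) if ts_ms is None else ts_ms)
    if digit is None:
        digit = random.randint(0, 9)  # inclusive bounds: 0..9, like the JS
    salt = ts + str(digit)
    return {
        "ts": ts,
        "bv": md5_hex(app_version_from_user_agent(user_agent)),
        "salt": salt,
        "sign": md5_hex("fanyideskweb" + text + salt + SIGN_SECRET),
    }
```

For example, build_sign_params("你好,有道!", ua) with the Chrome User-Agent from step 6 yields the four values the form needs; pinning ts_ms and digit is mainly useful for comparing against a request captured in the browser.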

6. Implement the Youdao API scraper

import hashlib
import random
import time

import requests

# Chrome's navigator.appVersion is its User-Agent string minus the leading "Mozilla/"
NAVIGATOR_APP_VERSION = "5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"
USER_AGENT = "Mozilla/" + NAVIGATOR_APP_VERSION


def generateSaltSign(e):
    """Python port of the r() helper from fanyi.min.js; e is the text to translate."""
    t = hashlib.md5(NAVIGATOR_APP_VERSION.encode("utf-8")).hexdigest()  # bv
    r = str(int(time.time() * 1000))  # ts: millisecond timestamp
    i = r + str(random.randint(0, 9))  # salt: ts + one digit 0-9, like parseInt(10 * Math.random(), 10)
    return {
        "ts": r,
        "bv": t,
        "salt": i,
        "sign": hashlib.md5(("fanyideskweb" + e + i + "97_3(jkMYg@T[KZQmqjTK").encode("utf-8")).hexdigest(),
    }


def spider(i):
    url = 'http://fanyi.youdao.com/translate_o?smartresult=dict&smartresult=rule'
    r = generateSaltSign(i)

    data = {
        "i": i,
        "from": "AUTO",
        "to": "AUTO",
        "smartresult": "dict",
        "client": "fanyideskweb",
        "salt": r["salt"],
        "sign": r["sign"],
        "ts": r["ts"],
        "bv": r["bv"],
        "doctype": "json",
        "version": "2.1",
        "keyfrom": "fanyi.web",
        "action": "FY_BY_REALTlME",  # copied verbatim from the browser request (note the lowercase "l")
    }
    # requests form-encodes the dict itself, so no manual urlencode is needed

    headers = {
        # Copy the Cookie header from your own browser session; the server may reject requests without it
        "Cookie": "[email protected];",
        "Referer": "http://fanyi.youdao.com/",
        "User-Agent": USER_AGENT,  # must match NAVIGATOR_APP_VERSION so that bv is consistent
    }
    response = requests.post(url=url, data=data, headers=headers)
    print(response.text)


if __name__ == '__main__':
    i = "你好,有道!"
    spider(i)
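The scraper above only prints the raw response. Below is a sketch for pulling the translated text out of it. The JSON shape is an assumption based on responses observed from this endpoint, not a documented contract: an errorCode field (0 on success) and a translateResult list of paragraphs, each a list of {"src": ..., "tgt": ...} segments. The function name is hypothetical.

```python
import json
from typing import List


def extract_translations(response_text: str) -> List[str]:
    """Return the "tgt" strings from an (assumed) translate_o JSON response.

    Raises RuntimeError when errorCode is missing or non-zero, which is
    assumed to mean the request (for example its sign or cookie) was rejected.
    """
    payload = json.loads(response_text)
    code = payload.get("errorCode")
    if code != 0:
        raise RuntimeError(f"Youdao returned errorCode {code}")
    return [
        segment["tgt"]
        for paragraph in payload.get("translateResult", [])
        for segment in paragraph
    ]
```

Inside spider, response.text could be passed to extract_translations instead of being printed.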

 
