Calling the Google Translate web endpoint

I recently needed to run a large batch of text through Google Translate. With that much data, driving a browser with selenium was clearly not realistic.

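It would be nice to just call the official API, but that API is paid. With no other option, I had to work it out myself and see whether the web interface could be reverse-engineered, so I took a look at the endpoint the Google Translate page uses.

Right-click in the browser and choose Inspect, and you can see that the translation goes through an endpoint: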

https://translate.google.com/translate_a/single?client=webapp&sl=auto&tl=en&hl=zh-CN&dt=at&dt=bd&dt=ex&dt=ld&dt=md&dt=qca&dt=rw&dt=rm&dt=ss&dt=t&ssel=5&tsel=5&kc=1&tk={}&q={}

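Here tk is a ticket used as an anti-scraping measure, and q is the text you want to translate, so the key question is where tk comes from. If you are interested, look at the JavaScript on the translation page: tk is generated on the front end, in JS, by running the text to be translated through a series of encryption-style operations. Someone has already reverse-engineered it, and the code that generates tk is pasted below. Call get_tk() directly, passing in the text you want to translate.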

#!/usr/bin/env python2
# -*- coding: UTF-8 -*-

"""This module creates the TK GET parameter for Google translate.

This is just a code port to python. All credits should go to the original
creators of the code @tehmaestro and @helen5106.

For more info see: https://github.com/Stichoza/google-translate-php/issues/32

Usage:
    Call this python script from the command line.

        $ python tk_generator.py <word>

    Use this module from another python script.

        >>> import tk_generator
        >>> tk_generator.get_tk('dog')

Attributes:
    _ENCODING (string): Default encoding to be used during the string
        encode-decode process.

"""

__all__ = ["get_tk"]

import sys
from datetime import datetime


_ENCODING = "UTF-8"


# Helper functions
def _mb_strlen(string):
    """Get the length of the encoded string."""
    return len(string.decode(_ENCODING))


def _mb_substr(string, start, length):
    """Get substring from the encoded string."""
    return string.decode(_ENCODING)[start: start + length]

##################################################


def _shr32(x, bits):
    if bits <= 0:
        return x

    if bits >= 32:
        return 0

    x_bin = bin(x)[2:]
    x_bin_length = len(x_bin)

    if x_bin_length > 32:
        x_bin = x_bin[x_bin_length - 32: x_bin_length]

    if x_bin_length < 32:
        x_bin = x_bin.zfill(32)

    return int(x_bin[:32 - bits].zfill(32), 2)


def _char_code_at(string, index):
    return ord(_mb_substr(string, index, 1))


#OLD Function
def _generateB():
    start = datetime(1970, 1, 1)
    now = datetime.now()

    diff = now - start

    return int(diff.total_seconds() / 3600)


def _TKK():
    """Replacement for _generateB function."""
    return [406604, 1836941114]


def _RL(a, b):
    for c in range(0, len(b) - 2, 3):
        d = b[c + 2]

        if d >= 'a':
            d = _char_code_at(d, 0) - 87
        else:
            d = int(d)

        if b[c + 1] == '+':
            d = _shr32(a, d)
        else:
            d = a << d

        if b[c] == '+':
            a = a + d & (pow(2, 32) - 1)
        else:
            a = a ^ d

    return a


def _TL(a):
    #b = _generateB()
    tkk = _TKK()
    b = tkk[0]

    d = []

    for f in range(0, _mb_strlen(a)):
        g = _char_code_at(a, f)

        if g < 128:
            d.append(g)
        else:
            if g < 2048:
                d.append(g >> 6 | 192)
            else:
                if ((g & 0xfc00) == 0xd800 and
                        f + 1 < _mb_strlen(a) and
                        (_char_code_at(a, f + 1) & 0xfc00) == 0xdc00):

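                    # NOTE: reassigning f here does not skip the next iteration of a
                    # Python for-loop (unlike the original JavaScript), so the low
                    # surrogate is processed again on the next pass.
                    f += 1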
                    g = 0x10000 + ((g & 0x3ff) << 10) + (_char_code_at(a, f) & 0x3ff)

                    d.append(g >> 18 | 240)
                    d.append(g >> 12 & 63 | 128)
                else:
                    d.append(g >> 12 | 224)
                    d.append(g >> 6 & 63 | 128)

            d.append(g & 63 | 128)

    a = b

    for e in range(0, len(d)):
        a += d[e]
        a = _RL(a, "+-a^+6")

    a = _RL(a, "+-3^+b+-f")

    a = a ^ tkk[1]

    if a < 0:
        a = (a & (pow(2, 31) - 1)) + pow(2, 31)

    a %= pow(10, 6)

    return "%d.%d" % (a, a ^ b)


def get_tk(word):
    """Returns the tk parameter for the given word."""
    if isinstance(word, unicode):
        word = word.encode(_ENCODING)

    return _TL(word)


if __name__ == '__main__':
    # if len(sys.argv) != 2:
    #     print "Usage: %s <word>" % sys.argv[0]
    #     sys.exit(1)
    #
    # print "%s=%s" % (sys.argv[1], get_tk(sys.argv[1]))

    print get_tk("你 好")
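
The script above targets Python 2 (unicode, print statements). For readers on Python 3, here is a minimal sketch of the same calculation. It assumes that the byte list the script builds by hand is simply the UTF-8 encoding of the input, and that the hard-coded TKK pair is still the one to use; the post does not confirm either point for the current service. The names MASK32, TKK and _rl are mine, not part of the original module.

```python
MASK32 = 0xFFFFFFFF
TKK = (406604, 1836941114)  # hard-coded pair copied from _TKK() in the script above


def _rl(a, ops):
    """Apply a string of 3-character op codes (combine, direction, amount) to a."""
    for i in range(0, len(ops) - 2, 3):
        combine, direction, amount = ops[i], ops[i + 1], ops[i + 2]
        shift = int(amount, 36)  # '3' -> 3, '6' -> 6, 'a' -> 10, 'b' -> 11, 'f' -> 15
        # '+' means an unsigned 32-bit right shift, anything else a left shift
        d = (a & MASK32) >> shift if direction == "+" else a << shift
        # '+' means 32-bit wrapping addition, anything else XOR
        a = (a + d) & MASK32 if combine == "+" else a ^ d
    return a


def get_tk(text, tkk=TKK):
    """Return the tk value for text (a str), mirroring _TL() above."""
    b = tkk[0]
    a = b
    # Assumption: iterating the UTF-8 bytes replaces the manual encoding loop.
    for byte in text.encode("utf-8"):
        a += byte
        a = _rl(a, "+-a^+6")
    a = _rl(a, "+-3^+b+-f")
    a ^= tkk[1]
    a %= 10 ** 6
    return "%d.%d" % (a, a ^ b)


if __name__ == "__main__":
    print(get_tk("你 好"))
```

Because Python integers do not wrap, the sketch masks to 32 bits where the JavaScript original relies on 32-bit arithmetic. Check its output against a tk captured from the browser before relying on it.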

The rough flow for calling Google Translate looks like this:

# Generate tk
tk = get_tk(content)
url = "https://translate.google.com/translate_a/single?client=webapp&sl=auto&tl=en&hl=zh-CN&dt=at&dt=bd&dt=ex&dt=ld&dt=md&dt=qca&dt=rw&dt=rm&dt=ss&dt=t&ssel=5&tsel=5&kc=1&tk={}&q={}".format(tk, content)

header = {"Referer": "https://translate.google.com/",
                  "User-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36",
                  "sec-fetch-site": "same-origin",
                  "sec-fetch-mode": "cors",
                  "x-client-data": "CJG2yQEIorbJAQjBtskBCKmdygEIqKPKAQjiqMoBCJetygEIza3KAQjyrcoBCMuuygEIyq/KAQ==",
                  "cookie": "ANID=AHWqTUmEUOyWZxeXZaHhE0-5lGBTHOVeMH3NP6elG0Zi4kx3VkGky6oE7hNyEZk_; _ga=GA1.3.123605499.1565587980; _gid=GA1.3.1811799555.1565587980; NID=188=nikTUMCm2LqPiIFv2Qr6mAjrzE6jONn4_LBFYRYhdDPrXaNU5KA6336tSkKY67wWHbbLAYwL86RuyqqJE1bWp6ltY-RS27-hjNxx0F2GN5pF2NKJ9mDT25hPU5XBEMjllYomR9TvdBR7u0Lig-Mra7to_bFGWWW2nq3wog9TAdA; 1P_JAR=2019-08-12-09"
                  }
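# self._session and self._xpather are attributes of the author's own crawler class
# (presumably a requests.Session and a response parser).
resp = self._session.get(url, headers=header, timeout=30)
resp.raise_for_status()  # raises on any 4xx/5xx status, so no separate status check is needed
data = self._xpather.parse(url, resp.content)
print data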

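It is best to send the request headers along with the request. One more caveat: when the content to translate is too long, use POST instead of GET and send content as the request body.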
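
As a rough sketch of that GET/POST switch (Python 3): the helper below builds the query string with urlencode, so that characters such as & or spaces in the text do not break the URL, and falls back to POST for long input. The function name build_request, the MAX_GET_CHARS threshold and the form field name q in the POST body are my assumptions; the post only says "too long" and "content as the request body".

```python
from urllib.parse import urlencode

BASE_URL = "https://translate.google.com/translate_a/single"
# Hypothetical cut-off; the post gives no number for "too long".
MAX_GET_CHARS = 1000


def build_request(content, tk, target="en"):
    """Return (method, url, form_body) for one translation request."""
    # A list of pairs keeps the repeated dt parameters in their original order.
    params = [("client", "webapp"), ("sl", "auto"), ("tl", target), ("hl", "zh-CN")]
    params += [("dt", v) for v in
               ("at", "bd", "ex", "ld", "md", "qca", "rw", "rm", "ss", "t")]
    params += [("ssel", "5"), ("tsel", "5"), ("kc", "1"), ("tk", tk)]

    if len(content) <= MAX_GET_CHARS:
        # Short text: everything, including q, goes in the query string.
        return "GET", BASE_URL + "?" + urlencode(params + [("q", content)]), None

    # Long text: keep tk and the flags in the URL, move the text into the body.
    # Assumption: the server accepts it as a form field named q.
    return "POST", BASE_URL + "?" + urlencode(params), {"q": content}
```

With a requests session, the result can be passed on as session.request(method, url, data=body, headers=header, timeout=30).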

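In my tests this does translate correctly, but it still does not hold up for bulk translation in a short period. After roughly 20 consecutive translations in a short time the IP gets blocked; the block lasts about 5 minutes and then lifts automatically. If you need to make many calls in a short time, you can try a proxy pool. I tried one as well, but I could not reach this endpoint through the proxies, possibly because of the proxies themselves. If you are interested, see whether you can find a better approach.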
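
Short of a proxy pool, one way to live with the block is to slow down and wait it out. The sketch below (Python 3) pauses between requests and sleeps through a cool-down when a call fails. The function name translate_batch, the translate_one callable and the default timings are all hypothetical; the 300-second cool-down is only a guess based on the roughly 5-minute block observed above, and I have not measured which pause actually avoids the block.

```python
import time


def translate_batch(items, translate_one, pause=3.0, cooldown=300.0,
                    max_retries=2, sleep=time.sleep):
    """Translate items one by one, backing off when a request fails.

    translate_one is any callable that takes one text and returns its
    translation, raising an exception when the request is rejected.
    """
    results = []
    for item in items:
        for attempt in range(max_retries + 1):
            try:
                results.append(translate_one(item))
                break
            except Exception:
                if attempt == max_retries:
                    raise  # still failing after all retries: give up
                sleep(cooldown)  # wait for the temporary block to lift
        sleep(pause)  # space requests out to stay under the limit
    return results
```

The sleep parameter exists only so the timing can be replaced in tests; in normal use leave it at time.sleep.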
