Web Scraping: Baidu Translate (with automatic Chinese/English detection)

Disclaimer: the following content is intended only to give web-scraping beginners some ideas; any commercial use is prohibited.


First, let's analyze the workflow.

Baidu Translate on PC:

The POST request body carries the following fields:

    from: zh
    to: en
    query: 今天天气不错
    transtype: enters
    simple_means_flag: 3
    sign: 728535.1048294
    token: 57815b74809f509d4c8d2c3b6f66f622

The sign parameter is hard to work out for now, so let's try the mobile version instead.

Baidu Translate on mobile:

The POST request body carries the following fields:

    query: 今天天气不错
    from: zh
    to: en

So we choose to simulate the mobile client. Simple and clean.


Now analyze the page:

File: langdetect

A POST request is sent to: http://fanyi.baidu.com/langdetect

It automatically detects the language and returns {"error":0,"msg":"success","lan":"zh"}, where "zh" means Chinese.
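
To confirm this, here is a minimal sketch that calls the detection endpoint directly (it assumes the requests library and reuses the mobile User-Agent from the code further below; the URL and response fields are the ones observed above):

import requests
import json

# Quick check of the language-detection endpoint
detect_url = 'http://fanyi.baidu.com/langdetect'
headers = {
    'User-Agent': 'Mozilla/5.0 (iPhone; CPU iPhone OS 9_1 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko) Version/9.0 Mobile/13B143 Safari/601.1'}
response = requests.post(detect_url, headers=headers, data={'query': '今天天气不错'})
# Expected response: {"error":0,"msg":"success","lan":"zh"}
print(json.loads(response.content.decode())['lan'])  # -> 'zh' for Chinese input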


File: basetrans

A POST request is sent to: http://fanyi.baidu.com/basetrans

It returns the translated content:

{"errno":0,"from":"zh","to":"en","trans":[{"dst":"It's a nice day today","prefixWrap":0,"src":"\u4eca\u5929\u5929\u6c14\u4e0d\u9519","relation":[],"result":[[0,"It's a nice day today",["0|18"],[],["0|18"],["0|21"]]]}],"dict":[],"keywords":[{"means":["right","correct","not bad","ok"],"word":"\u4e0d\u9519"}]}

The translated text we want is the dst field: "It's a nice day today".

With the approach clear, let's get to the code.

import requests
import json

search_data = '今天天气不错'
# Mobile User-Agent so that Baidu serves the simpler mobile API
headers = {
    'User-Agent': 'Mozilla/5.0 (iPhone; CPU iPhone OS 9_1 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko) Version/9.0 Mobile/13B143 Safari/601.1'}
url = 'http://fanyi.baidu.com/basetrans'
data = {'query': search_data,
        'from': 'zh',
        'to': 'en'}

response = requests.post(url, headers=headers, data=data)
res = response.content.decode()
# The translated text sits in trans[0]['dst'] of the returned JSON
real_data = json.loads(res)['trans'][0]['dst']
print("The translation of {} is: {}".format(search_data, real_data))

The code above does not detect whether the input is Chinese or English; it simply translates zh to en directly.

Below is a version with automatic Chinese/English detection:

import requests
import json


class Translation(object):
    """Translate text via Baidu's mobile API, auto-detecting Chinese/English."""

    def __init__(self, search):
        self.search = search
        self.url = 'http://fanyi.baidu.com/basetrans'
        self.headers = {
            'User-Agent': 'Mozilla/5.0 (iPhone; CPU iPhone OS 9_1 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko) Version/9.0 Mobile/13B143 Safari/601.1'
        }

    def langdetect(self):
        """Detect the language of the input and build the translation request body."""
        detect_url = 'http://fanyi.baidu.com/langdetect'
        set_data = {'query': self.search}
        # Ask the detection endpoint and decode its JSON response
        response = requests.post(detect_url, headers=self.headers, data=set_data)
        detect_json = response.content.decode()
        # Language of the input text: 'zh' or 'en'
        detect = json.loads(detect_json)['lan']
        # Pick the target language based on the detected one
        to = 'zh' if detect == 'en' else 'en'
        search_data = {'query': self.search,
                       'from': detect,
                       'to': to}
        return search_data

    def request_data(self, search_data):
        """POST the translation request and return the raw JSON text."""
        response = requests.post(self.url, headers=self.headers, data=search_data).content.decode()
        return response

    def parse_data(self, response):
        """Parse the JSON response and print the translated text."""
        data_dict = json.loads(response)
        data = data_dict['trans'][0]['dst']
        print("The translation of {} is: {}".format(self.search, data))

    def run(self):
        """Run the whole flow."""
        # 1. Detect the input language and build the POST body
        search_data = self.langdetect()
        # 2. Send the POST request and get the response
        response = self.request_data(search_data)
        # 3. Parse the returned JSON and print the translation
        self.parse_data(response)


if __name__ == '__main__':
    translation = Translation('今天天气不错')
    translation.run()
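
Because the request body is built from the langdetect result, the same class also handles English input without any changes. A quick usage sketch (the exact wording of the returned translation depends on Baidu's service):

Translation('nice to meet you').run()  # detected as 'en', translated en -> zh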

