Disclaimer: the following is intended only to give web-scraping beginners some ideas. Commercial use of any kind is prohibited.
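First, analyze the request flow.
On the PC (desktop) version of Baidu Translate,
the POST request body carries the following fields: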
from:zh
to:en
query:今天天气不错
transtype:enters
imple_means_flag:3
sign:728535.1048294
token:57815b74809f509d4c8d2c3b6f66f622
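The sign value is hard to work out for now, so let's try the mobile version instead.
Mobile version of Baidu Translate
The POST request body carries the following fields: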
query:今天天气不错
from:zh
to:en
So we simulate requests from the mobile site instead, which is simple and straightforward.
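As a rough illustration of what "carrying" these three fields means, here is a minimal sketch of how they end up form-encoded in the POST body. The helper name build_form_body is made up for this example, and it assumes the endpoint accepts an ordinary application/x-www-form-urlencoded body, which is what requests sends when given data=dict.

```python
from urllib.parse import urlencode, parse_qs


def build_form_body(query, src="zh", dst="en"):
    """Encode query/from/to as a form body (hypothetical helper, not part of any Baidu API)."""
    return urlencode({"query": query, "from": src, "to": dst})


body = build_form_body("今天天气不错")
# The Chinese text is percent-encoded as UTF-8 bytes in the body
print(body)
# Decoding the body again gives back the original three fields
print(parse_qs(body))
```

Compared with the PC request, there is no sign or token to compute, so the whole body can be built from the text alone.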
Analyze the page:
File: langdetect
A POST request is sent to: http://fanyi.baidu.com/langdetect
It detects the language of the text automatically and returns {"error":0,"msg":"success","lan":"zh"}, where zh means Chinese.
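The langdetect reply can be turned into a translation direction with a few lines. This is only a sketch that works on the sample reply shown above, without touching the network. The function name pick_direction is invented here, and treating a non-zero error field as a failure is an assumption about the endpoint, not something confirmed.

```python
import json


def pick_direction(detect_json):
    """Return (source, target) language codes from a langdetect reply (hypothetical helper)."""
    reply = json.loads(detect_json)
    # Assumption: error == 0 means the detection succeeded
    if reply.get("error") != 0:
        raise ValueError("language detection failed: {}".format(reply.get("msg")))
    src = reply["lan"]
    # English goes to Chinese; everything else goes to English
    dst = "zh" if src == "en" else "en"
    return src, dst


sample = '{"error":0,"msg":"success","lan":"zh"}'
print(pick_direction(sample))
```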
File: basetrans
A POST request is sent to: http://fanyi.baidu.com/basetrans
It returns the translated content:
{"errno":0,"from":"zh","to":"en","trans":[{"dst":"It's a nice day today","prefixWrap":0,"src":"\u4eca\u5929\u5929\u6c14\u4e0d\u9519","relation":[],"result":[[0,"It's a nice day today",["0|18"],[],["0|18"],["0|21"]]]}],"dict":[],"keywords":[{"means":["right","correct","not bad","ok"],"word":"\u4e0d\u9519"}]}
The translated text we want is the dst value of the first entry in trans: It's a nice day today
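To make the structure of that reply concrete, here is a small offline sketch that parses the sample response shown above and pulls out the translated text. The function name extract_translations is invented for this example, and the errno check assumes that 0 means success, which the sample suggests but does not prove.

```python
import json

# The sample basetrans reply from above, kept as a raw string so the \u escapes reach the JSON parser
SAMPLE = r'''{"errno":0,"from":"zh","to":"en","trans":[{"dst":"It's a nice day today","prefixWrap":0,"src":"\u4eca\u5929\u5929\u6c14\u4e0d\u9519","relation":[],"result":[[0,"It's a nice day today",["0|18"],[],["0|18"],["0|21"]]]}],"dict":[],"keywords":[{"means":["right","correct","not bad","ok"],"word":"\u4e0d\u9519"}]}'''


def extract_translations(raw):
    """Return the dst text of every entry in trans (hypothetical helper)."""
    data = json.loads(raw)
    # Assumption: errno == 0 means the translation succeeded
    if data.get("errno") != 0:
        raise ValueError("translation failed, errno={}".format(data.get("errno")))
    return [item["dst"] for item in data["trans"]]


print(extract_translations(SAMPLE))
# json.loads also decodes the \u escapes in src back into the original Chinese text
print(json.loads(SAMPLE)["trans"][0]["src"])
```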
With the approach clear, let's write the code.
import requests
import json

search_data = '今天天气不错'
# Mobile User-Agent, so the request looks like it comes from the mobile site
headers = {
    'User-Agent': 'Mozilla/5.0 (iPhone; CPU iPhone OS 9_1 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko) Version/9.0 Mobile/13B143 Safari/601.1'}
url = 'http://fanyi.baidu.com/basetrans'
# POST form fields: text to translate, source language, target language
data = {'query': search_data,
        'from': 'zh',
        'to': 'en'}
response = requests.post(url, headers=headers, data=data)
res = response.content.decode()
# The translated text is the dst value of the first entry in trans
real_data = json.loads(res)['trans'][0]['dst']
print("Translation of {}: {}".format(search_data, real_data))
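The code above does not detect whether the text is Chinese or English; it always translates from Chinese to English.
Below is a second version that detects Chinese or English automatically.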
import requests
import json


class Translation:
    """Translate text through the mobile Baidu Translate endpoints."""

    def __init__(self, search):
        self.search = search
        self.url = 'http://fanyi.baidu.com/basetrans'
        self.headers = {
            'User-Agent': 'Mozilla/5.0 (iPhone; CPU iPhone OS 9_1 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko) Version/9.0 Mobile/13B143 Safari/601.1'
        }

    def langdetect(self):
        """Detect Chinese or English and build the POST data."""
        detect_url = 'http://fanyi.baidu.com/langdetect'
        set_data = {'query': self.search}
        # Get the JSON text of the response
        response = requests.post(detect_url, headers=self.headers, data=set_data)
        detect_json = response.content.decode()
        # Detected language: zh or en
        detect = json.loads(detect_json)['lan']
        # Choose the target language from the detected one and build the matching POST data
        to = 'zh' if detect == 'en' else 'en'
        search_data = {'query': self.search,
                       'from': detect,
                       'to': to}
        return search_data

    def request_data(self, search_data):
        """POST the form data and return the response body as text."""
        response = requests.post(self.url, headers=self.headers, data=search_data).content.decode()
        return response

    def parse_data(self, response):
        """Parse the JSON reply and print the translated text."""
        data_dict = json.loads(response)
        data = data_dict['trans'][0]['dst']
        print("Translation of {}: {}".format(self.search, data))

    def run(self):
        """Run the whole flow."""
        # 1. Detect the language of the text and build the POST body
        search_data = self.langdetect()
        # 2. Send the POST request and get the response
        response = self.request_data(search_data)
        # 3. Parse the returned JSON and show the translated text
        self.parse_data(response)


if __name__ == '__main__':
    # Use a separate variable name so the class itself is not overwritten
    translator = Translation('今天天气不错')
    translator.run()