Python Web Scraping Example 1: Translating Text with Youdao from Python

 

1. Open the Youdao Translate web page, enter some text, and click Translate.

2. Click Inspect Element.

[Image 1]

3. Click Network and find the translate entry in the Name column.

[Image 2]

4. Click Headers and find the Request URL under General.

[Image 3]

5. Find the Form Data section.

[Image 4]

6. Open Python and write the script

Copy the Request URL you found into url:

url = 'http://fanyi.youdao.com/translate?smartresult=dict&smartresult=rule'

Build a dictionary from the entries in Form Data: the text before each colon becomes the key, and the text after it becomes the value.

data = {}

data['i'] = content
data['from'] = 'AUTO'
data['to'] = 'AUTO'
data['smartresult'] = 'dict'
data['client'] = 'fanyideskweb'
data['salt'] = '1532850198388'
data['sign'] = '364e87600cf20d7bdb57c669faa45306'
data['doctype'] = 'json'
data['version'] = '2.1'
data['keyfrom'] = 'fanyi.web'
data['action'] = 'FY_BY_REALTIME'
data['typoResult'] = 'false'
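Typing every Form Data entry by hand is easy to get wrong. As a minimal sketch, assuming the Form Data panel has been copied as plain text with one `key: value` pair per line, a small helper can build the same kind of dictionary. The name form_data_to_dict and the sample text are hypothetical and not part of the original script.

```python
def form_data_to_dict(text):
    """Turn 'key: value' lines (as copied from the Form Data panel) into a dict."""
    data = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first colon only, so values that contain ':' stay intact.
        key, _, value = line.partition(':')
        data[key.strip()] = value.strip()
    return data


# Assumed example of copied text; the values match the fields listed above.
copied = """
from: AUTO
to: AUTO
smartresult: dict
client: fanyideskweb
doctype: json
"""

data = form_data_to_dict(copied)
print(data['client'])
```

The text to translate would still be added separately, for example with data['i'] = content.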

The complete code is as follows:

import urllib.request
import urllib.parse
import json

content = input('Enter the text to translate: ')
url = 'http://fanyi.youdao.com/translate?smartresult=dict&smartresult=rule'
data = {}

data['i'] = content
data['from'] = 'AUTO'
data['to'] = 'AUTO'
data['smartresult'] = 'dict'
data['client'] = 'fanyideskweb'
data['salt'] = '1532850198388'
data['sign'] = '364e87600cf20d7bdb57c669faa45306'
data['doctype'] = 'json'
data['version'] = '2.1'
data['keyfrom'] = 'fanyi.web'
data['action'] = 'FY_BY_REALTIME'
data['typoResult'] = 'false'

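# Encode the form fields as a URL-encoded POST body (bytes).
data = urllib.parse.urlencode(data).encode('utf-8')
# Supplying data makes this a POST request.
req = urllib.request.Request(url, data)
response = urllib.request.urlopen(req)
html = response.read().decode('utf-8')

# The response is JSON; the translated text is in translateResult[0][0]['tgt'].
target = json.loads(html)
print('Translation result: %s' % (target['translateResult'][0][0]['tgt']))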
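urllib.request expects the POST body as bytes, which is why the script calls urlencode and then encode('utf-8'). The following sketch shows what those two steps produce for a small dictionary; the sample values are made up for illustration.

```python
import urllib.parse

sample = {'i': 'hello world', 'doctype': 'json'}

# urlencode joins the pairs with '&' and escapes special characters
# (a space becomes '+').
encoded = urllib.parse.urlencode(sample)

# The Request object needs bytes, not str, as its data argument.
body = encoded.encode('utf-8')

print(encoded)
print(type(body))
```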

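7. Run the script

If you get {"errorCode":50}, make sure the URL uses translate rather than translate_o: the Request URL copied from the browser may contain translate_o, so delete the _o and run the script again.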
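When the server returns an error such as {"errorCode":50}, indexing translateResult directly fails with a KeyError that hides the real cause. As a hedged sketch, the reply can be checked first. The helper name extract_translation is hypothetical, the sample replies are hand-written rather than captured from the server, and treating errorCode 0 as success is an assumption.

```python
import json


def extract_translation(html):
    """Return the translated text, or raise ValueError if the server reported an error."""
    target = json.loads(html)
    # Assumption: a successful reply carries errorCode 0 (or no errorCode at all).
    error_code = target.get('errorCode', 0)
    if error_code != 0:
        raise ValueError('Youdao returned errorCode %s' % error_code)
    return target['translateResult'][0][0]['tgt']


# Hand-written sample replies for illustration only.
ok_reply = '{"errorCode": 0, "translateResult": [[{"src": "hello", "tgt": "你好"}]]}'
bad_reply = '{"errorCode": 50}'

print(extract_translation(ok_reply))
try:
    extract_translation(bad_reply)
except ValueError as err:
    print(err)
```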

The output looks like this:

[Image 5]

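8. Improvements

The server can inspect the User-Agent field under Request Headers to tell whether a request comes from a script or from a browser. To avoid being blocked by the server, you can do the following.

8.1 Add headers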

In Inspect Element, find User-Agent under Headers.

[Image 6]

Put the User-Agent value into the code:

req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.90 Safari/537.36 2345Explorer/9.3.2.17331')

The complete code is as follows:

import urllib.request
import urllib.parse
import json

content = input('Enter the text to translate: ')
url = 'http://fanyi.youdao.com/translate?smartresult=dict&smartresult=rule'
data = {}

data['i'] = content
data['from'] = 'AUTO'
data['to'] = 'AUTO'
data['smartresult'] = 'dict'
data['client'] = 'fanyideskweb'
data['salt'] = '1532850198388'
data['sign'] = '364e87600cf20d7bdb57c669faa45306'
data['doctype'] = 'json'
data['version'] = '2.1'
data['keyfrom'] = 'fanyi.web'
data['action'] = 'FY_BY_REALTIME'
data['typoResult'] = 'false'

# Encode the form fields as a URL-encoded POST body (bytes).
data = urllib.parse.urlencode(data).encode('utf-8')
req = urllib.request.Request(url, data)
# Present the request as coming from a browser.
req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.90 Safari/537.36 2345Explorer/9.3.2.17331')
response = urllib.request.urlopen(req)
html = response.read().decode('utf-8')

# The response is JSON; the translated text is in translateResult[0][0]['tgt'].
target = json.loads(html)
print('Translation result: %s' % (target['translateResult'][0][0]['tgt']))
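As an alternative to add_header, the headers can also be passed when the Request object is created. The sketch below builds such a request and inspects it without sending anything over the network; the two form fields are a shortened, made-up payload used only to keep the example small.

```python
import urllib.parse
import urllib.request

url = 'http://fanyi.youdao.com/translate?smartresult=dict&smartresult=rule'
user_agent = ('Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 '
              '(KHTML, like Gecko) Chrome/56.0.2924.90 Safari/537.36 '
              '2345Explorer/9.3.2.17331')

# Shortened payload for illustration; the real script sends all Form Data fields.
data = urllib.parse.urlencode({'i': 'hello', 'doctype': 'json'}).encode('utf-8')

# Passing headers= here has the same effect as calling req.add_header() afterwards.
req = urllib.request.Request(url, data, headers={'User-Agent': user_agent})

# No request is sent yet; these calls only inspect the Request object.
print(req.get_method())              # the method is POST because data was supplied
print(req.get_header('User-agent'))  # urllib stores header names capitalized
```

Passing the dictionary up front keeps all request settings in one place, which can be convenient once more than one header is needed.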

 

8.2 Use a proxy

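import random
import urllib.request

# Proxy IPs and ports (examples; replace them with proxies that work for you)
iplist = ['118.31.220.3:8080', '221.228.17.172:8181', '219.141.153.4:80']
# Pick one proxy at random for HTTP requests
dict1 = {'http': random.choice(iplist)}

proxy_support = urllib.request.ProxyHandler(dict1)
opener = urllib.request.build_opener(proxy_support)
opener.addheaders = [('User-Agent', 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.90 Safari/537.36 2345Explorer/9.3.2.17331')]
# After this call, urllib.request.urlopen() sends requests through the chosen proxy.
urllib.request.install_opener(opener)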
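The proxy setup can be wrapped in a function so that a fresh proxy is picked each time an opener is built. This is a minimal sketch: build_proxy_opener is a hypothetical helper name, the short User-Agent string is a placeholder, and the proxy addresses are the examples from above, which may not be reachable. Nothing is sent over the network here.

```python
import random
import urllib.request


def build_proxy_opener(proxies, user_agent):
    """Build an opener that routes HTTP requests through one randomly chosen proxy."""
    proxy = random.choice(proxies)
    proxy_support = urllib.request.ProxyHandler({'http': proxy})
    opener = urllib.request.build_opener(proxy_support)
    opener.addheaders = [('User-Agent', user_agent)]
    return opener, proxy


iplist = ['118.31.220.3:8080', '221.228.17.172:8181', '219.141.153.4:80']
opener, chosen = build_proxy_opener(iplist, 'Mozilla/5.0')
print('Using proxy:', chosen)

# To send a request, either call opener.open(url, data) directly, or call
# urllib.request.install_opener(opener) so that urlopen() uses it as well.
```

Returning the chosen address alongside the opener makes it easier to log which proxy was used when a request fails.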

 
