Python Web Scraping 02: The Request Module

The Requests Module

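  • 1. Attributes and methods of the Response object
  • 2. Sending a POST request (Youdao Translate)
  • 3. Setting a proxy in Requests
  • 4. Handling untrusted SSL certificates
  • 5. Cookies
    • 5.1 Simulating a login
    • 5.2 Getting past anti-scraping checks
  • 6. Session example: getting past the 12306 image captcha
  • 7. JSON data
    • 7.1 json.dumps()
    • 7.2 json.loads()
    • 7.3 json.dump()
    • 7.4 json.load()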

1. Attributes and methods of the Response object

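response.text: the response body decoded to a Unicode string (str)
response.content: the response body as raw bytes
response.content.decode('utf-8'): decode the bytes manually
response.url: the URL that was requested
response.encoding = 'utf-8': set the encoding that response.text uses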

import requests
url='https://www.baidu.com/s?'
headers={'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36'}
wd={'wd':'中国'}
res=requests.get(url,params=wd,headers=headers)
print(res.url)   # the URL that was requested, with the encoded parameters
# https://www.baidu.com/s?wd=%E4%B8%AD%E5%9B%BD
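
The percent-encoded query string in that output can be reproduced with the standard library alone. The following is a minimal sketch of what the params argument amounts to, assuming UTF-8 encoding; it illustrates the result, not how requests builds the URL internally.

```python
from urllib.parse import urlencode

base_url = 'https://www.baidu.com/s?'
params = {'wd': '中国'}

# urlencode percent-encodes the UTF-8 bytes of each value
query = urlencode(params)
full_url = base_url + query
print(full_url)
```

Each Chinese character takes three bytes in UTF-8, which is why the two characters become six %XX groups.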

Fetching the page source

import requests
res=requests.get('https://qq.yh31.com/zjbq/2920180.html')
print(res.text)      # Chinese characters come out garbled
print(res.content.decode('utf-8'))  # decode the bytes manually
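
The garbling most likely comes from a wrong encoding guess: when a text response does not declare a charset, requests falls back to ISO-8859-1 for res.text. The sketch below reproduces the effect offline, with a made-up byte string standing in for res.content.

```python
# Hypothetical page fragment standing in for res.content
page_bytes = '<title>表情包</title>'.encode('utf-8')

# Decoding UTF-8 bytes with the wrong codec reproduces the garbled text
garbled = page_bytes.decode('iso-8859-1')

# Decoding with the codec the page was written in restores it
readable = page_bytes.decode('utf-8')

print(garbled)
print(readable)
```

With a real response, setting res.encoding = 'utf-8' before reading res.text should have the same effect as decoding res.content by hand.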

2. Sending a POST request (Youdao Translate)

Import the modules and build the form data

import requests
import json
key=input('Enter the text to translate: ')
data = {
    'i': key,
    'from': 'AUTO',
    'smartresult': 'dict',
    'client': 'fanyideskweb',
    'salt': '15880623642174',
    'sign': 'c6c2e897040e6cbde00cd04589e71d4e',
    'ts': '1588062364217',
    'bv': '42160534cfa82a6884077598362bbc9d',
    'doctype': 'json',
    'version': '2.1',
    'keyfrom':'fanyi.web',
    'action': 'FY_BY_CLICKBUTTION'}

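Send the request and read the response.
The URL copied from the browser ends in translate_o; remove the _o so that the path reads translate.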

url = 'http://fanyi.youdao.com/translate?smartresult=dict&smartresult=rule'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36'}

res = requests.post(url,data=data,headers=headers)
res.encoding = 'utf-8'
html = res.text
# or: html=res.content.decode('utf-8')

Convert the JSON string into a dict with json.loads()

j_s=json.loads(html)
trans=j_s['translateResult'][0][0]['tgt']
print(trans)
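
Indexing with [0][0] only picks up the first segment. Assuming the reply keeps the nested-list shape used above, a slightly more defensive version can join every tgt field and cope with a missing key. The sample payload below is hypothetical, written to match that shape; it was not captured from the live service.

```python
import json

# Hypothetical sample shaped like the structure indexed above
sample = '{"translateResult": [[{"tgt": "你好"}, {"tgt": "世界"}]]}'

def extract_translation(payload):
    """Join every 'tgt' field found under 'translateResult'."""
    result = json.loads(payload)
    rows = result.get('translateResult') or []
    return ''.join(
        item['tgt']
        for row in rows
        for item in row
        if item and 'tgt' in item
    )

print(extract_translation(sample))
```

If the service returns an error object instead of a translation, this returns an empty string rather than raising KeyError.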

3. Setting a proxy in Requests

To send a request through a proxy, pass a proxies argument to the request method (get/post).

import requests
proxy = {'http':'210.5.10.87:53281'}
url = 'http://httpbin.org/ip'
res = requests.get(url,proxies=proxy)
print(res.text)
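
requests chooses the proxy by the scheme of the target URL, so a dict with only an 'http' key is not applied to https:// addresses. A small helper can cover both schemes; build_proxies is a hypothetical name, and the address is the one from the example above, which may no longer be reachable.

```python
def build_proxies(host, port):
    """Return a proxies dict that covers both http and https targets."""
    proxy_url = f'http://{host}:{port}'
    return {'http': proxy_url, 'https': proxy_url}

proxies = build_proxies('210.5.10.87', 53281)
print(proxies)
```

The result can be passed as requests.get(url, proxies=proxies, timeout=10); adding a timeout keeps a dead proxy from hanging the script.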

4. Handling untrusted SSL certificates

import requests
url = 'https://inv-veri.chinatax.gov.cn/'
res = requests.get(url,verify=False)
print(res.text)
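
To make it clearer what verify=False switches off, here is a rough standard-library analogy using the ssl module. It is only an analogy: requests does its TLS work through urllib3, and this sketch does not show its internals.

```python
import ssl

# Default context: certificates and host names are checked
strict_ctx = ssl.create_default_context()

# Relaxed context: roughly what skipping verification means
relaxed_ctx = ssl.create_default_context()
relaxed_ctx.check_hostname = False       # must be disabled before verify_mode
relaxed_ctx.verify_mode = ssl.CERT_NONE  # accept any certificate

print(strict_ctx.verify_mode, relaxed_ctx.verify_mode)
```

Skipping verification removes protection against impersonated servers, so it is best limited to sites you already trust for other reasons.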

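5. Cookies

5.1 Simulating a login

Send the user-agent and cookie values copied from a logged-in browser session in the request headers:

headers={'user-agent':' ','cookie':' '}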

import requests
url='https://www.zhihu.com/hot'
headers={'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36','cookie':'_xsrf=HhFFHW2e18SHGJysSEldfFJhOnQgwZtm; _ga=GA1.2.1643043222.1582898541; _zap=69fc1d67-3c36-46e9-b776-ca81c3cbdc6e; d_c0="AOCWZ28a4xCPTtTgQ22oRntoS8YGLD5AE70=|1582898548"; Hm_lvt_98beee57fd2ef70ccdd5ca52b9740c49=1594991218; capsion_ticket="2|1:0|10:1594991218|14:capsion_ticket|44:MzA2NTY5MmU1YjM0NDQ5NTliYTdiZGI0OTMyNzMxMmY=|7b5f8e684d278775b371095598e7c4d84c62a1f9dead1f9f5659d3c14e84c424"; _gid=GA1.2.1766196486.1594991218; SESSIONID=H4yPXqrpqdQbkFmIyyub4tvDIAHynY7FPYRxtjIYjAN; JOID=U10UBU7xUdk9ckomYfbBAKgkAbFwvWGnbi4_UBaEMYxYGghPLIfwzGVySiVlZ8qwlebLgmZmqgmbWc75Vq1emx4=; osd=UF0VAEryUdg4dkkmYPPFA6glBLVzvWCiai0_UROAMoxZHwxMLIb1yGZySyBhZMqxkOLIgmdjrgqbWMv9Va1fnho=; z_c0="2|1:0|10:1594991276|4:z_c0|92:Mi4xaXZtNEJ3QUFBQUFBNEpabmJ4cmpFQ2NBQUFDRUFsVk5yQzg1WHdESS1fbHFrZEZDS2hVYm9XVjlJVUFId2tjMGJn|333502804e98f368694353c8eb98dce47bb75376068983ccc5fe1d7b3dcf71ae"; tst=h; tshl=; Hm_lpvt_98beee57fd2ef70ccdd5ca52b9740c49=1594991764; KLBRSID=53650870f91603bc3193342a80cf198c|1594991765|1594991216'}
res=requests.get(url,headers=headers)
print(res.text)
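
Instead of pasting the whole cookie string into the headers, it can be split into a dict and handed to the cookies argument that requests.get accepts. The helper name and the shortened sample string below are hypothetical.

```python
def cookie_header_to_dict(header):
    """Split a raw 'name=value; name2=value2' string into a dict."""
    cookies = {}
    for pair in header.split(';'):
        # partition splits on the first '=' only, so values may contain '='
        name, sep, value = pair.strip().partition('=')
        if sep and name:
            cookies[name] = value
    return cookies

# Shortened, made-up cookie string for illustration
sample = 'tst=h; tshl=; d_c0="abc=|123"'
print(cookie_header_to_dict(sample))
```

Usage would look like requests.get(url, headers={'user-agent': ua}, cookies=cookie_header_to_dict(raw_cookie)), where ua and raw_cookie are the strings copied from the browser.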

5.2 Getting past anti-scraping checks

import requests
url = 'https://kyfw.12306.cn/otn/leftTicket/query?leftTicketDTO.train_date=2020-07-18&leftTicketDTO.from_station=CSQ&leftTicketDTO.to_station=CDW&purpose_codes=ADULT'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36','Cookie':'_uab_collina=159490169403897938828076; JSESSIONID=684F4F9AD3A7D8404FE6302F21864B1C; _jc_save_wfdc_flag=dc; _jc_save_fromStation=%u957F%u6C99%2CCSQ; _jc_save_toStation=%u6210%u90FD%2CCDW; BIGipServerotn=737149450.24610.0000; BIGipServerpool_passport=200081930.50215.0000; RAIL_EXPIRATION=1595165763460; RAIL_DEVICEID=BkXgC1KyMVCe_awQSAtB9ZkITShtJnxgl4CZUIw30CwahBiaWurpYaOtRYDCSCAbZFDdJXRjrhEx4DuatTj00UWBaR_J-HgaNoXKLr35hntXTtSg6Hnkt4L1qQRUINcq_bvoaB15Ks3FfqSh_3mNztyV97MtxUal; route=495c805987d0f5c8c84b14f60212447d; _jc_save_toDate=2020-07-16; _jc_save_fromDate=2020-07-18'}
res = requests.get(url,headers=headers)
print(res.content.decode('utf-8'))

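6. Session example: getting past the 12306 image captcha

A Session object keeps cookies across requests, so fetching the captcha image and submitting the answer happen in the same session.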

import requests

# One session for every request, so cookies are carried over
req=requests.session()
headers={'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36'}

def login():
    # Download the captcha image
    res=req.get('https://kyfw.12306.cn/passport/captcha/captcha-image?login_site=E&module=login&rand=sjrand',headers=headers)
    pic=res.content
    with open('yzm.png','wb') as f:
        f.write(pic)

    # Open yzm.png, then type in the coordinates of the matching pictures
    s=input('Enter the captcha coordinates: ')
    data={
        'answer': s,
        'rand': 'sjrand',
        'login_site':'E'
    }
    # Submit the answer through the same session
    response=req.post('https://kyfw.12306.cn/passport/captcha/captcha-check',data=data,headers=headers)
    print(response.text)

login()
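
What the session carries over can be inspected without touching the network by preparing a request and looking at its headers. This is a small sketch with placeholder values: the user-agent string, the cookie value, and the example.com URL are all hypothetical.

```python
import requests

session = requests.Session()
session.headers.update({'User-Agent': 'demo-agent/0.1'})  # hypothetical UA string
session.cookies.set('JSESSIONID', 'demo-value')           # pretend a server set this

# Build a request through the session without sending it
request = requests.Request('GET', 'https://example.com/captcha-check')
prepared = session.prepare_request(request)

print(prepared.headers['User-Agent'])
print(prepared.headers['Cookie'])
```

Both the default header and the stored cookie are attached to the prepared request, which is why the captcha check above is tied to the image fetched a moment earlier.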

7. JSON data

import json
s='pyt'

7.1 json.dumps()

Python object --> JSON string

r=json.dumps(s)
print(r)    # "pyt"

7.2 json.loads()

JSON string --> Python object

print(json.loads(r))  # pyt
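
The same pair of calls works on nested data. The sketch below round-trips a made-up dict and shows the ensure_ascii option, which controls whether non-ASCII characters are escaped in the output.

```python
import json

# Made-up record for illustration
record = {'name': '中国', 'tags': ['python', 'spider'], 'ok': True, 'price': None}

as_ascii = json.dumps(record)                     # non-ASCII escaped as \uXXXX
as_utf8 = json.dumps(record, ensure_ascii=False)  # characters kept readable
restored = json.loads(as_utf8)

print(as_ascii)
print(as_utf8)
print(restored == record)
```

Note that True and None become true and null in the JSON text and are converted back on loading.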

7.3 json.dump()

Python object --> JSON written to a file

with open('json.txt','w') as f:
    json.dump(s,f)

7.4 json.load()

JSON file --> Python object

with open('json.txt','r') as f:
    print(json.load(f))
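
For data that contains Chinese text, it helps to open the file with an explicit encoding and to keep the characters unescaped. This is a minimal sketch with a made-up record, written to a temporary directory so that it does not overwrite json.txt.

```python
import json
import os
import tempfile

# Made-up record for illustration
record = {'keyword': '中国', 'pages': [1, 2, 3]}
path = os.path.join(tempfile.mkdtemp(), 'json.txt')

# Write the object as readable, indented JSON
with open(path, 'w', encoding='utf-8') as f:
    json.dump(record, f, ensure_ascii=False, indent=2)

# Read it back into a Python object
with open(path, 'r', encoding='utf-8') as f:
    restored = json.load(f)

print(restored == record)
```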
