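requests is an HTTP library released under the Apache 2 license. It is more concise than the standard library's urllib module.
Requests supports HTTP keep-alive and connection pooling, cookie-based session persistence, file uploads, automatic decoding of response content, and automatic encoding of internationalized URLs and POST data.
It wraps the lower-level HTTP machinery in a high-level API, which makes network requests in Python far more pleasant; with Requests you can easily perform most of the HTTP operations a browser does.
requests is a third-party library and must be installed separately: pip install requests. It is built on urllib3 and is very convenient to use; personally, I recommend requests.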
Official Chinese tutorial: http://docs.python-requests.org/zh_CN/latest/user/quickstart.html
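Before you start, I recommend an excellent HTTP testing site: http://httpbin.org, which provides a very complete set of endpoints for debugging and testing requests.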
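requests supports all the common HTTP methods, for example:
GET: requests the specified page and returns the entity body.
HEAD: requests only the page's headers.
POST: asks the server to accept the enclosed data as a new subordinate of the resource identified by the URI.
PUT: replaces the content of the specified document with data sent by the client.
DELETE: asks the server to delete the specified page.
OPTIONS: lets the client see which methods and options the server supports.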
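Each method in the list has a same-named helper in requests (requests.get, requests.head, requests.post, requests.put, requests.delete, requests.options). A minimal sketch that builds, without sending, the request each helper would produce, using requests.Request(...).prepare(); the URL http://httpbin.org/anything (an httpbin endpoint that echoes any method) and the form field k=v are only illustrative:

```python
import requests

URL = "http://httpbin.org/anything"  # illustrative; nothing is sent below
METHODS = ("GET", "HEAD", "POST", "PUT", "DELETE", "OPTIONS")

prepared = {}
for method in METHODS:
    # Only POST and PUT carry a (form-encoded) body in this sketch
    data = {"k": "v"} if method in ("POST", "PUT") else None
    # Request(...).prepare() builds the final request object offline
    prepared[method] = requests.Request(method, URL, data=data).prepare()

for method, req in prepared.items():
    # The matching helper function, e.g. requests.post for "POST"
    helper = getattr(requests, method.lower())
    print(f"{method:8} -> requests.{helper.__name__}(), body={req.body!r}")
```

Preparing instead of sending is handy for checking what a call will put on the wire before pointing it at a real server.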
Access Baidu and print some basic information:
import requests
response = requests.get("https://www.baidu.com")  # open the page and get the response
print('response:', type(response))  # response type: response: <class 'requests.models.Response'>
print('status_code:', response.status_code)  # status code: status_code: 200
print('cookie:', response.cookies)  # cookies: cookie: <RequestsCookieJar[...]>
print(type(response.text))  # type of the body decoded as a string: <class 'str'>
print('text:', response.text)  # body as a string; the Chinese comes out garbled here: ...ç™»å½...
print('content (bytes):', response.content)  # raw bytes: b'\r\n\xe7\x99\xbb\xe5\xbd\x95... \r\n'
print('content:', response.content.decode("utf-8"))  # bytes decoded as UTF-8: content: 登录...
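The result of a request is a requests.models.Response object; it needs further processing to extract the information we want.
requests takes the text encoding from the response headers when it can and otherwise guesses it from the content; the chosen encoding is available through the encoding attribute.
Whether the response is text or binary, the content attribute always gives you the raw bytes object.
Besides decoding content yourself, another way to avoid garbled response.text is to set response.encoding after the request and before reading the body, for example: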
import requests
response = requests.get("http://www.baidu.com")
# decode the body as UTF-8 when .text is read
response.encoding = "utf-8"
print(response.text)
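The garbled text in the Baidu example is what you get when UTF-8 bytes are decoded with the wrong charset; for text/* responses whose Content-Type carries no charset, requests falls back to ISO-8859-1. A minimal offline sketch of that effect: the hypothetical helper fake_response fills a requests.models.Response by hand (setting status_code and giving raw an in-memory io.BytesIO body, a trick normally only used in tests) so that .content, .encoding and .text can be compared without a network call:

```python
import io

from requests.models import Response


def fake_response(body: bytes, encoding):
    """Hypothetical helper: a Response without a network call (testing trick)."""
    response = Response()
    response.status_code = 200
    response.raw = io.BytesIO(body)  # .content reads the body from here once
    response.encoding = encoding
    return response


body = "登录".encode("utf-8")  # the bytes b'\xe7\x99\xbb\xe5\xbd\x95'

r = fake_response(body, "ISO-8859-1")  # what requests assumes for charset-less text/*
garbled = r.text                       # UTF-8 bytes misread as Latin-1: mojibake
r.encoding = "utf-8"                   # the fix described above
fixed = r.text                         # .content is cached; only decoding changes

print(repr(garbled))
print(fixed)
print(fixed == r.content.decode("utf-8"))
```

Setting encoding never re-downloads anything: it only changes how the cached bytes are turned into .text.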
requests.get(url=url, headers=headers, params=params)
For URLs with query parameters, pass a dict as the params argument; keys whose value is None are left out of the URL.
import requests
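# put the parameters in a dict and pass it via params (a dict or a sequence of key/value pairs is accepted)
data = {
    "name": "hanson",
    "age": 24
}
response = requests.get("http://httpbin.org/get", params=data)  # send a GET request and get the response
print(response.url)  # print the final URL
print(response.text)  # print the response body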
Output:
http://httpbin.org/get?name=hanson&age=24
{
"args": {
"age": "24",
"name": "hanson"
},
"headers": {
"Accept": "*/*",
"Accept-Encoding": "gzip, deflate",
"Host": "httpbin.org",
"User-Agent": "python-requests/2.22.0",
"X-Amzn-Trace-Id": "Root=1-5e71bb9d-79cfc9e0195befa018426f20"
},
"origin": "218.106.132.130",
"url": "http://httpbin.org/get?name=hanson&age=24"
}
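The encoding rules above can be checked offline, because requests.Request(...).prepare() builds the final URL without sending anything. A sketch where the extra keys (a None value and a list value) are illustrative additions to the example's parameters:

```python
import requests

params = {
    "name": "hanson",
    "age": 24,           # non-string values are converted with str()
    "nickname": None,    # None values are dropped from the query string
    "tag": ["a", "b"],   # a list becomes a repeated key: tag=a&tag=b
}
prepared = requests.Request("GET", "http://httpbin.org/get", params=params).prepare()
print(prepared.url)
# http://httpbin.org/get?name=hanson&age=24&tag=a&tag=b
```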
Another convenience of requests is that some response types, such as JSON, can be decoded directly: response.json() is essentially json.loads applied to the response body.
import requests
import json
# send a GET request
response = requests.get("http://httpbin.org/get")
# type of .text
print(type(response.text))
# parse the JSON body directly (into a dict)
print(response.json())
# or parse .text with json.loads yourself (same result)
print(json.loads(response.text))
# type of the parsed body
print(type(response.json()))
Console output:
<class 'str'>
{'args': {}, 'headers': {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Connection': 'close', 'Host': 'httpbin.org', 'User-Agent': 'python-requests/2.18.4'}, 'origin': '124.74.47.82', 'url': 'http://httpbin.org/get'}
{'args': {}, 'headers': {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Connection': 'close', 'Host': 'httpbin.org', 'User-Agent': 'python-requests/2.18.4'}, 'origin': '124.74.47.82', 'url': 'http://httpbin.org/get'}
<class 'dict'>
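A minimal offline sketch of the claim that response.json() amounts to json.loads(response.text): the hypothetical helper fake_response fills a Response by hand (status_code plus an in-memory io.BytesIO as raw, a testing trick) with a small made-up JSON body, then shows that a non-JSON body raises an exception. Catching ValueError is a choice made for portability: the decode error requests raises subclasses ValueError, although its exact class differs between requests versions:

```python
import io
import json

from requests.models import Response


def fake_response(body: bytes) -> Response:
    """Hypothetical helper: offline stand-in for a real response (testing trick)."""
    response = Response()
    response.status_code = 200
    response.raw = io.BytesIO(body)
    response.encoding = "utf-8"
    return response


ok = fake_response(b'{"args": {}, "url": "http://httpbin.org/get"}')
parsed = ok.json()
print(type(ok.text), type(parsed))    # <class 'str'> <class 'dict'>
print(parsed == json.loads(ok.text))  # True

broken = fake_response(b"<html>not json</html>")
decode_failed = False
try:
    broken.json()
except ValueError as exc:  # requests' JSON decode error subclasses ValueError
    decode_failed = True
    print("not JSON:", type(exc).__name__)
```

In real code, checking the status code or Content-Type before calling .json() avoids trying to parse an HTML error page.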
To send custom HTTP headers, pass a dict as the headers argument.
Request with extra headers:
import requests
# extra headers: a browser User-Agent
headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
}
# send the request
response = requests.get("https://www.zhihu.com", headers=headers)
# print the response
print(response.text)
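By default requests sends User-Agent: python-requests/<version>, which some sites reject or serve differently from a browser. A sketch that checks offline, via Session.prepare_request, which headers would actually go out: per-request headers override the session defaults with the same name, while the other defaults are kept:

```python
import requests
from requests.utils import default_user_agent

browser_headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_4) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
}

session = requests.Session()
# Without custom headers, requests identifies itself as python-requests/<version>
print(session.headers["User-Agent"])

# Per-request headers are merged over the session defaults; same keys win
prepared = session.prepare_request(
    requests.Request("GET", "https://www.zhihu.com", headers=browser_headers)
)
print(prepared.headers["User-Agent"])  # the browser string
print(prepared.headers["Accept"])      # a session default that was kept
```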
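requests gives cookies special treatment, so you can read a particular cookie without parsing the headers yourself.
To send cookies with a request, just pass a dict as the cookies argument:
import requests
url = 'http://httpbin.org/cookies'  # replace with your URL
header = {'user-agent': 'my-app/0.0.1'}
cookie = {'key': 'value'}
# send the request (requests.post accepts the same headers and cookies arguments)
response = requests.get(url, headers=header, cookies=cookie)
# print the cookies the server set in its response
print(response.cookies)
for key, value in response.cookies.items():
    print(key + "=" + value)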
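To see both directions without a network call: a cookies dict turns into the Cookie header of the prepared request, and a RequestsCookieJar (the type of response.cookies) can be iterated or turned back into a dict. A sketch; the cookie names and values in the jar are made up for illustration:

```python
import requests
from requests.cookies import cookiejar_from_dict

# Cookies passed as a dict become a single Cookie request header
prepared = requests.Request(
    "GET", "http://httpbin.org/cookies", cookies={"key": "value"}
).prepare()
print(prepared.headers["Cookie"])  # key=value

# response.cookies is a RequestsCookieJar; an illustrative jar built from a dict
jar = cookiejar_from_dict({"session_id": "abc123", "theme": "dark"})
for name, value in jar.items():
    print(name + "=" + value)
print(jar.get_dict())    # back to a plain dict
print(jar.get("theme"))  # dict-style lookup by cookie name
```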
requests.post(url=url, headers=headers, data=params)
To send a POST request, just replace get() with post() and pass the POST data through the data argument:
import requests
# put the parameters in a dict
data = {
    "name": "zhaofan",
    "age": 23
}
# pass the dict as the data argument
response = requests.post("http://httpbin.org/post", data=data)
# print the response
print(response.text)
Output:
{
"args": {},
"data": "",
"files": {},
"form": {
"age": "23",
"name": "zhaofan"
},
"headers": {
"Accept": "*/*",
"Accept-Encoding": "gzip, deflate",
"Content-Length": "19",
"Content-Type": "application/x-www-form-urlencoded",
"Host": "httpbin.org",
"User-Agent": "python-requests/2.18.4"
},
"json": null,
"origin": "124.74.47.82, 124.74.47.82",
"url": "https://httpbin.org/post"
}
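By default requests encodes POST data as application/x-www-form-urlencoded. To send JSON instead, pass the json argument:
import requests
url = "http://httpbin.org/post"
params = {'key': 'value'}
r = requests.post(url, json=params)  # serialized to JSON internally, sent with Content-Type: application/json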
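The difference between data= and json= shows up in the body and Content-Type that requests prepares. A sketch that builds both requests offline, reusing the form values shown in the httpbin output above:

```python
import json

import requests

url = "http://httpbin.org/post"
payload = {"name": "zhaofan", "age": 23}

# data=dict -> form-encoded body (httpbin reports it under "form")
form = requests.Request("POST", url, data=payload).prepare()
# json=dict -> JSON body (httpbin reports it under "json")
as_json = requests.Request("POST", url, json=payload).prepare()

print(form.headers["Content-Type"], form.headers["Content-Length"], form.body)
print(as_json.headers["Content-Type"], as_json.body)
```

The form body name=zhaofan&age=23 is 19 bytes long, which matches the Content-Length in the httpbin output above.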
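File uploads use the files argument of the request.
When reading the file, be sure to open it in 'rb' (binary) mode, so that the bytes read match the file's actual length.
import requests
# 'rb': open the file read-only in binary mode; the with block closes it afterwards
with open("a.jpg", "rb") as f:
    files = {"files": f}
    # send a POST request carrying the file
    response = requests.post("http://httpbin.org/post", files=files)
# response body
print(response.text)
Response: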
{
"args": {},
"data": "",
"files": {
"files": ""
},
"form": {},
"headers": {
"Accept": "*/*",
"Accept-Encoding": "gzip, deflate",
"Content-Length": "145",
"Content-Type": "multipart/form-data; boundary=75c9d62b8f1248a9b6a89741143836b5",
"Host": "httpbin.org",
"User-Agent": "python-requests/2.18.4"
},
"json": null,
"origin": "124.74.47.82, 124.74.47.82",
"url": "https://httpbin.org/post"
}
Even more convenient, requests can upload in-memory bytes as if they were a file:
import requests
url = 'http://127.0.0.1:8080/upload'
files = {'file': ('test.txt', b'Hello Requests.')}  # the filename must be set explicitly
r = requests.post(url, files=files)
print(r.text)
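To see what files= turns into, a sketch that prepares the upload above without sending it; the extra form field note=demo is illustrative. The 3-tuple form (filename, data, content_type) also sets the Content-Type of the file part:

```python
import requests

files = {"file": ("test.txt", b"Hello Requests.", "text/plain")}
prepared = requests.Request(
    "POST", "http://127.0.0.1:8080/upload", files=files, data={"note": "demo"}
).prepare()

# multipart/form-data with a random boundary separating the parts
print(prepared.headers["Content-Type"])
# The body is bytes: one part for the form field, one for the file
print(prepared.body.decode("utf-8"))
```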
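1. Keeping a session
A requests.Session object persists certain parameters across requests, cookies in particular: all requests sent from the same Session instance share one cookie jar, which requests updates automatically, so login cookies are easy to handle. A Session also reuses the underlying connections (keep-alive).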
import requests
headers = {'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Encoding': 'gzip, deflate, compress',
'Accept-Language': 'en-us;q=0.5,en;q=0.3',
'Cache-Control': 'max-age=0',
'Connection': 'keep-alive',
'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:22.0) Gecko/20100101 Firefox/22.0'}
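# create a Session object
s = requests.Session()
s.headers.update(headers)  # these headers are now sent with every request made through s
# httpbin sets the cookie number=123456 on this session
s.get("http://httpbin.org/cookies/set/number/123456")
# request again with the same session: the cookie is sent back automatically
response = s.get("http://httpbin.org/cookies")
print(response.text)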
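What the Session buys you can be checked locally. A sketch with a tiny throwaway HTTP server standing in for httpbin (the handler CookieHandler, its /set and /cookies paths and the helper new_session are all hypothetical): /set answers with Set-Cookie: number=123456 and /cookies echoes the Cookie header it received. One Session stores and resends the cookie; two separate sessions, which is what two plain requests.get calls amount to, do not. trust_env=False only keeps proxy environment variables from interfering with the local test:

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class CookieHandler(BaseHTTPRequestHandler):
    """Hypothetical local stand-in for httpbin's cookie endpoints."""

    def do_GET(self):
        if self.path == "/set":
            body = b"cookie set"
            self.send_response(200)
            self.send_header("Set-Cookie", "number=123456; Path=/")
        else:
            # Echo the Cookie header the client sent (empty if none)
            body = self.headers.get("Cookie", "").encode("utf-8")
            self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the console quiet


def new_session():
    session = requests.Session()
    session.trust_env = False  # ignore proxy env vars for this local test
    return session


server = HTTPServer(("127.0.0.1", 0), CookieHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_address[1]}"

# One Session for both calls: the cookie from /set is sent to /cookies
with new_session() as s:
    s.get(base + "/set", timeout=5)
    with_session = s.get(base + "/cookies", timeout=5).text

# A fresh session per call (what requests.get does internally): cookie is lost
with new_session() as s1:
    s1.get(base + "/set", timeout=5)
with new_session() as s2:
    without_session = s2.get(base + "/cookies", timeout=5).text

server.shutdown()
server.server_close()

print(repr(with_session))     # 'number=123456'
print(repr(without_session))  # ''
```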
auth: authentication; accepts a (user, password) tuple or an auth object such as HTTPBasicAuth.
HTTP Basic Auth:
import requests
from requests.auth import HTTPBasicAuth
r = requests.get('https://httpbin.org/hidden-basic-auth/user/passwd', auth=HTTPBasicAuth('user', 'passwd'))
print(r.json())
Shorthand (a plain tuple means Basic auth):
response = requests.get("http://120.27.34.24:9001/",auth=("user","123"))
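Another very popular form of HTTP authentication is Digest authentication, which Requests also supports out of the box:
from requests.auth import HTTPDigestAuth
url = 'https://httpbin.org/digest-auth/auth/user/pass'
requests.get(url, auth=HTTPDigestAuth('user', 'pass'))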
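What auth= actually adds can be seen offline by preparing the request instead of sending it. A sketch using the httpbin URL from the Basic example above: it shows that the tuple shorthand is equivalent to HTTPBasicAuth and that the header is plain base64, which is why Basic auth should only be used over HTTPS. Digest auth is left out because its header depends on a challenge sent by the server:

```python
import base64

import requests
from requests.auth import HTTPBasicAuth

url = "https://httpbin.org/hidden-basic-auth/user/passwd"

# The tuple shorthand and HTTPBasicAuth produce the same header
via_tuple = requests.Request("GET", url, auth=("user", "passwd")).prepare()
via_class = requests.Request("GET", url, auth=HTTPBasicAuth("user", "passwd")).prepare()

header = via_tuple.headers["Authorization"]
print(header)  # Basic dXNlcjpwYXNzd2Q=
# Basic auth is only base64("user:passwd"): an encoding, not encryption
print(base64.b64decode(header.split(" ", 1)[1]).decode("utf-8"))  # user:passwd
```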
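To use proxies, pass a dict that maps the URL scheme to a proxy URL ('ip1', 'ip2' and 'port' are placeholders):
proxies = {'http': 'http://ip1:port', 'https': 'http://ip2:port'}
requests.get('url', proxies=proxies)
If the proxy requires a username and password, put them in the proxy URL:
proxies = {
    "http": "http://user:password@proxy_host:3128/",
}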
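If the password contains characters such as @ or :, it has to be percent-encoded inside the proxy URL; requests decodes the credentials again (requests.utils.get_auth_from_url) before building the Proxy-Authorization header. A sketch with a hypothetical helper build_proxies; the address 10.10.1.10:3128 and the credentials are placeholders. Pointing the https key at an http:// proxy is the usual setup, since requests tunnels HTTPS through such a proxy with CONNECT:

```python
from urllib.parse import quote

from requests.utils import get_auth_from_url


def build_proxies(host, port, user=None, password=None):
    """Hypothetical helper: one HTTP proxy for both http:// and https:// targets."""
    credentials = ""
    if user is not None:
        # Percent-encode so '@', ':' or '/' in a password cannot break the URL
        credentials = f"{quote(user, safe='')}:{quote(password or '', safe='')}@"
    proxy_url = f"http://{credentials}{host}:{port}"
    # Keys are the scheme of the *target* URL, values are the proxy to use
    return {"http": proxy_url, "https": proxy_url}


proxies = build_proxies("10.10.1.10", 3128, "user", "p@ss:word")
print(proxies["http"])                     # http://user:p%40ss%3Aword@10.10.1.10:3128
print(get_auth_from_url(proxies["http"]))  # ('user', 'p@ss:word')
```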
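Many websites are now served over HTTPS, which brings certificate verification into play.
For example, accessing 12306:
import requests
response = requests.get("https://www.12306.cn")
print(response.status_code)
At the time of writing this failed with a certificate error (requests raised SSLError).
Workaround: pass verify=False (the default is True). This switches certificate checking off completely, so it is only suitable for testing:
import requests
# optional: silence the InsecureRequestWarning that verify=False triggers
# import urllib3
# urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
response = requests.get("https://www.12306.cn", verify=False)
print(response.status_code)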
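verify also accepts a path to a CA bundle, which is usually a better fix than switching verification off. A sketch: requests.certs.where() returns the bundle requests uses by default (it comes from certifi), and a session's verify attribute can point at any bundle file, for example a hypothetical corporate or self-signed CA bundle. The helper get_unverified (a hypothetical name) keeps verify=False but suppresses urllib3's InsecureRequestWarning only for that call; note that warnings.catch_warnings is not thread-safe:

```python
import os
import warnings

import requests
import urllib3
from requests import certs

# Default CA bundle (from certifi) that verify=True checks against
ca_bundle = certs.where()
print(ca_bundle, os.path.isfile(ca_bundle))

# Keep verification on, but point it at a bundle file; a hypothetical
# corporate/self-signed CA bundle path would go here instead of the default
session = requests.Session()
session.verify = ca_bundle


def get_unverified(url):
    """Hypothetical helper: verify=False, warning silenced only inside this call."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", urllib3.exceptions.InsecureRequestWarning)
        return requests.get(url, verify=False, timeout=10)
```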
timeout, in seconds:
r = requests.get('url', timeout=1)  # seconds; applies to connecting and to each wait for data from the server, not to the whole download
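A single number sets both the connect and the read timeout; a (connect, read) tuple sets them separately. Without any timeout, requests waits indefinitely. A sketch that triggers a read timeout locally, without any external site: a bare socket (a hypothetical test setup) accepts the TCP connection but never replies, so the request fails with ReadTimeout after the 0.5 s read timeout instead of hanging:

```python
import socket

import requests


def demo_read_timeout(read_timeout=0.5):
    # A socket that listens but never accepts or answers: the kernel completes
    # the TCP handshake, so connecting succeeds, but no HTTP response ever comes.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    url = f"http://127.0.0.1:{server.getsockname()[1]}/"
    session = requests.Session()
    session.trust_env = False  # ignore proxy env vars for this local test
    try:
        # (connect timeout, read timeout), both in seconds
        session.get(url, timeout=(3, read_timeout))
        return "no timeout"
    except requests.exceptions.ConnectTimeout:
        return "connect timeout"
    except requests.exceptions.ReadTimeout:
        return "read timeout"
    finally:
        session.close()
        server.close()


print(demo_read_timeout())  # read timeout
```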
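By default, Requests follows redirects for every method except HEAD.
GitHub redirects all HTTP requests to HTTPS. You can use the history attribute of the response to track redirects. Let's see what GitHub does:
>>> r = requests.get('http://github.com')
>>> r.url
'https://github.com/'
>>> r.status_code
200
>>> r.history
[<Response [301]>]
Response.history is a list of the Response objects that were created in order to complete the request, sorted from the oldest to the most recent response.
If you are using GET, OPTIONS, POST, PUT, PATCH or DELETE, you can disable redirect handling with the allow_redirects parameter (for HEAD, allow_redirects=True turns it on):
>>> r = requests.get('http://github.com', allow_redirects=False)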
>>> r.status_code
301
>>> r.history
[]
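Redirect handling can be observed without GitHub using a throwaway local server. A sketch with a hypothetical handler (/ answers 301 with Location: /final, /final answers 200), comparing the default behaviour with allow_redirects=False; trust_env=False only keeps proxy environment variables out of the local test:

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class RedirectHandler(BaseHTTPRequestHandler):
    """Hypothetical test server: '/' -> 301 to '/final', '/final' -> 200."""

    def do_GET(self):
        if self.path == "/":
            self.send_response(301)
            self.send_header("Location", "/final")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            body = b"final page"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the console quiet


server = HTTPServer(("127.0.0.1", 0), RedirectHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_address[1]}"

with requests.Session() as session:
    session.trust_env = False  # don't route the local test through env proxies
    followed = session.get(base + "/", timeout=5)
    not_followed = session.get(base + "/", timeout=5, allow_redirects=False)

server.shutdown()
server.server_close()

print(followed.status_code, followed.url, followed.history)
print(not_followed.status_code, not_followed.headers["Location"], not_followed.history)
```

The relative Location header is resolved against the original URL, and the intermediate 301 ends up in history exactly as in the GitHub example.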
All exceptions live in requests.exceptions:
Example:
import requests
from requests.exceptions import ReadTimeout, ConnectionError, RequestException
try:
    response = requests.get("http://httpbin.org/get", timeout=0.1)
    print(response.status_code)
except ReadTimeout:
    print("timeout")
except ConnectionError:
    print("connection error")
except RequestException:
    print("error")
Testing shows that a slow response is caught first as ReadTimeout; with the network disconnected, ConnectionError is caught instead; anything the earlier clauses miss is still caught by RequestException, the base class of all requests exceptions.
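Two refinements, sketched below: importing requests' ConnectionError under another name avoids shadowing Python's built-in ConnectionError, and raise_for_status() turns 4xx/5xx responses into HTTPError so they reach the same handler. Clause order matters because the classes form a hierarchy (ConnectTimeout is both a Timeout and a ConnectionError). The demo provokes a ConnectionError locally by connecting to a port nothing listens on; unused_local_url and classify are hypothetical helper names:

```python
import socket

import requests
from requests.exceptions import (
    ConnectTimeout,
    ConnectionError as RequestsConnectionError,  # avoid shadowing the builtin
    HTTPError,
    RequestException,
    Timeout,
)


def unused_local_url():
    # Bind an ephemeral port, then release it: nothing listens there afterwards
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.bind(("127.0.0.1", 0))
        port = probe.getsockname()[1]
    return f"http://127.0.0.1:{port}/"


def classify(url):
    session = requests.Session()
    session.trust_env = False  # ignore proxy env vars for this local test
    try:
        response = session.get(url, timeout=2)
        response.raise_for_status()  # turns 4xx/5xx status codes into HTTPError
        return "ok"
    except Timeout:                  # also catches ConnectTimeout
        return "timeout"
    except RequestsConnectionError:
        return "connection error"
    except HTTPError:
        return "http error"
    except RequestException:         # base class: anything else from requests
        return "other requests error"
    finally:
        session.close()


print(issubclass(ConnectTimeout, Timeout), issubclass(ConnectTimeout, RequestsConnectionError))
print(classify(unused_local_url()))  # connection error (nothing is listening)
```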
For comparison, the same GET request written first with urllib and then with requests.
urllib:
import urllib.parse
import urllib.request
url = "https://api.douban.com/v2/event/list"
params = urllib.parse.urlencode({'loc':'108288','day_type':'weekend','type':'exhibition'})
print(">>>>>>request params is:")
print(params)
# send the request
response = urllib.request.urlopen('?'.join([url, params]))
# handle the response
print(">>>>>>Response Headers:")
print(dict(response.info()))
print(">>>>>>Status Code:")
print(response.getcode())
print(">>>>>>Response body:")
print(response.read().decode())
requests:
import requests
url = "https://api.douban.com/v2/event/list"
params = {'loc':'108288','day_type':'weekend','type':'exhibition'}
print(">>>>>>request params is:")
print(params)
# send the request
response = requests.get(url=url,params=params)
# handle the response
print(">>>>>>Response Headers:")
print(response.headers)
print(">>>>>>Status Code:")
print(response.status_code)
print(">>>>>>Response body:")
print(response.text)
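Both versions request the same URL; the difference is who encodes the query string. A sketch that checks this offline (no request is sent):

```python
from urllib.parse import urlencode

import requests

url = "https://api.douban.com/v2/event/list"
params = {"loc": "108288", "day_type": "weekend", "type": "exhibition"}

# urllib: encode the query yourself and glue it onto the URL
urllib_url = "?".join([url, urlencode(params)])
# requests: pass the dict and let requests build the URL (offline via prepare())
requests_url = requests.Request("GET", url, params=params).prepare().url

print(urllib_url)
print(urllib_url == requests_url)  # True
```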
#!/usr/bin/env python
# -*- coding:utf-8 -*-
import requests
# ############## Method 1 ##############
"""
# ## 1. First visit any page to get a cookie
i1 = requests.get(url="http://dig.chouti.com/help/service")
i1_cookies = i1.cookies.get_dict()
# ## 2. Log in, sending the cookie from step 1; the server authorizes the gpsd value in that cookie
i2 = requests.post(
url="http://dig.chouti.com/login",
data={
'phone': "8615131255089",
'password': "xxooxxoo",
'oneMonth': ""
},
cookies=i1_cookies
)
# ## 3. Upvote (only the authorized gpsd cookie needs to be sent)
gpsd = i1_cookies['gpsd']
i3 = requests.post(
url="http://dig.chouti.com/link/vote?linksId=8589523",
cookies={'gpsd': gpsd}
)
print(i3.text)
"""
# ############## Method 2 ##############
"""
import requests
session = requests.Session()
i1 = session.get(url="http://dig.chouti.com/help/service")
i2 = session.post(
url="http://dig.chouti.com/login",
data={
'phone': "8615131255089",
'password': "xxooxxoo",
'oneMonth': ""
}
)
i3 = session.post(
url="http://dig.chouti.com/link/vote?linksId=8589523"
)
print(i3.text)
"""
#!/usr/bin/env python
# -*- coding:utf-8 -*-
import requests
from bs4 import BeautifulSoup
# ############## Method 1 ##############
#
# # 1. Visit the login page and get authenticity_token
# i1 = requests.get('https://github.com/login')
# soup1 = BeautifulSoup(i1.text, features='lxml')
# tag = soup1.find(name='input', attrs={'name': 'authenticity_token'})
# authenticity_token = tag.get('value')
# c1 = i1.cookies.get_dict()
# i1.close()
#
# # 2. Send authenticity_token together with the username, password, etc. to log in
# form_data = {
# "authenticity_token": authenticity_token,
# "utf8": "",
# "commit": "Sign in",
# "login": "your_email@example.com",
# 'password': 'xxoo'
# }
#
# i2 = requests.post('https://github.com/session', data=form_data, cookies=c1)
# c2 = i2.cookies.get_dict()
# c1.update(c2)
# i3 = requests.get('https://github.com/settings/repositories', cookies=c1)
#
# soup3 = BeautifulSoup(i3.text, features='lxml')
# list_group = soup3.find(name='div', class_='listgroup')
#
# from bs4.element import Tag
#
# for child in list_group.children:
#     if isinstance(child, Tag):
#         project_tag = child.find(name='a', class_='mr-1')
#         size_tag = child.find(name='small')
#         temp = "Project: %s (%s); project path: %s" % (project_tag.string, size_tag.string, project_tag.get('href'), )
#         print(temp)
# ############## Method 2 ##############
# session = requests.Session()
# # 1. Visit the login page and get authenticity_token
# i1 = session.get('https://github.com/login')
# soup1 = BeautifulSoup(i1.text, features='lxml')
# tag = soup1.find(name='input', attrs={'name': 'authenticity_token'})
# authenticity_token = tag.get('value')
# c1 = i1.cookies.get_dict()
# i1.close()
#
# # 2. Send authenticity_token together with the username, password, etc. to log in
# form_data = {
# "authenticity_token": authenticity_token,
# "utf8": "",
# "commit": "Sign in",
# "login": "your_email@example.com",
# 'password': 'xxoo'
# }
#
# i2 = session.post('https://github.com/session', data=form_data)
# c2 = i2.cookies.get_dict()
# c1.update(c2)
# i3 = session.get('https://github.com/settings/repositories')
#
# soup3 = BeautifulSoup(i3.text, features='lxml')
# list_group = soup3.find(name='div', class_='listgroup')
#
# from bs4.element import Tag
#
# for child in list_group.children:
#     if isinstance(child, Tag):
#         project_tag = child.find(name='a', class_='mr-1')
#         size_tag = child.find(name='small')
#         temp = "Project: %s (%s); project path: %s" % (project_tag.string, size_tag.string, project_tag.get('href'), )
#         print(temp)
#!/usr/bin/env python
# -*- coding:utf-8 -*-
import time
import requests
from bs4 import BeautifulSoup
session = requests.Session()
i1 = session.get(
url='https://www.zhihu.com/#signin',
headers={
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.98 Safari/537.36',
}
)
soup1 = BeautifulSoup(i1.text, 'lxml')
xsrf_tag = soup1.find(name='input', attrs={'name': '_xsrf'})
xsrf = xsrf_tag.get('value')
current_time = time.time()
i2 = session.get(
url='https://www.zhihu.com/captcha.gif',
params={'r': current_time, 'type': 'login'},
headers={
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.98 Safari/537.36',
})
with open('zhihu.gif', 'wb') as f:
    f.write(i2.content)
captcha = input('Open zhihu.gif, read the captcha and type it here: ')
form_data = {
    "_xsrf": xsrf,
    'password': 'xxooxxoo',
    "captcha": captcha,
    'email': 'your_email@example.com'
}
i3 = session.post(
url='https://www.zhihu.com/login/email',
data=form_data,
headers={
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.98 Safari/537.36',
}
)
i4 = session.get(
url='https://www.zhihu.com/settings/profile',
headers={
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.98 Safari/537.36',
}
)
soup4 = BeautifulSoup(i4.text, 'lxml')
tag = soup4.find(id='rename-section')
nick_name = tag.find('span',class_='name').string
print(nick_name)
#!/usr/bin/env python
# -*- coding:utf-8 -*-
import re
import json
import base64
import rsa
import requests
def js_encrypt(text):
    # RSA-encrypt the text with the site's public key, then base64-encode the result
    b64der = 'MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQCp0wHYbg/NOPO3nzMD3dndwS0MccuMeXCHgVlGOoYyFwLdS24Im2e7YyhB0wrUsyYf0/nhzCzBK8ZC9eCWqd0aHbdgOQT6CuFQBMjbyGYvlVYU2ZP7kG9Ft6YV6oc9ambuO7nPZh+bvXH0zDKfi02prknrScAKC0XhadTHT3Al0QIDAQAB'
    der = base64.standard_b64decode(b64der)
    pk = rsa.PublicKey.load_pkcs1_openssl_der(der)
    v1 = rsa.encrypt(bytes(text, 'utf8'), pk)
    value = base64.encodebytes(v1).replace(b'\n', b'')
    value = value.decode('utf8')
    return value
session = requests.Session()
i1 = session.get('https://passport.cnblogs.com/user/signin')
rep = re.compile("'VerificationToken': '(.*?)'")  # non-greedy: stop at the first closing quote
v = re.search(rep, i1.text)
verification_token = v.group(1)
form_data = {
'input1': js_encrypt('wptawy'),
'input2': js_encrypt('asdfasdf'),
'remember': False
}
i2 = session.post(url='https://passport.cnblogs.com/user/signin',
data=json.dumps(form_data),
headers={
'Content-Type': 'application/json; charset=UTF-8',
'X-Requested-With': 'XMLHttpRequest',
'VerificationToken': verification_token}
)
i3 = session.get(url='https://i.cnblogs.com/EditDiary.aspx')
print(i3.text)
#!/usr/bin/env python
# -*- coding:utf-8 -*-
import re
import requests
# Step 1: visit the login page and get X_Anti_Forge_Token and X_Anti_Forge_Code
# 1. Request URL: https://passport.lagou.com/login/login.html
# 2. Method: GET
# 3. Headers:
# User-agent
r1 = requests.get('https://passport.lagou.com/login/login.html',
headers={
'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36',
},
)
X_Anti_Forge_Token = re.findall("X_Anti_Forge_Token = '(.*?)'", r1.text, re.S)[0]
X_Anti_Forge_Code = re.findall("X_Anti_Forge_Code = '(.*?)'", r1.text, re.S)[0]
print(X_Anti_Forge_Token, X_Anti_Forge_Code)
# print(r1.cookies.get_dict())
# Step 2: log in
# 1. Request URL: https://passport.lagou.com/login/login.json
# 2. Method: POST
# 3. Headers:
# cookie
# User-agent
# Referer:https://passport.lagou.com/login/login.html
# X-Anit-Forge-Code:53165984
# X-Anit-Forge-Token:3b6a2f62-80f0-428b-8efb-ef72fc100d78
# X-Requested-With:XMLHttpRequest
# 4. Body:
# isValidate:true
# username:15131252215
# password:ab18d270d7126ea65915c50288c22c0d
# request_form_verifyCode:''
# submit:''
r2 = requests.post(
'https://passport.lagou.com/login/login.json',
headers={
'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36',
'Referer': 'https://passport.lagou.com/login/login.html',
'X-Anit-Forge-Code': X_Anti_Forge_Code,
'X-Anit-Forge-Token': X_Anti_Forge_Token,
'X-Requested-With': 'XMLHttpRequest'
},
data={
"isValidate": True,
'username': '15131255089',
'password': 'ab18d270d7126ea65915c50288c22c0d',
'request_form_verifyCode': '',
'submit': ''
},
cookies=r1.cookies.get_dict()
)
print(r2.text)
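The login scripts above share one pattern: GET the login page, pull a one-time anti-forgery token out of the HTML, then send it back with the credentials. A sketch of the extraction step, with a hypothetical helper extract_token and a made-up HTML snippet; it shows why the non-greedy (.*?) matters and returns None instead of raising IndexError (as findall(...)[0] would) when the token is missing:

```python
import re


def extract_token(html, name):
    """Hypothetical helper: find  name = '...'  in page source; None if absent."""
    # re.escape guards against regex metacharacters in the token name;
    # (.*?) stops at the first closing quote instead of the last one
    match = re.search(re.escape(name) + r"\s*=\s*'(.*?)'", html, re.S)
    return match.group(1) if match else None


page = (
    "<script>window.X_Anti_Forge_Token = 'abc-123';"
    " window.X_Anti_Forge_Code = '42';</script>"
)

token = extract_token(page, "X_Anti_Forge_Token")
code = extract_token(page, "X_Anti_Forge_Code")
missing = extract_token(page, "NoSuchToken")

# A greedy (.*) runs on to the last quote on the line
greedy = re.search(r"X_Anti_Forge_Token = '(.*)'", page).group(1)

print(token, code, missing)  # abc-123 42 None
print(greedy)
```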