Requests: HTTP for Humans
requests
利用requests发送请求
以github API为例
https://developer.github.com/
请求方法
example
response = requests.get(build_uri('user/emails'),auth = ('[username]','[password]'))
带参数的请求
url参数
表单参数提交
response = requests.get(build_uri('users'), params={'since': 11})
- json参数提交
response = requests.patch(build_uri('user'), auth=(
'[username]', '[password]'), json={'name': '[new name]'})
请求异常处理
- 请求超时处理
try:
response = requests.get(build_uri('user/emails'), timeout=10)
response.raise_for_status()
except exceptions.Timeout as e:
print e.message
except exceptions.HTTPError as e:
print e.message
else:
print_response(response)
自定义Request
from requests import Request, Session
s = Session()
headers = {'User-Agent': 'fake1.3.4'}
req = Request('GET', build_uri('user/emails'),
auth=('[username]', '[password]'), headers=headers)
prepped = req.prepare()
print prepped.body
print prepped.headers
resp = s.send(prepped,timeout = 5)
print resp.request.headers
print resp.text
响应请求 response
响应基本api
- status_code
- reason
code | reason |
---|---|
2XX | 成功 |
3XX | 重定向 |
4XX | 客户端错误 |
5XX | 服务器错误 |
- headers
- url
- history
- elapsed
- request
与响应主体有关 - encoding
- raw
- content
- text
- json
可在python交互模式下获得response,并进行查看相关的属性结果
requests库下载图片
- 利用爬虫自动下载图片
- 远程下载服务器上的文本文件
事件钩子(Event Hooks)
非线性处理事件
def get_key_info(response, *args, **kwargs):
print response.headers['Content-Type']
def main():
requests.get('https://api.github.com', hooks=dict(response=get_key_info))
进阶话题
HTTP认证
- oauth认证
from requests.auth import AuthBase
class GithubAuth(AuthBase):
def __init__(self, token):
self.token = token
def __call__(self, r):
r.headers['Authorization'] = ' '.join(['token', self.token])
return r
def oauth_advanced():
auth = GithubAuth('db8c8e8279b82af279b25c3a68168efe1f341f90')
response = requests.get(construct_url('user/emails'), auth=auth)
print response.status_code, response.reason
print response.text
代理(Proxy)
翻墙原理
- 启动代理服务Heroku(类似阿里云)
2, 在主机1080端口启动Socks服务 - 将请求转发大1080端口
- 获取相应资源
pip install requests[socksv5]
import requests
proxies = {'http':'socks5://127.0.0.1:1080','https':'socks5://127.0.0.1:1080'}
response = requests.get(url,proxies = proxies,timeout = 10)
Session和Cookie
Session 服务器端存储信息
Cookie 浏览器端存储信息
- Cookie
r = requests.get(url)
r.cookies['cookiename']
#解析
cookies = dict(c = 'uid')
requests.get(url,cookies = cookies)
存储在浏览器端不够安全,而且带宽占用较大
- session
在浏览器中的cookie中仅存储session-id,相关信息存放在服务器中,解决了带宽和安全的问题
Think with the first principle
多动动脑子去想
慕课网学习视频地址:http://www.imooc.com/video/13160
我的代码:https://github.com/lamchan2844/mypython/tree/master/requests