067 Python Syntax: The Requests Library

Overview

The built-in urllib is awkward to use, so the author wrote Requests as a friendlier alternative.

Library documentation

http://docs.python-requests.org/en/master

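Why learn Requests

We live in a networked age
It is a powerful tool for web scraping
It is a foundation for server-side programming (RESTful APIs)
It supports automated API testing (Python + Requests)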

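Environment setup

http://httpbin.org/
pip install gunicorn httpbin
Run gunicorn httpbin:app to serve httpbin locally (gunicorn listens on http://127.0.0.1:8000 by default), so the examples do not depend on the public site.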

HTTP basics

Request

GET / HTTP/1.1
Start line: request method, request path, protocol version
Host: www.baidu.com
User-Agent: curl/7.43.0
Accept: */*

Response

HTTP/1.1 200 OK
Start line: protocol version, status code, reason phrase
Headers
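
To make the request layout above concrete, here is a minimal sketch that uses Request.prepare() to build a GET request without sending it, then prints it in roughly the wire format. The helper name format_raw_request and the httpbin URL are assumptions made for illustration. The Host header is added by hand here because the HTTP layer normally fills it in at send time.

```python
from urllib.parse import urlsplit

import requests


def format_raw_request(prepared):
    """Render a PreparedRequest roughly as it would appear on the wire.

    Hypothetical helper for illustration; it is not part of Requests.
    """
    parts = urlsplit(prepared.url)
    # path_url is the path plus any query string, e.g. "/get?a=1"
    start_line = "{} {} HTTP/1.1".format(prepared.method, prepared.path_url)
    lines = [start_line, "Host: {}".format(parts.netloc)]
    lines += ["{}: {}".format(k, v) for k, v in prepared.headers.items()]
    return "\r\n".join(lines) + "\r\n\r\n"


# Build, but do not send, a GET request.
req = requests.Request(
    "GET",
    "http://httpbin.org/get",
    headers={"User-Agent": "curl/7.43.0", "Accept": "*/*"},
)
print(format_raw_request(req.prepare()))
```

Nothing is sent over the network in this sketch, so it can be run without access to httpbin.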

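A simple program

In Python 2, urllib and urllib2 are separate, independent modules; Python 3 merged them into the urllib package.
Requests is built on urllib3, which pools connections so that multiple requests can reuse one socket.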

1. Using urllib

import urllib.request

response = urllib.request.urlopen("http://httpbin.org/")
print(response.info())          # headers as a message object
print(response.getheaders())    # headers as a list of (name, value) pairs
print(response.getcode())       # status code
print(response.read().decode("utf-8"))  # page content

2. Using Requests

import requests

response = requests.get("http://httpbin.org/ip")
print(response.headers)         # headers as a dict-like object
print(response.status_code)     # status code
print(response.text)            # page content as text
print(response.json())          # body parsed as JSON
print(type(response.json()))    # the parsed JSON is a dict here

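Sending requests (Request)

Request methods

GET: retrieve a resource
POST: create a new resource
PUT: create or fully replace a resource at a known URL
PATCH: partially update a known resource (a complement to PUT)
DELETE: delete a resource
HEAD: retrieve only the response headers
OPTIONS: list the request methods the resource supports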

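Requests with parameters

requests.get(url, params={"key1": "value1"})                     # sent in the query string
requests.post(url, data={"key1": "value1", "key2": "value2"})    # sent as a form-encoded body
requests.post(url, json={"key1": "value1", "key2": "value2"})    # sent as a JSON body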
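
The difference between params, data, and json is easiest to see by preparing the requests without sending them. The following is a minimal sketch; the URL is an assumed test endpoint and no network traffic is generated.

```python
import json

from requests import Request

URL = "http://httpbin.org/anything"  # assumed test endpoint

# params= is URL-encoded into the query string.
get_req = Request("GET", URL, params={"key1": "value1"}).prepare()
print(get_req.url)

# data= with a dict is form-encoded into the request body.
form_req = Request("POST", URL, data={"key1": "value1", "key2": "value2"}).prepare()
print(form_req.headers["Content-Type"])
print(form_req.body)

# json= is serialized to JSON and sent with a JSON Content-Type.
json_req = Request("POST", URL, json={"key1": "value1", "key2": "value2"}).prepare()
print(json_req.headers["Content-Type"])
print(json.loads(json_req.body))
```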

Handling request exceptions (the exceptions in requests.exceptions)

BaseHTTPError
...
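
As a rough illustration of how these exceptions are typically caught, the sketch below wraps requests.get in a helper. The function name fetch_text and its return convention (None on failure) are assumptions for this example; the exception classes themselves come from requests.exceptions, where RequestException is the common base class.

```python
import requests
from requests import exceptions


def fetch_text(url, timeout=5):
    """Return the response body as text, or None if the request fails."""
    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()  # raise HTTPError for 4XX/5XX status codes
        return response.text
    except exceptions.Timeout:
        print("request timed out")
    except exceptions.ConnectionError:
        print("could not connect")
    except exceptions.HTTPError as err:
        print("bad status:", err.response.status_code)
    except exceptions.RequestException as err:
        # Base class of the exceptions Requests raises, e.g. MissingSchema
        print("request failed:", err)
    return None


# A URL without a scheme fails before any network traffic is sent.
print(fetch_text("httpbin.org/get"))
```

Catching the specific classes first and RequestException last keeps the fallback branch for errors that the earlier branches do not cover.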

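Custom requests

from requests import Request, Session

# url, username, and pwd are placeholders that the caller must define
s = Session()   # create a Session
headers = {"User-Agent": "fake1.3.4"}    # custom headers
req = Request("GET", url, auth=(username, pwd), headers=headers)    # define a request
prepped = req.prepare() # prepare the request for sending

response = s.send(prepped, timeout=5)   # send it through the Session with a 5-second timeout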

Receiving responses (Response)

HTTP status codes

1XX: informational
2XX: success
3XX: redirection
4XX: client error
5XX: server error
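
The five classes above are determined by the first digit of the status code. A minimal sketch follows; the helper status_class is a hypothetical name used only here, while requests.codes is the library's lookup of named status codes.

```python
import requests


def status_class(status_code):
    """Map a status code to one of the five classes listed above."""
    classes = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    return classes.get(status_code // 100, "unknown")


# requests.codes offers named constants for common status codes.
print(status_class(requests.codes.ok))         # requests.codes.ok is 200
print(status_class(requests.codes.not_found))  # requests.codes.not_found is 404
```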

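Attributes

status_code: response status code
reason: reason phrase (for example, OK)
headers: response headers
url: URL of the request
elapsed: time the request took
request: the request object that produced this response
encoding: encoding used to decode the body
raw: the underlying raw response object
content: body as bytes
text: body decoded to a string
json(): body parsed as JSON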
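
The attributes above can be inspected on any response. So that the sketch below runs without internet access, it starts a tiny throwaway HTTP server on 127.0.0.1 using the standard library; that server, its JSON payload, and the names JSONHandler and fetch_from_local_server are assumptions for this example and stand in for httpbin.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class JSONHandler(BaseHTTPRequestHandler):
    """Tiny stand-in server (an assumption for this sketch, not httpbin)."""

    def do_GET(self):
        body = json.dumps({"origin": "127.0.0.1"}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def fetch_from_local_server():
    server = HTTPServer(("127.0.0.1", 0), JSONHandler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        url = "http://127.0.0.1:{}/ip".format(server.server_port)
        return requests.get(url, timeout=5)
    finally:
        server.shutdown()
        server.server_close()


response = fetch_from_local_server()
print(response.status_code, response.reason)  # status code and reason phrase
print(response.headers["Content-Type"])       # header lookup is case-insensitive
print(response.url)                           # URL of the request
print(response.elapsed)                       # a datetime.timedelta
print(response.request.method)                # the request that was sent
print(response.encoding)                      # taken from the Content-Type header
print(response.content)                       # raw bytes
print(response.text)                          # bytes decoded with response.encoding
print(response.json())                        # parsed JSON (a dict here)
```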

Downloading images/files

from contextlib import closing

import requests

headers = {"User-Agent": "<your browser's User-Agent string>"}
url = "<URL of the file>"
# stream=True defers downloading the body until it is iterated over
with closing(requests.get(url, headers=headers, stream=True)) as response:
    # open the output file
    with open("demo1.jpg", "wb") as fd:
        # write 128 bytes at a time
        for chunk in response.iter_content(128):
            fd.write(chunk)

Event hooks

import requests

def get_key_info(response, *args, **kwargs):
    """Callback invoked when a response arrives."""
    print(response.headers["Content-Type"])

requests.get(url, hooks=dict(response=get_key_info))

Advanced: Cookie, Session

HTTP authentication

requests.get(url, auth=(username, pwd))  # HTTP Basic authentication

OAuth authentication

headers = {"Authorization": "token <your token>"}
response = requests.get(url, headers=headers)
print(response.request.headers)

The same header can be attached by a custom authentication class:

import requests
from requests.auth import AuthBase

class GithubAuth(AuthBase):
    def __init__(self, token):
        self.token = token

    def __call__(self, r):
        # attach the token to every outgoing request
        r.headers["Authorization"] = " ".join(["token", self.token])
        return r

def oauth_advanced():
    auth = GithubAuth("<your token>")
    response = requests.get(url, auth=auth)
    print(response.text)

oauth_advanced()

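Proxy (intermediary)

Start a proxy service (for example, on Heroku)
Start a SOCKS service on local port 1080
Forward requests to port 1080
Receive the response through the proxy
pip install "requests[socks]"
If pip prints "Requirement already satisfied", the SOCKS dependency is already installed.
proxy = {'http': 'socks5://127.0.0.1:1080'}
result = requests.get(url, proxies=proxy, timeout=10)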

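Cookie, Session

A session is the server-side mechanism for keeping state about a client.
A cookie is the browser-side (client-side) mechanism for keeping that information.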
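
On the client side, requests.Session plays the browser's role: it stores cookies and sends them back on later requests. The sketch below shows this without contacting a server; the cookie name, value, and domain are made up for illustration.

```python
import requests

session = requests.Session()

# Pretend the server issued this cookie in an earlier Set-Cookie header.
# The name, value, and domain are made up for illustration.
session.cookies.set("sessionid", "abc123", domain="httpbin.org", path="/")

# prepare_request merges the session's cookies into the outgoing request.
request = requests.Request("GET", "http://httpbin.org/cookies")
prepared = session.prepare_request(request)
print(prepared.headers["Cookie"])
```

In normal use the cookie jar is filled automatically from the server's responses, so requests made through the same Session keep the server-side session alive.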
