Python 爬虫二 requests模块

requests模块

 

Requests模块

get方法请求

整体演示一下:

import requests

response  = requests.get("https://www.baidu.com")
print(type(response))
print(response.status_code)
print(type(response.text))
print(response.text)
print(response.cookies)
print(response.content)  # 二进制
print(response.content.decode("utf-8"))

我们可以看出response使用起来确实非常方便,这里有个问题需要注意一下:
很多情况下的网站如果直接response.text会出现乱码的问题,所以这个使用response.content
这样返回的数据格式其实是二进制格式,然后通过decode()转换为utf-8,这样就解决了通过response.text直接返回显示乱码的问题.

请求发出后,Requests 会基于 HTTP 头部对响应的编码作出有根据的推测。当你访问 response.text 之时,Requests 会使用其推测的文本编码。你可以找出 Requests 使用了什么编码,并且能够使用 response.encoding 属性来改变它.如:

import requests
response = requests.get(
    url='https://www.autohome.com.cn/news/'
)
response.encoding = response.apparent_encoding  # 使用默认的编码原则
print(response.text)

一个简单的get请求的爬虫结果:

Python 爬虫二 requests模块_第1张图片

如果我们想要在URL查询字符串传递数据,通常我们会通过httpbin.org/get?key=val方式传递。Requests模块允许使用params关键字传递参数,以一个字典来传递这些参数,格式如下:

# 实例方法一
import requests
response = requests.get("url?name=dandy&age=18")
print(response.text)


# 实例方法二
import requests
url = ''
data = {
    "name":"dandy",
    "age":18
}
response = requests.get(url,params=data)
print(response.url)
print(response.text)

上述两种的结果是相同的,通过params参数传递一个字典内容,从而直接构造url
注意:第二种方式通过字典的方式的时候,如果字典中的参数为None则不会添加到url上

 

获取二进制数据

在上面提到了response.content,这样获取的数据是二进制数据,同样的这个方法也可以用于下载图片以及
视频资源

 

添加headers
和前面我们将urllib模块的时候一样,我们同样可以定制headers的信息,如当我们直接通过requests请求知乎网站的时候,默认是无法访问的。谷歌浏览器里输入chrome://version,就可以看到用户代理,将用户代理添加到头部信息:

Python 爬虫二 requests模块_第2张图片

也可以随便输入一个网址:

Python 爬虫二 requests模块_第3张图片

都可以获取到。 

copy出来仿造的请求头信息

import requests
url = ''
headers = {

    "User-Agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36"
}
response =requests.get(url,headers=headers)

print(response.text)

 

post请求

通过在发送post请求时添加一个data参数,这个data参数可以通过字典构造成,这样
对于发送post请求就非常方便

import requests
url = ''
data = {
    "name":"dandy",
    "age":18
}
response = requests.post(url,data=data)
print(response.text)

同样的在发送post请求的时候也可以和发送get请求一样通过headers参数传递一个字典类型的数据

 

模拟登陆&自动点赞

首先打开抽屉,点击登陆,打开开发者模式,随意的输入账号密码,然后点击登陆,可以的到如下的图:

Python 爬虫二 requests模块_第4张图片

备注:之前抽屉是不会去抓去请求头的终端设备信息的,现在有验证了,0.0

首先需要大佬们去注册一下账号密码,然后我们来模拟浏览器登陆,这里需要注意的一点是,登陆的时候可以注意一下,如果浏览器刷新了,那肯定是form验证,如果没有那就一定是ajax验证。所以这里不用说,测试一下就发现是ajax验证:

import requests
headers = {
    "User-Agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36"
}  # header里伪造终端信息
post_dict = {
    'phone': '8615988888888',
    'password': '*******',
    'oneMonth': 1
}
response = requests.post(
    url='https://dig.chouti.com/login',
    data=post_dict,
    headers=headers
)
print(response.content)
cookie_dict = response.cookies.get_dict()
print(cookie_dict)

这里,既然是ajax登陆,返回的可以猜到一定是json数据:

b'{"result":{"code":"9999", "message":"", "data":{"complateReg":"0","destJid":"cdu_51970753537"}}}'
{'gpsd': '4fa04e9978e550f8d6ea1fb5418184ee', 'puid': 'c3c133fab0b2ba4bcb5e0f9b494501cd', 'JSESSIONID': 'aaahPA3kgUc2yhWM_9xsw'}

到现在为止,已经顺利的实现了登陆功能,然后实现了,大家应该都知道为什么要抓取一下cookies打印出来吧?

cookies的真正的意义就是在于当第一次登陆完,之后就可以直接带着服务器返回的cookies去向服务器发送请求。之后就可以肆意妄为了!!!

现在我们来实现一下自动点赞的功能,首先找一篇文章,点个赞:

由上,可以发现点赞的网址,post的数据等,此时取消点赞,写代码:

import requests
headers = {
    "User-Agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36"
}
post_dict = {
    'phone': '8615962180289',
    'password': 'zhangy321281',
    'oneMonth': 1
}
response = requests.post(
    url='https://dig.chouti.com/login',
    data=post_dict,
    headers=headers
)
print(response.content)
cookie_dict = response.cookies.get_dict()
print(cookie_dict)

response_vote = requests.post(
    url='https://dig.chouti.com/link/vote?linksId=20819056',
    cookies=cookie_dict
)
print(response_vote)

信心满满写好了:

b'{"result":{"code":"9999", "message":"", "data":{"complateReg":"0","destJid":"cdu_51970753537"}}}'
{'gpsd': '74338b2cda9e9a355a52854b95474e3a', 'puid': '07fd1754895aefa93b4b46fb52990f7f', 'JSESSIONID': 'aaavRXk12M4Kidy5_9xsw'}

什么??怎么会这样??拿着浏览器返回的cookie怎么不可以呢?那该怎么办?

管不了那么多,先用笨方法来测试cookies里面哪一个控制这登陆状态认证:

取消点赞,重新测试代码点赞:

import requests
headers = {
    "User-Agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36"
}
response = requests.post(
    url='https://dig.chouti.com/link/vote?linksId=20819056',
    cookies={
        'gpsd': '5db40ff97b8dd603f70288157d2bdd8f'  # 因为没办法,所以只能用浏览器的cookies做验证,一次次取一个值
    },
    headers=headers
)
print(response.text)

测试结果:

{"result":{"code":"9999", "message":"推荐成功", "data":{"jid":"cdu_51970753537","likedTime":"1531564084343000","lvCount":"10","nick":"衰Zzz","uvCount":"1","voteTime":"小于1分钟前"}}}

天呐!!!成功了!!!

Python 爬虫二 requests模块_第5张图片

 

所以我们可以先跟踪确定了,肯定是gpsd有问题。

这时候退出登陆重新刷新网页:

记录一下cookies:

cookie: gpsd=5db40ff97b8dd603f70288157d2bdd8f; gpid=d4a8c7f4454841bd8bd640f5f5565420; JSESSIONID=aaaKh89UOAJIy_PfW8xsw

不免有些疑问,为什么第一次get就有cookies,

此时我们再用代码测试一下gpsd的值:

import requests
headers = {
    "User-Agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36"
}
response_get = requests.get(
    url='https://dig.chouti.com/',
    headers=headers
)
print(response_get.cookies.get_dict())
post_dict = {
    'phone': '8615988888888',
    'password': '********',
    'oneMonth': 1
}
response_post = requests.post(
    url='https://dig.chouti.com/login',
    data=post_dict,
    headers=headers
)
print(response_post.content)
cookie_dict = response_post.cookies.get_dict()
print(cookie_dict)

查看cookies对比:

{'gpsd': '38644be424cebb27e1cc631dd84ae9d2', 'JSESSIONID': 'aaaKwMAE52emjedIW-xsw'}
b'{"result":{"code":"9999", "message":"", "data":{"complateReg":"0","destJid":"cdu_51970753537"}}}'
{'gpsd': '7b32421f6a73365b2dbb6b9739afaaff', 'puid': '497b5a7249b8538e70ac87ead562c91f', 'JSESSIONID': 'aaa5sbGP7XecWf15W8xsw'}

发现两次的gpsd不一致,从web开发者角度登陆之后的cookies一定是不会再去进行改变,所以前后一共就有这么两种可能性的cookies,上面的点赞失败了,那就代表返回的cookies一定是没有用的:

那是不是可以猜想认证的gpsd会不会是第一次的gpsd值,但是一想又不太可能,因为第一次的还没有认证,怎么能保证呢?那会不会是第一次的gpsd再登陆的时候传过去做了认证,然后浏览器记录了它,但是为了防止爬虫做了一份假的gpsd给你 ,想到这里不免想测试一下:

import requests
headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36"
}
response_get = requests.get(
    url='https://dig.chouti.com/',
    headers=headers
)
r1 = response_get.cookies.get_dict()  # 第一次get请求获取服务器给的cookies
post_dict = {
    'phone': '8615988888888',
    'password': '********',
    'oneMonth': 1,
}
response_post = requests.post(
    url='https://dig.chouti.com/login',
    data=post_dict,
    headers=headers,
    cookies=r1  # 第二次再把第一次得到的cookies传回去进行认证授权
)
print(response_post.content)
r2 = response_post.cookies.get_dict()  # 这个是个骗子!!!

response_vote = requests.post(
    url='https://dig.chouti.com/link/vote?linksId=20819056',
    cookies={
        'gpsd': r1['gpsd']
    },
    headers=headers
)
print(response_vote.text)

测试结果:

b'{"result":{"code":"9999", "message":"", "data":{"complateReg":"0","destJid":"cdu_51970753537"}}}'
{"result":{"code":"9999", "message":"推荐成功", "data":{"jid":"cdu_51970753537","likedTime":"1531565602774000","lvCount":"16","nick":"衰Zzz","uvCount":"1","voteTime":"小于1分钟前"}}} 

Python 爬虫二 requests模块_第6张图片

完成!!

爬虫登陆GitHub实战:https://www.cnblogs.com/wuzdandz/p/9338543.html

请求

前面已经讲过基本的GET请求,下面稍微详谈一下带参数的请求:

import requests
# 方法一
response = requests.get('http://****.com/?name=dandy&age=18')
print(response.text)

# 方法二
import requests
data = {
    "name":"dandy",
    "age":18
}
response = requests.get("http://*****.com",params=data)
print(response.url)  # 提交url
print(response.text)
本质上方法二会被转换成方法一
    请求头:
        content-type:application/url-form-encod......
    请求体:
        user=dandy&age=18
    局限性在于传递的value只能是字符串,数字,列表,不能是字典,

上述两种的结果是相同的,通过params参数传递一个字典内容,从而直接构造url。
注意:第二种方式通过字典的方式的时候,如果字典中的参数为None则不会添加到url上

 

json

import requests
import json
url = ''
response = requests.get(url)
print(type(response.text))
print(response.json())
print(json.loads(response.text))
print(type(response.json()))

从结果可以看出requests里面集成的json其实就是执行了json.loads()方法,两者的结果是一样的

data = {'user': 'dandy', 'age': 18}    ==>    json数据 "{'user': 'dandy', 'age': 18}"
    请求头:
        content-type:application/json....
    请求体:
        user=dandy&age=18
    可以传递字典嵌套的字典

获取二进制数据

在上面提到了response.content,这样获取的数据是二进制数据,同样的这个方法也可以用于下载图片以及
视频资源

添加headers
和前面我们将urllib模块的时候一样,我们同样可以定制headers的信息,如当我们直接通过requests请求知乎网站的时候,默认是无法访问的

之前的实例抽屉自动登陆,就定制了请求头headers

Referer:
    requests.request(
        method="POST",
        url = url1,
        params={'k1': v1, 'k2': 'v2'},
        json = {'user': 'dandy', 'age': 18}
        headers={
            "Referer": url/login,  # 判断上一次请求的网站是不是也是本网站,不是的话默认为非正常访问
            "User-Agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36"
        }
    )

 

cookie

import requests

response = requests.get("http://www.baidu.com")
print(response.cookies)

for k, v in response.cookies.items():
    print(k + "=" + v)

Cookie放在请求头里面发送的

 

POST请求

通过在发送post请求时添加一个data参数,这个data参数可以通过字典构造成,这样
对于发送post请求就非常方便

import requests

data = {
    "name":"dandy",
    "age":18
}
response = requests.post("http://*****.com",data=data)
print(response.text)

同样的在发送post请求的时候也可以和发送get请求一样通过headers参数传递一个字典类型的数据

*************************************************************************************
在Django内部,如果是以post传递过去数据data = {'user': 'dandy', 'age': 18};
请求头:
content-type:application/url-form-encod......
请求体:
user=dandy&age=18
根据请求头的不同决定是否请求体里面的data转换并放到request.POST里面
*************************************************************************************

 

响应

我们可以通过response获得很多属性,例子如下

import requests

response = requests.get("http://www.baidu.com")
print(type(response.status_code),response.status_code)
print(type(response.headers),response.headers)
print(type(response.cookies),response.cookies)
print(type(response.url),response.url)
print(type(response.history),response.history)

状态码判断
Requests还附带了一个内置的状态码查询对象
主要有如下内容:

100: ('continue',),
101: ('switching_protocols',),
102: ('processing',),
103: ('checkpoint',),
122: ('uri_too_long', 'request_uri_too_long'),
200: ('ok', 'okay', 'all_ok', 'all_okay', 'all_good', '\o/', '✓'),
201: ('created',),
202: ('accepted',),
203: ('non_authoritative_info', 'non_authoritative_information'),
204: ('no_content',),
205: ('reset_content', 'reset'),
206: ('partial_content', 'partial'),
207: ('multi_status', 'multiple_status', 'multi_stati', 'multiple_stati'),
208: ('already_reported',),
226: ('im_used',),

Redirection.
300: ('multiple_choices',),
301: ('moved_permanently', 'moved', '\o-'),
302: ('found',),
303: ('see_other', 'other'),
304: ('not_modified',),
305: ('use_proxy',),
306: ('switch_proxy',),
307: ('temporary_redirect', 'temporary_moved', 'temporary'),
308: ('permanent_redirect',
'resume_incomplete', 'resume',), # These 2 to be removed in 3.0

Client Error.
400: ('bad_request', 'bad'),
401: ('unauthorized',),
402: ('payment_required', 'payment'),
403: ('forbidden',),
404: ('not_found', '-o-'),
405: ('method_not_allowed', 'not_allowed'),
406: ('not_acceptable',),
407: ('proxy_authentication_required', 'proxy_auth', 'proxy_authentication'),
408: ('request_timeout', 'timeout'),
409: ('conflict',),
410: ('gone',),
411: ('length_required',),
412: ('precondition_failed', 'precondition'),
413: ('request_entity_too_large',),
414: ('request_uri_too_large',),
415: ('unsupported_media_type', 'unsupported_media', 'media_type'),
416: ('requested_range_not_satisfiable', 'requested_range', 'range_not_satisfiable'),
417: ('expectation_failed',),
418: ('im_a_teapot', 'teapot', 'i_am_a_teapot'),
421: ('misdirected_request',),
422: ('unprocessable_entity', 'unprocessable'),
423: ('locked',),
424: ('failed_dependency', 'dependency'),
425: ('unordered_collection', 'unordered'),
426: ('upgrade_required', 'upgrade'),
428: ('precondition_required', 'precondition'),
429: ('too_many_requests', 'too_many'),
431: ('header_fields_too_large', 'fields_too_large'),
444: ('no_response', 'none'),
449: ('retry_with', 'retry'),
450: ('blocked_by_windows_parental_controls', 'parental_controls'),
451: ('unavailable_for_legal_reasons', 'legal_reasons'),
499: ('client_closed_request',),

Server Error.
500: ('internal_server_error', 'server_error', '/o\', '✗'),
501: ('not_implemented',),
502: ('bad_gateway',),
503: ('service_unavailable', 'unavailable'),
504: ('gateway_timeout',),
505: ('http_version_not_supported', 'http_version'),
506: ('variant_also_negotiates',),
507: ('insufficient_storage',),
509: ('bandwidth_limit_exceeded', 'bandwidth'),
510: ('not_extended',),
511: ('network_authentication_required', 'network_auth', 'network_authentication'),

通过下面例子测试:(不过通常还是通过状态码判断更方便) 

import requests

response= requests.get("http://www.baidu.com")
if response.status_code == requests.codes.ok:
    print("访问成功")

 

requests高级用法

文件上传

实现方法和其他参数类似,也是构造一个字典然后通过files参数传递

requests.post(
    url='xxx',
    files={
        'f1': open('a.csv', 'rb'),  # 上传文件对象,默认名称为文件名称
        'f2': (filename, open('b.csv', 'rb'))  # 自定义文件名
        }
)

 

证书认证

现在的很多网站都是https的方式访问,所以这个时候就涉及到证书的问题

import requests
response = requests.get("https://www.12306.cn")
print(response.status_code)

默认的12306网站的证书是不合法的,这样就会提示如下错误

Python 爬虫二 requests模块_第7张图片

为了避免这种情况的发生可以通过verify=False,这样是可以访问到页面的:

import requests
response = requests.get("https://www.12306.cn", verify=False)
print(response.status_code)

结果如下:

certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
  InsecureRequestWarning)
200

解决方法:

import requests
import urllib3
urllib3.disable_warnings()
response = requests.get("https://www.12306.cn", verify=False)
print(response.status_code)

这样就不会提示警告信息,当然也可以通过cert参数放入证书路径

关于证书有两种:

verify:证书    https:    ==>    ssl加密
    requests.get(
        url='https://...',
        cert='abc.pem',  # 自己制作的证书    pem证书格式
    )

    requests.get(  # 厂商制作好的,在系统创建时就已经植入,直接购买权限
        url='https://...',
        cert=('abc.crt', 'xxx.key'),

                    
    )
    verify:False  忽略证书,直接交互

代理设置

import requests

proxies= {
    "http":"http://127.0.0.1:9999",
    "https":"http://127.0.0.1:8888"
}
response  = requests.get("https://www.baidu.com",proxies=proxies)
print(response.text)

如果代理需要设置账户名和密码,只需要将字典更改为如下:
proxies = {
  "http":"http://user:[email protected]:9999"
}
如果你的代理是通过sokces这种方式则需要pip install "requests[socks]"
proxies= {
  "http":"socks5://127.0.0.1:9999",
  "https":"sockes5://127.0.0.1:8888"
}

请求不是发送到目的URL的,而是先发送给代理,代理再去发送请求

 

超时设置

通过timeout参数可以设置超时的时间,等服务器多长时间放弃

(a,b)    ==>    a 发送最长时间;b 等待最长时间

 

 

认证设置

如果碰到需要认证的网站可以通过requests.auth模块实现

import requests

from requests.auth import HTTPBasicAuth

response = requests.get("http://120.27.34.24:9001/",auth=HTTPBasicAuth("user","123"))
print(response.status_code)

另一种方式:

import requests

response = requests.get("http://120.27.34.24:9001/",auth=("user","123"))
print(response.status_code)

基本登陆框 md5加密请求头发送过去, 用户名&密码

 

 

重定向设置

allow-redirects:是否允许重定向到新的地址拿数据

 

分流迭代设置

流,如果为false,会一次性下载,如果为true,会一点一点的下载,迭代拿 

 

session设置(持久化)

还记得前面大费周章的去把cookies值装进headers,重新认证么?现在用session来改写一下!!!

import requests
session = requests.Session()

headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36"
}
# 1、首先登陆任何页面,获取cookie
r1 = session.get(
    url='https://dig.chouti.com/',
    headers=headers
)
# 2、用户登陆,携带上一次的cookie,后台对cookie中的gpsd进行授权
post_dict = {
    'phone': '8615988888888',
    'password': 'zhang1111111',
    'oneMonth': 1,
}
r2 = session.post(
    url='https://dig.chouti.com/login',
    data=post_dict,
    headers=headers
)
# 点赞
r3 = session.post(
    url='https://dig.chouti.com/link/vote?linksId=20819051',
    headers=headers
)
print(r3.text)

结果输出:

{"result":{"code":"9999", "message":"推荐成功", "data":{"jid":"cdu_51970753537","likedTime":"1531744254481000","lvCount":"5","nick":"衰Zzz","uvCount":"2","voteTime":"小于1分钟前"}}}

直接从开始就持久化。。问题迎刃而解。

 

异常处理 

http://www.python-requests.org/en/master/api/#exceptions

Exceptions

exception  requests. RequestException (*args**kwargs)[source]

There was an ambiguous exception that occurred while handling your request.

exception  requests. ConnectionError (*args**kwargs)[source]

A Connection error occurred.

exception  requests. HTTPError (*args**kwargs)[source]

An HTTP error occurred.

exception  requests. URLRequired (*args**kwargs)[source]

A valid URL is required to make a request.

exception  requests. TooManyRedirects (*args**kwargs)[source]

Too many redirects.

exception  requests. ConnectTimeout (*args**kwargs)[source]

The request timed out while trying to connect to the remote server.

Requests that produced this error are safe to retry.

exception  requests. ReadTimeout (*args**kwargs)[source]

The server did not send any data in the allotted amount of time.

exception  requests. Timeout (*args**kwargs)[source]

The request timed out.

Catching this error will catch both ConnectTimeout and ReadTimeout errors.

 

源代码、详细异常关系:http://cn.python-requests.org/zh_CN/latest/_modules/requests/exceptions.html#RequestException 

从源码我们可以看出RequestException继承IOError,
HTTPError,ConnectionError,Timeout继承RequestionException
ProxyError,SSLError继承ConnectionError
ReadTimeout继承Timeout异常

简单sample:

import requests

from requests.exceptions import ReadTimeout,ConnectionError,RequestException


try:
    response = requests.get("http://httpbin.org/get",timout=0.1)
    print(response.status_code)
except ReadTimeout:
    print("timeout")
except ConnectionError:
    print("connection Error")
except RequestException:
    print("error")

其实最后测试可以发现,首先被捕捉的异常是timeout,当把网络断掉就会捕捉到ConnectionError,如果前面异常都没有捕捉到,最后也可以通过RequestExctption捕捉到

 

参照:zhaofan老师的blog http://www.cnblogs.com/zhaof/p/6915127.html

参照源码:

def request(method, url, **kwargs):
    """Constructs and sends a :class:`Request `.

    :param method: method for the new :class:`Request` object.
    :param url: URL for the new :class:`Request` object.
    :param params: (optional) Dictionary or bytes to be sent in the query string for the :class:`Request`.
    :param data: (optional) Dictionary or list of tuples ``[(key, value)]`` (will be form-encoded), bytes, or file-like object to send in the body of the :class:`Request`.
    :param json: (optional) A JSON serializable Python object to send in the body of the :class:`Request`.
    :param headers: (optional) Dictionary of HTTP Headers to send with the :class:`Request`.
    :param cookies: (optional) Dict or CookieJar object to send with the :class:`Request`.
    :param files: (optional) Dictionary of ``'name': file-like-objects`` (or ``{'name': file-tuple}``) for multipart encoding upload.
        ``file-tuple`` can be a 2-tuple ``('filename', fileobj)``, 3-tuple ``('filename', fileobj, 'content_type')``
        or a 4-tuple ``('filename', fileobj, 'content_type', custom_headers)``, where ``'content-type'`` is a string
        defining the content type of the given file and ``custom_headers`` a dict-like object containing additional headers
        to add for the file.
    :param auth: (optional) Auth tuple to enable Basic/Digest/Custom HTTP Auth.
    :param timeout: (optional) How many seconds to wait for the server to send data
        before giving up, as a float, or a :ref:`(connect timeout, read
        timeout) ` tuple.
    :type timeout: float or tuple
    :param allow_redirects: (optional) Boolean. Enable/disable GET/OPTIONS/POST/PUT/PATCH/DELETE/HEAD redirection. Defaults to ``True``.
    :type allow_redirects: bool
    :param proxies: (optional) Dictionary mapping protocol to the URL of the proxy.
    :param verify: (optional) Either a boolean, in which case it controls whether we verify
            the server's TLS certificate, or a string, in which case it must be a path
            to a CA bundle to use. Defaults to ``True``.
    :param stream: (optional) if ``False``, the response content will be immediately downloaded.
    :param cert: (optional) if String, path to ssl client cert file (.pem). If Tuple, ('cert', 'key') pair.
    :return: :class:`Response ` object
    :rtype: requests.Response

    Usage::

      >>> import requests
      >>> req = requests.request('GET', 'http://httpbin.org/get')
      
    """

    # By using the 'with' statement we are sure the session is closed, thus we
    # avoid leaving sockets open which can trigger a ResourceWarning in some
    # cases, and look like a memory leak in others.
    with sessions.Session() as session:
        return session.request(method=method, url=url, **kwargs)
View Code

 

转载于:https://www.cnblogs.com/wuzdandz/p/9310661.html

你可能感兴趣的:(Python 爬虫二 requests模块)