>>> import requests
>>> r = requests.get('https://api.github.com/events') # GET
>>> r = requests.post('https://httpbin.org/post', data={'key': 'value'}) # POST
>>> r = requests.put('https://httpbin.org/put', data={'key': 'value'}) # PUT
>>> r = requests.delete('https://httpbin.org/delete') # DELETE
>>> r = requests.head('https://httpbin.org/get') # HEAD
>>> r = requests.options('https://httpbin.org/get') # OPTIONS
The params argument takes a dictionary of query-string parameters for the URL. For example, to request https://httpbin.org/get?key1=value1&key2=value2, you can use the following code:
>>> import requests
>>> payload = {'key1': 'value1', 'key2': 'value2', 'key3':'', 'key4':None}
>>> r = requests.get('https://httpbin.org/get', params=payload)
>>> r.url
'https://httpbin.org/get?key1=value1&key2=value2&key3='
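Note that a key whose value is None (key4 above) is not added to the URL's query string, while an empty string (key3) is kept. You can also pass a list of items as a value:
>>> import requests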
>>> payload = {'key1': 'value1', 'key2': ['value2', 'value3']}
>>> r = requests.get('https://httpbin.org/get', params=payload)
>>> r.url
https://httpbin.org/get?key1=value1&key2=value2&key2=value3
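If you want to see the query string requests will build without contacting a server, you can prepare a request and read its url. The following is a minimal sketch, and the helper name build_url is hypothetical:

```python
import requests

def build_url(url, params):
    """Return the URL requests would send for these query parameters (no network I/O)."""
    # Request.prepare() runs the same parameter encoding as requests.get(url, params=...)
    return requests.Request('GET', url, params=params).prepare().url

payload = {'key1': 'value1', 'key2': ['value2', 'value3'], 'key3': '', 'key4': None}
print(build_url('https://httpbin.org/get', payload))
# https://httpbin.org/get?key1=value1&key2=value2&key2=value3&key3=
```

List values become repeated keys, None values are dropped, and empty strings are kept.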
>>> import requests
>>> r = requests.get('https://api.github.com/events')
>>> r.text
[{"id":"27579847062","type":"PushEvent","actor":{"...
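When you access r.text, requests decodes the body using the text encoding it has guessed. You can find out which encoding requests is using, and change it, with the r.encoding property:
>>> r.encoding  # output: utf-8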
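>>> r.encoding = 'ISO-8859-1'
If you change the encoding, requests uses the new value of r.encoding whenever you access r.text. You might want to do this in any situation where you can apply special logic to work out what the content's encoding will be. For example, HTML and XML can specify their encoding in their body. In such cases, you should use r.content to find the encoding and then set r.encoding. This lets you read r.text with the correct encoding.
If you have created your own encoding and registered it with the codecs module, you can simply use the codec name as the value of r.encoding, and requests will handle the decoding for you.
The response body is also available as bytes through r.content:
>>> r.content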
b'[{"id":"27581220674","type":"IssueCommentEvent","actor":{"id":327807...
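requests automatically decodes the gzip and deflate transfer encodings for you. The br transfer encoding is also decoded automatically if a Brotli library such as brotli or brotlicffi is installed.
For example, to create an image from binary data returned by a request:
from PIL import Image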
from io import BytesIO
img = Image.open(BytesIO(r.content))
>>> import requests
>>> r = requests.get('https://api.github.com/events')
>>> r.json() # JSON
[{'id': '27609416600', 'type': 'PushEvent', ...
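If JSON decoding fails, r.json() raises an exception. For example, if the response is a 204 (No Content) or contains invalid JSON, r.json() raises requests.exceptions.JSONDecodeError. This wrapper exception provides interoperability for the multiple exceptions that different Python versions and JSON serialization libraries may raise.
Note that a successful call to r.json() does not mean the request succeeded. Some servers return a JSON object in a failed response (for example, error details with an HTTP 500), and that JSON is decoded and returned. To check whether a request succeeded, use r.raise_for_status() or check r.status_code.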
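One way to combine both checks is a small helper that rejects error statuses first and only then decodes the body. This is a sketch with hypothetical helper names. The Response objects are built by hand, and setting _content is done only so the example runs offline.

```python
import requests

def json_or_none(resp):
    """Decode the JSON body of resp, or return None if it is not valid JSON."""
    try:
        return resp.json()
    except requests.exceptions.JSONDecodeError:
        return None

def checked_json(resp):
    """Raise HTTPError for 4XX/5XX responses, then decode the body."""
    resp.raise_for_status()  # a JSON error body does not mean the request succeeded
    return json_or_none(resp)

# Hand-built stand-ins for real responses (demonstration only).
no_content = requests.Response()
no_content.status_code = 204
no_content._content = b''

server_error = requests.Response()
server_error.status_code = 500
server_error._content = b'{"error": "boom"}'

print(json_or_none(no_content))    # None
print(json_or_none(server_error))  # {'error': 'boom'}
```

Calling checked_json(server_error) raises requests.exceptions.HTTPError even though the body is valid JSON.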
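In the rare case that you want the raw socket response returned by the server, you can access r.raw. If you do, make sure you set stream=True in your initial request:
>>> import requests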
>>> r = requests.get('https://api.github.com/events', stream=True)
>>> r.raw
>>> r.raw.read(10)
b'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03'
In general, however, you should use a pattern like this to save streamed content to a file:
with open(filename, 'wb') as fd:
    for chunk in r.iter_content(chunk_size=128):
        fd.write(chunk)
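The same pattern can be wrapped in a reusable function that also reports how many bytes were written, with a with block closing the connection. This is a sketch with hypothetical names. In the demo, a Response whose raw attribute is an io.BytesIO stands in for a real streamed download so that it runs offline.

```python
import io
import os
import tempfile

import requests

def save_stream(resp, path, chunk_size=128):
    """Write a streamed response body to path chunk by chunk and return the byte count."""
    written = 0
    with open(path, 'wb') as fd:
        for chunk in resp.iter_content(chunk_size=chunk_size):
            fd.write(chunk)
            written += len(chunk)
    return written

def download(url, path, timeout=(3.05, 27)):
    """Hypothetical helper: stream url to path; the with block closes the connection."""
    with requests.get(url, stream=True, timeout=timeout) as r:
        r.raise_for_status()
        return save_stream(r, path)

# Offline demo: an in-memory byte stream stands in for the network.
fake = requests.Response()
fake.status_code = 200
fake.raw = io.BytesIO(b'x' * 1000)
demo_path = os.path.join(tempfile.gettempdir(), 'save_stream_demo.bin')
print(save_stream(fake, demo_path))  # 1000
```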
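Using Response.iter_content handles much of what you would otherwise have to handle when using Response.raw directly. When streaming a download, the above is the preferred and recommended way to retrieve the content. Note that chunk_size can be freely adjusted to a number that better fits your use case.
An important note about Response.iter_content versus Response.raw: Response.iter_content automatically decodes the gzip and deflate transfer encodings. Response.raw is a raw stream of bytes and does not transform the response content. If you really need access to the bytes exactly as they were returned, use Response.raw.
To add HTTP headers to a request, simply pass a dict to the headers parameter, for example:
>>> url = 'https://api.github.com/some/endpoint'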
>>> headers = {'user-agent': 'my-app/0.0.1'}
>>> r = requests.get(url, headers=headers)
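To check which headers a request would actually carry, you can prepare it without sending it. In the following offline sketch, 'user' and 'pass' are placeholder credentials. It shows a custom header being sent as given, and an Authorization header passed through headers= being replaced by the auth= argument.

```python
import requests

url = 'https://api.github.com/some/endpoint'

# Custom headers are sent as given (header names are case-insensitive).
p1 = requests.Request('GET', url, headers={'user-agent': 'my-app/0.0.1'}).prepare()
print(p1.headers['User-Agent'])  # my-app/0.0.1

# auth= takes precedence over an Authorization header set with headers=.
p2 = requests.Request(
    'GET', url,
    headers={'Authorization': 'Token from-headers'},
    auth=('user', 'pass'),  # placeholder credentials
).prepare()
print(p2.headers['Authorization'])  # Basic dXNlcjpwYXNz
```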
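Custom headers are given less precedence than more specific sources of information. For example:
Authorization headers set with headers= are overridden if credentials are specified in .netrc, which in turn are overridden by the auth= parameter. requests searches for the netrc file at ~/.netrc, ~/_netrc, or the path specified by the NETRC environment variable.
Authorization headers are removed if you are redirected off-host.
Proxy-Authorization headers are overridden by proxy credentials provided in the URL.
Content-Length headers are overridden when requests can determine the length of the content.
To send form-encoded data, much like an HTML form, simply pass a dictionary to the data argument. The dictionary is form-encoded automatically when the request is made:
>>> import requests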
>>> payload = {'key1': 'value1', 'key2': 'value2'}
>>> r = requests.post("https://httpbin.org/post", data=payload)
>>> r.text
{
"args": {},
"data": "",
"files": {},
"form": {
"key1": "value1",
"key2": "value2"
},
"headers": {
"Accept": "*/*",
"Accept-Encoding": "gzip, deflate",
"Content-Length": "23",
"Content-Type": "application/x-www-form-urlencoded",
"Host": "httpbin.org",
"User-Agent": "python-requests/2.27.1",
"X-Amzn-Trace-Id": "Root=1-6409fe3b-0cb4118319f09ab3187402bc"
},
"json": null,
"origin": "183.62.127.25",
"url": "https://httpbin.org/post"
}
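The data argument can also have multiple values for each key. You can do this by making data either a list of tuples or a dictionary with lists as values. This is particularly useful when the form has multiple elements that use the same key:
>>> import requests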
>>> payload_tuples = [('key1', 'value1'), ('key1', 'value2')]
>>> r1 = requests.post('https://httpbin.org/post', data=payload_tuples)
>>> payload_dict = {'key1': ['value1', 'value2']}
>>> r2 = requests.post('https://httpbin.org/post', data=payload_dict)
>>> r1.text
{
"args": {},
"data": "",
"files": {},
"form": {
"key1": [
"value1",
"value2"
]
},
"headers": {
"Accept": "*/*",
"Accept-Encoding": "gzip, deflate",
"Content-Length": "23",
"Content-Type": "application/x-www-form-urlencoded",
"Host": "httpbin.org",
"User-Agent": "python-requests/2.27.1",
"X-Amzn-Trace-Id": "Root=1-6409ff49-11b8232a7cc81fc0290ec4c4"
},
"json": null,
"origin": "183.62.127.25",
"url": "https://httpbin.org/post"
}
>>> r1.text == r2.text
True
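Both payload shapes encode to the same form body. You can confirm this offline by preparing the requests instead of sending them. This is a sketch, and the form_body helper is hypothetical:

```python
import requests

def form_body(data):
    """Return the (body, Content-Type) pair requests would send for this data argument."""
    prepped = requests.Request('POST', 'https://httpbin.org/post', data=data).prepare()
    return prepped.body, prepped.headers.get('Content-Type')

from_tuples = form_body([('key1', 'value1'), ('key1', 'value2')])
from_dict = form_body({'key1': ['value1', 'value2']})
print(from_tuples)               # ('key1=value1&key1=value2', 'application/x-www-form-urlencoded')
print(from_tuples == from_dict)  # True
```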
If you pass in a string instead of a dict, that data is posted directly:
>>> import requests
>>> import json
>>> url = 'https://api.github.com/some/endpoint'
>>> payload = {'some': 'data'}
>>> r = requests.post(url, data=json.dumps(payload))
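Note that the code above does not add a Content-Type header (in particular, it does not set it to application/json). If you need that header set ('Content-Type': 'application/json', for sending a JSON body) and you don't want to encode the dict yourself, you can pass it directly with the json parameter, and it will be encoded automatically:
>>> url = 'https://api.github.com/some/endpoint'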
>>> payload = {'some': 'data'}
>>> r = requests.post(url, json=payload)
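The difference shows up in the prepared request. json= serializes the dict and sets Content-Type. A pre-encoded string passed through data= gets no Content-Type at all. The following is an offline sketch, and the prep helper is hypothetical:

```python
import json
import requests

url = 'https://api.github.com/some/endpoint'
payload = {'some': 'data'}

def prep(**kwargs):
    """Prepare (but do not send) a POST to url with the given body arguments."""
    return requests.Request('POST', url, **kwargs).prepare()

as_string = prep(data=json.dumps(payload))
as_json = prep(json=payload)
both = prep(data={'form': 'field'}, json=payload)  # json= is ignored when data= is given

print(as_string.headers.get('Content-Type'))  # None
print(as_json.headers['Content-Type'])        # application/json
print(as_json.body)                           # b'{"some": "data"}'
print(both.body)                              # form=field
```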
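Note that the json parameter is ignored if either data or files is passed.
To upload a multipart-encoded file, pass it with the files parameter:
>>> import requests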
>>> url = 'https://httpbin.org/post'
>>> files = {'file': open('report.xls', 'rb')}
>>> r = requests.post(url, files=files)
>>> r.text
{
"args": {},
"data": "",
"files": {
"file": "#!/usr/bin/env python\r\n# -*- coding:utf-8 -*-\r\n\r\n#!/usr/bin/env python\r\n# -*- coding:utf-8 -*-\r\n\r\nfrom multiprocessing import Pool\r\nfrom threading import Thread\r\nfrom concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor..."
},
"form": {},
"headers": {
"Accept": "*/*",
"Accept-Encoding": "gzip, deflate",
"Content-Length": "3035",
"Content-Type": "multipart/form-data; boundary=9ef4437cb1e14427fcba1c42943509cb",
"Host": "httpbin.org",
"User-Agent": "python-requests/2.27.1",
"X-Amzn-Trace-Id": "Root=1-640a03df-1a0a5ce972ce410378cda7a2"
},
"json": null,
"origin": "183.62.127.25",
"url": "https://httpbin.org/post"
}
You can set the filename, content type, and headers explicitly:
>>> url = 'https://httpbin.org/post'
>>> files = {'file': ('report.xls', open('report.xls', 'rb'), 'application/vnd.ms-excel', {'Expires': '0'})}
>>> r = requests.post(url, files=files)
>>> r.text
{
"args": {},
"data": "",
"files": {
"file": "data:application/vnd.ms-excel;base64,UEsDBBQAAAAAAHy8iFMAAAAAAA...=="
},
"form": {},
"headers": {
"Accept": "*/*",
"Accept-Encoding": "gzip, deflate",
"Content-Length": "9667",
"Content-Type": "multipart/form-data; boundary=ff85e1018eb5232f7dcab2b2bc5ffa50",
"Host": "httpbin.org",
"User-Agent": "python-requests/2.27.1",
"X-Amzn-Trace-Id": "Root=1-640def51-43cc213e33437a0e60255add"
},
"json": null,
"origin": "183.62.127.25",
"url": "https://httpbin.org/post"
}
If you want, you can send strings to be received as files:
>>> url = 'https://httpbin.org/post'
>>> files = {'file': ('report.csv', 'some,data,to,send\nanother,row,to,send\n')}
>>> r = requests.post(url, files=files)
>>> r.text
{
"args": {},
"data": "",
"files": {
"file": "some,data,to,send\nanother,row,to,send\n"
},
"form": {},
"headers": {
"Accept": "*/*",
"Accept-Encoding": "gzip, deflate",
"Content-Length": "184",
"Content-Type": "multipart/form-data; boundary=2bfe430e025860528e29c893a09f1198",
"Host": "httpbin.org",
"User-Agent": "python-requests/2.27.1",
"X-Amzn-Trace-Id": "Root=1-640df132-247947ca699e9da35c588f2d"
},
"json": null,
"origin": "183.62.127.25",
"url": "https://httpbin.org/post"
}
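To inspect the multipart body requests builds, including the filename and per-part content type, you can prepare the request offline. This is a sketch, and the CSV content is made up for illustration:

```python
import requests

files = {'file': ('report.csv', 'some,data,to,send\nanother,row,to,send\n', 'text/csv')}
prepped = requests.Request('POST', 'https://httpbin.org/post', files=files).prepare()

content_type = prepped.headers['Content-Type']
print(content_type.split(';')[0])  # multipart/form-data
# The body is bytes; each part carries its own Content-Disposition and Content-Type headers.
print(b'filename="report.csv"' in prepped.body)   # True
print(b'Content-Type: text/csv' in prepped.body)  # True
```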
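If you are posting a very large file as a multipart/form-data request, you may want to stream the request. By default, requests does not support this, but a separate package does: requests-toolbelt.
Warning: it is strongly recommended that you open files in binary mode. requests may attempt to provide the Content-Length header for you, and if it does, this value is set to the number of bytes in the file. Errors may occur if you open the file in text mode.
You can check the response status code:
>>> import requests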
>>> r = requests.get('https://httpbin.org/get')
>>> r.status_code
200
requests also comes with a built-in status code lookup object for easy reference:
>>> r = requests.get('https://httpbin.org/get')
>>> r.status_code == requests.codes.ok
True
If you made a bad request (a 4XX client error or a 5XX server error response), you can raise it with response.raise_for_status():
>>> import requests
>>> bad_r = requests.get('https://httpbin.org/status/404')
>>> bad_r.status_code
404
>>> bad_r.raise_for_status()
Traceback (most recent call last):
  File "D:/codePojects/test.py", line 12, in <module>
bad_r.raise_for_status()
File "D:\Program Files (x86)\python36\lib\site-packages\requests\models.py", line 960, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: NOT FOUND for url: https://httpbin.org/status/404
If r.status_code is 200, raise_for_status() returns None:
>>> print(r.raise_for_status())
None
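Response.ok wraps the same check as a boolean, which is handy when you want to branch rather than raise. The Response objects here are built by hand, which is an assumption made only so the sketch runs without a network:

```python
import requests

def make_response(status_code):
    """Build a bare Response for demonstration; real code gets one from requests.get()."""
    r = requests.Response()
    r.status_code = status_code
    r.url = f'https://httpbin.org/status/{status_code}'
    return r

for code in (requests.codes.ok, requests.codes.not_found, requests.codes.internal_server_error):
    r = make_response(code)
    try:
        r.raise_for_status()
        outcome = 'no exception'
    except requests.exceptions.HTTPError as exc:
        outcome = f'HTTPError: {exc}'
    print(code, r.ok, outcome)
```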
You can view the server's response headers, which are exposed as a Python dictionary:
>>> r.headers
{
'content-encoding': 'gzip',
'transfer-encoding': 'chunked',
'connection': 'close',
'server': 'nginx/1.0.4',
'x-runtime': '148ms',
'etag': '"e1ca502697e5c9317743dc078f67693f"',
'content-type': 'application/json'
}
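This dictionary is special, though: it is made just for HTTP headers. According to RFC 7230, HTTP header names are case-insensitive, so you can access the headers using any capitalization you like:
>>> r.headers['Content-Type']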
'application/json'
>>> r.headers.get('content-type')
'application/json'
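r.headers is a requests.structures.CaseInsensitiveDict. The following short offline sketch, using made-up header values, shows how it behaves:

```python
from requests.structures import CaseInsensitiveDict

headers = CaseInsensitiveDict({'Content-Type': 'application/json', 'ETag': '"abc"'})

print(headers['content-type'])      # application/json
print(headers.get('CONTENT-TYPE'))  # application/json
print('etag' in headers)            # True
print(list(headers))                # ['Content-Type', 'ETag'] (original casing is kept)
```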
If a response contains cookies, you can access them quickly:
>>> url = 'http://example.com/some/cookie/setting/url'
>>> r = requests.get(url)
>>> r.cookies['example_cookie_name']  # if a cookie named example_cookie_name exists
'example_cookie_value'
To send your own cookies to the server, use the cookies parameter:
>>> url = 'https://httpbin.org/cookies'
>>> cookies = dict(cookies_are='working')
>>> r = requests.get(url, cookies=cookies)
>>> r.text
'{\n "cookies": {\n "cookies_are": "working"\n }\n}\n'
Cookies are returned in a RequestsCookieJar, which acts like a dict but also offers a more complete interface, suitable for use over multiple domains or paths. Cookie jars can also be passed in to requests:
>>> jar = requests.cookies.RequestsCookieJar()
>>> jar.set('tasty_cookie', 'yum', domain='httpbin.org', path='/cookies')
Cookie(version=0, name='tasty_cookie', value='yum', port=None, port_specified=False, domain='httpbin.org', domain_specified=True, domain_initial_dot=False, path='/cookies', path_specified=True, secure=False, expires=None, discard=True, comment=None, comment_url=None, rest={'HttpOnly': None}, rfc2109=False)
>>> jar.set('gross_cookie', 'blech', domain='httpbin.org', path='/elsewhere')
Cookie(version=0, name='gross_cookie', value='blech', port=None, port_specified=False, domain='httpbin.org', domain_specified=True, domain_initial_dot=False, path='/elsewhere', path_specified=True, secure=False, expires=None, discard=True, comment=None, comment_url=None, rest={'HttpOnly': None}, rfc2109=False)
>>> url = 'https://httpbin.org/cookies'
>>> r = requests.get(url, cookies=jar)
>>> r.text
'{"cookies": {"tasty_cookie": "yum"}}'
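The domain and path attached to each cookie decide where it is sent. You can observe this offline by preparing requests with the same jar and reading the Cookie header requests would send. This is a sketch, and the /elsewhere URL simply reuses the path from the example above:

```python
import requests

jar = requests.cookies.RequestsCookieJar()
jar.set('tasty_cookie', 'yum', domain='httpbin.org', path='/cookies')
jar.set('gross_cookie', 'blech', domain='httpbin.org', path='/elsewhere')

def cookie_header(url):
    """Return the Cookie header requests would send to url with this jar."""
    prepped = requests.Request('GET', url, cookies=jar).prepare()
    return prepped.headers.get('Cookie')

print(cookie_header('https://httpbin.org/cookies'))    # tasty_cookie=yum
print(cookie_header('https://httpbin.org/elsewhere'))  # gross_cookie=blech
print(jar.get('tasty_cookie', domain='httpbin.org', path='/cookies'))  # yum
```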
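By default, requests performs location redirection for all verbs except HEAD. You can use the history property of the Response object to track redirection. The Response.history list contains the Response objects that were created in order to complete the request, sorted from the oldest to the most recent response.
>>> r = requests.get('http://gitee.com/')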
>>> r.url
'https://gitee.com/'
>>> r.status_code
200
>>> r.history
[<Response [302]>]
If you're using GET, OPTIONS, POST, PUT, PATCH or DELETE, you can disable redirection handling with the allow_redirects parameter:
>>> r = requests.get('http://gitee.com/', allow_redirects=False)
>>> r.status_code
302
>>> r.history
[]
HEAD requests do not follow redirects by default. Pass allow_redirects=True to enable them:
>>> r = requests.head('http://gitee.com/', allow_redirects=False)
>>> r.url
'http://gitee.com/'
>>> r.status_code
302
>>> r.history
[]
>>> r = requests.head('http://gitee.com/', allow_redirects=True)
>>> r.status_code
200
>>> r.url
'https://gitee.com/'
>>> r.history
[<Response [302]>]
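You can tell requests to stop waiting for a response after a given number of seconds with the timeout parameter. Nearly all production code should use this parameter in nearly all requests. Failing to do so can cause your program to hang indefinitely:
>>> requests.get('https://gitee.com/', timeout=0.1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>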
...
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='gitee.com', port=443): Read timed out. (read timeout=0.1)
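Note that timeout is not a time limit on the entire response download. Instead, an exception is raised if the server has not issued a response for timeout seconds (more precisely, if no bytes have been received on the underlying socket for timeout seconds). If no timeout is specified explicitly, requests do not time out.
Errors and exceptions: in the event of a network problem (for example, a DNS failure or a refused connection), requests raises a ConnectionError exception. Response.raise_for_status() raises an HTTPError if the HTTP request returned an unsuccessful status code. If a request times out, a Timeout exception is raised. If a request exceeds the configured maximum number of redirections, a TooManyRedirects exception is raised. All exceptions that requests explicitly raises inherit from requests.exceptions.RequestException.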
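A Session object lets you persist certain parameters across requests. It also persists cookies across all requests made from the Session instance and uses urllib3's connection pooling. If you make several requests to the same host, the underlying TCP connection is reused (see HTTP persistent connection), which can result in a significant performance increase. For example, to persist cookies across requests:
>>> s = requests.Session()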
>>> s.get('https://httpbin.org/cookies/set/sessioncookie/123456789')
>>> r = s.get('https://httpbin.org/cookies')
>>> r.text
'{\n "cookies": {\n "sessioncookie": "123456789"\n }\n}\n'
Sessions can also be used to provide default data to the request methods. You do this by setting properties on the Session object:
>>> s = requests.Session()
>>> s.auth = ('user', 'pass')
>>> s.headers.update({'x-test': 'true'})
>>> # both 'x-test' and 'x-test2' are sent
>>> s.get('https://httpbin.org/headers', headers={'x-test2': 'true'})
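Note, however, that method-level parameters are not persisted across requests, even when using a session. This example sends the cookies only with the first request, not the second:
>>> s = requests.Session()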
>>> r = s.get('https://httpbin.org/cookies', cookies={'from-my': 'browser'})
>>> r.text
'{\n "cookies": {\n "from-my": "browser"\n }\n}\n'
>>> r = s.get('https://httpbin.org/cookies')
>>> r.text
'{\n "cookies": {}\n}\n'
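You can see this offline with Session.prepare_request. Method-level cookies are merged into that one request's Cookie header but are never stored in session.cookies. This is a sketch, and the cookie names mirror the examples above:

```python
import requests

s = requests.Session()
s.cookies.set('sessioncookie', '123456789')  # session-level: sent with every request

req = requests.Request('GET', 'https://httpbin.org/cookies',
                       cookies={'from-my': 'browser'})  # method-level: this request only
prepped = s.prepare_request(req)

print(prepped.headers['Cookie'])     # contains both cookies
print('from-my' in s.cookies)        # False: the method-level cookie was not persisted
print('sessioncookie' in s.cookies)  # True
```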
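If you want to add cookies to your session manually, use the cookie utility functions to manipulate Session.cookies.
Sessions can also be used as context managers:
>>> with requests.Session() as s:
...     s.get('https://httpbin.org/cookies/set/sessioncookie/123456789')
...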
This makes sure the session is closed as soon as the with block is exited, even if unhandled exceptions occurred.
Remove a Value From a Dict Parameter
Sometimes you'll want to omit session-level keys from a dict parameter. To do this, simply set that key's value to None in the method-level parameter, and it will be omitted automatically.
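Session-level and method-level dicts are merged when the request is prepared, and any key whose merged value is None is dropped. The following is a minimal offline sketch that reuses the header names from the session example above:

```python
import requests

s = requests.Session()
s.headers.update({'x-test': 'true'})

def sent_headers(**kwargs):
    """Headers the session would send for a GET with these method-level arguments."""
    req = requests.Request('GET', 'https://httpbin.org/headers', **kwargs)
    return s.prepare_request(req).headers

print(sent_headers(headers={'x-test2': 'true'}))  # includes x-test and x-test2
print(sent_headers(headers={'x-test': None}))     # x-test is omitted for this request
print(s.headers['x-test'])                        # true: the session itself is unchanged
```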
Each Response also keeps the request that produced it. To see the headers the server sent back:
>>> r = s.get('https://httpbin.org')
>>> r.headers  # response headers
{'Date': 'Mon, 13 Mar 2023 15:43:41 GMT', 'Content-Type': 'text/html; charset=utf-8', 'Content-Length': '9593', 'Connection': 'keep-alive', 'Server': 'gunicorn/19.9.0', 'Access-Control-Allow-Origin': '*', 'Access-Control-Allow-Credentials': 'true'}
To see the headers that were sent to the server, access the request and then its headers:
>>> r.request.headers
{'User-Agent': 'python-requests/2.27.1', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive', 'Cookie': 'sessioncookie=123456789'}
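Whenever you receive a Response object from an API call or a Session call, its request attribute is actually the PreparedRequest that was used. In some cases you may wish to do some extra work on the body or headers (or anything else) before sending a request. The simple recipe for this is the following:
from requests import Request, Session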
s = Session()
req = Request('POST', url, data=data, headers=headers)
prepped = req.prepare()
# do something with prepped.body
prepped.body = 'No, I want exactly this as the body.'
# do something with prepped.headers
del prepped.headers['Content-Type']
resp = s.send(prepped,
    stream=stream,
    verify=verify,
    proxies=proxies,
    cert=cert,
    timeout=timeout
)
print(resp.status_code)
Since you are not doing anything special with the Request object, you prepare it immediately and modify the PreparedRequest object. You then send that with the other parameters you would have sent to requests.* or Session.*.
However, the above code loses some of the advantages of having a requests Session object. In particular, Session-level state such as cookies will not get applied to your request. To get a PreparedRequest with that state applied, replace the call to Request.prepare() with a call to Session.prepare_request(), like this:
from requests import Request, Session
s = Session()
req = Request('GET', url, data=data, headers=headers)
prepped = s.prepare_request(req)
# do something with prepped.body
prepped.body = 'Seriously, send exactly these bytes.'
# do something with prepped.headers
prepped.headers['Keep-Dead'] = 'parrot'
resp = s.send(prepped,
    stream=stream,
    verify=verify,
    proxies=proxies,
    cert=cert,
    timeout=timeout
)
print(resp.status_code)
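Headers such as Content-Length are computed during preparation and are not updated automatically when you edit prepped.body. The following sketch prepares a request offline, replaces the body, and then recomputes the length with PreparedRequest.prepare_content_length. The URL and form data are placeholders.

```python
from requests import Request, Session

s = Session()
req = Request('POST', 'https://httpbin.org/post', data={'k': 'v'})
prepped = s.prepare_request(req)
print(prepped.body, prepped.headers['Content-Length'])  # k=v 3

prepped.body = 'Seriously, send exactly these bytes.'
print(prepped.headers['Content-Length'])  # still 3: the header is now stale

# Recompute Content-Length from the new body before calling s.send(prepped, ...).
prepped.prepare_content_length(prepped.body)
print(prepped.headers['Content-Length'])  # 36
```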
When you use the prepared-request flow, keep in mind that it does not take the environment into account. This can cause problems if you rely on environment variables to change the behaviour of requests. For example, self-signed SSL certificates specified in REQUESTS_CA_BUNDLE will not be taken into account, and as a result an SSL: CERTIFICATE_VERIFY_FAILED error is thrown. You can get around this behaviour by explicitly merging the environment settings into your session:
from requests import Request, Session
s = Session()
req = Request('GET', url)
prepped = s.prepare_request(req)
# Merge environment settings into session
settings = s.merge_environment_settings(prepped.url, {}, None, None, None)
resp = s.send(prepped, **settings)
print(resp.status_code)
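merge_environment_settings returns a dict of keyword arguments for send (proxies, stream, verify, cert), with values filled in from the environment. The following sketch makes this visible offline by pointing REQUESTS_CA_BUNDLE at a hypothetical path. Nothing is read from that path, because no request is sent.

```python
import os
from requests import Session

def env_send_kwargs(url):
    """Return the send() keyword arguments a fresh Session would derive for url."""
    s = Session()
    return s.merge_environment_settings(url, {}, None, None, None)

old = os.environ.get('REQUESTS_CA_BUNDLE')
os.environ['REQUESTS_CA_BUNDLE'] = '/tmp/hypothetical-ca.pem'  # illustrative path only
try:
    settings = env_send_kwargs('https://example.org')
    print(sorted(settings))    # ['cert', 'proxies', 'stream', 'verify']
    print(settings['verify'])  # /tmp/hypothetical-ca.pem
finally:
    # Restore the caller's environment.
    if old is None:
        del os.environ['REQUESTS_CA_BUNDLE']
    else:
        os.environ['REQUESTS_CA_BUNDLE'] = old
```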
For HTTP Basic authentication, pass an HTTPBasicAuth instance through the auth parameter:
>>> from requests.auth import HTTPBasicAuth
>>> auth = HTTPBasicAuth('your_username', 'your_password')
>>> r = requests.post(url='your_target_url', data=body, auth=auth)
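requests verifies SSL certificates for HTTPS requests, just like a web browser. SSL verification is enabled by default, and requests raises an SSLError if it is unable to verify the certificate:
>>> requests.get('https://requestb.in')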
requests.exceptions.SSLError: hostname 'requestb.in' doesn't match either of '*.herokuapp.com', 'herokuapp.com'
You can pass verify the path to a CA_BUNDLE file or a directory with certificates of trusted CAs:
>>> requests.get('https://github.com', verify='/path/to/certfile')
or set it persistently on a session:
s = requests.Session()
s.verify = '/path/to/certfile'
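Note: if verify is set to the path of a directory, the directory must have been processed with the c_rehash utility supplied with OpenSSL.
This list of trusted CAs can also be specified through the REQUESTS_CA_BUNDLE environment variable. If REQUESTS_CA_BUNDLE is not set, CURL_CA_BUNDLE is used as a fallback.
requests can also skip SSL certificate verification if you set verify to False:
>>> requests.get('https://kennethreitz.org', verify=False)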
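Note that when verify is set to False, requests accepts any TLS certificate presented by the server and ignores hostname mismatches and expired certificates. This makes your application vulnerable to man-in-the-middle (MitM) attacks. Setting verify to False may be useful during local development or testing.
By default, verify is set to True. The verify option only applies to host certificates.
You can also specify a local certificate to use as a client-side certificate. It can be a single file (containing the private key and the certificate) or a tuple of both files' paths:
>>> requests.get('https://kennethreitz.org', cert=('/path/client.cert', '/path/client.key'))
or set it persistently:
s = requests.Session()
s.cert = '/path/client.cert'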
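requests uses certificates from the certifi package, which lets users update their trusted certificates without changing the version of requests. When certifi was not installed, using significantly older versions of requests led to extremely out-of-date certificate bundles. For the sake of security, we recommend upgrading certifi frequently!
By default, when you make a request, the body of the response is downloaded immediately. You can override this behaviour with the stream parameter, which defers downloading the response body until you access the Response.content attribute:
tarball_url = 'https://github.com/psf/requests/tarball/main'
r = requests.get(tarball_url, stream=True)
At this point only the response headers have been downloaded and the connection remains open, which lets you make content retrieval conditional:
if int(r.headers.get('content-length')) < TOO_LONG:
    content = r.content
    ...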
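You can further control the workflow with the Response.iter_content() and Response.iter_lines() methods. Alternatively, you can read the undecoded body from the underlying urllib3.HTTPResponse at Response.raw.
If you set stream to True when making a request, requests cannot release the connection back to the pool unless you consume all the data or call Response.close. This can lead to inefficient use of connections. If you find yourself partially reading response bodies (or not reading them at all) while using stream=True, make the request within a with statement to ensure the connection is always closed:
with requests.get('https://httpbin.org/get', stream=True) as r:
    # Do things with the response here.
    ...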
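Thanks to urllib3, keep-alive is 100% automatic within a session! Any requests you make within a session automatically reuse the appropriate connection. Note that connections are only released back to the pool for reuse once all body data has been read. Be sure to either set stream to False or read the content property of the Response object.
requests supports streaming uploads, which let you send large streams or files without reading them into memory. To stream and upload, simply provide a file-like object for your body:
with open('massive-body', 'rb') as f:
    requests.post('http://some.url/streamed', data=f)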
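Warning: it is strongly recommended that you open files in binary mode. requests may attempt to provide the Content-Length header for you, and if it does, this value is set to the number of bytes in the file. Errors may occur if you open the file in text mode.
requests also supports chunked transfer encoding for outgoing and incoming requests. To send a chunk-encoded request, simply provide a generator (or any iterator without a length) for your body:
def gen():
    yield 'hi'
    yield 'there'

requests.post('http://some.url/chunked', data=gen())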
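The two upload styles show up in the prepared headers. A file opened in binary mode has a known size, so requests sets Content-Length. A generator has no length, so requests switches to Transfer-Encoding: chunked. The following offline sketch uses a temporary file, and the URL is the placeholder from above. The generator yields bytes here as a conservative choice.

```python
import os
import tempfile

import requests

URL = 'http://some.url/streamed'  # placeholder, nothing is sent

def gen():
    # A generator has no len(), so its total size is unknown upfront.
    yield b'hi'
    yield b'there'

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'x' * 2048)
try:
    with open(tmp.name, 'rb') as f:  # binary mode, as recommended above
        file_req = requests.Request('POST', URL, data=f).prepare()
        print(file_req.headers.get('Content-Length'))    # 2048
    chunked_req = requests.Request('POST', URL, data=gen()).prepare()
    print(chunked_req.headers.get('Transfer-Encoding'))  # chunked
    print(chunked_req.headers.get('Content-Length'))     # None
finally:
    os.remove(tmp.name)
```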
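For chunk-encoded responses, it is best to iterate over the data with Response.iter_content(). In an ideal situation you will have set stream=True on the request. You can then iterate chunk by chunk by calling iter_content with a chunk_size of None. If you want to set a maximum size for each chunk, set chunk_size to any integer.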
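You can send multiple files in one request. For example, to upload image files to an HTML form with a multiple-file field named 'images', set files to a list of (form_field_name, file_info) tuples:
>>> url = 'https://httpbin.org/post'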
>>> multiple_files = [
... ('images', ('foo.png', open('foo.png', 'rb'), 'image/png')),
... ('images', ('bar.png', open('bar.png', 'rb'), 'image/png'))]
>>> r = requests.post(url, files=multiple_files)
>>> r.text
'{\n "args": {}, \n "data": "", \n "files": {\n "images": "...=="\n }, \n "form": {}, \n "headers": {\n "Accept": "*/*", \n "Accept-Encoding": "gzip, deflate", \n "Content-Length": "1800", \n "Content-Type": "multipart/form-data; boundary=771ef90459071106c5f47075cbca2659", \n "Host": "httpbin.org", \n "User-Agent": "python-requests/2.27.1", \n "X-Amzn-Trace-Id": "Root=1-641122ea-10a6271f0fdf488c70cf90e9"\n }, \n "json": null, \n "origin": "183.62.127.25", \n "url": "https://httpbin.org/post"\n}\n'
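Each tuple becomes its own part in the multipart body, even though the field name repeats. The following offline sketch uses in-memory stand-ins for foo.png and bar.png. The bytes are placeholders, not real images.

```python
import io
import requests

multiple_files = [
    ('images', ('foo.png', io.BytesIO(b'fake-png-1'), 'image/png')),
    ('images', ('bar.png', io.BytesIO(b'fake-png-2'), 'image/png')),
]
prepped = requests.Request('POST', 'https://httpbin.org/post', files=multiple_files).prepare()

body = prepped.body
print(body.count(b'name="images"'))  # 2: one part per tuple
print(b'filename="foo.png"' in body and b'filename="bar.png"' in body)  # True
```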
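requests has a hook system that you can use to manipulate portions of the request process or to signal event handling. The available hook is response, which receives the response generated from a request.
You can assign a hook function on a per-request basis by passing a {hook_name: callback_function} dictionary to the hooks request parameter, for example hooks={'response': print_url}. That callback_function receives a chunk of data as its first argument (for the response hook, the Response object). If the callback returns a value, that value replaces the data that was passed in. If it returns nothing, nothing else is affected.
def print_url(r, *args, **kwargs):
    print(r.url)

def record_hook(r, *args, **kwargs):
    r.hook_called = True
    return r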
>>> requests.get('https://httpbin.org/', hooks={'response': print_url})
https://httpbin.org/
You can add multiple hooks to a single request by passing a list:
>>> r = requests.get('https://httpbin.org/', hooks={'response': [print_url, record_hook]})
>>> r.hook_called
True
You can also add hooks to a Session instance. Any hook you add is then called on every request made to the session. For example:
>>> s = requests.Session()
>>> s.hooks['response'].append(print_url)
>>> s.get('https://httpbin.org/')
https://httpbin.org/
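To try hooks without a network, you can mount a stub transport adapter that returns a canned response. In this sketch, FakeAdapter, the mock:// prefix, and the hook names are hypothetical. It shows a per-request hook, then session hooks firing in the order they were added, with the last hook returning the response it was given.

```python
import io
import requests
from requests.adapters import BaseAdapter

class FakeAdapter(BaseAdapter):
    """Hypothetical adapter that answers every request with a fixed 200 response."""

    def send(self, request, stream=False, timeout=None, verify=True, cert=None, proxies=None):
        resp = requests.Response()
        resp.status_code = 200
        resp.url = request.url
        resp.request = request
        resp.encoding = 'utf-8'
        resp.raw = io.BytesIO(b'hello')
        return resp

    def close(self):
        pass

calls = []

def first(r, *args, **kwargs):
    calls.append('first')

def second(r, *args, **kwargs):
    calls.append('second')
    r.hook_called = True
    return r  # a returned value replaces the response passed along

s = requests.Session()
s.mount('mock://', FakeAdapter())

# Per-request hook: only this call sees it.
r1 = s.get('mock://example/one', hooks={'response': [first]})

# Session hooks: called on every request made through s, in the order added.
s.hooks['response'].append(first)
s.hooks['response'].append(second)
r2 = s.get('mock://example/two')

print(calls)                      # ['first', 'first', 'second']
print(r2.hook_called, r2.text)    # True hello
```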
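A Session can have multiple hooks, which are called in the order they were added.
requests lets you specify your own authentication mechanism. Any callable passed as the auth argument to a request method has the opportunity to modify the request before it is dispatched.
Authentication implementations are subclasses of AuthBase and are easy to define. requests provides two common authentication scheme implementations in requests.auth: HTTPBasicAuth and HTTPDigestAuth.
Let's pretend we have a web service that only responds if the X-Pizza header is set to a password value. Unlikely, but just go with it:
from requests.auth import AuthBase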
class PizzaAuth(AuthBase):
    """Attaches HTTP Pizza Authentication to the given Request object."""
    def __init__(self, username):
        # setup any auth-related data here
        self.username = username

    def __call__(self, r):
        # modify and return the request
        r.headers['X-Pizza'] = self.username
        return r
>>> requests.get('http://pizzabin.org/admin', auth=PizzaAuth('kenneth'))
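The same AuthBase pattern covers other header-based schemes. For example, a bearer-token auth class might look like the following, where BearerAuth and the token value are hypothetical. Preparing the request shows the header it adds without sending anything.

```python
import requests
from requests.auth import AuthBase

class BearerAuth(AuthBase):
    """Attach an 'Authorization: Bearer <token>' header to each request."""

    def __init__(self, token):
        self.token = token

    def __call__(self, r):
        # r is the PreparedRequest about to be sent; modify it and return it
        r.headers['Authorization'] = f'Bearer {self.token}'
        return r

prepped = requests.Request('GET', 'http://pizzabin.org/admin', auth=BearerAuth('s3cr3t')).prepare()
print(prepped.headers['Authorization'])  # Bearer s3cr3t
```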
With Response.iter_lines(), you can easily iterate over streaming APIs such as the Twitter Streaming API. Simply set stream to True and iterate over the response with iter_lines:
import json
import requests

r = requests.get('https://httpbin.org/stream/20', stream=True)

for line in r.iter_lines():
    # filter out keep-alive new lines
    if line:
        decoded_line = line.decode('utf-8')
        print(json.loads(decoded_line))
When using decode_unicode=True with Response.iter_lines() or Response.iter_content(), you will want to provide a fallback encoding in case the server doesn't provide one:
r = requests.get('https://httpbin.org/stream/20', stream=True)
if r.encoding is None:
    r.encoding = 'utf-8'

for line in r.iter_lines(decode_unicode=True):
    if line:
        print(json.loads(line))
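Warning: iter_lines is not reentrant safe. Calling this method multiple times causes some of the received data to be lost. If you need to call it from multiple places, use the resulting iterator object instead:
lines = r.iter_lines()
# Save the first line for later or just skip it
first_line = next(lines)
for line in lines:
    print(line)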
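The pieces above can be combined into a generator that creates the iterator once, falls back to UTF-8, and skips keep-alive blank lines. The name iter_json_lines is hypothetical. The demo feeds an in-memory byte stream through a hand-built Response so it runs offline.

```python
import io
import json
import requests

def iter_json_lines(resp):
    """Yield one decoded JSON object per non-empty line of a streamed response."""
    if resp.encoding is None:
        resp.encoding = 'utf-8'  # fallback when the server sends no charset
    for line in resp.iter_lines(decode_unicode=True):  # a single iterator, created once
        if line:  # skip keep-alive blank lines
            yield json.loads(line)

fake = requests.Response()
fake.status_code = 200
fake.raw = io.BytesIO(b'{"id": 0}\n\n{"id": 1}\n')
events = list(iter_json_lines(fake))
print(events)  # [{'id': 0}, {'id': 1}]
```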
If you need to use a proxy, you can configure individual requests with the proxies argument to any request method:
import requests
proxies = {
'http': 'http://10.10.1.10:3128',
'https': 'http://10.10.1.10:1080',
}
requests.get('http://example.org', proxies=proxies)
Alternatively, you can configure proxies once for an entire Session:
import requests
proxies = {
'http': 'http://10.10.1.10:3128',
'https': 'http://10.10.1.10:1080',
}
session = requests.Session()
session.proxies.update(proxies)
session.get('http://example.org')
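Warning: setting session.proxies may behave differently than expected. Values provided there may be overwritten by environmental proxies (those returned by urllib.request.getproxies). To ensure your proxies are used even when environmental proxies are present, explicitly specify the proxies argument on every individual request, as described at the start of this section.
When the proxies request parameter is not set, requests attempts to read the proxy configuration defined by the standard environment variables http_proxy, https_proxy, no_proxy and all_proxy. Uppercase variants of these variable names are also supported. You can therefore set these variables to configure proxies for requests (adjust the values to your needs):
Linux: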
$ export HTTP_PROXY="http://10.10.1.10:3128"
$ export HTTPS_PROXY="http://10.10.1.10:1080"
$ export ALL_PROXY="socks5://10.10.1.10:3434"
$ python
>>> import requests
>>> requests.get('http://example.org')
Windows:
set HTTP_PROXY=http://10.10.1.10:3128
>>> import requests
>>> requests.get('http://example.org')
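To use HTTP Basic Auth with your proxy, use the http://user:password@host/ syntax in any of the configuration entries above:
$ export HTTPS_PROXY="http://user:pass@10.10.1.10:1080"
$ python
>>> proxies = {'http': 'http://user:pass@10.10.1.10:3128/'}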
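To give a proxy for a specific scheme and host, use the scheme://hostname form as the key in the proxies dictionary. This matches any request to the given scheme and exact hostname:
proxies = {'http://10.20.1.128': 'http://10.10.1.10:5323'}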
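requests.utils.select_proxy is the internal, but importable, helper that requests' HTTP adapter uses to pick an entry from a proxies dict. Calling it directly is a quick way to check which key a URL matches. The following sketch uses the example addresses above, and no connection is made.

```python
from requests.utils import select_proxy

proxies = {
    'http': 'http://10.10.1.10:3128',
    'https': 'http://10.10.1.10:1080',
    'http://10.20.1.128': 'http://10.10.1.10:5323',  # scheme://hostname key
}

for url in ('http://10.20.1.128/status', 'http://example.org', 'https://example.org'):
    print(url, '->', select_proxy(url, proxies))
# http://10.20.1.128/status -> http://10.10.1.10:5323
# http://example.org -> http://10.10.1.10:3128
# https://example.org -> http://10.10.1.10:1080
```

The more specific scheme://hostname key wins over the plain scheme key.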
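Configuring a proxy for https connections usually requires your local machine to trust the proxy's root certificate. By default, you can find the list of certificates trusted by requests with:
from requests.utils import DEFAULT_CA_BUNDLE_PATH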
print(DEFAULT_CA_BUNDLE_PATH)
You can override this default certificate bundle by setting the REQUESTS_CA_BUNDLE (or CURL_CA_BUNDLE) environment variable to another file path:
$ export REQUESTS_CA_BUNDLE="/usr/local/myproxy_info/cacert.pem"
$ export https_proxy="http://10.10.1.10:1080"
$ python
>>> import requests
>>> requests.get('https://example.org')
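In addition to basic HTTP proxies, requests also supports proxies using the SOCKS protocol. This is an optional feature that requires additional third-party libraries. You can get the dependencies for this feature from pip:
$ python -m pip install requests[socks]
Once you have installed those dependencies, using a SOCKS proxy is just as easy as using an HTTP one:
proxies = {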
'http': 'socks5://user:pass@host:port',
'https': 'socks5://user:pass@host:port'
}
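Using the socks5 scheme causes DNS resolution to happen on the client rather than on the proxy server. This is in line with curl, which uses the scheme to decide whether to do DNS resolution on the client or on the proxy. If you want to resolve domains on the proxy server, use socks5h as the scheme.
When you receive a response, requests guesses the encoding to use for decoding the response body when you access the Response.text attribute. requests first checks for an encoding in the HTTP headers. If none is present, it uses charset_normalizer (or chardet) to attempt to guess the encoding.
If chardet is installed, requests uses it. For Python 3, however, chardet is no longer a mandatory dependency. When requests is installed without the use_chardet_on_py3 extra and chardet is not already installed, requests uses charset-normalizer to guess the encoding.
The only time requests does not guess the encoding is when no explicit charset is present in the HTTP headers and the Content-Type header contains text. In this situation, RFC 2616 specifies that the default charset must be ISO-8859-1, and requests follows the specification. If you require a different encoding, you can manually set the Response.encoding property or use the raw Response.content.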
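The header-based part of this logic is exposed as requests.utils.get_encoding_from_headers. The following sketch uses made-up header dicts to show three cases. An explicit charset wins. A text/* type without a charset falls back to ISO-8859-1. A non-text type such as application/octet-stream returns None, which leaves the decision to detection.

```python
from requests.utils import get_encoding_from_headers

examples = [
    {'content-type': 'text/html; charset=utf-8'},  # explicit charset
    {'content-type': 'text/plain'},                # text/* without a charset
    {'content-type': 'application/octet-stream'},  # neither
]
for headers in examples:
    print(headers['content-type'], '->', get_encoding_from_headers(headers))
# text/html; charset=utf-8 -> utf-8
# text/plain -> ISO-8859-1
# application/octet-stream -> None
```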
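Many HTTP APIs feature Link headers, which make APIs more self-describing and discoverable. GitHub, for example, uses them for pagination in its API:
>>> url = 'https://api.github.com/users/kennethreitz/repos?page=1&per_page=10'
>>> r = requests.head(url=url)
>>> r.headers['link']
'<https://api.github.com/user/119893/repos?page=2&per_page=10>; rel="next", <https://api.github.com/user/119893/repos?page=5&per_page=10>; rel="last"'
requests automatically parses these Link headers and makes them easy to consume: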
>>> r.links["next"]
{'url': 'https://api.github.com/user/119893/repos?page=2&per_page=10', 'rel': 'next'}
>>> r.links["last"]
{'url': 'https://api.github.com/user/119893/repos?page=5&per_page=10', 'rel': 'last'}
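Response.links makes it straightforward to walk paginated APIs. This is a sketch with a hypothetical next_page_url helper. The Response is built by hand with the Link header shown above so that it runs offline.

```python
import requests

def next_page_url(resp):
    """Return the URL of the rel="next" page, or None on the last page."""
    return resp.links.get('next', {}).get('url')

link = ('<https://api.github.com/user/119893/repos?page=2&per_page=10>; rel="next", '
        '<https://api.github.com/user/119893/repos?page=5&per_page=10>; rel="last"')

r = requests.Response()
r.headers['Link'] = link
print(next_page_url(r))  # https://api.github.com/user/119893/repos?page=2&per_page=10

last = requests.Response()
print(next_page_url(last))  # None: no Link header
```

In a real client you would loop along the lines of `while url: r = session.get(url, timeout=...); ...; url = next_page_url(r)`.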
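requests ships with a single transport adapter, the HTTPAdapter. This adapter provides the default requests interaction with HTTP and HTTPS using the powerful urllib3 library. Whenever a requests Session is initialized, one of these is attached to the Session object for HTTP and one for HTTPS.
requests also lets you create and use your own transport adapters that provide specific functionality. Once created, a transport adapter can be mounted to a Session object, along with an indication of which web services it should apply to (MyAdapter below stands for your own adapter class):
>>> s = requests.Session()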
>>> s.mount('https://github.com/', MyAdapter())
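When several adapters are mounted, Session.get_adapter picks the one with the longest matching prefix, because mount keeps prefixes ordered from longest to shortest. In this sketch, the extra HTTPAdapter with max_retries=3 is just an illustrative choice.

```python
import requests
from requests.adapters import HTTPAdapter

s = requests.Session()
github_adapter = HTTPAdapter(max_retries=3)  # illustrative: retry failed connections to GitHub
s.mount('https://github.com/', github_adapter)

print(s.get_adapter('https://github.com/psf/requests') is github_adapter)  # True
print(s.get_adapter('https://example.org/') is github_adapter)             # False
print(list(s.adapters))  # ['https://github.com/', 'https://', 'http://']
```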
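The mount call registers a specific instance of a transport adapter to a URL prefix. Once mounted, any HTTP request made with that session whose URL starts with the given prefix uses the given transport adapter. For anything beyond simple cases, you can implement your own adapter by subclassing BaseAdapter.
By default, requests uses whatever SSL version is the default in the underlying urllib3 library. Normally this is fine, but from time to time you might need to connect to a service endpoint that uses an SSL version incompatible with the default. In that case, you can write a custom transport adapter based on HTTPAdapter. For example, the following adapter instructs the library to use SSLv3:
import ssl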
from urllib3.poolmanager import PoolManager
from requests.adapters import HTTPAdapter

class Ssl3HttpAdapter(HTTPAdapter):
    """"Transport adapter" that allows us to use SSLv3."""
    def init_poolmanager(self, connections, maxsize, block=False):
        # Legacy example: SSLv3 is insecure, and ssl.PROTOCOL_SSLv3 may not
        # exist in modern Python/OpenSSL builds.
        self.poolmanager = PoolManager(
            num_pools=connections, maxsize=maxsize,
            block=block, ssl_version=ssl.PROTOCOL_SSLv3)
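Because SSLv3 is long deprecated, a more common need today is to enforce a minimum protocol version. The following sketch applies the same init_poolmanager technique but passes an ssl.SSLContext through urllib3's ssl_context pool keyword instead of ssl_version. The class name and the TLS 1.2 floor are illustrative assumptions.

```python
import ssl

import requests
from requests.adapters import HTTPAdapter

class MinTLS12Adapter(HTTPAdapter):
    """Transport adapter whose connection pools refuse anything older than TLS 1.2."""

    def init_poolmanager(self, *args, **kwargs):
        ctx = ssl.create_default_context()
        ctx.minimum_version = ssl.TLSVersion.TLSv1_2
        kwargs['ssl_context'] = ctx  # forwarded to urllib3's PoolManager
        return super().init_poolmanager(*args, **kwargs)

s = requests.Session()
adapter = MinTLS12Adapter()
s.mount('https://', adapter)

# No connection is made here; we only inspect the pool configuration.
pool_ctx = adapter.poolmanager.connection_pool_kw['ssl_context']
print(pool_ctx.minimum_version == ssl.TLSVersion.TLSv1_2)  # True
```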
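With the default transport adapter in place, requests does not provide any kind of non-blocking IO. The Response.content property blocks until the entire response has been downloaded. If you need more granularity, the library's streaming features (see Streaming Requests) let you retrieve smaller quantities of the response at a time. However, these calls still block.
Most requests to external servers should have a timeout attached, in case the server does not respond in a timely manner. If you specify a single value for the timeout parameter, it is applied to both the connect and the read timeouts:
r = requests.get('https://github.com', timeout=5)
Specify a tuple if you would like to set the connect and read timeouts separately:
r = requests.get('https://github.com', timeout=(3.05, 27))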
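If the remote server is very slow, you can tell requests to wait forever for a response by passing None as the timeout value:
r = requests.get('https://github.com', timeout=None)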
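Since a missing timeout can hang a program, some codebases give every request on a session a default timeout. The following sketch shows one way to do that with a transport adapter; TimeoutHTTPAdapter and DEFAULT_TIMEOUT are hypothetical names. There is a trade-off. Session.request passes timeout=None when no timeout is given, so this adapter cannot tell that case apart from an explicit timeout=None, and "wait forever" is no longer available on such a session.

```python
import requests
from requests.adapters import HTTPAdapter

DEFAULT_TIMEOUT = (3.05, 27)  # hypothetical (connect, read) defaults in seconds

class TimeoutHTTPAdapter(HTTPAdapter):
    """HTTPAdapter that fills in a timeout whenever the caller did not pass one."""

    def __init__(self, *args, timeout=DEFAULT_TIMEOUT, **kwargs):
        self.default_timeout = timeout
        super().__init__(*args, **kwargs)

    def send(self, request, **kwargs):
        if kwargs.get('timeout') is None:
            kwargs['timeout'] = self.default_timeout
        return super().send(request, **kwargs)

def make_session(timeout=DEFAULT_TIMEOUT):
    """Return a Session whose http:// and https:// requests default to timeout."""
    s = requests.Session()
    adapter = TimeoutHTTPAdapter(timeout=timeout)
    s.mount('http://', adapter)
    s.mount('https://', adapter)
    return s

session = make_session()
# session.get('https://github.com') now behaves like timeout=(3.05, 27);
# an explicit session.get(url, timeout=5) still overrides the default.
```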