Python: the requests module

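Contents

1, The GET method

1-1, Without parameters

1-2, With parameters (params)

1-2-1, params as a dict

1-2-2, params as a tuple

1-3, Setting headers

1-4, Setting cookies

1-4-1, cookies as a dict

1-4-2, cookies as a RequestsCookieJar object

1-5, Setting a timeout

1-6, The response (Response)

1-6-1, The text attribute

1-6-2, The content attribute

1-6-3, The json() method

1-6-4, The iter_content() method

1-6-5, The raw attribute

1-6-6, Response headers (headers)

1-6-7, Response cookies

1-6-8, Request headers

1-6-9, The encoding attribute

2, The POST method

2-1, The data parameter

2-1-1, data as a dict

2-1-2, data as a tuple

2-2, The json parameter

2-3, Uploading files (files)

2-3-1, Uploading a file from a path

2-3-2, Uploading a string as a file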


requests is a third-party module; install it with pip install requests.

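1, The GET method

requests.get() requests a web page and returns a Response object.

1-1, Without parameters

requests.get(url) requests the given page.

In [70]: url = r'https://www.baidu.com'

# Request the given URL; this returns a Response object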
In [71]: r = requests.get(url)

In [151]: r.url
Out[151]: 'https://www.baidu.com/'

1-2, With parameters (params)

params can be a dict or a tuple of (key, value) pairs.

1-2-1, params as a dict

# params as a dict
In [146]: params1 = {'key1': 'value1', 'key2': 'value2'}
In [147]: r1 = requests.get('http://httpbin.org/get', params=params1)

# r1.url holds the final URL that was requested
In [148]: r1.url
Out[148]: 'http://httpbin.org/get?key1=value1&key2=value2'

# A parameter whose value is None is not sent
In [184]: params1 = {'key1': 'value1', 'key2': None}
In [185]: r1 = requests.get('http://httpbin.org/get', params=params1)

# key2 is None, so it is left out of the URL
In [186]: r1.url
Out[186]: 'http://httpbin.org/get?key1=value1'

1-2-2, params as a tuple

# params as a tuple of (key, value) pairs
In [153]: params1 = (('key1', 'value1'), ('key2', 'value2'))
In [154]: r1 = requests.get('http://httpbin.org/get', params=params1)

# r1.url holds the final URL that was requested
In [155]: r1.url
Out[155]: 'http://httpbin.org/get?key1=value1&key2=value2'

# A parameter whose value is None is not sent
In [156]: params1 = (('key1', 'value1'), ('key2', None))
In [157]: r1 = requests.get('http://httpbin.org/get', params=params1)

In [158]: r1.url
Out[158]: 'http://httpbin.org/get?key1=value1'
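The URL encoding shown in the two sessions above can also be checked without sending anything. The sketch below assumes that preparing a request (requests.Request(...).prepare()) applies the same params handling as requests.get; build_url is a hypothetical helper name, not part of requests.

```python
import requests


def build_url(url, params):
    """Return the URL requests would send for these params, without any network I/O."""
    return requests.Request("GET", url, params=params).prepare().url


base = "http://httpbin.org/get"
dict_url = build_url(base, {"key1": "value1", "key2": None})
tuple_url = build_url(base, (("key1", "value1"), ("key2", "value2")))
none_tuple_url = build_url(base, (("key1", "value1"), ("key2", None)))

print(dict_url)
print(tuple_url)
print(none_tuple_url)
```

Both forms drop the key whose value is None, so the first and third URLs come out identical.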

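1-3, Setting headers

Pass headers={'user-agent': 'xxxx'} to send custom request headers. The User-Agent header identifies the client as a browser; without it, a site with anti-scraping measures may reject the request.

# The custom user-agent replaces the default one; the other default headers are unchanged
In [56]: header = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.5735.289 Safari/537.36'}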

In [57]: r = requests.get('https://www.baidu.com', headers=header)

To find your browser's User-Agent: F12 -> Network -> Request Headers -> User-Agent
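To see which headers would actually go out, one option is to prepare the request through a Session and inspect it before sending. This is a minimal sketch: the User-Agent string is a made-up placeholder, and nothing is sent over the network.

```python
import requests

# Placeholder value for illustration; substitute the string copied from your browser
custom = {"user-agent": "my-crawler/0.1 (hypothetical)"}

with requests.Session() as session:
    request = requests.Request("GET", "https://www.baidu.com", headers=custom)
    # prepare_request merges the custom headers with the session defaults
    prepared = session.prepare_request(request)

for name, value in prepared.headers.items():
    print(f"{name}: {value}")
```

The custom User-Agent replaces the default python-requests one, while defaults such as Accept and Connection are still present.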

1-4, Setting cookies

Cookies let you access pages that require a login. The cookies argument can be a dict or a RequestsCookieJar object.

1-4-1, cookies as a dict

# Send cookies={'cookies_are': 'working'} with the request
In [178]: r = requests.get('http://httpbin.org/cookies', cookies={'cookies_are': 'working'})

1-4-2, cookies as a RequestsCookieJar object

# Create a RequestsCookieJar object
In [179]: jar = requests.cookies.RequestsCookieJar()

# Add a cookie, restricted to a domain and path
In [180]: jar.set('tasty_cookie', 'yum', domain='httpbin.org', path='/cookies')
Out[180]: Cookie(version=0, name='tasty_cookie', value='yum', port=None, port_specified=False, domain='httpbin.org', domain_specified=True, domain_initial_dot=False, path='/cookies', path_specified=True, secure=False, expires=None, discard=True, comment=None, comment_url=None, rest={'HttpOnly': None}, rfc2109=False)

# Pass cookies=jar with the request
In [181]: r = requests.get('http://httpbin.org/cookies', cookies=jar)

In [182]: r.cookies
Out[182]: <RequestsCookieJar[]>

In [183]: r.json()['cookies']
Out[183]: {'tasty_cookie': 'yum'}
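The practical difference between the two forms is that a RequestsCookieJar entry carries a domain and path, so it is only sent to matching URLs. The sketch below checks this offline by preparing requests through a Session; cookie_header is a hypothetical helper name, and the /elsewhere path is just an example of a non-matching URL.

```python
import requests

jar = requests.cookies.RequestsCookieJar()
jar.set("tasty_cookie", "yum", domain="httpbin.org", path="/cookies")


def cookie_header(url, cookies):
    """Return the Cookie header requests would send to url, or None if there is none."""
    with requests.Session() as session:
        prepared = session.prepare_request(requests.Request("GET", url, cookies=cookies))
    return prepared.headers.get("Cookie")


from_dict = cookie_header("http://httpbin.org/cookies", {"cookies_are": "working"})
matching = cookie_header("http://httpbin.org/cookies", jar)
other_path = cookie_header("http://httpbin.org/elsewhere", jar)

print(from_dict)
print(matching)
print(other_path)
```

The jar's cookie is attached for /cookies but not for /elsewhere, because its path does not match.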

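1-5, Setting a timeout

Use timeout=xxx to set the timeout in seconds. If the server does not respond in time, requests raises requests.exceptions.Timeout.

# Time out after 2.5 s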
In [190]: r = requests.get(r'https://www.baidu.com', timeout=2.5)
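A timeout is only useful if the resulting exception is handled. The sketch below shows one way to do that. To stay self-contained it starts a throwaway local server (SlowHandler is a hypothetical stand-in for a slow site) and disables proxy lookup on the session so the loopback request stays local. The (connect, read) tuple form of timeout is used here; a single number applies to both.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class SlowHandler(BaseHTTPRequestHandler):
    """Hypothetical local endpoint that waits one second before answering."""

    def do_GET(self):
        time.sleep(1.0)
        try:
            self.send_response(200)
            self.send_header("Content-Length", "4")
            self.end_headers()
            self.wfile.write(b"late")
        except OSError:
            pass  # the client has already given up

    def log_message(self, *args):
        pass  # keep the demo output quiet


server = HTTPServer(("127.0.0.1", 0), SlowHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/"

session = requests.Session()
session.trust_env = False  # ignore proxy environment variables for this loopback request
try:
    # (connect timeout, read timeout) in seconds
    session.get(url, timeout=(3.05, 0.2))
    outcome = "completed"
except requests.exceptions.Timeout:
    outcome = "timed out"
finally:
    session.close()
    server.shutdown()
    server.server_close()

print(outcome)
```

Catching requests.exceptions.Timeout covers both connect and read timeouts.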

1-6, The response (Response)

1-6-1, The text attribute

r.text returns the body of the Response as a string.

# r.text returns the response body as a str
In [86]: r.text
Out[86]: '\r\n ... 
\r\n'

1-6-2, The content attribute

r.content returns the body of the Response as bytes.

# r.content returns the response body as bytes
In [85]: r.content
Out[85]: b'\r\n...
\r\n'

1-6-3, The json() method

r.json() parses a JSON response body and returns the result; if the body is not valid JSON, it raises an exception.

In [18]: r = requests.get('https://api.github.com/events')
# r.json() returns the decoded JSON body
In [19]: r.json()
Out[19]:
[{'id': '31993150847',
  'type': 'PushEvent',
  'actor': {'id': 111329684,
....
}]
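Since r.json() raises on an invalid body, callers usually wrap it. The sketch below is one possible pattern, not the only one: fetch_json is a hypothetical helper, and the local Handler server exists only so the example runs without an external site. It relies on the decode errors raised by r.json() being subclasses of ValueError.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class Handler(BaseHTTPRequestHandler):
    """Hypothetical local endpoint: /ok returns JSON, any other path returns plain text."""

    def do_GET(self):
        body = b'{"type": "PushEvent"}' if self.path == "/ok" else b"not json"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def fetch_json(session, url):
    """Return the decoded JSON body, or None when the body is not valid JSON."""
    response = session.get(url, timeout=5)
    try:
        return response.json()
    except ValueError:  # the decode errors raised by r.json() derive from ValueError
        return None


server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_port}"

session = requests.Session()
session.trust_env = False  # ignore proxy environment variables for this loopback request
try:
    good = fetch_json(session, base + "/ok")
    bad = fetch_json(session, base + "/bad")
finally:
    session.close()
    server.shutdown()
    server.server_close()

print(good)
print(bad)
```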

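1-6-4, The iter_content() method

r.iter_content(chunk_size) iterates over the response body in chunks, which is useful for writing it to a file.

# Write the body of r to a file; chunk_size can be any size you choose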
In [31]: with open(pravate_key_file, 'wb') as f:
    ...:     for chunk in r.iter_content(chunk_size=128):
    ...:         f.write(chunk)
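For large downloads, iter_content is normally combined with stream=True so the body is not loaded into memory before the loop starts. The following is a sketch under assumptions: PAYLOAD and the local Handler server are stand-ins for a real file and site, and the target path is a temporary file chosen for the demo.

```python
import os
import tempfile
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests

PAYLOAD = b"x" * 1000  # stand-in for the contents of a real file


class Handler(BaseHTTPRequestHandler):
    """Hypothetical local endpoint that serves PAYLOAD."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Length", str(len(PAYLOAD)))
        self.end_headers()
        self.wfile.write(PAYLOAD)

    def log_message(self, *args):
        pass  # keep the demo output quiet


server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/"
target = os.path.join(tempfile.mkdtemp(), "download.bin")

session = requests.Session()
session.trust_env = False  # ignore proxy environment variables for this loopback request
try:
    # stream=True defers reading the body until it is iterated
    with session.get(url, stream=True, timeout=5) as response:
        response.raise_for_status()
        with open(target, "wb") as f:
            for chunk in response.iter_content(chunk_size=128):
                f.write(chunk)
finally:
    session.close()
    server.shutdown()
    server.server_close()

downloaded_size = os.path.getsize(target)
print(downloaded_size)
```

Using the response as a context manager releases the connection even if writing the file fails partway through.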

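1-6-5, The raw attribute

r.raw gives access to the raw byte stream of the Response. To read from it, pass stream=True when making the request.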

In [33]: r.raw
Out[33]: 

1-6-6, Response headers (headers)

r.headers returns the response headers.

In [129]: url = r'https://www.baidu.com'

In [130]: r = requests.get(url)

In [131]: r.headers
Out[131]: {'Cache-Control': 'private, no-cache, no-store, proxy-revalidate, no-transform', 'Connection': 'keep-alive', 'Content-Encoding': 'gzip', 'Content-Type': 'text/html', 'Date': 'Fri, 22 Sep 2023 13:18:05 GMT', 'Last-Modified': 'Mon, 23 Jan 2017 13:24:13 GMT', 'Pragma': 'no-cache', 'Server': 'bfe/1.0.8.18', 'Set-Cookie': 'BDORZ=27315; max-age=86400; domain=.baidu.com; path=/', 'Transfer-Encoding': 'chunked'}

1-6-7, Response cookies

In [129]: url = r'https://www.baidu.com'

In [130]: r = requests.get(url)


In [132]: r.cookies
Out[132]: 

1-6-8, Request headers

In [136]: url = 'https://www.baidu.com'

In [137]: r = requests.get(url)

In [138]: r.request.headers
Out[138]: {'User-Agent': 'python-requests/2.31.0', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}

1-6-9, The encoding attribute

r.encoding shows the encoding used to decode r.text; assign r.encoding = '<encoding>' to change it.

# r.encoding shows the encoding of the Response
In [88]: r.encoding
Out[88]: 'ISO-8859-1'

# Fetch the Response, set r.encoding = 'utf-8', then read the text again
In [94]: r = requests.get(url)
# Change the encoding of the Response to 'utf-8'
In [95]: r.encoding = 'utf-8'
# With the new encoding, Chinese characters display correctly
In [96]: r.text
Out[96]: '\r\n    \r\n'
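Why the encoding matters can be shown with plain bytes, no request needed. When a text response declares no charset (as in the Content-Type: text/html header shown in 1-6-6), requests falls back to ISO-8859-1, which garbles UTF-8 content. The sample string below is an arbitrary illustration.

```python
raw = "百度一下".encode("utf-8")     # bytes as a UTF-8 server would send them
garbled = raw.decode("ISO-8859-1")  # roughly what r.text shows with the fallback encoding
fixed = raw.decode("utf-8")         # what r.text shows after r.encoding = 'utf-8'

print(repr(garbled))
print(fixed)
```

Setting r.encoding does not re-download anything; it only changes how r.content is decoded the next time r.text is read.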

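2, The POST method

requests.post(url, data=... or json=...) sends data in the request body, for example to submit a form.

2-1, The data parameter

2-1-1, data as a dict

requests.post(url, data={k: v}) sends the data as form fields.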

In [25]: url
Out[25]: 'http://httpbin.org/post'

# requests.post sends the data to the url as form fields
In [26]: r = requests.post(url, data={'k1': 'v1', 'k2': 'v2'})

# json.loads (from the standard json module) converts r.text to a dict
In [27]: r = json.loads(r.text)
In [28]: r
Out[28]:
{'args': {},
 'data': '',
 'files': {},
 'form': {'k1': 'v1', 'k2': 'v2'},
 'headers': {'Accept': '*/*',
  'Accept-Encoding': 'gzip, deflate',
  'Content-Length': '11',
  'Content-Type': 'application/x-www-form-urlencoded',
  'Host': 'httpbin.org',
  'User-Agent': 'python-requests/2.31.0',
  'X-Amzn-Trace-Id': 'Root=1-6506c949-49d62f6f16049bf477544334'},
 'json': None,
 'origin': '110.184.215.54',
 'url': 'http://httpbin.org/post'}

# Inspect the submitted form data
In [29]: r['form']
Out[29]: {'k1': 'v1', 'k2': 'v2'}

2-1-2, data as a tuple

requests.post(url, data=((k1, v1), ..., (kn, vn))) sends the data as form fields.

# data as a tuple whose elements are (key, value) pairs
In [31]: r = requests.post(url, data=(('k1', 'v1'), ('k2', 'v2'), ('k3', 'v3')))

In [32]: r = json.loads(r.text)

In [33]: r
Out[33]:
{'args': {},
 'data': '',
 'files': {},
 'form': {'k1': 'v1', 'k2': 'v2', 'k3': 'v3'},
 'headers': {'Accept': '*/*',
  'Accept-Encoding': 'gzip, deflate',
  'Content-Length': '17',
  'Content-Type': 'application/x-www-form-urlencoded',
  'Host': 'httpbin.org',
  'User-Agent': 'python-requests/2.31.0',
  'X-Amzn-Trace-Id': 'Root=1-6506ca81-55d3fd0d5a4636046d7e481e'},
 'json': None,
 'origin': '110.184.215.54',
 'url': 'http://httpbin.org/post'}

In [34]: r['form']
Out[34]: {'k1': 'v1', 'k2': 'v2', 'k3': 'v3'}

2-2, The json parameter

payload = {'some': 'data'}
# With json=payload, requests serializes payload to JSON automatically
r = requests.post('https://api.github.com/some/endpoint', json=payload)
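The difference between data= and json= is easiest to see in the request body and Content-Type header. The sketch below prepares three requests without sending them, on the assumption that preparing applies the same encoding as requests.post.

```python
import json

import requests

url = "http://httpbin.org/post"

form = requests.Request("POST", url, data={"k1": "v1", "k2": "v2"}).prepare()
pairs = requests.Request(
    "POST", url, data=(("k1", "v1"), ("k2", "v2"), ("k3", "v3"))
).prepare()
as_json = requests.Request("POST", url, json={"some": "data"}).prepare()

# data= is form-encoded; json= is serialized as a JSON document
print(form.headers["Content-Type"], form.body)
print(pairs.headers["Content-Type"], pairs.body)
print(as_json.headers["Content-Type"], as_json.body)
```

The form-encoded bodies are 11 and 17 characters long, matching the Content-Length values in the httpbin responses above.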

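2-3, Uploading files (files)

Pass files={'file': open(path, 'rb')} to upload a file to the given URL. Open the file in binary mode ('rb').

2-3-1, Uploading a file from a path

requests.post(url, files={'file': open(path, 'rb')}) uploads the file to the given url.

# Upload the file at path_1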
In [123]: r = requests.post(r'http://httpbin.org/post', files={'file': open(path_1, 'rb')})

In [124]: r.text
Out[124]: '{\n  "args": {}, \n  "data": "", \n  "files": {\n    "file": "111dfas\\r\\n"\n  }, \n  "form": {}, \n  "headers": {\n    "Accept": "*/*", \n    "Accept-Encoding": "gzip, deflate", \n    "Content-Length": "151", \n    "Content-Type": "multipart/form-data; boundary=500c09c42ab8abcc80b49a8888a6f6d8", \n    "Host": "httpbin.org", \n    "User-Agent": "python-requests/2.31.0", \n    "X-Amzn-Trace-Id": "Root=1-650d57bd-0527a16423c69e35786507ac"\n  }, \n  "json": null, \n  "origin": "118.112.139.28", \n  "url": "http://httpbin.org/post"\n}\n'

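2-3-2, Uploading a string as a file

requests.post(url, files={'file': (filename, string)}) uploads the string to the given url as if it were the contents of a file named filename.

# Upload the string 'aaaahbbfdsa' as a file, using path_1 as the file name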
In [127]: r = requests.post(r'http://httpbin.org/post', files={'file': (path_1, 'aaaahbbfdsa')})

In [128]: r.text
Out[128]: '{\n  "args": {}, \n  "data": "", \n  "files": {\n    "file": "aaaahbbfdsa"\n  }, \n  "form": {}, \n  "headers": {\n    "Accept": "*/*", \n    "Accept-Encoding": "gzip, deflate", \n    "Content-Length": "184", \n    "Content-Type": "multipart/form-data; boundary=4ef067fdd0df95bd4d69cac4589fec10", \n    "Host": "httpbin.org", \n    "User-Agent": "python-requests/2.31.0", \n    "X-Amzn-Trace-Id": "Root=1-650d59d9-45a773183cfcbcd9573aa3e9"\n  }, \n  "json": null, \n  "origin": "118.112.139.28", \n  "url": "http://httpbin.org/post"\n}\n'
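Both upload forms produce a multipart/form-data body, which can be inspected offline by preparing the request. In this sketch the file names (upload.txt, note.txt) and the temporary directory are assumptions made for the demo. The file is opened in a with block so it is closed afterwards, which the sessions above leave out.

```python
import os
import tempfile

import requests

url = "http://httpbin.org/post"

# Create a small file to upload (stand-in for path_1)
path = os.path.join(tempfile.mkdtemp(), "upload.txt")
with open(path, "wb") as f:
    f.write(b"111dfas\r\n")

# Form 1: a file object opened in binary mode; the file name is taken from the path
with open(path, "rb") as fh:
    from_disk = requests.Request("POST", url, files={"file": fh}).prepare()

# Form 2: a (filename, content) tuple; the string is sent as the file's contents
from_string = requests.Request(
    "POST", url, files={"file": ("note.txt", "aaaahbbfdsa")}
).prepare()

print(from_disk.headers["Content-Type"])
print(from_string.body)
```

In the second form, the first tuple element only supplies the file name reported to the server; no file on disk is read.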
