The Requests library for Python 3: a handy tool for web scraping


1. Installing Requests

Install Requests with pip3:

$ pip3 install requests

Or clone the repository with git (GitHub no longer serves the git:// protocol, so use https):

$ git clone https://github.com/kennethreitz/requests.git

Or download the source tarball:

$ curl -OL https://github.com/kennethreitz/requests/tarball/master
  # optionally, zipball is also available (for Windows users).

Once you have a copy of the source, run the installer from inside the source directory:

$ python setup.py install
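
Whichever install route you take, a quick check from Python confirms that the package can be imported and shows which version was picked up. This is only a minimal sketch; the printed values depend on your environment.

```python
import requests

# The version string matters later: some features (such as the json=
# parameter of requests.post) only exist in newer releases.
print("Requests version:", requests.__version__)

# The module path shows which installation Python actually imported.
print("Loaded from:", requests.__file__)
```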

2. Using Requests

Import the requests module:

>>> import requests

Each HTTP method has a function of the same name:

r = requests.get('https://api.github.com/events')
r = requests.post('http://httpbin.org/post', data = {'key':'value'})
r = requests.put('http://httpbin.org/put', data = {'key':'value'})
r = requests.delete('http://httpbin.org/delete')
r = requests.head('http://httpbin.org/get')
r = requests.options('http://httpbin.org/get')
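
All of these helpers are thin wrappers around requests.request(method, url, ...). The sketch below uses that generic entry point and adds two things the calls above leave out: a timeout, and handling of requests.exceptions.RequestException. The function name send and the 5-second default are assumptions made for illustration, not part of the library.

```python
import requests


def send(method, url, timeout=5, **kwargs):
    """Send one request; return the Response, or None if it could not be sent.

    `send` is a hypothetical helper. Extra keyword arguments (data=,
    params=, headers=, ...) are passed through to requests.request.
    """
    try:
        # Without a timeout, a request can wait on a silent server forever.
        return requests.request(method, url, timeout=timeout, **kwargs)
    except requests.exceptions.RequestException as exc:
        # Base class for connection errors, timeouts, invalid URLs, etc.
        print(f"{method.upper()} {url} failed: {exc}")
        return None
```

For example, send('get', 'https://api.github.com/events') behaves like requests.get with a timeout, while a malformed URL or an unreachable host yields None instead of a traceback.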

For the differences between the HTTP methods GET, PUT, POST, DELETE, HEAD, and OPTIONS, see:

https://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol

http://www.15yan.com/story/7dz6oXiSHeq/

http://blog.csdn.net/kiyoki/article/details/14127203

3. Passing parameters in the URL

payload = {'key1': 'value1', 'key2': 'value2'}
r = requests.get('http://httpbin.org/get', params=payload)
print(r.url)

Output:
http://httpbin.org/get?key1=value1&key2=value2

When one parameter has several values, pass them as a list:

payload = {'key1': 'value1', 'key2': ['value2', 'value3']}
r = requests.get('http://httpbin.org/get', params=payload)
print(r.url)

Output:
http://httpbin.org/get?key1=value1&key2=value2&key2=value3
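
To see how Requests builds the query string without touching the network, you can prepare a request instead of sending it. The sketch below uses requests.Request(...).prepare(); the extra key3 entry is an assumption added to show that keys whose value is None are left out of the URL.

```python
import requests

# key3 is a made-up parameter: a value of None means "do not send this key".
payload = {'key1': 'value1', 'key2': ['value2', 'value3'], 'key3': None}

# prepare() builds the final URL, headers, and body but sends nothing.
prepared = requests.Request('GET', 'http://httpbin.org/get', params=payload).prepare()

print(prepared.url)
```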

4. Reading the response content

>>> import requests
>>> r = requests.get('https://api.github.com/events')
>>> r.status_code
200
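
Rather than comparing status_code by hand, you can let the response raise for error codes: raise_for_status() raises requests.exceptions.HTTPError for 4xx and 5xx responses and does nothing otherwise. The sketch below is illustrative only; describe_status is a hypothetical helper, and the Response object is built by hand so the example runs offline.

```python
import requests


def describe_status(response):
    """Return a short, human-readable summary of the response status."""
    try:
        response.raise_for_status()  # raises HTTPError for 4xx/5xx codes
    except requests.exceptions.HTTPError as exc:
        return f"failed: {exc}"
    return f"ok: {response.status_code}"


# A hand-built Response stands in for a real server reply (assumption:
# in real code this object would come from requests.get(...)).
fake = requests.Response()
fake.status_code = 404
print(describe_status(fake))
print(fake.ok)  # False for any 4xx/5xx status
```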

>>> r.text
'[{"repository":{"open_issues":0,"url":"https://github.com/...

>>> r.encoding
'utf-8'
>>> r.encoding = 'ISO-8859-1'

>>> r.content
b'[{"repository":{"open_issues":0,"url":"https://github.com/...

>>> # Only for a response whose body is an image (not the JSON above)
>>> from PIL import Image
>>> from io import BytesIO
>>> i = Image.open(BytesIO(r.content))

>>> r.json()
[{'repository': {'open_issues': 0, 'url': 'https://github.com/...
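
If the body is not valid JSON, r.json() raises an exception. Across Requests versions that exception is a subclass of ValueError, so catching ValueError is a reasonably portable choice. The helper name json_or_default below is an assumption made for illustration.

```python
def json_or_default(response, default=None):
    """Decode the body as JSON, or return `default` if it is not valid JSON.

    `response` is expected to be a requests.Response; anything with a
    .json() method works.
    """
    try:
        return response.json()
    except ValueError:
        # Raised for empty bodies, HTML error pages, truncated JSON, etc.
        return default
```

Note that a successful r.json() call does not mean the request succeeded: many servers return a JSON error document with a 4xx or 5xx status, so check r.status_code as well.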

>>> r.headers
{
    'content-encoding': 'gzip',
    'transfer-encoding': 'chunked',
    'connection': 'close',
    'server': 'nginx/1.0.4',
    'x-runtime': '148ms',
    'etag': '"e1ca502697e5c9317743dc078f67693f"',
    'content-type': 'application/json'
}

>>> r.headers['Content-Type']
'application/json'

>>> r.headers.get('content-type')
'application/json'
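
Both spellings work because r.headers is a case-insensitive dictionary, matching the HTTP rule that header names ignore case. The class behind it, requests.structures.CaseInsensitiveDict, can be tried on its own; the header values below are made up for the example.

```python
from requests.structures import CaseInsensitiveDict

# Sample headers (illustrative values, not from a real response).
headers = CaseInsensitiveDict({'Content-Type': 'application/json'})

print(headers['content-type'])    # lookup ignores case
print('CONTENT-TYPE' in headers)  # so does membership testing
print(headers.get('X-Missing'))   # .get() returns None for absent headers
```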

Reading the raw response stream (pass stream=True when making the request):

>>> r = requests.get('https://api.github.com/events', stream=True)

>>> r.raw
<urllib3.response.HTTPResponse object at 0x...>

>>> r.raw.read(10)
b'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03'

Usually, though, you will want to save the streamed data to a file:

with open(filename, 'wb') as fd:
    for chunk in r.iter_content(chunk_size=128):
        fd.write(chunk)
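
The loop above assumes that r and filename already exist. A fuller sketch of the same pattern might look like the following; the function names save_chunks and download_file, the 8192-byte chunk size, and the 10-second timeout are all assumptions chosen for illustration.

```python
import requests


def save_chunks(chunks, filename):
    """Write an iterable of byte chunks to `filename`; return bytes written."""
    written = 0
    with open(filename, 'wb') as fd:
        for chunk in chunks:
            fd.write(chunk)
            written += len(chunk)
    return written


def download_file(url, filename, chunk_size=8192, timeout=10):
    """Stream the body at `url` to disk without loading it all into memory."""
    # Using the response as a context manager releases the connection
    # even if writing fails part-way through.
    with requests.get(url, stream=True, timeout=timeout) as r:
        r.raise_for_status()  # do not save an error page as the file
        return save_chunks(r.iter_content(chunk_size=chunk_size), filename)
```

Unlike r.raw, iter_content() yields decoded data, so a gzip-compressed response is written to disk uncompressed.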

5. Custom headers

>>> url = 'https://api.github.com/some/endpoint'
>>> headers = {'user-agent': 'my-app/0.0.1'}
>>> r = requests.get(url, headers=headers)
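
When every request should carry the same headers, a requests.Session can hold them once instead of passing headers= each time. The sketch below prepares a request through the session rather than sending it, so it runs offline; the URL is the placeholder endpoint used above.

```python
import requests

session = requests.Session()
# Headers set on the session are merged into every request it makes.
session.headers.update({'user-agent': 'my-app/0.0.1'})

# prepare_request() applies the session defaults without sending anything.
request = requests.Request('GET', 'https://api.github.com/some/endpoint')
prepared = session.prepare_request(request)

print(prepared.headers['User-Agent'])
```

In real use you would simply call session.get(url) and the custom User-Agent would be sent automatically.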

6. POST requests with Requests

POSTing form-encoded data, the format most often used for HTML form submissions:

>>> payload = {'key1': 'value1', 'key2': 'value2'}

>>> r = requests.post("http://httpbin.org/post", data=payload)
>>> print(r.text)
{
  ...
  "form": {
    "key2": "value2",
    "key1": "value1"
  },
  ...
}

POSTing JSON-encoded data:

>>> url = 'https://api.github.com/some/endpoint'
>>> payload = {'some': 'data'}
>>> r = requests.post(url, json=payload)

Note: the json parameter is only available in Requests 2.4.2 and later.
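
The difference between data= and json= is easiest to see by preparing both requests and inspecting what would go on the wire. The following is a small offline sketch; nothing is sent, and the httpbin URL is only a placeholder.

```python
import json

import requests

url = 'http://httpbin.org/post'
payload = {'key1': 'value1', 'key2': 'value2'}

# data= with a dict produces a form-encoded body.
form_req = requests.Request('POST', url, data=payload).prepare()
# json= serializes the dict and sets the JSON content type.
json_req = requests.Request('POST', url, json=payload).prepare()

print(form_req.headers['Content-Type'], form_req.body)
print(json_req.headers['Content-Type'], json_req.body)
```

If you pass a string to data= instead of a dict, Requests sends it as-is and does not set a Content-Type for you, which is a common cause of rejected JSON posts.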

7. Uploading files with POST

A plain upload of a file named report.xls:

>>> url = 'http://httpbin.org/post'
>>> files = {'file': open('report.xls', 'rb')}

>>> r = requests.post(url, files=files)
>>> r.text
{
  ...
  "files": {
    "file": ""
  },
  ...
}

Uploading with an explicit filename, content type, and extra headers:

>>> url = 'http://httpbin.org/post'
>>> files = {'file': ('report.xls', open('report.xls', 'rb'), 'application/vnd.ms-excel', {'Expires': '0'})}

>>> r = requests.post(url, files=files)
>>> r.text
{
  ...
  "files": {
    "file": ""
  },
  ...
}

Sending a string that the server receives as a file:

>>> url = 'http://httpbin.org/post'
>>> files = {'file': ('report.csv', 'some,data,to,send\nanother,row,to,send\n')}

>>> r = requests.post(url, files=files)
>>> r.text
{
  ...
  "files": {
    "file": "some,data,to,send\\nanother,row,to,send\\n"
  },
  ...
}
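
The upload examples above open report.xls without ever closing it. A with block avoids that leak, and preparing a request shows the multipart encoding that files= produces. In this sketch, upload_file is a hypothetical helper and the 10-second timeout is an assumption.

```python
import requests


def upload_file(url, path, field='file', timeout=10):
    """POST the file at `path` as multipart form data and return the Response."""
    # The with block closes the file once the upload has finished.
    with open(path, 'rb') as fh:
        return requests.post(url, files={field: fh}, timeout=timeout)


# Offline look at what files= generates: prepare the request, do not send it.
files = {'file': ('report.csv', 'some,data,to,send\nanother,row,to,send\n')}
prepared = requests.Request('POST', 'http://httpbin.org/post', files=files).prepare()

print(prepared.headers['Content-Type'])  # multipart/form-data plus a boundary
print(len(prepared.body), 'bytes in the encoded body')
```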

8. Cookies

Quickly reading a cookie that a site has set:

>>> url = 'http://example.com/some/cookie/setting/url'
>>> r = requests.get(url)

>>> r.cookies['example_cookie_name']
'example_cookie_value'
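
Going the other way, you can send your own cookies to the server with the cookies= parameter. The sketch below prepares the request offline to show the resulting Cookie header; the cookie name and value are invented for the example.

```python
import requests

url = 'http://httpbin.org/cookies'
cookies = {'cookies_are': 'working'}  # illustrative name and value

# With a real call this would be: requests.get(url, cookies=cookies)
prepared = requests.Request('GET', url, cookies=cookies).prepare()

print(prepared.headers['Cookie'])
```

To keep cookies across several requests, for example after a login, use a requests.Session, which stores cookies set by the server and sends them back on later calls.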

For more, see: http://www.nigaea.com/dataanalysis/80.html

Reposted from: https://my.oschina.net/at5/blog/809078
