Python web scraping - using requests

Earlier we covered the basics of urllib, but it has some genuinely inconvenient spots: handling page authentication, cookies, and the like requires writing Opener and Handler objects. To make these operations easier there is the more powerful requests library; with it, cookies, login authentication, proxy settings, and so on are no trouble at all.

get()

import requests

response = requests.get('https://httpbin.org/get')
print(response.text)

The output is as follows:

{
  "args": {}, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Connection": "close", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.21.0"
  }, 
  "origin": "116.227.107.42", 
  "url": "https://httpbin.org/get"
}
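Besides .text, the Response object exposes several other commonly used attributes. A quick sketch against the same httpbin endpoint:

```python
import requests

response = requests.get('https://httpbin.org/get')

print(response.status_code)  # HTTP status code, e.g. 200
print(response.headers)      # response headers, a case-insensitive dict-like object
print(response.content)      # raw response body as bytes
print(response.text)         # response body decoded to str
```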

Besides get(), requests provides the other HTTP methods as well:

response = requests.post('http://httpbin.org/post')
response = requests.put('http://httpbin.org/put')
response = requests.delete('http://httpbin.org/delete')
response = requests.head('http://httpbin.org/get')
response = requests.options('http://httpbin.org/get')
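For example, post() can carry form data via its data argument; httpbin echoes the submitted fields back under the "form" key of its JSON response. A minimal sketch:

```python
import requests

# httpbin.org/post echoes the submitted form data back in its JSON response
data = {'name': 'chris', 'age': 22}
response = requests.post('http://httpbin.org/post', data=data)
print(response.json()['form'])
```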

Requesting a URL with query parameters, e.g. http://httpbin.org/get?name=chris&age=22:

Method 1: put the parameters in the URL directly

import requests

response = requests.get('http://httpbin.org/get?name=chris&age=22')
print(response.text)

Method 2: use the params argument

import requests

params = {
    'name': 'chris',
    'age': 22
}
response = requests.get("http://httpbin.org/get", params=params)
print(response.text)

The output is as follows:

{
  "args": {
    "age": "22", 
    "name": "chris"
  }, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Connection": "close", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.21.0"
  }, 
  "origin": "116.227.107.42", 
  "url": "http://httpbin.org/get?name=chris&age=22"
}
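Note that in the echoed response the parameter values come back as strings. Rather than inspecting response.text as a plain str, response.json() parses a JSON body into a Python dict, which is easier to work with. A small sketch:

```python
import requests

params = {'name': 'chris', 'age': 22}
response = requests.get('http://httpbin.org/get', params=params)

data = response.json()  # parse the JSON body into a dict
print(data['args'])
print(response.url)     # the fully encoded URL requests built
```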

Adding headers

import requests


headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36',
}

response = requests.get("http://httpbin.org/get", headers=headers)
print(response.text)

The output is as follows:

{
  "args": {}, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Connection": "close", 
    "Host": "httpbin.org", 
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36"
  }, 
  "origin": "116.227.107.42", 
  "url": "http://httpbin.org/get"
}

Adding cookies

import requests

headers = {
    'Cookie': 'q_c1=31653b264a074fc9a57816d1ea93ed8b|1474273938000|1474273938000; d_c0="AGDAs254kAqPTr6NW1U3XTLFzKhMPQ6H_nc=|1474273938"; __utmv=51854390.100-1|2=registration_date=20130902=1^3=entry_date=20130902=1;a_t="2.0AACAfbwdAAAXAAAAso0QWAAAgH28HQAAAGDAs254kAoXAAAAYQJVTQ4FCVgA360us8BAklzLYNEHUd6kmHtRQX5a6hiZxKCynnycerLQ3gIkoJLOCQ==";z_c0=Mi4wQUFDQWZid2RBQUFBWU1DemJuaVFDaGNBQUFCaEFsVk5EZ1VKV0FEZnJTNnp3RUNTWE10ZzBRZFIzcVNZZTFGQmZn|1474887858|64b4d4234a21de774c42c837fe0b672fdb5763b0',
    'Host': 'www.zhihu.com',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.116 Safari/537.36',
}
r = requests.get('https://www.zhihu.com', headers=headers)
print(r.text)
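Instead of pasting a raw Cookie header, requests also accepts a cookies argument, and a Session object carries cookies across requests automatically. A sketch against httpbin (the cookie names and values here are placeholders):

```python
import requests

# Pass cookies as a dict instead of building the Cookie header by hand
cookies = {'token': 'placeholder-value'}
response = requests.get('http://httpbin.org/cookies', cookies=cookies)
print(response.text)

# A Session keeps cookies set by the server across subsequent requests
s = requests.Session()
s.get('http://httpbin.org/cookies/set/sessioncookie/123456789')
r = s.get('http://httpbin.org/cookies')
print(r.text)
```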

Adding a proxy

import requests

proxies = {
  'http': 'http://127.0.0.1:1080',
  'https': 'http://127.0.0.1:1080',
}

requests.get('http://httpbin.org/get', proxies=proxies)
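The introduction also mentioned login authentication. For HTTP Basic auth, requests takes an auth tuple of username and password; a minimal sketch using httpbin's /basic-auth endpoint, which accepts whatever user/password pair appears in its URL:

```python
import requests

# /basic-auth/user/passwd succeeds only when matching credentials are sent
response = requests.get('http://httpbin.org/basic-auth/user/passwd',
                        auth=('user', 'passwd'))
print(response.status_code)  # 200 when the credentials match, 401 otherwise
```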
