python爬虫 - 使用urllib(三)

Request

利用urlopen()方法可以实现最基本请求的发起,但这几个简单的参数并不足以构建一个完整的请求,如果请求中需要加入headers等信息,我们就可以利用更强大的Request类来构建一个请求。

import urllib.request

request = urllib.request.Request('http://httpbin.org/get')
response = urllib.request.urlopen(request)
print(response.read().decode('utf-8'))

运行结果如下:

{
  "args": {}, 
  "headers": {
    "Accept-Encoding": "identity", 
    "Connection": "close", 
    "Host": "httpbin.org", 
    "User-Agent": "Python-urllib/3.7"
  }, 
  "origin": "116.227.107.42", 
  "url": "http://httpbin.org/get"
}

添加headers和data

from urllib import request, parse

url = 'http://httpbin.org/post'
headers = {
    'User-Agent': 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)',
}
dict = {
    'name': 'chris'
}
data = bytes(parse.urlencode(dict), encoding='utf8')
req = request.Request(url=url, data=data, headers=headers, method='POST')
response = request.urlopen(req)
print(response.read().decode('utf-8'))

运行结果如下:

{
  "args": {}, 
  "data": "", 
  "files": {}, 
  "form": {
    "name": "chris"
  }, 
  "headers": {
    "Accept-Encoding": "identity", 
    "Connection": "close", 
    "Content-Length": "10", 
    "Content-Type": "application/x-www-form-urlencoded", 
    "Host": "httpbin.org", 
    "User-Agent": "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)"
  }, 
  "json": null, 
  "origin": "116.227.107.42", 
  "url": "http://httpbin.org/post"
}

你可能感兴趣的:(python爬虫)