Official documentation: https://docs.python.org/3/library/urllib.request.html#urllib.request.urlopen
urlopen
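urllib.request.urlopen(url, data=None, [timeout, ]*, cafile=None, capath=None, cadefault=False, context=None)
Note: cafile, capath and cadefault were deprecated in Python 3.6 and removed in Python 3.13. To control certificate verification, pass an ssl.SSLContext through context instead.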
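Parameters
url: a URL string or a Request object.
data: the request body as bytes, or None. When data is given, an HTTP request is sent as POST instead of GET.
timeout: optional timeout in seconds for blocking operations such as the connection attempt. If omitted, the global default socket timeout is used.
context: an ssl.SSLContext describing the TLS settings used for HTTPS connections.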
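A minimal sketch of the data and timeout parameters, assuming a form-style POST. The helper names encode_form and post_form are hypothetical and not part of urllib; the key point is that data must be bytes, so the urlencoded string is encoded before it is passed to urlopen.

```python
import urllib.parse
import urllib.request


def encode_form(fields):
    """Hypothetical helper: encode a dict as an application/x-www-form-urlencoded body."""
    # urlencode() returns a str; urlopen() expects data as bytes.
    return urllib.parse.urlencode(fields).encode("ascii")


def post_form(url, fields, timeout=10):
    """Hypothetical helper: POST a form and return (status, body bytes)."""
    body = encode_form(fields)
    # Supplying data makes urlopen send POST instead of GET;
    # timeout is in seconds and applies to blocking socket operations.
    with urllib.request.urlopen(url, data=body, timeout=timeout) as response:
        return response.status, response.read()


print(encode_form({"q": "python", "page": 1}))  # b'q=python&page=1'
print(encode_form({"wd": "中文"}))               # non-ASCII values are UTF-8 percent-encoded
```

Calling post_form requires a server that accepts form POSTs; the URL is whatever endpoint you are targeting.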
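The returned object can be used as a context manager and has the attributes url, headers and status.
For HTTP and HTTPS URLs, it is a slightly modified http.client.HTTPResponse object.
For FTP, file and data URLs, it is a urllib.response.addinfourl object.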
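To see the returned object without needing the network, the sketch below opens a data: URL, which urllib handles locally with its built-in DataHandler. The function name inspect_response is hypothetical; it uses the response as a context manager and reads the url and headers attributes described above.

```python
import urllib.request
import urllib.response


def inspect_response(url):
    """Hypothetical helper: open url and report type, URL, Content-Type and body."""
    with urllib.request.urlopen(url) as response:  # usable as a context manager
        kind = type(response)
        opened_url = response.url
        content_type = response.headers["Content-Type"]
        body = response.read().decode("utf-8")
    return kind, opened_url, content_type, body


# A data: URL is served by urllib's DataHandler, so no network access is needed.
kind, opened_url, content_type, body = inspect_response(
    "data:text/plain;charset=utf-8,hello%20world"
)
print(kind)          # <class 'urllib.response.addinfourl'>
print(content_type)  # text/plain;charset=utf-8
print(body)          # hello world
```

An http:// URL passed to the same function would produce an http.client.HTTPResponse instead.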
Basic example
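```python
import urllib.request

url = r'http://www.baidu.com'  # the r prefix makes a raw string, so backslashes are not treated as escape sequences
response = urllib.request.urlopen(url)  # without the scheme (http://), this raises ValueError: unknown url type: 'www.baidu.com'
content = response.read()  # the body as bytes
content_decode = content.decode()  # bytes.decode() uses UTF-8 by default
print(type(response))  # <class 'http.client.HTTPResponse'>
```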
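Decoding as UTF-8 works for many pages, but some sites still declare another charset (for example gbk). A sketch under that assumption: the hypothetical helper fetch_text reads the charset from the Content-Type header via get_content_charset() and falls back to UTF-8 when none is declared. The last lines reproduce the missing-scheme ValueError, which is raised before any network access.

```python
import urllib.request


def fetch_text(url, timeout=10):
    """Hypothetical helper: download url and decode it with the declared charset."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        raw = response.read()
        # get_content_charset() returns e.g. "iso-8859-1" from the Content-Type
        # header, or None when no charset is declared, so fall back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
    return raw.decode(charset)


# A URL without a scheme is rejected while the request is being built.
error_message = None
try:
    urllib.request.urlopen("www.baidu.com")
except ValueError as exc:
    error_message = str(exc)
print(error_message)  # unknown url type: 'www.baidu.com'
```

With a real site, fetch_text("http://www.baidu.com") would decode using whatever charset the server announces.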
Custom requests
Request
Documentation: https://docs.python.org/3/library/urllib.request.html#urllib.request.Request
class urllib.request.Request(url, data=None, headers={}, origin_req_host=None, unverifiable=False, method=None)
```python
from urllib import request

url = r'http://www.baidu.com'
req = request.Request(url)
# Pass the Request object (not the bare URL) so its settings are actually used.
response = request.urlopen(req).read().decode()
```
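Building a Request does not touch the network; it only records the settings that urlopen will use later. This sketch shows how the HTTP method is chosen from the constructor arguments: GET by default, POST when data is supplied, and an explicit method overrides both. The request body b"wd=python" is only an illustrative value.

```python
from urllib import request

url = "http://www.baidu.com"

get_req = request.Request(url)                          # no data -> GET
post_req = request.Request(url, data=b"wd=python")      # data present -> POST
head_req = request.Request(url, method="HEAD")          # explicit method wins

for req in (get_req, post_req, head_req):
    print(req.get_method(), req.host)
# GET www.baidu.com
# POST www.baidu.com
# HEAD www.baidu.com
```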
Adding a User-Agent header to masquerade as a browser
```python
from urllib import request

url = r'http://www.baidu.com'
header = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.75 Safari/537.36"
}
req = request.Request(url, headers=header)
# Open req rather than url; otherwise the custom User-Agent is never sent.
response = request.urlopen(req).read().decode()
```
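Two details worth checking offline: Request stores header names through str.capitalize(), so "User-Agent" is kept as "User-agent" and has_header() must use that spelling; and without a custom header, the default opener identifies itself as Python-urllib/3.x. The Referer value below is only an illustrative extra header added with add_header().

```python
from urllib import request

UA = ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 "
      "(KHTML, like Gecko) Chrome/73.0.3683.75 Safari/537.36")

req = request.Request("http://www.baidu.com", headers={"User-Agent": UA})
req.add_header("Referer", "http://www.baidu.com/")  # illustrative extra header

# Header names are normalized with str.capitalize(): "User-Agent" -> "User-agent".
print(req.get_header("User-agent") == UA)  # True
print(req.has_header("User-Agent"))        # False, the lookup is case-sensitive
print(dict(req.header_items()))

# The User-Agent urllib sends when none is set explicitly.
default_ua = dict(request.build_opener().addheaders).get("User-agent")
print(default_ua)  # e.g. Python-urllib/3.x
```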