Python Web Scraping for Beginners 6: Adding Proxies and Other Information with the requests Library


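The methods provided by the Requests library let us attach extra information to a request when we fetch a page. With the urllib library, I found a few ways online to add both a proxy and a User-Agent at the same time, but I didn't really understand them, so I won't cover them here. If you're interested, you can search for them yourself.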
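For comparison, here is a minimal sketch of one standard-library way to handle the urllib case mentioned above. It uses `urllib.request.ProxyHandler` for the proxy and puts the User-Agent on a `Request` object. The proxy address `127.0.0.1:8080` and the URL are placeholders, and nothing is actually sent.

```python
import urllib.request

# Hypothetical proxy address: replace it with a proxy you actually have access to.
proxy_handler = urllib.request.ProxyHandler({"http": "http://127.0.0.1:8080"})

# build_opener swaps its default ProxyHandler for the one we pass in.
opener = urllib.request.build_opener(proxy_handler)

# Per-request headers go on the Request object (urllib stores the key as "User-agent").
req = urllib.request.Request("http://example.com/", headers={"User-Agent": "123"})

# Sending it through the proxy would be: opener.open(req, timeout=10)
# (not executed here, because the proxy address is only a placeholder)
```

Requests packs all of this into keyword arguments on a single call, which is the approach used in the rest of this post.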

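In the Requests library, you add a proxy by passing the `proxies` parameter. In the same call you can also set a timeout with the `timeout` parameter.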

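import requests

url = ""  # the URL of the page to request

# Add the User-Agent and other header information
headers = {
    'User-agent': '123'
}

# Add proxies: maps a URL scheme to a proxy address
proxies = {
    'http': ''
}

# timeout=10: give up if the server sends no data for 10 seconds
response = requests.get(url, headers=headers, proxies=proxies, timeout=10)

data = response.content.decode('utf-8')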
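The snippet above proxies only `http://` URLs and uses a single timeout value. Below is a slightly fuller sketch. It assumes a hypothetical proxy at `127.0.0.1:8080` used for both `http` and `https` targets, and a `(connect timeout, read timeout)` tuple as described in the docstring below. It also catches the timeout and proxy exceptions that requests can raise. The `fetch` helper is illustrative and not part of requests.

```python
import requests

# Hypothetical proxy address: replace it with a proxy you actually have access to.
PROXY = "http://127.0.0.1:8080"

# Keys are URL schemes; requests uses the entry matching the target URL's scheme.
proxies = {
    "http": PROXY,
    "https": PROXY,
}

headers = {"User-Agent": "Mozilla/5.0 (example)"}

# (connect timeout, read timeout) in seconds
timeout = (3.05, 10)


def fetch(url):
    """Fetch url through the proxy; return the decoded body, or None on failure."""
    try:
        response = requests.get(url, headers=headers, proxies=proxies, timeout=timeout)
        response.raise_for_status()  # treat 4xx/5xx responses as errors
        return response.text         # body decoded using response.encoding
    except requests.exceptions.Timeout:
        print("request timed out")
    except requests.exceptions.ProxyError:
        print("could not connect to the proxy")
    except requests.exceptions.RequestException as exc:
        print("request failed:", exc)
    return None
```

Calling `fetch("http://example.com/")` only succeeds if a proxy is really listening at that address. Otherwise it prints an error message and returns `None` instead of crashing the scraper.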

Below are the parameters defined by `requests.request`. Use the docstring comments to decide which information to add for your own needs.

def request(method, url, **kwargs):
    """Constructs and sends a :class:`Request <Request>`.

    :param method: method for the new :class:`Request` object: ``GET``, ``OPTIONS``, ``HEAD``, ``POST``, ``PUT``, ``PATCH``, or ``DELETE``.
    :param url: URL for the new :class:`Request` object.
    :param params: (optional) Dictionary, list of tuples or bytes to send
        in the query string for the :class:`Request`.
    :param data: (optional) Dictionary, list of tuples, bytes, or file-like
        object to send in the body of the :class:`Request`.
    :param json: (optional) A JSON serializable Python object to send in the body of the :class:`Request`.
    :param headers: (optional) Dictionary of HTTP Headers to send with the :class:`Request`.
    :param cookies: (optional) Dict or CookieJar object to send with the :class:`Request`.
    :param files: (optional) Dictionary of ``'name': file-like-objects`` (or ``{'name': file-tuple}``) for multipart encoding upload.
        ``file-tuple`` can be a 2-tuple ``('filename', fileobj)``, 3-tuple ``('filename', fileobj, 'content_type')``
        or a 4-tuple ``('filename', fileobj, 'content_type', custom_headers)``, where ``'content-type'`` is a string
        defining the content type of the given file and ``custom_headers`` a dict-like object containing additional headers
        to add for the file.
    :param auth: (optional) Auth tuple to enable Basic/Digest/Custom HTTP Auth.
    :param timeout: (optional) How many seconds to wait for the server to send data
        before giving up, as a float, or a :ref:`(connect timeout, read
        timeout) <timeouts>` tuple.
    :type timeout: float or tuple
    :param allow_redirects: (optional) Boolean. Enable/disable GET/OPTIONS/POST/PUT/PATCH/DELETE/HEAD redirection. Defaults to ``True``.
    :type allow_redirects: bool
    :param proxies: (optional) Dictionary mapping protocol to the URL of the proxy.
    :param verify: (optional) Either a boolean, in which case it controls whether we verify
            the server's TLS certificate, or a string, in which case it must be a path
            to a CA bundle to use. Defaults to ``True``.
    :param stream: (optional) if ``False``, the response content will be immediately downloaded.
    :param cert: (optional) if String, path to ssl client cert file (.pem). If Tuple, ('cert', 'key') pair.
    :return: :class:`Response <Response>` object
    :rtype: requests.Response
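
    Usage::

      >>> import requests
      >>> req = requests.request('GET', '')
      >>> req
    """

    with sessions.Session() as session:
        return session.request(method=method, url=url, **kwargs)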

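A convenient way to see what several of these parameters actually do is to build a request without sending it. `requests.Request(...).prepare()` returns the final URL, headers, and body. The URL and all values below are placeholders for illustration.

```python
import requests

# Build (but do not send) a request to inspect what each keyword argument adds.
req = requests.Request(
    "POST",
    "http://example.com/search",
    params={"q": "python", "page": 2},  # appended to the URL as a query string
    json={"name": "demo"},              # serialized into the body as JSON
    headers={"User-Agent": "123"},      # sent as given
    cookies={"session": "abc"},         # turned into a Cookie header
    auth=("user", "pass"),              # turned into an HTTP Basic Authorization header
)
prepared = req.prepare()

print(prepared.url)
print(prepared.headers["Content-Type"])
print(prepared.headers["Cookie"])
print(prepared.headers["Authorization"])
print(prepared.body)
```

The remaining options (`timeout`, `allow_redirects`, `proxies`, `verify`, `stream`, `cert`) do not change the prepared request itself. They control how it is sent, so they are passed to `requests.get`/`requests.request` (or `Session.send`) at send time, as in the proxy example earlier.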