How to Read Source Code (Using the Python requests Library as an Example)


requests: http library for humans


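For any engineer who is just starting to read source code, learning to work through it step by step, from the surface inward, is not easy, so it helps to have some experience and a methodology to guide the process.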

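This article is my study notes on the PyCon 2016 talk "Let's read code: the requests library". The main goal is to study requests, the classic Python library by the great kennethreitz (Kenneth Reitz), both to learn a method for reading source code and to pick up the finer points of Pythonic programming along the way.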

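Starting from setting up the development environment, the article digs into one of requests' unit tests in depth. All notes come from steps I actually ran.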

  • The source code is the latest requests master branch.
  • The remote environment is Ubuntu 16.04 with Python 3.5.

The article covers the following parts:

Preparation: Setting Up the Local Development Environment

TIP: point pip's default index at a mirror inside China (here, Aliyun) in ~/.pip/pip.conf:

(requests) root@cld-chenjiaxi-test:/home/chenjiaxi/requests/src# vim ~/.pip/pip.conf

[global]
trusted-host=mirrors.aliyun.com
index-url=http://mirrors.aliyun.com/pypi/simple/

Download and install requests, set up the environment, and make sure the basic test suite passes:

(requests) root@cld-chenjiaxi-test:/home/chenjiaxi/requests/src# make

(requests) root@cld-chenjiaxi-test:/home/chenjiaxi/requests/src# python setup.py test
running test
...
running build_ext
========================================================= test session starts =========================================================
platform linux -- Python 3.5.2, pytest-3.6.3, py-1.5.4, pluggy-0.6.0
rootdir: /home/chenjiaxi/requests/src, inifile: pytest.ini
plugins: httpbin-0.0.7, xdist-1.22.2, mock-1.10.0, forked-0.2, cov-2.5.1
gw0 [533] / gw1 [533] / gw2 [533] / gw3 [533]
scheduling tests via LoadScheduling
...
========================================= 518 passed, 13 skipped, 2 xfailed in 149.33 seconds =========================================

Problem: Understanding a GET Request

Make sense of the following snippet:

>>> import requests
>>> print(requests)

>>> r = requests.get('https://api.github.com/user', auth=('chenjiaxi1993', ''))
>>> r.status_code
200
>>> r.headers['content-type']
'application/json; charset=utf-8'
>>> r.encoding
'utf-8'
>>> r.text
'{"login":"chenjiaxi1993",...'
>>> r.json()
{'updated_at': '2018-07-02T14:42:13Z', ...}

Starting from the Unit Tests

The unit tests mention requests.get on as many as 69 lines (git grep counts matching lines, not individual tests):

Jchen@iMac-3  ~/requests   master ●  git grep requests.get tests/test_requests.py | wc -l
      69

Pick one of these unit tests and read it in depth.

Reading a Unit Test

tests.test_requests.TestRequests#test_DIGEST_HTTP_200_OK_GET

    def test_DIGEST_HTTP_200_OK_GET(self, httpbin):

        for authtype in self.digest_auth_algo:
            # note1
            auth = HTTPDigestAuth('user', 'pass')
            url = httpbin('digest-auth', 'auth', 'user', 'pass', authtype, 'never')

            # note2
            r = requests.get(url, auth=auth)
            assert r.status_code == 200

            r = requests.get(url)
            assert r.status_code == 401
            print(r.headers['WWW-Authenticate'])

            # note3
            s = requests.session()
            s.auth = HTTPDigestAuth('user', 'pass')
            r = s.get(url)
            assert r.status_code == 200

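The test function above raises questions in three areas:

  1. Initialization of the auth and url objects -> what are they, and how are they initialized?
  2. The two test cases -> basic usage of requests.get
  3. Use of session -> what is a session?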

HTTPDigestAuth

To explore the first question, start with the definition of the HTTPDigestAuth class (method bodies omitted):

requests.auth.HTTPDigestAuth

class HTTPDigestAuth(AuthBase):
    """Attaches HTTP Digest Authentication to the given Request object."""

    def __init__(self, username, password):
        self.username = username
        self.password = password
        # Keep state in per-thread local storage
        self._thread_local = threading.local()
        
    def init_per_thread_state(self):
        # Ensure state is initialized just once per-thread
        ...

    def build_digest_header(self, method, url):
        ...

    def handle_redirect(self, r, **kwargs):
        """Reset num_401_calls counter on redirects."""
        ...

    def handle_401(self, r, **kwargs):
        """
        Takes the given response and tries digest-auth, if needed.

        :rtype: requests.Response
        """
        ...

The documentation at http://docs.python-requests.org/en/master/user/authentication/ helps explain HTTPDigestAuth further:

Digest authentication
Digest access authentication is one of the agreed-upon methods a web server can use to negotiate credentials, such as username or password, with a user's web browser. This can be used to confirm the identity of a user before sending sensitive information, such as online banking transaction history. It applies a hash function to a password before sending it over the network, which is safer than basic access authentication, which sends plaintext. Technically, digest authentication is an application of MD5 cryptographic hashing with usage of nonce values to prevent replay attacks. It uses the HTTP protocol.
Digest access authentication
An authentication method that prompts the user for a username and password (also called credentials) and hashes them together with other data before transmitting them over the network.
Source: Wikipedia

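So HTTPDigestAuth can be understood as a way of authenticating with a username and password in which the password is hashed together with a server-issued nonce instead of being sent in plaintext.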
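To make "hashed instead of sent in plaintext" concrete, here is a minimal sketch of how the Digest `response` value is computed for `qop="auth"`, following the scheme described in RFC 2617 / RFC 7616. The function `digest_response` and the sample realm/cnonce values are hypothetical; requests' own `build_digest_header` also handles the nonce counter, `opaque`, and formatting the final `Authorization` header.

```python
import hashlib

# Hash algorithms named in the WWW-Authenticate challenges seen in the pdb run.
_HASHES = {"MD5": "md5", "SHA-256": "sha256", "SHA-512": "sha512"}


def digest_response(username, password, realm, method, uri,
                    nonce, nc, cnonce, qop="auth", algorithm="MD5"):
    """Compute the Digest 'response' value for qop="auth".

    Hypothetical helper for illustration only; it is not part of the
    requests API.
    """
    name = _HASHES[algorithm.upper()]

    def h(text):
        return hashlib.new(name, text.encode("utf-8")).hexdigest()

    ha1 = h(":".join([username, realm, password]))  # the secret: only its hash is used
    ha2 = h(":".join([method, uri]))                # what is being requested
    # Mixing in the server nonce, the client nonce and the request counter
    # makes each response value single-use, which defeats replay attacks.
    return h(":".join([ha1, nonce, nc, cnonce, qop, ha2]))


# Illustrative values; a real client takes realm/nonce from the server's challenge.
print(digest_response("user", "pass", "example-realm", "GET",
                      "/digest-auth/auth/user/pass/MD5/never",
                      nonce="6eeb7626165a8ffdd8fcac8c608a2350",
                      nc="00000001", cnonce="0a4f113b"))
```

The server, which knows the password, performs the same computation and compares results, so the password itself never crosses the wire.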

httpbin

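The second question is about the httpbin function. In the test, httpbin is just a parameter passed in by the caller (a pytest fixture), and the code itself offers no further information about it.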

Possible approaches:

  • Consult the requests documentation
  • If no relevant documentation turns up, use a debugger to find out

Consulting the requests Documentation

A search at http://docs.python-requests.org/en/master/search/?q=httpbin&check_keywords=yes&area=default
leads to the httpbin documentation: https://httpbin.org/

httpbin: A simple HTTP Request & Response Service.

Trying out httpbin:

>>> import requests
>>> resp = requests.post('http://httpbin.org/post',data={'name':"chenjiaxi"})
>>> resp.json()
{'args': {}, 'form': {'name': 'chenjiaxi'}, 'files': {}, 'url': 'http://httpbin.org/post', 'json': None, 'data': '', 'headers': {'Connection': 'close', 'User-Agent': 'python-requests/2.19.1', 'Accept-Encoding': 'gzip, deflate', 'Host': 'httpbin.org', 'Content-Length': '16', 'Content-Type': 'application/x-www-form-urlencoded', 'Accept': '*/*'}, 'origin': '42.186.112.21'}

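So httpbin is an HTTP request and response service, not a component of requests. The test suite uses it as a test server: the pytest-httpbin plugin (httpbin-0.0.7 in the plugin list above) runs a local httpbin instance, and the httpbin fixture builds URLs that point at it.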
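A call such as `httpbin('digest-auth', 'auth', 'user', 'pass', authtype, 'never')` therefore only builds a URL. Here is a minimal sketch of such a URL-building helper, assuming it works roughly like a `prepare_url`-style wrapper in requests' `tests/conftest.py` (check the actual file; the names here are illustrative). The port 40631 is taken from the pdb session later in this article.

```python
from urllib.parse import urljoin


def prepare_url(base_url):
    """Return a function that joins path segments onto base_url.

    Illustrative sketch of an httpbin-style fixture helper (assumed,
    not copied from requests' test suite).
    """
    # Force exactly one trailing slash so urljoin appends rather than replaces.
    root = base_url.rstrip("/") + "/"

    def inner(*suffix):
        return urljoin(root, "/".join(suffix))

    return inner


httpbin = prepare_url("http://127.0.0.1:40631")
url = httpbin("digest-auth", "auth", "user", "pass", "MD5", "never")
print(url)  # http://127.0.0.1:40631/digest-auth/auth/user/pass/MD5/never
```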

Debugging with pdb

pdb lets you debug Python code; insert a breakpoint with pdb.set_trace():

    def test_DIGEST_HTTP_200_OK_GET(self, httpbin):

        for authtype in self.digest_auth_algo:
            auth = HTTPDigestAuth('user', 'pass')
            url = httpbin('digest-auth', 'auth', 'user', 'pass', authtype, 'never')

            import pdb
            pdb.set_trace()

            ...

Run the test. Execution stops at the breakpoint, where you can inspect the current value of url:

Testing started at 17:49 ...
ssh://chenjiaxi@xxx/home/chenjiaxi/requests/opt/requests/bin/python -u /home/chenjiaxi/.pycharm_helpers/pycharm/_jb_pytest_runner.py --target tests/test_requests.py::TestRequests.test_DIGEST_HTTP_200_OK_GET
Launching py.test with arguments tests/test_requests.py::TestRequests::test_DIGEST_HTTP_200_OK_GET in /home/chenjiaxi/requests/src

============================= test session starts ==============================
platform linux -- Python 3.5.2, pytest-3.6.3, py-1.5.4, pluggy-0.6.0
rootdir: /home/chenjiaxi/requests/src, inifile: pytest.ini
plugins: xdist-1.22.2, mock-1.10.0, httpbin-0.0.7, forked-0.2, cov-2.5.1
collected 1 item                                                               

tests/test_requests.py 
>>>>>>>>>>>>>>>>>>> PDB set_trace (IO-capturing turned off) >>>>>>>>>>>>>>>>>>>>
> /home/chenjiaxi/requests/src/tests/test_requests.py(595)test_DIGEST_HTTP_200_OK_GET()
-> r = requests.get(url, auth=auth)
(Pdb) url
'http://127.0.0.1:40631/digest-auth/auth/user/pass/MD5/never'

Use the list command to view the code around the current line:

(Pdb) list
list
590                 url = httpbin('digest-auth', 'auth', 'user', 'pass', authtype, 'never')
591     
592                 import pdb
593                 pdb.set_trace()
594     
595  ->             r = requests.get(url, auth=auth)
596                 assert r.status_code == 200
597     
598                 r = requests.get(url)
599                 assert r.status_code == 401
600                 print(r.headers['WWW-Authenticate'])

Use the c (continue) command to resume execution until the next breakpoint is hit:

(Pdb) c
c
127.0.0.1 - - [14/Jul/2018 20:16:10] "GET /digest-auth/auth/user/pass/MD5/never HTTP/1.1" 401 0
127.0.0.1 - - [14/Jul/2018 20:16:10] "GET /digest-auth/auth/user/pass/MD5/never HTTP/1.1" 200 37
Digest nonce="6eeb7626165a8ffdd8fcac8c608a2350", realm="[email protected]", qop="auth", stale=FALSE, algorithm=MD5, opaque="04b7cffdd42f6a3575401089dab14b16"
127.0.0.1 - - [14/Jul/2018 20:16:10] "GET /digest-auth/auth/user/pass/MD5/never HTTP/1.1" 401 0
127.0.0.1 - - [14/Jul/2018 20:16:10] "GET /digest-auth/auth/user/pass/MD5/never HTTP/1.1" 401 0
127.0.0.1 - - [14/Jul/2018 20:16:10] "GET /digest-auth/auth/user/pass/MD5/never HTTP/1.1" 200 37

>>>>>>>>>>>>>>>>>>> PDB set_trace (IO-capturing turned off) >>>>>>>>>>>>>>>>>>>>
> /home/chenjiaxi/requests/src/tests/test_requests.py(593)test_DIGEST_HTTP_200_OK_GET()
-> pdb.set_trace()
(Pdb) c
c
127.0.0.1 - - [14/Jul/2018 20:16:19] "GET /digest-auth/auth/user/pass/SHA-256/never HTTP/1.1" 401 0
127.0.0.1 - - [14/Jul/2018 20:16:19] "GET /digest-auth/auth/user/pass/SHA-256/never HTTP/1.1" 200 37
Digest nonce="aa4ba69f64892c28a60434d7cc476c59a7b2c4444b0f92fa68f7eb52b3caa7f2", realm="[email protected]", qop="auth", stale=FALSE, algorithm=SHA-256, opaque="303b8f4fdd8360cbb9663099eb5c4bf6f91b9c48ce69f4a5e2d19aec9532b4a4"
127.0.0.1 - - [14/Jul/2018 20:16:19] "GET /digest-auth/auth/user/pass/SHA-256/never HTTP/1.1" 401 0
127.0.0.1 - - [14/Jul/2018 20:16:19] "GET /digest-auth/auth/user/pass/SHA-256/never HTTP/1.1" 401 0
127.0.0.1 - - [14/Jul/2018 20:16:19] "GET /digest-auth/auth/user/pass/SHA-256/never HTTP/1.1" 200 37

>>>>>>>>>>>>>>>>>>> PDB set_trace (IO-capturing turned off) >>>>>>>>>>>>>>>>>>>>
> /home/chenjiaxi/requests/src/tests/test_requests.py(595)test_DIGEST_HTTP_200_OK_GET()
-> r = requests.get(url, auth=auth)
(Pdb) c
c
127.0.0.1 - - [14/Jul/2018 20:16:22] "GET /digest-auth/auth/user/pass/SHA-512/never HTTP/1.1" 401 0
127.0.0.1 - - [14/Jul/2018 20:16:22] "GET /digest-auth/auth/user/pass/SHA-512/never HTTP/1.1" 200 37
127.0.0.1 - - [14/Jul/2018 20:16:22] "GET /digest-auth/auth/user/pass/SHA-512/never HTTP/1.1" 401 0
Digest nonce="5b703ded2588e07187c583973274a037ed48add03f41e21f594affaf2fa359de63bbb200d727e48a27687d4d104927196f2691a4fcdf65a7d453c9422750aba2", realm="[email protected]", qop="auth", stale=FALSE, algorithm=SHA-512, opaque="4cf39ce7c546b8da0125193c5c9539ee071ac882ccf8080431402a8e51f2c8a952b9034c4a5828f56a3187de570cef49d0c587ae901b260fe4310540ae477da6"
127.0.0.1 - - [14/Jul/2018 20:16:22] "GET /digest-auth/auth/user/pass/SHA-512/never HTTP/1.1" 401 0
127.0.0.1 - - [14/Jul/2018 20:16:22] "GET /digest-auth/auth/user/pass/SHA-512/never HTTP/1.1" 200 37
.                                                 [100%]

========================== 1 passed in 22.09 seconds ===========================
Process finished with exit code 0

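With pdb you can follow the whole run of the unit test. I usually debug in PyCharm, which is intuitive and convenient, but knowing pdb is essential when programming on a server.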

requests.get()

Having covered how auth and url are initialized, we move on to the code of the two test cases:

            r = requests.get(url, auth=auth)
            assert r.status_code == 200

            r = requests.get(url)
            assert r.status_code == 401
            print(r.headers['WWW-Authenticate'])

The question here is: what happens inside requests.get?

requests.api.get

def get(url, params=None, **kwargs):
    r"""Sends a GET request.

    :param url: URL for the new :class:`Request` object.
    :param params: (optional) Dictionary or bytes to be sent in the query string for the :class:`Request`.
    :param \*\*kwargs: Optional arguments that ``request`` takes.
    :return: :class:`Response ` object
    :rtype: requests.Response
    """

    kwargs.setdefault('allow_redirects', True)
    return request('get', url, params=params, **kwargs)
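
  1. Give the optional keyword argument allow_redirects a default of True (redirects are followed unless the caller says otherwise)
  2. Return a Response object, which is the return value of the request() call.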
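The `kwargs.setdefault` idiom is what lets a caller still override the default. A minimal sketch of the pattern, with a hypothetical stand-in `request` that just echoes its arguments instead of sending anything; the `head` wrapper reflects the fact that `requests.api.head` uses the opposite default (`allow_redirects=False`).

```python
def request(method, url, **kwargs):
    """Hypothetical stand-in for requests.api.request: echoes its arguments."""
    return dict(method=method, url=url, **kwargs)


def get(url, params=None, **kwargs):
    # Only fills in allow_redirects if the caller did not pass it explicitly.
    kwargs.setdefault("allow_redirects", True)
    return request("get", url, params=params, **kwargs)


def head(url, **kwargs):
    # HEAD requests default to *not* following redirects.
    kwargs.setdefault("allow_redirects", False)
    return request("head", url, **kwargs)


print(get("http://httpbin.org/get"))                         # allow_redirects=True
print(get("http://httpbin.org/get", allow_redirects=False))  # caller's value wins
```

Keeping each HTTP verb as a thin wrapper means all real work stays in one place, `request()`.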

Question:

What does the request() function return?

request()

requests.api.request

def request(method, url, **kwargs):
    """Constructs and sends a :class:`Request `.

    :param method: method for the new :class:`Request` object.
    :param url: URL for the new :class:`Request` object.
    :param params: (optional) Dictionary or bytes to be sent in the query string for the :class:`Request`.
    :param data: (optional) Dictionary or list of tuples ``[(key, value)]`` (will be form-encoded), bytes, or file-like object to send in the body of the :class:`Request`.
    ...
    :return: :class:`Response ` object
    :rtype: requests.Response

    Usage::

      >>> import requests
      >>> req = requests.request('GET', 'http://httpbin.org/get')
      
    """

    # By using the 'with' statement we are sure the session is closed, thus we
    # avoid leaving sockets open which can trigger a ResourceWarning in some
    # cases, and look like a memory leak in others.
    with sessions.Session() as session:
        return session.request(method=method, url=url, **kwargs)

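It performs one HTTP request from the arguments passed in and returns a Response.
This function has nearly complete documentation: a detailed explanation of every parameter, a simple usage example, and a classic use of with to manage a resource object.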

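Managing an object with with ties its lifetime to a context and guarantees it is cleaned up. This is a very good fit for objects that hold limited resources, such as HTTP connections, database connections and file descriptors.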
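The key detail in `api.request()` is that it `return`s from inside the `with` block, yet the session is still closed. A minimal sketch of why, using a hypothetical `Resource` class that stands in for a `Session` (requests' `Session.__exit__` likewise just calls `close()`):

```python
class Resource:
    """Hypothetical stand-in for a Session that holds pooled connections."""

    def __init__(self):
        self.closed = False

    def fetch(self):
        return "response"

    def close(self):
        self.closed = True

    # The context-manager protocol.
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()   # runs on normal exit, on return, and on exceptions
        return False   # do not swallow exceptions


def one_shot_request(log):
    with Resource() as res:
        log.append(res)
        # The return value is computed first, then __exit__ closes the resource.
        return res.fetch()


seen = []
print(one_shot_request(seen), seen[0].closed)  # response True
```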

OK, at this point we can see that api.request() is actually implemented by session.request(), which raises the question:

What is a session?

sessions

requests.sessions.Session

class Session(SessionRedirectMixin):
    """A Requests session.

    Provides cookie persistence, connection-pooling, and configuration.

    Basic Usage::

      >>> import requests
      >>> s = requests.Session()
      >>> s.get('http://httpbin.org/get')
      

    Or as a context manager::

      >>> with requests.Session() as s:
      >>>     s.get('http://httpbin.org/get')
      
    """

From the documentation at http://docs.python-requests.org/en/master/user/advanced/:

The Session object allows you to persist certain parameters across requests. It also persists cookies across all requests made from the Session instance, and will use urllib3’s connection pooling. So if you’re making several requests to the same host, the underlying TCP connection will be reused, which can result in a significant performance increase (see HTTP persistent connection).

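Session: reuses the TCP connection across multiple requests to the same host, which improves performance

  • Persists parameters and cookies
  • Uses urllib3's connection pool
  • Can supply default data to request objects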

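Note: although every call is ultimately carried out by a session, parameters passed at the request level are not kept. According to the advanced-usage documentation, method-level parameters are merged with the session-level values for that one request only (the method-level values take precedence), and they are not persisted across requests, even when using a session.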

Session.request()

With a basic understanding of the Session class, look next at how its request() method is implemented:

requests.sessions.Session#request

    def request(self, method, url,
            params=None, data=None, headers=None, cookies=None, files=None,
            auth=None, timeout=None, allow_redirects=True, proxies=None,
            hooks=None, stream=None, verify=None, cert=None, json=None):
        """Constructs a :class:`Request `, prepares it and sends it.
        Returns :class:`Response ` object.

        :param method: method for the new :class:`Request` object.
        :param url: URL for the new :class:`Request` object.
        :param params: (optional) Dictionary or bytes to be sent in the query
            string for the :class:`Request`.
        :param data: (optional) Dictionary, bytes, or file-like object to send
            in the body of the :class:`Request`.
        :param json: (optional) json to send in the body of the
            :class:`Request`.
        :param headers: (optional) Dictionary of HTTP Headers to send with the
            :class:`Request`.
        :param cookies: (optional) Dict or CookieJar object to send with the
            :class:`Request`.
        :param files: (optional) Dictionary of ``'filename': file-like-objects``
            for multipart encoding upload.
        :param auth: (optional) Auth tuple or callable to enable
            Basic/Digest/Custom HTTP Auth.
        :param timeout: (optional) How long to wait for the server to send
            data before giving up, as a float, or a :ref:`(connect timeout,
            read timeout) ` tuple.
        :type timeout: float or tuple
        :param allow_redirects: (optional) Set to True by default.
        :type allow_redirects: bool
        :param proxies: (optional) Dictionary mapping protocol or protocol and
            hostname to the URL of the proxy.
        :param stream: (optional) whether to immediately download the response
            content. Defaults to ``False``.
        :param verify: (optional) Either a boolean, in which case it controls whether we verify
            the server's TLS certificate, or a string, in which case it must be a path
            to a CA bundle to use. Defaults to ``True``.
        :param cert: (optional) if String, path to ssl client cert file (.pem).
            If Tuple, ('cert', 'key') pair.
        :rtype: requests.Response
        """
        # Create the Request.
        req = Request(
            method=method.upper(),
            url=url,
            headers=headers,
            files=files,
            data=data or {},
            json=json,
            params=params or {},
            auth=auth,
            cookies=cookies,
            hooks=hooks,
        )
        prep = self.prepare_request(req)

        proxies = proxies or {}

        settings = self.merge_environment_settings(
            prep.url, proxies, stream, verify, cert
        )

        # Send the request.
        send_kwargs = {
            'timeout': timeout,
            'allow_redirects': allow_redirects,
        }
        send_kwargs.update(settings)
        resp = self.send(prep, **send_kwargs)

        return resp

It breaks down into four steps:

  • Create the Request object req
  • Create the PreparedRequest object prep with prepare_request
  • Send the request with send
  • Return the Response produced by send to the user

Which raises these questions:

  1. What is a Request?
  2. What does prepare_request do?
  3. How is the request sent?
  4. What is resp?

requests.models.Request

class Request(RequestHooksMixin):
    """A user-created :class:`Request ` object.

    Used to prepare a :class:`PreparedRequest `, which is sent to the server.

    :param method: HTTP method to use.
    :param url: URL to send.
    :param headers: dictionary of headers to send.
    :param files: dictionary of {filename: fileobject} files to multipart upload.
    :param data: the body to attach to the request. If a dictionary is provided, form-encoding will take place.
    :param json: json for the body to attach to the request (if files or data is not specified).
    :param params: dictionary of URL parameters to append to the URL.
    :param auth: Auth handler or (user, pass) tuple.
    :param cookies: dictionary or CookieJar of cookies to attach to this request.
    :param hooks: dictionary of callback hooks, for internal usage.

    Usage::

      >>> import requests
      >>> req = requests.Request('GET', 'http://httpbin.org/get')
      >>> req.prepare()
      
    """

A Request is built from the arguments the user passes in, and is used to prepare the PreparedRequest that is actually sent.

prepare_request()

    def prepare_request(self, request):
        """Constructs a :class:`PreparedRequest ` for
        transmission and returns it. The :class:`PreparedRequest` has settings
        merged from the :class:`Request ` instance and those of the
        :class:`Session`.

        :param request: :class:`Request` instance to prepare with this
            session's settings.
        :rtype: requests.PreparedRequest
        """
        ...
        p = PreparedRequest()
        p.prepare(
            method=request.method.upper(),
            url=request.url,
            files=request.files,
            data=request.data,
            json=request.json,
            headers=merge_setting(request.headers, self.headers, dict_class=CaseInsensitiveDict),
            params=merge_setting(request.params, self.params),
            auth=merge_setting(auth, self.auth),
            cookies=merged_cookies,
            hooks=merge_hooks(request.hooks, self.hooks),
        )
        return p

  • Create a PreparedRequest object p
  • Call p.prepare() with the request's values merged with the session's settings, then return p
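The `merge_setting` calls are also where the earlier point about request-level parameters not being persisted becomes visible: the merge builds a new dict, leaving the session's own settings untouched. Below is a simplified sketch modeled on `requests.sessions.merge_setting`; the real function also converts inputs with a key/value-list helper, so treat this as an approximation.

```python
from collections import OrderedDict
from collections.abc import Mapping


def merge_setting(request_setting, session_setting, dict_class=OrderedDict):
    """Simplified sketch modeled on requests.sessions.merge_setting."""
    if session_setting is None:
        return request_setting
    if request_setting is None:
        return session_setting
    # Non-dict settings (e.g. verify=False) simply take the request value.
    if not (isinstance(session_setting, Mapping) and isinstance(request_setting, Mapping)):
        return request_setting
    # Start from a *copy* of the session values, then let request values win.
    merged = dict_class(session_setting)
    merged.update(request_setting)
    # A request-level value of None removes the session-level key.
    for key in [k for k, v in merged.items() if v is None]:
        del merged[key]
    return merged


session_headers = {"User-Agent": "my-agent", "Accept": "*/*"}
per_request = {"Accept": "application/json", "User-Agent": None}
print(dict(merge_setting(per_request, session_headers)))  # {'Accept': 'application/json'}
print(session_headers)  # unchanged: the session never sees the per-request values
```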

Which raises these questions:

  1. What is PreparedRequest?
  2. What happens in p.prepare()?

PreparedRequest

requests.models.PreparedRequest

class PreparedRequest(RequestEncodingMixin, RequestHooksMixin):
    """The fully mutable :class:`PreparedRequest ` object,
    containing the exact bytes that will be sent to the server.

    Generated from either a :class:`Request ` object or manually.

    Usage::

      >>> import requests
      >>> req = requests.Request('GET', 'http://httpbin.org/get')
      >>> r = req.prepare()
      

      >>> s = requests.Session()
      >>> s.send(r)
      
    """

It holds the exact bytes to be transmitted and is the object the session actually sends to the server.

    def prepare(self,
            method=None, url=None, headers=None, files=None, data=None,
            params=None, auth=None, cookies=None, hooks=None, json=None):
        """Prepares the entire request with the given parameters."""

        self.prepare_method(method)
        self.prepare_url(url, params)
        self.prepare_headers(headers)
        self.prepare_cookies(cookies)
        self.prepare_body(data, files, json)
        self.prepare_auth(auth, url)

        # Note that prepare_auth must be last to enable authentication schemes
        # such as OAuth to work on a fully prepared request.

        # This MUST go after prepare_auth. Authenticators could add a hook
        self.prepare_hooks(hooks)

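Through this series of steps the whole request is prepared. Note the order: authentication is applied only after the body is ready, and hooks come last.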
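To see what "preparing" means in practice, here is a deliberately simplified, hypothetical `ToyPreparedRequest` that follows the same order for method, URL, headers and body (cookies, auth and hooks are left out). The real methods handle far more cases, such as IDNA hosts, files, JSON and streamed bodies, and use a case-insensitive header dict.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


class ToyPreparedRequest:
    """Hypothetical, heavily simplified stand-in for PreparedRequest."""

    def prepare(self, method, url, headers=None, params=None, data=None):
        # Same order as PreparedRequest.prepare(): method, url, headers, body.
        self.prepare_method(method)
        self.prepare_url(url, params)
        self.prepare_headers(headers)
        self.prepare_body(data)
        return self

    def prepare_method(self, method):
        self.method = method.upper()

    def prepare_url(self, url, params):
        scheme, netloc, path, query, fragment = urlsplit(url)
        extra = urlencode(params or {})
        if extra:
            # Append encoded params to any query string already in the URL.
            query = "%s&%s" % (query, extra) if query else extra
        self.url = urlunsplit((scheme, netloc, path or "/", query, fragment))

    def prepare_headers(self, headers):
        self.headers = dict(headers or {})  # plain dict here, unlike requests

    def prepare_body(self, data):
        self.body = None
        if data:
            # A dict body is form-encoded; the toy stores it as bytes.
            self.body = urlencode(data).encode("utf-8")
            self.headers.setdefault("Content-Type", "application/x-www-form-urlencoded")
        if self.body is not None:
            self.headers["Content-Length"] = str(len(self.body))


p = ToyPreparedRequest().prepare("post", "http://httpbin.org/post",
                                 params={"q": "a b"}, data={"name": "chenjiaxi"})
print(p.method, p.url, p.body, p.headers)
```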

send

With the object to send ready, Session.send() sends it to the server:

requests.sessions.Session#send

    def send(self, request, **kwargs):
        """Send a given PreparedRequest.

        :rtype: requests.Response
        """
        # Set defaults that the hooks can utilize to ensure they always have
        # the correct parameters to reproduce the previous request.
        ...  

        # Get the appropriate adapter to use
        adapter = self.get_adapter(url=request.url)

        # Start time (approximately) of the request
        start = preferred_clock()

        # Send the request
        r = adapter.send(request, **kwargs)

        # Total elapsed time of the request (approximately)
        elapsed = preferred_clock() - start
        r.elapsed = timedelta(seconds=elapsed)

        # Response manipulation hooks
        r = dispatch_hook('response', hooks, r, **kwargs)

        ...
        return r
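After `adapter.send` returns, `Session.send` stamps `r.elapsed` and runs the `response` hooks. Below is a sketch of those two steps: `dispatch_hook` is a simplified version modeled on `requests.hooks.dispatch_hook`, while `FakeResponse` and `send` are hypothetical. `time.perf_counter` stands in for requests' own `preferred_clock`.

```python
import time
from datetime import timedelta


def dispatch_hook(key, hooks, hook_data, **kwargs):
    """Sketch modeled on requests.hooks.dispatch_hook.

    A hook may return a replacement object; returning None keeps the current one.
    """
    hooks = (hooks or {}).get(key)
    if hooks:
        if callable(hooks):  # a single function is accepted as well as a list
            hooks = [hooks]
        for hook in hooks:
            result = hook(hook_data, **kwargs)
            if result is not None:
                hook_data = result
    return hook_data


class FakeResponse:
    """Hypothetical response object for the demo."""

    def __init__(self, status_code):
        self.status_code = status_code
        self.elapsed = None


def send(adapter_send, request, hooks=None):
    """Hypothetical outline of the timing + hooks part of Session.send."""
    start = time.perf_counter()
    r = adapter_send(request)
    r.elapsed = timedelta(seconds=time.perf_counter() - start)
    return dispatch_hook("response", hooks, r)


def log_status(r, **kwargs):
    print("got", r.status_code)  # returns None, so the response is unchanged


resp = send(lambda req: FakeResponse(200), "GET /", hooks={"response": log_status})
```

Users plug into the same mechanism with the public `hooks={'response': ...}` argument.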

In Session.send we can see that the actual sending is delegated to an adapter, which raises new questions:

  1. Why use an adapter?
  2. What is an adapter?
  3. How is an adapter implemented?

transport adapters

From the official documentation at http://docs.python-requests.org/en/latest/user/advanced/?highlight=adapter:

Transport Adapters provide a mechanism to define interaction methods for an HTTP service. In particular, they allow you to apply per-service configuration.
Requests ships with a single Transport Adapter, the HTTPAdapter. This adapter provides the default Requests interaction with HTTP and HTTPS using the powerful urllib3 library.
Requests enables users to create and use their own Transport Adapters that provide specific functionality.

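  • Provides a mechanism for defining how to communicate with an HTTP service
  • HTTPAdapter, built on urllib3, is used by default
  • Users can write their own adapters without any change to the send mechanism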

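This use of adapters is a very good abstraction and a textbook example of programming to an interface; it is also a good starting point for learning about the Adapter design pattern.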
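How does `get_adapter` pick an adapter? A default `Session` mounts an `HTTPAdapter` on `'https://'` and `'http://'`, and the most specific (longest) matching URL prefix wins. The sketch below mimics that selection logic with hypothetical `MiniSession` and `EchoAdapter` classes; it raises `ValueError` where requests raises its own `InvalidSchema`.

```python
from collections import OrderedDict


class BaseAdapter:
    """The interface every transport adapter implements (simplified)."""

    def send(self, request, **kwargs):
        raise NotImplementedError

    def close(self):
        pass


class EchoAdapter(BaseAdapter):
    """Hypothetical adapter that answers without touching the network."""

    def __init__(self, name):
        self.name = name

    def send(self, request, **kwargs):
        return "%s handled %s" % (self.name, request)


class MiniSession:
    """Hypothetical session showing prefix-based adapter selection."""

    def __init__(self):
        self.adapters = OrderedDict()

    def mount(self, prefix, adapter):
        self.adapters[prefix] = adapter
        # Move shorter prefixes to the end so longer ones are checked first.
        for key in [k for k in self.adapters if len(k) < len(prefix)]:
            self.adapters[key] = self.adapters.pop(key)

    def get_adapter(self, url):
        for prefix, adapter in self.adapters.items():
            if url.lower().startswith(prefix.lower()):
                return adapter
        raise ValueError("No connection adapters were found for %r" % url)

    def send(self, url):
        # send depends only on the BaseAdapter interface, not on a concrete class.
        return self.get_adapter(url).send(url)


s = MiniSession()
s.mount("https://", EchoAdapter("default-https"))
s.mount("http://", EchoAdapter("default-http"))
s.mount("https://api.github.com", EchoAdapter("github"))
print(s.send("https://api.github.com/user"))  # github handled https://api.github.com/user
```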

HTTPAdapter

Next, look at the code of the default HTTPAdapter to see how an adapter is defined:

requests.adapters.HTTPAdapter

class HTTPAdapter(BaseAdapter):
    """The built-in HTTP Adapter for urllib3.

    Provides a general-case interface for Requests sessions to contact HTTP and
    HTTPS urls by implementing the Transport Adapter interface. This class will
    usually be created by the :class:`Session ` class under the
    covers.

    :param pool_connections: The number of urllib3 connection pools to cache.
    :param pool_maxsize: The maximum number of connections to save in the pool.
    :param max_retries: The maximum number of retries each connection
        should attempt. Note, this applies only to failed DNS lookups, socket
        connections and connection timeouts, never to requests where data has
        made it to the server. By default, Requests does not retry failed
        connections. If you need granular control over the conditions under
        which we retry a request, import urllib3's ``Retry`` class and pass
        that instead.
    :param pool_block: Whether the connection pool should block for connections.

    Usage::

      >>> import requests
      >>> s = requests.Session()
      >>> a = requests.adapters.HTTPAdapter(max_retries=3)
      >>> s.mount('http://', a)
    """

HTTPAdapter.send()

    def send(self, request, stream=False, timeout=None, verify=True, cert=None, proxies=None):
        """Sends PreparedRequest object. Returns Response object.

        :param request: The :class:`PreparedRequest ` being sent.
        :param stream: (optional) Whether to stream the request content.
        :param timeout: (optional) How long to wait for the server to send
            data before giving up, as a float, or a :ref:`(connect timeout,
            read timeout) ` tuple.
        :type timeout: float or tuple or urllib3 Timeout object
        :param verify: (optional) Either a boolean, in which case it controls whether
            we verify the server's TLS certificate, or a string, in which case it
            must be a path to a CA bundle to use
        :param cert: (optional) Any user-provided SSL certificate to be trusted.
        :param proxies: (optional) The proxies dictionary to apply to the request.
        :rtype: requests.Response
        """

        # Connection establish
        conn = self.get_connection(request.url, proxies)

        ...
        chunked = not (request.body is None or 'Content-Length' in request.headers)

        # Timeout mechanism
        ...

        try:
            if not chunked:
                resp = conn.urlopen(
                    ...
                )

            # Send the request.
            else:
                if hasattr(conn, 'proxy_pool'):
                    conn = conn.proxy_pool

                low_conn = conn._get_conn(timeout=DEFAULT_POOL_TIMEOUT)

                try:
                    low_conn.putrequest(request.method,
                                        url,
                                        skip_accept_encoding=True)

                    for header, value in request.headers.items():
                        low_conn.putheader(header, value)

                    low_conn.endheaders()

                    for i in request.body:
                        low_conn.send(hex(len(i))[2:].encode('utf-8'))
                        low_conn.send(b'\r\n')
                        low_conn.send(i)
                        low_conn.send(b'\r\n')
                    low_conn.send(b'0\r\n\r\n')

                    # Receive the response from the server
                    try:
                        # For Python 2.7+ versions, use buffering of HTTP
                        # responses
                        r = low_conn.getresponse(buffering=True)
                    except TypeError:
                        # For compatibility with Python 2.6 versions and back
                        r = low_conn.getresponse()

                    resp = HTTPResponse.from_httplib(
                        r,
                        pool=conn,
                        connection=low_conn,
                        preload_content=False,
                        decode_content=False
                    )
                except:
                    # If we hit any problems here, clean up the connection.
                    # Then, reraise so that we can handle the actual exception.
                    low_conn.close()
                    raise
                    
        # Exception Handling
        except (ProtocolError, socket.error) as err:
            raise ConnectionError(err, request=request)

        ...

        return self.build_response(request, resp)

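Skipping connection setup, timeouts and exception handling, look only at the part that actually sends the request (the chunked branch; the non-chunked branch hands everything to conn.urlopen):

  1. Get a connection from the connection pool maintained by urllib3
  2. Send the request line (method and URL) with putrequest
  3. Send the request headers with putheader, finishing with endheaders
  4. Encode request.body in chunked transfer coding, prefixing each chunk with its length in hex
  5. Send the chunks, ending with the zero-length chunk 0\r\n\r\n
  6. Receive the response with getresponse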
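The chunked branch is taken when there is a body but no Content-Length header (for example, when the body is a generator). A minimal sketch of the same wire format as the `low_conn.send` loop above, written as a hypothetical pure function; unlike the loop shown, it skips empty chunks, because a zero-length chunk would otherwise end the body early.

```python
def encode_chunked(chunks):
    """Encode an iterable of bytes with HTTP/1.1 chunked transfer coding.

    Hypothetical helper: hex length, CRLF, data, CRLF for each chunk,
    then a zero-length chunk to mark the end of the body.
    """
    out = bytearray()
    for chunk in chunks:
        if not chunk:
            continue  # an empty chunk would signal end-of-body prematurely
        out += hex(len(chunk))[2:].encode("utf-8") + b"\r\n"
        out += chunk + b"\r\n"
    out += b"0\r\n\r\n"
    return bytes(out)


def body():
    # A generator body has no known length up front, hence chunked encoding.
    yield b"hello"
    yield b"world!"


print(encode_chunked(body()))  # b'5\r\nhello\r\n6\r\nworld!\r\n0\r\n\r\n'
```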

Finally, build_response turns the urllib3 response into a requests.Response object that is returned to the user, and with that a single requests.get() call is complete.

Blocking or Non-Blocking

While reading the official documentation I came across the section "Blocking Or Non-Blocking?", quoted here:

With the default Transport Adapter in place, Requests does not provide any kind of non-blocking IO. The Response.content property will block until the entire response has been downloaded. If you require more granularity, the streaming features of the library (see Streaming Requests) allow you to retrieve smaller quantities of the response at a time. However, these calls will still block.
If you are concerned about the use of blocking IO, there are lots of projects out there that combine Requests with one of Python’s asynchronicity frameworks. Some excellent examples are requests-threads, grequests, and requests-futures.

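requests is blocking by default. When you need to make many slow, synchronous HTTP requests without waiting for each one in turn, you can use grequests, which runs requests on gevent coroutines.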
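grequests uses gevent; another common approach, and the idea behind requests-futures, is a thread pool: each call still blocks, but the waits overlap. Here is a minimal sketch with the standard library's `concurrent.futures`, where `slow_fetch` is a hypothetical stand-in for a blocking call such as `requests.get(url)`, so the example runs without network access.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_fetch(url):
    """Hypothetical stand-in for a blocking call such as requests.get(url)."""
    time.sleep(0.2)  # simulate network latency; this worker thread blocks here
    return "body of " + url


def fetch_all(urls, fetch=slow_fetch, max_workers=4):
    # Each fetch blocks its own worker thread, so the waits overlap.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))  # results keep the input order


start = time.perf_counter()
results = fetch_all(["u1", "u2", "u3", "u4"])
print(results, "%.2fs" % (time.perf_counter() - start))
```

With four workers, four 0.2-second calls finish in roughly 0.2 seconds in total instead of 0.8, although no individual call became non-blocking.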

Summary

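From the analysis above, a single requests.get() call can be summarized as the following flow:

requests.get() -> requests.api.request() -> Session.request() -> Request -> Session.prepare_request() -> PreparedRequest.prepare() -> Session.send() -> Session.get_adapter() -> HTTPAdapter.send() -> HTTPAdapter.build_response() -> Response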

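This study also shows that real open source is more than code: it also includes documentation and a community. kennethreitz has also open-sourced a guide to writing Pythonic code, The Hitchhiker’s Guide to Python!, and the read_requests notes on GitHub, which walk through the requests source, are another useful reference.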
