AsyncIO for the Working Python Developer翻译

asyncio是从python3.4被引入的一个并发模块。它被设计成使用coroutines和futures来简化异步代码，并把代码变得和同步代码一样简明，因为他没有回调。

线程、事件循环、协程和futures

线程是一个广为人知的工具，但是asyncio使用了完全不同的结构:event loops,coroutines和futures

event loop 事件循环程序员会把一些函数注册到事件循环上，事件循环负责管理它们。当满足事件发生的时候，调用相应的协程函数。
Coroutines 协程是一个特殊的函数（使用async关键字定义的函数），其作用于python中的生成器类似，当await时，它会释放控制权并将它交还给事件循环。一个Coroutine需要使用事件循环进行调用，为此，我们需创建一个Task，它是Future类型。
Futures 代表将来执行或没有执行的任务的结果，其结果有可能是一个异常。

同步和异步

在文章 Concurrency is not parallelism, it’s better 中，罗伯·派克（Rob Pike）指出：

Breaking down tasks into concurrent subtasks only allows parallelism, it’s the scheduling of these subtasks that creates it.

asyncio就是这么做的，你可以构造你的代码，所以子任务被定义为协程，并允许你按照你的要求来安排它们，让他们同时进行。如果其他任务处于等待状态，协程则可能发生上下文切换，但如果没有其他任务处于待处理状态，将不会发生切换。

asyncio实现并发，就需要多个协程来完成任务，每当有任务阻塞的时候就await挂起，然后其他协程继续工作。创建多个协程的列表，然后将这些协程注册到事件循环中。

让我们来看一个基本的例子：

import asyncio

async def foo():
    print('Running in foo')
    await asyncio.sleep(0)
    print('Explicit context switch to foo again')

async def bar():
    print('Explicit context to bar')
    await asyncio.sleep(0)
    print('Implicit context switch back to bar')

ioloop = asyncio.get_event_loop()
tasks = [ioloop.create_task(foo()), ioloop.create_task(bar())]
wait_tasks = asyncio.wait(tasks)
ioloop.run_until_complete(wait_tasks)
ioloop.close()

$ python3 1-sync-async-execution-asyncio-await.py
Running in foo
Explicit context to bar
Explicit context switch to foo again
Implicit context switch back to bar

首先，我们声明了几个简单的协程，并使用asyncio模块里的sleep方法来模拟非阻塞工作。
协程不能直接运行，只能通过其他协程来调用，或被包装成task并在之后被调用，我们使用create_task来创建task
asyncio.ensure_future(coroutine) 和 loop.create_task(coroutine)都可以创建一个task，run_until_complete的参数是一个futrue对象。当传入一个协程，其内部会自动封装成task，task是Future的子类。isinstance(task, asyncio.Future)将会输出True。
一旦我们有了两个tasks，我们将他们合为一个(wait_tasks)，并通过wait方法来等待两个tasks完成
最后，我们使用run_until_complete方法来执行wait_task

通过使用await关键字，可以对耗时的操作进行挂起，就像生成器里的yield一样，函数让出控制权，在本例中，协程foo在执行到await asyncio.sleep(0) 时，协程会yield并且发生上下文切换，事件循环将切换执行下一个任务：bar 。相似地，协程bar执行到await sleep时，事件循环将控制权交给foo 并继续执行之前未执行完的部分，是不是和python中的生成器很像？

这次我们来模拟两个阻塞的tasks：gr1和gr2，假设他们的作用是给外部服务器发送两个请求。当他们正在执行的时候，第三个任务(gr3)也能够同时被执行，示例代码如下：

import time
import asyncio

start = time.time()


def tic():
    return 'at %1.1f seconds' % (time.time() - start)


async def gr1():
    # Busy waits for a second, but we don't want to stick around...
    print('gr1 started work: {}'.format(tic()))
    await asyncio.sleep(2)
    print('gr1 ended work: {}'.format(tic()))


async def gr2():
    # Busy waits for a second, but we don't want to stick around...
    print('gr2 started work: {}'.format(tic()))
    await asyncio.sleep(2)
    print('gr2 ended work: {}'.format(tic()))


async def gr3():
    print("Let's do some stuff while the coroutines are blocked, {}".format(tic()))
    await asyncio.sleep(1)
    print("Done!")


ioloop = asyncio.get_event_loop()
tasks = [
    ioloop.create_task(gr1()),
    ioloop.create_task(gr2()),
    ioloop.create_task(gr3())
]
ioloop.run_until_complete(asyncio.wait(tasks))
ioloop.close()

$ python3 1b-cooperatively-scheduled-asyncio-await.py
gr1 started work: at 0.0 seconds
gr2 started work: at 0.0 seconds
Lets do some stuff while the coroutines are blocked, at 0.0 seconds
Done!
gr1 ended work: at 2.0 seconds
gr2 Ended work: at 2.0 seconds

注意理解I/O循环是如何管理并调度任务并允许你的单线程代码实现并发的，当前两个任务被阻塞时，第三个任务能够的到控制权。

执行顺序

在同步的世界，我们习惯了线性思维。如果我们有一系列耗时不同的任务，它们将按照代码顺序依次执行。

然而，当使用并发时，任务何时完成与它们在程序中被调用的顺序无必然关系。

import random
from time import sleep
import asyncio


def task(pid):
    """Synchronous non-deterministic task.
    """
    sleep(random.randint(0, 2) * 0.001)
    print('Task %s done' % pid)


async def task_coro(pid):
    """Coroutine non-deterministic task
    """
    await asyncio.sleep(random.randint(0, 2) * 0.001)
    print('Task %s done' % pid)


def synchronous():
    for i in range(1, 10):
        task(i)


async def asynchronous():
    tasks = [asyncio.ensure_future(task_coro(i)) for i in range(1, 10)]
    await asyncio.wait(tasks)


print('Synchronous:')
synchronous()

ioloop = asyncio.get_event_loop()
print('Asynchronous:')
ioloop.run_until_complete(asynchronous())
ioloop.close()

$ python3 1c-determinism-sync-async-asyncio-await.py
Synchronous:
Task 1 done
Task 2 done
Task 3 done
Task 4 done
Task 5 done
Task 6 done
Task 7 done
Task 8 done
Task 9 done
Asynchronous:
Task 2 done
Task 5 done
Task 6 done
Task 8 done
Task 9 done
Task 1 done
Task 4 done
Task 3 done
Task 7 done

当然，输出结果会有所不同，因为每个任务都会随机sleep一段时间，但是要注意结果顺序和同步代码是完全不同的，即使我们使用range函数以相同的顺序构建任务列表。

注意我们是如何将我们简单的同步代码改为并发版本的。asyncio模块将任务变为非阻塞形式并不是什么魔法。在撰写asyncio并将它作为独立标准库的那个时期，其他大多数模块并不支持异步，你可以使用concurrent.futures模块在线程或进程中封装阻塞任务，并返回一个asyncio可以使用的模块：Future 。这个使用线程的例子可以在 Github 仓库 中找到。

这可能是当下使用asyncio的主要不足，但是通过一些库能解决这一问题。

通过HTTP服务从网络获取数据就是一个典型的阻塞任务，我使用 aiohttp 库来进行非阻塞HTTP请求，从Github的公开事件API中检索数据。

import time
import urllib.request
import asyncio
import aiohttp

URL = 'https://api.github.com/events'
MAX_CLIENTS = 3


def fetch_sync(pid):
    print('Fetch sync process {} started'.format(pid))
    start = time.time()
    response = urllib.request.urlopen(URL)
    datetime = response.getheader('Date')

    print('Process {}: {}, took: {:.2f} seconds'.format(
        pid, datetime, time.time() - start))

    return datetime


async def fetch_async(pid):
    print('Fetch async process {} started'.format(pid))
    start = time.time()
    response = await aiohttp.request('GET', URL)
    datetime = response.headers.get('Date')

    print('Process {}: {}, took: {:.2f} seconds'.format(
        pid, datetime, time.time() - start))

    response.close()
    return datetime


def synchronous():
    start = time.time()
    for i in range(1, MAX_CLIENTS + 1):
        fetch_sync(i)
    print("Process took: {:.2f} seconds".format(time.time() - start))


async def asynchronous():
    start = time.time()
    tasks = [asyncio.ensure_future(
        fetch_async(i)) for i in range(1, MAX_CLIENTS + 1)]
    await asyncio.wait(tasks)
    print("Process took: {:.2f} seconds".format(time.time() - start))


print('Synchronous:')
synchronous()

print('Asynchronous:')
ioloop = asyncio.get_event_loop()
ioloop.run_until_complete(asynchronous())
ioloop.close()

$ python3 1d-async-fetch-from-server-asyncio-await.py
Synchronous:
Fetch sync process 1 started
Process 1: Wed, 17 Feb 2016 13:10:11 GMT, took: 0.54 seconds
Fetch sync process 2 started
Process 2: Wed, 17 Feb 2016 13:10:11 GMT, took: 0.50 seconds
Fetch sync process 3 started
Process 3: Wed, 17 Feb 2016 13:10:12 GMT, took: 0.48 seconds
Process took: 1.54 seconds
Asynchronous:
Fetch async process 1 started
Fetch async process 2 started
Fetch async process 3 started
Process 3: Wed, 17 Feb 2016 13:10:12 GMT, took: 0.50 seconds
Process 2: Wed, 17 Feb 2016 13:10:12 GMT, took: 0.52 seconds
Process 1: Wed, 17 Feb 2016 13:10:12 GMT, took: 0.54 seconds
Process took: 0.54 seconds

首先，注意时间的差异，通过使用异步调用，我们同时向服务发出所有请求。正如之前讨论的，每个请求产生控制流到下一个，并在完成时返回。处理所有请求所花费的总时间和最慢的请求所花费得时间是相同的！仅花费0.54秒。非常酷，是吧？

其次，异步代码与它的同步版本十分相似，它本质上是一样的！主要区别在于执行GET请求和创建任务并等待它们完成的异步库的实现。

创建协程

到目前为止，我们一直使用一种方法来创建协程：创建一组任务并等待所有这些任务完成。

但是我们可以按照不同的方式安排协程运行或检索结果。设想一个场景，我们需要尽快处理HTTP GET请求的结果，这个过程实际上和我们前面的例子非常相似：

import time
import random
import asyncio
import aiohttp

URL = 'https://api.github.com/events'
MAX_CLIENTS = 3


async def fetch_async(pid):
    start = time.time()
    sleepy_time = random.randint(2, 5)
    print('Fetch async process {} started, sleeping for {} seconds'.format(
        pid, sleepy_time))

    await asyncio.sleep(sleepy_time)

    response = await aiohttp.request('GET', URL)
    datetime = response.headers.get('Date')

    response.close()
    return 'Process {}: {}, took: {:.2f} seconds'.format(
        pid, datetime, time.time() - start)


async def asynchronous():
    start = time.time()
    futures = [fetch_async(i) for i in range(1, MAX_CLIENTS + 1)]
    for i, future in enumerate(asyncio.as_completed(futures)):
        result = await future
        print('{} {}'.format(">>" * (i + 1), result))

    print("Process took: {:.2f} seconds".format(time.time() - start))


ioloop = asyncio.get_event_loop()
ioloop.run_until_complete(asynchronous())
ioloop.close()

$ python3 2a-async-fetch-from-server-as-completed-asyncio-await.py
Fetch async process 1 started, sleeping for 4 seconds
Fetch async process 3 started, sleeping for 5 seconds
Fetch async process 2 started, sleeping for 3 seconds
>> Process 2: Wed, 17 Feb 2016 13:55:19 GMT, took: 3.53 seconds
>>>> Process 1: Wed, 17 Feb 2016 13:55:20 GMT, took: 4.49 seconds
>>>>>> Process 3: Wed, 17 Feb 2016 13:55:21 GMT, took: 5.48 seconds
Process took: 5.48 seconds

这种情况下的代码只是略有不同，我们正在把协程收集到一个列表中，每个程序都准备好被调度和执行。 as_complete函数返回一个迭代器，当它们进来时会产生一个完整的Future。顺便一提，as_completed和wait都是来自concurrent.futures.的两个函数。

让我们来看看另一个例子，想象一下你正试图获得你的IP地址。你可以使用类似的服务来检索它，但不确定它们是否可以在运行时访问。你不想逐一检查每一个。你会发送并发请求到每个服务，并选择第一个回应，对不对？没错！

那么，原来我们的老朋友wait 有一个参数来做到这一点：return_when。到目前为止，我们忽视了wait的返回值，因为我们只是把任务并行化。但是现在我们要从协程中检索结果，所以我们可以使用两组Futures： done和pending。

from collections import namedtuple
import time
import asyncio
from concurrent.futures import FIRST_COMPLETED
import aiohttp

Service = namedtuple('Service', ('name', 'url', 'ip_attr'))

SERVICES = (
    Service('ipify', 'https://api.ipify.org?format=json', 'ip'),
    Service('ip-api', 'http://ip-api.com/json', 'query')
)


async def fetch_ip(service):
    start = time.time()
    print('Fetching IP from {}'.format(service.name))

    response = await aiohttp.request('GET', service.url)
    json_response = await response.json()
    ip = json_response[service.ip_attr]

    response.close()
    return '{} finished with result: {}, took: {:.2f} seconds'.format(
        service.name, ip, time.time() - start)


async def asynchronous():
    futures = [fetch_ip(service) for service in SERVICES]
    done, pending = await asyncio.wait(
        futures, return_when=FIRST_COMPLETED)

    print(done.pop().result())


ioloop = asyncio.get_event_loop()
ioloop.run_until_complete(asynchronous())
ioloop.close()

$ python3 2c-fetch-first-ip-address-response-await.py
Fetching IP from ip-api
Fetching IP from ipify
ip-api finished with result: 82.34.76.170, took: 0.09 seconds
Unclosed client session
client_session: 
Task was destroyed but it is pending!
task:  wait_for=>

等一下，发生了什么？代码执行结果没有问题，但那些警告信息是什么？

我们安排了两个任务，但是一旦第一个任务完成，关闭第二个任务。 asyncio认为这是一个错误，并打印出一个警告。我们应该让事件循环知道不打扰pending future。那么该怎么做？

Future的状态

(As in states that a Future can be in, not states that are in the future… you know what I mean)

future有以下几种状态:

Pending
Running
Done
Cancelled

创建future的时候，task为pending，事件循环调用执行的时候当然就是running，调用完毕自然就是done，如果需要停止事件循环，就需要先把task取消。可以使用asyncio.Task获取事件循环的task

当future完成时，result方法将返回future的结果，如果它挂起或取消，则引发InvalidStateError，如果取消它将引发CancelledError，最后如果协程引发异常，则会再次引发异常与调用异常相同的行为。

你也可以调用done，cancel或者running若Future处于这一状态，注意，done意味着返回结果或者抛出异常。你可以通过调用cancel方法来明确地取消Future，这听起来就像我们在前面的例子中需要修复警告：

from collections import namedtuple
import time
import asyncio
from concurrent.futures import FIRST_COMPLETED
import aiohttp

Service = namedtuple('Service', ('name', 'url', 'ip_attr'))

SERVICES = (
    Service('ipify', 'https://api.ipify.org?format=json', 'ip'),
    Service('ip-api', 'http://ip-api.com/json', 'query')
)


async def fetch_ip(service):
    start = time.time()
    print('Fetching IP from {}'.format(service.name))

    response = await aiohttp.request('GET', service.url)
    json_response = await response.json()
    ip = json_response[service.ip_attr]

    response.close()
    return '{} finished with result: {}, took: {:.2f} seconds'.format(
        service.name, ip, time.time() - start)


async def asynchronous():
    futures = [fetch_ip(service) for service in SERVICES]
    done, pending = await asyncio.wait(
        futures, return_when=FIRST_COMPLETED)

    print(done.pop().result())

    for future in pending:
        future.cancel()


ioloop = asyncio.get_event_loop()
ioloop.run_until_complete(asynchronous())
ioloop.close()

$ python3 2c-fetch-first-ip-address-response-no-warning-await.py
Fetching IP from ipify
Fetching IP from ip-api
ip-api finished with result: 82.34.76.170, took: 0.08 seconds

这次的输出很完美。

若你想添加额外的逻辑，Futures还允许附加回调，当他们到达完成状态。你甚至可以手动设置Future的结果或异常，通常用于单元测试目的。

原文链接：AsyncIO for the Working Python Developer

后记：

在下才疏学浅，第一次尝试翻译，内容偏差和蹩脚的措辞还请各位谅解。理解不当之处还请各位前辈替在下指出，不胜感激。