Asynchronous Programming in Python with asyncio

This article is mainly based on this reference.

If you run the code below in Jupyter Notebook/Lab, replace

asyncio.run(main())

with:

await main()

Jupyter already runs its own asynchronous event loop, so a second one cannot be started inside it. Trying to do so raises:

RuntimeError: asyncio.run() cannot be called from a running event loop

import asyncio
import time

async def count():
    print("one")
    await asyncio.sleep(1)
    print("two")

async def main():
    await asyncio.gather(count(), count(), count())

if __name__ == "__main__":
    s = time.perf_counter()
    # In a plain script, use asyncio.run(main()) instead of the next line.
    await main()
    elapsed = time.perf_counter() - s
    print(f"executed in {elapsed:0.2f} seconds")
one
one
one
two
two
two
executed in 1.00 seconds
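
The difference between the two call styles can be reproduced in a plain script. The following is only an illustrative sketch: inner and outer are hypothetical names, and outer stands in for a notebook cell that is already executing inside a running event loop.

```python
import asyncio

async def inner() -> int:
    return 42

async def outer() -> tuple:
    # We are already inside a running loop here, much like a Jupyter cell.
    coro = inner()
    try:
        asyncio.run(coro)          # starting a second loop is refused
        nested_ok = True
    except RuntimeError:
        coro.close()               # avoid a "never awaited" warning
        nested_ok = False
    # Awaiting directly reuses the loop that is already running.
    value = await inner()
    return nested_ok, value

if __name__ == "__main__":
    print(asyncio.run(outer()))    # (False, 42)
```

The nested asyncio.run() call fails, while the plain await succeeds, which is why the notebook version of each example uses await main().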

The rules of async

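  • async def defines either a coroutine or an asynchronous generator. async with and async for are also valid constructs.

  • The await keyword hands control back to the event loop. In the example below, when g() reaches await f(), g() is suspended until the result of f() is ready, and in the meantime the event loop is free to run something else.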

async def g():
    # Pause here and come back to g() when f() is ready
    r = await f()
    return r
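  • A function defined with async def is a coroutine function. Its body may use await, return, or yield, or none of them.

    • Using await and/or return gives you an ordinary coroutine function. To run it and get its result, you must await the call.
    • Using yield gives you an asynchronous generator (Python 3.6 and later), which you iterate over with async for. This usage is less common.
    • yield from is not allowed inside async def and raises a SyntaxError.
  • Using await outside a coroutine is an error.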
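
The asynchronous generator case from the list above is the least familiar one, so here is a minimal sketch of it. The names countdown and collect are made up for illustration.

```python
import asyncio

async def countdown(n: int):
    """Asynchronous generator: an async def body that contains yield."""
    while n > 0:
        await asyncio.sleep(0)     # hand control to the event loop for one turn
        yield n
        n -= 1

async def collect(n: int) -> list:
    items = []
    async for value in countdown(n):   # async for drives the generator
        items.append(value)
    return items

if __name__ == "__main__":
    print(asyncio.run(collect(3)))     # [3, 2, 1]
```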

import random
import asyncio

# ANSI colors
c = (
    "\033[0m",   # End of color
    "\033[36m",  # Cyan
    "\033[91m",  # Red
    "\033[35m",  # Magenta
)

async def makerandom(idx: int, threshold: int = 6) -> int:
    print(c[idx + 1] + f"Initiated makerandom({idx}).")
    i = random.randint(0, 10)
    while i <= threshold:
        print(c[idx + 1] + f"makerandom({idx}) == {i} too low; retrying.")
        await asyncio.sleep(idx+1)
        i = random.randint(0, 10)
    print(c[idx + 1] + f"---> Finished: makerandom({idx}) == {i}" + c[0])
    return i

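async def main():
    # The thresholds are 9, 8 and 7 for idx 0, 1 and 2.
    res = await asyncio.gather(*(makerandom(i, 10 - i - 1) for i in range(3)))
    return res

random.seed(444)
# Top-level await works in Jupyter; in a plain script use asyncio.run(main()).
r1, r2, r3 = await main()
print(f"r1: {r1}, r2: {r2}, r3: {r3}")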
Initiated makerandom(0).
makerandom(0) == 4 too low; retrying.
Initiated makerandom(1).
makerandom(1) == 4 too low; retrying.
Initiated makerandom(2).
makerandom(2) == 0 too low; retrying.
makerandom(0) == 4 too low; retrying.
makerandom(1) == 7 too low; retrying.
makerandom(0) == 4 too low; retrying.
makerandom(2) == 4 too low; retrying.
makerandom(0) == 8 too low; retrying.
---> Finished: makerandom(1) == 10
makerandom(0) == 7 too low; retrying.
makerandom(0) == 8 too low; retrying.
makerandom(2) == 4 too low; retrying.
makerandom(0) == 7 too low; retrying.
makerandom(0) == 1 too low; retrying.
makerandom(0) == 6 too low; retrying.
---> Finished: makerandom(2) == 9
makerandom(0) == 3 too low; retrying.
makerandom(0) == 9 too low; retrying.
makerandom(0) == 7 too low; retrying.
---> Finished: makerandom(0) == 10
r1: 10, r2: 10, r3: 9
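
In the output above the coroutines finish in the order 1, 2, 0, yet r1, r2, r3 still line up with makerandom(0), makerandom(1), makerandom(2). That is because asyncio.gather() returns results in the order of its arguments, not in completion order. A small sketch of this, with the hypothetical helper delayed:

```python
import asyncio

async def delayed(value: str, delay: float, finished: list) -> str:
    await asyncio.sleep(delay)
    finished.append(value)         # record the completion order
    return value

async def main() -> tuple:
    finished = []
    results = await asyncio.gather(
        delayed("slow", 0.05, finished),
        delayed("fast", 0.01, finished),
    )
    return results, finished

if __name__ == "__main__":
    results, finished = asyncio.run(main())
    print(results)    # ['slow', 'fast']  (argument order)
    print(finished)   # ['fast', 'slow']  (completion order)
```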

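Design patterns

Chaining coroutines

One property of coroutines is that they can be chained together. Because a coroutine object is awaitable, another coroutine can await it. This lets you break a program into small, reusable coroutines. In the example below, each task is completed by a sequence of coroutines, and each coroutine contributes to the result.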

import asyncio
import random
import time

async def part1(n: int) -> str:
    i = random.randint(0, 10)
    print(f"part1({n}) sleeping for {i} seconds.")
    await asyncio.sleep(i)
    result = f"result{n}-1"
    print(f"Returning part1({n}) == {result}.")
    return result

async def part2(n: int, arg: str) -> str:
    i = random.randint(0, 10)
    print(f"part2{n, arg} sleeping for {i} seconds.")
    await asyncio.sleep(i)
    result = f"result{n}-2 derived from {arg}"
    print(f"Returning part2{n, arg} == {result}.")
    return result

async def chain(n: int) -> None:
    start = time.perf_counter()
    p1 = await part1(n)
    p2 = await part2(n, p1)
    end = time.perf_counter() - start
    print(f"-->Chained result{n} => {p2} (took {end:0.2f} seconds).")

async def main(*args):
    await asyncio.gather(*(chain(n) for n in args))

if __name__ == "__main__":
    random.seed(444)
    args = [1, 2, 3]
    start = time.perf_counter()
    # Top-level await works in Jupyter; in a plain script use asyncio.run(main(*args)).
    await main(*args)
    end = time.perf_counter() - start
    print(f"Program finished in {end:0.2f} seconds.")
part1(1) sleeping for 4 seconds.
part1(2) sleeping for 4 seconds.
part1(3) sleeping for 0 seconds.
Returning part1(3) == result3-1.
part2(3, 'result3-1') sleeping for 4 seconds.
Returning part1(1) == result1-1.
part2(1, 'result1-1') sleeping for 7 seconds.
Returning part1(2) == result2-1.
part2(2, 'result2-1') sleeping for 4 seconds.
Returning part2(3, 'result3-1') == result3-2 derived from result3-1.
-->Chained result3 => result3-2 derived from result3-1 (took 4.00 seconds).
Returning part2(2, 'result2-1') == result2-2 derived from result2-1.
-->Chained result2 => result2-2 derived from result2-1 (took 8.00 seconds).
Returning part2(1, 'result1-1') == result1-2 derived from result1-1.
-->Chained result1 => result1-2 derived from result1-1 (took 11.00 seconds).
Program finished in 11.00 seconds.
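
The timings in this output follow a simple rule: within one chain the two parts run one after the other, so their sleeps add up, while the three chains run concurrently, so the program takes as long as the slowest chain. The arithmetic below uses the sleep durations printed above.

```python
# (part1 sleep, part2 sleep) in seconds, read from the output above.
sleeps = {1: (4, 7), 2: (4, 4), 3: (0, 4)}

# Inside a chain the awaits are sequential, so the durations add up.
chain_times = {n: sum(parts) for n, parts in sleeps.items()}

# The chains themselves overlap, so the total is the slowest chain.
program_time = max(chain_times.values())

print(chain_times)    # {1: 11, 2: 8, 3: 4}
print(program_time)   # 11
```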

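Using a queue

asyncio provides queue classes that closely resemble those in the queue module. Suppose there are many producers adding items to a queue, independently of one another. Each producer adds items at random times, without advance notice. A group of consumers greedily pulls items from the queue as they appear, without waiting for any other signal.
In this setup, consumers and producers are decoupled. A consumer does not know how many producers there are, nor how many items will end up in the queue.
Producing and consuming each item takes time. This is the kind of scenario a queue is meant for.

Note: because queue.Queue() is thread-safe, queues are mostly used in multithreaded programs. In async code you generally do not need to care about thread safety, unless you combine threads with asyncio.

The program is shown below.
For a discussion of q.join(), see StackOverflow.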

import asyncio
import itertools as it
import os
import random
import time

async def makeitem(size: int = 5) -> str:
    return os.urandom(size).hex()

async def randsleep(caller=None) -> None:
    # Sleep for a random whole number of seconds between 0 and 10.
    i = random.randint(0, 10)
    if caller:
        print(f"{caller} sleeping for {i} seconds.")
    await asyncio.sleep(i)

async def produce(name: int, q: asyncio.Queue) -> None:
    n = random.randint(0, 10)
    for _ in it.repeat(None, n):  # Synchronous loop for each single producer
        await randsleep(caller=f"Producer {name}")
        i = await makeitem()
        t = time.perf_counter()
        await q.put((i, t))
        print(f"Producer {name} added <{i}> to queue.")

async def consume(name: int, q: asyncio.Queue) -> None:
    while True:
        await randsleep(caller=f"Consumer {name}")
        i, t = await q.get()
        now = time.perf_counter()
        print(f"Consumer {name} got element <{i}>"
              f" in {now-t:0.5f} seconds.")
        q.task_done()

async def main(nprod: int, ncon: int):
    q = asyncio.Queue()
    producers = [asyncio.create_task(produce(n, q)) for n in range(nprod)]
    consumers = [asyncio.create_task(consume(n, q)) for n in range(ncon)]
    await asyncio.gather(*producers)
    await q.join()  # Implicitly awaits consumers, too
    for c in consumers:
        c.cancel()

if __name__ == "__main__":

    random.seed(444)
    ns = {'nprod':5,'ncon':10}
    start = time.perf_counter()
    # In a plain script, use asyncio.run(main(**ns)) instead of the next line.
    await main(**ns)
    elapsed = time.perf_counter() - start
    print(f"Program completed in {elapsed:0.5f} seconds.")
Producer 0 sleeping for 4 seconds.
Producer 2 sleeping for 7 seconds.
Producer 3 sleeping for 4 seconds.
Producer 4 sleeping for 10 seconds.
Consumer 0 sleeping for 7 seconds.
Consumer 1 sleeping for 8 seconds.
Consumer 2 sleeping for 4 seconds.
Consumer 3 sleeping for 7 seconds.
Consumer 4 sleeping for 1 seconds.
Consumer 5 sleeping for 6 seconds.
Consumer 6 sleeping for 9 seconds.
Consumer 7 sleeping for 3 seconds.
Consumer 8 sleeping for 9 seconds.
Consumer 9 sleeping for 7 seconds.
Producer 3 added  to queue.
Producer 3 sleeping for 10 seconds.
Producer 0 added  to queue.
Producer 0 sleeping for 0 seconds.
Consumer 4 got element  in 0.00209 seconds.
Consumer 4 sleeping for 1 seconds.
Consumer 7 got element  in 0.00034 seconds.
Consumer 7 sleeping for 0 seconds.
Producer 0 added <18a095afe9> to queue.
Producer 0 sleeping for 1 seconds.
Consumer 7 got element <18a095afe9> in 0.00081 seconds.
Consumer 7 sleeping for 9 seconds.
Producer 0 added <974eeee56f> to queue.
Producer 0 sleeping for 0 seconds.
Consumer 2 got element <974eeee56f> in 0.00042 seconds.
Consumer 2 sleeping for 5 seconds.
Producer 0 added  to queue.
Consumer 4 got element  in 0.00012 seconds.
Consumer 4 sleeping for 10 seconds.
Producer 2 added <562a1af9c6> to queue.
Producer 2 sleeping for 5 seconds.
Consumer 3 got element <562a1af9c6> in 0.00023 seconds.
Consumer 3 sleeping for 8 seconds.
Producer 4 added  to queue.
Producer 4 sleeping for 2 seconds.
Consumer 9 got element  in 0.00076 seconds.
Consumer 9 sleeping for 5 seconds.
Producer 4 added <34f8bf21e5> to queue.
Producer 4 sleeping for 5 seconds.
Producer 2 added  to queue.
Producer 2 sleeping for 0 seconds.
Consumer 0 got element <34f8bf21e5> in 0.00043 seconds.
Consumer 0 sleeping for 3 seconds.
Consumer 5 got element  in 0.00024 seconds.
Consumer 5 sleeping for 1 seconds.
Producer 2 added <6311405957> to queue.
Producer 2 sleeping for 5 seconds.
Consumer 1 got element <6311405957> in 0.00013 seconds.
Consumer 1 sleeping for 6 seconds.
Producer 3 added <1d41a6d6ac> to queue.
Producer 3 sleeping for 10 seconds.
Consumer 8 got element <1d41a6d6ac> in 0.00024 seconds.
Consumer 8 sleeping for 5 seconds.
Producer 4 added  to queue.
Producer 4 sleeping for 6 seconds.
Producer 2 added  to queue.
Consumer 6 got element  in 0.00025 seconds.
Consumer 6 sleeping for 4 seconds.
Consumer 2 got element  in 0.00078 seconds.
Consumer 2 sleeping for 10 seconds.
Producer 4 added  to queue.
Producer 4 sleeping for 8 seconds.
Consumer 5 got element  in 0.00048 seconds.
Consumer 5 sleeping for 10 seconds.
Producer 3 added  to queue.
Producer 3 sleeping for 10 seconds.
Consumer 7 got element  in 0.00040 seconds.
Consumer 7 sleeping for 7 seconds.
Producer 4 added <50ecadf350> to queue.
Producer 4 sleeping for 6 seconds.
Consumer 3 got element <50ecadf350> in 0.00046 seconds.
Consumer 3 sleeping for 8 seconds.
Producer 3 added  to queue.
Consumer 0 got element  in 0.00019 seconds.
Consumer 0 sleeping for 3 seconds.
Producer 4 added <61abd6c15b> to queue.
Producer 4 sleeping for 6 seconds.
Consumer 9 got element <61abd6c15b> in 0.00050 seconds.
Consumer 9 sleeping for 9 seconds.
Producer 4 added <5773f1cc4f> to queue.
Producer 4 sleeping for 3 seconds.
Consumer 4 got element <5773f1cc4f> in 0.00083 seconds.
Consumer 4 sleeping for 2 seconds.
Producer 4 added <49b57af2c2> to queue.
Consumer 9 got element <49b57af2c2> in 0.00026 seconds.
Consumer 9 sleeping for 10 seconds.
Program completed in 45.99805 seconds.
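
The shutdown logic in main() depends on how q.join() and q.task_done() work together: join() returns only once task_done() has been called for every item that was put into the queue. The consumers loop forever, so they have to be cancelled afterwards. Below is a shortened sketch of just that mechanism, with short sleeps and the hypothetical names worker and done.

```python
import asyncio

async def worker(q: asyncio.Queue, done: list) -> None:
    while True:                        # runs until cancelled
        item = await q.get()
        await asyncio.sleep(0.01)      # pretend to process the item
        done.append(item)
        q.task_done()                  # one call per item taken from the queue

async def main() -> list:
    q = asyncio.Queue()
    done = []
    for i in range(5):
        q.put_nowait(i)
    workers = [asyncio.create_task(worker(q, done)) for _ in range(2)]
    await q.join()                     # returns once all 5 items are marked done
    for w in workers:
        w.cancel()
    # Wait for the cancellations to finish without re-raising them.
    await asyncio.gather(*workers, return_exceptions=True)
    return sorted(done)

if __name__ == "__main__":
    print(asyncio.run(main()))         # [0, 1, 2, 3, 4]
```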

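What async really is

At its core, a coroutine is an enhanced generator. await behaves much like yield: both suspend execution at that point while keeping the local context, and resume from there later. More precisely, await is closest to yield from, and in its simplest use yield from x() is largely syntactic sugar for for i in x(): yield i.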
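
For plain iteration, the two spellings can be compared directly. The function names in this sketch are illustrative only.

```python
def inner():
    yield 1
    yield 2

def with_loop():
    for i in inner():      # explicit re-yield of every item
        yield i

def with_yield_from():
    yield from inner()     # delegate to the inner generator

print(list(with_loop()))         # [1, 2]
print(list(with_yield_from()))   # [1, 2]
```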

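Another feature of generators is that data can be passed into them through the .send() method, which allows generators (and therefore coroutines) to call one another without blocking. You rarely need to worry about this in practice. For the details, see PEP 342.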
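
A short sketch of .send() on an ordinary generator, using a made-up running_total example:

```python
def running_total():
    total = 0
    while True:
        value = yield total    # pause, hand out total, wait for send()
        total += value

gen = running_total()
first = next(gen)        # prime the generator: runs up to the first yield
second = gen.send(5)     # resumes with value = 5
third = gen.send(10)     # resumes with value = 10

print(first, second, third)   # 0 5 15
```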

event loop

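You can think of the event loop as a while True loop that keeps checking the state of the coroutines and wakes one up once whatever it was waiting for has completed. In practice, all of this is handled by:
asyncio.run(main())
This function was introduced in Python 3.7. It runs the event loop until main() has finished and then closes the loop automatically.
Before that, code like this was common: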

loop = asyncio.get_event_loop()
try:
    loop.run_until_complete(main())
finally:
    loop.close()

Today you no longer need this unless you want fine-grained control over the loop.
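
If you do need to manage a loop by hand, one option is to create it explicitly with asyncio.new_event_loop(). This is only a sketch: run_manually is a hypothetical helper, and asyncio.run() performs additional cleanup that this version leaves out.

```python
import asyncio

async def main() -> str:
    await asyncio.sleep(0)
    return "done"

def run_manually() -> str:
    loop = asyncio.new_event_loop()    # create a fresh loop explicitly
    try:
        return loop.run_until_complete(main())
    finally:
        loop.close()                   # always release the loop

if __name__ == "__main__":
    print(run_manually())              # done
    print(asyncio.run(main()))         # done
```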

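A few points about the event loop are worth noting:

  1. A coroutine does nothing until it is tied to the event loop. The usual approach is to wrap the work in a main() coroutine and pass that to asyncio.run().
  2. A single-threaded event loop on one CPU core is usually enough. To run event loops across multiple cores, see this reference.
  3. You can implement your own event loop.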
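
To make the "while True loop" picture concrete, here is a toy round-robin scheduler built on plain generators. It is a teaching sketch only, not how asyncio is implemented: task and run_all are made-up names, and a bare yield plays the role of await.

```python
from collections import deque

def task(name: str, steps: int, log: list):
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield                      # give control back to the loop

def run_all(tasks) -> None:
    ready = deque(tasks)
    while ready:                   # the "event loop"
        current = ready.popleft()
        try:
            next(current)          # run the task up to its next yield
        except StopIteration:
            continue               # task finished, drop it
        ready.append(current)      # not finished, schedule it again

log = []
run_all([task("a", 2, log), task("b", 3, log)])
print(log)   # ['a:0', 'b:0', 'a:1', 'b:1', 'b:2']
```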

Asynchronous web crawling

Use the asynchronous HTTP library aiohttp for this.
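
The overall shape of such a crawler can be sketched without touching the network. In the example below, fetch is a hypothetical stand-in that only sleeps. In a real crawler its body would issue the request through an aiohttp.ClientSession. The semaphore that caps the number of concurrent requests is an assumption added here for illustration and is not something the text above prescribes.

```python
import asyncio

async def fetch(url: str) -> str:
    # Stand-in for a real HTTP request (hypothetical): sleep instead of I/O.
    await asyncio.sleep(0.01)
    return f"<html>{url}</html>"

async def crawl(urls: list, limit: int = 2) -> dict:
    sem = asyncio.Semaphore(limit)     # at most `limit` requests in flight

    async def bounded(url: str) -> tuple:
        async with sem:
            return url, await fetch(url)

    pairs = await asyncio.gather(*(bounded(u) for u in urls))
    return dict(pairs)

if __name__ == "__main__":
    pages = asyncio.run(crawl(["https://example.com/a", "https://example.com/b"]))
    print(sorted(pages))
```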

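When to use async

Async and multiprocessing can coexist and be used together. See here and here.

Between async and multithreading, prefer async. Multithreaded code is hard to debug and hard to scale, because each thread is a system resource.

Consider async when your task consists of many IO-bound subtasks.
The main limitation is library support: you need functions that are awaitable, and some libraries do not provide them.
This list shows libraries that support async.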
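
When a library offers only blocking functions, one common workaround is to run them in a thread pool through loop.run_in_executor(), so that the event loop stays responsive. This is a sketch under that assumption: blocking_io is a hypothetical stand-in for such a library call.

```python
import asyncio
import time

def blocking_io(n: int) -> int:
    time.sleep(0.05)               # stands in for a blocking library call
    return n * 2

async def main() -> list:
    loop = asyncio.get_running_loop()
    # None selects the loop's default thread pool executor.
    futures = [loop.run_in_executor(None, blocking_io, i) for i in range(3)]
    return await asyncio.gather(*futures)

if __name__ == "__main__":
    print(asyncio.run(main()))     # [0, 2, 4]
```

This mixes threads with asyncio, so the thread-safety caveat from the queue section applies to anything the blocking function shares with the rest of the program.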
