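Scenario
In development, when we have to handle time-consuming tasks, we usually consider processing them asynchronously.
Implementation approaches
There are generally three ways to implement asynchronous execution:
- Multiple processes
- Multiple threads
- Asynchronous I/O / coroutines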
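Similarities and differences between the three approaches
1. Multiple processes can use multiple CPU cores, but the memory overhead is high.
2. Multiple threads can also run on multiple cores at the operating-system level, but code that relies on locks is tedious to write and hard to follow. CPython adds the GIL (Global Interpreter Lock), which allows only one thread to execute Python bytecode at a time, so CPU-bound threads effectively use a single core. Blocking calls such as sleep or I/O release the GIL, so I/O-bound threads still overlap. The memory overhead is lower than with processes.
3. asyncio, added in Python 3, is meant for asynchronous I/O. Before 3.5 it was used with the @asyncio.coroutine decorator together with yield from; from 3.5 onward it uses the async and await keywords. asyncio/coroutines amount to a scheduler implemented inside Python (the event loop) that runs in a single thread (by default the main thread): when one coroutine waits on I/O, the loop lets other coroutines (tasks) keep running. The memory overhead is smaller still.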
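To make the third point concrete, here is a minimal sketch showing that coroutines scheduled by asyncio all run in one thread. The names worker and run_all are made up for this illustration, and the delays are scaled down so it finishes quickly.

```python
import asyncio
import threading


async def worker(name, delay):
    # await hands control back to the event loop while this coroutine waits
    await asyncio.sleep(delay)
    return name, threading.current_thread().name


async def run_all():
    # gather() runs the coroutines concurrently and returns results in argument order
    return await asyncio.gather(worker("a", 0.02), worker("b", 0.01), worker("c", 0.03))


results = asyncio.run(run_all())
print(results)
```

Every tuple carries the same thread name, because the event loop never leaves the thread that called asyncio.run().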
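Hands-on implementation (based on Python 3.8.x)
First we write three time-consuming tasks: cooking fish, making soup, and cooking rice. With only one pot we have to wait for one dish to finish before starting the next, which is synchronous execution. With three pots we could cook all three at once, which is asynchronous execution.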
import datetime
import os
import threading
from time import sleep


class SyncTest:
    # Cook the fish
    def do_fish(self):
        print("do_fish start", os.getpid(), threading.current_thread().name)
        # Simulate a time-consuming task
        sleep(3)
        print('do_fish finished', datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"))

    # Make the soup
    def do_soap(self):
        print("do_soap start", os.getpid(), threading.current_thread().name)
        # Simulate I/O latency
        sleep(2)
        print('do_soap finished', datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"))

    # Cook the rice
    def do_rice(self):
        print("do_rice start", os.getpid(), threading.current_thread().name)
        # Simulate I/O latency
        sleep(5)
        print('do_rice finished', datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"))

    def start(self):
        print("start do", datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"))
        # Run the three tasks one after another
        self.do_fish()
        self.do_soap()
        self.do_rice()


if __name__ == "__main__":
    print("main start...")
    # Synchronous execution
    SyncTest().start()
    print("main end...")
First, the result of the synchronous run:
start do 2020-01-03 11:05:24
do_fish start 16440 MainThread
do_fish finished 2020-01-03 11:05:27
do_soap start 16440 MainThread
do_soap finished 2020-01-03 11:05:29
do_rice start 16440 MainThread
do_rice finished 2020-01-03 11:05:34
main end...
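The synchronous run takes 34 - 24 = 10 s, that is 3 + 2 + 5, and everything runs in the same process and the same thread. Now suppose we buy two more pots and cook with all three at once. We try this with asynchronous tasks, implemented as follows.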
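Subtracting log timestamps by hand works, but the elapsed time can also be measured in code. The following is a small sketch, not part of the original program: timed and cook_all_sync are hypothetical helpers, and the durations are scaled down from 3 s, 2 s, and 5 s.

```python
import time


def timed(func, *args):
    """Call func(*args) and return (result, elapsed_seconds)."""
    started = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - started


def cook_all_sync(durations):
    # Hypothetical stand-in for the three tasks: sleep for each duration in turn
    for seconds in durations:
        time.sleep(seconds)
    return len(durations)


# Scaled-down durations (0.03 s, 0.02 s, 0.05 s)
count, elapsed = timed(cook_all_sync, [0.03, 0.02, 0.05])
print(f"{count} tasks took {elapsed:.2f}s")
```

Because the tasks run one after another, the elapsed time is roughly the sum of the individual durations.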
- Cooking with multiple processes
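Notes on using Process on Windows
Windows has no fork (the mechanism Linux uses to create processes), so multiprocessing starts each child process in a fresh interpreter that imports the file that launched it, and that import executes the whole file again. If the code that creates a Process sits directly at module level, every child would try to create children of its own, and the program fails with an error. The part that creates child processes must therefore be protected with if __name__ == "__main__": so that it does not run again when the file is imported.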
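The behavior described above belongs to the "spawn" start method, which is the only one available on Windows. As a small sketch, the start methods on the current platform can be inspected like this (start_method_info is a hypothetical helper name):

```python
import multiprocessing


def start_method_info():
    """Return (default start method, all start methods available here)."""
    return multiprocessing.get_start_method(), multiprocessing.get_all_start_methods()


if __name__ == "__main__":
    default, available = start_method_info()
    # "spawn" starts a fresh interpreter that re-imports the main module,
    # which is why process creation has to sit under this guard.
    print("default:", default, "available:", available)
```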
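import datetime
import multiprocessing
import os
import threading
from time import sleep


def cook(name, seconds):
    # Module-level function, so that spawned child processes can import it
    print("process", name, "start", os.getpid(), threading.current_thread().name)
    sleep(seconds)  # Simulate a time-consuming task
    print("process", name, "finished", datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"))


if __name__ == "__main__":
    print("main start...")
    print("process start do", datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"))
    # Multiple processes: one process per dish
    for name, seconds in (("do_fish", 3), ("do_soap", 2), ("do_rice", 5)):
        multiprocessing.Process(target=cook, args=(name, seconds)).start()
    print("main end...")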
The result:
main start...
process start do 2020-01-03 11:11:12
main end...
process do_soap start 25692 MainThread
process do_rice start 216 MainThread
process do_fish start 1428 MainThread
process do_soap finished 2020-01-03 11:11:14
process do_fish finished 2020-01-03 11:11:15
process do_rice finished 2020-01-03 11:11:17
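The total time is 17 - 12 = 5 s. Each task runs in a different process (in that process's own MainThread), so the tasks run concurrently. The main process does not block on the child processes: "main end..." is printed before they finish, although the interpreter still waits for non-daemon child processes before it exits.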
- Cooking with multiple threads
import datetime
import os
import threading
from time import sleep


class ThreadTest:
    # Cook the fish
    def do_fish(self):
        print("thread do_fish start", os.getpid(), threading.current_thread().name)
        # Simulate a time-consuming task
        sleep(3)
        print('thread do_fish finished', datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"))

    # Make the soup
    def do_soap(self):
        print("thread do_soap start", os.getpid(), threading.current_thread().name)
        # Simulate I/O latency
        sleep(2)
        print('thread do_soap finished', datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"))

    # Cook the rice
    def do_rice(self):
        print("thread do_rice start", os.getpid(), threading.current_thread().name)
        # Simulate I/O latency
        sleep(5)
        print('thread do_rice finished', datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"))

    def start(self):
        print("thread start do", datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"))
        # One thread per dish
        for i in range(1, 4):
            if i == 1:
                threading.Thread(target=self.do_fish).start()
            elif i == 2:
                threading.Thread(target=self.do_soap).start()
            elif i == 3:
                threading.Thread(target=self.do_rice).start()


if __name__ == "__main__":
    print("main start...")
    ThreadTest().start()
    print("main end...")
The result:
main start...
thread start do 2020-01-03 11:15:19
thread do_fish start 20144 Thread-1
thread do_soap start 20144 Thread-2
thread do_rice start 20144 Thread-3
main end...
thread do_soap finished 2020-01-03 11:15:21
thread do_fish finished 2020-01-03 11:15:22
thread do_rice finished 2020-01-03 11:15:24
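The total time is 24 - 19 = 5 s. The tasks run in the same process but in different threads, concurrently. The main thread does not block on the child threads: it prints "main end..." right away, while the interpreter waits for the non-daemon threads to finish before exiting.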
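If the main thread should wait for all child threads before it continues, the usual tool is Thread.join(). Below is a minimal sketch with scaled-down durations; cook and finished are names made up for this illustration.

```python
import threading
import time

finished = []  # list.append is thread-safe in CPython


def cook(name, seconds):
    time.sleep(seconds)  # simulate blocking I/O
    finished.append(name)


# Scaled-down durations so the sketch runs quickly
threads = [
    threading.Thread(target=cook, args=("fish", 0.03)),
    threading.Thread(target=cook, args=("soup", 0.01)),
    threading.Thread(target=cook, args=("rice", 0.05)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()  # block until this thread has finished

print("all done:", sorted(finished))
```

With join() in place, anything printed after the second loop appears only after every task has completed.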
- Cooking with asynchronous I/O / coroutines
import asyncio
import datetime
import os
import threading


class AsyncIoTest:
    async def do_fish(self):
        print("asyncio do_fish start", os.getpid(), threading.current_thread().name)
        # Simulate I/O latency without blocking the event loop
        await asyncio.sleep(3)
        print('asyncio do_fish finished', datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"))

    async def do_soap(self):
        print("asyncio do_soap start", os.getpid(), threading.current_thread().name)
        # Simulate I/O latency
        await asyncio.sleep(2)
        print('asyncio do_soap finished', datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"))
        # raise Exception("aaaa")

    async def do_rice(self):
        print("asyncio do_rice start", os.getpid(), threading.current_thread().name)
        # Simulate I/O latency
        await asyncio.sleep(5)
        print('asyncio do_rice finished', datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"))

    async def cook_all(self):
        # Run the three coroutines concurrently on one event loop
        await asyncio.gather(self.do_fish(), self.do_soap(), self.do_rice())

    def start(self):
        print("asyncio start do", datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"))
        # asyncio.run() creates the event loop, runs cook_all() to completion, and closes the loop
        asyncio.run(self.cook_all())


if __name__ == "__main__":
    print("main start...")
    # Coroutines
    AsyncIoTest().start()
    print("main end...")
The result:
main start...
asyncio start do 2020-01-03 11:17:29
asyncio do_fish start 16304 MainThread
asyncio do_soap start 16304 MainThread
asyncio do_rice start 16304 MainThread
asyncio do_soap finished 2020-01-03 11:17:31
asyncio do_fish finished 2020-01-03 11:17:32
asyncio do_rice finished 2020-01-03 11:17:34
main end...
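The total time is 34 - 29 = 5 s. The tasks run in the same process and the same thread, concurrently, and the main thread waits until all coroutines have finished.
With the implementations above we have a basic picture of the asynchronous approaches. Now suppose we need to run a batch of tasks and, as soon as one of them fails, stop all the others whether or not they are still running. Let us first see how to do this with threads.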
- Batch tasks with threads
class ThreadTest:
    # Cook the fish
    def do_fish(self):
        print("thread do_fish start", os.getpid(), threading.current_thread().name)
        # Simulate a time-consuming task
        sleep(3)
        print('thread do_fish finished', datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"))

    # Make the soup
    def do_soap(self):
        print("thread do_soap start", os.getpid(), threading.current_thread().name)
        # Simulate I/O latency
        sleep(2)
        print('thread do_soap finished', datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"))
        # Simulate a failure after the work is done
        raise Exception("do_soap raise an exception.")

    # Cook the rice
    def do_rice(self):
        print("thread do_rice start", os.getpid(), threading.current_thread().name)
        # Simulate I/O latency
        sleep(5)
        print('thread do_rice finished', datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"))

    def start(self):
        print("thread start do", datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"))
        for i in range(1, 4):
            if i == 1:
                threading.Thread(target=self.do_fish).start()
            elif i == 2:
                threading.Thread(target=self.do_soap).start()
            elif i == 3:
                threading.Thread(target=self.do_rice).start()
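This is the same code as before, except that do_soap now raises an exception after its simulated 2 s of work, at which point do_fish and do_rice have not finished yet. We would like the "finished" lines of those two tasks not to be printed. Let us look at the log: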
main start...
thread start do 2020-01-03 13:09:30
thread do_fish start 22720 Thread-1
thread do_soap start 22720 Thread-2
thread do_rice start 22720 Thread-3
main end...
thread do_soap finished 2020-01-03 13:09:32
Exception in thread Thread-2:
Traceback (most recent call last):
File "C:\Users\e-song.li\AppData\Local\Programs\Python\Python38\lib\threading.py", line 932, in _bootstrap_inner
self.run()
File "C:\Users\e-song.li\AppData\Local\Programs\Python\Python38\lib\threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "C:/Users/e-song.li/PycharmProjects/untitled/bk_test.py", line 79, in do_soap
raise Exception("do_soap raise an exception.")
Exception: do_soap raise an exception.
thread do_fish finished 2020-01-03 13:09:33
thread do_rice finished 2020-01-03 13:09:35
Process finished with exit code 0
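Although we raised an exception, it had no effect on the other two tasks: do_fish and do_rice kept running until they finished. That is not what we want, so we change the code to make the threads daemon threads with daemon=True.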
class ThreadTest:
    # Cook the fish
    def do_fish(self):
        print("thread do_fish start", os.getpid(), threading.current_thread().name)
        # Simulate a time-consuming task
        sleep(3)
        print('thread do_fish finished', datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"))

    # Make the soup
    def do_soap(self):
        print("thread do_soap start", os.getpid(), threading.current_thread().name)
        # Simulate I/O latency
        sleep(2)
        print('thread do_soap finished', datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"))
        raise Exception("do_soap raise an exception.")

    # Cook the rice
    def do_rice(self):
        print("thread do_rice start", os.getpid(), threading.current_thread().name)
        # Simulate I/O latency
        sleep(5)
        print('thread do_rice finished', datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"))

    def start(self):
        print("thread start do", datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"))
        # daemon=True: these threads do not keep the program alive
        for i in range(1, 4):
            if i == 1:
                threading.Thread(target=self.do_fish, daemon=True).start()
            elif i == 2:
                threading.Thread(target=self.do_soap, daemon=True).start()
            elif i == 3:
                threading.Thread(target=self.do_rice, daemon=True).start()
The result:
main start...
thread start do 2020-01-03 13:16:46
thread do_fish start 20384 Thread-1
thread do_soap start 20384 Thread-2
thread do_rice start
main end...
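On closer inspection something looks wrong: the line "thread do_soap finished" was never printed, even though the exception is only raised after it. What happened? Daemon threads do not keep the program alive. Once the main thread (here the last non-daemon thread) ends, the interpreter exits and the daemon threads are stopped abruptly; the main thread does not wait for them to finish.
So here is the problem. We want asynchronous execution, but we also need the main thread to wait until all child threads have finished or one task has hit an exception, and only then carry on with the remaining logic. Without daemon threads we cannot stop the other tasks promptly; with daemon threads the main thread no longer waits for the tasks to finish.
Here we can use threading.Event, which lets threads communicate and coordinate their order of execution. The main thread waits on the event, and when a child thread finishes or hits an exception, it notifies the main thread so that it can move on.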
class ThreadTest:
    # Event shared between the main thread and the worker threads
    wake_event = threading.Event()

    # Cook the fish
    def do_fish(self):
        print("thread do_fish start", os.getpid(), threading.current_thread().name)
        # Simulate a time-consuming task
        sleep(3)
        print('thread do_fish finished', datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"))

    # Make the soup
    def do_soap(self):
        print("thread do_soap start", os.getpid(), threading.current_thread().name)
        # Simulate I/O latency
        sleep(2)
        print('thread do_soap finished', datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"))
        # Tell the main thread it may continue
        self.wake_event.set()
        raise Exception("do_soap raise an exception.")

    # Cook the rice
    def do_rice(self):
        print("thread do_rice start", os.getpid(), threading.current_thread().name)
        # Simulate I/O latency
        sleep(5)
        print('thread do_rice finished', datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"))

    def start(self):
        print("thread start do", datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"))
        for i in range(1, 4):
            if i == 1:
                threading.Thread(target=self.do_fish, daemon=True).start()
            elif i == 2:
                threading.Thread(target=self.do_soap, daemon=True).start()
            elif i == 3:
                threading.Thread(target=self.do_rice, daemon=True).start()


if __name__ == "__main__":
    print("main start...")
    # Multiple threads
    thread_test = ThreadTest()
    thread_test.start()
    # Block until a worker sets the event
    thread_test.wake_event.wait()
    # Reset the event's internal flag
    thread_test.wake_event.clear()
    print("main end...")
The result:
main start...
thread start do 2020-01-03 13:33:22
thread do_fish start 23708 Thread-1
thread do_soap start 23708 Thread-2
thread do_rice start 23708 Thread-3
Exception in thread Thread-2:
Traceback (most recent call last):
File "C:\Users\e-song.li\AppData\Local\Programs\Python\Python38\lib\threading.py", line 932, in _bootstrap_inner
thread do_soap finished 2020-01-03 13:33:24
main end...
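"main end..." is printed last, so the main thread did wait for the child thread, and after the exception there are no "finished" lines for do_fish or do_rice. In that sense this meets the requirement that a failure in one task ends the others. Note that the other two threads stop only because they are daemon threads and the main thread exits right after it is notified. As for the other two approaches, feel free to try them yourself.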
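For the coroutine approach, one possible sketch (not the original author's code) uses asyncio.wait with return_when=asyncio.FIRST_EXCEPTION and then cancels whatever is still pending. The names cook and cook_until_first_failure are hypothetical, and the durations are scaled down.

```python
import asyncio

finished = []


async def cook(name, seconds, fail=False):
    await asyncio.sleep(seconds)  # simulate I/O
    if fail:
        raise RuntimeError(f"{name} failed")
    finished.append(name)


async def cook_until_first_failure():
    tasks = [
        asyncio.create_task(cook("fish", 0.2)),
        asyncio.create_task(cook("soup", 0.05, fail=True)),
        asyncio.create_task(cook("rice", 0.3)),
    ]
    # Return as soon as any task raises (or all of them finish)
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_EXCEPTION)
    for task in pending:
        task.cancel()  # ask the still-running tasks to stop
    # Let the cancellations take effect; collect them instead of raising
    await asyncio.gather(*pending, return_exceptions=True)
    errors = [t.exception() for t in done if t.exception() is not None]
    return errors, len(pending)


errors, cancelled_count = asyncio.run(cook_until_first_failure())
print(errors, cancelled_count, finished)
```

Unlike the daemon-thread version, the remaining tasks are cancelled explicitly, so the program can keep running afterwards instead of relying on the main thread exiting.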
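Summary:
- Asynchronous tasks can be implemented not only with threads but also with multiple processes or coroutines.
- Both threads and processes can be marked as daemons. The main thread then does not wait for them to finish, and they end when the main thread ends.
- Coroutines are scheduled by the event loop inside the Python program rather than by the operating system; all tasks run in one process and one thread (by default the main thread).
- For running shell commands, p = subprocess.Popen(...) is recommended because the output can be read while the command runs. readline, readlines, and p.wait all block the caller (until a line arrives, until the output is closed, and until the process exits, respectively).
- If subprocess.Popen is used inside a thread, stopping the task means stopping both the thread and the process: make the thread a daemon and end the process with p.kill. Both are needed to end the running task.
- If shell=True is set in subprocess.Popen, two processes are started (the shell and the command itself). p.kill then only kills the shell, so the command may keep running; using shell=False avoids this.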
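The subprocess points above can be sketched as follows. This is an illustration under assumptions: run_and_stream and start_and_kill are hypothetical helper names, and sys.executable is used as the child command only so the sketch runs on any platform.

```python
import subprocess
import sys


def run_and_stream(args):
    """Run a command without a shell and collect its stdout line by line."""
    lines = []
    with subprocess.Popen(args, stdout=subprocess.PIPE, text=True) as proc:
        for line in proc.stdout:  # blocks until the next line is available
            lines.append(line.rstrip("\n"))
    # Leaving the with-block closes the pipe and waits for the process
    return proc.returncode, lines


def start_and_kill(args):
    """Start a long-running command, kill it, and return its exit code."""
    proc = subprocess.Popen(args)  # shell=False: proc is the command itself
    proc.kill()
    return proc.wait()  # reap the process after killing it


code, lines = run_and_stream([sys.executable, "-c", "print('step 1'); print('step 2')"])
killed_code = start_and_kill([sys.executable, "-c", "import time; time.sleep(30)"])
print(code, lines, killed_code)
```

Because the command is passed as a list with the default shell=False, p.kill acts on the command's own process rather than on an intermediate shell.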