Because of Python's GIL, only one thread executes Python bytecode at a time, so threads give concurrency rather than true parallelism; they still overlap usefully on I/O-bound work, and the usual way to run such tasks concurrently is multithreading or a thread pool.
Since Python 3.2, the concurrent.futures module has provided two executor classes: ThreadPoolExecutor (thread pool) and ProcessPoolExecutor (process pool). Both accept a task through submit() and return a Future object, a handle to a result that will become available later, through which you can query the task's execution state and retrieve its result.
import time
from concurrent.futures import ThreadPoolExecutor

def io_bound_work(page):
    time.sleep(page)
    print(f"work {page} finished.")
    return page

if __name__ == '__main__':
    executor = ThreadPoolExecutor(max_workers=32)  # ThreadPoolExecutor construction; max_workers is the maximum number of worker threads
    future_1 = executor.submit(io_bound_work, 1)   # submit a task to the pool and get back a Future handle; note that submit() does not block, it returns immediately
    future_2 = executor.submit(io_bound_work, 2)
    future_3 = executor.submit(io_bound_work, 3)
    # done() reports whether the task has finished: True means finished, False means still pending or running
    print(f"future_1 {future_1.done()}")
    print(f"future_2: {future_2.done()}")
    print(f"future_3: {future_3.done()}")
    # result() returns the task's return value; timeout is the maximum time to wait for the result,
    # and None means wait until the task finishes
    print(f"main 1 result: {future_1.result(timeout=None)}")
    print(f"main 2 result: {future_2.result(timeout=None)}")
    print(f"main 3 result: {future_3.result(timeout=None)}")
Output:
future_1 False
future_2: False
future_3: False
work 1 finished.
main 1 result: 1
work 2 finished.
main 2 result: 2
work 3 finished.
main 3 result: 3
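One side note on the example above: the executor is never shut down explicitly. ThreadPoolExecutor can also be used as a context manager, which calls shutdown(wait=True) when the block exits, so all submitted tasks are finished before execution continues. A minimal sketch, reusing the same io_bound_work function (max_workers=4 is an arbitrary choice for illustration):
import time
from concurrent.futures import ThreadPoolExecutor

def io_bound_work(page):
    time.sleep(page)
    print(f"work {page} finished.")
    return page

if __name__ == '__main__':
    # the with-statement calls executor.shutdown(wait=True) on exit,
    # so every submitted task has completed once the block is left
    with ThreadPoolExecutor(max_workers=4) as executor:
        futures = [executor.submit(io_bound_work, i) for i in range(1, 4)]
    # at this point all futures are done
    print([f.result() for f in futures])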
import time
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED, ALL_COMPLETED

def io_bound_work(page):
    time.sleep(page)
    print(f"work {page} finished.")
    return page

if __name__ == '__main__':
    executor = ThreadPoolExecutor(max_workers=32)
    work_list = [executor.submit(io_bound_work, i) for i in range(1, 4)]
    # fs: the futures to wait on; timeout: the maximum time to wait; return_when: the condition on which wait() returns.
    # ALL_COMPLETED returns after every future finishes, FIRST_COMPLETED returns as soon as the first future finishes,
    # FIRST_EXCEPTION returns as soon as any future raises an exception (a sketch of that case follows the output below).
    ret = wait(fs=work_list, timeout=None, return_when=FIRST_COMPLETED)  # returns once the first task completes
    print(f"work_list FIRST_COMPLETED, ret:{ret}")  # wait() returns a named tuple holding the done and not_done futures
    ret = wait(fs=work_list, timeout=None, return_when=ALL_COMPLETED)   # returns once all tasks complete
    print(f"work_list ALL_COMPLETED, ret:{ret}")
Output:
work 1 finished.
work_list FIRST_COMPLETED, ret:DoneAndNotDoneFutures(done={<Future at 0x... state=finished returned int>}, not_done={<Future at 0x... state=running>, <Future at 0x... state=running>})
work 2 finished.
work 3 finished.
work_list ALL_COMPLETED, ret:DoneAndNotDoneFutures(done={<Future at 0x... state=finished returned int>, <Future at 0x... state=finished returned int>, <Future at 0x... state=finished returned int>}, not_done=set())
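The FIRST_EXCEPTION condition mentioned in the comment is not demonstrated above. A minimal sketch of how it behaves (failing_work is a made-up function that raises on purpose):
import time
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_EXCEPTION

def failing_work(page):
    time.sleep(page)
    if page == 2:
        raise ValueError(f"work {page} failed")  # the second task deliberately raises
    return page

if __name__ == '__main__':
    executor = ThreadPoolExecutor(max_workers=32)
    work_list = [executor.submit(failing_work, i) for i in range(1, 4)]
    # returns as soon as any future finishes by raising an exception;
    # if no future raises, it behaves like ALL_COMPLETED
    ret = wait(fs=work_list, timeout=None, return_when=FIRST_EXCEPTION)
    for future in ret.done:
        # exception() returns the raised exception, or None if the task succeeded
        print(f"done future exception: {future.exception()}")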
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def io_bound_work(page):
    time.sleep(page)
    print(f"work {page} finished.")
    return page

if __name__ == '__main__':
    executor = ThreadPoolExecutor(max_workers=32)
    work_list = [executor.submit(io_bound_work, i) for i in range(1, 4)]
    for future in as_completed(work_list):  # yields each future as it finishes, until every task in work_list is done
        data = future.result()              # (a variant that maps each result back to its input follows the output below)
        print(f"main: {data}")
    print('finished')
Output:
work 1 finished.
main: 1
work 2 finished.
main: 2
work 3 finished.
main: 3
finished
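Because as_completed() yields futures in completion order, the loop above cannot tell directly which input produced which result (here the input and the return value happen to be the same number). A common pattern, sketched below, is to keep a dict from each future to the argument it was submitted with:
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def io_bound_work(page):
    time.sleep(page)
    return page

if __name__ == '__main__':
    executor = ThreadPoolExecutor(max_workers=32)
    # map each future back to the argument it was submitted with
    future_to_page = {executor.submit(io_bound_work, i): i for i in range(1, 4)}
    for future in as_completed(future_to_page):
        page = future_to_page[future]
        print(f"page {page} -> result {future.result()}")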
import time
from concurrent.futures import ThreadPoolExecutor

def io_bound_work(page):
    time.sleep(page)
    print(f"work {page} finished.")
    return page

if __name__ == '__main__':
    executor = ThreadPoolExecutor(max_workers=32)
    # fn: the function each task runs; iterables: the arguments to apply it to; timeout: the maximum time to wait for each result
    for result in executor.map(io_bound_work, [3, 2, 1], timeout=None):
        print(f"work result: {result}")
With map() there is no need to call submit() first. Executor.map() has the same meaning as Python's built-in map(): it applies one function to every element of the iterable, and the executor distributes the calls across worker threads. The code above runs io_bound_work on every element of the list.
map() differs from as_completed() in result ordering: map() yields results in the order of the input iterable, so the result of an earlier-submitted task is always printed first, even if a later task finishes sooner.
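The ProcessPoolExecutor mentioned at the start exposes the same submit()/map() interface but runs tasks in separate worker processes, each with its own interpreter and its own GIL, so it can help with CPU-bound work. A minimal sketch (cpu_bound_work is a made-up function for illustration):
from concurrent.futures import ProcessPoolExecutor

def cpu_bound_work(n):
    # a purely CPU-bound loop, where threads would not help because of the GIL
    return sum(i * i for i in range(n))

if __name__ == '__main__':
    # worker processes do not share the GIL, so the three calls can run in parallel
    with ProcessPoolExecutor(max_workers=4) as executor:
        for result in executor.map(cpu_bound_work, [10**6, 10**6, 10**6]):
            print(f"work result: {result}")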