Python Programming: Parallel Computation with multiprocessing.Pool

Preface

While doing machine-vision work, processing batches of images sometimes took quite a while, and I figured multiprocessing should speed it up. This post records a reusable code skeleton so it is ready when the need comes up again.

Method

The original code:

from tqdm import tqdm
import glob

def process_img(fname:str):
    try:
        # do something
        i=1
        return True,""
    except Exception as e:
        return False,str(e)

if __name__ == '__main__':
    pbar = tqdm(total=1400)
    for fname in glob.iglob("./imgs/*.jpg",recursive=True):
        pbar.update()
        succ,msg = process_img(fname)
        if not succ:
            print(msg)
    pbar.close()

To make the rewrite easier, we first refactor the loop with map:

from tqdm import tqdm
import glob

def process_img(fname:str):
    try:
        # do something
        i=1
        return True,""
    except Exception as e:
        return False,str(e)

if __name__ == '__main__':
    pbar = tqdm(total=1400)
    for succ,msg in map(process_img,glob.iglob("./imgs/*.jpg",recursive=True)):
        pbar.update()
        if not succ:
            print(msg)
    pbar.close()

Then we rewrite it as a parallel version based on multiprocessing. The steps are:

  1. Create a process pool
  2. Define the function to run in parallel
  3. Replace the built-in map with Pool.map or imap

The code:

from tqdm import tqdm
import glob

def process_img(fname:str):
    try:
        # do something
        i=1
        return True,""
    except Exception as e:
        return False,str(e)

if __name__ == '__main__':
    from multiprocessing import Pool
    with Pool(processes=None) as p: # process pool; processes=None defaults to the number of CPU cores
        pbar = tqdm(total=1400)
        for succ,msg in p.map(
            process_img,                                # function executed in parallel
            glob.iglob("./imgs/*.jpg",recursive=True),  # iterable of inputs
            20                                          # chunksize (see the note below)
            ):
            pbar.update()
            if not succ:
                print(msg)
        pbar.close()
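
Note that p.map collects all results into a list before the loop body runs, so the progress bar only jumps to 100% at the very end. Pool.imap, or imap_unordered if result order does not matter, yields each result as soon as a worker finishes it, so the bar advances smoothly. Below is a minimal sketch of that variant, assuming the same directory layout; the file list is materialized first so tqdm gets an exact total instead of the hard-coded 1400:

import glob
from multiprocessing import Pool
from tqdm import tqdm

def process_img(fname:str):
    try:
        # do something with the image file
        return True,""
    except Exception as e:
        return False,str(e)

if __name__ == '__main__':
    files = list(glob.iglob("./imgs/*.jpg",recursive=True))  # materialize to know the total
    with Pool(processes=None) as p:
        pbar = tqdm(total=len(files))
        # imap_unordered yields each result as soon as any worker finishes,
        # so the progress bar updates continuously; result order is not preserved
        for succ,msg in p.imap_unordered(process_img,files,chunksize=20):
            pbar.update()
            if not succ:
                print(msg)
        pbar.close()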

Regarding p.map's chunksize parameter, the official documentation says:

This method chops the iterable into a number of chunks which it submits to the process pool as separate tasks. The (approximate) size of these chunks can be specified by setting chunksize to a positive integer.

It can also be left unset: when chunksize is omitted, Pool.map computes an approximate chunk size automatically, while imap defaults to a chunksize of 1. Something to experiment with next time I use this.

Reference: 实用模块-9-多任务之蚂蚁搬家:multiprocessing模块_哔哩哔哩_bilibili
