Python multiprocessing: getting the return value of each worker process, and related problems

1. multiprocessing.Process() 

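This section covers workers started with multiprocessing.Process(). Note that these are separate processes, not threads.

To get return values, use multiprocessing.Manager() to build a container that holds each worker's result. The container lives in a server process started by the manager, and the workers read and write it through proxies, so it behaves like shared state even though it is not raw shared memory.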

Example:

import random
import multiprocessing

# Function executed by each worker process
def worker(procnum, return_dict):
    """worker function"""
    print(str(procnum) + " represent!")
    num = random.randint(5, 20)
    arr = list(range(num))
    # Store each process's result under its process number
    return_dict[procnum] = (procnum, arr)

if __name__ == "__main__":
    manager = multiprocessing.Manager()
    # Build the container for return values; it lives in the manager's
    # server process and is accessed through a proxy
    return_dict = manager.dict()
    jobs = []
    for i in range(5):
        # Pass the result container and the process number to each worker
        p = multiprocessing.Process(target=worker, args=(i, return_dict))
        jobs.append(p)
        p.start()

    for proc in jobs:
        proc.join()

    # After all processes have finished, iterate over the results and print them
    for procnum, arr in return_dict.values():
        print(procnum, arr)

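However, when the returned data is very large, an error is raised at the point where a worker finishes and stores its result. Test environment: VS Code on CentOS 7. I have not found a fix yet. The traceback shows the worker failing to connect to the manager's socket while writing into the dict: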

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/multiprocessing/managers.py", line 788, in _callmethod
    conn = self._tls.connection
AttributeError: 'ForkAwareLocal' object has no attribute 'connection'
 
During handling of the above exception, another exception occurred:
 
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/usr/local/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "excute_fanci2.py", line 357, in product
    # return_dict[thread_id] = ( domains,has_word)
  File "<string>", line 2, in __setitem__
  File "/usr/local/lib/python3.7/multiprocessing/managers.py", line 792, in _callmethod
    self._connect()
  File "/usr/local/lib/python3.7/multiprocessing/managers.py", line 779, in _connect
    conn = self._Client(self._token.address, authkey=self._authkey)
  File "/usr/local/lib/python3.7/multiprocessing/connection.py", line 492, in Client
    c = SocketClient(address)
  File "/usr/local/lib/python3.7/multiprocessing/connection.py", line 619, in SocketClient
    s.connect(address)
FileNotFoundError: [Errno 2] No such file or directory
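
One alternative that keeps the Process API is to send results back through a multiprocessing.Queue instead of a manager dict. The following is only a minimal sketch under that assumption; it has not been tested against the workload that produced the error above, and run_workers and result_queue are hypothetical names introduced for illustration.

```python
import multiprocessing
import random


def worker(procnum, result_queue):
    """Build a list and send it back to the parent through the queue."""
    arr = list(range(random.randint(5, 20)))
    result_queue.put((procnum, arr))


def run_workers(num_procs):
    """Start num_procs processes and return a dict {procnum: arr}."""
    result_queue = multiprocessing.Queue()
    jobs = []
    for i in range(num_procs):
        p = multiprocessing.Process(target=worker, args=(i, result_queue))
        jobs.append(p)
        p.start()

    # Drain the queue BEFORE join(): a process that has put items on a
    # queue may not exit until those items have been consumed.
    results = {}
    for _ in jobs:
        procnum, arr = result_queue.get()
        results[procnum] = arr

    for p in jobs:
        p.join()
    return results


if __name__ == "__main__":
    for procnum, arr in sorted(run_workers(5).items()):
        print(procnum, arr)
```

The order of get() and join() matters here: reading one result per started process first, and joining afterwards, avoids the deadlock that can occur when a child is still waiting to flush large items into the queue.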

2. multiprocessing.Pool()

A different approach, multiprocessing.Pool(), avoids the problem with large return values. It has worked without errors in my tests so far:

import multiprocessing


def worker(args):
    # pool.map passes exactly one argument to the worker, so the parameters
    # are packed into a single tuple and unpacked here
    threadname, res_path, thread_id = args

    result = [i for i in range(10000000)]

    return (result, thread_id)


if __name__ == "__main__":
    process_num = 20
    # Create a process pool (not a thread pool)
    pool = multiprocessing.Pool(processes=process_num)
    args_list = []
    # Build the argument tuple for each task; the worker unpacks the tuple
    for i in range(process_num):
        threadname = "thread" + str(i)
        res_path = str(i) + '_'
        args = (threadname, res_path, i)
        args_list.append(args)

    # Hand the arguments to the pool; map returns a list holding each task's
    # result, in the same order as args_list
    results = pool.map(worker, args_list)
    pool.close()
    pool.join()

    for result, thread_id in results:
        # Print only the length: each result holds 10,000,000 items
        print(thread_id, len(result))
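
On passing multiple parameters: Pool.starmap() unpacks each argument tuple into separate positional arguments, so the worker does not need to unpack a tuple itself. The sketch below reuses the parameter names from the example above; run_pool is a hypothetical helper, and the result list is kept small so that the sketch runs quickly.

```python
import multiprocessing


def worker(threadname, res_path, thread_id):
    """Receive the three parameters directly instead of one packed tuple."""
    result = list(range(1000))  # kept small for this sketch
    return (result, thread_id)


def run_pool(process_num):
    """Run one task per process and return the list of (result, thread_id)."""
    args_list = [("thread" + str(i), str(i) + "_", i) for i in range(process_num)]
    # starmap blocks until every task is done, so leaving the with-block
    # (which shuts the pool down) is safe at that point.
    with multiprocessing.Pool(processes=process_num) as pool:
        return pool.starmap(worker, args_list)


if __name__ == "__main__":
    for result, thread_id in run_pool(4):
        print(thread_id, len(result))
```

Like map(), starmap() returns the results in the same order as the argument list, so each result can also be matched to its task by position.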

Original post: "python多线程编程,获取各个线程返回值 及 相关问题" (AdvSoul's blog, CSDN)
