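Process pools make it very convenient to manage processes in Python multiprocessing code, but child processes sometimes compete for an exclusive resource, such as the console or write access to a log file. In that case we usually share a Lock to guard the resource. A lock, however, cannot be pickled directly, so it cannot simply be passed as an argument to the function given to Pool's map method. There are two ways around this. The first is to use a multiprocessing Manager() and pass the Manager.Lock() object via a partial function. The second is to pass a multiprocessing.Lock() object when the pool is created.
Let's look at a concrete example.
Say I have a list of data that I want to send to a given API using multiple processes, recording the time each request takes in a log file.
The most obvious solution is to pass the lock in as an argument: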
from multiprocessing import Pool, Lock
import urllib2
from time import clock
from functools import partial

def send_request(lock, data):
    api_url = 'http://api.xxxx.com/?data=%s'
    start_time = clock()
    print urllib2.urlopen(api_url % data).read()
    end_time = clock()
    # Serialize writes to the shared log file
    lock.acquire()
    with open('request.log', 'a+') as logs:
        logs.write('request %s cost: %s\n' % (data, end_time - start_time))
    lock.release()

if __name__ == '__main__':
    data_list = ['data1', 'data2', 'data3']
    pool = Pool(8)
    lock = Lock()
    # Bind the lock as the first positional argument of send_request
    partial_send_request = partial(send_request, lock)
    pool.map(partial_send_request, data_list)
    pool.close()
    pool.join()
In this case the lock cannot be pickled directly, so it cannot be passed as an argument to the function used with Pool's map method.
A runtime error is raised:
RuntimeError: Lock objects should only be shared between processes through inheritance
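The error can be reproduced without a pool at all, because Pool.map pickles its arguments before sending them to the workers. The following is a minimal sketch written for Python 3 (the article's own code is Python 2); try_pickle_lock is a hypothetical helper name used only for illustration.

```python
import pickle
from multiprocessing import Lock


def try_pickle_lock():
    """Return the error message raised when a Lock is pickled directly.

    Returns None if pickling unexpectedly succeeds.
    """
    lock = Lock()
    try:
        pickle.dumps(lock)
    except RuntimeError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print(try_pickle_lock())
```

The lock only refuses to be pickled outside of process creation, which is why handing it to a child at start-up (the "inheritance" in the message) still works.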
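Following the ideas from the introduction, the code can be changed as follows.
The first approach is to use a Manager.
The send_request function stays the same; only the main block changes: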
if __name__ == '__main__':
    from multiprocessing import Manager
    data_list = ['data1', 'data2', 'data3']
    pool = Pool(8)
    manager = Manager()
    # manager.Lock() returns a proxy object that can be pickled
    lock = manager.Lock()
    partial_send_request = partial(send_request, lock)
    pool.map(partial_send_request, data_list)
    pool.close()
    pool.join()
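That is the first approach, but a Manager is rather heavyweight when all we need is a lock for writing a log. It requires a dedicated process to run the Manager server, and every acquire and release is sent to that server over IPC.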
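For readers on Python 3, the Manager approach might look like the sketch below. It is an illustration under stated assumptions, not the author's code: the network request is left out, and log_item, run_with_manager_lock, and the "processed ..." log format are hypothetical names chosen for the example.

```python
import os
import tempfile
from functools import partial
from multiprocessing import Manager, Pool


def log_item(lock, log_path, item):
    """Append one line to the shared log file while holding the lock."""
    with lock:
        with open(log_path, "a") as logs:
            logs.write("processed %s\n" % item)
    return item


def run_with_manager_lock(items, log_path, workers=2):
    """Share a Manager lock proxy with pool workers via functools.partial."""
    with Manager() as manager:
        # The proxy is picklable; the real lock lives in the manager process.
        lock = manager.Lock()
        task = partial(log_item, lock, log_path)
        with Pool(workers) as pool:
            return pool.map(task, items)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "request.log")
    print(run_with_manager_lock(["data1", "data2", "data3"], path))
```

Every "with lock" here is a round trip to the manager process, which is the overhead described above.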
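The second approach is to pass the Lock object through the initializer argument when the Pool is created. This makes the Lock a global object in every child process.
The code can be modified as follows: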
def send_request(data):
    api_url = 'http://api.xxxx.com/?data=%s'
    start_time = clock()
    print urllib2.urlopen(api_url % data).read()
    end_time = clock()
    # lock is the global set by init() in each worker process
    lock.acquire()
    with open('request.log', 'a+') as logs:
        logs.write('request %s cost: %s\n' % (data, end_time - start_time))
    lock.release()

def init(l):
    # Runs once in every worker when the pool starts
    global lock
    lock = l

if __name__ == '__main__':
    data_list = ['data1', 'data2', 'data3']
    lock = Lock()
    pool = Pool(8, initializer=init, initargs=(lock,))
    pool.map(send_request, data_list)
    pool.close()
    pool.join()
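A Python 3 rendering of the initializer approach is sketched below, again with the network call omitted. The names init_worker, log_item, run_with_initializer, and the module-level _lock and _log_path are hypothetical and exist only for this illustration.

```python
import os
import tempfile
from multiprocessing import Lock, Pool

_lock = None      # set in each worker by init_worker
_log_path = None  # set in each worker by init_worker


def init_worker(lock, log_path):
    """Runs once in every worker process and stores the lock globally."""
    global _lock, _log_path
    _lock = lock
    _log_path = log_path


def log_item(item):
    """Append one line to the log while holding the worker's global lock."""
    with _lock:
        with open(_log_path, "a") as logs:
            logs.write("processed %s\n" % item)
    return item


def run_with_initializer(items, log_path, workers=2):
    """Hand a plain multiprocessing.Lock to the workers at pool creation."""
    lock = Lock()
    with Pool(workers, initializer=init_worker,
              initargs=(lock, log_path)) as pool:
        return pool.map(log_item, items)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "request.log")
    print(run_with_initializer(["data1", "data2", "data3"], path))
```

The lock travels to the workers as part of process start-up rather than through the task queue, so no Manager process is involved.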
Reposted from: https://zhuanlan.zhihu.com/p/22223656
Someone in a chat group posted the following code and asked why the list printed at the end is empty.
from multiprocessing import Process, Manager
import os

manager = Manager()
vip_list = []
#vip_list = manager.list()

def testFunc(cc):
    vip_list.append(cc)
    print 'process id:', os.getpid()

if __name__ == '__main__':
    threads = []
    for ll in range(10):
        t = Process(target=testFunc, args=(ll,))
        t.daemon = True
        threads.append(t)
    for i in range(len(threads)):
        threads[i].start()
    for j in range(len(threads)):
        threads[j].join()
    print "------------------------"
    print 'process id:', os.getpid()
    print vip_list
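If you understand Python's threading model and the GIL, and how multithreading and multiprocessing work, the question is not hard to answer. If you don't, no matter: run the code above and you will see what is going on.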
python aa.py
process id: 632
process id: 635
process id: 637
process id: 633
process id: 636
process id: 634
process id: 639
process id: 638
process id: 641
process id: 640
------------------------
process id: 619
[]
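The list is empty because every child process appends to its own copy of vip_list, so the parent's list is never touched. Uncomment the vip_list = manager.list() line and you will see the following result: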
process id: 32074
process id: 32073
process id: 32072
process id: 32078
process id: 32076
process id: 32071
process id: 32077
process id: 32079
process id: 32075
process id: 32080
------------------------
process id: 32066
[3, 2, 1, 7, 5, 0, 6, 8, 4, 9]
(1) Shared memory:
Data can be stored in a shared memory map using Value or Array, as in the following code from http://docs.python.org/2/library/multiprocessing.html#sharing-state-between-processes
from multiprocessing import Process, Value, Array

def f(n, a):
    n.value = 3.1415927
    for i in range(len(a)):
        a[i] = -a[i]

if __name__ == '__main__':
    num = Value('d', 0.0)
    arr = Array('i', range(10))
    p = Process(target=f, args=(num, arr))
    p.start()
    p.join()
    print num.value
    print arr[:]
Output:
3.1415927
[0, -1, -2, -3, -4, -5, -6, -7, -8, -9]
(2) Server process:
A manager object returned by Manager() controls a server process which holds Python objects and allows other processes to manipulate them using proxies.
A manager returned by Manager() will support types list, dict, Namespace, Lock, RLock, Semaphore, BoundedSemaphore, Condition, Event, Queue, Value and Array.
For code, see the vip_list example at the beginning, which uses manager.list().
http://docs.python.org/2/library/multiprocessing.html#managers
Now a simple piece of code, a simple counter:
from multiprocessing import Process, Manager
import os

manager = Manager()
sum = manager.Value('tmp', 0)

def testFunc(cc):
    sum.value += cc

if __name__ == '__main__':
    threads = []
    for ll in range(100):
        t = Process(target=testFunc, args=(1,))
        t.daemon = True
        threads.append(t)
    for i in range(len(threads)):
        threads[i].start()
    for j in range(len(threads)):
        threads[j].join()
    print "------------------------"
    print 'process id:', os.getpid()
    print sum.value
Output:
------------------------
process id: 17378
97
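You may well ask: WTF? This problem already existed in the multithreading era, and the same tragedy simply repeats itself with multiple processes. sum.value += cc is a read followed by a write, not an atomic operation, so concurrent updates can be lost. The fix: Lock!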
from multiprocessing import Process, Manager, Lock
import os

lock = Lock()
manager = Manager()
sum = manager.Value('tmp', 0)

def testFunc(cc, lock):
    with lock:
        sum.value += cc

if __name__ == '__main__':
    threads = []
    for ll in range(100):
        t = Process(target=testFunc, args=(1, lock))
        t.daemon = True
        threads.append(t)
    for i in range(len(threads)):
        threads[i].start()
    for j in range(len(threads)):
        threads[j].join()
    print "------------------------"
    print 'process id:', os.getpid()
    print sum.value
How does this code perform? Run it and see, or try increasing the loop count...
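One alternative worth comparing is a shared-memory counter: multiprocessing.Value keeps the number in shared memory and carries its own lock, so updates do not have to travel to a manager process. The sketch below is written for Python 3 and is only an illustration; add_many and count_with_shared_value are hypothetical names, and no timing claim is made here.

```python
from multiprocessing import Process, Value


def add_many(counter, times):
    """Increment the shared counter `times` times, locking each update."""
    for _ in range(times):
        # get_lock() returns the lock created together with the Value
        with counter.get_lock():
            counter.value += 1


def count_with_shared_value(n_procs=4, times=1000):
    """Run n_procs processes that each add `times` to one shared counter."""
    counter = Value("i", 0)  # a C int in shared memory, lock included
    procs = [Process(target=add_many, args=(counter, times))
             for _ in range(n_procs)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return counter.value


if __name__ == "__main__":
    print(count_with_shared_value())
```

Without the "with counter.get_lock()" line, the increment would be the same unprotected read-then-write as in the earlier counter and could lose updates.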
Note that usually sharing data between processes may not be the best choice, because of all the synchronization issues; an approach involving actors exchanging messages is usually seen as a better choice. See also the Python documentation: "As mentioned above, when doing concurrent programming it is usually best to avoid using shared state as far as possible. This is particularly true when using multiple processes. However, if you really do need to use some shared data then multiprocessing provides a couple of ways of doing so."
http://stackoverflow.com/questions/14124588/python-multiprocessing-shared-memory
http://eli.thegreenplace.net/2012/01/04/shared-counter-with-pythons-multiprocessing/
http://docs.python.org/2/library/multiprocessing.html#multiprocessing.sharedctypes.synchronized
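The message-passing style recommended in the note above can be sketched with multiprocessing.Queue: each worker computes a partial result locally and sends it back, so nothing is shared and no lock is needed. This is a Python 3 illustration under that assumption; worker and sum_by_messages are hypothetical names.

```python
from multiprocessing import Process, Queue


def worker(numbers, results):
    """Compute a partial sum locally and send it back as a message."""
    results.put(sum(numbers))


def sum_by_messages(chunks):
    """Start one process per chunk and add up the partial sums they send."""
    results = Queue()
    procs = [Process(target=worker, args=(chunk, results))
             for chunk in chunks]
    for p in procs:
        p.start()
    # Read one message per worker before joining, so no worker is left
    # blocked on a queue that nobody is draining.
    total = sum(results.get() for _ in procs)
    for p in procs:
        p.join()
    return total


if __name__ == "__main__":
    print(sum_by_messages([[1] * 25 for _ in range(4)]))
```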
[Original article] http://my.oschina.net/leejun2005/blog/203148