Differences between Pool and Process in Python multiprocessing, and implementing a Pool with Process -- part 1

  • Main differences between Pool and Process in Python multiprocessing

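(1) With Process you manage the processes yourself: each Process you start is one new process.
(2) Pool is a process pool: it starts a fixed number of worker processes, tasks are put into the pool, and the pool hands them out to the workers.
Multiprocessing in Python is provided mainly by the multiprocessing module. Its public names are listed in the module attribute __all__ (written with double underscores on both sides; it is a list, not a private function):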
import multiprocessing
multiprocessing.__all__
['Array', 'AuthenticationError', 'Barrier', 'BoundedSemaphore', 'BufferTooShort', 'Condition', 'Event', 'JoinableQueue', 'Lock', 'Manager', 'Pipe', 'Pool', 'Process', 'ProcessError', 'Queue', 'RLock', 'RawArray', 'RawValue', 'Semaphore', 'SimpleQueue', 'TimeoutError', 'Value', 'active_children', 'allow_connection_pickling', 'cpu_count', 'current_process', 'freeze_support', 'get_all_start_methods', 'get_context', 'get_logger', 'get_start_method', 'log_to_stderr', 'set_executable', 'set_forkserver_preload', 'set_start_method']
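As a quick check of the listing above, here is a minimal sketch that reads the attribute and confirms that the two names this post is about are in it. The variable name public_names is only for illustration, and the exact contents of the list can differ between Python versions.

```python
import multiprocessing

# __all__ is a plain list of the module's public names.
public_names = multiprocessing.__all__

# Check for the two names this post compares.
print('Process' in public_names, 'Pool' in public_names)
```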

Now look at how the Process class is defined in the library (an interface skeleton with the method bodies left out):

class Process(object):
    def __init__(self, group=None, target=None, name=None, args=(), kwargs={}):
        self.name = ''
        self.daemon = False
        self.authkey = None
        self.exitcode = None
        self.ident = 0
        self.pid = 0
        self.sentinel = None
    def run(self):
        pass
    def start(self):
        pass
    def terminate(self):
        pass
    def join(self, timeout=None):
        pass
    def is_alive(self):
        return False

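Process creation: Process(target=function to run, name=custom process name (optional), args=(arguments,))
Methods: is_alive(): returns whether the process is alive.
join([timeout]): blocks the caller until the child process has finished, then moves on. timeout is a limit in seconds: if the child is blocked, join() returns after the timeout so the program can carry on, but the child itself keeps running.
run(): the code that executes in the child process. By default it calls target (if one was given); a subclass of Process can override run() instead of passing target.
start(): starts the process, i.e. creates the child and has it call run(). This is not the same as calling run() directly, which executes in the current process.
terminate(): terminates the process. Termination is not quite this simple (child processes of the terminated process are not terminated with it); the psutil package seems to handle this better, and I will write more once I know more.
In short, a Process is launched with start().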
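To make the difference between run() and start() concrete, here is a small sketch. Greeter is a hypothetical subclass invented for this illustration; it overrides run() and prints the PID it executes in.

```python
import os
from multiprocessing import Process


class Greeter(Process):
    """Hypothetical Process subclass that overrides run()."""

    def run(self):
        print('run() executing in pid', os.getpid())


if __name__ == '__main__':
    print('parent pid', os.getpid())
    g = Greeter()
    g.run()    # plain method call: executes in the parent process
    g.start()  # creates a child process; run() executes there
    g.join()   # wait for the child to finish
```

The first message should show the parent's PID and the second a different one, since only start() creates a new process.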
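Attributes: authkey: the process's authentication key (a byte string). The docstring says only "Set authorization key of process"; I have not found a worked example so far, and the reference article does not explain how the key is used.
daemon: a daemon process is terminated automatically when its parent process exits, and it cannot create child processes of its own; the flag must be set before start().
exitcode: None while the process is running; -N means the process was ended by signal N.
name: the process name, which you can set yourself.
pid: the process ID; each process has a unique PID.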
Reference: https://www.jianshu.com/p/a7a2ee7c5463
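The attributes listed above can be watched changing over the life of one process. This is a minimal sketch: the function work and the name 'worker-1' are made up for the example.

```python
import time
from multiprocessing import Process


def work(seconds):
    """Hypothetical task: sleep for the given number of seconds."""
    time.sleep(seconds)


if __name__ == '__main__':
    p = Process(target=work, name='worker-1', args=(0.5,))
    # Before start(): no OS process exists yet, so pid is None,
    # is_alive() is False and exitcode is None.
    print(p.name, p.pid, p.is_alive(), p.exitcode)

    p.start()
    # While the child is sleeping: it has a pid, is alive, exitcode still None.
    print(p.pid is not None, p.is_alive(), p.exitcode)

    p.join()
    # After a normal exit: not alive any more, exitcode is 0.
    print(p.is_alive(), p.exitcode)
```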

import multiprocessing
from multiprocessing import Process

################start(),join()#########################
from multiprocessing import Process
import time
from multiprocessing.pool import Pool


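def fun1(t):
    """Print a start message, sleep for t seconds, then print a finish message."""
    print('this is fun1', time.ctime())
    time.sleep(t)
    print('fun1 finish', time.ctime())

def fun2(t):
    """Same as fun1, but labelled fun2 so the two tasks can be told apart in the output."""
    print('this is fun2', time.ctime())
    time.sleep(t)
    print('fun2 finish', time.ctime())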


############demo 1##################
# if __name__ == '__main__':
#     a=time.time()
#     p1=Process(target=fun1,args=(4,))
#     p2 = Process(target=fun2, args=(6,))
#     p1.start()
#     p2.start()
#     p1.join()
#     p2.join()
#     b=time.time()
#     print ('finish',b-a)
'''
With only fun2 running:
this is fun2 Sun Jan  6 17:02:25 2019
fun2 finish Sun Jan  6 17:02:31 2019
finish 6.02819037437439'''


'''
With both fun1 and fun2 running:
this is fun1 Sun Jan  6 17:03:09 2019
this is fun2 Sun Jan  6 17:03:09 2019
fun1 finish Sun Jan  6 17:03:13 2019
fun2 finish Sun Jan  6 17:03:15 2019
finish 6.0113677978515625

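This shows that when start and join are ordered as:
p1.start()
p2.start()
p1.join()
p2.join()
fun1 and fun2 run at the same time, and the main process continues only after both have finished (about 6 s in total, the longer of the two sleeps).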

'''

##############demo 2 ################
# if __name__ == '__main__':
#     a=time.time()
#     p1=Process(target=fun1,args=(4,))
#     p2 = Process(target=fun2, args=(6,))
#     p1.start()
#     p1.join()
#     p2.start()
#     p2.join()
#     b=time.time()
#     print ('finish',b-a)

'''

this is fun1 Sun Jan  6 17:21:24 2019
fun1 finish Sun Jan  6 17:21:28 2019
this is fun2 Sun Jan  6 17:21:28 2019
fun2 finish Sun Jan  6 17:21:34 2019
finish 10.016567707061768

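This shows that when start and join are ordered as:
p1.start()
p1.join()
p2.start()
p2.join()
fun1 and fun2 do not run at the same time: p2 is started only after p1 has finished, so the total is about 4 + 6 = 10 s.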

'''


############demo 3##############
# if __name__ == '__main__':
#     a=time.time()
#     p1=Process(target=fun1,args=(4,))
#     p2 = Process(target=fun2, args=(6,))
#     p1.start()
#     p2.start()
#     p1.join()
#     #p2.join()
#     b=time.time()
#     print ('finish',b-a)

'''
this is fun1 Sun Jan  6 17:46:36 2019
this is fun2 Sun Jan  6 17:46:36 2019
fun1 finish Sun Jan  6 17:46:40 2019
finish 4.007489204406738
fun2 finish Sun Jan  6 17:46:42 2019
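Result:
This time the main process waits only for fun1 (p1 was joined,
so the main program blocks until p1 is done before moving on),
then runs its own print('finish', b-a), and fun2 finishes last, after which the program exits.

fun2 runs concurrently from the moment it is started (17:46:36). The main process simply does not wait for it before printing 'finish', and the program exits only once this non-daemon child has finished.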
'''



################ daemon attribute, is_alive() ##########################
# if __name__ == '__main__':
#     a=time.time()
#     p1=Process(name='fun1 process',target=fun1,args=(4,))
#     p2 = Process(name='fun2 process',target=fun2, args=(6,))
#     p1.daemon=True
#     p2.daemon = True
#     p1.start()
#     p2.start()
#     p1.join()
#     print(p1,p2)
#     print('process 1:',p1.is_alive(),'process 2:',p2.is_alive())
#     #p2.join()
#     b=time.time()
#     print('finish',b-a)


'''
Result:
this is fun1 Sun Jan  6 17:52:08 2019
this is fun2 Sun Jan  6 17:52:08 2019
fun1 finish Sun Jan  6 17:52:12 2019

process 1: False process 2: True
finish 4.006983280181885

As you can see, name gives the process a name.
By the time print('process 1:', p1.is_alive(), 'process 2:', p2.is_alive())
runs, p1 has already finished (False)
while p2 is still running (True).
p2 is not joined,
so the main process carries straight on. Because daemon=True was set,
the children are terminated automatically when the parent exits, so p2 is killed before it finishes and "fun2 finish" is never printed.




'''


# Using Pool ----------- one task function ---------##
##################### Pool---pool.apply_async---non-blocking ###############################
# if __name__ == '__main__':
#     a=time.time()
#     pool = Pool(processes =3)  # at most 3 worker processes run at the same time
#     for i in range(3,8):
#         pool.apply_async(fun1,(i,))
#     pool.close()
#     pool.join()
#     b=time.time()
#     print('finish',b-a)

'''this is fun1 Sun Jan  6 17:58:29 2019
this is fun1 Sun Jan  6 17:58:29 2019
this is fun1 Sun Jan  6 17:58:29 2019
fun1 finish Sun Jan  6 17:58:32 2019
this is fun1 Sun Jan  6 17:58:32 2019
fun1 finish Sun Jan  6 17:58:33 2019
this is fun1 Sun Jan  6 17:58:33 2019
fun1 finish Sun Jan  6 17:58:34 2019
fun1 finish Sun Jan  6 17:58:38 2019
fun1 finish Sun Jan  6 17:58:40 2019
finish 11.06075119972229  


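The result above shows that with the limit set to 3 worker processes, three tasks start together at 17:58:29.
When the first task finishes (the one with the 3-second argument), the freed worker picks up the next task, and so on,
until the pool has run every task; only then does the main process go on to b=time.time()
and print('finish', b-a). This uses the non-blocking apply_async(); next, compare it with the blocking apply().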

'''

##################### Pool---pool.apply---blocking ###############################
# if __name__ == '__main__':
#     a=time.time()
#     pool = Pool(processes =3)  # at most 3 worker processes run at the same time
#     for i in range(3,8):
#         pool.apply(fun1,(i,))
#     pool.close()
#     pool.join()
#     b=time.time()
#     print('finish',b-a)

'''
this is fun1 Sun Jan  6 18:05:10 2019
fun1 finish Sun Jan  6 18:05:13 2019
this is fun1 Sun Jan  6 18:05:13 2019
fun1 finish Sun Jan  6 18:05:17 2019
this is fun1 Sun Jan  6 18:05:17 2019
fun1 finish Sun Jan  6 18:05:22 2019
this is fun1 Sun Jan  6 18:05:22 2019
fun1 finish Sun Jan  6 18:05:28 2019
this is fun1 Sun Jan  6 18:05:28 2019
fun1 finish Sun Jan  6 18:05:36 2019
finish 25.07084083557129

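As you can see, with the blocking apply() each task must finish before the next one starts, so the tasks run one after another (3+4+5+6+7 = 25 s); in general we use the non-blocking apply_async().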
'''
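# Using Pool ----------- several task functions in one pool, via a for loop ---------##
# The program below submits tasks for two different functions to a single pool of 3 worker processes;
# the two functions share those 3 workers.
####################################################
if __name__ == '__main__':
    a = time.time()
    pool = Pool(processes=3)  # at most 3 worker processes run at the same time
    for fun in [fun1, fun2]:
        for i in range(3, 8):
            # Non-blocking: queue the task and return immediately.
            pool.apply_async(fun, (i,))
    pool.close()  # no more tasks will be submitted
    pool.join()   # wait for every queued task to finish
    b = time.time()
    print('finish', b - a)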
'''
this is fun1 Sun Jan  6 18:13:20 2019
this is fun1 Sun Jan  6 18:13:20 2019
this is fun1 Sun Jan  6 18:13:20 2019
fun1 finish Sun Jan  6 18:13:23 2019
this is fun1 Sun Jan  6 18:13:23 2019
fun1 finish Sun Jan  6 18:13:24 2019
this is fun1 Sun Jan  6 18:13:24 2019
fun1 finish Sun Jan  6 18:13:25 2019
this is fun2 Sun Jan  6 18:13:25 2019
fun2 finish Sun Jan  6 18:13:28 2019
this is fun2 Sun Jan  6 18:13:28 2019
fun1 finish Sun Jan  6 18:13:29 2019
this is fun2 Sun Jan  6 18:13:29 2019
fun1 finish Sun Jan  6 18:13:31 2019
this is fun2 Sun Jan  6 18:13:31 2019
fun2 finish Sun Jan  6 18:13:32 2019
this is fun2 Sun Jan  6 18:13:32 2019
fun2 finish Sun Jan  6 18:13:34 2019
fun2 finish Sun Jan  6 18:13:37 2019
fun2 finish Sun Jan  6 18:13:39 2019
finish 19.056739807128906

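Result:
The tasks start in the order they were submitted, so the fun1 tasks start first and the fun2 tasks follow as workers become free; never more than 3 tasks run at once.
Also, for a function that takes no arguments, just call pool.apply_async(function); there is no need to pass an argument tuple.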
'''
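The demos above throw away what apply_async() returns. As a small sketch of how return values could be collected, the example below keeps the result objects and calls get() on each one. The functions square and hello are invented for this illustration; hello also shows the no-argument call mentioned above.

```python
from multiprocessing import Pool


def square(x):
    """Hypothetical task with one argument."""
    return x * x


def hello():
    """Hypothetical task with no arguments."""
    return 'hello'


if __name__ == '__main__':
    with Pool(processes=3) as pool:
        # apply_async() returns at once with a result object per task.
        results = [pool.apply_async(square, (i,)) for i in range(3, 8)]
        no_args = pool.apply_async(hello)  # no argument tuple needed

        # get() blocks until that task's return value is available.
        print([r.get() for r in results])  # [9, 16, 25, 36, 49]
        print(no_args.get())               # hello
```

Every get() is called inside the with block, because leaving the block shuts the pool down.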
