Python Multithreading in Detail

See Core Python Programming, 2nd edition, Chapter 18.

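0 Introduction

Before multithreaded programming, a computer program ran as a single sequence of execution, carried out step by step on the host's CPU. This was the case whether the task itself required sequential order or the program consisted of multiple subtasks, even when those subtasks were independent of one another. Multithreading arose to address this: its purpose is to run such independent subtasks in parallel, which can substantially improve the efficiency of the overall task.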

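1 Processes and Threads

What is a process?

A computer program is nothing more than executable binary data on disk; its life cycle begins only when it is loaded into memory and invoked by the operating system.

A process is one execution of a program. Each process has its own address space, memory, data stack, and other auxiliary data that track its execution.

For example, test is a binary executable; once it is running, it is a process.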

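What is a thread?

A thread is sometimes called a lightweight process and resembles one. The difference is that all threads run inside the same process and share the same execution environment. They can be thought of as "mini-processes" running in parallel within the main process, or "main thread".

A thread has three parts: a beginning, a sequence of execution, and an end. It has its own instruction pointer that records how far it has run. A running thread may be preempted, or temporarily suspended so that other threads can run first; this is called yielding.
Note: on a single-core CPU, true parallelism is impossible. Each thread is scheduled to run for a short while and then gives up the CPU so that other threads can run.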
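
To make "threads share the same execution environment" concrete, here is a minimal sketch. The names shared and worker are made up for this illustration. Each thread appends to one module-level list and records the process ID it sees; under these assumptions every entry should carry the same PID as the main thread.

```python
import os
import threading

shared = []  # module-level list, visible to every thread in this process


def worker(tag):
    # Each thread records its tag and the process ID it observes.
    shared.append((tag, os.getpid()))


threads = [threading.Thread(target=worker, args=(n,)) for n in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every entry was written into the same list and carries the same PID.
same_pid = all(pid == os.getpid() for _, pid in shared)
print(sorted(shared))
print(same_pid)
```

Separate processes would each have their own copy of shared and their own PID, which is the main practical difference between the two.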

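2 Examples

Single-threaded sequential execution

loop0() and loop1() run sequentially in a single thread: loop1() starts only after loop0() has finished. loop0() sleeps for 4 seconds and loop1() sleeps for 2 seconds.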

# -*- coding: utf-8 -*-
import os
import time

def loop0():
    print "start loop 0 begin: ", time.strftime("%Y-%m-%d %H:%M:%S")
    time.sleep(4)
    print "end loop 0 done! ", time.strftime("%Y-%m-%d %H:%M:%S")

def loop1():
    print "start loop 1 begin: ", time.strftime("%Y-%m-%d %H:%M:%S")
    time.sleep(2)
    print "end loop 1 done! ", time.strftime("%Y-%m-%d %H:%M:%S")

def main():
    """ main func """
    print "before excute main: ", time.strftime("%Y-%m-%d %H:%M:%S")
    loop0()
    loop1()
    print "all loop func done", time.strftime("%Y-%m-%d %H:%M:%S")

if __name__ == '__main__':
    main()

Output: loop0() takes 4 s and loop1() takes 2 s, so main() takes 6 s in total.

before excute main:  2019-11-24 14:52:13
start loop 0 begin:  2019-11-24 14:52:13
end loop 0 done!  2019-11-24 14:52:17
start loop 1 begin:  2019-11-24 14:52:17
end loop 1 done!  2019-11-24 14:52:19
all loop func done 2019-11-24 14:52:19

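Multithreaded execution

Suppose loop0() and loop1() did not sleep but instead performed independent, unrelated computations. Could running those computations in parallel reduce the total run time?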

Using the thread module

To be updated……

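Using the threading module

Avoid the thread module and use the higher-level threading module instead. threading is more advanced and has more complete thread support, and attributes of the thread module may conflict with those of threading.
Official documentation: https://docs.python.org/2.7/library/threading.html

With the Thread class there are several ways to create a thread (the last one is usually chosen):

Create an instance of Thread, passing in a function
Create an instance of Thread, passing in a callable class instance
Derive a subclass of Thread and create an instance of the subclass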

Method 1: create an instance of Thread, passing in a function

# -*- coding: utf-8 -*-

import threading
import time

loops = [4, 2]


def loop(nloop, nesc=None):
    """ loop func """
    print "start [%s], at [%s]" % (nloop, time.strftime("%Y-%m-%d %H:%M:%S"))
    time.sleep(nesc)
    print "end [%s], at [%s]" % (nloop, time.strftime("%Y-%m-%d %H:%M:%S"))


def main():
    """ main func """
    print "start at [%s]" % time.strftime("%Y-%m-%d %H:%M:%S")
    threads = []
    nloops = range(len(loops))

    for i in nloops:
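        t = threading.Thread(target=loop, args=(i, loops[i]))  # create the i-th thread
        threads.append(t)  # add the new thread to the thread list
    print "threads: ", threads
    # once every thread has been created, start them all together instead of starting each one as it is created
    for i in nloops:
        threads[i].start()
    # wait for the threads to finish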
    for i in nloops:
        threads[i].join()
    print "end at [%s]" % time.strftime("%Y-%m-%d %H:%M:%S")

if __name__ == '__main__':
    main()

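Note 1: t = threading.Thread(target=loop, args=(i, loops[i])) creates an instance of the Thread class and passes it the loop function, which the new thread will run.
Note 2: join() waits until the thread finishes or, if a timeout argument was given, until the timeout expires.
Note 3: join() does not have to be called at all. Once started, a thread keeps running until its function returns and the thread exits. Call join() only when the main thread needs to wait for the thread to finish.
Output: as shown below, thread 0 sleeps for 4 s and thread 1 sleeps for 2 s, and the whole run takes 4 s.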

start at [2019-11-24 16:08:07]
threads:  [<Thread(Thread-1, initial)>, <Thread(Thread-2, initial)>]
start [0], at [2019-11-24 16:08:07]
start [1], at [2019-11-24 16:08:07]
end [1], at [2019-11-24 16:08:09]
end [0], at [2019-11-24 16:08:11]
end at [2019-11-24 16:08:11]
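
The timeout behaviour of join() mentioned in Note 2 can be sketched as follows. This is only an illustration: slow_task and the 0.5 s / 0.1 s durations are arbitrary choices, and is_alive() is used to check whether the timeout expired before the thread finished.

```python
import threading
import time


def slow_task():
    time.sleep(0.5)  # stands in for a longer piece of work


t = threading.Thread(target=slow_task)
t.start()

t.join(0.1)                    # wait at most 0.1 s, then return regardless
still_running = t.is_alive()   # the thread needs 0.5 s, so it is still alive

t.join()                       # no timeout: block until the thread finishes
finished = not t.is_alive()

print("still running after join(0.1): %s" % still_running)
print("finished after join(): %s" % finished)
```

join() always returns None, so checking is_alive() afterwards is the way to tell a timeout from a normal finish.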

Method 2: create an instance of Thread, passing in a callable class instance

# -*- coding: utf-8 -*-

import threading
import time

loops = [4, 2]

class ThreadFunc(object):
    """ define Thread class """
    def __init__(self, func, args, name=''):
        """ init func"""
        self.name = name
        self.func = func
        self.args = args

    def __call__(self):
        """ When the new thread starts, the Thread object calls the ThreadFunc
         instance, which invokes this special __call__ method
        """
        self.res = self.func(*self.args)

def loop(nloop, nesc=None):
    """ loop func """
    print "start [%s], at [%s]" % (nloop, time.strftime("%Y-%m-%d %H:%M:%S"))
    time.sleep(nesc)
    print "end [%s], at [%s]" % (nloop, time.strftime("%Y-%m-%d %H:%M:%S"))

def main():
    """ main func """
    print "start at [%s]" % time.strftime("%Y-%m-%d %H:%M:%S")
    threads = []
    nloops = range(len(loops))
    print "loop.__name__:", loop.__name__
    for i in nloops:
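        t = threading.Thread(target=ThreadFunc(loop, (i, loops[i]), loop.__name__))  # pass a class instance rather than just a function
        threads.append(t)  # add the new thread to the thread list
    print "threads: ", threads
    # once every thread has been created, start them all together instead of starting each one as it is created
    for i in nloops:
        threads[i].start()
    # wait for the threads to finish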
    for i in nloops:
        threads[i].join()
    print "end at [%s]" % time.strftime("%Y-%m-%d %H:%M:%S")

if __name__ == '__main__':
    main()

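Note 1: t = threading.Thread(target=ThreadFunc(loop, (i, loops[i]), loop.__name__)) creates an instance of the Thread class and also instantiates a callable class object, ThreadFunc, so two objects are instantiated to create each thread.
Note 2: What does __call__ do? To be continued……
Output: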

start at [2019-11-24 16:34:24]
loop.__name__: loop
threads:  [<Thread(Thread-1, initial)>, <Thread(Thread-2, initial)>]
start [0], at [2019-11-24 16:34:24]
start [1], at [2019-11-24 16:34:24]
end [1], at [2019-11-24 16:34:26]
end [0], at [2019-11-24 16:34:28]
end at [2019-11-24 16:34:28]
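
As a rough illustration of why a class instance can be passed as target: defining __call__ makes an instance callable like a function, so Thread can invoke it in the new thread. The Greeter class below is hypothetical and exists only for this sketch.

```python
import threading


class Greeter(object):
    """Hypothetical callable class: its instances can be called like functions."""

    def __init__(self, name):
        self.name = name
        self.message = None

    def __call__(self):
        # Runs when the instance itself is called, e.g. g() or by Thread.run()
        self.message = "hello from %s" % self.name


g = Greeter("worker")
is_callable = callable(g)        # True because Greeter defines __call__

t = threading.Thread(target=g)   # the instance is the thread's target
t.start()
t.join()

print(is_callable)
print(g.message)
```

Compared with passing a plain function, the instance can carry its own state (here name and message), which is what ThreadFunc does with func, args and res.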

Method 3: derive a subclass of Thread and create an instance of the subclass

# -*- coding: utf-8 -*-

import threading
import time

loops = [4, 2]

class MyThread(threading.Thread):
    """ define Thread class """
    def __init__(self, func, args, name=''):
        """ init func"""
        threading.Thread.__init__(self)
        self.name = name
        self.func = func
        self.args = args

    def run(self):
        """ Called in the new thread after start(); runs the wrapped function
         and stores its return value in self.res
        """
        self.res = self.func(*self.args)

def loop(nloop, nesc=None):
    """ loop func """
    print "start [%s], at [%s]" % (nloop, time.strftime("%Y-%m-%d %H:%M:%S"))
    time.sleep(nesc)
    print "end [%s], at [%s]" % (nloop, time.strftime("%Y-%m-%d %H:%M:%S"))

def main():
    """ main func """
    print "start at [%s]" % time.strftime("%Y-%m-%d %H:%M:%S")
    threads = []
    nloops = range(len(loops))
    print "loop.__name__:", loop.__name__
    for i in nloops:
        t = MyThread(loop, (i, loops[i]), loop.__name__)
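        threads.append(t)  # add the new thread to the thread list
    print "threads: ", threads
    # once every thread has been created, start them all together instead of starting each one as it is created
    for i in nloops:
        threads[i].start()
    # wait for the threads to finish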
    for i in nloops:
        threads[i].join()
    print "end at [%s]" % time.strftime("%Y-%m-%d %H:%M:%S")


if __name__ == '__main__':
    main()

Output:

start at [2019-11-24 16:49:53]
loop.__name__: loop
threads:  [<MyThread(loop, initial)>, <MyThread(loop, initial)>]
start [0], at [2019-11-24 16:49:53]
start [1], at [2019-11-24 16:49:53]
end [1], at [2019-11-24 16:49:55]
end [0], at [2019-11-24 16:49:57]
end at [2019-11-24 16:49:57]
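
One practical reason to prefer the subclass approach is that run() can keep the function's return value on the thread object, as MyThread does with self.res. The sketch below shows how that value might be read back after join(); ResultThread and square are hypothetical names used only here.

```python
import threading


class ResultThread(threading.Thread):
    """Hypothetical Thread subclass that keeps the function's return value."""

    def __init__(self, func, args=()):
        threading.Thread.__init__(self)
        self.func = func
        self.args = args
        self.res = None  # filled in by run()

    def run(self):
        # Executed in the new thread after start() is called
        self.res = self.func(*self.args)


def square(n):
    return n * n


workers = [ResultThread(square, (n,)) for n in (2, 3, 4)]
for w in workers:
    w.start()
for w in workers:
    w.join()  # res is only reliable once the thread has finished

results = [w.res for w in workers]
print(results)
```

Reading res before join() returns could yield None, so the join() calls here are not optional.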

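Commonly used attributes and methods of the Thread class:

Attribute	Description
name	Thread name
ident	Thread identifier
daemon	Whether the thread is a daemon thread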

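Method	Description
start()	Start the thread's execution. It may be called at most once per Thread object; calling it again raises RuntimeError
run()	The method that defines the thread's work
join(timeout=None)	Block the caller until the thread finishes, or for at most timeout seconds if a timeout is given
getName()	Return the thread's name
setName()	Set the thread's name
isAlive()	Boolean flag indicating whether the thread is still running
isDaemon()	Return the thread's daemon flag
setDaemon(daemon)	Set the thread's daemon flag to daemon (must be called before start())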
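
A few of the attributes and methods above can be exercised in a short sketch. The function noop and the thread name "worker-1" are made up for the example. It uses the name and daemon attributes directly, and is_alive(), the spelling available alongside isAlive() since Python 2.6 and the only one left in recent Python 3 releases.

```python
import threading


def noop():
    pass


t = threading.Thread(target=noop, name="worker-1")
t.daemon = True      # the daemon flag must be set before start()
t.start()
t.join()

try:
    t.start()        # a second start() on the same object is not allowed
    second_start_error = None
except RuntimeError as exc:
    second_start_error = exc

print(t.name)
print(t.daemon)
print(t.is_alive())
print(second_start_error is not None)
```

To run the same work again, create a new Thread object rather than restarting the old one.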