Python 多线程下的多种同步机制

概述

这篇博客是我翻译Python threads synchronization: Locks, RLocks, Semaphores, Conditions, Events and Queues，这篇博客对Python多线程的集中实现同步机制及其遇到的一些问题，说明的淋漓尽致。废话少说，直接撸代码。
这篇文章详细描述了python多线程机制，包括Lock、RLock，Semaphore，Condition,Event and Queue.下面一一通过代码展示了这些同步机制的内部细节。首先，让我们看一个不适用任何同步的线程模块。

Python多线程的同步机制

threading

我们要编写一个获取通过一些URL的内容并将其写入到一个文件中。我们可以在没有线程的情况下连续完成它，但是为了使它更快，我们可以创建两个线程来处理.

# 继承于高级线程库
class FetchUrls(threading.Thread):
    """
    Thread checking URLs.
    """

    def __init__(self, urls, output):
        """
        Constructor.

        @param urls list of urls to check
        @param output file to write urls output
        """
        threading.Thread.__init__(self)
        self.urls = urls
        self.output = output
    # 线程在start()后自动调用run方法。
    def run(self):
        """
        Thread run method. Check URLs one by one.
        """
        while self.urls:
            url = self.urls.pop()
            req = urllib2.Request(url)
            try:
                d = urllib2.urlopen(req)
            except urllib2.URLError, e:
                print 'URL %s failed: %s' % (url, e.reason)
            self.output.write(d.read())
            print 'write done by %s' % self.name
            print 'URL %s fetched by %s' % (url, self.name)
def main():
    # list 1 of urls to fetch
    urls1 = ['http://www.google.com', 'http://www.facebook.com']
    # list 2 of urls to fetch
    urls2 = ['http://www.yahoo.com', 'http://www.youtube.com']
    f = open('output.txt', 'w+')
    t1 = FetchUrls(urls1, f)
    t2 = FetchUrls(urls2, f)
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    f.close()

if __name__ == '__main__':
    main()
#输出为
write done by Thread-2
URL http://www.163.com fetched by Thread-2
write done by Thread-2
URL http://www.qq.com fetched by Thread-2
write done by Thread-1
URL http://www.sina.com.cn/ fetched by Thread-1
write done by Thread-1
URL http://www.baidu.com fetched by Thread-1
......

上述代码会造成一个冲突，就是在两个线程同时写入到同一个文件，内容是混乱的。我们需要控制的是在任何时刻只有一个线程在写入文件，一种实现方式是使用同步机制比如：锁机制。

Lock

Lock拥有两种状态：locked and unlocked，并通过acquire() and release()来改变状态。有如下规则:

如果当前状态是unlocked状态，调用acquire()方法改变状态为locked。

如果当前状态是locked状态，调用acquire()方法将会阻塞知道另一个线程调用release()方法。

如果当前状态是unlocked状态，调用release()方法将会造成RuntiemError 异常。

如果当前状态是locked状态，调用release()方法改变状态为unlocked。
为了解决两个线程同时写入同一个文件，我们需要在程序里面引入lock机制，代码如下：

import threading
import urllib2


class FetchUrls(threading.Thread):
    """
    Thread checking URLs.
    """

    def __init__(self, urls, output, lock):
        """
        Constructor.

        @param urls list of urls to check
        @param output file to write urls output
        """
        threading.Thread.__init__(self)
        self.lock = lock
        self.urls = urls
        self.output = output

    def run(self):
        """
        Thread run method. Check URLs one by one.
        """
        while self.urls:
            self.lock.acquire()
            url = self.urls.pop()
            # req = urllib2.Request(url)
            try:
                # d = urllib2.urlopen(req)
                self.output.writelines(url)
            except urllib2.URLError, e:
                print 'URL %s failed: %s' % (url, e.reason)

            print 'write done by %s' % self.name
            print 'URL %s fetched by %s' % (url, self.name)
            self.lock.release()

def main():
    lock = threading.Lock()
    # list 1 of urls to fetch
    urls1 = ['http://www.baidu.com\n'] * 10
    # list 2 of urls to fetch
    urls2 = ['http://www.qq.com\n'] * 10
    f = open('output.txt', 'w+')
    t1 = FetchUrls(urls1, f, lock)
    t2 = FetchUrls(urls2, f, lock)
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    f.close()

if __name__ == '__main__':
    main()
# 输出为
write done by Thread-1
URL http://www.baidu.com
 fetched by Thread-1
URL http://www.baidu.com
 fetched by Thread-1
write done by Thread-1
URL http://www.baidu.com
 fetched by Thread-1
write done by Thread-2
 fetched by Thread-2
write done by Thread-2
URL http://www.qq.com
 fetched by Thread-2
write done by Thread-2
URL http://www.qq.com
 fetched by Thread-2
write done by Thread-2
......

我们通过代码分析，没有两个线程同时操作同一个文件，一般而言lock是一个global的变量。那么是如何实现的呢？我们一起来看一下Python的内部实现机制(Python 2.6.6 base Linux)。

# threading.Lock()等同于thread.allocate_lock，详细的代码在Lib/threading.py文件中。
Lock = _allocate_lock
_allocate_lock = thread.allocate_lock

C语言的实现在Python模块下的thread_pthread.h，我们假设我们的操作系统支持POSXI的信号量(semaphore)，Lock是通过信号量的内部机制实现的。在指向锁的地址处sem_init()初始化一个信号量，该信号量的默认初始值是1也就是unlocked状态。该信号量在进程内的多线程之间共享。。代码如下：

PyThread_type_lock
PyThread_allocate_lock(void)
{
    ...
    // 动态分配一个lock信号量
    lock = (sem_t *)malloc(sizeof(sem_t));
    // 如果动态分配成功，则初始化该信号量的value。
    if (lock) {
        status = sem_init(lock,0,1);
        CHECK_STATUS("sem_init");
        ....
    }
    ...
}

当调用acquire()时，下面的代码被执行。waitflag默认为1也就意味着调用后阻塞直到状态变为unlocked状态。sem_wait()减少信号量的value或者阻塞直到信号量的值大于1，例如通过另外一个线程将状态变为unlocked(通过release()方法可以实现)。

int
PyThread_acquire_lock(PyThread_type_lock lock, int waitflag)
{
    ...
    do {
        if (waitflag)
            status = fix_status(sem_wait(thelock));
        else
            status = fix_status(sem_trywait(thelock));
    } while (status == EINTR); /* Retry if interrupted by a signal */
    ....
}

当release()调用时，下面代码被执行。sem_post增加信号量的值(unlock信号量的值).

void
PyThread_release_lock(PyThread_type_lock lock)
{
    ...
    status = sem_post(thelock);
    ...
}

由于lock已经实现了上下文协议管理器，所以你可以通过with来管理lock的上下文:

class FetchUrls(threading.Thread):
    ...
    def run(self):
        ...
        while self.urls:
            ...
            with self.lock:
                print 'lock acquired by %s' % self.name
                self.output.write(d.read())
                print 'write done by %s' % self.name
                print 'lock released by %s' % self.name
            ...

总结：Lock是阻塞其他线程对共享资源的访问，且同一线程只能acquire一次，如多于一次就出现了死锁，程序无法继续执行。为了保证线程对共享资源的独占，又避免死锁的出现，就有了RLock。RLock允许在同一线程中被多次acquire，线程对共享资源的释放需要把所有锁都release。即n次acquire，需要n次release。

RLock

虽然RLock也是使用thread.allocate_lock()方法，但是RLock附加了owner属性(线程所有者)来支持reentrant(可重入)功能，下面是RLock的acquire()方法，如果当前线程是资源的拥有者则每次调用counter加1，如果不是，首先获取lock，再将owner设置为当前线程并初始化为1.

def acquire(self, blocking=1):
    me = _get_ident()
    if self.__owner == me:
        self.__count = self.__count + 1
        ...
        return 1
    rc = self.__block.acquire(blocking)
    if rc:
        self.__owner = me
        self.__count = 1
        ...
    ...
    return rc

让我们来看一下release()方法，当该方法调用时，它会确保该线程是lock的拥有者（owner），并且counter-1。如果counter等于0，然后资源将被unlocked，直到另一个线程抢占资源才变为有效。代码如下：

def release(self):
    if self.__owner != _get_ident():
        raise RuntimeError("cannot release un-acquired lock")
    self.__count = count = self.__count - 1
    if not count:
        self.__owner = None
        self.__block.release()
        ...
    ...

下面我们看一个简单的Demo，

import threading,time
lock=threading.RLock()
result=[]


def func1():
    global result
    if lock.acquire():
        result.append('func1')
        time.sleep(1)
        lock.release()
def func2():
    global result
    if lock.acquire():
        result.append('step2')
        time.sleep(1)
        lock.release()

def create():
    global result
    if lock.acquire():
        func1()
        func2()
        lock.release()
print result

def clear():
    global result
    if lock.acquire():
        result=None
        time.sleep(2)
        lock.release()
    print result

t1=threading.Thread(target= create)
t2=threading.Thread(target= clear)

t1.start()
t2.start()
t1.join()
t2.join()

Condition

一个线程在等待特定的条件而另一个线程表明这个特定条件已经发生。只要条件发生，线程就需要获得lock为了独立的使用共享资源，比如生产者-消费者模式。在随机时间内，一个生产者追加一个随机整数到共享资源list；一个消费者从这个共享资源list取出这些整数。
我们先来看一下生产者这个类，消费者(线程)获得lock，往列表里追加整数然后通知消费者有东西需要从共享资源取出，最后释放lock。生产者会在每次追加元素后随机暂停一点时间。

#! -*- coding: UTF-8 -*-

import threading
import random
import time


class Producer(threading.Thread):
    """
    Produces random integers to a list
    """

    def __init__(self, integers, condition):
        """
        Constructor.

        @param integers list of integers
        @param condition condition synchronization object
        """
        threading.Thread.__init__(self)
        self.integers = integers
        self.condition = condition

    def run(self):
        """
        Thread run method. Append random integers to the integers list
        at random time.
        """
        while True:
            integer = random.randint(0, 256)
            self.condition.acquire()
            print 'condition acquired by %s' % self.name
            self.integers.append(integer)
            print '%d appended to list by %s' % (integer, self.name)
            print 'condition notified by %s' % self.name
            self.condition.notify()
            print 'condition released by %s' % self.name
            self.condition.release()
            time.sleep(1)

再来看一下消费者这个类，消费者首先获得lock，检查共享资源是否有整数。如果没有就会通过wait()通知生产者需要生产数据，如果有整数就会取出数据最后释放lock。

#! -*- coding: UTF-8 -*-

import threading


class Consumer(threading.Thread):
    """
    Consumes random integers from a list
    """

    def __init__(self, integers, condition):
        """
        Constructor.

        @param integers list of integers
        @param condition condition synchronization object
        """
        threading.Thread.__init__(self)
        self.integers = integers
        self.condition = condition

    def run(self):
        """
        Thread run method. Consumes integers from list
        """
        while True:
            self.condition.acquire()
            print 'condition acquired by %s' % self.name
            while True:
                if self.integers:
                    integer = self.integers.pop()
                    print '%d popped from list by %s' % (integer, self.name)
                    break
                print 'condition wait by %s' % self.name
                self.condition.wait()
            print 'condition released by %s' % self.name
            self.condition.release()

每一次调用wait() 会释放lock，所以生产者可以获得lock来生产数据。
然后通过main方法调用，我们来看一下输出结果:

def main():
    integers = []
    condition = threading.Condition()
    t1 = Producer(integers, condition)
    t2 = Consumer(integers, condition)
    t1.start()
    t2.start()
    t1.join()
    t2.join()


if __name__ == '__main__':
    main()
###output
condition acquired by Thread-1
46 appended to list by Thread-1
condition notified by Thread-1
condition released by Thread-1
condition acquired by Thread-2
46 popped from list by Thread-2
condition released by Thread-2
condition acquired by Thread-2
condition wait by Thread-2
condition acquired by Thread-1
19 appended to list by Thread-1
condition notified by Thread-1
condition released by Thread-1
19 popped from list by Thread-2
condition released by Thread-2
condition acquired by Thread-2
condition wait by Thread-2
condition acquired by Thread-1
228 appended to list by Thread-1
condition notified by Thread-1
condition released by Thread-1
228 popped from list by Thread-2
......

输出结果很多，我们简单的来分析一下输出结果，Thread-1 获得lock并追加46到共享资源list中，然后通知消费者需要从共享资源里面取出资源并消费，最后释放lock。Thread-2获得lock，从共享资源取出46，最后释放lock。这时生产者仍然在等待(sleep(1)),所以消费者Thread-2获得lock，此时共享资源list没有生产者生产出来的数据，通过wait()通知生产者需要生产数据。当wait被调用时，消费者解锁共享资源以便于生产者来获得它并生产数据即追加一个新的整数到共享资源list。
对于Conditio的同步机制，我们来看一下Python内部实现机制。在构造函数里面初始化了一个RLock，这个lock可以通过acquire()和release()控制lock的状态。

class _Condition(_Verbose):

    def __init__(self, lock=None, verbose=None):
        _Verbose.__init__(self, verbose)
        if lock is None:
            lock = RLock()
        self.__lock = lock

我们再来看一下wait()方法，我们假设调用的wait方法没有任何timeout的值，只是简单的解释wait方法的代码。一个名为waiter的新lock被创建并且状态是locked。waiter常用于线程间的通信，所以生产者可以通知消费者通过释放waiter lock。Lock对象被加到waiters的列表中并且在waiter.acquire()后，方法马上阻塞。值得注意的是condition lock在开始时lock的state被保存，当wait()返回时lock state被重置。

def wait(self, timeout=None):
    ...
    waiter = _allocate_lock()
    waiter.acquire()
    self.__waiters.append(waiter)
    saved_state = self._release_save()
    try:    # restore state no matter what (e.g., KeyboardInterrupt)
        if timeout is None:
            waiter.acquire()
            ...
        ...
    finally:
        self._acquire_restore(saved_state)

notify方法常用于释放waiter lock，生产者调用notify()通知阻塞在wait()的消费者。

def notify(self, n=1):
    ...
    __waiters = self.__waiters
    waiters = __waiters[:n]
    ...
    for waiter in waiters:
        waiter.release()
        try:
            __waiters.remove(waiter)
        except ValueError:
            pass

由于condition也实现了上下文管理器，所以我们也可以通过with来处理。

class Producer(threading.Thread):
    ...
    def run(self):
        while True:
            integer = random.randint(0, 256)
            with self.condition:
                print 'condition acquired by %s' % self.name
                self.integers.append(integer) 
                print '%d appended to list by %s' % (integer, self.name)
                print 'condition notified by %s' % self.name
                self.condition.notify()
                print 'condition released by %s' % self.name
            time.sleep(1)

class Consumer(threading.Thread):
    ... 
    def run(self):
        while True:
            with self.condition:
                print 'condition acquired by %s' % self.name
                while True:
                    if self.integers:
                        integer = self.integers.pop()
                        print '%d popped from list by %s' % (integer, self.name)
                        break
                    print 'condition wait by %s' % self.name
                    self.condition.wait()
                print 'condition released by %s' % self.name

Semaphore

信号量是基于内部计数器counter，每次acquire()被调用时counter减1，每次release()被调用计数器加1。如果counter==0，再去调用acquire()将阻塞。这是Python内部实现了Dijkstra的信号概念：P()和V()原语.当您想要像服务器一样控制对资源的访问时，使用信号量是有意义的。

semaphore = threading.Semaphore()
semaphore.acquire()
# work on a shared resource
...
semaphore.release()

让我们来看一下Python内部实现的细节。构造函数获取一个值，这个值是counter的初始值。这个值得初始化为1，一个condition的instance用一个lock被创建去保护这个counter的instance，并且当这个信号量被释放时通知另一个线程。我们来看一下详细的代码。

class _Semaphore(_Verbose):
    ...    
    def __init__(self, value=1, verbose=None):
        _Verbose.__init__(self, verbose)
        self.__cond = Condition(Lock())
        self.__value = value

我们来看一下acquire()方法，如果信号量的counter等于0，它一直阻塞在condition的wait()方法直到它得到另一个线程的通知。如果信号量的counter大于0，它将减少counter的值。我们来看一下代码：

def acquire(self, blocking=1):
    rc = False
    self.__cond.acquire()
    while self.__value == 0:
        ...
        self.__cond.wait()
    else:
        self.__value = self.__value - 1
        rc = True
    self.__cond.release()
    return rc

信号量的release()方法增加counter的值并且然后通知其他的线程。

def release(self):
    self.__cond.acquire()
    self.__value = self.__value + 1
    self.__cond.notify()
    self.__cond.release()

值得注意一点是，一个有界的信号量用来确保信号量不会多次调用release(),下面是Python的内部代码：

class _BoundedSemaphore(_Semaphore):
    """Semaphore that checks that # releases is <= # acquires"""
    def __init__(self, value=1, verbose=None):
        _Semaphore.__init__(self, value, verbose)
        self._initial_value = value

    def release(self):
        if self._Semaphore__value >= self._initial_value:
            raise ValueError, "Semaphore released too many times"
        return _Semaphore.release(self)

你同样可以通过with语句来管理信号量对象.

semaphore = threading.Semaphore()
with semaphore:
  # work on a shared resource
  ...

Event

这是一个相对于前面几种方法是一个比较简单的机制。一个线程发出一个event的信号并且其他的线程等待它。让我重新返回到我们的生产者和消费者的例子，并用event代替condition。首先来看一下生产者的类，我们传递一个Event的实例给构造函数代替原来的Condition实例。每次一个整数倍追加到list，event被设置，然后立即清除通知消费者。默认情况下event 实例是被清除状态。

class Producer(threading.Thread):
    """
    Produces random integers to a list
    """

    def __init__(self, integers, event):
        """
        Constructor.

        @param integers list of integers
        @param event event synchronization object
        """
        threading.Thread.__init__(self)
        self.integers = integers
        self.event = event
    
    def run(self):
        """
        Thread run method. Append random integers to the integers list
        at random time.
        """
        while True:
            integer = random.randint(0, 256)
            self.integers.append(integer) 
            print '%d appended to list by %s' % (integer, self.name)
            print 'event set by %s' % self.name
            self.event.set()
            self.event.clear()
            print 'event cleared by %s' % self.name
            time.sleep(1)

下面我们来看一下消费者类。我们同样创第一个event的实例给构造函数。消费者实例将一直阻塞在wait()方法直到event被设置即调用set()表明有个整数需要被消费掉。

class Consumer(threading.Thread):
    """
    Consumes random integers from a list
    """

    def __init__(self, integers, event):
        """
        Constructor.

        @param integers list of integers
        @param event event synchronization object
        """
        threading.Thread.__init__(self)
        self.integers = integers
        self.event = event
    
    def run(self):
        """
        Thread run method. Consumes integers from list
        """
        while True:
            self.event.wait()
            try:
                integer = self.integers.pop()
                print '%d popped from list by %s' % (integer, self.name)
            except IndexError:
                # catch pop on empty list
                time.sleep(1)

让我们来看一下Python的内部细节。首先来看一下Event的构造函数。使用lock创建event的实例，以保护event flag值，并在设置(set()) event时通知其他线程。

class _Event(_Verbose):
    def __init__(self, verbose=None):
        _Verbose.__init__(self, verbose)
        self.__cond = Condition(Lock())
        self.__flag = False

下面是set()方法。它设置flag值为True并通知其他的线程。当flag的值发生变化或者被改变时，condition的对象常用于保护极其重要的部分。

def set(self):
    self.__cond.acquire()
    try:
        self.__flag = True
        self.__cond.notify_all()
    finally:
        self.__cond.release()

clear()方法设置flag的值为Fale。

def clear(self):
    self.__cond.acquire()
    try:
        self.__flag = False
    finally:
        self.__cond.release()

wait()方法一直阻塞直到set()方法被调用。如果event的flag已经被设置了值，那么wait()方法什么也不做。

def wait(self, timeout=None):
    self.__cond.acquire()
    try:
        if not self.__flag:
            self.__cond.wait(timeout)
    finally:
        self.__cond.release()

Queue

Queue是一种有效的机制，特别是当我们需要在线程之间交换一些数据。有四个主要的方法分别如下：
1.put:放入一个item到队列中。
2.get:从队列中删除一个item，并返回删除的item。
3.task_done:每次一个item被处理完，就需要调用该方法。
4.join：一直阻塞直到所有的items都被处理完。

当你熟悉了这几个重要的方法，那么我们改写成Queue的方法就很简单。

class Producer(threading.Thread):
    """
    Produces random integers to a list
    """

    def __init__(self, queue):
        """
        Constructor.

        @param integers list of integers
        @param queue queue synchronization object
        """
        threading.Thread.__init__(self)
        self.queue = queue
    
    def run(self):
        """
        Thread run method. Append random integers to the integers list at
        random time.
        """
        while True:
            integer = random.randint(0, 256)
            self.queue.put(integer) 
            print '%d put to queue by %s' % (integer, self.name)
            time.sleep(1)



class Consumer(threading.Thread):
    """
    Consumes random integers from a list
    """

    def __init__(self, queue):
        """
        Constructor.

        @param integers list of integers
        @param queue queue synchronization object
        """
        threading.Thread.__init__(self)
        self.queue = queue
    
    def run(self):
        """
        Thread run method. Consumes integers from list
        """
        while True:
            integer = self.queue.get()
            print '%d popped from list by %s' % (integer, self.name)
            self.queue.task_done()

Queue对于我们来说是一个非常好的机制。它把lock的机制替我们实现了，我们不需要关心和重新实现，只需要专注于我们自己的业务逻辑。这是一个很大的优势。我们来看一下Python内部是如何实现的？
当一个元素被增加或者被删除时，为了保护队列Queue的构造函数创建了一个lock对象。一些condition对象被创建去通知一些events，比如：queue is not empty(get() call stops blocking),queue is not full(put() call stops blocking) and all items have been processed(join() call stops blocking).

class Queue:
    def __init__(self, maxsize=0):
        ...
        self.mutex = threading.Lock()
        self.not_empty = threading.Condition(self.mutex)
        self.not_full = threading.Condition(self.mutex)
        self.all_tasks_done = threading.Condition(self.mutex)
        self.unfinished_tasks = 0

put()往队列增加一个元素，如果队列满时将等待。put通知多线程要从队列里面取出数据，如果该队列不是空时将阻塞在get方法。我们来看一下代码：

def put(self, item, block=True, timeout=None):
    ...
    self.not_full.acquire()
    try:
        if self.maxsize > 0:
            ...
            elif timeout is None:
                while self._qsize() == self.maxsize:
                    self.not_full.wait()
        self._put(item)
        self.unfinished_tasks += 1
        self.not_empty.notify()
    finally:
        self.not_full.release()

get()方法是从队列中删除一个元素并返回删除的元素，但是如果该队列为空时则等待。get取出元素后如果该队列还不满则通知所有线程。

def get(self, block=True, timeout=None):
    ...
    self.not_empty.acquire()
    try:
        ...
        elif timeout is None:
            while not self._qsize():
                self.not_empty.wait()
        item = self._get()
        self.not_full.notify()
        return item
    finally:
        self.not_empty.release()

当take_done调用时，未完成的数量自减1.如果计数等于0，然后线程等待队列的join()方法继续执行。

def task_done(self):
    self.all_tasks_done.acquire()
    try:
        unfinished = self.unfinished_tasks - 1
        if unfinished <= 0:
            if unfinished < 0:
                raise ValueError('task_done() called too many times')
            self.all_tasks_done.notify_all()
        self.unfinished_tasks = unfinished
    finally:
        self.all_tasks_done.release()

def join(self):
    self.all_tasks_done.acquire()
    try:
        while self.unfinished_tasks:
            self.all_tasks_done.wait()
    finally:
        self.all_tasks_done.release()

总结

文章比较长，但是对Python的同步机制讲解的淋漓尽致。如果本文对你有帮助，请打赏一下。