Parallelism in Python (continuously updated)

0. References:

  1. 《Python并行编程 中文版》 (Python Parallel Programming Cookbook, Chinese edition): https://python-parallel-programmning-cookbook.readthedocs.io/zh_CN/latest/index.html

1. Threads and processes:

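  1. A process can contain multiple threads running in parallel;
  2. In general, the operating system spends fewer CPU resources creating and managing threads than processes;
  3. Threads are used for small tasks, processes for heavy ones;
  4. Threads within the same process share the address space and other resources, while processes are independent of one another;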
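The shared address space in item 4 can be seen in a few lines. In this minimal sketch (the names shared and worker are only illustrative), three threads append to one module-level list; because they live in the same process, they all modify the same object.

```python
import threading

shared = []  # a single list object in this process's memory

def worker(n):
    shared.append(n)  # every thread appends to the same list

threads = [threading.Thread(target=worker, args=(i,)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(shared))  # [0, 1, 2]
```

Separate processes (for example ones created with multiprocessing) would each get their own copy of shared, so an append in one would not be visible in the others.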

2. Using threads in Python:

2.1 Introduction to multithreading:

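  1. A thread is an independent flow of execution that can run in parallel or concurrently with the other threads in the system.
  2. Threads can share data and resources through the so-called shared memory space.
  3. Each thread essentially consists of three elements: a program counter, registers, and a stack.
  4. A thread's state can be broadly classified as ready, running, or blocked.
  5. Multithreaded programs usually communicate between threads through shared memory, which makes managing that memory the key difficulty of multithreaded programming.
  6. A typical application of threads is parallelizing application software.
  7. Compared with processes, the main advantage of threads is performance: they are cheaper to create and to switch between.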
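Python's threading module does not expose the ready/running/blocked distinction directly; Thread.is_alive() only reports whether a thread has been started and has not yet finished. A small sketch within that limitation:

```python
import threading
import time

def task():
    time.sleep(0.2)  # while sleeping, the thread is blocked

t = threading.Thread(target=task)
before = t.is_alive()  # False: created, but start() not called yet
t.start()
during = t.is_alive()  # True: started and not yet finished (here most likely blocked in sleep)
t.join()
after = t.is_alive()   # False: run() has returned
print(before, during, after)  # False True False
```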

2.2 Multithreading with the threading module:

1. Defining a thread with threading.Thread():

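class threading.Thread(group=None,   ## should be None; reserved for future extensions
                       target=None,  ## the callable that runs when the thread starts
                       name=None,    ## the thread's name; by default a unique name of the form Thread-N is assigned
                       args=(),      ## positional arguments for target, as a tuple (a single argument needs a trailing comma: (i,))
                       kwargs={},    ## keyword arguments for target, as a dict
                       *, daemon=None) ## keyword-only: whether the thread is a daemon thread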

Example:

import threading

def function(i):
    print("function called by thread %i\n" % i)
    return

for i in range(5):
    t = threading.Thread(target=function, args=(i,)) ## create a Thread object t that will run function with argument i
    t.start() ## a new thread does not run until .start() is called on it
    t.join()  ## block the main thread until t finishes; because join() is called inside the loop, the 5 threads run one after another

Output:

function called by thread 0

function called by thread 1

function called by thread 2

function called by thread 3

function called by thread 4
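In the example above, join() is called inside the loop, so each thread finishes before the next one starts. A sketch of the concurrent variant: start every thread first, then join them all. The order of the printed lines may then vary from run to run.

```python
import threading

def function(i):
    print("function called by thread %i" % i)

threads = []
for i in range(5):
    t = threading.Thread(target=function, args=(i,))
    threads.append(t)
    t.start()  # all five threads are started before any join()
for t in threads:
    t.join()   # the main thread waits here for every thread
```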

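2. Identifying the current thread with threading.current_thread().name

  1. A server process usually runs several threads responsible for different operations, so naming threads well is important;
  2. In Python, every thread created with Thread gets a default name (which can be changed);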

Example:

import threading
import time

def first_func():
    print(threading.current_thread().name + " is Starting")
    time.sleep(2)
    print(threading.current_thread().name + " is Exiting")
    return

def second_func():
    print(threading.current_thread().name + " is Starting")
    time.sleep(2)
    print(threading.current_thread().name + " is Exiting")
    return

def third_func():
    print(threading.current_thread().name + " is Starting")
    time.sleep(2)
    print(threading.current_thread().name + " is Exiting")
    return

if __name__ == "__main__":
    t1 = threading.Thread(name="first_func", target=first_func)
    t2 = threading.Thread(name="second_func", target=second_func)
    t3 = threading.Thread(target=third_func) ## no name= given, so a default name is used
    t1.start()
    t2.start()
    t3.start()
    t1.join()
    t2.join()
    t3.join()

Output:

first_func is Starting
second_func is Starting
Thread-36 (third_func) is Starting
first_func is Exiting
second_func is Exiting
Thread-36 (third_func) is Exiting

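As the output shows, a thread whose name is not given with the name= argument gets a default name; since Python 3.10 the default has the form Thread-N (target), as in Thread-36 (third_func) above.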
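Item 2 above says the default name can be changed. A minimal sketch: the name property can be assigned before start(), and code running inside the thread sees the new name through threading.current_thread(). The name worker-A is arbitrary.

```python
import threading

names = []

def record():
    names.append(threading.current_thread().name)  # the name as seen from inside the thread

t = threading.Thread(target=record)  # no name= given, so a default name is assigned
t.name = "worker-A"                  # rename the thread before starting it
t.start()
t.join()
print(names)  # ['worker-A']
```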

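3. Implementing a thread by subclassing threading.Thread:

Implementing a thread with the threading module takes three steps:

  1. Define a subclass of Thread;
  2. Override __init__(self [, args]), calling Thread.__init__(self) first;
  3. Override run(self) with the code the thread should execute; start() calls run() in the new thread;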

Example:

import threading
import time

class myThread(threading.Thread): ## define a subclass of threading.Thread
    def __init__(self, threadID, name, counter):  ## override __init__() and add extra parameters
        threading.Thread.__init__(self) ## initialize the attributes inherited from Thread; must be called before any Thread attribute is used
        self.threadID = threadID ## extra attributes of the subclass
        self.name = name
        self.counter = counter

    def run(self): ## called by start() in the new thread
        print("Starting " + self.name)
        print_time(self.name, self.counter, 5)
        print("Exiting " + self.name)

def print_time(threadName, delay, counter):
    while counter:
        time.sleep(delay)
        print("%s: %s" % (threadName, time.ctime(time.time())))
        counter -= 1

## create the threads
thread1 = myThread(1, "Thread-1", 1)
thread2 = myThread(2, "Thread-2", 2)
## start the threads
thread1.start()
thread2.start()
## wait for both threads to finish
thread1.join()
thread2.join()
print("Exiting Main Thread")

Output:

Starting Thread-1
Starting Thread-2

Thread-1: Wed Jun 21 11:12:09 2023
Thread-2: Wed Jun 21 11:12:10 2023
Thread-1: Wed Jun 21 11:12:10 2023
Thread-1: Wed Jun 21 11:12:11 2023
Thread-2: Wed Jun 21 11:12:12 2023
Thread-1: Wed Jun 21 11:12:12 2023
Thread-1: Wed Jun 21 11:12:13 2023
Exiting Thread-1
Thread-2: Wed Jun 21 11:12:14 2023
Thread-2: Wed Jun 21 11:12:16 2023
Thread-2: Wed Jun 21 11:12:18 2023
Exiting Thread-2
Exiting Main Thread

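Both threads print 5 times, but thread1 sleeps 1 second between prints while thread2 sleeps 2 seconds, so thread2 runs longer (about 10 s versus 5 s) and exits later.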
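Because run() cannot return a value to the caller, a common pattern with a Thread subclass is to store the result on the instance. This sketch uses super().__init__(), which does the same job as threading.Thread.__init__(self); the class SquareThread is a hypothetical example.

```python
import threading

class SquareThread(threading.Thread):  # hypothetical example class
    def __init__(self, n):
        super().__init__(name="square-%d" % n)  # initialize the Thread part first
        self.n = n
        self.result = None  # run() cannot return a value, so the result is stored here

    def run(self):          # executed in the new thread after start()
        self.result = self.n * self.n

threads = [SquareThread(n) for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print([t.result for t in threads])  # [0, 1, 4, 9]
```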

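4. Thread synchronization with Lock (threading.Lock()):

  1. When concurrent threads operate on shared memory and at least one of them modifies the data, the threads race with one another unless a synchronization mechanism is used, and the code can produce invalid results or fail.
  2. The simplest way to solve such races is a lock (Lock). A thread must acquire the Lock before accessing the shared memory and release it when it is done; only then can another thread acquire the Lock and access the resource. In other words, to avoid races, only one thread may access the shared memory at any given moment.
  3. In practice this approach often leads to deadlock, which happens when different threads each hold a lock that another one needs, so they all wait forever.
    See: https://python-parallel-programmning-cookbook.readthedocs.io/zh_CN/latest/chapter2/06_Thread_synchronization_with_Lock_and_Rlock.html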
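The deadlock in item 3 can be reproduced safely with a timeout. In this sketch (lock_a, lock_b and worker are illustrative names), two threads take the same two locks in opposite order. A threading.Barrier lines them up so that each holds one lock while asking for the other, and acquire(timeout=0.5) gives up instead of hanging forever.

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()
barrier = threading.Barrier(2)  # lines the two threads up at fixed points
results = {}

def worker(name, first, second):
    with first:                            # each thread takes a different lock first
        barrier.wait()                     # now both threads hold one lock each
        got = second.acquire(timeout=0.5)  # without a timeout this would block forever
        results[name] = got
        barrier.wait()                     # keep holding `first` until both attempts are over
        if got:
            second.release()

t1 = threading.Thread(target=worker, args=("t1", lock_a, lock_b))
t2 = threading.Thread(target=worker, args=("t2", lock_b, lock_a))  # opposite order
for t in (t1, t2):
    t.start()
for t in (t1, t2):
    t.join()
print(sorted(results.items()))  # [('t1', False), ('t2', False)]
```

Each thread waited for a lock the other one held, which is exactly the deadlock pattern. The usual remedy is to make all threads acquire locks in one agreed global order.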

Example:

import threading

shared_resource_with_lock = 0
shared_resource_with_no_lock = 0
COUNT = 100000
shared_resource_lock = threading.Lock() ## the lock

## with a lock
def increment_with_lock():
    global shared_resource_with_lock ## lets the function rebind the module-level variable; without it, += would raise UnboundLocalError
    for _ in range(COUNT):
        shared_resource_lock.acquire() ## acquire the lock
        shared_resource_with_lock += 1
        shared_resource_lock.release() ## release the lock

def decrement_with_lock():
    global shared_resource_with_lock
    for _ in range(COUNT):
        shared_resource_lock.acquire()
        shared_resource_with_lock -= 1
        shared_resource_lock.release()


## without a lock
def increment_without_lock():
    global shared_resource_with_no_lock
    for _ in range(COUNT):
        shared_resource_with_no_lock += 1

def decrement_without_lock():
    global shared_resource_with_no_lock
    for _ in range(COUNT):
        shared_resource_with_no_lock -= 1


if __name__ == "__main__":
    t1 = threading.Thread(target=increment_with_lock)
    t2 = threading.Thread(target=decrement_with_lock)
    t3 = threading.Thread(target=increment_without_lock)
    t4 = threading.Thread(target=decrement_without_lock)

    ## start the threads
    t1.start()
    t2.start()
    t3.start()
    t4.start()
    ## wait for all threads to finish
    t1.join()
    t2.join()
    t3.join()
    t4.join()
    print("the value of shared variable with lock management is %s" % shared_resource_with_lock)
    print("the value of shared variable with race condition is %s" % shared_resource_with_no_lock)

Output:

the value of shared variable with lock management is 0
the value of shared variable with race condition is 0

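Although the unlocked counter also came out correct in the run above, this is not guaranteed: += is a read-modify-write sequence, and if a thread switch happens between the read and the write, an update is lost, so repeated runs can produce a wrong (nonzero) value. With the lock, the result is always 0.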
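The race is easier to trigger when the read and the write are separated explicitly. This sketch (the counter and function names are illustrative) splits += into a read, a time.sleep(0) that invites a thread switch, and a write. The unlocked total usually ends up below 4000; the locked one is always 4000.

```python
import threading
import time

N_THREADS, N_STEPS = 4, 1000
unsafe_total = 0
safe_total = 0
lock = threading.Lock()

def unsafe_add():
    global unsafe_total
    for _ in range(N_STEPS):
        value = unsafe_total      # read
        time.sleep(0)             # give other threads a chance to run
        unsafe_total = value + 1  # write back a value that may already be stale

def safe_add():
    global safe_total
    for _ in range(N_STEPS):
        with lock:                # read, switch and write happen as one unit
            value = safe_total
            time.sleep(0)
            safe_total = value + 1

def run(target):
    threads = [threading.Thread(target=target) for _ in range(N_THREADS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

run(unsafe_add)
run(safe_add)
print(unsafe_total, safe_total)  # the first value varies from run to run; the second is 4000
```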

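Although locks can in theory prevent races in multithreaded code, they can hurt other aspects of a program. Locking often adds unnecessary overhead and can limit a program's scalability and readability. More importantly, memory shared among processes sometimes has to be accessed according to priorities, and locks can conflict with such priorities. In practice, applications that rely on locks are also noticeably harder to debug. So, where possible, it is better to use other ways of synchronizing access to shared memory and avoiding race conditions.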

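5. Thread synchronization with RLock:

  1. To ensure that "only the thread that acquired the lock can release it", use an RLock() object;
  2. Like Lock(), RLock() has two methods, acquire() and release();
  3. RLock() has three properties:
      1). Whoever acquires it releases it. If thread A holds the lock, thread B cannot release it; only A can (a release() from B raises RuntimeError);
      2). The same thread can acquire it several times, i.e. acquire() may be called repeatedly;
      3). It must be released as many times as it was acquired; only the last release() changes the RLock's state to unlocked;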
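The three properties, and the contrast with a plain Lock, can be checked directly. A minimal sketch (the helper attempt_from_other_thread is hypothetical):

```python
import threading

# A plain Lock is not reentrant: a second acquire by the same thread would block.
lock = threading.Lock()
lock.acquire()
second_lock_acquire = lock.acquire(blocking=False)  # False instead of blocking forever
lock.release()

rlock = threading.RLock()
rlock.acquire()
rlock.acquire()  # property 2: the owner may acquire again
errors = []

def release_from_other_thread():
    try:
        rlock.release()  # property 1: only the owner may release
    except RuntimeError as exc:
        errors.append(type(exc).__name__)

t = threading.Thread(target=release_from_other_thread)
t.start()
t.join()

def attempt_from_other_thread():
    """Try a non-blocking acquire from a fresh thread and report whether it succeeded."""
    box = []
    def target():
        ok = rlock.acquire(blocking=False)
        if ok:
            rlock.release()
        box.append(ok)
    worker = threading.Thread(target=target)
    worker.start()
    worker.join()
    return box[0]

rlock.release()                          # property 3: first of two releases
after_one = attempt_from_other_thread()  # False: the owner still holds it once
rlock.release()                          # the second release unlocks it
after_two = attempt_from_other_thread()  # True
print(second_lock_acquire, errors, after_one, after_two)
# False ['RuntimeError'] False True
```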

Example:

import threading
import time

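class Box(object):
    lock = threading.RLock() ## class attribute: one RLock shared by all Box instances

    def __init__(self):
        self.total_items = 0

    def execute(self, n):
        Box.lock.acquire() ## when called from add()/remove(), this thread already holds the lock; an RLock lets it acquire again
        self.total_items += n
        Box.lock.release()

    def add(self):
        Box.lock.acquire()
        self.execute(1)
        Box.lock.release()

    def remove(self):
        Box.lock.acquire()
        self.execute(-1)
        Box.lock.release()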

def adder(box, items):
    while items > 0:
        print("adding 1 item in the box")
        box.add()
        time.sleep(1)
        items -= 1

def remover(box, items):
    while items > 0:
        print("removing 1 item in the box")
        box.remove()
        time.sleep(1)
        items -= 1


if __name__ == "__main__":
    items = 5
    print("putting %s items in the box"% items)
    box = Box()
    t1 = threading.Thread(target=adder, args=(box, items))
    t2 = threading.Thread(target=remover, args=(box, items))

    t1.start()
    t2.start()

    t1.join()
    t2.join()
    print("%s items still remain in the box " % box.total_items)

Output:

putting 5 items in the box
adding 1 item in the box
removing 1 item in the box
adding 1 item in the box
removing 1 item in the box

removing 1 item in the box
adding 1 item in the box

removing 1 item in the box
adding 1 item in the box
adding 1 item in the box
removing 1 item in the box

0 items still remain in the box 

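Box.execute() acquires the RLock, and so do add() and remove(), which call execute() while still holding it. Because an RLock is reentrant, the same thread can acquire it a second time inside execute(); with a plain Lock, that second acquire() would block forever. So whether a thread calls add() (via adder()) or remove() (via remover()), every step acquires the lock and then releases it.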

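6. Thread synchronization with semaphores:

  1. A semaphore is an abstract data type, classically managed by the operating system, used to synchronize access to a shared resource among multiple threads;
  2. Internally, a semaphore holds a counter that indicates how many concurrent accesses to the shared resource are still allowed;
  3. In the threading module, a semaphore is operated with two methods: acquire() and release()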

Example:

import threading
import time
import random

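semaphore = threading.Semaphore(0) ## think of it as an internal counter: acquire() decrements it (requesting a resource) and blocks while it is 0; release() increments it (releasing a resource)
print("init semaphore %s" % semaphore._value) ## _value is a private attribute, read here only to show the counter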

def consumer():
    print("consumer is waiting.")
    semaphore.acquire()
    print("consumer notify: consumed item number %s" % item)
    print("consumer semaphore %s" % semaphore._value)

def producer():
    global item
    time.sleep(10)
    item = random.randint(0, 1000)
    print("producer notify : produced item number %s" % item)
    semaphore.release()
    print("producer semaphore %s" % semaphore._value)

if __name__ == "__main__":
    for _ in range(0, 5):
        t1 = threading.Thread(target=producer)
        t2 = threading.Thread(target=consumer)
        t1.start()
        t2.start()
        t1.join()
        t2.join()
    print("program terminated")

Output:

init semaphore 0
consumer is waiting.
producer notify : produced item number 756
producer semaphore 1
consumer notify: consumed item number 756
consumer semaphore 0
consumer is waiting.
producer notify : produced item number 948
producer semaphore 1
consumer notify: consumed item number 948
consumer semaphore 0
consumer is waiting.
producer notify : produced item number 597
producer semaphore 1
consumer notify: consumed item number 597
consumer semaphore 0
consumer is waiting.
producer notify : produced item number 239
producer semaphore 1
consumer notify: consumed item number 239
consumer semaphore 0
consumer is waiting.
producer notify : produced item number 141
producer semaphore 1
consumer notify: consumed item number 141
consumer semaphore 0
program terminated

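semaphore = threading.Semaphore(0) initializes the counter to 0; the purpose is to synchronize two or more threads.
consumer() blocks in semaphore.acquire() until producer() has produced an item and called semaphore.release(); only then does consumer() get the resource.
Accordingly, the counter goes 0 --> 1 --> 0, repeated once per loop iteration.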

A special use of a semaphore is the mutex: a semaphore with initial value 1, which provides mutually exclusive access to data or a resource.
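The counter described in item 2 can also cap how many threads use a resource at once. In this sketch, Semaphore(2) lets at most two of six worker threads into the guarded block at a time, and max_active records the highest concurrency actually observed (the counters are illustrative and protected by a separate Lock).

```python
import threading
import time

slots = threading.Semaphore(2)  # at most 2 threads may hold a slot at the same time
state_lock = threading.Lock()   # protects the two counters below
active = 0
max_active = 0

def worker():
    global active, max_active
    with slots:                 # acquire() on entry, release() on exit
        with state_lock:
            active += 1
            max_active = max(max_active, active)
        time.sleep(0.1)         # simulate work while holding the slot
        with state_lock:
            active -= 1

threads = [threading.Thread(target=worker) for _ in range(6)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(max_active)  # never more than 2
```

With threading.Semaphore(1), the mutex described above, max_active could never exceed 1.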

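7. Thread synchronization with conditions:

  1. A condition refers to a change in the program's state;
  2. Some threads wait for a condition to occur, and other threads notify them when it does. Once the condition occurs, a waiting thread re-acquires the underlying lock and thus gets exclusive access to the shared resource.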

Example:

from threading import Thread, Condition
import time

items = []
condition = Condition() ## A condition variable allows one or more threads to wait until they are notified by another thread.

class Consumer(Thread):
    def __init__(self):
        Thread.__init__(self)

    def consume(self):
        condition.acquire()
        while len(items) == 0: ## use while, not if: re-check the state every time wait() returns
            print("Consumer notify : no item to consume")
            condition.wait() ## release the lock and wait until notified (or until a timeout occurs), then re-acquire it
        items.pop()
        print("Consumer notify : consumed 1 item")
        print("Consumer notify : items to consume are " + str(len(items)))

        condition.notify() ## wake up one thread waiting on this condition, if any
        condition.release()

    def run(self):
        for _ in range(0, 20):
            time.sleep(2)
            self.consume()


class Producer(Thread):
    def __init__(self):
        Thread.__init__(self)

    def produce(self):
        condition.acquire()
        while len(items) == 10: ## the list is full: wait until the consumer has removed an item
            print("Producer notify : items producted are " + str(len(items)))
            print("Producer notify : stop the production!!")
            condition.wait()
        items.append(1)
        print("Producer notify : total items producted " + str(len(items)))
        condition.notify()
        condition.release()

    def run(self):
        for _ in range(0, 20):
            time.sleep(1)
            self.produce()


if __name__ == "__main__":
    producer = Producer()
    consumer = Consumer()
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()

Output:

Producer notify : total items producted 1
Consumer notify : consumed 1 item
Consumer notify : items to consume are 0
Producer notify : total items producted 1
Producer notify : total items producted 2
Consumer notify : consumed 1 item
Consumer notify : items to consume are 1
Producer notify : total items producted 2
Producer notify : total items producted 3
Consumer notify : consumed 1 item
Consumer notify : items to consume are 2
Producer notify : total items producted 3
Producer notify : total items producted 4
Consumer notify : consumed 1 item
Consumer notify : items to consume are 3
Producer notify : total items producted 4
Producer notify : total items producted 5
Consumer notify : consumed 1 item
Consumer notify : items to consume are 4
Producer notify : total items producted 5
Producer notify : total items producted 6
Consumer notify : consumed 1 item
Consumer notify : items to consume are 5
Producer notify : total items producted 6
Producer notify : total items producted 7
Consumer notify : consumed 1 item
Consumer notify : items to consume are 6
Producer notify : total items producted 7
Producer notify : total items producted 8
Consumer notify : consumed 1 item
Consumer notify : items to consume are 7
Producer notify : total items producted 8
Producer notify : total items producted 9
Consumer notify : consumed 1 item
Consumer notify : items to consume are 8
Producer notify : total items producted 9
Producer notify : total items producted 10
Consumer notify : consumed 1 item
Consumer notify : items to consume are 9
Producer notify : total items producted 10
Consumer notify : consumed 1 item
Consumer notify : items to consume are 9
Consumer notify : consumed 1 item
Consumer notify : items to consume are 8
Consumer notify : consumed 1 item
Consumer notify : items to consume are 7
Consumer notify : consumed 1 item
Consumer notify : items to consume are 6
Consumer notify : consumed 1 item
Consumer notify : items to consume are 5
Consumer notify : consumed 1 item
Consumer notify : items to consume are 4
Consumer notify : consumed 1 item
Consumer notify : items to consume are 3
Consumer notify : consumed 1 item
Consumer notify : items to consume are 2
Consumer notify : consumed 1 item
Consumer notify : items to consume are 1
Consumer notify : consumed 1 item
Consumer notify : items to consume are 0

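The whole process is a bit convoluted. In short: the producer adds an item every second and the consumer removes one every 2 seconds, so the list grows until the producer has made its 20 items, after which the consumer drains the rest. For a simpler introduction to thread synchronization with Condition, see this example: https://blog.csdn.net/lzanze/article/details/105351064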
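A shorter variant of the same idea, sketched with the with statement and Condition.wait_for(), which re-checks a predicate in a loop until it is true. The buffer size CAPACITY = 3 and the function names are arbitrary choices for illustration.

```python
import threading

CAPACITY = 3                  # hypothetical buffer size for this sketch
buffer = []
cond = threading.Condition()
consumed = []

def producer(n):
    for i in range(n):
        with cond:
            cond.wait_for(lambda: len(buffer) < CAPACITY)  # wait while the buffer is full
            buffer.append(i)
            cond.notify_all()  # wake the consumer

def consumer(n):
    for _ in range(n):
        with cond:
            cond.wait_for(lambda: len(buffer) > 0)         # wait while the buffer is empty
            consumed.append(buffer.pop(0))                 # take the oldest item
            cond.notify_all()  # wake the producer

p = threading.Thread(target=producer, args=(10,))
c = threading.Thread(target=consumer, args=(10,))
p.start()
c.start()
p.join()
c.join()
print(consumed)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```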

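8. Thread synchronization with events:

An event is an object used for communication between threads. It manages an internal flag: set() sets it to True, clear() resets it to False, and wait() blocks until it is True.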

Example:

import time
from threading import Thread, Event
import random

items = []
event = Event()

class consumer(Thread):
    def __init__(self, items, event):
        Thread.__init__(self)
        self.items = items
        self.event = event
    
    def run(self):
        while True: ## loops forever, so this thread never finishes
            time.sleep(2)
            self.event.wait() ## block here until the event is set (self.event.set())
            item = self.items.pop()
            print("Consumer notify : %d popped from list by %s" % (item, self.name))


class producer(Thread):
    def __init__(self, items, event):
        Thread.__init__(self)
        self.items = items
        self.event = event

    def run(self):
        for _ in range(10):
            time.sleep(2)
            item = random.randint(0, 256)
            self.items.append(item) ## append item to the end of the list, then signal through self.event.set() and self.event.clear()
            print('Producer notify : item %d appended to list by %s' % (item, self.name))
            print('Producer notify : event set by %s' % self.name)
            self.event.set() ## set() sets the internal flag to True (is_set() == True)
            print("Produce notify : event cleared by %s" % self.name)
            #print("Produce set event label : ", self.event.is_set())
            self.event.clear() ## clear() resets the internal flag to False (is_set() == False)
            #print("Produce clear event label : ", self.event.is_set())


if __name__ == "__main__":
    t1 = producer(items, event)
    t2 = consumer(items, event)
    t1.start()
    t2.start()
    t1.join()
    t2.join()

Output (partial):

Producer notify : item 140 appended to list by Thread-64
Producer notify : event set by Thread-64
Produce notify : event cleared by Thread-64
Consumer notify : 140 popped from list by Thread-65
Producer notify : item 42 appended to list by Thread-64
Producer notify : event set by Thread-64
Produce notify : event cleared by Thread-64
Producer notify : item 101 appended to list by Thread-64
Producer notify : event set by Thread-64
Produce notify : event cleared by Thread-64
Consumer notify : 101 popped from list by Thread-65
Producer notify : item 213 appended to list by Thread-64
Producer notify : event set by Thread-64
Produce notify : event cleared by Thread-64
Producer notify : item 31 appended to list by Thread-64
Producer notify : event set by Thread-64
Produce notify : event cleared by Thread-64
Consumer notify : 31 popped from list by Thread-65
Producer notify : item 235 appended to list by Thread-64
Producer notify : event set by Thread-64
Produce notify : event cleared by Thread-64

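When I ran this script, it was still running after more than 20 minutes, so I stopped it (partial output above). It never ends because the consumer's run() loops forever (while True), so t2.join() never returns. Note also that the producer calls clear() right after set(): the consumer only notices the event if it is already blocked in wait() at that moment, which is why only some of the appended items are popped.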
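A sketch of one way to make such a producer/consumer pair finish. A second Event, done, tells the consumer that no more items will come, and the consumer (not the producer) clears item_ready after draining the list under a lock, so no signal is lost. The names item_ready, done and popped are illustrative.

```python
import threading

items = []
lock = threading.Lock()          # protects items
item_ready = threading.Event()   # "there may be something to pop"
done = threading.Event()         # "the producer has finished"
popped = []

def producer(n):
    for i in range(n):
        with lock:
            items.append(i)
            item_ready.set()     # wake the consumer
    with lock:
        done.set()               # no more items will be appended
        item_ready.set()         # wake the consumer so it can notice done

def consumer():
    while True:
        item_ready.wait()        # block until the producer signals
        with lock:
            popped.extend(items) # take everything appended so far, in order
            items.clear()
            item_ready.clear()   # reset only after draining, under the same lock
            if done.is_set():
                return           # every item has been taken

p = threading.Thread(target=producer, args=(10,))
c = threading.Thread(target=consumer)
p.start()
c.start()
p.join()
c.join()
print(popped)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

queue.Queue, covered in section 10, does this kind of bookkeeping internally.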

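9. Using the with statement:

  1. When two related operations must be executed before and after a block of code, the with statement can be used;
  2. The with statement acquires and releases a resource at well-defined points; objects that support it are called "context managers";
  3. In the threading module, all objects that have acquire() and release() methods (including Lock, RLock, Condition, and Semaphore) can be used with the with statement;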

Example:

import threading
import logging
logging.basicConfig(level=logging.DEBUG, format="(%(threadName)-10s) %(message)s", )

def threading_with(statement):
    with statement:
        logging.debug("%s acquired via with" % statement)

def threading_not_with(statement):
    statement.acquire()
    try:
        logging.debug("%s acquired directly" % statement)
    finally:
        statement.release()


if __name__ == "__main__":
    lock = threading.Lock()
    rlock = threading.RLock()
    condition = threading.Condition()
    mutex = threading.Semaphore(1)
    threading_synchronization_list = [lock, rlock, condition, mutex] ## the synchronization objects to test

    for statement in threading_synchronization_list:
        t1 = threading.Thread(target=threading_with, args=(statement,))
        t2 = threading.Thread(target=threading_not_with, args=(statement,))
        t1.start()
        t2.start()
        t1.join()
        t2.join()

Output:

(Thread-68 (threading_with)) <...> acquired via with
(Thread-69 (threading_not_with)) <...> acquired directly
(Thread-70 (threading_with)) <...> acquired via with
(Thread-71 (threading_not_with)) <...> acquired directly
(Thread-72 (threading_with)) <...> acquired via with
(Thread-73 (threading_not_with)) <...> acquired directly
(Thread-74 (threading_with)) <...> acquired via with
(Thread-75 (threading_not_with)) <...> acquired directly

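The output shows each synchronization object (Lock, RLock, Condition, Semaphore) being acquired once through with and once through explicit acquire()/release(). The two forms are equivalent: with calls acquire() on entry and release() on exit, even if the block raises an exception, just like the try/finally in threading_not_with().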
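The exception-safety point can be checked with a short sketch: a block that raises inside with still leaves the lock released. The function name risky is illustrative.

```python
import threading

lock = threading.Lock()

def risky():
    with lock:                    # acquire() on entry
        raise ValueError("boom")  # release() still runs as the exception propagates

try:
    risky()
except ValueError:
    pass

print(lock.locked())  # False
```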

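10. Thread communication with queue:

Compared with the synchronization primitives of the threading module, queue is simpler and safer to use, because queue.Queue performs the necessary locking internally.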
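A minimal producer/consumer sketch with queue.Queue: no explicit Lock, Condition or Event is needed. Using None as an end-of-stream marker (SENTINEL) is an assumption of this sketch, not something the queue module requires.

```python
import queue
import threading

q = queue.Queue()
SENTINEL = None     # hypothetical end-of-stream marker for this sketch
received = []

def producer(n):
    for i in range(n):
        q.put(i)    # Queue handles the locking internally
    q.put(SENTINEL) # tell the consumer to stop

def consumer():
    while True:
        item = q.get()  # blocks until an item is available
        if item is SENTINEL:
            break
        received.append(item)

p = threading.Thread(target=producer, args=(5,))
c = threading.Thread(target=consumer)
p.start()
c.start()
p.join()
c.join()
print(received)  # [0, 1, 2, 3, 4]
```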


To be continued…
