Boost_1_53_0终于迎来了久违的Boost.Lockfree模块,本着学习的心态,将其翻译如下。(原文地址:http://www.boost.org/doc/libs/1_53_0/doc/html/lockfree.html)
Table of Contents
目录
简介及动机
例子
解释
数据结构
内存管理
ABA预防
进程间支持
附录
未来发展
The term non-blocking denotes concurrent data structures,which do not use traditional synchronization primitives like guards to ensurethread-safety. Maurice Herlihy and Nir Shavit (compare "TheArt of Multiprocessor Programming")distinguish between 3 types of non-blocking data structures, each havingdifferent properties:
术语无阻塞表示并发数据结构,它们不使用传统同步原语例如守卫者来保证线程安全。MauriceHerlihy和NirShavit(相比《多处理器编程艺术》)区分了三种类型的无阻塞数据结构,每种均具有不同的特性:
Some data structures can only be implemented in a lock-freemanner, if they are used under certain restrictions. The relevant aspects forthe implementation of boost.lockfree
are thenumber of producer and consumer threads. Single-producer (sp) or multiple producer (mp) means that only a single thread ormultiple concurrent threads are allowed to add data to a data structure. Single-consumer (sc) or Multiple-consumer (mc) denote the equivalent for the removalof data from the data structure.
如果使用在一定的限制条件下,一些数据结构只能被无锁的方式实现。与boost.lockfree实现相对应的是生产者线程和消费者线程的数目。单生产者(sp)或多生产者(mp)意味着只有一个线程或多个并发线程被允许添加数据至某数据结构中。单消费者(sc)或多消费者(mc)则对应于从数据结构中移除数据。
Non-blocking data structures do not rely on locks and mutexes toensure thread-safety. The synchronization is done completely in user-spacewithout any direct interaction with the operating system [4].This implies that they are not prone to issues like priority inversion (alow-priority thread needs to wait for a high-priority thread).
无阻塞数据结构不依赖于锁和互斥量来保证线程安全。同步完全在用户空间中完成,而不需要与操作系统的任何直接交互。这意味着它们不容易出现例如优先反转(低优先级线程需要等待高优先级线程)等问题。
Instead of relying on guards, non-blocking data structures require atomicoperations (specificCPU instructions executed without interruption). This means that any threadeither sees the state before or after the operation, but no intermediate statecan be observed. Not all hardware supports the same set of atomic instructions.If it is not available in hardware, it can be emulated in software usingguards. However this has the obvious drawback of losing the lock-free property.
无阻塞数据结构需要原子操作(特定的CPU执行指令不中断)而不是依赖于“卫兵”。这意味着任何线程只能看到操作之前或之后的状态,而不能观察到任何中间状态。不是所有的硬件都支持同样的原子指令集。如果硬件上不支持,则可以在软件上使用“卫兵”来模拟。但这因为失去了无锁的性质而具有明显的缺陷,。
When discussing the performance of non-blocking data structures,one has to distinguish between amortized and worst-case costs. The definition of 'lock-free'and 'wait-free' only mention the upper bound of an operation. Thereforelock-free data structures are not necessarily the best choice for every usecase. In order to maximise the throughput of an application one should considerhigh-performance concurrent data structures [5].
在讨论无锁数据结构性能时,首先应该区分“平均”和“最坏情况”开销。“无锁”和“无等待”的定义仅提及了一个操作的上限。因此无锁数据结构并不总是在任何情况下都是最好的选择。为了使应用程序的吞吐量最大化,你应该考虑高性能并发数据结构。
Lock-free data structures will be a better choice in order tooptimize the latency of a system or to avoid priority inversion, which may be necessaryin real-time applications. In general we advise to consider if lock-free datastructures are necessary or if concurrent data structures are sufficient. Inany case we advice to perform benchmarks with different data structures for aspecific workload.
在优化系统延时或避免优先反转方面(在实时应用中可能需要这样),无锁数据结构将会是一个更好的选择。一般来说我们建议考虑是否需要无锁数据结构或者是否并发数据结构就够了?在任何情况下,我们都建议对一个特定的工作量利用不同的数据结构来执行基准测试。
Apart from locks and mutexes (which we are not using in boost.lockfree
anyway),there are three other aspects, that could violate lock-freedom:
除了锁和互斥量(我们不会在boost.lockfree中使用它们),还有其他三处可能会违反锁定自由的地方:
AtomicOperations
原子操作
Somearchitectures do not provide the necessary atomic operations in natively inhardware. If this is not the case, they are emulated in software usingspinlocks, which by itself is blocking.
有些系统架构在硬件层面不提供必需的原生原子操作。如果是这种情况,将会在软件层面使用自旋锁来模拟,这本身是阻塞的。
MemoryAllocations
内存分配
Allocatingmemory from the operating system is not lock-free. This makes it impossible toimplement true dynamically-sized non-blocking data structures. The node-baseddata structures of boost.lockfree
usea memory pool to allocate the internal nodes. If this memory pool is exhausted,memory for new nodes has to be allocated from the operating system. However alldata structures of boost.lockfree
canbe configured to avoid memory allocations (instead the specific calls will fail).This is especially useful for real-time systems that require lock-free memoryallocations.
从操作系统分配内存不是无锁的。这使得不可能实现真正动态大小的无阻塞数据结构。boost.lockfree中基于节点的数据结构使用内存池来分配内部节点。如果内存池被耗尽,新节点的内存就需要从操作系统中分配。但是所有boost.lockfree中的数据结构都能配置为避免内存分配(相对应的,某些调用将失败)。这对那些需要无锁内存分配的实时系统特别有用。
ExceptionHandling
异常处理
TheC++ exception handling does not give any guarantees about its real-timebehavior. We therefore do not encourage the use of exceptions and exceptionhandling in lock-free code.
C++异常处理对其实时性不做任何保证。因此我们不鼓励在无锁代码中使用异常和异常处理。
boost.lockfree
implementsthree lock-free data structures:
boost.lockfree实现了三种无锁数据结构:
alock-free multi-produced/multi-consumer queue
一个无锁的多生产者/多消费者队列
alock-free multi-produced/multi-consumer stack
一个无锁的多生产者/多消费者栈
await-free single-producer/single-consumer queue (commonly known as ringbuffer)
一个无等待的单生产者/单消费者队列(通常被称为环形缓冲区)
The data structures can be configured with Boost.Parameter-styletemplates:
数据结构能使用Boost.Parameter类型模板进行配置:
Configuresthe data structure as fixed sized. Theinternal nodes are stored inside an array and they are addressed by arrayindexing. This limits the possible size of the queue to the number of elementsthat can be addressed by the index type (usually 2**16-2), but on platformsthat lack double-width compare-and-exchange instructions, this is the best wayto achieve lock-freedom.
配置数据结构为固定大小。内部节点被存储在一个数组内,并使用数组索引定位。这将队列可能的大小限制在了能被索引类型映射的元素总数(通常是216-2),但是在缺少双宽的比较交换(compare-and-exchange,注:一般是记为compare-and-swap,CAS)指令的平台上,这是实现无锁的最好方式。
Setsthe capacity of a data structure at compile-time.This implies that a data structure is fixed-sized.
在编译时设置数据结构的容量。这意味着数据结构是固定大小的。
Definesthe allocator. boost.lockfree
supportsstateful allocator and is compatible with Boost.Interprocess allocators.
定义分配器。boost.lockfree支持具状态分配器,并且与Boost.Interprocess的分配器兼容。
The boost::lockfree::queue classimplements a multi-writer/multi-reader queue. The following example shows howinteger values are produced and consumed by 4 threads each:
类 boost::lockfree::queue 实现了一个多写入/多读取队列。下面的例子展示了如何产生整数,并被4个线程分别消费:
#include <boost/thread/thread.hpp> #include <boost/lockfree/queue.hpp> #include <iostream> #include <boost/atomic.hpp> boost::atomic_int producer_count(0); boost::atomic_int consumer_count(0); boost::lockfree::queue<int> queue(128); const int iterations = 10000000; const int producer_thread_count = 4; const int consumer_thread_count = 4; void producer(void) { for (int i = 0; i != iterations; ++i) { int value = ++producer_count; while (!queue.push(value)) ; } } boost::atomic<bool> done (false); void consumer(void) { int value; while (!done) { while (queue.pop(value)) ++consumer_count; } while (queue.pop(value)) ++consumer_count; } int main(int argc, char* argv[]) { using namespace std; cout << "boost::lockfree::queue is "; if (!queue.is_lock_free()) cout << "not "; cout << "lockfree" << endl; boost::thread_group producer_threads, consumer_threads; for (int i = 0; i != producer_thread_count; ++i) producer_threads.create_thread(producer); for (int i = 0; i != consumer_thread_count; ++i) consumer_threads.create_thread(consumer); producer_threads.join_all(); done = true; consumer_threads.join_all(); cout << "produced " << producer_count << " objects." << endl; cout << "consumed " << consumer_count << " objects." << endl; }
The program output is:
程序输出:
produced 40000000 objects. consumed 40000000 objects.
The boost::lockfree::stack classimplements a multi-writer/multi-reader stack. The following example shows howinteger values are produced and consumed by 4 threads each:
类boost::lockfree::stack实现了一个多写入/多读取栈。
下面的例子展示了如何产生整数,并被4个线程分别消费:
#include <boost/thread/thread.hpp> #include <boost/lockfree/stack.hpp> #include <iostream> #include <boost/atomic.hpp> boost::atomic_int producer_count(0); boost::atomic_int consumer_count(0); boost::lockfree::stack<int> stack(128); const int iterations = 1000000; const int producer_thread_count = 4; const int consumer_thread_count = 4; void producer(void) { for (int i = 0; i != iterations; ++i) { int value = ++producer_count; while (!stack.push(value)) ; } } boost::atomic<bool> done (false); void consumer(void) { int value; while (!done) { while (stack.pop(value)) ++consumer_count; } while (stack.pop(value)) ++consumer_count; } int main(int argc, char* argv[]) { using namespace std; cout << "boost::lockfree::stack is "; if (!stack.is_lock_free()) cout << "not "; cout << "lockfree" << endl; boost::thread_group producer_threads, consumer_threads; for (int i = 0; i != producer_thread_count; ++i) producer_threads.create_thread(producer); for (int i = 0; i != consumer_thread_count; ++i) consumer_threads.create_thread(consumer); producer_threads.join_all(); done = true; consumer_threads.join_all(); cout << "produced " << producer_count << " objects." << endl; cout << "consumed " << consumer_count << " objects." << endl; }
The program output is:
程序输出:
produced 4000000 objects. consumed 4000000 objects.
The boost::lockfree::spsc_queue classimplements a wait-free single-producer/single-consumer queue. The followingexample shows how integer values are produced and consumed by 2 separatethreads:
类 boost::lockfree::spsc_queue 实现了一个无等待的
单生产者/单消费者队列。下面的例子展示了如何产生整数,并被2个单独的线程消费:
#include <boost/thread/thread.hpp> #include <boost/lockfree/spsc_queue.hpp> #include <iostream> #include <boost/atomic.hpp> int producer_count = 0; boost::atomic_int consumer_count (0); boost::lockfree::spsc_queue<int, boost::lockfree::capacity<1024> > spsc_queue; const int iterations = 10000000; void producer(void) { for (int i = 0; i != iterations; ++i) { int value = ++producer_count; while (!spsc_queue.push(value)) ; } } boost::atomic<bool> done (false); void consumer(void) { int value; while (!done) { while (spsc_queue.pop(value)) ++consumer_count; } while (spsc_queue.pop(value)) ++consumer_count; } int main(int argc, char* argv[]) { using namespace std; cout << "boost::lockfree::queue is "; if (!spsc_queue.is_lock_free()) cout << "not "; cout << "lockfree" << endl; boost::thread producer_thread(producer); boost::thread consumer_thread(consumer); producer_thread.join(); done = true; consumer_thread.join(); cout << "produced " << producer_count << " objects." << endl; cout << "consumed " << consumer_count << " objects." << endl; }
The program output is:
程序输出:
produced 10000000 objects. consumed 10000000 objects.
数据结构
内存分配
ABA阻止
进程间支持
The implementations are implementations of well-known datastructures. The queue is based on Simple, Fast, and Practical Non-Blocking and Blocking ConcurrentQueue Algorithms by Michael Scott and Maged Michael,the stack is based on Systemsprogramming: coping with parallelism by R. K. Treiber andthe spsc_queue is considered as 'folklore' and is implemented in severalopen-source projects including the linux kernel. All data structures arediscussed in detail in "TheArt of Multiprocessor Programming" by Herlihy & Shavit.
该实现是著名的数据结构的实现。队列是基于MichaelScott和MagedMichael提出的简单、快速且实用的无阻塞和阻塞并发队列算法,栈是基于R. K.Treiber的《系统编程:并发处理》,特殊队列(spsc_queue)被认为是“民间传说(folklore)”,它是被几个开源项目实现的,其中包括linux内核。所有的数据结构都在Herlihy& Shavit的《多处理器编程艺术》中被详细讨论。
The lock-free boost::lockfree::queue and boost::lockfree::stack classesare node-based data structures, based on a linked list. Memory management oflock-free data structures is a non-trivial problem, because we need to avoidthat one thread frees an internal node, while another thread still uses it. boost.lockfree
usesa simple approach not returning any memory to the operating system. Insteadthey maintain a free-list in order to reuse them later. This isdone for two reasons: first, depending on the implementation of the memoryallocator freeing the memory may block (so the implementation would not belock-free anymore), and second, most memory reclamation algorithms arepatented.
无锁的类
boost::lockfree::queue和
boost::lockfree::stack是基于节点的数据结构,它们基于一个链表。无锁数据结构的内存管理是一个不平凡的问题,因为我们需要避免一个线程释放了一个内部节点,但另一个线程仍然在使用它的情况。Boost.lockfree使用了一个简单的方法不归还任何内存至操作系统。相反,它们维护了一个空链表以便之后再使用它们。这样做是出于两个原因:首先,依赖于内存分配器的实现释放内存,可能会阻塞(因此该实现将不再无锁),其次,大多数内存回收算法都是具有专利的。
The ABA problem is a common problem when implementing lock-freedata structures. The problem occurs when updating an atomic variable using a compare_exchange
operation:if the value A was read, thread 1 changes it to say C and tries to update thevariable, it uses compare_exchange
towrite C, only if the current value is A. This might be a problem if in themeanwhile thread 2 changes the value from A to B and back to A, because thread1 does not observe the change of the state. The common way to avoid the ABAproblem is to associate a version counter with the value and change bothatomically.
ABA问题是实现无锁数据结构的一个常见问题。当使用比较交换运算更新一个原子变量时,问题就会出现:如果值A被读取,线程1试图将它改为C并尝试更新该变量,它使用比较交换来写C,仅当当前值为A时。如果同时线程2将值从A变为B再变为A,这将是个问题,因为线程1没有观察到状态的改变(具体可参考:http://hustpawpaw.blog.163.com/blog/static/184228324201210811243127/)。通常避免ABA问题的方法是关联一个版本计数器至该值,并且一起原子的变化。
boost.lockfree
usesa tagged_ptr
helperclass which associates a pointer with an integer tag. This usually requires adouble-width compare_exchange
, whichis not available on all platforms. IA32 did not provide the cmpxchg8b
opcodebefore the pentium processor and it is also lacking on many RISC architectureslike PPC. Early X86-64 processors also did not provide a cmpxchg16b
instruction.On 64bit platforms one can work around this issue, because often not the full64bit address space is used. On X86_64 for example, only 48bit are used for theaddress, so we can use the remaining 16bit for the ABA prevention tag. Fordetails please consult the implementation of theboost::lockfree::detail::tagged_ptr
class.
boost.lockfree使用了一个tagged_ptr助手类,它使用一整数标签关联了一个指针。这通常需要一个双宽的比较交换,该操作并非在所有的平台上都可用。IA32在奔腾处理器之前不提供cmpxchg8b操作码,并且它也缺少许多RISC架构例如PPC。早期的X86-64处理器也不提供cmpxchg16b
指令。在64位平台上可以解决这个问题,因为经常并非完整的64位地址空间都被使用。例如在X86-64平台上,仅仅使用了地址空间的48位,因此我们可以使用剩下的16位来做为ABA预防标签。具体细节请参考类boost::lockfree::detail::tagged_ptr
的实现。
For lock-free operations on 32bit platforms without double-width compare_exchange
, wesupport a third approach: by using a fixed-sized array to store the internalnodes we can avoid the use of 32bit pointers, but instead 16bit indices intothe array are sufficient. However this is only possible for fixed-sized datastructures, that have an upper bound of internal nodes.
对不具有双宽比较交换的32位平台上的无锁操作,我们支持第三种方法:我们可以通过使用固定大小的数组来存储内部节点,从而避免使用32位指针,因此使用16位索引至数组就足够了。然而这仅对固定大小的数据结构可行,它们有一个内部节点的上限。
The boost.lockfree
datastructures have basic support for Boost.Interprocess. Theonly problem is the blocking emulation of lock-free atomics, which in thecurrent implementation is not guaranteed to be interprocess-safe.
boost.lockfree
数据结构具有对
Boost.Interprocess的基本支持。唯一的问题在于对无锁原子的阻塞模拟,这在当前实现中是不保证进程安全的。