Lock-Free Programming

References:
http://preshing.com/20120612/an-introduction-to-lock-free-programming/
http://blog.csdn.net/lifesider/article/details/6582338
http://blog.poxiao.me/p/spinlock-implementation-in-cpp11/
http://www.infoq.com/cn/news/2014/11/cpp-lock-free-programming
http://people.csail.mit.edu/bushl2/rpi/project_web/page5.html
http://15418.courses.cs.cmu.edu/spring2013/article/46
http://www.cl.cam.ac.uk/research/srg/netos/projects/archive/lock-free/
http://www.liblfds.org/
http://www.cppblog.com/woaidongmao/archive/2009/05/02/81663.html
http://www.sogou.com/labs/report_source/4-2.pdf
https://www.infoq.com/news/2014/10/cpp-lock-free-programming
http://coolshell.cn/articles/8239.html (in Chinese, from coolshell)

I have collected some articles first; I will write a summary once I have worked through them.

Some basic concepts

deadlock & livelock

https://en.wikipedia.org/wiki/Deadlock
https://en.wikipedia.org/wiki/Deadlock#Livelock

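Two processes each need to acquire a resource before they can continue. Process P1 needs resource R1 and already holds R2; process P2 needs resource R2 and already holds R1. This is a deadlock: neither process can make progress.

Livelock, by contrast: process 1 could use the resource but lets process 2 go first; process 2 could use the resource but lets process 1 go first. The two keep deferring to each other, so neither ever uses the resource.

Here is a vivid analogy: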

http://blog.csdn.net/java2000_net/article/details/4061983
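
To make the deadlock case concrete, here is a minimal C++11 sketch. The names r1, r2, use_both and run_two_processes are hypothetical and only mirror the P1/P2 and R1/R2 description above. If each thread locked its two mutexes with two separate lock() calls in opposite orders, each could end up holding one mutex while waiting for the other. The sketch instead uses std::lock, which acquires both mutexes with a deadlock-avoidance algorithm, so the opposite argument orders are safe.

```cpp
#include <mutex>
#include <thread>

std::mutex r1, r2;       // stand-ins for resources R1 and R2 (hypothetical names)
int shared_counter = 0;  // only touched while both r1 and r2 are held

// Each caller needs both resources. Two separate lock() calls in opposite
// orders could deadlock; std::lock acquires both without that risk.
void use_both(std::mutex& first, std::mutex& second) {
    std::lock(first, second);
    std::lock_guard<std::mutex> g1(first, std::adopt_lock);
    std::lock_guard<std::mutex> g2(second, std::adopt_lock);
    ++shared_counter;
}

// Runs P1 (asks for R1, R2) and P2 (asks for R2, R1) concurrently and
// returns the number of completed critical sections.
int run_two_processes(int iterations) {
    shared_counter = 0;
    std::thread p1([=] { for (int i = 0; i < iterations; ++i) use_both(r1, r2); });
    std::thread p2([=] { for (int i = 0; i < iterations; ++i) use_both(r2, r1); });
    p1.join();
    p2.join();
    return shared_counter;
}
```

Calling run_two_processes(10000) from main should return once both threads finish; with naive opposite-order locking the same program could hang instead.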

RMW (Read-Modify-Write)

C++11's std::atomic<int>::fetch_add is an RMW operation. Note, however, that C++11 does not guarantee that it is implemented lock-free on every platform, so it is best to call std::atomic<>::is_lock_free() to check whether the RMW is actually lock-free on the platform you are using.

Atomic RMWs are a necessary part of lock-free programming even on single-processor systems. Without atomicity, a thread could be interrupted halfway through the transaction, possibly leading to an inconsistent state.

Atomic RMW operations are a prerequisite for lock-free programming.
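
As a small illustration of the two calls mentioned above, the following sketch increments one counter from several threads with fetch_add and separately reports is_lock_free(). The function names count_with_fetch_add and int_rmw_is_lock_free are made up for this example.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Increments a shared counter from several threads using an atomic RMW.
// Returns the final value; with fetch_add no increment is lost.
int count_with_fetch_add(int num_threads, int increments_per_thread) {
    std::atomic<int> counter(0);
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counter, increments_per_thread] {
            for (int i = 0; i < increments_per_thread; ++i) {
                counter.fetch_add(1);  // read, add, write back as one atomic step
            }
        });
    }
    for (auto& w : workers) w.join();
    return counter.load();
}

// Reports whether std::atomic<int> is implemented without locks on this
// platform. The answer is platform-dependent, so check it, do not assume it.
bool int_rmw_is_lock_free() {
    std::atomic<int> probe(0);
    return probe.is_lock_free();
}
```

With a plain int and ++ instead of fetch_add, the returned total could come up short because concurrent increments can overwrite each other.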

CAS (Compare-And-Swap Loops)

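CAS is probably the most widely discussed RMW operation. The Win32 platform provides a family of functions for CAS, such as _InterlockedCompareExchange. A CAS is usually placed in a loop that retries until it succeeds, and the pattern typically has three steps. For example, the push operation below: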

void LockFreeQueue::push(Node* newHead)
{
    for (;;)
    {
        // 1. Copy a shared variable (m_Head) to a local.
        Node* oldHead = m_Head;

        // 2. Do some speculative work, not yet visible to other threads.
        newHead->next = oldHead;

        // 3. Next, attempt to publish our changes to the shared variable.
        // If the shared variable hasn't changed, the CAS succeeds and we return.
        // Otherwise, repeat.
        if (_InterlockedCompareExchange(&m_Head, newHead, oldHead) == oldHead)
            return;
    }
}

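Step 1: copy the data shared between threads (here, the list head) to a local variable.
Step 2: make some speculative changes (point the new head at the local copy of the old head). These changes are not yet visible to other threads.
Step 3: publish the changes to the shared data. If the shared data has not changed in the meantime, the CAS succeeds and the function returns; otherwise, go around the loop again.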
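
The same three steps can be sketched portably with C++11 std::atomic instead of the Win32 intrinsic. This is only an illustration: the Node layout, the class name LockFreeList and the head() accessor are assumptions, not part of the original code. Note that on failure compare_exchange_weak writes the current value of m_Head back into oldHead, so step 1 does not need an explicit reload.

```cpp
#include <atomic>

// Assumed node layout for this sketch.
struct Node {
    int value;
    Node* next;
};

class LockFreeList {  // hypothetical name
public:
    void push(Node* newHead) {
        // 1. Copy the shared variable (m_Head) to a local.
        Node* oldHead = m_Head.load();
        do {
            // 2. Speculative work, not yet visible to other threads.
            newHead->next = oldHead;
            // 3. Try to publish. If m_Head still equals oldHead the CAS
            //    succeeds; otherwise oldHead is refreshed and we repeat.
        } while (!m_Head.compare_exchange_weak(oldHead, newHead));
    }

    // Accessor added only so the result can be inspected.
    Node* head() const { return m_Head.load(); }

private:
    std::atomic<Node*> m_Head{nullptr};
};
```

Pushing two nodes a and then b should leave head() pointing at b, with b.next pointing at a.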

When writing a CAS loop like this, take care to avoid the ABA problem.

ABA problem

https://en.wikipedia.org/wiki/ABA_problem

Wikipedia explains this very clearly, so I only quote an excerpt here.

Example : John is waiting in his car at a red traffic light with his children. His children start fighting with each other while waiting, and he leans back to scold them. Once their fighting stops, John checks the light again and notices that it’s still red. However, while he was focusing on his children, the light had changed to green, and then back again. John doesn’t think the light ever changed, but the people waiting behind him are very mad and honking their horns now.
In this scenario, the ‘A’ state is when the traffic light is red, and the ‘B’ state is when it’s green. Originally, the traffic light starts in ‘A’ state. If John looked at the light he would have noticed the change. But he only looked when the light was red (state ‘A’). There is no way to tell if the light turned green during the time of no observation.

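This is a fairly vivid example from Wikipedia. John and his children are waiting in the car at a red light. The children start fighting, so John turns around to tell them to quiet down. Once they have settled, John checks the light again and sees that it is still red. In fact, while he was turned around he missed a change in the light: red → green → red. This is the ABA problem.

(In this example ABA does little harm, though: John only has to wait a little longer for the light to turn green again.)

An example where ABA causes real trouble: a lock-free stack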

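  /* Naive lock-free stack which suffers from ABA problem.*/
  #include <atomic>

  struct Obj { Obj* next; };  // minimal node type, assumed here for illustration

  class Stack {
    std::atomic<Obj*> top_ptr{nullptr};
   public: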
    //
    // Pops the top object and returns a pointer to it.
    //
    Obj* Pop() {
      while(1) {
        Obj* ret_ptr = top_ptr;
        if (!ret_ptr) return nullptr;
        // For simplicity, suppose that we can ensure that this dereference is safe
        // (i.e., that no other thread has popped the stack in the meantime).
        Obj* next_ptr = ret_ptr->next;
        // If the top node is still ret, then assume no one has changed the stack.
        // (That statement is not always true because of the ABA problem)
        // Atomically replace top with next.
        if (top_ptr.compare_exchange_weak(ret_ptr, next_ptr)) {
          return ret_ptr;
        }
        // The stack has changed, start over.
      }
    }
    //
    // Pushes the object specified by obj_ptr to stack.
    //
    void Push(Obj* obj_ptr) {
      while(1) {
        Obj* next_ptr = top_ptr;
        obj_ptr->next = next_ptr;
        // If the top node is still next, then assume no one has changed the stack.
        // (That statement is not always true because of the ABA problem)
        // Atomically replace top with obj.
        if (top_ptr.compare_exchange_weak(next_ptr, obj_ptr)) {
          return;
        }
        // The stack has changed, start over.
      }
    }
  };

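If you understand CAS, the Pop and Push above should be easy to follow. Clearly, an ABA problem can arise before compare_exchange_weak is called. For example:

Suppose the stack holds, from top to bottom: top → A → B → C
Thread 1 starts a pop: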

ret = A;
next = B;

Then thread 1 is interrupted before it executes compare_exchange_weak…

  { // Thread 2 runs a pop:
    ret = A;
    next = B;
    compare_exchange_weak(A, B)  // succeeds, top = B
    return A;
  } // The stack is now top → B → C
  { // Thread 2 runs another pop:
    ret = B;
    next = C;
    compare_exchange_weak(B, C)  // succeeds, top = C
    return B;
  } // The stack is now top → C
  delete B; // B is deleted
  { // Thread 2 pushes A back onto the stack:
    A->next = C;
    compare_exchange_weak(C, A)  // succeeds, top = A
  }

The stack is now: top → A → C

Thread 1 then resumes. It still believes the stack is unchanged and executes:

compare_exchange_weak(A, B)

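Here is the problem: thread 1 does not know that B has been deleted, so it changes the stack to: top → B → C

Yes, it has put a dangling pointer on the stack. A later pop that uses that node will fail in unpredictable ways.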
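
One mitigation described in the Wikipedia article is a tagged state reference: store a version tag next to the top-of-stack reference and bump it on every successful update, so a stale CAS fails even when the top looks the same. Below is a minimal sketch of that idea, not the code from any of the references. It assumes nodes live in a small fixed pool and are named by index, so that {tag, index} fits in one 64-bit atomic; TaggedStack, kNil, kCapacity, snapshot and try_replace are all hypothetical names, and the 32-bit tag can in principle wrap around.

```cpp
#include <atomic>
#include <cstdint>

constexpr std::uint32_t kNil = 0xFFFFFFFFu;  // marks "no node" (assumed sentinel)
constexpr std::uint32_t kCapacity = 8;       // size of the fixed node pool

// Stack of pool indices whose head packs {tag, index} into 64 bits.
class TaggedStack {
public:
    TaggedStack() : head_(pack(0, kNil)) {}

    // Pushes pool slot idx (must be < kCapacity and not already on the stack).
    void push(std::uint32_t idx) {
        std::uint64_t old_head = head_.load();
        std::uint64_t new_head;
        do {
            next_[idx].store(index_of(old_head));
            new_head = pack(tag_of(old_head) + 1, idx);
        } while (!head_.compare_exchange_weak(old_head, new_head));
    }

    // Pops the top slot index, or returns kNil if the stack is empty.
    std::uint32_t pop() {
        std::uint64_t old_head = head_.load();
        for (;;) {
            std::uint32_t idx = index_of(old_head);
            if (idx == kNil) return kNil;
            std::uint64_t new_head = pack(tag_of(old_head) + 1, next_[idx].load());
            if (head_.compare_exchange_weak(old_head, new_head)) return idx;
        }
    }

    // Returns the raw {tag, index} word, as a thread would read it in step 1.
    std::uint64_t snapshot() const { return head_.load(); }

    // A single CAS attempt from a previously taken snapshot, used here to
    // replay the "thread 1 resumes" step of the ABA scenario.
    bool try_replace(std::uint64_t expected, std::uint32_t new_top) {
        return head_.compare_exchange_strong(
            expected, pack(tag_of(expected) + 1, new_top));
    }

    static std::uint32_t index_of(std::uint64_t h) { return static_cast<std::uint32_t>(h); }
    static std::uint32_t tag_of(std::uint64_t h) { return static_cast<std::uint32_t>(h >> 32); }
    static std::uint64_t pack(std::uint32_t tag, std::uint32_t idx) {
        return (static_cast<std::uint64_t>(tag) << 32) | idx;
    }

private:
    std::atomic<std::uint64_t> head_;
    std::atomic<std::uint32_t> next_[kCapacity];  // next_[i] is written before i is published
};
```

Replaying the scenario above with slots 0, 1, 2 standing for A, B, C: take a snapshot while A is on top, then pop A, pop B and push A again. The top index is A once more, but the tag has advanced by three, so try_replace with the old snapshot fails instead of installing B. Reclaiming memory safely in a pointer-based stack is a separate problem that this sketch does not address.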

std::atomic::compare_exchange_weak()

https://www.codeproject.com/articles/808305/understand-std-atomic-compare-exchange-weak-in-cpl

bool compare_exchange_weak (T& expected, T desired, ..);
bool compare_exchange_strong (T& expected, T desired, ..);

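When the expected value equals the value the object actually holds, the call succeeds and writes the desired value to memory. Otherwise, expected is overwritten with the value actually in memory and the call returns failure. This holds in almost all cases, with one exception: the weak version of CAS may return failure even when the value in memory equals the expected value. In that case the desired value is not written to memory. This is called a spurious failure.

Spurious failures occur because on some platforms CAS is implemented as a sequence of instructions, unlike the single instruction on x86. On those platforms a context switch, or another thread loading the same memory address, among other things, can cause the CAS to fail. The failure is called spurious because it is caused not by a mismatch between the stored value and the expected value but by timing. The strong version of CAS behaves differently: it handles this case internally and prevents spurious failures.

Because of spurious failures, the weak version is usually used in a loop.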

C++11 § 29.6.5
A consequence of spurious failure is that nearly all uses of weak compare-and-exchange will be in a loop.
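
A short sketch of how this guidance is commonly applied; the helper names atomic_store_max and try_claim are invented for this example. The weak version sits in a retry loop, where a spurious failure only costs one more iteration, while the strong version is used for a single attempt with no loop around it.

```cpp
#include <atomic>

// Raises target to at least value. The weak CAS is in a loop, so a spurious
// failure simply causes another iteration.
void atomic_store_max(std::atomic<int>& target, int value) {
    int current = target.load();
    while (current < value && !target.compare_exchange_weak(current, value)) {
        // On failure, current now holds the value actually stored in target.
    }
}

// Makes one attempt to claim a flag. There is no retry loop, so the strong
// version is used: a false result means the flag was already set.
bool try_claim(std::atomic<bool>& claimed) {
    bool expected = false;
    return claimed.compare_exchange_strong(expected, true);
}
```

Using compare_exchange_weak in try_claim could report failure even though nobody had claimed the flag, which is why the single-shot case uses the strong form.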
