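While working through a deadlock defect recently, I found a spot in a customer's CF card driver that used the spinlock API in a very non-standard way. Since the problem had already started to converge on this nasty driver, I could not let that spinlock off the hook, so I spent some time studying spinlocks.
My initial understanding of spinlocks was this:
The concrete implementation of a spinlock is architecture-specific (sometimes even CPU-specific), because it has to rely on processor features to guarantee that accesses to the lock are atomic. On MIPS, for example, the ll/sc (load-linked/store-conditional) instruction pair is normally used to make a read-modify-write sequence atomic.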
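OK, enough preamble. Let's talk about what a ticket spinlock is. Until a few days ago I took it for granted that spin_lock was implemented as
preempt_disable();
while (atomic_test_and_set_spinlock) {
        /* wait here */
}
and that spin_unlock was implemented as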
set spinlock to 0
preempt_enable();
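To make the textbook version concrete, here is a minimal user-space sketch of a test-and-set spinlock using C11 `atomic_flag`. The names `tas_spinlock_t`, `tas_spin_trylock`, `tas_spin_lock` and `tas_spin_unlock` are made up for illustration and are not kernel APIs. User space has no equivalent of preempt_disable()/preempt_enable(), so that part is left out.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for the textbook lock; not kernel code. */
typedef struct {
    atomic_flag locked;
} tas_spinlock_t;

#define TAS_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* One attempt. test_and_set returns the previous value of the flag,
 * so "false" means the lock was free and is now ours. */
static inline bool tas_spin_trylock(tas_spinlock_t *l)
{
    return !atomic_flag_test_and_set_explicit(&l->locked,
                                              memory_order_acquire);
}

static inline void tas_spin_lock(tas_spinlock_t *l)
{
    while (!tas_spin_trylock(l))
        ;   /* wait here: every waiter hammers the same flag */
}

static inline void tas_spin_unlock(tas_spinlock_t *l)
{
    /* set spinlock to 0 */
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

Nothing in this lock records who started waiting first, so whichever waiter's test-and-set lands right after the clear gets the lock.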
In reality, spinlocks no longer look like the textbook version. To avoid the thundering herd effect of the traditional spinlock, the implementation has been a ticket spinlock since kernel 2.6.25.
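Thundering herd means that when the holder releases the lock, all the waiters rush in to grab it at once, and which of them wins is more or less random. More precisely, who gets the lock depends on the cache locality of the lock. The traditional spinlock is therefore unfair. A better scheme is to put the waiters in a queue: when the holder unlocks, the first waiter gets the lock, then the second, then the third. That is exactly what the ticket spinlock does. For a bug hunter, fairness in itself does not matter much, but once the lock is fair its behavior becomes predictable, and that is a very nice thing to have, as you know.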
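The queueing idea can be sketched in portable C11 before looking at any architecture-specific code. This is an illustrative user-space version under my own assumptions, not the kernel implementation. The type `ticket_lock_t` and the helpers `ticket_take`, `ticket_is_my_turn`, `ticket_lock` and `ticket_unlock` are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space ticket lock; names are illustrative only. */
typedef struct {
    _Atomic uint16_t next_ticket;   /* next number to hand out      */
    _Atomic uint16_t serving_now;   /* number allowed to hold lock  */
} ticket_lock_t;

/* Join the queue: atomically fetch the current number and bump it. */
static inline uint16_t ticket_take(ticket_lock_t *l)
{
    return atomic_fetch_add_explicit(&l->next_ticket, 1,
                                     memory_order_relaxed);
}

static inline bool ticket_is_my_turn(ticket_lock_t *l, uint16_t my_ticket)
{
    return atomic_load_explicit(&l->serving_now,
                                memory_order_acquire) == my_ticket;
}

static inline void ticket_lock(ticket_lock_t *l)
{
    uint16_t my_ticket = ticket_take(l);

    while (!ticket_is_my_turn(l, my_ticket))
        ;   /* spin until our number is called */
}

static inline void ticket_unlock(ticket_lock_t *l)
{
    /* Only the holder writes serving_now, so load + store suffices. */
    uint16_t next = (uint16_t)(atomic_load_explicit(&l->serving_now,
                                   memory_order_relaxed) + 1);

    atomic_store_explicit(&l->serving_now, next, memory_order_release);
}
```

Waiters are served strictly in the order in which they called `ticket_take`, which is what makes the lock's behavior predictable.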
Below is the ticket spinlock implementation for mips64 SMP (simplified by me :)):
typedef union {
        /*
         * bits  0..15 : serving_now
         * bits 16..31 : ticket
         */
        u32 lock;
        struct {
#ifdef __BIG_ENDIAN
                u16 ticket;
                u16 serving_now;
#else
                u16 serving_now;
                u16 ticket;
#endif
        } h;
} arch_spinlock_t;
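The layout matters because of how the two halves are updated. A small user-space sketch shows the arithmetic. It assumes `uint16_t`/`uint32_t` in place of the kernel's `u16`/`u32` and the compiler's `__BYTE_ORDER__` macro in place of `__BIG_ENDIAN`. `demo_spinlock_t`, `demo_is_locked` and `demo_queue_length` are hypothetical helpers.

```c
#include <stdint.h>

/* Same layout as arch_spinlock_t above, with user-space types. */
typedef union {
    uint32_t lock;
    struct {
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
        uint16_t ticket;
        uint16_t serving_now;
#else
        uint16_t serving_now;
        uint16_t ticket;
#endif
    } h;
} demo_spinlock_t;

/* The lock is free exactly when nobody holds an unserved ticket. */
static inline int demo_is_locked(const demo_spinlock_t *l)
{
    return l->h.ticket != l->h.serving_now;
}

/* Number of CPUs holding or waiting for the lock (modulo 2^16). */
static inline uint16_t demo_queue_length(const demo_spinlock_t *l)
{
    return (uint16_t)(l->h.ticket - l->h.serving_now);
}
```

Adding 0x10000 to `lock` bumps `ticket` without touching `serving_now`, and because `ticket` sits in the upper half, its overflow simply falls off the top of the word. `serving_now` is incremented through its own 16-bit field, so its wrap-around never carries into `ticket`.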
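static inline void arch_spin_lock(arch_spinlock_t *lock)
{
        int my_ticket;
        int tmp;
        int inc = 0x10000;      /* +1 in the ticket field (bits 16..31) */

        __asm__ __volatile__ (
        "       .set push               # arch_spin_lock        \n"
        "       .set noreorder                                  \n"
        "                                                       \n"
        /* Take a ticket: atomically add inc to the lock word via ll/sc. */
        "1:     ll      %[ticket], %[ticket_ptr]                \n"
        "       addu    %[my_ticket], %[ticket], %[inc]         \n"
        "       sc      %[my_ticket], %[ticket_ptr]             \n"
        /* sc leaves 0 on failure: retry. Delay slot: my_ticket = old ticket field. */
        "       beqz    %[my_ticket], 1b                        \n"
        "        srl    %[my_ticket], %[ticket], 16             \n"
        /* ticket = old serving_now */
        "       andi    %[ticket], %[ticket], 0xffff            \n"
        "       andi    %[my_ticket], %[my_ticket], 0xffff      \n"
        /* Not our turn: go wait at 4. Delay slot: ticket = distance to our turn. */
        "       bne     %[ticket], %[my_ticket], 4f             \n"
        "        subu   %[ticket], %[my_ticket], %[ticket]      \n"
        /* 2: the lock is ours. */
        "2:                                                     \n"
        /* Slow path, placed out of line. */
        "       .subsection 2                                   \n"
        /* Delay for a time proportional to our distance in the queue. */
        "4:     andi    %[ticket], %[ticket], 0x1fff            \n"
        "       sll     %[ticket], 5                            \n"
        "                                                       \n"
        "6:     bnez    %[ticket], 6b                           \n"
        "        subu   %[ticket], 1                            \n"
        "                                                       \n"
        /* Reload serving_now; if it is our turn, return to 2. */
        "       lhu     %[ticket], %[serving_now_ptr]           \n"
        "       beq     %[ticket], %[my_ticket], 2b             \n"
        "        subu   %[ticket], %[my_ticket], %[ticket]      \n"
        /* Otherwise wait again, with the distance reduced by one. */
        "       b       4b                                      \n"
        "        subu   %[ticket], %[ticket], 1                 \n"
        "       .previous                                       \n"
        "       .set pop                                        \n"
        : [ticket_ptr] "+m" (lock->lock),
          [serving_now_ptr] "+m" (lock->h.serving_now),
          [ticket] "=&r" (tmp),
          [my_ticket] "=&r" (my_ticket)
        : [inc] "r" (inc));

        smp_llsc_mb();
}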
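static inline void arch_spin_unlock(arch_spinlock_t *lock)
{
        unsigned int serving_now = lock->h.serving_now + 1;

        /* Order the critical section's stores before the release. */
        wmb();
        /* Only the holder writes serving_now, so a plain 16-bit store is enough. */
        lock->h.serving_now = (u16)serving_now;
        /* Push the store out so that spinning CPUs see it promptly. */
        nudge_writes();
}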
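For readers who do not read MIPS assembly, the lock and unlock paths above can be approximated in portable C11. This is a rough sketch under stated assumptions, not a translation the kernel uses. `atomic_fetch_add` stands in for the ll/sc loop, a compare-exchange loop stands in for the 16-bit store in unlock (C11 has no portable mixed-size atomic access), and the backoff is simplified. `c_ticket_lock_t`, `c_take_ticket`, `c_spin_lock` and `c_spin_unlock` are hypothetical names.

```c
#include <stdatomic.h>
#include <stdint.h>

/* bits 0..15: serving_now, bits 16..31: ticket (as in arch_spinlock_t) */
typedef struct {
    _Atomic uint32_t lock;
} c_ticket_lock_t;

/* Stand-in for the ll/addu/sc loop: returns the OLD lock word. */
static inline uint32_t c_take_ticket(c_ticket_lock_t *l)
{
    return atomic_fetch_add_explicit(&l->lock, 0x10000u,
                                     memory_order_relaxed);
}

static inline void c_spin_lock(c_ticket_lock_t *l)
{
    uint32_t old = c_take_ticket(l);
    uint16_t my_ticket = (uint16_t)(old >> 16);
    uint16_t serving = (uint16_t)(old & 0xffffu);

    while (serving != my_ticket) {
        /* Back off in proportion to our distance in the queue. */
        uint16_t distance = (uint16_t)(my_ticket - serving);

        for (volatile uint32_t d = (uint32_t)(distance & 0x1fff) << 5;
             d != 0; d--)
            ;
        serving = (uint16_t)(atomic_load_explicit(&l->lock,
                                 memory_order_relaxed) & 0xffffu);
    }
    /* Plays the role of smp_llsc_mb() in this sketch. */
    atomic_thread_fence(memory_order_acquire);
}

static inline void c_spin_unlock(c_ticket_lock_t *l)
{
    uint32_t old = atomic_load_explicit(&l->lock, memory_order_relaxed);
    uint32_t next;

    do {
        /* Bump serving_now only; its wrap must not carry into ticket. */
        next = (old & 0xffff0000u) | ((old + 1) & 0xffffu);
    } while (!atomic_compare_exchange_weak_explicit(&l->lock, &old, next,
                 memory_order_release, memory_order_relaxed));
}
```

The proportional delay means a waiter far back in the queue polls `serving_now` less often than the waiter that is next in line, which keeps traffic on the lock's cache line down.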