First, an overview of some basics of Linux preemption (scheduling):
User Preemption
User preemption occurs when the kernel is about to return to user-space, need_resched is set, and therefore, the scheduler is invoked. If the kernel is returning to user-space, it knows it is in a safe quiescent state. In other words, if it is safe to continue executing the current task, it is also safe to pick a new task to execute. Consequently, whenever the kernel is preparing to return to user-space either on return from an interrupt or after a system call, the value of need_resched is checked. If it is set, the scheduler is invoked to select a new (more fit) process to execute. Both the return paths for return from interrupt and return from system call are architecture dependent and typically implemented in assembly in entry.S (which, aside from kernel entry code, also contains kernel exit code).
In short, user preemption can occur:
When returning to user-space from a system call
When returning to user-space from an interrupt handler
Kernel Preemption
The Linux kernel, unlike most other Unix variants and many other operating systems, is a fully preemptive kernel. In non-preemptive kernels, kernel code runs until completion. That is, the scheduler is not capable of rescheduling a task while it is in the kernel; kernel code is scheduled cooperatively, not preemptively. Kernel code runs until it finishes (returns to user-space) or explicitly blocks. In the 2.6 kernel, however, the Linux kernel became preemptive: It is now possible to preempt a task at any point, so long as the kernel is in a state in which it is safe to reschedule.
So when is it safe to reschedule? The kernel is capable of preempting a task running in the kernel so long as it does not hold a lock. That is, locks are used as markers of regions of non-preemptibility. Because the kernel is SMP-safe, if a lock is not held, the current code is reentrant and capable of being preempted.
The first change in supporting kernel preemption was the addition of a preemption counter, preempt_count, to each process's thread_info. This counter begins at zero and increments once for each lock that is acquired and decrements once for each lock that is released. When the counter is zero, the kernel is preemptible. Upon return from interrupt, if returning to kernel-space, the kernel checks the values of need_resched and preempt_count. If need_resched is set and preempt_count is zero, then a more important task is runnable and it is safe to preempt. Thus, the scheduler is invoked. If preempt_count is nonzero, a lock is held and it is unsafe to reschedule. In that case, the interrupt returns as usual to the currently executing task. When all the locks that the current task is holding are released, preempt_count returns to zero. At that time, the unlock code checks whether need_resched is set. If so, the scheduler is invoked. Enabling and disabling kernel preemption is sometimes required in kernel code and is discussed in Chapter 9.
Kernel preemption can also occur explicitly, when a task in the kernel blocks or explicitly calls schedule(). This form of kernel preemption has always been supported because no additional logic is required to ensure that the kernel is in a state that is safe to preempt. It is assumed that the code that explicitly calls schedule() knows it is safe to reschedule.
Kernel preemption can occur:
1. When an interrupt handler exits, before returning to kernel-space
2. When kernel code becomes preemptible again
3. If a task in the kernel explicitly calls schedule()
4. If a task in the kernel blocks (which results in a call to schedule())
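To make the bookkeeping in the quoted text concrete, here is a minimal user-space model in C of the need_resched and preempt_count checks. It is only a sketch: struct task_model, model_schedule() and the other model_* names are made up for illustration and are not kernel APIs, and the real checks live in architecture-specific entry code.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for the per-task state described in the quote. */
struct task_model {
    int  preempt_count;   /* > 0 while at least one lock is held       */
    bool need_resched;    /* set when a more suitable task is runnable */
    int  times_scheduled; /* counts simulated calls to the scheduler   */
};

/* Stand-in for schedule(): records the call and clears the flag. */
static void model_schedule(struct task_model *t)
{
    t->times_scheduled++;
    t->need_resched = false;
}

/* Return to user-space (system call or interrupt): only need_resched matters. */
static void model_return_to_user(struct task_model *t)
{
    if (t->need_resched)
        model_schedule(t);
}

/* Return from interrupt to kernel-space: both conditions must hold. */
static void model_irq_return_to_kernel(struct task_model *t)
{
    if (t->need_resched && t->preempt_count == 0)
        model_schedule(t);
}

/* Acquiring a lock marks the start of a non-preemptible region. */
static void model_lock(struct task_model *t)
{
    t->preempt_count++;
}

/* Releasing the last lock re-checks need_resched, as the quote describes. */
static void model_unlock(struct task_model *t)
{
    assert(t->preempt_count > 0);
    if (--t->preempt_count == 0 && t->need_resched)
        model_schedule(t);
}
```

In this model, an interrupt that sets need_resched while a lock is held does not trigger a reschedule; the reschedule happens only when model_unlock() brings the counter back to zero.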
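The quoted text above covers user-mode preemption and kernel-mode preemption. As an aside, it also shows that the points at which kernel-mode preemption can occur are quite limited, which is why the Linux kernel cannot meet the requirements of a hard real-time system and is at best pseudo-real-time.
Back to the topic of whether a spinlock can deadlock on a single core (uniprocessor, hereafter UP, as opposed to SMP):
Consider a user-space spinlock in a UP environment, and assume the spin is implemented purely with CAS operations (with no sched_yield() call even after many failed spins). According to the quoted text, an interrupt may occur while thread t1 is inside the critical section after acquiring the lock, so another thread t2 may preempt it. If t2 also wants the same spinlock, t2 starts to spin. From then on t2 keeps spinning until another interrupt occurs (typically the timer tick that ends t2's time slice) and t1 is scheduled again, runs, leaves the critical section and releases the lock. So I believe that on UP a user-space spinlock is far less efficient than the system mutex, and can easily end up burning the CPU in a state that is close to a deadlock.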
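The pure-CAS spinlock assumed above might look roughly like the following C11 sketch. The cas_spinlock type and all function names are hypothetical, not taken from any library. spin_for_one_slice() approximates "one time slice of t2" by a fixed number of CAS attempts, which is a modelling assumption and not how a scheduler measures time.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space spinlock: false = free, true = held. */
typedef struct {
    atomic_bool locked;
} cas_spinlock;

static void cas_spin_init(cas_spinlock *l)
{
    atomic_init(&l->locked, false);
}

/* One CAS attempt: succeeds only if the lock was free. */
static bool cas_spin_trylock(cas_spinlock *l)
{
    bool expected = false;
    return atomic_compare_exchange_strong_explicit(
        &l->locked, &expected, true,
        memory_order_acquire, memory_order_relaxed);
}

/* Pure spin with no sched_yield(): the variant discussed in the text. */
static void cas_spin_lock(cas_spinlock *l)
{
    while (!cas_spin_trylock(l)) {
        /* busy-wait */
    }
}

static void cas_spin_unlock(cas_spinlock *l)
{
    /* Unlocking a free lock would be a caller bug. */
    assert(atomic_load_explicit(&l->locked, memory_order_relaxed));
    atomic_store_explicit(&l->locked, false, memory_order_release);
}

/*
 * Models t2 spinning for one time slice on UP. While t2 runs, t1 cannot,
 * so a held lock stays held and every attempt is wasted.
 * Returns the number of failed attempts before the lock was obtained
 * (or before the slice ran out).
 */
static long spin_for_one_slice(cas_spinlock *l, long slice_attempts)
{
    long wasted = 0;
    for (long i = 0; i < slice_attempts; i++) {
        if (cas_spin_trylock(l))
            return wasted;
        wasted++;
    }
    return wasted;
}
```

If t1 holds the lock, spin_for_one_slice() uses up its whole budget without making progress; once t1 has released the lock, the first attempt succeeds. A mutex would instead put t2 to sleep and let t1 run immediately.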
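With a UP that has two hyper-threads (hereafter HT), however, I am not sure whether the same thing happens. The background of HT is that CPUs contain more and more units for pipeline optimization, but a single thread leaves many of them unused. So a small amount of hardware is added (the front-end parts ahead of the out-of-order execution unit are duplicated) to create two logical cores and make fuller use of the CPU. True parallelism, however, is limited to the case where the two threads use different CPU units (the out-of-order execution unit and the later parts of the pipeline).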
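As for kernel-mode spinlocks on UP, we first need to discuss when they have to be implemented at all, and then whether they can deadlock on UP in the cases where they are implemented.
On UP with a kernel older than 2.6, preemption happens only in cases 3 and 4 of the kernel preemption points quoted above, the two traditional cases of voluntary scheduling. No sane person writes a critical section that triggers case 3 or 4 (note: reportedly, if you take a lock in the kernel and then call schedule() by hand, dmesg prints "BUG: scheduling while atomic"), so no lock is needed at all, and in this configuration the kernel's spinlock definition is empty.
At the same time, a pre-2.6 kernel on SMP does have to implement spinlocks, to keep the shared memory touched by critical sections safe.
In 2.6 and later kernels, even on UP, cases 1 and 2 are added, so a critical section is likewise no longer guaranteed to run without interruption. (Suppose the lock were not implemented: for case 1, an interrupt inside the "critical section", the conclusion is obvious; case 2 need not be considered, because we are assuming no lock is taken.) The situation becomes the same as a pre-2.6 kernel on SMP, so spinlocks have to be implemented here too.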
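As a rough model of the three configurations just discussed, the following C sketch shows what taking a spinlock amounts to in each. It is a user-space illustration under my own assumptions, not kernel source: enum kernel_config, struct cpu_model, struct lock_model and the cfg_* functions are invented names, and the real kernel selects between these behaviours at build time rather than at run time.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Run-time stand-in for build-time kernel configuration. */
enum kernel_config {
    CFG_UP_NO_PREEMPT, /* UP, non-preemptive kernel (the pre-2.6 case) */
    CFG_UP_PREEMPT,    /* UP, preemptive kernel (2.6 and later)        */
    CFG_SMP            /* more than one CPU                            */
};

struct cpu_model  { int preempt_count; };
struct lock_model { atomic_bool held; };

static void cfg_spin_lock(enum kernel_config cfg,
                          struct cpu_model *cpu, struct lock_model *l)
{
    if (cfg == CFG_UP_NO_PREEMPT)
        return;               /* nothing can interleave: empty lock */

    cpu->preempt_count++;     /* models disabling preemption */

    if (cfg == CFG_SMP) {
        /* Only with other CPUs around is there anyone to spin against. */
        while (atomic_exchange_explicit(&l->held, true,
                                        memory_order_acquire)) {
            /* busy-wait */
        }
    }
}

static void cfg_spin_unlock(enum kernel_config cfg,
                            struct cpu_model *cpu, struct lock_model *l)
{
    if (cfg == CFG_UP_NO_PREEMPT)
        return;

    if (cfg == CFG_SMP)
        atomic_store_explicit(&l->held, false, memory_order_release);

    assert(cpu->preempt_count > 0);
    cpu->preempt_count--;     /* models re-enabling preemption */
}
```

The point of the model is that on a preemptive UP kernel the lock only has to keep the current task from being preempted; the busy-wait loop matters only when another CPU can hold the lock at the same time.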
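So let us look at whether the spinlock implemented by a preemptive kernel spins in a UP environment, and if it does, whether it deadlocks.
As described above, once a spinlock is taken in a preemptive kernel and the critical section is entered, the counter in thread_info guarantees that the task is not preempted at point 1 (return from interrupt) or point 2 (for example, when an inner lock taken inside the critical section is released). In other words, the critical section cannot be preempted. This can be discussed in two scenarios:
Threads t1, t2 and so on use the same spinlock. t1 runs first and takes the lock. Until t1 releases the lock and leaves the critical section, t2 and the other threads get no chance to run at all on UP. That is, no thread ever gets the chance to spin, so no deadlock can occur.
Thread t1 and an interrupt handler use the same spinlock. If an interrupt fires while t1 is inside its critical section, the handler's attempt to take the same lock fails and it starts spinning. t1 cannot run again until the handler returns, so on UP this is a deadlock (assuming the lock really spins; where spin_lock() on UP only disables preemption, the handler would instead enter the critical section concurrently with t1, which is just as broken). This is why the kernel's spinlock API provides interfaces such as void spin_lock_irq(spinlock_t *lock), which disable interrupts on the current CPU (on UP, the only CPU) before taking the lock and so avoid this kind of deadlock.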
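The second scenario can be played through in a small user-space model. This is a sketch under stated assumptions: struct up_cpu, enum irq_outcome and the up_* functions are hypothetical names that only imitate the behaviour of spin_lock() and spin_lock_irq(), and the model assumes a lock with real state on which the handler would actually spin.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a single CPU; none of these names are kernel APIs. */
struct up_cpu {
    bool irqs_enabled; /* false while interrupts are masked on this CPU */
    bool lock_held;    /* the one spinlock shared by t1 and the handler */
    bool irq_pending;  /* an interrupt arrived while masked             */
    int  handled;      /* how many times the handler ran to completion  */
};

enum irq_outcome { IRQ_HANDLED, IRQ_DEFERRED, IRQ_WOULD_DEADLOCK };

/* Plain lock: interrupts stay enabled inside the critical section. */
static void up_spin_lock(struct up_cpu *c)
{
    assert(!c->lock_held);
    c->lock_held = true;
}

static void up_spin_unlock(struct up_cpu *c)
{
    c->lock_held = false;
}

/* An interrupt arrives whose handler wants the same lock. */
static enum irq_outcome up_raise_irq(struct up_cpu *c)
{
    if (!c->irqs_enabled) {      /* masked: remember it for later */
        c->irq_pending = true;
        return IRQ_DEFERRED;
    }
    if (c->lock_held)            /* handler would spin; t1 can never run */
        return IRQ_WOULD_DEADLOCK;
    c->irq_pending = false;      /* handler takes and releases the lock */
    c->handled++;
    return IRQ_HANDLED;
}

/* Imitates spin_lock_irq(): mask local interrupts first, then lock. */
static void up_spin_lock_irq(struct up_cpu *c)
{
    c->irqs_enabled = false;
    up_spin_lock(c);
}

/* Imitates spin_unlock_irq(): unlock, unmask, then deliver what was pending. */
static void up_spin_unlock_irq(struct up_cpu *c)
{
    up_spin_unlock(c);
    c->irqs_enabled = true;
    if (c->irq_pending)
        (void)up_raise_irq(c);
}
```

With up_spin_lock() an interrupt inside the critical section yields IRQ_WOULD_DEADLOCK, while with up_spin_lock_irq() the same interrupt is merely deferred and handled right after the unlock. In real kernel code the counterparts are spin_lock_irq()/spin_unlock_irq(), and spin_lock_irqsave()/spin_unlock_irqrestore() when the caller does not know whether interrupts were already disabled.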
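Note: on the subject of disabling interrupts, SMP systems have a kind of "Big Lock" that can disable interrupts on all CPUs. It is rarely used because it hurts system performance too much. (My guess is that it relies on some CPU-specific instruction, but I have not verified this.)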
This topic is a bit dull; it is purely an exploration. If anything here is wrong, corrections are welcome.