Background: while modifying a module I added a timer and did some complex, time-consuming work directly in the timer's interrupt handler. That triggered this BUG and caused the system to reboot.
The kernel printed the following:
<3>[26578.636839] C1 [ swapper/1] BUG: scheduling while atomic: swapper/1/0/0x00000002
<6>[26578.636869] C0 [ kworker/u:1] CPU1 is up
<4>[26578.636900] C1 [ swapper/1] Modules linked in: bcm15500_i2c_ts
<4>[26578.636961] C1 [ swapper/1] [] (unwind_backtrace+0x0/0x11c) from [] (__schedule+0x70/0x6e0)
<4>[26578.636991] C1 [ swapper/1] [] (__schedule+0x70/0x6e0) from [] (schedule_preempt_disabled+0x14/0x20)
<4>[26578.637052] C1 [ swapper/1] [] (schedule_preempt_disabled+0x14/0x20) from [] (cpu_idle+0xf0/0x104)
<4>[26578.637083] C1 [ swapper/1] [] (cpu_idle+0xf0/0x104) from [] (cpu_die+0x2c/0x5c)
<3>[26578.637510] C1 [ swapper/1] BUG: scheduling while atomic: swapper/1/0/0x00000002
<4>[26578.637510] C1 [ swapper/1] Modules linked in: bcm15500_i2c_ts
<4>[26578.637602] C1 [ swapper/1] [] (unwind_backtrace+0x0/0x11c) from [] (__schedule+0x70/0x6e0)
<4>[26578.637663] C1 [ swapper/1] [] (__schedule+0x70/0x6e0) from [] (schedule_preempt_disabled+0x14/0x20)
<4>[26578.637724] C1 [ swapper/1] [] (schedule_preempt_disabled+0x14/0x20) from [] (cpu_idle+0xf0/0x104)
<4>[26578.637754] C1 [ swapper/1] [] (cpu_idle+0xf0/0x104) from [] (cpu_die+0x2c/0x5c)
<3>[26578.648069] C1 [ swapper/1] BUG: scheduling while atomic: swapper/1/0/0x00000002
The message comes from the following kernel code:
/*
 * Print scheduling while atomic bug:
 */
static noinline void __schedule_bug(struct task_struct *prev)
{
	if (oops_in_progress)
		return;

	printk(KERN_ERR "BUG: scheduling while atomic: %s/%d/0x%08x\n",
		prev->comm, prev->pid, preempt_count());

	debug_show_held_locks(prev);
	print_modules();
	if (irqs_disabled())
		print_irqtrace_events(prev);
	dump_stack();
}
/*
 * Various schedule()-time debugging checks and statistics:
 */
static inline void schedule_debug(struct task_struct *prev)
{
	/*
	 * Test if we are atomic. Since do_exit() needs to call into
	 * schedule() atomically, we ignore that path for now.
	 * Otherwise, whine if we are scheduling when we should not be.
	 */
	if (unlikely(in_atomic_preempt_off() && !prev->exit_state))
		__schedule_bug(prev);
	rcu_sleep_check();

	profile_hit(SCHED_PROFILING, __builtin_return_address(0));

	schedstat_inc(this_rq(), sched_count);
}
One level up the call chain:
/*
 * __schedule() is the main scheduler function.
 */
static void __sched __schedule(void)
{
	struct task_struct *prev, *next;
	unsigned long *switch_count;
	struct rq *rq;
	int cpu;

need_resched:
	preempt_disable();
	cpu = smp_processor_id();
	rq = cpu_rq(cpu);
	rcu_note_context_switch(cpu);
	prev = rq->curr;

	schedule_debug(prev);
	....
}
As the code shows, the error message is printed when the following condition holds:

unlikely(in_atomic_preempt_off() && !prev->exit_state)

That is, prev->exit_state is 0, meaning the task is not exiting (exit_state covers EXIT_ZOMBIE/EXIT_DEAD, while runnability such as TASK_RUNNING lives in task->state), yet the CPU is in atomic context, where switching to another process is not allowed.
include/linux/sched.h
/*
* Task state bitmask. NOTE! These bits are also
* encoded in fs/proc/array.c: get_task_state().
*
* We have two separate sets of flags: task->state
* is about runnability, while task->exit_state are
* about the task exiting. Confusing, but this way
* modifying one set can't modify the other one by
* mistake.
*/
#define TASK_RUNNING 0
#define TASK_INTERRUPTIBLE 1
#define TASK_UNINTERRUPTIBLE 2
#define __TASK_STOPPED 4
#define __TASK_TRACED 8
/* in tsk->exit_state */
#define EXIT_ZOMBIE 16
#define EXIT_DEAD 32
/* in tsk->state again */
#define TASK_DEAD 64
#define TASK_WAKEKILL 128
#define TASK_WAKING 256
#define TASK_STATE_MAX 512
include/linux/hardirq.h
#if defined(CONFIG_PREEMPT_COUNT)
# define PREEMPT_CHECK_OFFSET 1
#else
# define PREEMPT_CHECK_OFFSET 0
#endif
/*
* Are we running in atomic context? WARNING: this macro cannot
* always detect atomic context; in particular, it cannot know about
* held spinlocks in non-preemptible kernels. Thus it should not be
* used in the general case to determine whether sleeping is possible.
* Do not use in_atomic() in driver code.
*/
#define in_atomic() ((preempt_count() & ~PREEMPT_ACTIVE) != 0)
/*
* Check whether we were atomic before we did preempt_disable():
* (used by the scheduler, *after* releasing the kernel lock)
*/
#define in_atomic_preempt_off() \
	((preempt_count() & ~PREEMPT_ACTIVE) != PREEMPT_CHECK_OFFSET)
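Plugging the values from the log into these macros shows why the check fired. This worked example assumes CONFIG_PREEMPT_COUNT is enabled (so PREEMPT_CHECK_OFFSET == 1), which matches a preemptible kernel of this era:

/*
 * From the log: preempt_count() == 0x00000002.
 *
 *   (0x00000002 & ~PREEMPT_ACTIVE) == 2
 *   2 != PREEMPT_CHECK_OFFSET (1)  =>  in_atomic_preempt_off() is true
 *
 * __schedule() performs exactly one preempt_disable() before calling
 * schedule_debug(), which accounts for the expected offset of 1.
 * A count of 2 therefore means one extra atomic section (hardirq or
 * softirq entry, a held spinlock, or an explicit preempt_disable())
 * was still open when schedule() was reached.
 */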
From the above analysis we can draw a conclusion:
When the Linux kernel prints "BUG: scheduling while atomic" or "bad: scheduling from the idle thread", it usually means that a function which may sleep was called in interrupt context, e.g. taking a semaphore or mutex, or calling sleep/msleep.
In my case, the timer interrupt handler performed a sleep. The kernel does not allow scheduling or preemption while an interrupt is being handled; nothing else may run until the handler returns, and that is what produced this bug. Interrupt handlers must therefore be kept short. The sketch below shows the offending pattern.
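A minimal reconstruction of that buggy pattern, with hypothetical names (demo_timer, demo_timer_fn) and the pre-4.15 unsigned long timer-callback signature that matches kernels of this vintage:

#include <linux/timer.h>
#include <linux/delay.h>

static struct timer_list demo_timer;

/* Timer callbacks run in softirq context, which is atomic. */
static void demo_timer_fn(unsigned long data)
{
	/* BUG: msleep() sleeps, i.e. it calls schedule(), and
	 * scheduling in atomic context triggers the splat above. */
	msleep(100);
	/* ... complex, time-consuming processing ... */
}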
How do we solve this kind of problem?
Use a workqueue: move the complex, time-consuming task out of the interrupt handler into a work item, so that when the interrupt arrives the handler merely queues the work and returns, as in the sketch below.
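A minimal sketch of this fix, again with hypothetical names and the old timer API: the timer callback only queues the work item, and the heavy, sleepable processing runs later in a kworker thread, i.e. in process context where sleeping is legal:

#include <linux/module.h>
#include <linux/timer.h>
#include <linux/workqueue.h>
#include <linux/delay.h>
#include <linux/jiffies.h>

static struct timer_list demo_timer;
static struct work_struct demo_work;

/* Runs in process context (a kworker thread): sleeping is allowed. */
static void demo_work_fn(struct work_struct *work)
{
	msleep(100);	/* the slow, sleepable part, moved out of the handler */
	/* ... complex, time-consuming processing ... */
}

/* Runs in atomic (softirq) context: do the bare minimum and defer. */
static void demo_timer_fn(unsigned long data)
{
	schedule_work(&demo_work);		/* hand off to process context */
	mod_timer(&demo_timer, jiffies + HZ);	/* re-arm the periodic timer */
}

static int __init demo_init(void)
{
	INIT_WORK(&demo_work, demo_work_fn);
	setup_timer(&demo_timer, demo_timer_fn, 0);
	mod_timer(&demo_timer, jiffies + HZ);
	return 0;
}

static void __exit demo_exit(void)
{
	del_timer_sync(&demo_timer);
	cancel_work_sync(&demo_work);
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");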
For how to use workqueues, see "Linux内核工作队列(workqueue)详解" (a detailed guide to Linux kernel workqueues).