Taking this parameter back into kdb(), we find that every CPU ends up jumping to kdba_main_loop(). This function is defined in arch/i386/kdb/kdbasupport.c:
/*
 * kdba_main_loop
 *
 *	Do any architecture specific set up before entering the main kdb loop.
 *	The primary function of this routine is to make all processes look the
 *	same to kdb, kdb must be able to list a process without worrying if the
 *	process is running or blocked, so make all process look as though they
 *	are blocked.
 *
 * Inputs:
 *	reason		The reason KDB was invoked
 *	error		The hardware-defined error code
 *	reason2		kdb's current reason code.  Initially error but can change
 *			according to kdb state.
 *	db_result	Result from break or debug point.
 *	regs		The exception frame at time of fault/breakpoint.  If reason
 *			is SILENT or CPU_UP then regs is NULL, otherwise it should
 *			always be valid.
 * Returns:
 *	0	KDB was invoked for an event which it wasn't responsible
 *	1	KDB handled the event for which it was invoked.
 * Outputs:
 *	Sets eip and esp in current->thread.
 * Locking:
 *	None.
 * Remarks:
 *	none.
 */

int
kdba_main_loop(kdb_reason_t reason, kdb_reason_t reason2, int error,
	       kdb_dbtrap_t db_result, struct pt_regs *regs)
{
	int ret;
	kdb_save_running(regs);
	ret = kdb_main_loop(reason, reason2, error, db_result, regs);
	kdb_unsave_running(regs);
	return ret;
}
The core of this function is, naturally, kdb_main_loop(), but the two calls wrapped around it also deserve our attention. Both are defined in kdb/kdbsupport.c:
/*
 * kdb_save_running
 *
 *	Save the state of a running process.  This is invoked on the current
 *	process on each cpu (assuming the cpu is responding).
 * Inputs:
 *	regs	struct pt_regs for the process
 * Outputs:
 *	Updates kdb_running_process[] for this cpu.
 * Returns:
 *	none.
 * Locking:
 *	none.
 */

void
kdb_save_running(struct pt_regs *regs)
{
	struct kdb_running_process *krp = kdb_running_process + smp_processor_id();
	krp->p = current;
	krp->regs = regs;
	krp->seqno = kdb_seqno;
	krp->irq_depth = hardirq_count() >> HARDIRQ_SHIFT;
	kdba_save_running(&(krp->arch), regs);
}

/*
 * kdb_unsave_running
 *
 *	Reverse the effect of kdb_save_running.
 * Inputs:
 *	regs	struct pt_regs for the process
 * Outputs:
 *	Updates kdb_running_process[] for this cpu.
 * Returns:
 *	none.
 * Locking:
 *	none.
 */

void
kdb_unsave_running(struct pt_regs *regs)
{
	struct kdb_running_process *krp = kdb_running_process + smp_processor_id();
	kdba_unsave_running(&(krp->arch), regs);
	krp->seqno = 0;
}
Behind these two functions there is a structure and a matching array. The structure is defined in include/linux/kdbprivate.h:
/* Save data about running processes */

struct kdb_running_process {
	struct task_struct *p;
	struct pt_regs *regs;
	int seqno;				/* kdb sequence number */
	int irq_depth;				/* irq count */
	struct kdba_running_process arch;	/* arch dependent save data */
};
struct kdba_running_process, in turn, is defined in include/asm-i386/kdbprivate.h:
/* Arch specific data saved for running processes */

struct kdba_running_process {
	long esp;	/* CONFIG_4KSTACKS may be on a different stack */
};
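Anyone familiar with the x86 architecture will recognize esp: the legendary stack pointer.
kdb_save_running and kdb_unsave_running each call a helper internally, and both helpers come from include/asm-i386/kdbprivate.h: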
static inline
void kdba_save_running(struct kdba_running_process *k, struct pt_regs *regs)
{
	k->esp = current_stack_pointer;
}

static inline
void kdba_unsave_running(struct kdba_running_process *k, struct pt_regs *regs)
{
}
Trace a little further and you will find that current_stack_pointer is simply the esp register. If you don't believe it, look at include/asm-i386/thread_info.h:
/* how to get the current stack pointer from C */
register unsigned long current_stack_pointer asm("esp") __attribute_used__;
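Strictly speaking this is GCC's explicit register variable syntax rather than an inline assembly statement: the declaration binds the C variable to the esp register. Compared with the many inline assembly statements in the kernel, it is about as easy to read as they come.
So by now we basically know what kdb_save_running and kdb_unsave_running are for: the former records some basic information about the process that was running before kdb is really entered, and the latter retires that record once kdb is done. Note that on i386 kdba_unsave_running is empty, so "unsaving" amounts to resetting seqno to 0.
One last reminder: kdb_running_process is not only the name of a structure, it is also the name of an array, defined in kdb/kdbsupport.c: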
struct kdb_running_process kdb_running_process[NR_CPUS];
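The array has NR_CPUS elements, so on SMP there is one slot for each CPU the kernel was built to support, and on UP there is just one. Both kdb_save_running and kdb_unsave_running use the array name as a base address, add smp_processor_id() to reach the struct kdb_running_process that belongs to the current CPU, and point krp at it.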
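The per-CPU slot lookup and the save/unsave pair can be tried out in user space. What follows is only a minimal sketch, not the kernel code: running_slot, running_slots, global_seqno, slot_for and the value of NR_CPUS are hypothetical stand-ins, and the CPU number is passed in explicitly because smp_processor_id() does not exist outside the kernel.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 4                /* hypothetical value chosen for this sketch */

struct running_slot {            /* simplified stand-in for struct kdb_running_process */
	const char *task;        /* stand-in for struct task_struct *p */
	int seqno;               /* 0 means "no valid record for this cpu" */
};

static struct running_slot running_slots[NR_CPUS];
static int global_seqno = 1;     /* stand-in for kdb_seqno */

/* Array name plus cpu index yields a pointer to that cpu's slot. */
static struct running_slot *slot_for(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return running_slots + cpu;      /* same address as &running_slots[cpu] */
}

/* Record who was running on this cpu and stamp the record as current. */
static void save_running(int cpu, const char *task)
{
	struct running_slot *s = slot_for(cpu);

	s->task = task;
	s->seqno = global_seqno;
}

/* Mark the record as stale again; nothing is actually "restored". */
static void unsave_running(int cpu)
{
	slot_for(cpu)->seqno = 0;
}
```

Calling save_running(2, "bash") stamps slot 2 with the current sequence number and leaves the other slots at 0, which is the property that a later check of "has this CPU entered the debugger yet" can rely on.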
Now we can give kdb_main_loop() our full attention. It takes no great insight to guess that this function is defined in kdb/kdbmain.c:
1506 /*
1507 * kdb_main_loop
1508 *
1509 * The main kdb loop. After initial setup and assignment of the controlling
1510 * cpu, all cpus are in this loop. One cpu is in control and will issue the kdb
1511 * prompt, the others will spin until 'go' or cpu switch.
1512 *
1513 * To get a consistent view of the kernel stacks for all processes, this routine
1514 * is invoked from the main kdb code via an architecture specific routine.
1515 * kdba_main_loop is responsible for making the kernel stacks consistent for all
1516 * processes, there should be no difference between a blocked process and a
1517 * running process as far as kdb is concerned.
1518 *
1519 * Inputs:
1520 * reason The reason KDB was invoked
1521 * error The hardware-defined error code
1522 * reason2 kdb's current reason code. Initially error but can change
1523 * according to kdb state.
1524 * db_result Result code from break or debug point.
1525 * regs The exception frame at time of fault/breakpoint. If reason
1526 * is SILENT or CPU_UP then regs is NULL, otherwise it
1527 * should always be valid.
1528 * Returns:
1529 * 0 KDB was invoked for an event which it wasn't responsible
1530 * 1 KDB handled the event for which it was invoked.
1531 * Locking:
1532 * none
1533 * Remarks:
1534 * none
1535 */
1536
1537 int
1538 kdb_main_loop(kdb_reason_t reason, kdb_reason_t reason2, int error,
1539 	      kdb_dbtrap_t db_result, struct pt_regs *regs)
1540 {
1541 	int result = 1;
1542 	/* Stay in kdb() until 'go', 'ss[b]' or an error */
1543 	while (1) {
1544 		/*
1545 		 * All processors except the one that is in control
1546 		 * will spin here.
1547 		 */
1548 		KDB_DEBUG_STATE("kdb_main_loop 1", reason);
1549 		while (KDB_STATE(HOLD_CPU)) {
1550 			/* state KDB is turned off by kdb_cpu to see if the
1551 			 * other cpus are still live, each cpu in this loop
1552 			 * turns it back on.
1553 			 */
1554 			if (!KDB_STATE(KDB)) {
1555 				KDB_STATE_SET(KDB);
1556 			}
1557 		}
1558 		KDB_STATE_CLEAR(SUPPRESS);
1559 		KDB_DEBUG_STATE("kdb_main_loop 2", reason);
1560 		if (KDB_STATE(LEAVING))
1561 			break;	/* Another cpu said 'go' */
1562
1563 		if (!kdb_quiet(reason))
1564 			kdb_wait_for_cpus();
1565 		/* Still using kdb, this processor is in control */
1566 		result = kdb_local(reason2, error, regs, db_result);
1567 		KDB_DEBUG_STATE("kdb_main_loop 3", result);
1568
1569 		if (result == KDB_CMD_CPU) {
1570 			/* Cpu switch, hold the current cpu, release the target one. */
1571 			reason2 = KDB_REASON_SWITCH;
1572 			KDB_STATE_SET(HOLD_CPU);
1573 			KDB_STATE_CLEAR_CPU(HOLD_CPU, kdb_new_cpu);
1574 			continue;
1575 		}
1576
1577 		if (result == KDB_CMD_SS) {
1578 			KDB_STATE_SET(DOING_SS);
1579 			break;
1580 		}
1581
1582 		if (result == KDB_CMD_SSB) {
1583 			KDB_STATE_SET(DOING_SS);
1584 			KDB_STATE_SET(DOING_SSB);
1585 			break;
1586 		}
1587
1588 		if (result && result != 1 && result != KDB_CMD_GO)
1589 			kdb_printf("\nUnexpected kdb_local return code %d\n", result);
1590
1591 		KDB_DEBUG_STATE("kdb_main_loop 4", reason);
1592 		break;
1593 	}
1594 	if (KDB_STATE(DOING_SS))
1595 		KDB_STATE_CLEAR(SSBPT);
1596 	return result;
1597 }
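Remember: at this moment every processor is inside this function. One of them carries the reason we passed in at the very start, KDB_REASON_KEYBOARD; the others arrived through an IPI and carry KDB_REASON_SWITCH. There is exactly one of the former and it is the one in charge; there may be several of the latter, and they are strictly subordinate. (In the words of the comment, the first one is "in control" and the rest just keep spinning.)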
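A quick scan shows that the function is essentially one while (1) loop, with several break statements as exits. The interesting part is the second loop nested inside it at line 1549, while (KDB_STATE(HOLD_CPU)). To understand it we have to think back to kdb(): before calling smp_kdb_stop() there is a for loop that sets two flags, HOLD_CPU and WAIT_IPI, on every online CPU (except, of course, the one in control). Those flags are only cleared later, after kdba_main_loop() has finished, unless a cpu switch releases the target CPU first (line 1573). So for those CPUs the loop at lines 1549 to 1557 is like Taishang Laojun sealing the Monkey King inside his alchemy furnace: they stay put until someone lets them out. This is the "spin" the comment talks about. The code below it is therefore not executed by the other CPUs for now; only the CPU in control carries on. From here on we follow just that one CPU and can ignore the others for the time being: think of them if you like, as you would recall a dream of spring; forget them if you like, as you would forget a star at the edge of the sky.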
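The flag bookkeeping behind the spin can be modeled in a few lines. This is a single-threaded sketch under stated assumptions, not the KDB macros themselves: cpu_state, the STATE_* bit values, NR_CPUS and the three helper names are all hypothetical, and the real code would have each held CPU busy-wait on its own flag word rather than being queried from outside.

```c
#include <assert.h>

#define NR_CPUS 4                 /* hypothetical value chosen for this sketch */
#define STATE_HOLD_CPU 0x1u       /* hypothetical bit values */
#define STATE_WAIT_IPI 0x2u

static unsigned int cpu_state[NR_CPUS];   /* one flag word per cpu */

/* Mirror of the for loop in kdb(): flag every cpu except the controlling one. */
static void hold_others(int control_cpu)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (cpu != control_cpu)
			cpu_state[cpu] |= STATE_HOLD_CPU | STATE_WAIT_IPI;
	}
}

/* A cpu stays in the inner loop for as long as its HOLD_CPU bit is set. */
static int is_spinning(int cpu)
{
	return (cpu_state[cpu] & STATE_HOLD_CPU) != 0;
}

/* Mirror of the KDB_CMD_CPU branch: hold the current cpu, release the target. */
static void switch_cpu(int from, int to)
{
	cpu_state[from] |= STATE_HOLD_CPU;
	cpu_state[to] &= ~STATE_HOLD_CPU;
}
```

After hold_others(0), CPU 0 is the only one not spinning; after switch_cpu(0, 2) the roles of CPU 0 and CPU 2 are swapped while CPUs 1 and 3 remain held, which matches the "hold the current cpu, release the target one" comment at line 1570.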
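Line 1564, kdb_wait_for_cpus(). This function waits until every CPU has entered kdb. Its trick is to check whether each CPU's seqno equals the global kdb_seqno: as we said earlier, any CPU that has run kdb() will have had its seqno set to kdb_seqno in kdb_save_running(). Once they all have, online equals kdb_data at line 1477 and the for loop exits. Conversely, if one or more CPUs have not yet run kdb(), their seqno will not match the global value (typically it is still 0, either the initial value or the value left behind by kdb_unsave_running()), so kdb_data differs from online, the code starting at line 1479 runs, and the for loop goes around again. There is nothing fancy in there, just some delay. Once every CPU has entered kdb, the function returns.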
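Since the body of kdb_wait_for_cpus() is not quoted here, the following is only a sketch of the counting idea described above, not the kernel function. The struct cpu_record type, both function names, the bounded poll count and the pause_fn callback are assumptions made for illustration; the "arrived" counter plays the role the text assigns to kdb_data.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 4                 /* hypothetical value chosen for this sketch */

struct cpu_record {               /* hypothetical per-cpu view */
	int online;               /* non-zero if the cpu is online */
	int seqno;                /* sequence number stamped on debugger entry */
};

/* True when every online cpu carries the current sequence number. */
static int all_cpus_in_debugger(const struct cpu_record *cpus, int ncpus,
				int current_seqno)
{
	int online = 0, arrived = 0;

	for (int i = 0; i < ncpus; i++) {
		if (!cpus[i].online)
			continue;
		online++;
		if (cpus[i].seqno == current_seqno)
			arrived++;
	}
	return online == arrived;
}

/* Poll up to max_polls times; return the polls used, or -1 on timeout. */
static int wait_for_cpus(const struct cpu_record *cpus, int ncpus,
			 int current_seqno, int max_polls,
			 void (*pause_fn)(void))
{
	for (int poll = 0; poll < max_polls; poll++) {
		if (all_cpus_in_debugger(cpus, ncpus, current_seqno))
			return poll;
		if (pause_fn)
			pause_fn();   /* stand-in for the delay between rounds */
	}
	return -1;
}
```

Offline CPUs are skipped rather than waited for, so a machine with fewer CPUs than NR_CPUS does not stall the check. The poll limit is a choice made for the sketch so that it always terminates.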