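So let's go straight to kdb(). How long is this function? Brace yourself: nearly 500 lines, with a hundred or two of them being comments alone. Whoever wrote it was probably muttering the whole time: "If XP doesn't show its claws, do you take me for DOS?"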
1615 /*
1616 * kdb
1617 *
1618 * This function is the entry point for the kernel debugger. It
1619 * provides a command parser and associated support functions to
1620 * allow examination and control of an active kernel.
1621 *
1622 * The breakpoint trap code should invoke this function with
1623 * one of KDB_REASON_BREAK (int 03) or KDB_REASON_DEBUG (debug register)
1624 *
1625 * the die_if_kernel function should invoke this function with
1626 * KDB_REASON_OOPS.
1627 *
1628 * In single step mode, one cpu is released to run without
1629 * breakpoints. Interrupts and NMI are reset to their original values,
1630 * the cpu is allowed to do one instruction which causes a trap
1631 * into kdb with KDB_REASON_DEBUG.
1632 *
1633 * Inputs:
1634 * reason The reason KDB was invoked
1635 * error The hardware-defined error code
1636 * regs The exception frame at time of fault/breakpoint. If reason
1637 * is SILENT or CPU_UP then regs is NULL, otherwise it
1638 * should always be valid.
1639 * Returns:
1640 * 0 KDB was invoked for an event which it wasn't responsible
1641 * 1 KDB handled the event for which it was invoked.
1642 * Locking:
1643 * none
1644 * Remarks:
1645 * No assumptions of system state. This function may be invoked
1646 * with arbitrary locks held. It will stop all other processors
1647 * in an SMP environment, disable all interrupts and does not use
1648 * the operating systems keyboard driver.
1649 *
1650 * This code is reentrant but only for cpu switch. Any other
1651 * reentrancy is an error, although kdb will attempt to recover.
1652 *
1653 * At the start of a kdb session the initial processor is running
1654 * kdb() and the other processors can be doing anything. When the
1655 * initial processor calls smp_kdb_stop() the other processors are
1656 * driven through kdb_ipi which calls kdb() with reason SWITCH.
1657 * That brings all processors into this routine, one with a "real"
1658 * reason code, the other with SWITCH.
1659 *
1660 * Because the other processors are driven via smp_kdb_stop(),
1661 * they enter here from the NMI handler. Until the other
1662 * processors exit from here and exit from kdb_ipi, they will not
1663 * take any more NMI requests. The initial cpu will still take NMI.
1664 *
1665 * Multiple race and reentrancy conditions, each with different
1666 * advoidance mechanisms.
1667 *
1668 * Two cpus hit debug points at the same time.
1669 *
1670 * kdb_lock and kdb_initial_cpu ensure that only one cpu gets
1671 * control of kdb. The others spin on kdb_initial_cpu until
1672 * they are driven through NMI into kdb_ipi. When the initial
1673 * cpu releases the others from NMI, they resume trying to get
1674 * kdb_initial_cpu to start a new event.
1675 *
1676 * A cpu is released from kdb and starts a new event before the
1677 * original event has completely ended.
1678 *
1679 * kdb_previous_event() prevents any cpu from entering
1680 * kdb_initial_cpu state until the previous event has completely
1681 * ended on all cpus.
1682 *
1683 * An exception occurs inside kdb.
1684 *
1685 * kdb_initial_cpu detects recursive entry to kdb and attempts
1686 * to recover. The recovery uses longjmp() which means that
1687 * recursive calls to kdb never return. Beware of assumptions
1688 * like
1689 *
1690 * ++depth;
1691 * kdb();
1692 * --depth;
1693 *
1694 * If the kdb call is recursive then longjmp takes over and
1695 * --depth is never executed.
1696 *
1697 * NMI handling.
1698 *
1699 * NMI handling is tricky. The initial cpu is invoked by some kdb event,
1700 * this event could be NMI driven but usually is not. The other cpus are
1701 * driven into kdb() via kdb_ipi which uses NMI so at the start the other
1702 * cpus will not accept NMI. Some operations such as SS release one cpu
1703 * but hold all the others. Releasing a cpu means it drops back to
1704 * whatever it was doing before the kdb event, this means it drops out of
1705 * kdb_ipi and hence out of NMI status. But the software watchdog uses
1706 * NMI and we do not want spurious watchdog calls into kdb. kdba_read()
1707 * resets the watchdog counters in its input polling loop, when a kdb
1708 * command is running it is subject to NMI watchdog events.
1709 *
1710 * Another problem with NMI handling is the NMI used to drive the other
1711 * cpus into kdb cannot be distinguished from the watchdog NMI. State
1712 * flag WAIT_IPI indicates that a cpu is waiting for NMI via kdb_ipi,
1713 * if not set then software NMI is ignored by kdb_ipi.
1714 *
1715 * Cpu switching.
1716 *
1717 * All cpus are in kdb (or they should be), all but one are
1718 * spinning on KDB_STATE(HOLD_CPU). Only one cpu is not in
1719 * HOLD_CPU state, only that cpu can handle commands.
1720 *
1721 * Go command entered.
1722 *
1723 * If necessary, go will switch to the initial cpu first. If the event
1724 * was caused by a software breakpoint (assumed to be global) that
1725 * requires single-step to get over the breakpoint then only release the
1726 * initial cpu, after the initial cpu has single-stepped the breakpoint
1727 * then release the rest of the cpus. If SSBPT is not required then
1728 * release all the cpus at once.
1729 */
1730
1731 fastcall int
1732 kdb(kdb_reason_t reason, int error, struct pt_regs *regs)
1733 {
1734 kdb_intstate_t int_state; /* Interrupt state */
1735 kdb_reason_t reason2 = reason;
1736 int result = 0; /* Default is kdb did not handle it */
1737 int ss_event, old_regs_saved = 0;
1738 struct pt_regs *old_regs = NULL;
1739 kdb_dbtrap_t db_result=KDB_DB_NOBPT;
1740 preempt_disable();
1741 atomic_inc(&kdb_event);
1742
1743 switch(reason) {
1744 case KDB_REASON_OOPS:
1745 case KDB_REASON_NMI:
1746 KDB_FLAG_SET(CATASTROPHIC); /* kernel state is dubious now */
1747 break;
1748 default:
1749 break;
1750 }
1751 switch(reason) {
1752 case KDB_REASON_ENTER:
1753 case KDB_REASON_ENTER_SLAVE:
1754 case KDB_REASON_BREAK:
1755 case KDB_REASON_DEBUG:
1756 case KDB_REASON_OOPS:
1757 case KDB_REASON_SWITCH:
1758 case KDB_REASON_KEYBOARD:
1759 case KDB_REASON_NMI:
1760 if (regs && regs != get_irq_regs()) {
1761 old_regs = set_irq_regs(regs);
1762 old_regs_saved = 1;
1763 }
1764 break;
1765 default:
1766 break;
1767 }
1768 if (kdb_continue_catastrophic > 2) {
1769 kdb_printf("kdb_continue_catastrophic is out of range, setting to 2\n");
1770 kdb_continue_catastrophic = 2;
1771 }
1772 if (!kdb_on && KDB_FLAG(CATASTROPHIC) && kdb_continue_catastrophic == 2) {
1773 KDB_FLAG_SET(ONLY_DO_DUMP);
1774 }
1775 if (!kdb_on && !KDB_FLAG(ONLY_DO_DUMP))
1776 goto out;
1777
1778 KDB_DEBUG_STATE("kdb 1", reason);
1779 KDB_STATE_CLEAR(SUPPRESS);
1780
1781 /* Filter out userspace breakpoints first, no point in doing all
1782 * the kdb smp fiddling when it is really a gdb trap.
1783 * Save the single step status first, kdba_db_trap clears ss status.
1784 * kdba_b[dp]_trap sets SSBPT if required.
1785 */
1786 ss_event = KDB_STATE(DOING_SS) || KDB_STATE(SSBPT);
1787 #ifdef CONFIG_CPU_XSCALE
1788 if ( KDB_STATE(A_XSC_ICH) ) {
1789 /* restore changed I_BIT */
1790 KDB_STATE_CLEAR(A_XSC_ICH);
1791 kdba_restore_retirq(regs, KDB_STATE(A_XSC_IRQ));
1792 if ( !ss_event ) {
1793 kdb_printf("Stranger!!! Why IRQ bit is changed====\n");
1794 }
1795 }
1796 #endif
1797 if (reason == KDB_REASON_BREAK) {
1798 db_result = kdba_bp_trap(regs, error); /* Only call this once */
1799 }
1800 if (reason == KDB_REASON_DEBUG) {
1801 db_result = kdba_db_trap(regs, error); /* Only call this once */
1802 }
1803
1804 if ((reason == KDB_REASON_BREAK || reason == KDB_REASON_DEBUG)
1805 && db_result == KDB_DB_NOBPT) {
1806 KDB_DEBUG_STATE("kdb 2", reason);
1807 goto out; /* Not one of mine */
1808 }
1809
1810 /* Turn off single step if it was being used */
1811 if (ss_event) {
1812 kdba_clearsinglestep(regs);
1813 /* Single step after a breakpoint removes the need for a delayed reinstall */
1814 if (reason == KDB_REASON_BREAK || reason == KDB_REASON_DEBUG)
1815 KDB_STATE_CLEAR(SSBPT);
1816 }
1817
1818 /* kdb can validly reenter but only for certain well defined conditions */
1819 if (reason == KDB_REASON_DEBUG
1820 && !KDB_STATE(HOLD_CPU)
1821 && ss_event)
1822 KDB_STATE_SET(REENTRY);
1823 else
1824 KDB_STATE_CLEAR(REENTRY);
1825
1826 /* Wait for previous kdb event to completely exit before starting
1827 * a new event.
1828 */
1829 while (kdb_previous_event())
1830 ;
1831 KDB_DEBUG_STATE("kdb 3", reason);
1832
1833 /*
1834 * If kdb is already active, print a message and try to recover.
1835 * If recovery is not possible and recursion is allowed or
1836 * forced recursion without recovery is set then try to recurse
1837 * in kdb. Not guaranteed to work but it makes an attempt at
1838 * debugging the debugger.
1839 */
1840 if (reason != KDB_REASON_SWITCH &&
1841 reason != KDB_REASON_ENTER_SLAVE) {
1842 if (KDB_IS_RUNNING() && !KDB_STATE(REENTRY)) {
1843 int recover = 1;
1844 unsigned long recurse = 0;
1845 kdb_printf("kdb: Debugger re-entered on cpu %d, new reason = %d\n",
1846 smp_processor_id(), reason);
1847 /* Should only re-enter from released cpu */
1848
1849 if (KDB_STATE(HOLD_CPU)) {
1850 kdb_printf(" Strange, cpu %d should not be running\n", smp_processor_id());
1851 recover = 0;
1852 }
1853 if (!KDB_STATE(CMD)) {
1854 kdb_printf(" Not executing a kdb command\n");
1855 recover = 0;
1856 }
1857 if (!KDB_STATE(LONGJMP)) {
1858 kdb_printf(" No longjmp available for recovery\n");
1859 recover = 0;
1860 }
1861 kdbgetulenv("RECURSE", &recurse);
1862 if (recurse > 1) {
1863 kdb_printf(" Forced recursion is set\n");
1864 recover = 0;
1865 }
1866 if (recover) {
1867 kdb_printf(" Attempting to abort command and recover\n");
1868 #ifdef kdba_setjmp
1869 kdba_longjmp(&kdbjmpbuf[smp_processor_id()], 0);
1870 #endif /* kdba_setjmp */
1871 }
1872 if (recurse) {
1873 if (KDB_STATE(RECURSE)) {
1874 kdb_printf(" Already in recursive mode\n");
1875 } else {
1876 kdb_printf(" Attempting recursive mode\n");
1877 KDB_STATE_SET(RECURSE);
1878 KDB_STATE_SET(REENTRY);
1879 reason2 = KDB_REASON_RECURSE;
1880 recover = 1;
1881 }
1882 }
1883 if (!recover) {
1884 kdb_printf(" Cannot recover, allowing event to proceed\n");
1885 /*temp*/
1886 while (KDB_IS_RUNNING())
1887 cpu_relax();
1888 goto out;
1889 }
1890 }
1891 } else if (reason == KDB_REASON_SWITCH && !KDB_IS_RUNNING()) {
1892 kdb_printf("kdb: CPU switch without kdb running, I'm confused\n");
1893 goto out;
1894 }
1895
1896 /*
1897 * Disable interrupts, breakpoints etc. on this processor
1898 * during kdb command processing
1899 */
1900 KDB_STATE_SET(KDB);
1901 kdba_disableint(&int_state);
1902 if (!KDB_STATE(KDB_CONTROL)) {
1903 kdb_bp_remove_local();
1904 KDB_STATE_SET(KDB_CONTROL);
1905 }
1906
1907 /*
1908 * If not entering the debugger due to CPU switch or single step
1909 * reentry, serialize access here.
1910 * The processors may race getting to this point - if,
1911 * for example, more than one processor hits a breakpoint
1912 * at the same time. We'll serialize access to kdb here -
1913 * other processors will loop here, and the NMI from the stop
1914 * IPI will take them into kdb as switch candidates. Once
1915 * the initial processor releases the debugger, the rest of
1916 * the processors will race for it.
1917 *
1918 * The above describes the normal state of affairs, where two or more
1919 * cpus that are entering kdb at the "same" time are assumed to be for
1920 * separate events. However some processes such as ia64 MCA/INIT will
1921 * drive all the cpus into error processing at the same time. For that
1922 * case, all of the cpus entering kdb at the "same" time are really a
1923 * single event.
1924 *
1925 * That case is handled by the use of KDB_ENTER by one cpu (the
1926 * monarch) and KDB_ENTER_SLAVE on the other cpus (the slaves).
1927 * KDB_ENTER_SLAVE maps to KDB_REASON_ENTER_SLAVE. The slave events
1928 * will be treated as if they had just responded to the kdb IPI, i.e.
1929 * as if they were KDB_REASON_SWITCH.
1930 *
1931 * Because of races across multiple cpus, ENTER_SLAVE can occur before
1932 * the main ENTER. Hold up ENTER_SLAVE here until the main ENTER
1933 * arrives.
1934 */
1935
1936 if (reason == KDB_REASON_ENTER_SLAVE) {
1937 spin_lock(&kdb_lock);
1938 while (!KDB_IS_RUNNING()) {
1939 spin_unlock(&kdb_lock);
1940 while (!KDB_IS_RUNNING())
1941 cpu_relax();
1942 spin_lock(&kdb_lock);
1943 }
1944 reason = KDB_REASON_SWITCH;
1945 KDB_STATE_SET(HOLD_CPU);
1946 spin_unlock(&kdb_lock);
1947 }
1948
1949 if (reason == KDB_REASON_SWITCH || KDB_STATE(REENTRY))
1950 ; /* drop through */
1951 else {
1952 KDB_DEBUG_STATE("kdb 4", reason);
1953 spin_lock(&kdb_lock);
1954 while (KDB_IS_RUNNING() || kdb_previous_event()) {
1955 spin_unlock(&kdb_lock);
1956 while (KDB_IS_RUNNING() || kdb_previous_event())
1957 cpu_relax();
1958 spin_lock(&kdb_lock);
1959 }
1960 KDB_DEBUG_STATE("kdb 5", reason);
1961
1962 kdb_initial_cpu = smp_processor_id();
1963 ++kdb_seqno;
1964 spin_unlock(&kdb_lock);
1965 if (!kdb_quiet(reason))
1966 notify_die(DIE_KDEBUG_ENTER, "KDEBUG ENTER", regs, error, 0, 0);
1967 }
1968
1969 if (smp_processor_id() == kdb_initial_cpu
1970 && !KDB_STATE(REENTRY)) {
1971 KDB_STATE_CLEAR(HOLD_CPU);
1972 KDB_STATE_CLEAR(WAIT_IPI);
1973 kdb_check_i8042();
1974 /*
1975 * Remove the global breakpoints. This is only done
1976 * once from the initial processor on initial entry.
1977 */
1978 if (!kdb_quiet(reason) || smp_processor_id() == 0)
1979 kdb_bp_remove_global();
1980
1981 /*
1982 * If SMP, stop other processors. The other processors
1983 * will enter kdb() with KDB_REASON_SWITCH and spin in
1984 * kdb_main_loop().
1985 */
1986 KDB_DEBUG_STATE("kdb 6", reason);
1987 if (NR_CPUS > 1 && !kdb_quiet(reason)) {
1988 int i;
1989 for (i = 0; i < NR_CPUS; ++i) {
1990 if (!cpu_online(i))
1991 continue;
1992 if (i != kdb_initial_cpu) {
1993 KDB_STATE_SET_CPU(HOLD_CPU, i);
1994 KDB_STATE_SET_CPU(WAIT_IPI, i);
1995 }
1996 }
1997 KDB_DEBUG_STATE("kdb 7", reason);
1998 smp_kdb_stop();
1999 KDB_DEBUG_STATE("kdb 8", reason);
2000 }
2001 }
2002
2003 if (KDB_STATE(GO1)) {
2004 kdb_bp_remove_global(); /* They were set for single-step purposes */
2005 KDB_STATE_CLEAR(GO1);
2006 reason = KDB_REASON_SILENT; /* Now silently go */
2007 }
2008
2009 /* Set up a consistent set of process stacks before talking to the user */
2010 KDB_DEBUG_STATE("kdb 9", result);
2011 result = kdba_main_loop(reason, reason2, error, db_result, regs);
2012
2013 KDB_DEBUG_STATE("kdb 10", result);
2014 kdba_adjust_ip(reason2, error, regs);
2015 KDB_STATE_CLEAR(LONGJMP);
2016 KDB_DEBUG_STATE("kdb 11", result);
2017 /* go which requires single-step over a breakpoint must only release
2018 * one cpu.
2019 */
2020 if (result == KDB_CMD_GO && KDB_STATE(SSBPT))
2021 KDB_STATE_SET(GO1);
2022
2023 if (smp_processor_id() == kdb_initial_cpu &&
2024 !KDB_STATE(DOING_SS) &&
2025 !KDB_STATE(RECURSE)) {
2026 /*
2027 * (Re)install the global breakpoints and cleanup the cached
2028 * symbol table. This is only done once from the initial
2029 * processor on go.
2030 */
2031 KDB_DEBUG_STATE("kdb 12", reason);
2032 if (!kdb_quiet(reason) || smp_processor_id() == 0) {
2033 kdb_bp_install_global(regs);
2034 kdbnearsym_cleanup();
2035 debug_kusage();
2036 }
2037 if (!KDB_STATE(GO1)) {
2038 /*
2039 * Release all other cpus which will see KDB_STATE(LEAVING) is set.
2040 */
2041 int i;
2042 for (i = 0; i < NR_CPUS; ++i) {
2043 if (KDB_STATE_CPU(KDB, i))
2044 KDB_STATE_SET_CPU(LEAVING, i);
2045 KDB_STATE_CLEAR_CPU(WAIT_IPI, i);
2046 KDB_STATE_CLEAR_CPU(HOLD_CPU, i);
2047 }
2048 /* Wait until all the other processors leave kdb */
2049 while (kdb_previous_event() != 1)
2050 ;
2051 if (!kdb_quiet(reason))
2052 notify_die(DIE_KDEBUG_LEAVE, "KDEBUG LEAVE", regs, error, 0, 0);
2053 kdb_initial_cpu = -1; /* release kdb control */
2054 KDB_DEBUG_STATE("kdb 13", reason);
2055 }
2056 }
2057
2058 KDB_DEBUG_STATE("kdb 14", result);
2059 kdba_restoreint(&int_state);
2060 #ifdef CONFIG_CPU_XSCALE
2061 if ( smp_processor_id() == kdb_initial_cpu &&
2062 ( KDB_STATE(SSBPT) | KDB_STATE(DOING_SS) )
2063 ) {
2064 kdba_setsinglestep(regs);
2065 // disable IRQ in stack frame
2066 KDB_STATE_SET(A_XSC_ICH);
2067 if ( kdba_disable_retirq(regs) ) {
2068 KDB_STATE_SET(A_XSC_IRQ);
2069 }
2070 else {
2071 KDB_STATE_CLEAR(A_XSC_IRQ);
2072 }
2073 }
2074 #endif
2075
2076 /* Only do this work if we are really leaving kdb */
2077 if (!(KDB_STATE(DOING_SS) || KDB_STATE(SSBPT) || KDB_STATE(RECURSE))) {
2078 KDB_DEBUG_STATE("kdb 15", result);
2079 kdb_bp_install_local(regs);
2080 if (old_regs_saved)
2081 set_irq_regs(old_regs);
2082 KDB_STATE_CLEAR(KDB_CONTROL);
2083 }
2084
2085 KDB_DEBUG_STATE("kdb 16", result);
2086 KDB_FLAG_CLEAR(CATASTROPHIC);
2087 KDB_STATE_CLEAR(IP_ADJUSTED); /* Re-adjust ip next time in */
2088 KDB_STATE_CLEAR(KEYBOARD);
2089 KDB_STATE_CLEAR(KDB); /* Main kdb state has been cleared */
2090 KDB_STATE_CLEAR(RECURSE);
2091 KDB_STATE_CLEAR(LEAVING); /* No more kdb work after this */
2092 KDB_DEBUG_STATE("kdb 17", reason);
2093 out:
2094 atomic_dec(&kdb_event);
2095 preempt_enable();
2096 return result != 0;
2097 }
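One warning in that giant comment deserves a hands-on look before we move on: recovery from a recursive entry uses longjmp(), so a recursive call to kdb() never returns, and a `--depth` written after the call never runs. Below is a minimal user-space sketch of that trap using the standard setjmp()/longjmp(). The names `recover_buf`, `reenter_debugger`, `fragile_caller` and `run_depth_demo` are made up for illustration; only the `++depth; kdb(); --depth;` shape comes from the comment itself, and this is not how kdb wires up kdba_setjmp.

```c
#include <setjmp.h>

/* Hypothetical stand-in for the per-cpu kdbjmpbuf[] used by kdb. */
static jmp_buf recover_buf;

/* Nesting counter that the caller tries to keep balanced. */
static int depth;

/* Stand-in for a recursive kdb() entry: it jumps away and never returns. */
static void reenter_debugger(void)
{
    longjmp(recover_buf, 1);
}

/* The exact pattern the kdb comment warns about. */
static void fragile_caller(void)
{
    ++depth;
    reenter_debugger();
    --depth;            /* never executed: longjmp skipped over it */
}

/* Runs the pattern once and reports the value depth is left with. */
int run_depth_demo(void)
{
    depth = 0;
    if (setjmp(recover_buf) == 0)
        fragile_caller();   /* first pass: call into the fragile code */
    /* second pass: we arrive here through longjmp() */
    return depth;
}
```

Calling `run_depth_demo()` leaves the counter unbalanced, which is why any bookkeeping wrapped around a kdb() call has to be written so that it survives the call never coming back.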
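Stephen Chow once said: "If a man has no dreams, how is he any different from a salted fish?" My dream right now is simply to understand this monster of a function. Honestly, faced with such a behemoth, understanding all of it really is hard, so I can only cheer myself on with a line from Aaron Kwok: "It's fun because it's hard. Does the result matter?"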
When we called kdb() earlier, the first argument, reason, took one of two values: KDB_REASON_KEYBOARD or KDB_REASON_ENTER. So let's look at these two cases in turn.
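You will find that although kdb() looks complicated, there is not much code you really need to care about. A lot of it does not run in the ordinary case, and a lot more exists only for debugging, such as the KDB_DEBUG_STATE() calls. In particular, the first time you enter kdb, the code to focus on first is smp_kdb_stop() at line 1998. This function is architecture-specific too; for i386 it is defined in arch/i386/kdb/kdbasupport.c: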
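The name of the function already hints at what it does: it stops the other processors. Still, it is worth a careful look. It involves the IPI, the interprocessor interrupt, which is an important concept in SMP. send_IPI_allbutself(int vector) sends an IPI to every CPU except the calling one, and the vector argument identifies the interrupt type. We have in fact seen KDB_VECTOR before. In arch/i386/kernel/entry.S we need to add the following: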
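Clearly this is yet another interrupt handler, and the real interrupt service routine is smp_kdb_interrupt(). This is a function we define ourselves; for i386 its definition is in arch/i386/kdb/kdbasupport.c: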
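Writing code is a lot like dealing with people: much of it is ceremony. We often say that small talk is the first sentence of any relationship, and code is the same. Before calling the one function that matters, it first calls a few that look like they do nothing much. Take smp_kdb_interrupt(): its ultimate purpose is to call kdb_ipi(), but before that it calls set_irq_regs(), ack_APIC_irq(), irq_enter() and so on. Of course, if you truly understand this code you can rightly say every one of those calls is necessary. From the point of view of reading the code, though, you need not care about them; cut straight to the point and face kdb_ipi(). This function comes from our common patch, which adds it to kdb/kdbsupport.c: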
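Note that when kdb_ipi() was called just now, its second argument was NULL, so ack_interrupt here is NULL; in other words, line 291 is not executed. What actually runs, then, is just the kdb() call at line 294. Remember that its first argument is KDB_REASON_SWITCH. This is interesting: every processor ends up executing kdb() at the same moment, yet although they run the same function, what it means for each of them is completely different, because some are called with KDB_REASON_SWITCH while the other is called with KDB_REASON_KEYBOARD. The same thing is common in real life: different people doing the very same thing can mean very different things by it. For a lecher, stripping naked is the start of "entertainment"; for an artist, stripping naked is the start of "art".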
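To make that picture concrete, here is a small user-space model of the flow described so far: the initial CPU enters with a keyboard reason, marks every other CPU as HOLD_CPU and WAIT_IPI, "sends" the IPI to all but itself, and each of the others goes through a kdb_ipi-like function with a NULL ack callback and lands in the debugger with a switch reason. Treat it as a sketch under assumptions, not the kernel code: every `model_*` and `MODEL_*` name is invented here, the CPU count is arbitrary, the IPI is just a loop, and the only behavior borrowed from the source is what the kdb() comment and the paragraphs above state (WAIT_IPI gates kdb_ipi, a NULL ack_interrupt is skipped, kdb() is entered with KDB_REASON_SWITCH).

```c
#include <stddef.h>
#include <string.h>

#define MODEL_NR_CPUS 4   /* hypothetical CPU count for the model */

/* Stand-ins for KDB_REASON_KEYBOARD and KDB_REASON_SWITCH. */
typedef enum { MODEL_NONE, MODEL_KEYBOARD, MODEL_SWITCH } model_reason_t;

struct model_cpu {
    int hold_cpu;            /* models KDB_STATE(HOLD_CPU) */
    int wait_ipi;            /* models KDB_STATE(WAIT_IPI) */
    model_reason_t reason;   /* reason this CPU entered the debugger with */
};

static struct model_cpu cpus[MODEL_NR_CPUS];

/* Model of kdb(): it only records why the CPU came in. */
static void model_kdb(int cpu, model_reason_t reason)
{
    cpus[cpu].reason = reason;
}

/* Model of kdb_ipi(): ignore the interrupt unless this CPU is waiting
 * for it, call the ack callback only if one was supplied, then enter
 * the debugger with the switch reason. Returns 1 if it was handled. */
static int model_kdb_ipi(int cpu, void (*ack_interrupt)(void))
{
    if (!cpus[cpu].wait_ipi)
        return 0;
    if (ack_interrupt)
        ack_interrupt();     /* skipped when the caller passes NULL */
    cpus[cpu].wait_ipi = 0;
    model_kdb(cpu, MODEL_SWITCH);
    return 1;
}

/* Model of the stop step: flag every other CPU, then deliver the
 * "IPI" to all CPUs but the initial one. */
static void model_stop_others(int initial)
{
    int i;

    for (i = 0; i < MODEL_NR_CPUS; ++i) {
        if (i == initial)
            continue;
        cpus[i].hold_cpu = 1;
        cpus[i].wait_ipi = 1;
    }
    for (i = 0; i < MODEL_NR_CPUS; ++i) {
        if (i != initial)
            model_kdb_ipi(i, NULL);
    }
}

/* Runs one model session; returns how many CPUs entered via switch. */
int model_session(int initial)
{
    int i, switched = 0;

    memset(cpus, 0, sizeof(cpus));
    model_kdb(initial, MODEL_KEYBOARD);
    model_stop_others(initial);
    for (i = 0; i < MODEL_NR_CPUS; ++i) {
        if (cpus[i].reason == MODEL_SWITCH)
            ++switched;
    }
    return switched;
}

/* Accessor so a caller can inspect one CPU's recorded reason. */
model_reason_t model_reason_of(int cpu)
{
    return cpus[cpu].reason;
}
```

After `model_session(1)`, CPU 1 holds the keyboard reason and every other CPU holds the switch reason: the same function entered by everyone, with a different meaning for each.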