Mono Source Reading - Crash Mechanism


# Introduction

This article reads through and studies the parts of the Mono source code that handle crash signals. The source files involved are:

  • mini.c

  • mini-posix.c

  • mini-exceptions.c

  • exceptions-arm.c

Install Signal Handler

add_signal_handler

All signal handlers are registered by calling add_signal_handler.

The functions in the Mono code that call add_signal_handler to register signal handlers are:

interp.c

  • mono_runtime_install_handlers

mini-posix.c

  • mono_runtime_posix_install_handlers
  • mono_runtime_setup_stat_profiler (SIGPROF)
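The source of add_signal_handler itself is not quoted in this note. As a hedged sketch, assuming it is built on sigaction() with SA_SIGINFO (so every handler receives the siginfo_t and the signal context that the handlers below read via GET_CONTEXT), the core of such a registration helper could look like this. sketch_add_signal_handler and record_signal are hypothetical names, not Mono functions.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Three-argument handler type used with SA_SIGINFO. */
typedef void (*siginfo_handler_t) (int, siginfo_t *, void *);

/* Hypothetical helper in the spirit of add_signal_handler:
 * installs `handler` for `signo`; returns 1 on success, 0 on failure. */
static int
sketch_add_signal_handler (int signo, siginfo_handler_t handler)
{
    struct sigaction sa;

    memset (&sa, 0, sizeof (sa));
    sa.sa_sigaction = handler;   /* receives siginfo_t and the ucontext */
    sigemptyset (&sa.sa_mask);   /* block nothing extra while handling */
    sa.sa_flags = SA_SIGINFO;    /* request the three-argument form */
    return sigaction (signo, &sa, NULL) == 0;
}

/* Trivial handler that records which signal it received. */
static volatile sig_atomic_t last_signal = 0;

static void
record_signal (int signo, siginfo_t *info, void *context)
{
    (void) info;
    (void) context;
    last_signal = signo;
}
```

With a handler registered this way, raise (SIGUSR1) runs record_signal before raise returns in a single-threaded program.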

mono_runtime_posix_install_handlers

Here we focus mainly on the signal handler registration in the mini directory.

Signals caught by Mono:

  • SIGINT (if the handle_sigint debug option is enabled)
  • SIGFPE
  • SIGQUIT
  • SIGILL
  • SIGBUS
  • SIGUSR2 (if mono_jit_trace_calls != NULL)
  • SIGUSR1 -> the signal returned by mono_thread_get_abort_signal()
  • SIGABRT
  • SIGSEGV
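
| Constant | Description |
|---|---|
| SIGSEGV | Invalid memory access (segmentation fault): an attempt to access memory not allocated to the process, or to write to an address without write permission. |
| SIGINT | External interrupt, usually initiated by the user. The program interrupt signal is sent when the user types the INTR character (usually Ctrl-C) and tells the foreground process group to terminate. |
| SIGILL | Illegal program image, e.g. an illegal instruction was executed. Usually caused by a corrupt executable or an attempt to execute a data segment. A stack overflow may also produce this signal. |
| SIGABRT | Abnormal termination condition, e.g. as initiated by abort(). |
| SIGFPE | Sent on a fatal arithmetic error. This covers not only floating-point errors but all other arithmetic errors, such as overflow and division by zero. |
| SIGQUIT | Similar to SIGINT, but triggered by the QUIT character (usually Ctrl-\). A process that exits because of SIGQUIT produces a core file, so in this sense it resembles a program-error signal. |
| SIGBUS | Invalid address, including memory alignment errors, e.g. accessing a 4-byte integer whose address is not a multiple of 4. It differs from SIGSEGV in that SIGSEGV is triggered by an invalid access to a valid address (such as accessing memory that does not belong to the process, or writing to read-only memory). |
| SIGUSR1 | Reserved for user-defined purposes. |
| SIGUSR2 | Reserved for user-defined purposes. |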

The code that registers the signal handlers:

void
mono_runtime_posix_install_handlers (void)
{
    sigset_t signal_set;

    if (mini_get_debug_options ()->handle_sigint)
        add_signal_handler (SIGINT, mono_sigint_signal_handler);

    add_signal_handler (SIGFPE, mono_sigfpe_signal_handler);
    add_signal_handler (SIGQUIT, sigquit_signal_handler);
    add_signal_handler (SIGILL, mono_sigill_signal_handler);
    add_signal_handler (SIGBUS, mono_sigsegv_signal_handler);
    if (mono_jit_trace_calls != NULL)
        add_signal_handler (SIGUSR2, sigusr2_signal_handler);

    add_signal_handler (mono_thread_get_abort_signal (), sigusr1_signal_handler);
    /* it seems to have become a common bug for some programs that run as parents
     * of many processes to block signal delivery for real time signals.
     * We try to detect and work around their breakage here.
     */
    sigemptyset (&signal_set);
    sigaddset (&signal_set, mono_thread_get_abort_signal ());
    sigprocmask (SIG_UNBLOCK, &signal_set, NULL);

    signal (SIGPIPE, SIG_IGN);

#ifndef MONO_CROSS_COMPILE
    add_signal_handler (SIGABRT, sigabrt_signal_handler);

    /* catch SIGSEGV */
    add_signal_handler (SIGSEGV, mono_sigsegv_signal_handler);
#endif
}
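The workaround in the comment above is needed because a process inherits its signal mask from its parent across fork() and execve(). A minimal sketch reproducing it outside Mono, with hypothetical helper names: block SIGUSR1 the way a misbehaving parent might have, then clear it with sigprocmask (SIG_UNBLOCK, ...) as the code above does for the abort signal.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Ensure `signo` is not blocked in this thread's signal mask,
 * even if a parent process left it blocked. Returns 1 on success. */
static int
ensure_unblocked (int signo)
{
    sigset_t set;

    sigemptyset (&set);
    sigaddset (&set, signo);
    return sigprocmask (SIG_UNBLOCK, &set, NULL) == 0;
}

/* Returns 1 if `signo` is currently blocked, 0 if not, -1 on error.
 * Passing a NULL set only queries the mask without changing it. */
static int
is_blocked (int signo)
{
    sigset_t current;

    if (sigprocmask (SIG_BLOCK, NULL, &current) != 0)
        return -1;
    return sigismember (&current, signo);
}
```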

Signal Handler

All signal handlers are defined using the SIG_HANDLER_SIGNATURE macro:

mini-posix.c

  • sigabrt_signal_handler
  • sigprof_signal_handler
  • sigquit_signal_handler
  • sigusr1_signal_handler
  • sigusr2_signal_handler

mini.c

  • mono_sigfpe_signal_handler
  • mono_sigill_signal_handler
  • mono_sigsegv_signal_handler
  • mono_sigint_signal_handler
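The definition of SIG_HANDLER_SIGNATURE is not shown in this note. Assuming the sigaction-based variant, it expands to a three-argument handler whose parameters are named _dummy (the signal number), info and context, with GET_CONTEXT declaring a local ctx; that is consistent with how the handler bodies use info, ctx and SIG_HANDLER_PARAMS. The SKETCH_ macro names and the demo functions below are ours, and the exact Mono definitions may differ by platform.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Assumed shape of the sigaction-based macros (hypothetical SKETCH_ names). */
#define SKETCH_SIG_HANDLER_SIGNATURE(ftn) \
    ftn (int _dummy, siginfo_t *info, void *context)
/* Forwarding form, as used by calls like mono_chain_signal (SIG_HANDLER_PARAMS). */
#define SKETCH_SIG_HANDLER_PARAMS _dummy, info, context
#define SKETCH_GET_CONTEXT void *ctx = context

static volatile sig_atomic_t seen_signo = 0;
static volatile sig_atomic_t had_context = 0;

/* Expands to: static void demo_handler (int _dummy, siginfo_t *info, void *context) */
static void
SKETCH_SIG_HANDLER_SIGNATURE (demo_handler)
{
    SKETCH_GET_CONTEXT;
    (void) info;
    seen_signo = _dummy;          /* the signal number arrives as _dummy */
    had_context = (ctx != NULL);  /* ctx is the ucontext pointer from the kernel */
}

static int
install_demo_handler (int signo)
{
    struct sigaction sa;

    memset (&sa, 0, sizeof (sa));
    sa.sa_sigaction = demo_handler;
    sigemptyset (&sa.sa_mask);
    sa.sa_flags = SA_SIGINFO;
    return sigaction (signo, &sa, NULL) == 0;
}
```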

SIGINT

void
SIG_HANDLER_SIGNATURE (mono_sigint_signal_handler)
{
    MonoException *exc;
    GET_CONTEXT;

    exc = mono_get_exception_execution_engine ("Interrupted (SIGINT).");

    mono_arch_handle_exception (ctx, exc, FALSE);
}

mono_arch_handle_exception

/*
* This is the function called from the signal handler
*/
gboolean
mono_arch_handle_exception (void *ctx, gpointer obj, gboolean test_only)
{
    MonoContext mctx;
    gboolean result;

    mono_arch_sigctx_to_monoctx (ctx, &mctx);

    result = mono_handle_exception (&mctx, obj, (gpointer)mctx.eip, test_only);
    /* restore the context so that returning from the signal handler will invoke
     * the catch clause
     */
    mono_arch_monoctx_to_sigctx (&mctx, ctx);
    return result;
}
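Mono makes "returning from the signal handler" land in the catch clause by rewriting the register context that the kernel restores when the handler returns, which is architecture-specific. As a portable stand-in (a different technique from Mono's), sigsetjmp/siglongjmp can also leave a handler and resume at a chosen point. This sketch uses SIGINT, the signal mono_sigint_signal_handler turns into an exception; all names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <string.h>

/* Stand-in for the "catch clause": where execution resumes after a signal. */
static sigjmp_buf catch_point;
static volatile sig_atomic_t caught_signo = 0;

static void
to_catch_clause (int signo, siginfo_t *info, void *context)
{
    (void) info;
    (void) context;
    caught_signo = signo;
    siglongjmp (catch_point, 1);   /* leave the handler; the saved mask is restored */
}

/* Runs `body` inside a "try"; returns the signal that interrupted it, 0 if none, -1 on error. */
static int
try_with_signal (int signo, void (*body) (void))
{
    struct sigaction sa;

    memset (&sa, 0, sizeof (sa));
    sa.sa_sigaction = to_catch_clause;
    sigemptyset (&sa.sa_mask);
    sa.sa_flags = SA_SIGINFO;
    if (sigaction (signo, &sa, NULL) != 0)
        return -1;

    caught_signo = 0;
    if (sigsetjmp (catch_point, 1) == 0) {   /* 1 = also save the signal mask */
        body ();
        return 0;                            /* no signal: normal completion */
    }
    return caught_signo;                     /* resumed here: the "catch clause" */
}

static void raise_sigint (void) { raise (SIGINT); }
static void do_nothing (void) { }
```

Saving the signal mask matters: without it, SIGINT would stay blocked after the first jump out of the handler.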

SEGV

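If mono_domain_get() returns NULL or there is no jit_tls, the thread can be treated as unmanaged (not registered with the runtime). In that case Mono calls mono_chain_signal to forward the signal to the previously registered (chained) signal handler. If mono_chain_signal returns TRUE, i.e. a saved handler was found and invoked, Mono returns immediately without doing anything else; otherwise it calls mono_handle_native_sigsegv, which prints the stack trace and finally calls abort(). The same chain-or-abort path is taken on a managed thread when the faulting IP does not belong to any JIT-compiled method (ji is NULL).
If the thread is managed and the IP is in JIT-compiled code, the logic is the same as throwing an exception in C#: mono_arch_handle_exception calls mono_handle_exception to process it as a C# exception.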

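The chained signal handler here is the handler that was installed before Mono's: when registering its own handlers, Mono saves the previous one (saved_handler) so that it can forward signals to it later.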
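A minimal sketch of the save-and-chain idea, assuming a single signal and a single saved slot: the previous disposition is captured through the third argument of sigaction() when our handler is installed, and chain_signal forwards to it and reports whether it did, which is how the result of mono_chain_signal is used in mono_sigsegv_signal_handler. All names here are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* The handler that was installed before ours (cf. saved_handler). */
static struct sigaction saved_handler;
static volatile sig_atomic_t previous_ran = 0;
static volatile sig_atomic_t ours_ran = 0;

/* Forwards the signal to the saved handler if it is a real function.
 * Returns 1 if it was invoked, 0 otherwise (SIG_DFL / SIG_IGN / none). */
static int
chain_signal (int signo, siginfo_t *info, void *context)
{
    if (saved_handler.sa_flags & SA_SIGINFO) {
        if (saved_handler.sa_sigaction) {
            saved_handler.sa_sigaction (signo, info, context);
            return 1;
        }
    } else if (saved_handler.sa_handler != SIG_DFL &&
               saved_handler.sa_handler != SIG_IGN) {
        saved_handler.sa_handler (signo);
        return 1;
    }
    return 0;
}

/* Our handler, taking the "unmanaged thread" branch. */
static void
our_handler (int signo, siginfo_t *info, void *context)
{
    ours_ran = 1;
    if (chain_signal (signo, info, context))
        return;   /* the previous owner handled it */
    /* otherwise Mono would call mono_handle_native_sigsegv and abort() */
}

/* Plays the role of a handler installed earlier by the host application. */
static void
previous_handler (int signo)
{
    (void) signo;
    previous_ran = 1;
}

/* Installs previous_handler, then ours on top, remembering the old one. */
static int
install_chained (int signo)
{
    struct sigaction sa;

    memset (&sa, 0, sizeof (sa));
    sa.sa_handler = previous_handler;
    sigemptyset (&sa.sa_mask);
    if (sigaction (signo, &sa, NULL) != 0)
        return 0;

    memset (&sa, 0, sizeof (sa));
    sa.sa_sigaction = our_handler;
    sigemptyset (&sa.sa_mask);
    sa.sa_flags = SA_SIGINFO;
    return sigaction (signo, &sa, &saved_handler) == 0;  /* old one -> saved_handler */
}
```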

void
SIG_HANDLER_SIGNATURE (mono_sigsegv_signal_handler)
{
    MonoJitInfo *ji;
    MonoJitTlsData *jit_tls = TlsGetValue (mono_jit_tls_id);
    gpointer ip;

    GET_CONTEXT;

#if defined(MONO_ARCH_SOFT_DEBUG_SUPPORTED) && defined(HAVE_SIG_INFO)
    if (mono_arch_is_single_step_event (info, ctx)) {
        mono_debugger_agent_single_step_event (ctx);
        return;
    } else if (mono_arch_is_breakpoint_event (info, ctx)) {
        mono_debugger_agent_breakpoint_hit (ctx);
        return;
    }
#endif

#if !defined(PLATFORM_WIN32) && defined(HAVE_SIG_INFO)
    if (mono_aot_is_pagefault (info->si_addr)) {
        mono_aot_handle_pagefault (info->si_addr);
        return;
    }
#endif

    /* The thread might no be registered with the runtime */
    if (!mono_domain_get () || !jit_tls) {
        if (mono_chain_signal (SIG_HANDLER_PARAMS))
            return;
        mono_handle_native_sigsegv (SIGSEGV, ctx);
    }

    ip = mono_arch_ip_from_context (ctx);
#ifdef _WIN64
    /* Sometimes on win64 we get null IP, but the previous frame is a valid managed frame */
    /* So pop and try again */
    if (!ip && ctx)
    {
        MonoContext *context = (MonoContext*)ctx;
        gpointer *sp = context->rsp;
        if (sp)
        {
            ip = context->rip = *sp;
            context->rsp += sizeof(gpointer);
        }
    }
#endif
    ji = mono_jit_info_table_find (mono_domain_get (), ip);

#ifdef MONO_ARCH_SIGSEGV_ON_ALTSTACK
    if (mono_handle_soft_stack_ovf (jit_tls, ji, ctx, (guint8*)info->si_addr))
        return;

    /* The hard-guard page has been hit: there is not much we can do anymore
     * Print a hopefully clear message and abort.
     */
    if (jit_tls->stack_size &&
            ABS ((guint8*)info->si_addr - ((guint8*)jit_tls->end_of_stack - jit_tls->stack_size)) < 32768) {
        const char *method;
        /* we don't do much now, but we can warn the user with a useful message */
        fprintf (stderr, "Stack overflow: IP: %p, fault addr: %p\n", mono_arch_ip_from_context (ctx), (gpointer)info->si_addr);
        if (ji && ji->method)
            method = mono_method_full_name (ji->method, TRUE);
        else
            method = "Unmanaged";
        fprintf (stderr, "At %s\n", method);
        _exit (1);
    } else {
        /* The original handler might not like that it is executed on an altstack... */
        if (!ji && mono_chain_signal (SIG_HANDLER_PARAMS))
            return;

        mono_arch_handle_altstack_exception (ctx, info->si_addr, FALSE);
    }
#else

    if (!ji) {
        if (mono_chain_signal (SIG_HANDLER_PARAMS))
            return;

        mono_handle_native_sigsegv (SIGSEGV, ctx);
    }

    mono_arch_handle_exception (ctx, NULL, FALSE);
#endif
}
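The hard-guard-page test in the MONO_ARCH_SIGSEGV_ON_ALTSTACK branch treats a fault within 32768 bytes of the lowest stack address (end_of_stack - stack_size, assuming a downward-growing stack as on x86 and ARM) as a stack overflow. Here is that check isolated as a pure function; it is a sketch with hypothetical names and uses unsigned arithmetic instead of ABS on a pointer difference.

```c
#include <stddef.h>
#include <stdint.h>

/* Window around the stack's low end, matching the 32768 constant above. */
#define GUARD_WINDOW 32768

/* Returns 1 if fault_addr lies within GUARD_WINDOW bytes of the lowest address
 * of the stack [end_of_stack - stack_size, end_of_stack), 0 otherwise. */
static int
is_stack_overflow_fault (uintptr_t fault_addr, uintptr_t end_of_stack, size_t stack_size)
{
    uintptr_t stack_low;
    uintptr_t distance;

    if (stack_size == 0)   /* unknown stack size: cannot decide, as in the handler */
        return 0;
    stack_low = end_of_stack - stack_size;
    distance = fault_addr > stack_low ? fault_addr - stack_low : stack_low - fault_addr;
    return distance < GUARD_WINDOW;
}
```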

mono_handle_native_sigsegv

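Key points:

  • mono_backtrace_from_context (OS X): converts the signal context into a MonoContext and walks the stack, collecting the PC of each frame
  • backtrace (non-OS X): likewise walks the stack, collecting the PC of each frame
  • backtrace_symbols: converts each PC into a function (symbol) name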

The resulting stack trace is then printed to stderr.
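On glibc-based systems, the backtrace / backtrace_symbols pair from <execinfo.h> does the PC collection and symbolization listed above. A minimal sketch follows; print_native_backtrace is a hypothetical name. Note that backtrace_symbols allocates its result with malloc, which is not async-signal-safe; backtrace_symbols_fd writes directly to a file descriptor without calling malloc.

```c
#define _GNU_SOURCE
#include <execinfo.h>
#include <stdio.h>
#include <stdlib.h>

/* Collects up to `max` return addresses (PCs) of the current call stack,
 * converts them to symbol strings and prints them to `out`.
 * Returns the number of frames captured. */
static int
print_native_backtrace (FILE *out, int max)
{
    void *pcs[64];
    char **names;
    int n, i;

    if (max > 64)
        max = 64;
    n = backtrace (pcs, max);             /* walk the stack: one PC per frame */
    names = backtrace_symbols (pcs, n);   /* PC -> "binary(symbol+offset) [addr]" */
    for (i = 0; i < n; i++)
        fprintf (out, "  %s\n", names ? names[i] : "??");
    free (names);                          /* a single malloc'd block holds all strings */
    return n;
}
```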

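Next, GDB is used to obtain more detailed debugging information, which is also printed to stderr.

Finally, the SIGABRT handler is removed and abort() is called to terminate the program.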
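The last step, resetting SIGABRT to its default action before calling abort() (presumably so that abort() does not re-enter Mono's own sigabrt_signal_handler), can be sketched as follows. The helper names are hypothetical, and the fork() wrapper exists only so the behaviour can be observed without killing the calling process.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Final step of a fatal-crash path: drop any SIGABRT handler, then abort.
 * With the default action restored, abort() terminates the process. */
static void
die_with_default_abort (void)
{
    struct sigaction sa;

    memset (&sa, 0, sizeof (sa));
    sa.sa_handler = SIG_DFL;
    sigemptyset (&sa.sa_mask);
    sigaction (SIGABRT, &sa, NULL);
    abort ();
}

/* Runs die_with_default_abort in a child process and returns the signal
 * that terminated it, or -1 if it did not die from a signal. */
static int
child_termination_signal (void)
{
    int status;
    pid_t pid = fork ();

    if (pid < 0)
        return -1;
    if (pid == 0)
        die_with_default_abort ();   /* never returns */
    if (waitpid (pid, &status, 0) != pid)
        return -1;
    return WIFSIGNALED (status) ? WTERMSIG (status) : -1;
}
```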

Throw Exception

mono_arm_throw_exception

exceptions-arm.c

The code that throws the exception:

void
mono_arm_throw_exception (MonoObject *exc, unsigned long eip, unsigned long esp, gulong *int_regs, gdouble *fp_regs)
{
    static void (*restore_context) (MonoContext *);
    MonoContext ctx;
    gboolean rethrow = eip & 1;

    if (!restore_context)
        restore_context = mono_get_restore_context ();

    eip &= ~1; /* clear the optional rethrow bit */
    /* adjust eip so that it point into the call instruction */
    eip -= 4;

    /*printf ("stack in throw: %p\n", esp);*/
    MONO_CONTEXT_SET_BP (&ctx, int_regs [ARMREG_FP - 4]);
    MONO_CONTEXT_SET_SP (&ctx, esp);
    MONO_CONTEXT_SET_IP (&ctx, eip);
    memcpy (((guint8*)&ctx.regs) + (4 * 4), int_regs, sizeof (gulong) * 8);
    /* memcpy (&ctx.fregs, fp_regs, sizeof (double) * MONO_SAVED_FREGS); */

    if (mono_object_isinst (exc, mono_defaults.exception_class)) {
        MonoException *mono_ex = (MonoException*)exc;
        if (!rethrow)
            mono_ex->stack_trace = NULL;
    }
    mono_handle_exception (&ctx, exc, (gpointer)(eip + 4), FALSE);
    restore_context (&ctx);
    g_assert_not_reached ();
}
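The eip argument carries two pieces of information. Assuming ARM-mode (not Thumb) code, where instruction addresses are 4-byte aligned so bit 0 of a return address is otherwise zero, the caller can use that bit as the rethrow flag; after clearing it, subtracting 4 (one ARM instruction) moves from the return address back into the call instruction. A small sketch of just that decoding, with hypothetical names:

```c
#include <stdint.h>

/* Decoded form of the eip argument passed to the throw function. */
typedef struct {
    int rethrow;          /* bit 0 set by the caller for a rethrow */
    uintptr_t call_site;  /* address pointing into the call instruction */
} ThrowSite;

/* Same steps as mono_arm_throw_exception: read bit 0 as the rethrow flag,
 * clear it, then back up 4 bytes from the return address. */
static ThrowSite
decode_throw_eip (uintptr_t eip)
{
    ThrowSite s;

    s.rethrow = (int) (eip & 1);
    eip &= ~(uintptr_t) 1;
    s.call_site = eip - 4;
    return s;
}
```

mono_handle_exception is then passed eip + 4, which is the return address with the rethrow bit cleared.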

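In short: save the context, call mono_handle_exception, then restore the context. Restoring transfers control to wherever mono_handle_exception left the context, so restore_context never returns (hence g_assert_not_reached).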
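In plain C, the closest everyday analogue to this save/handle/restore pattern is setjmp/longjmp. This is only an analogy, with hypothetical names: Mono's mono_handle_exception walks managed frames using the MonoContext to find a matching catch clause, whereas this sketch simply jumps back to the most recently saved catch point.

```c
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical "try" frame: the saved context of the innermost catch site. */
static jmp_buf *current_catch = NULL;
static int pending_exception = 0;

/* "throw": record the exception, then restore the catch site's context.
 * Like restore_context in Mono, this never returns to its caller. */
static void
throw_exception (int code)
{
    pending_exception = code;
    longjmp (*current_catch, 1);
}

static void failing_work (void) { throw_exception (42); }
static void ok_work (void) { }

/* Saves a context, runs `work`, returns the exception code it threw (0 if none). */
static int
run_with_catch (void (*work) (void))
{
    jmp_buf here;
    jmp_buf *outer = current_catch;   /* allow simple nesting */

    current_catch = &here;
    pending_exception = 0;
    if (setjmp (here) == 0) {         /* save context */
        work ();
        current_catch = outer;
        return 0;
    }
    current_catch = outer;            /* resumed here: the "catch clause" */
    return pending_exception;
}
```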

mono_handle_exception

mini-exceptions.c

  • mono_handle_exception
    • mono_handle_exception_internal

MonoContext

For platform independence, Mono converts every sig_context into a unified MonoContext struct, which mainly holds register values; on ARM, for example, it stores PC, FP, SP and R0-R15.

typedef struct {
    gulong eip;          // pc
    gulong ebp;          // fp
    gulong esp;          // sp
    gulong regs [16];
    double fregs [MONO_SAVED_FREGS];
} MonoContext;

eip  -> sigctx.arm_pc (R15)
esp  -> sigctx.arm_sp (R13)
ebp  -> sigctx.arm_fp (R11)
regs -> copied from sigctx.arm_r0 onward, sizeof(gulong) * 16 bytes (R0-R15)
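The mapping above can be sketched with stand-in types. FakeArmSigcontext and MonoContextSketch are hypothetical; the register order of FakeArmSigcontext follows Linux's ARM struct sigcontext, where arm_r0 through arm_pc are consecutive, which is what lets a single 16-word copy fill regs.

```c
#include <string.h>

#define SKETCH_SAVED_FREGS 8   /* hypothetical stand-in for MONO_SAVED_FREGS */

typedef unsigned long gulong;

/* Hypothetical stand-in holding only the 16 core registers, in kernel order:
 * r0..r10, fp (r11), ip (r12), sp (r13), lr (r14), pc (r15). */
typedef struct {
    gulong arm_r0, arm_r1, arm_r2, arm_r3, arm_r4, arm_r5, arm_r6, arm_r7;
    gulong arm_r8, arm_r9, arm_r10, arm_fp, arm_ip, arm_sp, arm_lr, arm_pc;
} FakeArmSigcontext;

_Static_assert (sizeof (FakeArmSigcontext) == sizeof (gulong) * 16,
                "registers must be laid out without padding");

typedef struct {
    gulong eip;   /* pc */
    gulong ebp;   /* fp */
    gulong esp;   /* sp */
    gulong regs [16];
    double fregs [SKETCH_SAVED_FREGS];
} MonoContextSketch;

static void
sigctx_to_monoctx_sketch (const FakeArmSigcontext *sc, MonoContextSketch *mctx)
{
    mctx->eip = sc->arm_pc;   /* R15 */
    mctx->esp = sc->arm_sp;   /* R13 */
    mctx->ebp = sc->arm_fp;   /* R11 */
    /* arm_r0 is the first member, so the whole struct is R0..R15 in order */
    memcpy (mctx->regs, sc, sizeof (mctx->regs));
}
```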


http://www.mono-project.com/docs/debug+profile/debug/
http://www.mono-project.com/docs/advanced/embedding/

define mono_backtrace
  select-frame 0
  set $i = 0
  while ($i < $arg0)
    set $foo = (char*) mono_pmip ($pc)
    if ($foo)
      printf "#%d %p in %s\n", $i, $pc, $foo
    else
      frame
    end
    up-silently
    set $i = $i + 1
  end
end

define mono_stack
  set $mono_thread = mono_thread_current ()
  if ($mono_thread == 0x00)
    printf "No mono thread associated with this thread\n"
  else
    set $ucp = malloc (sizeof (ucontext_t))
    call (void) getcontext ($ucp)
    call (void) mono_print_thread_dump ($ucp)
    call (void) free ($ucp)
  end
end

mono_chain_signal: calls the original handler
mono_handle_native_sigsegv: prints the stack trace and calls abort()
mono_arch_handle_exception // exceptions-arm.c
mono_handle_exception_internal // mini-exceptions.c

    if (!ji) {
        if (mono_chain_signal (SIG_HANDLER_PARAMS))
            return;

        mono_handle_native_sigsegv (SIGSEGV, ctx);
    }

    mono_arch_handle_exception (ctx, exc, FALSE);

mini.c
SIG_HANDLER_SIGNATURE (mono_sigfpe_signal_handler)
SIG_HANDLER_SIGNATURE (mono_sigill_signal_handler)
SIG_HANDLER_SIGNATURE (mono_sigsegv_signal_handler)
SIG_HANDLER_SIGNATURE (mono_sigint_signal_handler)

mini-posix.c
SIG_HANDLER_SIGNATURE (sigabrt_signal_handler)
SIG_HANDLER_SIGNATURE (sigusr1_signal_handler)
SIG_HANDLER_SIGNATURE (sigprof_signal_handler)
SIG_HANDLER_SIGNATURE (sigquit_signal_handler)
SIG_HANDLER_SIGNATURE (sigusr2_signal_handler)


On an unmanaged thread the TLS-based runtime state cannot be obtained; the two main pieces are:

  • mono_domain_get()
  • jit_tls
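TlsGetValue (mono_jit_tls_id) in the SIGSEGV handler reads per-thread storage, which is why a thread that never attached to the runtime sees NULL. A minimal sketch with POSIX thread-specific data, assuming TlsGetValue corresponds to something like pthread_getspecific on POSIX systems; JitTlsSketch and the helper functions are hypothetical. Older toolchains may need -pthread to link.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical per-thread runtime data, standing in for MonoJitTlsData. */
typedef struct { int registered; } JitTlsSketch;

static pthread_key_t jit_tls_key;

static int
init_tls_key (void)
{
    return pthread_key_create (&jit_tls_key, NULL) == 0;
}

/* What a thread attaching to the runtime would do. */
static void
register_current_thread (JitTlsSketch *data)
{
    data->registered = 1;
    pthread_setspecific (jit_tls_key, data);
}

/* Mirrors the handler's check: NULL means this thread never registered. */
static int
current_thread_is_managed (void)
{
    return pthread_getspecific (jit_tls_key) != NULL;
}

static void *
probe_thread (void *out)
{
    *(int *) out = current_thread_is_managed ();
    return NULL;
}

/* Runs the check on a fresh thread that never registered itself. */
static int
fresh_thread_is_managed (void)
{
    pthread_t t;
    int result = -1;

    if (pthread_create (&t, NULL, probe_thread, &result) != 0)
        return -1;
    pthread_join (t, NULL);
    return result;
}
```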

Even if the TLS could be read via ptrace, walking the stack still requires calling the following functions:

  • mono_jit_walk_stack_from_ctx
  • mono_walk_stack

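Both functions access TLS internally, so the values cannot be passed in from outside. Reimplementing them ourselves is not practical either, because many of the structures they rely on are not accessible, so we cannot write our own stack walker.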

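Even if we only want the last PC in order to look up the C# method name, all C# method information is stored in the domain's jitInfoTable. So if the current domain object can be obtained, we can call mono_jit_info_table_find(domain, addr) to get the MonoJitInfo, then mono_jit_info_get_method to get the MonoMethod, and finally mono_method_full_name to get the method name.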
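Conceptually, mono_jit_info_table_find answers "which JIT-compiled method's code range contains this address?". Mono's actual table is more elaborate; the following is a simplified sketch of that lookup as a binary search over entries sorted by start address, with hypothetical type and function names. A NULL result corresponds to the "not managed code" case (ji == NULL) in the SIGSEGV handler.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one MonoJitInfo entry. */
typedef struct {
    uintptr_t code_start;
    size_t code_size;
    const char *full_name;
} JitInfoSketch;

/* Binary search over entries sorted by code_start; returns the entry whose
 * [code_start, code_start + code_size) contains pc, or NULL if none does. */
static const JitInfoSketch *
jit_info_find_sketch (const JitInfoSketch *table, size_t count, uintptr_t pc)
{
    size_t lo = 0, hi = count;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        const JitInfoSketch *ji = &table[mid];

        if (pc < ji->code_start)
            hi = mid;
        else if (pc >= ji->code_start + ji->code_size)
            lo = mid + 1;
        else
            return ji;
    }
    return NULL;
}
```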


NOTE ATTRIBUTES

Created Date: 2018-05-25 00:55:50
Last Evernote Update Date: 2020-05-23 07:15:03
