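Mono Source Code Reading: Crash Handling
# Introduction
This note reads through the parts of the Mono source code that handle crash signals. The source files involved are: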
mini.c
mini-posix.c
mini-exceptions.c
exceptions-arm.c
Install Signal Handler
add_signal_handler
Every signal handler is registered through add_signal_handler.
In the Mono code, the functions that call it to register signals are:
interp.c
- mono_runtime_install_handlers
mini-posix.c
- mono_runtime_posix_install_handlers
- mono_runtime_setup_stat_profiler (SIGPROF)
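As a rough illustration of what a registration helper such as add_signal_handler has to do on POSIX, here is a minimal sketch built on sigaction with SA_SIGINFO. The names install_handler, example_handler and last_signal are hypothetical and not taken from Mono; the real function may set additional flags and save the previous handler.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Hypothetical flag written by the handler. volatile sig_atomic_t is the
 * object type that is safe to modify from inside a signal handler. */
static volatile sig_atomic_t last_signal = 0;

/* Handler using the three-argument SA_SIGINFO signature. */
static void
example_handler (int signo, siginfo_t *info, void *context)
{
    (void) info;
    (void) context;
    last_signal = signo;
}

/* Simplified stand-in for a registration helper (an assumption, not Mono's
 * code): install `handler` for `signo`. Returns 0 on success, -1 on error. */
static int
install_handler (int signo, void (*handler) (int, siginfo_t *, void *))
{
    struct sigaction sa;

    assert (handler != NULL);
    memset (&sa, 0, sizeof (sa));
    sa.sa_sigaction = handler;
    sigemptyset (&sa.sa_mask);
    sa.sa_flags = SA_SIGINFO; /* ask for siginfo_t and the ucontext pointer */
    return sigaction (signo, &sa, NULL);
}
```

Calling install_handler (SIGUSR2, example_handler) and then raise (SIGUSR2) should leave last_signal set to SIGUSR2.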
mono_runtime_posix_install_handlers
Here we focus on the signal registration under the mini directory.
Signals caught by Mono:
- SIGINT (if handle sigint)
- SIGFPE
- SIGQUIT
- SIGILL
- SIGBUS
- SIGUSR2(if mono_jit_trace_calls != null)
- SIGUSR1 -> mono_thread_get_abort_signal()
- SIGABRT
- SIGSEGV
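Constant | Meaning |
---|---|
SIGSEGV | Invalid memory access (segmentation fault): an attempt to access memory not allocated to the process, or to write to an address without write permission. |
SIGINT | External interrupt, usually initiated by the user. Sent when the user types the INTR character (usually Ctrl-C) to tell the foreground process group to terminate. |
SIGILL | Invalid program image, such as an illegal instruction. Usually caused by a corrupted executable or by an attempt to execute the data segment; a stack overflow can also raise it. |
SIGABRT | Abnormal termination, for example one initiated by abort(). |
SIGFPE | Sent on a fatal arithmetic error. Covers not only floating-point errors but all other arithmetic errors, such as overflow and division by zero. |
SIGQUIT | Similar to SIGINT, but controlled by the QUIT character (usually Ctrl-\). A process that exits on SIGQUIT produces a core file, so in that sense it resembles a program error signal. |
SIGBUS | Invalid address, including alignment errors, e.g. accessing a four-byte integer whose address is not a multiple of 4. It differs from SIGSEGV in that the latter is triggered by an invalid access to a valid memory address (memory that does not belong to the process, or read-only memory). |
SIGUSR1 | Reserved for user use. |
SIGUSR2 | Reserved for user use. |
The registration code: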
void
mono_runtime_posix_install_handlers (void)
{
sigset_t signal_set;
if (mini_get_debug_options ()->handle_sigint)
add_signal_handler (SIGINT, mono_sigint_signal_handler);
add_signal_handler (SIGFPE, mono_sigfpe_signal_handler);
add_signal_handler (SIGQUIT, sigquit_signal_handler);
add_signal_handler (SIGILL, mono_sigill_signal_handler);
add_signal_handler (SIGBUS, mono_sigsegv_signal_handler);
if (mono_jit_trace_calls != NULL)
add_signal_handler (SIGUSR2, sigusr2_signal_handler);
add_signal_handler (mono_thread_get_abort_signal (), sigusr1_signal_handler);
/* it seems to have become a common bug for some programs that run as parents
* of many processes to block signal delivery for real time signals.
* We try to detect and work around their breakage here.
*/
sigemptyset (&signal_set);
sigaddset (&signal_set, mono_thread_get_abort_signal ());
sigprocmask (SIG_UNBLOCK, &signal_set, NULL);
signal (SIGPIPE, SIG_IGN);
#ifndef MONO_CROSS_COMPILE
add_signal_handler (SIGABRT, sigabrt_signal_handler);
/* catch SIGSEGV */
add_signal_handler (SIGSEGV, mono_sigsegv_signal_handler);
#endif
}
Signal Handler
All signal handlers are defined with the SIG_HANDLER_SIGNATURE macro:
mini-posix.c
- sigabrt_signal_handler
- sigprof_signal_handler
- sigquit_signal_handler
- sigusr1_signal_handler
- sigusr2_signal_handler
mini.c
- mono_sigfpe_signal_handler
- mono_sigill_signal_handler
- mono_sigsegv_signal_handler
- mono_sigint_signal_handler
SIGINT
void
SIG_HANDLER_SIGNATURE (mono_sigint_signal_handler)
{
MonoException *exc;
GET_CONTEXT;
exc = mono_get_exception_execution_engine ("Interrupted (SIGINT).");
mono_arch_handle_exception (ctx, exc, FALSE);
}
mono_arch_handle_exception
/*
* This is the function called from the signal handler
*/
gboolean
mono_arch_handle_exception (void *ctx, gpointer obj, gboolean test_only)
{
MonoContext mctx;
gboolean result;
mono_arch_sigctx_to_monoctx (ctx, &mctx);
result = mono_handle_exception (&mctx, obj, (gpointer)mctx.eip, test_only);
/* restore the context so that returning from the signal handler will invoke
* the catch clause
*/
mono_arch_monoctx_to_sigctx (&mctx, ctx);
return result;
}
SEGV
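If mono_domain_get() returns NULL or there is no jit_tls, the thread can be treated as an unmanaged thread (one not registered with the runtime). In that case Mono calls mono_chain_signal to invoke the chained signal handler. If that handler returns TRUE, Mono returns immediately and does nothing further; otherwise Mono calls mono_handle_native_sigsegv, which prints the stack and finally calls abort().
If the thread is managed and the faulting IP belongs to JIT-compiled code, the logic is the same as throwing an exception in C#: mono_handle_exception is called to handle it as a C# exception.
The chained signal handler here is the previous signal handler (saved_handler), which Mono saved in advance when it registered its own handlers.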
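The chaining idea described above can be sketched as follows. This is a simplified illustration under stated assumptions, not Mono's implementation: saved_action, chain_signal, our_handler, install_with_chaining and previous_handler are hypothetical names, and only one signal's previous handler is stored.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

static struct sigaction saved_action;          /* handler that was installed before ours */
static volatile sig_atomic_t previous_handler_calls = 0;
static volatile sig_atomic_t our_handler_calls = 0;

/* Stands in for a handler installed earlier by the embedding application. */
static void
previous_handler (int signo)
{
    (void) signo;
    previous_handler_calls++;
}

/* Forward the signal to the saved handler.
 * Returns 1 if a real handler was invoked, 0 if there was none. */
static int
chain_signal (int signo, siginfo_t *info, void *context)
{
    if (saved_action.sa_flags & SA_SIGINFO) {
        saved_action.sa_sigaction (signo, info, context);
        return 1;
    }
    if (saved_action.sa_handler == SIG_DFL || saved_action.sa_handler == SIG_IGN)
        return 0;
    saved_action.sa_handler (signo);
    return 1;
}

static void
our_handler (int signo, siginfo_t *info, void *context)
{
    our_handler_calls++;
    if (chain_signal (signo, info, context))
        return;
    /* No previous handler: a runtime would report the crash here. */
}

/* Install our handler and remember the old one in saved_action. */
static int
install_with_chaining (int signo)
{
    struct sigaction sa;

    memset (&sa, 0, sizeof (sa));
    sa.sa_sigaction = our_handler;
    sigemptyset (&sa.sa_mask);
    sa.sa_flags = SA_SIGINFO;
    return sigaction (signo, &sa, &saved_action);
}
```

The third argument of sigaction is what makes chaining possible: it returns the previous disposition at the moment the new handler is installed.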
void
SIG_HANDLER_SIGNATURE (mono_sigsegv_signal_handler)
{
MonoJitInfo *ji;
MonoJitTlsData *jit_tls = TlsGetValue (mono_jit_tls_id);
gpointer ip;
GET_CONTEXT;
#if defined(MONO_ARCH_SOFT_DEBUG_SUPPORTED) && defined(HAVE_SIG_INFO)
if (mono_arch_is_single_step_event (info, ctx)) {
mono_debugger_agent_single_step_event (ctx);
return;
} else if (mono_arch_is_breakpoint_event (info, ctx)) {
mono_debugger_agent_breakpoint_hit (ctx);
return;
}
#endif
#if !defined(PLATFORM_WIN32) && defined(HAVE_SIG_INFO)
if (mono_aot_is_pagefault (info->si_addr)) {
mono_aot_handle_pagefault (info->si_addr);
return;
}
#endif
/* The thread might no be registered with the runtime */
if (!mono_domain_get () || !jit_tls) {
if (mono_chain_signal (SIG_HANDLER_PARAMS))
return;
mono_handle_native_sigsegv (SIGSEGV, ctx);
}
ip = mono_arch_ip_from_context (ctx);
#ifdef _WIN64
/* Sometimes on win64 we get null IP, but the previous frame is a valid managed frame */
/* So pop and try again */
if (!ip && ctx)
{
MonoContext *context = (MonoContext*)ctx;
gpointer *sp = context->rsp;
if (sp)
{
ip = context->rip = *sp;
context->rsp += sizeof(gpointer);
}
}
#endif
ji = mono_jit_info_table_find (mono_domain_get (), ip);
#ifdef MONO_ARCH_SIGSEGV_ON_ALTSTACK
if (mono_handle_soft_stack_ovf (jit_tls, ji, ctx, (guint8*)info->si_addr))
return;
/* The hard-guard page has been hit: there is not much we can do anymore
* Print a hopefully clear message and abort.
*/
if (jit_tls->stack_size &&
ABS ((guint8*)info->si_addr - ((guint8*)jit_tls->end_of_stack - jit_tls->stack_size)) < 32768) {
const char *method;
/* we don't do much now, but we can warn the user with a useful message */
fprintf (stderr, "Stack overflow: IP: %p, fault addr: %p\n", mono_arch_ip_from_context (ctx), (gpointer)info->si_addr);
if (ji && ji->method)
method = mono_method_full_name (ji->method, TRUE);
else
method = "Unmanaged";
fprintf (stderr, "At %s\n", method);
_exit (1);
} else {
/* The original handler might not like that it is executed on an altstack... */
if (!ji && mono_chain_signal (SIG_HANDLER_PARAMS))
return;
mono_arch_handle_altstack_exception (ctx, info->si_addr, FALSE);
}
#else
if (!ji) {
if (mono_chain_signal (SIG_HANDLER_PARAMS))
return;
mono_handle_native_sigsegv (SIGSEGV, ctx);
}
mono_arch_handle_exception (ctx, NULL, FALSE);
#endif
}
mono_handle_native_sigsegv
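Key points:
- mono_backtrace_from_context (OS X): converts the signal context into a MonoContext and walks the stack, collecting the PC of each frame
- backtrace (non-OS X): also walks the stack, collecting the PC of each frame
- backtrace_symbols: converts each PC into a function (symbol) name
The stack is then printed to stderr.
Next, GDB is used to obtain more detailed debugging information, which is also printed to stderr.
Finally, the SIGABRT handler is removed and abort() is called to exit the program.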
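The backtrace and backtrace_symbols steps can be sketched as below, assuming a libc that ships execinfo.h (glibc or macOS). print_native_backtrace and MAX_FRAMES are hypothetical names. Note that backtrace_symbols allocates memory, so it is not async-signal-safe; a crash handler that uses it accepts that risk.

```c
#include <assert.h>
#include <execinfo.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_FRAMES 64

/* Print the current native call stack to `out`.
 * Returns the number of frames captured. */
static int
print_native_backtrace (FILE *out)
{
    void *frames [MAX_FRAMES];
    char **names;
    int count;
    int i;

    assert (out != NULL);
    count = backtrace (frames, MAX_FRAMES);     /* collect the return addresses */
    names = backtrace_symbols (frames, count);  /* resolve them to symbol strings */

    for (i = 0; i < count; ++i) {
        if (names)
            fprintf (out, "#%d %p %s\n", i, frames [i], names [i]);
        else
            fprintf (out, "#%d %p\n", i, frames [i]);
    }
    free (names); /* one allocation holds all strings */
    return count;
}
```

JIT-compiled managed frames have no entry in the native symbol table, so they typically show up only as raw addresses in this kind of output.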
Throw Exception
mono_arm_throw_exception
exceptions-arm.c
The code that throws the exception:
void
mono_arm_throw_exception (MonoObject *exc, unsigned long eip, unsigned long esp, gulong *int_regs, gdouble *fp_regs)
{
static void (*restore_context) (MonoContext *);
MonoContext ctx;
gboolean rethrow = eip & 1;
if (!restore_context)
restore_context = mono_get_restore_context ();
eip &= ~1; /* clear the optional rethrow bit */
/* adjust eip so that it point into the call instruction */
eip -= 4;
/*printf ("stack in throw: %p\n", esp);*/
MONO_CONTEXT_SET_BP (&ctx, int_regs [ARMREG_FP - 4]);
MONO_CONTEXT_SET_SP (&ctx, esp);
MONO_CONTEXT_SET_IP (&ctx, eip);
memcpy (((guint8*)&ctx.regs) + (4 * 4), int_regs, sizeof (gulong) * 8);
/* memcpy (&ctx.fregs, fp_regs, sizeof (double) * MONO_SAVED_FREGS); */
if (mono_object_isinst (exc, mono_defaults.exception_class)) {
MonoException *mono_ex = (MonoException*)exc;
if (!rethrow)
mono_ex->stack_trace = NULL;
}
mono_handle_exception (&ctx, exc, (gpointer)(eip + 4), FALSE);
restore_context (&ctx);
g_assert_not_reached ();
}
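The address arithmetic in the listing above (a rethrow flag carried in bit 0 of eip, then stepping back one 4-byte ARM instruction) can be isolated in a small sketch. ThrowSite and decode_throw_ip are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uintptr_t call_ip;   /* address inside the call instruction */
    uintptr_t return_ip; /* address right after the call */
    int rethrow;         /* 1 if bit 0 of the incoming address was set */
} ThrowSite;

/* Split the incoming address into the rethrow flag and the two
 * instruction addresses derived from it. */
static ThrowSite
decode_throw_ip (uintptr_t eip)
{
    ThrowSite site;

    site.rethrow = (int) (eip & 1);
    eip &= ~(uintptr_t) 1;      /* clear the optional rethrow bit */
    site.return_ip = eip;
    site.call_ip = eip - 4;     /* ARM (non-Thumb) instructions are 4 bytes */
    return site;
}
```

Bit 0 is free to carry a flag because ARM-mode instruction addresses are 4-byte aligned. In the listing, the context IP corresponds to call_ip, and the value passed to mono_handle_exception (eip + 4) corresponds to return_ip.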
Save context
mono_handle_exception
Restore context
mono_handle_exception
mini-exceptions.c
- mono_handle_exception
- mono_handle_exception_internal
MonoContext
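For cross-platform compatibility, Mono unifies each platform's signal context into the MonoContext struct, which mainly holds register values. On ARM, for example, it stores PC, FP, SP and R0-R15.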
typedef struct {
gulong eip; // pc
gulong ebp; // fp
gulong esp; // sp
gulong regs [16];
double fregs [MONO_SAVED_FREGS];
} MonoContext;
eip -> sigctx.arm_pc (R15)
esp -> sigctx.arm_sp (R13)
ebp -> sigctx.arm_fp (R11)
regs -> sigctx.arm_r0, sizeof(gulong) * 16 (R0 ~ R15)
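The mapping above can be sketched as a pair of conversion functions. The struct layouts here are assumptions made for the example: SketchSigContext stores r0..r15 as one array, whereas a real ARM sigcontext has named fields (arm_r0, arm_sp, arm_pc and so on), and SketchMonoContext omits the floating-point registers.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified signal context: r0..r15 in one array. */
typedef struct {
    unsigned long arm_r [16];
} SketchSigContext;

/* Reduced version of the context struct shown above (no fregs). */
typedef struct {
    unsigned long eip;        /* pc, R15 */
    unsigned long ebp;        /* fp, R11 */
    unsigned long esp;        /* sp, R13 */
    unsigned long regs [16];  /* R0..R15 */
} SketchMonoContext;

/* Signal context -> runtime context. */
static void
sketch_sigctx_to_monoctx (const SketchSigContext *sig, SketchMonoContext *mctx)
{
    assert (sig != NULL && mctx != NULL);
    memcpy (mctx->regs, sig->arm_r, sizeof (mctx->regs));
    mctx->eip = sig->arm_r [15];
    mctx->esp = sig->arm_r [13];
    mctx->ebp = sig->arm_r [11];
}

/* Runtime context -> signal context. The named fields win over regs[],
 * so a handler that changed eip redirects execution on return. */
static void
sketch_monoctx_to_sigctx (const SketchMonoContext *mctx, SketchSigContext *sig)
{
    assert (sig != NULL && mctx != NULL);
    memcpy (sig->arm_r, mctx->regs, sizeof (sig->arm_r));
    sig->arm_r [15] = mctx->eip;
    sig->arm_r [13] = mctx->esp;
    sig->arm_r [11] = mctx->ebp;
}
```

Writing the modified context back is what lets a signal handler resume in a catch clause: when the handler returns, the kernel restores registers from the signal context, including the new PC.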
http://www.mono-project.com/docs/debug+profile/debug/
http://www.mono-project.com/docs/advanced/embedding/
define mono_backtrace
  select-frame 0
  set $i = 0
  while ($i < $arg0)
    set $foo = (char*) mono_pmip ($pc)
    if ($foo)
      printf "#%d %p in %s\n", $i, $pc, $foo
    else
      frame
    end
    up-silently
    set $i = $i + 1
  end
end
define mono_stack
  set $mono_thread = mono_thread_current ()
  if ($mono_thread == 0x00)
    printf "No mono thread associated with this thread\n"
  else
    set $ucp = malloc (sizeof (ucontext_t))
    call (void) getcontext ($ucp)
    call (void) mono_print_thread_dump ($ucp)
    call (void) free ($ucp)
  end
end
mono_chain_signal: calls the original handler
mono_handle_native_sigsegv: prints the stack and calls abort()
mono_arch_handle_exception // exceptions-arm.c
mono_handle_exception_internal // mini-exceptions.c
if (!ji) {
if (mono_chain_signal (SIG_HANDLER_PARAMS))
return;
mono_handle_native_sigsegv (SIGSEGV, ctx);
}
mono_arch_handle_exception (ctx, exc, FALSE);
mini.c
SIG_HANDLER_SIGNATURE (mono_sigfpe_signal_handler)
SIG_HANDLER_SIGNATURE (mono_sigill_signal_handler)
SIG_HANDLER_SIGNATURE (mono_sigsegv_signal_handler)
SIG_HANDLER_SIGNATURE (mono_sigint_signal_handler)
mini-posix.c
SIG_HANDLER_SIGNATURE (sigabrt_signal_handler)
SIG_HANDLER_SIGNATURE (sigusr1_signal_handler)
SIG_HANDLER_SIGNATURE (sigprof_signal_handler)
SIG_HANDLER_SIGNATURE (sigquit_signal_handler)
SIG_HANDLER_SIGNATURE (sigusr2_signal_handler)
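On an unmanaged thread the TLS data cannot be obtained, mainly these two items:
- mono_domain_get()
- jit_tls
Even if the TLS could be read through ptrace, walking the stack still requires calling the following functions:
- mono_jit_walk_stack_from_ctx
- mono_walk_stack
Both functions access TLS internally, so the values cannot be passed in. Reimplementing them is not feasible either, because many of the structs they rely on are not accessible, so a custom stack walker is not an option.
Even if only the last PC is wanted in order to get the C# method name: all C# method information is stored in the domain's jitInfoTable, so if the current domain object can be obtained,
mono_jit_info_table_find(domain, addr) returns the MonoJitInfo, mono_jit_info_get_method then yields the MonoMethod, and mono_method_full_name finally gives the method name.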
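To make the address-to-method lookup concrete, here is a simplified stand-in for a JIT info table: a sorted array of code ranges searched by address. It illustrates the idea only. CodeRange and find_method_name are hypothetical, and Mono's actual table has a different structure and is reached through mono_jit_info_table_find.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One compiled method: [start, start + size) and its display name. */
typedef struct {
    uintptr_t start;
    size_t size;
    const char *name;
} CodeRange;

/* Binary search over ranges sorted by start address.
 * Returns the method name containing `ip`, or NULL if none does. */
static const char *
find_method_name (const CodeRange *table, size_t count, uintptr_t ip)
{
    size_t lo = 0;
    size_t hi = count;

    assert (table != NULL || count == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (ip < table [mid].start)
            hi = mid;                                   /* ip is below this range */
        else if (ip - table [mid].start >= table [mid].size)
            lo = mid + 1;                               /* ip is above this range */
        else
            return table [mid].name;                    /* ip falls inside it */
    }
    return NULL;
}
```

A NULL result corresponds to the `!ji` case in the handler: the faulting address is not in any JIT-compiled method, so it is treated as native code.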
NOTE ATTRIBUTES
Created Date: 2018-05-25 00:55:50
Last Evernote Update Date: 2020-05-23 07:15:03