目录
一、什么是core dump?
二、coredump是怎么来的?
三、怎么限制coredump文件的产生?
ulimit
半永久限制
永久限制
四、从源码分析如何对coredump文件的名字和路径管理
命名
管理
一些问题的答案
1、为什么新的ubuntu不能产生coredump了,需要手动管理?
2、systemd是什么时候出现在ubuntu中的?
3、为什么使用脚本管理coredump后ulimit就不能限制coredump文件的大小了。
4、为什么ulimit限制的大小并不准确?
5、使用脚本就没办法限制coredump的大小了嘛?
6、ubuntu现在对coredump文件的管理方式
core dump是Linux应用程序调试的一种有效方式,core dump又称为“核心转储”,是该进程实际使用的物理内存的“快照”。分析core dump文件可以获取应用程序崩溃时的现场信息,如程序运行时的CPU寄存器值、堆栈指针、栈数据、函数调用栈等信息。在嵌入式系统开发中是开发人员调试所需必备材料之一。每当程序异常行为产生coredump时我们就可以利用gdb工具去查看程序崩溃的原因。
Core dump是Linux基于信号实现的。Linux中信号是一种异步事件处理机制,每种信号都对应有默认的异常处理操作,默认操作包括忽略该信号(Ignore)、暂停进程(Stop)、终止进程(Terminate)、终止并产生core dump(Core)等。
Signal | Value | Action | Comment |
SIGHUP | 1 | Term | Hangup detected on controlling terminal or death of controlling process |
SIGINT | 2 | Term | Interrupt from keyboard |
SIGQUIT | 3 | Core | Quit from keyboard |
SIGILL | 4 | Core | Illegal Instruction |
SIGTRAP | 5 | Core | Trace/breakpoint trap |
SIGABRT | 6 | Core | Abort signal from abort(3) |
SIGIOT | 6 | Core | IOT trap. A synonym for SIGABRT |
SIGEMT | 7 | Term | |
SIGFPE | 8 | Core | Floating point exception |
SIGKILL | 9 | Term | Kill signal, cannot be caught, blocked or ignored. |
SIGBUS | 10,7,10 | Core | Bus error (bad memory access) |
SIGSEGV | 11 | Core | Invalid memory reference |
SIGPIPE | 13 | Term | Broken pipe: write to pipe with no readers |
SIGALRM | 14 | Term | Timer signal from alarm(2) |
SIGTERM | 15 | Term | Termination signal |
SIGUSR1 | 30,10,16 | Term | User-defined signal 1 |
SIGUSR2 | 31,12,17 | Term | User-defined signal 2 |
SIGCHLD | 20,17,18 | Ign | Child stopped or terminated |
SIGCONT | 19,18,25 | Cont | Continue if stopped |
SIGSTOP | 17,19,23 | Stop | Stop process, cannot be caught, blocked or ignored. |
SIGTSTP | 18,20,24 | Stop | Stop typed at terminal |
SIGTTIN | 21,21,26 | Stop | Terminal input for background process |
SIGTTOU | 22,22,27 | Stop | Terminal output for background process |
SIGIO | 23,29,22 | Term | I/O now possible (4.2BSD) |
SIGPOLL | Term | Pollable event (Sys V). Synonym for SIGIO | |
SIGPROF | 27,27,29 | Term | Profiling timer expired |
SIGSYS | 12,31,12 | Core | Bad argument to routine (SVr4) |
SIGURG | 16,23,21 | Ign | Urgent condition on socket (4.2BSD) |
SIGVTALRM | 26,26,28 | Term | Virtual alarm clock (4.2BSD) |
SIGXCPU | 24,24,30 | Core | CPU time limit exceeded (4.2BSD) |
SIGXFSZ | 25,25,31 | Core | File size limit exceeded (4.2BSD) |
SIGSTKFLT | 16 | Term | Stack fault on coprocessor (unused) |
SIGCLD | 18 | Ign | A synonym for SIGCHLD |
SIGPWR | 29,30,19 | Term | Power failure (System V) |
SIGINFO | 29 | A synonym for SIGPWR, on an alpha | |
SIGLOST | 29 | Term | File lock lost (unused), on a sparc |
SIGWINCH | 28,28,20 | Ign | Window resize signal (4.3BSD, Sun) |
SIGUNUSED | 31 | Core | Synonymous with SIGSYS |
以下情况会出现应用程序崩溃导致产生core dump:
说到限制就不得不提到ulimit工具了,这个工具其实有个坑,他并不是linux自带的工具,而是shell工具,也就是说他的级别是shell层的。
当我们新开启一个终端后之前终端的全部配置都会消失,并且他还看人下菜碟,每个linux用设置的都是自己的ulimit,比如你有两个用户,在同一个终端下对当前终端配置
一个 ulimit -c 1024
一个ulimit -c 2048
他们的限制都作用在各自的范围
即便同一个目录的同一个文件,本来这个coredump可以产生4096k个字节那么a的coredump就是1024,b的就是2048当然,不会很准确的限制,后面会从源码上给大家说明为什么。
还有就是可以通过配置文件,但是这个优先级虽然高但是可以改,这个文件只在重启时被读取一次,后面其它人再来改这个限制你之前的限制也就无效了。
vim /etc/profile
相当于每开一个shell时,自动执行ulimit -c unlimited(表示无限制其它具体数字通常表示多少k)
编辑/etc/security/limits.conf,需要重新登录,或者重新打开ssh客户端连接,永久生效
下面就是软限制和硬限制
在/etc/sysctrl.conf中保存的是用户给内核传递的参数,这里有个core_pattern参数是告诉内核怎么管理coredump的。
后面直接写等号就表示这个coredump文件的命名
%% 单个%字符
%p 所dump进程的进程ID
%u 所dump进程的实际用户ID
%g 所dump进程的实际组ID
%s 导致本次core dump的信号
%t core dump的时间 (由1970年1月1日计起的秒数)
%h 主机名
%e 程序文件名
kernel.core_pattern=|/usr/bin/coredump_helper.sh core_%e_%I_%p_sig_%s_time_%t.gz
像上面那样配置core_pattern后内核就可以使用我们指定的脚本去管理coredump文件了。
这个路径必须是绝对路径,前面必须是|前后都不能有空格,coredump文件会作为参数传入脚本。
这里我直接将文件以压缩格式传递给了脚本
#!/bin/sh
if [ ! -d "/var/coredump" ];then
mkdir -p /var/coredump
fi
gzip > "/var/coredump/$1"
上面的代码就是一个简单的示例可以把coredump都放到var下的coredump目录中,我们还可以添加一些遍历判断操作限制产生coredump的数量的单个coredump的大小。
#!/bin/bash
# Set the maximum number of files allowed in /var/coredump
max_files=1000
# Check if /var/coredump directory exists
if [ ! -d "/var/coredump" ]; then
mkdir -p /var/coredump
fi
# Get the current number of files in /var/coredump
current_files=$(ls -1 /var/coredump | wc -l)
# Check if the number of files exceeds the maximum limit
if [ $current_files -ge $max_files ]; then
# Sort the files based on creation time (oldest first)
sorted_files=$(ls -1t /var/coredump | head -n 1)
# Remove the oldest files to reduce the number of files to below the limit
rm $sorted_files
fi
# Compress the file and store it in /var/coredump using gzip
gzip > "/var/coredump/$1"
代码很简陋正常还要有出错管理,异常情况打印到日志备份等等。
因为现在的ubuntu使用的是systemd,在这里有些服务会对内核管理coredump的接口做配置,导致coredump文件被服务写到接口的脚本管理了。
systemd是在Ubuntu 15.04版本中首次出现的。在此之前,Ubuntu使用Upstart作为默认的init系统。然而,从Ubuntu 15.04开始,Ubuntu采用了systemd作为默认的init系统和服务管理器。systemd是一个用于启动、停止和管理系统进程的软件套件,它提供了更快的启动速度、更好的系统资源管理和更强大的服务控制能力。
而在Upstart之前用的才是SysVinit。
这个问题就要看源码了
void do_coredump(const siginfo_t *siginfo)
{
struct core_state core_state;
struct core_name cn;
struct mm_struct *mm = current->mm;
struct linux_binfmt * binfmt;
const struct cred *old_cred;
struct cred *cred;
int retval = 0;
int flag = 0;
int ispipe;
struct files_struct *displaced;
bool need_nonrelative = false;
bool core_dumped = false;
static atomic_t core_dump_count = ATOMIC_INIT(0);
struct coredump_params cprm = {
.siginfo = siginfo,
.regs = signal_pt_regs(),
.limit = rlimit(RLIMIT_CORE),
/*
* We must use the same mm->flags while dumping core to avoid
* inconsistency of bit flags, since this flag is not protected
* by any locks.
*/
.mm_flags = mm->flags,
};
audit_core_dumps(siginfo->si_signo);
binfmt = mm->binfmt;
if (!binfmt || !binfmt->core_dump)
goto fail;
if (!__get_dumpable(cprm.mm_flags))
goto fail;
cred = prepare_creds();
if (!cred)
goto fail;
/*
* We cannot trust fsuid as being the "true" uid of the process
* nor do we know its entire history. We only know it was tainted
* so we dump it as root in mode 2, and only into a controlled
* environment (pipe handler or fully qualified path).
*/
if (__get_dumpable(cprm.mm_flags) == SUID_DUMP_ROOT) {
/* Setuid core dump mode */
flag = O_EXCL; /* Stop rewrite attacks */
cred->fsuid = GLOBAL_ROOT_UID; /* Dump root private */
need_nonrelative = true;
}
retval = coredump_wait(siginfo->si_signo, &core_state);
if (retval < 0)
goto fail_creds;
old_cred = override_creds(cred);
ispipe = format_corename(&cn, &cprm);
if (ispipe) {
int dump_count;
char **helper_argv;
struct subprocess_info *sub_info;
if (ispipe < 0) {
printk(KERN_WARNING "format_corename failed\n");
printk(KERN_WARNING "Aborting core\n");
goto fail_unlock;
}
if (cprm.limit == 1) {
/* See umh_pipe_setup() which sets RLIMIT_CORE = 1.
*
* Normally core limits are irrelevant to pipes, since
* we're not writing to the file system, but we use
* cprm.limit of 1 here as a speacial value, this is a
* consistent way to catch recursive crashes.
* We can still crash if the core_pattern binary sets
* RLIM_CORE = !1, but it runs as root, and can do
* lots of stupid things.
*
* Note that we use task_tgid_vnr here to grab the pid
* of the process group leader. That way we get the
* right pid if a thread in a multi-threaded
* core_pattern process dies.
*/
printk(KERN_WARNING
"Process %d(%s) has RLIMIT_CORE set to 1\n",
task_tgid_vnr(current), current->comm);
printk(KERN_WARNING "Aborting core\n");
goto fail_unlock;
}
cprm.limit = RLIM_INFINITY;
dump_count = atomic_inc_return(&core_dump_count);
if (core_pipe_limit && (core_pipe_limit < dump_count)) {
printk(KERN_WARNING "Pid %d(%s) over core_pipe_limit\n",
task_tgid_vnr(current), current->comm);
printk(KERN_WARNING "Skipping core dump\n");
goto fail_dropcount;
}
helper_argv = argv_split(GFP_KERNEL, cn.corename, NULL);
if (!helper_argv) {
printk(KERN_WARNING "%s failed to allocate memory\n",
__func__);
goto fail_dropcount;
}
retval = -ENOMEM;
sub_info = call_usermodehelper_setup(helper_argv[0],
helper_argv, NULL, GFP_KERNEL,
umh_pipe_setup, NULL, &cprm);
if (sub_info)
retval = call_usermodehelper_exec(sub_info,
UMH_WAIT_EXEC);
argv_free(helper_argv);
if (retval) {
printk(KERN_INFO "Core dump to |%s pipe failed\n",
cn.corename);
goto close_fail;
}
} else {
struct inode *inode;
if (cprm.limit < binfmt->min_coredump)
goto fail_unlock;
if (need_nonrelative && cn.corename[0] != '/') {
printk(KERN_WARNING "Pid %d(%s) can only dump core "\
"to fully qualified path!\n",
task_tgid_vnr(current), current->comm);
printk(KERN_WARNING "Skipping core dump\n");
goto fail_unlock;
}
cprm.file = filp_open(cn.corename,
O_CREAT | 2 | O_NOFOLLOW | O_LARGEFILE | flag,
0600);
if (IS_ERR(cprm.file))
goto fail_unlock;
inode = file_inode(cprm.file);
if (inode->i_nlink > 1)
goto close_fail;
if (d_unhashed(cprm.file->f_path.dentry))
goto close_fail;
/*
* AK: actually i see no reason to not allow this for named
* pipes etc, but keep the previous behaviour for now.
*/
if (!S_ISREG(inode->i_mode))
goto close_fail;
/*
* Dont allow local users get cute and trick others to coredump
* into their pre-created files.
*/
if (!uid_eq(inode->i_uid, current_fsuid()))
goto close_fail;
if (!cprm.file->f_op->write)
goto close_fail;
if (do_truncate(cprm.file->f_path.dentry, 0, 0, cprm.file))
goto close_fail;
}
/* get us an unshared descriptor table; almost always a no-op */
retval = unshare_files(&displaced);
if (retval)
goto close_fail;
if (displaced)
put_files_struct(displaced);
if (!dump_interrupted()) {
file_start_write(cprm.file);
core_dumped = binfmt->core_dump(&cprm);
file_end_write(cprm.file);
}
if (ispipe && core_pipe_limit)
wait_for_dump_helpers(cprm.file);
close_fail:
if (cprm.file)
filp_close(cprm.file, NULL);
fail_dropcount:
if (ispipe)
atomic_dec(&core_dump_count);
fail_unlock:
kfree(cn.corename);
coredump_finish(mm, core_dumped);
revert_creds(old_cred);
fail_creds:
put_cred(cred);
fail:
return;
}
从源码中可以看到在产生coredump的源码中会对上面的core_pattern进行判断,如果传入了脚本就会走上面的分支,没传入走下面的分支,上面只判断了limit是不是1,1表示不产生coredump,如果不是1就赋值为0,0就是没有限制。
而下面有这个判断,当已经产生的coredump文件超过ulimit -c的限制时就会停止产生。有一个点需要注意,ulimit-c和这里的limit不是完全对应的,是在shell源码中有转化的,不过大体上还是对应的。
还是要看这张图,它每次写入的都是内核设计好的,就写这些。
在外面有个for循环调用do_coredump文件.
int get_signal_to_deliver(siginfo_t *info, struct k_sigaction *return_ka,
struct pt_regs *regs, void *cookie)
{
struct sighand_struct *sighand = current->sighand;
struct signal_struct *signal = current->signal;
int signr;
if (unlikely(current->task_works))
task_work_run();
if (unlikely(uprobe_deny_signal()))
return 0;
/*
* Do this once, we can't return to user-mode if freezing() == T.
* do_signal_stop() and ptrace_stop() do freezable_schedule() and
* thus do not need another check after return.
*/
try_to_freeze();
relock:
spin_lock_irq(&sighand->siglock);
/*
* Every stopped thread goes here after wakeup. Check to see if
* we should notify the parent, prepare_signal(SIGCONT) encodes
* the CLD_ si_code into SIGNAL_CLD_MASK bits.
*/
if (unlikely(signal->flags & SIGNAL_CLD_MASK)) {
int why;
if (signal->flags & SIGNAL_CLD_CONTINUED)
why = CLD_CONTINUED;
else
why = CLD_STOPPED;
signal->flags &= ~SIGNAL_CLD_MASK;
spin_unlock_irq(&sighand->siglock);
/*
* Notify the parent that we're continuing. This event is
* always per-process and doesn't make whole lot of sense
* for ptracers, who shouldn't consume the state via
* wait(2) either, but, for backward compatibility, notify
* the ptracer of the group leader too unless it's gonna be
* a duplicate.
*/
read_lock(&tasklist_lock);
do_notify_parent_cldstop(current, false, why);
if (ptrace_reparented(current->group_leader))
do_notify_parent_cldstop(current->group_leader,
true, why);
read_unlock(&tasklist_lock);
goto relock;
}
for (;;) {
struct k_sigaction *ka;
if (unlikely(current->jobctl & JOBCTL_STOP_PENDING) &&
do_signal_stop(0))
goto relock;
if (unlikely(current->jobctl & JOBCTL_TRAP_MASK)) {
do_jobctl_trap();
spin_unlock_irq(&sighand->siglock);
goto relock;
}
signr = dequeue_signal(current, ¤t->blocked, info);
if (!signr)
break; /* will return 0 */
if (unlikely(current->ptrace) && signr != SIGKILL) {
signr = ptrace_signal(signr, info);
if (!signr)
continue;
}
ka = &sighand->action[signr-1];
/* Trace actually delivered signals. */
trace_signal_deliver(signr, info, ka);
if (ka->sa.sa_handler == SIG_IGN) /* Do nothing. */
continue;
if (ka->sa.sa_handler != SIG_DFL) {
/* Run the handler. */
*return_ka = *ka;
if (ka->sa.sa_flags & SA_ONESHOT)
ka->sa.sa_handler = SIG_DFL;
break; /* will return non-zero "signr" value */
}
/*
* Now we are doing the default action for this signal.
*/
if (sig_kernel_ignore(signr)) /* Default is nothing. */
continue;
/*
* Global init gets no signals it doesn't want.
* Container-init gets no signals it doesn't want from same
* container.
*
* Note that if global/container-init sees a sig_kernel_only()
* signal here, the signal must have been generated internally
* or must have come from an ancestor namespace. In either
* case, the signal cannot be dropped.
*/
if (unlikely(signal->flags & SIGNAL_UNKILLABLE) &&
!sig_kernel_only(signr))
continue;
if (sig_kernel_stop(signr)) {
/*
* The default action is to stop all threads in
* the thread group. The job control signals
* do nothing in an orphaned pgrp, but SIGSTOP
* always works. Note that siglock needs to be
* dropped during the call to is_orphaned_pgrp()
* because of lock ordering with tasklist_lock.
* This allows an intervening SIGCONT to be posted.
* We need to check for that and bail out if necessary.
*/
if (signr != SIGSTOP) {
spin_unlock_irq(&sighand->siglock);
/* signals can be posted during this window */
if (is_current_pgrp_orphaned())
goto relock;
spin_lock_irq(&sighand->siglock);
}
if (likely(do_signal_stop(info->si_signo))) {
/* It released the siglock. */
goto relock;
}
/*
* We didn't actually stop, due to a race
* with SIGCONT or something like that.
*/
continue;
}
spin_unlock_irq(&sighand->siglock);
/*
* Anything else is fatal, maybe with a core dump.
*/
current->flags |= PF_SIGNALED;
if (sig_kernel_coredump(signr)) {
if (print_fatal_signals)
print_fatal_signal(info->si_signo);
proc_coredump_connector(current);
/*
* If it was able to dump core, this kills all
* other threads in the group and synchronizes with
* their demise. If we lost the race with another
* thread getting here, it set group_exit_code
* first and our do_group_exit call below will use
* that value and ignore the one we pass it.
*/
do_coredump(info);
}
/*
* Death signals, no core dump.
*/
do_group_exit(info->si_signo);
/* NOTREACHED */
}
spin_unlock_irq(&sighand->siglock);
return signr;
}
哪次判断超出了就会停止生成.
可以的,但是limit这个系统调用肯定是不行了,源码中我们就看到了人家都不鸟你。但是我们可以ulimit -f,这个脚本coredump文件从内核态到用户态是通过这个传入的脚本,所以直接ulimit -f限制脚本生成的全部文件大小就ok了,但是这里要写成正常两倍大小因为是先获取在写入,我们一个字节计算两次,这次限制的大小就完全准确了,也有个坏处就是文件会损坏限制住了也没法去看这个coredump。
由于使用systemd机制,所以现在的coredump是以服务的方式存在的,为了可以更好的保存和规范化管理,coredump内容被放到了日志中,使用coredump@和socket服务共同实现,将coredump文件从内核态转移到用户态后又用了socket机制做过滤和写入日志。完美限制大小和内容。只要日志管理的好可以想看任何时候的有用的coredump。
---------------------------------------------------------------------------------------------------------------------------------
这是我工作中遇到的第一个做了两周的问题的一部分,当时贼痛苦各种查资料翻源码,好在最后想到了还算不错的解决方案,工作的提升是真的大,不止是技术,主要是团队写作能力,做事情的条例,规范化做事才会少出错。这些很重要。还有表达能力,不然你写的永远都是垃圾。下周有时间更新另一半,日志的管理。