Start with the man page:
$ man 2 ptrace
PTRACE(2) Linux Programmer's Manual PTRACE(2)
NAME
ptrace - process trace
SYNOPSIS
#include
long ptrace(enum __ptrace_request request, pid_t pid, void *addr, void *data);
DESCRIPTION
The ptrace() system call provides a means by which one process (the "tracer") may observe and control the execution of another process (the "tracee"), and examine and change the tracee's memory and registers. It is primarily used to implement breakpoint debugging and system call tracing.
A tracee first needs to be attached to the tracer. Attachment and subsequent commands are per thread: in a multithreaded process, every thread can be individually attached to a (potentially different) tracer, or left not attached and thus not debugged. Therefore, "tracee" always means "(one) thread", never "a (possibly multithreaded) process".
Ptrace commands are always sent to a specific tracee using a call of the form
ptrace(PTRACE_foo, pid, ...)
where pid is the thread ID of the corresponding Linux thread.
(Note that in this page, a "multithreaded process" means a thread group consisting of threads created using the clone(2) CLONE_THREAD flag.)
A process can initiate a trace by calling fork(2) and having the resulting child do a PTRACE_TRACEME, followed (typically) by an execve(2). Alternatively, one process may commence tracing another process using PTRACE_ATTACH or PTRACE_SEIZE.
While being traced, the tracee will stop each time a signal is delivered, even if the signal is being ignored. (An exception is SIGKILL, which has its usual effect.) The tracer will be notified at its next call to waitpid(2) (or one of the related "wait" system calls); that call will return a status value containing information that indicates the cause of the stop in the tracee. While the tracee is stopped, the tracer can use various ptrace requests to inspect and modify the tracee. The tracer then causes the tracee to continue, optionally ignoring the delivered signal (or even delivering a different signal instead).
If the PTRACE_O_TRACEEXEC option is not in effect, all successful calls to execve(2) by the traced process will cause it to be sent a SIGTRAP signal, giving the parent a chance to gain control before the new program begins execution.
When the tracer is finished tracing, it can cause the tracee to continue executing in a normal, untraced mode via PTRACE_DETACH.
The value of request determines the action to be performed:
.....
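Put simply, ptrace is a system call for tracing and controlling processes: it lets a tracer observe and modify a tracee's memory and registers.
It is mainly used to implement breakpoint debugging and system call tracing.
Preconditions for a tracee to be traced (see the cap_ptrace_traceme function):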
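The fork / PTRACE_TRACEME / waitpid / PTRACE_CONT flow described in the man page excerpt can be sketched roughly as follows. This is a minimal illustration, not how any particular debugger does it; the function name demo_first_stop_signal is made up for this example, and it assumes a Linux system where the process is permitted to use ptrace.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that asks to be traced, observe its
 * first stop, resume it, and return the signal that caused the stop
 * (or -1 if tracing could not be set up). */
int demo_first_stop_signal(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Tracee: ask to be traced by the parent, then stop ourselves. */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(127);
        raise(SIGSTOP);
        _exit(0);
    }

    /* Tracer: the stop is reported through waitpid(). */
    int status = 0;
    if (waitpid(pid, &status, 0) != pid || !WIFSTOPPED(status))
        return -1;
    int sig = WSTOPSIG(status);

    /* data == 0: resume the tracee without delivering the signal. */
    if (ptrace(PTRACE_CONT, pid, NULL, NULL) == -1)
        kill(pid, SIGKILL);
    waitpid(pid, &status, 0);
    assert(WIFEXITED(status) || WIFSIGNALED(status));
    return sig;
}
```

Calling demo_first_stop_signal() from a test program should report SIGSTOP, since that is the signal the child raised on itself.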
long ptrace(enum __ptrace_request request, pid_t pid, void *addr, void *data);
Here the request argument selects what the ptrace system call does. Some commonly used requests are described below:
ptrace requests (request: comment)
- PTRACE_TRACEME: Issued by the tracee itself to indicate that it may be traced by its parent. Every other request is issued by the tracer.
- PTRACE_PEEKTEXT, PTRACE_PEEKDATA: Read a word at the address addr in the tracee's memory, returning the word as the result of the ptrace() call. The two requests are currently equivalent. The data argument is ignored.
- PTRACE_PEEKUSER: Read a word at offset addr in the tracee's USER area. The word is returned as the result of the ptrace() call.
- PTRACE_POKETEXT, PTRACE_POKEDATA: Write a word into the tracee's memory. addr is the memory address and data is the word to write.
- PTRACE_POKEUSER: Write a word into the tracee's USER area. addr is the offset.
- PTRACE_CONT: Resume a stopped tracee. If data is nonzero, it is the signal to deliver to the tracee; otherwise no signal is delivered. Thus, for example, the tracer can control whether a signal sent to the tracee is delivered or not. The addr argument is ignored.
- PTRACE_SYSCALL: Resume a stopped tracee, but arrange for it to stop again at the next system call entry and at the next system call exit.
- PTRACE_SINGLESTEP: Resume a stopped tracee, but arrange for it to stop again after executing a single instruction.
- PTRACE_KILL: Send the tracee a SIGKILL to terminate it.
- PTRACE_ATTACH: Attach to the process specified in pid, making it a tracee of the calling process. The tracee is sent a SIGSTOP, but will not necessarily have stopped by the completion of this call; use waitpid(2) to wait for the tracee to stop. Processes that are not dumpable cannot be attached via ptrace(2) PTRACE_ATTACH.
- PTRACE_SEIZE: Attach to the process specified in pid. Unlike PTRACE_ATTACH, PTRACE_SEIZE does not stop the process.
- PTRACE_DETACH: Restart the stopped tracee as for PTRACE_CONT, but first detach from it.
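To make the PEEK/POKE argument conventions concrete, here is a small sketch in which a tracer reads one word from a child and overwrites it. The names demo_marker and demo_peek_poke are invented for this example. It relies on the child being a fork of the tracer, so the variable sits at the same address in both processes, and it assumes the process is allowed to use ptrace.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Word that the tracer reads and rewrites inside the tracee. */
static volatile long demo_marker = 0x1234;

/* Returns 0 if the tracer read 0x1234 and the tracee then saw 0x5678. */
int demo_peek_poke(void)
{
    /* ptrace transfers one machine word (a long) per request. */
    assert(sizeof(long) == sizeof(void *));

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(127);
        raise(SIGSTOP);                        /* let the tracer inspect us */
        _exit(demo_marker == 0x5678 ? 0 : 1);  /* report what we see afterwards */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) != pid || !WIFSTOPPED(status))
        return -1;

    /* PEEKDATA returns the word itself, so errors must be detected via errno. */
    errno = 0;
    long word = ptrace(PTRACE_PEEKDATA, pid, (void *)&demo_marker, NULL);
    int ok = (errno == 0 && word == 0x1234);

    /* POKEDATA: addr is the tracee address, data is the word to store. */
    if (ptrace(PTRACE_POKEDATA, pid, (void *)&demo_marker, (void *)0x5678L) == -1)
        ok = 0;

    if (ptrace(PTRACE_CONT, pid, NULL, NULL) == -1)
        kill(pid, SIGKILL);
    waitpid(pid, &status, 0);
    return (ok && WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

The tracer's own copy of demo_marker is never modified; only the child's copy changes, which is why the child's exit code is used to confirm the write.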
Note that, apart from PTRACE_ATTACH, PTRACE_SEIZE, PTRACE_INTERRUPT, and PTRACE_KILL, every other request requires the tracee to already be in the stopped state before it is issued.
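For a system call this powerful, knowing how to call it is not enough. Only after reading the source and understanding how it works does it become a truly sharp tool. The analysis here is based on Android 8.0 with the msm-4.4 kernel. The user-facing ptrace interface is provided by bionic:

    long ptrace(int, ...);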
It is implemented in /bionic/libc/bionic/ptrace.cpp:

    #include
    extern "C" long __ptrace(int req, pid_t pid, void* addr, void* data);

    long ptrace(int req, ...) {
      bool is_peek = (req == PTRACE_PEEKUSR || req == PTRACE_PEEKTEXT || req == PTRACE_PEEKDATA);
      long peek_result;

      va_list args;
      va_start(args, req);
      pid_t pid = va_arg(args, pid_t);
      void* addr = va_arg(args, void*);
      void* data;
      if (is_peek) {
        // For PEEK requests the kernel stores the word through data.
        data = &peek_result;
      } else {
        data = va_arg(args, void*);
      }
      va_end(args);

      long result = __ptrace(req, pid, addr, data);
      if (is_peek && result == 0) {
        return peek_result;
      }
      return result;
    }

As shown, the real work is done by __ptrace, which is written in assembly and is architecture-specific. Here is the arm implementation:

    #include
    ENTRY(__ptrace)
        mov     ip, r7
        .cfi_register r7, ip
        ldr     r7, =__NR_ptrace
        swi     #0
        mov     r7, ip
        .cfi_restore r7
        cmn     r0, #(MAX_ERRNO + 1)
        bxls    lr
        neg     r0, r0
        b       __set_errno_internal
    END(__ptrace)

The two key instructions are:

    ldr     r7, =__NR_ptrace
    swi     #0

The first loads the system call number __NR_ptrace into r7. __NR_ptrace is a numeric constant, so r7 receives the value 26 rather than an address. The kernel uses the number in r7 to look up the matching function in the system call table.
The second, swi #0, is the arm software interrupt instruction. It raises a software interrupt exception, after which the CPU transfers control to the SWI handler, the vector_swi function (with some intermediate steps). The immediate #0 means the default sys_call_table is used for the lookup; when the immediate is 0x900000 (__NR_OABI_SYSCALL_BASE), sys_oabi_call_table is used instead.
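The last four instructions of __ptrace (cmn / bxls / neg / b __set_errno_internal) implement the usual Linux convention that a raw return value in the range -MAX_ERRNO..-1 is an error. The following C model is only a sketch of that logic: demo_syscall_return and DEMO_MAX_ERRNO are names invented here, and the errno value is returned through a pointer instead of being written to the real errno.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_ERRNO 4095L   /* same value as MAX_ERRNO */

/* Model of the tail of __ptrace: `raw` is the value the kernel left in r0.
 * On success the raw value is returned and *err_out is 0.
 * On failure -1 is returned and *err_out holds the positive errno value. */
long demo_syscall_return(long raw, int *err_out)
{
    assert(err_out != NULL);

    /* cmn r0, #(MAX_ERRNO + 1); bxls lr
     * Unsigned "lower or same" than -(MAX_ERRNO + 1) means success. */
    if ((unsigned long)raw <= (unsigned long)-(DEMO_MAX_ERRNO + 1)) {
        *err_out = 0;
        return raw;
    }

    /* neg r0, r0; b __set_errno_internal
     * The negated value becomes errno and the call returns -1. */
    *err_out = (int)-raw;
    return -1;
}
```

For example, a raw value of -1 is reported as a failure with errno 1, while -4096 is outside the error range and is passed through unchanged.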
Background on the ldr instruction:

Forms of the ldr instruction:
    LDR R0, [R1]
    LDR R0, =NAME
    LDR R0, =0X123
In the first form (no equals sign), the data at the address held in R1 is loaded into R0.
In the second form (with an equals sign), R0 receives the address of the label NAME.
In the third form (with an equals sign), R0 receives the immediate value.

Background on the swi instruction:

5.5.1. SWI
Software interrupt.
Syntax
    SWI immed_8
where: immed_8 is a numeric expression evaluating to an integer in the range 0-255.
Usage
The SWI instruction causes a SWI exception. This means that the processor state changes to ARM, the processor mode changes to Supervisor, the CPSR is saved to the Supervisor Mode SPSR, and execution branches to the SWI vector (see the Handling Processor Exceptions chapter in ADS Developer Guide). immed_8 is ignored by the processor. However, it is present in bits[7:0] of the instruction opcode. It can be retrieved by the exception handler to determine what service is being requested.
Condition flags
This instruction does not affect the flags.
Architectures
This instruction is available in all T variants of the ARM architecture.
Example
    SWI 12
So __ptrace invokes system call number __NR_ptrace (26), that is, entry 26 of sys_call_table:

    addne   scno, r7, #__NR_SYSCALL_BASE    @ put OS number in
    ....
    adr     tbl, sys_call_table             @ load syscall table pointer
    ...
    cmp     scno, #NR_syscalls              @ check upper syscall limit
    badr    lr, ret_fast_syscall            @ return address
    ldrcc   pc, [tbl, scno, lsl #2]         @ call sys_* routine

Looking at sys_call_table:

    #undef __SYSCALL
    #define __SYSCALL(nr, sym) [nr] = sym,

    void * const compat_sys_call_table[__NR_compat_syscalls] __aligned(4096) = {
        [0 ... __NR_compat_syscalls - 1] = sys_ni_syscall,
    #include
    };

Looking at the included header:

    #define __NR_ptrace 26
    __SYSCALL(__NR_ptrace, compat_sys_ptrace)

So in the kernel, the system call behind the ptrace function is implemented by compat_sys_ptrace.
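The table construction above (default every slot to sys_ni_syscall, then override individual slots through the __SYSCALL macro) and the bounds-checked indexed jump can be imitated in user space. The sketch below is only a model: all demo_ names are invented, the table is much smaller than the real one, and out-of-range numbers are simply routed to the stub. The [a ... b] range initializer is a GNU C extension, so this assumes gcc or clang.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define DEMO_NR_SYSCALLS 32
#define DEMO_NR_PTRACE   26      /* matches __NR_ptrace in the listing */

typedef long (*demo_syscall_fn)(void);

static long demo_ni_syscall(void) { return -ENOSYS; }  /* "not implemented" stub */
static long demo_sys_ptrace(void) { return 26; }       /* stand-in for compat_sys_ptrace */

/* Default every slot to the stub, then override by system call number. */
#define DEMO_SYSCALL(nr, sym) [nr] = sym,
static const demo_syscall_fn demo_call_table[DEMO_NR_SYSCALLS] = {
    [0 ... DEMO_NR_SYSCALLS - 1] = demo_ni_syscall,
    DEMO_SYSCALL(DEMO_NR_PTRACE, demo_sys_ptrace)
};

/* Model of the dispatch: range check, then an indexed call. */
long demo_dispatch(unsigned int scno)
{
    if (scno >= DEMO_NR_SYSCALLS)          /* cmp scno, #NR_syscalls */
        return demo_ni_syscall();
    assert(demo_call_table[scno] != NULL);
    return demo_call_table[scno]();        /* ldrcc pc, [tbl, scno, lsl #2] */
}
```

Some compilers warn about the overridden initializer; that override is exactly the point of the pattern.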
compat_sys_ptrace is implemented as follows:

    COMPAT_SYSCALL_DEFINE4(ptrace, compat_long_t, request, compat_long_t, pid,
                           compat_long_t, addr, compat_long_t, data)
    {
        struct task_struct *child;
        long ret;

        if (request == PTRACE_TRACEME) {
            ret = ptrace_traceme();
            goto out;
        }

        child = ptrace_get_task_struct(pid);
        if (IS_ERR(child)) {
            ret = PTR_ERR(child);
            goto out;
        }

        if (request == PTRACE_ATTACH || request == PTRACE_SEIZE) {
            ret = ptrace_attach(child, request, addr, data);
            /*
             * Some architectures need to do book-keeping after
             * a ptrace attach.
             */
            if (!ret)
                arch_ptrace_attach(child);
            goto out_put_task_struct;
        }

        ret = ptrace_check_attach(child, request == PTRACE_KILL ||
                                  request == PTRACE_INTERRUPT);
        if (!ret) {
            ret = compat_arch_ptrace(child, request, addr, data);
            if (ret || request != PTRACE_DETACH)
                ptrace_unfreeze_traced(child);
        }

    out_put_task_struct:
        put_task_struct(child);
    out:
        return ret;
    }
1. PTRACE_TRACEME:
    static int ptrace_traceme(void)
    {
        int ret = -EPERM;

        write_lock_irq(&tasklist_lock);
        /* Are we already being traced? */
        if (!current->ptrace) {
            ret = security_ptrace_traceme(current->parent);
            /*
             * Check PF_EXITING to ensure ->real_parent has not passed
             * exit_ptrace(). Otherwise we don't report the error but
             * pretend ->real_parent untraces us right after return.
             */
            if (!ret && !(current->real_parent->flags & PF_EXITING)) {
                current->ptrace = PT_PTRACED;
                ptrace_link(current, current->real_parent);
            }
        }
        write_unlock_irq(&tasklist_lock);

        return ret;
    }
ptrace_link in turn calls __ptrace_link:

    void __ptrace_link(struct task_struct *child, struct task_struct *new_parent,
                       const struct cred *ptracer_cred)
    {
        BUG_ON(!list_empty(&child->ptrace_entry));
        list_add(&child->ptrace_entry, &new_parent->ptraced);
        child->parent = new_parent;
        child->ptracer_cred = get_cred(ptracer_cred);
    }

PTRACE_TRACEME puts the current thread into the traced state through the following steps:
- Set current->ptrace to PT_PTRACED.
- Add the current process's ptrace_entry node to the real parent's ptraced list.
- Set the current process's ptrace parent to its real parent.
- Save the tracer's (the real parent's) credentials in the current process's ptracer_cred.
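The bookkeeping in the steps above can be modelled with a few fields. This is a deliberately reduced sketch, not kernel code: struct demo_task and demo_traceme are invented for illustration, a counter stands in for the ptraced list, and locking, the security hook, the PF_EXITING check and the credentials step are all left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define DEMO_PT_PTRACED 0x1u   /* same bit value as PT_PTRACED */

/* Hypothetical, heavily reduced stand-in for task_struct. */
struct demo_task {
    unsigned int      ptrace;       /* 0 while untraced */
    struct demo_task *real_parent;  /* the process that created us */
    struct demo_task *parent;       /* who receives our stop notifications */
    int               n_ptraced;    /* stands in for the ->ptraced list */
};

/* Model of the PTRACE_TRACEME path for the task `cur`. */
int demo_traceme(struct demo_task *cur)
{
    assert(cur != NULL && cur->real_parent != NULL);

    if (cur->ptrace)                  /* already being traced */
        return -EPERM;

    cur->ptrace = DEMO_PT_PTRACED;    /* mark the task as traced */
    cur->real_parent->n_ptraced++;    /* link it into the tracer's list */
    cur->parent = cur->real_parent;   /* the tracer becomes the ptrace parent */
    return 0;
}
```

A second call on the same task fails with -EPERM, mirroring the "Are we already being traced?" check in ptrace_traceme.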
Note that PTRACE_TRACEME does not itself stop the child. What actually stops it is the exec system call: after a successful exec, the kernel checks whether the process is being traced by ptrace and, if so, sends it SIGTRAP, which causes the process to stop. The flow is as follows. Every function in the exec family calls do_execveat_common to do its work:

    /*
     * sys_execve() executes a new program.
     */
    static int do_execveat_common(int fd, struct filename *filename,
                                  struct user_arg_ptr argv,
                                  struct user_arg_ptr envp,
                                  int flags)
    {
        .....
        retval = exec_binprm(bprm);
        ...
    }

which calls exec_binprm to run the executable:

    static int exec_binprm(struct linux_binprm *bprm)
    {
        pid_t old_pid, old_vpid;
        int ret;

        /* Need to fetch pid before load_binary changes it */
        old_pid = current->pid;
        rcu_read_lock();
        old_vpid = task_pid_nr_ns(current, task_active_pid_ns(current->parent));
        rcu_read_unlock();

        ret = search_binary_handler(bprm);
        if (ret >= 0) {
            audit_bprm(bprm);
            trace_sched_process_exec(current, old_pid, bprm);
            ptrace_event(PTRACE_EVENT_EXEC, old_vpid);
            proc_exec_connector(current);
        }

        return ret;
    }

exec_binprm calls ptrace_event, which sends SIGTRAP to the current process:

    static inline void ptrace_event(int event, unsigned long message)
    {
        if (unlikely(ptrace_event_enabled(current, event))) {
            current->ptrace_message = message;
            ptrace_notify((event << 8) | SIGTRAP);
        } else if (event == PTRACE_EVENT_EXEC) {
            /* legacy EXEC report via SIGTRAP */
            if ((current->ptrace & (PT_PTRACED|PT_SEIZED)) == PT_PTRACED)
                send_sig(SIGTRAP, current, 0);
        }
    }
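The (event << 8) | SIGTRAP value passed to ptrace_notify is what the tracer later sees in the upper bits of its wait status. The sketch below shows how a tracer could tell the two branches of ptrace_event apart. The helper names are invented, and demo_stop_status hard-codes the Linux encoding of a stop status, (code << 8) | 0x7f, purely so the decoder can be exercised without a real tracee.

```c
#include <assert.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/wait.h>

/* Build the wait status Linux reports for a stopped task whose stop code
 * is `code` (assumption: stop statuses are encoded as (code << 8) | 0x7f). */
int demo_stop_status(int code)
{
    return (code << 8) | 0x7f;
}

/* Return the PTRACE_EVENT_* number carried in a stop status,
 * or 0 when the stop is not a ptrace event stop. */
int demo_ptrace_event(int status)
{
    assert(WIFSTOPPED(status));
    if (WSTOPSIG(status) != SIGTRAP)
        return 0;
    return (status >> 16) & 0xff;   /* event number sits above the signal */
}
```

With PTRACE_O_TRACEEXEC in effect the decoder yields PTRACE_EVENT_EXEC; in the legacy case the stop carries a plain SIGTRAP and the decoder yields 0.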
2. PTRACE_ATTACH, PTRACE_SEIZE
Key code:

    static int ptrace_attach(struct task_struct *task, long request,
                             unsigned long addr, unsigned long flags)
    {
        ...
        audit_ptrace(task);
        ...
        if (unlikely(task->flags & PF_KTHREAD))
            goto out;
        if (same_thread_group(task, current))
            goto out;

        retval = __ptrace_may_access(task, PTRACE_MODE_ATTACH_REALCREDS);

        if (unlikely(task->exit_state))
            goto unlock_tasklist;
        if (task->ptrace)
            goto unlock_tasklist;

        if (seize)
            flags |= PT_SEIZED;
        task->ptrace = flags;

        ptrace_link(task, current);

        /* SEIZE doesn't trap tracee on attach */
        if (!seize)
            send_sig_info(SIGSTOP, SEND_SIG_FORCED, task);

        /*
         * If the task is already STOPPED, set JOBCTL_TRAP_STOP and
         * TRAPPING, and kick it so that it transits to TRACED. TRAPPING
         * will be cleared if the child completes the transition or any
         * event which clears the group stop states happens. We'll wait
         * for the transition to complete before returning from this
         * function.
         *
         * This hides STOPPED -> RUNNING -> TRACED transition from the
         * attaching thread but a different thread in the same group can
         * still observe the transient RUNNING state. IOW, if another
         * thread's WNOHANG wait(2) on the stopped tracee races against
         * ATTACH, the wait(2) may fail due to the transient RUNNING.
         *
         * The following task_is_stopped() test is safe as both transitions
         * in and out of STOPPED are protected by siglock.
         */
        if (task_is_stopped(task) &&
            task_set_jobctl_pending(task, JOBCTL_TRAP_STOP | JOBCTL_TRAPPING))
            signal_wake_up_state(task, __TASK_STOPPED);

    out:
        if (!retval) {
            wait_on_bit(&task->jobctl, JOBCTL_TRAPPING_BIT,
                        TASK_UNINTERRUPTIBLE);
            proc_ptrace_connector(task, PTRACE_ATTACH);
        }

        return retval;
    }

ptrace_attach mainly does the following:
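- audit_ptrace records the tracee's process information in the current process's current->audit_context for auditing.
- Attaching to a kernel thread is not allowed, nor is attaching to a thread in the caller's own thread group.
- Check that the caller has permission to trace the tracee (uid/gid checks, whether the tracee is dumpable, and so on).
- A tracee that is exiting cannot be traced.
- Check whether the tracee is already being traced by another process.
- Set tracee->ptrace to mark it as traced.
- ptrace_link sets the tracee's parent to the current process and adds the tracee to the current process's ptraced list.
- For PTRACE_ATTACH (but not PTRACE_SEIZE), send SIGSTOP to the tracee.
- If the tracee is already in the STOPPED state, switch it to the TRACED state (the transition is guaranteed to have completed by the time the function returns).
- If the tracee has not yet reached the STOPPED state, the previous step is skipped, and the function does not guarantee that the tracee has stopped by the time it returns.
- Finally, call proc_ptrace_connector to report the attach through the process events connector; the event carries a timestamp, process_pid, process_tgid, and so on.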
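From user space, the most visible consequence of the steps above is that PTRACE_ATTACH only queues a SIGSTOP, so the tracer has to wait for the stop itself. The following sketch attaches to a child that never called PTRACE_TRACEME. The function name is invented, and it assumes the system's ptrace policy (for example Yama) lets a process attach to its own child.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the stop signal observed after PTRACE_ATTACH, or -1 on failure. */
int demo_attach_stop_signal(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        for (;;)
            pause();            /* untraced child: idle until killed */
    }

    int sig = -1;
    int status = 0;
    if (ptrace(PTRACE_ATTACH, pid, NULL, NULL) == 0) {
        /* The attach call may return before the tracee has stopped. */
        if (waitpid(pid, &status, 0) == pid && WIFSTOPPED(status))
            sig = WSTOPSIG(status);
        /* data == 0: let the tracee resume without a signal. */
        ptrace(PTRACE_DETACH, pid, NULL, NULL);
    }

    /* Clean up the helper child in every case. */
    kill(pid, SIGKILL);
    waitpid(pid, &status, 0);
    assert(WIFSIGNALED(status));
    return sig;
}
```

Replacing PTRACE_ATTACH with PTRACE_SEIZE here would leave the child running, so the first waitpid would block; that difference is the point of the "SEIZE doesn't trap tracee on attach" comment in the kernel code.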
3. ptrace_check_attach
If the current ptrace request is neither PTRACE_KILL nor PTRACE_INTERRUPT, the tracee must be in the TASK_TRACED state before the request can be executed. PTRACE_KILL and PTRACE_INTERRUPT can be executed regardless of the tracee's current state.
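The state gate just described boils down to a small predicate. The model below is a simplification with invented names (demo_state, demo_check_attach): it only captures the state rule and leaves out the checks that the caller really is the tracer as well as the freezing of the tracee.

```c
#include <assert.h>
#include <errno.h>
#include <sys/ptrace.h>

/* Hypothetical, simplified task states for this model. */
enum demo_state { DEMO_RUNNING, DEMO_STOPPED, DEMO_TRACED };

/* Returns 0 if `request` may proceed against a tracee in `state`,
 * -ESRCH otherwise. */
int demo_check_attach(long request, enum demo_state state)
{
    /* Mirrors the second argument passed to ptrace_check_attach(). */
    int ignore_state = (request == PTRACE_KILL ||
                        request == PTRACE_INTERRUPT);

    if (ignore_state)
        return 0;
    return state == DEMO_TRACED ? 0 : -ESRCH;
}
```

For instance, PTRACE_CONT against a running tracee is rejected, while PTRACE_KILL is let through in any state.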
4. Requests other than PTRACE_TRACEME, PTRACE_ATTACH and PTRACE_SEIZE
All requests other than these three are executed in compat_arch_ptrace. By that point ptrace_check_attach has already run, so the tracee is in the TASK_TRACED state (except for PTRACE_KILL and PTRACE_INTERRUPT, for which the state check is skipped).

    long compat_arch_ptrace(struct task_struct *child, compat_long_t request,
                            compat_ulong_t caddr, compat_ulong_t cdata)
    {
        unsigned long addr = caddr;
        unsigned long data = cdata;
        void __user *datap = compat_ptr(data);
        int ret;

        switch (request) {
        case PTRACE_PEEKUSR:
            ret = compat_ptrace_read_user(child, addr, datap);
            break;

        case PTRACE_POKEUSR:
            ret = compat_ptrace_write_user(child, addr, data);
            break;

        case COMPAT_PTRACE_GETREGS:
            ret = copy_regset_to_user(child, &user_aarch32_view,
                                      REGSET_COMPAT_GPR, 0,
                                      sizeof(compat_elf_gregset_t), datap);
            break;

        case COMPAT_PTRACE_SETREGS:
            ret = copy_regset_from_user(child, &user_aarch32_view,
                                        REGSET_COMPAT_GPR, 0,
                                        sizeof(compat_elf_gregset_t), datap);
            break;

        case COMPAT_PTRACE_GET_THREAD_AREA:
            ret = put_user((compat_ulong_t)child->thread.tp_value,
                           (compat_ulong_t __user *)datap);
            break;

        case COMPAT_PTRACE_SET_SYSCALL:
            task_pt_regs(child)->syscallno = data;
            ret = 0;
            break;

        case COMPAT_PTRACE_GETVFPREGS:
            ret = copy_regset_to_user(child, &user_aarch32_view,
                                      REGSET_COMPAT_VFP, 0,
                                      VFP_STATE_SIZE, datap);
            break;

        case COMPAT_PTRACE_SETVFPREGS:
            ret = copy_regset_from_user(child, &user_aarch32_view,
                                        REGSET_COMPAT_VFP, 0,
                                        VFP_STATE_SIZE, datap);
            break;

    #ifdef CONFIG_HAVE_HW_BREAKPOINT
        case COMPAT_PTRACE_GETHBPREGS:
            ret = compat_ptrace_gethbpregs(child, addr, datap);
            break;

        case COMPAT_PTRACE_SETHBPREGS:
            ret = compat_ptrace_sethbpregs(child, addr, datap);
            break;
    #endif

        default:
            ret = compat_ptrace_request(child, request, addr, data);
            break;
        }

        return ret;
    }

Not every ptrace request is covered here; only the following few are discussed.
PTRACE_CONT, PTRACE_SYSCALL, PTRACE_SINGLESTEP, PTRACE_KILL and PTRACE_DETACH are all implemented in ptrace_request, which is called from compat_ptrace_request:

    int ptrace_request(struct task_struct *child, long request,
                       unsigned long addr, unsigned long data)
    {
        ....
        switch (request) {
        ....
        case PTRACE_DETACH:     /* detach a process that was attached. */
            ret = ptrace_detach(child, data);
            break;
        ....
        case PTRACE_SINGLESTEP:
        case PTRACE_SYSCALL:
        case PTRACE_CONT:
            return ptrace_resume(child, request, data);

        case PTRACE_KILL:
            if (child->exit_state)  /* already dead */
                return 0;
            return ptrace_resume(child, request, SIGKILL);
        ....
    }
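For PTRACE_DETACH, ptrace_detach is called, which in order:
- Calls ptrace_disable to turn off architecture-specific tracing state on the tracee.
- Clears the tracee's TIF_SYSCALL_TRACE flag.
- Sets the tracee's exit_code to data, the (valid) signal number passed in the ptrace call.
- Calls __ptrace_detach and __ptrace_unlink, which restore the tracer and tracee task_struct linkage.
- Calls proc_ptrace_connector to report the detach through the process events connector; the event carries a timestamp, process_pid, process_tgid, and so on.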
For PTRACE_CONT, PTRACE_SYSCALL and PTRACE_SINGLESTEP, a nonzero data argument to ptrace is the signal to deliver to the tracee once it has been woken up.
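The effect of the data argument can be observed directly: a tracer that intercepts a signal may either swallow it or pass it on when resuming. The sketch below uses an invented function name, demo_cont_signal, and assumes the process is allowed to use ptrace; the child raises SIGUSR1, whose default action would terminate it.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* deliver != 0: pass the intercepted SIGUSR1 on to the tracee.
 * deliver == 0: suppress it.
 * Returns the tracee's final wait status, or -1 on setup failure. */
int demo_cont_signal(int deliver)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(127);
        raise(SIGUSR1);     /* intercepted by the tracer first */
        _exit(42);          /* reached only if the signal was suppressed */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) != pid || !WIFSTOPPED(status))
        return -1;
    assert(WSTOPSIG(status) == SIGUSR1);

    /* The data argument of PTRACE_CONT selects the signal to deliver. */
    long sig = deliver ? SIGUSR1 : 0;
    if (ptrace(PTRACE_CONT, pid, NULL, (void *)sig) == -1)
        kill(pid, SIGKILL);
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return status;
}
```

With deliver set to 0 the child survives and exits with code 42; with deliver set to 1 it is terminated by SIGUSR1.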
PTRACE_KILL, as shown above, wakes the tracee and sends it SIGKILL, so the tracee receives SIGKILL and terminates as soon as it wakes up. Here is ptrace_resume:

    static int ptrace_resume(struct task_struct *child, long request,
                             unsigned long data)
    {
        bool need_siglock;

        if (!valid_signal(data))
            return -EIO;

        if (request == PTRACE_SYSCALL)
            set_tsk_thread_flag(child, TIF_SYSCALL_TRACE);
        else
            clear_tsk_thread_flag(child, TIF_SYSCALL_TRACE);

    #ifdef TIF_SYSCALL_EMU
        if (request == PTRACE_SYSEMU || request == PTRACE_SYSEMU_SINGLESTEP)
            set_tsk_thread_flag(child, TIF_SYSCALL_EMU);
        else
            clear_tsk_thread_flag(child, TIF_SYSCALL_EMU);
    #endif

        if (is_singleblock(request)) {
            if (unlikely(!arch_has_block_step()))
                return -EIO;
            user_enable_block_step(child);
        } else if (is_singlestep(request) || is_sysemu_singlestep(request)) {
            if (unlikely(!arch_has_single_step()))
                return -EIO;
            user_enable_single_step(child);
        } else {
            user_disable_single_step(child);
        }

        /*
         * Change ->exit_code and ->state under siglock to avoid the race
         * with wait_task_stopped() in between; a non-zero ->exit_code will
         * wrongly look like another report from tracee.
         *
         * Note that we need siglock even if ->exit_code == data and/or this
         * status was not reported yet, the new status must not be cleared by
         * wait_task_stopped() after resume.
         *
         * If data == 0 we do not care if wait_task_stopped() reports the old
         * status and clears the code too; this can't race with the tracee, it
         * takes siglock after resume.
         */
        need_siglock = data && !thread_group_empty(current);
        if (need_siglock)
            spin_lock_irq(&child->sighand->siglock);
        child->exit_code = data;
        wake_up_state(child, __TASK_TRACED);
        if (need_siglock)
            spin_unlock_irq(&child->sighand->siglock);

        return 0;
    }

In ptrace_resume, if the request is PTRACE_SYSCALL, the tracee's TIF_SYSCALL_TRACE flag is set; otherwise the flag is cleared. The flag makes the tracee stop at the entry and at the exit of its next system call.
The check happens on the system call path. The SVC handler (the swi handler) reads the task's TI_FLAGS, and if any _TIF_SYSCALL_WORK bit is set, meaning that system call tracing is enabled, it calls syscall_trace_enter before executing the system call and syscall_trace_exit after it returns. Both functions test whether the tracee's TIF_SYSCALL_TRACE flag is set and, if so, call tracehook_report_syscall(regs, PTRACE_SYSCALL_ENTER) and tracehook_report_syscall(regs, PTRACE_SYSCALL_EXIT) respectively. These in turn call tracehook_report_syscall_entry and tracehook_report_syscall_exit, both of which go through ptrace_report_syscall -> ptrace_notify -> ptrace_do_notify -> ptrace_stop and thereby stop the tracee. That completes PTRACE_SYSCALL; strace is presumably built on the same mechanism.
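A very small strace-like loop shows the entry/exit stops from the tracer's side. This is a sketch under assumptions, not how strace itself is written: demo_count_syscall_stops is an invented name, and PTRACE_O_TRACESYSGOOD is used so that syscall stops are reported as SIGTRAP | 0x80 and can be told apart from other stops. The exact number of stops depends on the C library, so the function only counts them.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Counts the syscall-entry and syscall-exit stops of a short-lived child.
 * Returns the count, or -1 if tracing could not be set up. */
int demo_count_syscall_stops(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(127);
        raise(SIGSTOP);     /* hand control to the tracer */
        (void)getppid();    /* at least one system call to observe */
        _exit(0);
    }

    int status = 0;
    int stops = 0;
    if (waitpid(pid, &status, 0) != pid || !WIFSTOPPED(status))
        return -1;

    /* Report syscall stops as SIGTRAP | 0x80. */
    ptrace(PTRACE_SETOPTIONS, pid, NULL, (void *)(long)PTRACE_O_TRACESYSGOOD);

    for (;;) {
        /* Resume until the next syscall entry or exit (sets TIF_SYSCALL_TRACE). */
        if (ptrace(PTRACE_SYSCALL, pid, NULL, NULL) == -1)
            kill(pid, SIGKILL);
        if (waitpid(pid, &status, 0) != pid)
            return -1;
        if (!WIFSTOPPED(status))
            break;          /* the tracee exited or was killed */
        if (WSTOPSIG(status) == (SIGTRAP | 0x80))
            stops++;
    }
    assert(stops >= 0);
    return stops;
}
```

A real tracer would read the system call number and arguments from the tracee's registers at each stop; that part is architecture-specific and is left out here.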
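If the request is PTRACE_SINGLESTEP, user_enable_single_step is called to enable single-stepping for the tracee, which sets TIF_SINGLESTEP. With this flag set, the tracee triggers a SIGTRAP after every instruction it executes and enters the stopped state. Single-stepping is therefore another way of stopping a process. On x86, for example, when the tracer issues PTRACE_SINGLESTEP, ptrace sets the TF bit in the tracee's user-mode EFLAGS register and lets the process continue. After the process returns to user mode and executes one instruction, the CPU raises exception 1 and control passes to do_debug. Because the child is being debugged, this is an expected debug exception, so do_debug generates SIGTRAP; handling that signal in do_signal stops the tracee and notifies the debugger (its parent), which sees SIGTRAP as the reason the child stopped.
If the request is PTRACE_CONT, meaning the tracee should simply continue running, user_disable_single_step is called to turn single-stepping off and clear TIF_SINGLESTEP.
For all of these requests, data is first validated with valid_signal and then stored in tracee->exit_code. Finally, wake_up_state(child, __TASK_TRACED) wakes the tracee. The __TASK_TRACED argument is the state the task must be sleeping in for the wake-up to apply; once woken, the tracee is runnable again.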
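A final note on breakpoints. Setting breakpoints is a key debugger feature, and on x86 there are two ways to do it: the INT3 instruction and the debug registers. With INT3, the debugger uses PTRACE_POKETEXT to insert the single-byte INT3 instruction at the breakpoint address. When the process reaches the breakpoint (the INT3), the system enters the handler for exception 3. With the debug registers, the debugger calls ptrace(PTRACE_POKEUSR, pid, 0, data) to load DR0-DR3 with the linear addresses associated with each of the four breakpoint conditions and to set the breakpoint conditions in DR7. When the tracee reaches a breakpoint, the CPU raises exception 1 and control passes to do_debug. Because the child is being debugged, this is an expected debug exception, so do_debug generates SIGTRAP; handling that signal in do_signal stops the tracee and notifies the debugger (its parent), which sees SIGTRAP as the reason the child stopped.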
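Because PTRACE_PEEKTEXT and PTRACE_POKETEXT transfer whole words, inserting a one-byte INT3 means patching a single byte of a word and keeping the original for later restoration. The helpers below sketch only that word arithmetic, with invented names, and assume a little-endian x86 target where the byte at the breakpoint address is the low-order byte of the word read from that address.

```c
#include <assert.h>

#define DEMO_INT3 0xCCUL   /* x86 one-byte breakpoint opcode */

/* Given the word read with PTRACE_PEEKTEXT at the breakpoint address,
 * return the word to write back with PTRACE_POKETEXT. */
long demo_insert_int3(long original_word)
{
    return (long)(((unsigned long)original_word & ~0xFFUL) | DEMO_INT3);
}

/* Undo the patch: put the saved original low byte back. */
long demo_remove_int3(long patched_word, long original_word)
{
    assert(((unsigned long)patched_word & 0xFFUL) == DEMO_INT3);
    return (long)(((unsigned long)patched_word & ~0xFFUL) |
                  ((unsigned long)original_word & 0xFFUL));
}
```

A debugger would typically also rewind the tracee's instruction pointer by one byte after the trap before restoring the word and resuming; that step needs register access and is not shown.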