Have you ever wondered how system calls can be intercepted? Have you ever tried fooling the kernel by changing system call arguments? Have you ever wondered how debuggers stop a running process and let you take control of the process?
If you are thinking of using complex kernel programming to accomplish tasks, think again. Linux provides an elegant mechanism to achieve all of these things: the ptrace (Process Trace) system call. ptrace provides a mechanism by which a parent process may observe and control the execution of another process. It can examine and change its core image and registers and is used primarily to implement breakpoint debugging and system call tracing.
In this article, we learn how to intercept a system call and change its arguments. In Part II of the article we will study advanced techniques: setting breakpoints and injecting code into a running program. We will peek into the child process' registers and data segment and modify the contents. We will also describe a way to inject code so the process can be stopped and execute arbitrary instructions.
Operating systems offer services through a standard mechanism called system calls. They provide a standard API for accessing the underlying hardware and low-level services, such as the filesystems. When a process wants to invoke a system call, it puts the arguments to system calls in registers and calls soft interrupt 0x80. This soft interrupt is like a gate to the kernel mode, and the kernel will execute the system call after examining the arguments.
On the i386 architecture (all the code in this article is i386-specific), the system call number is put in the register %eax. The arguments to this system call are put into registers %ebx, %ecx, %edx, %esi and %edi, in that order. For example, the call:
write(2, "Hello", 5)
roughly would translate into
movl $4, %eax
movl $2, %ebx
movl $hello, %ecx
movl $5, %edx
int $0x80
where $hello points to a literal string “Hello”.
So where does ptrace come into the picture? Before executing the system call, the kernel checks whether the process is being traced. If it is, the kernel stops the process and gives control to the tracing process so it can examine and modify the traced process' registers.
Let's clarify this explanation with an example of how the process works:
#include <stdio.h>        /* For printf */
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <linux/user.h>   /* For constants ORIG_EAX etc */

int main()
{
    pid_t child;
    long orig_eax;

    child = fork();
    if(child == 0) {
        /* Child: ask to be traced, then run the target program */
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        execl("/bin/ls", "ls", NULL);
    }
    else {
        /* Parent: wait for the child to stop at execve */
        wait(NULL);
        orig_eax = ptrace(PTRACE_PEEKUSER, child, 4 * ORIG_EAX, NULL);
        printf("The child made a "
               "system call %ld\n", orig_eax);
        ptrace(PTRACE_CONT, child, NULL, NULL);
    }
    return 0;
}
When run, this program prints:
The child made a system call 11
along with the output of ls. System call number 11 is execve, and it's the first system call executed by the child. For reference, system call numbers can be found in /usr/include/asm/unistd.h.
As you can see in the example, a process forks a child and the child executes the process we want to trace. Before running exec, the child calls ptrace with PTRACE_TRACEME as the first argument. This tells the kernel that the process is being traced, and when the child executes the execve system call, it hands over control to its parent. The parent waits for notification from the kernel with a wait() call. Then the parent can check the arguments of the system call or do other things, such as looking into the registers.
When the system call occurs, the kernel saves the original contents of the eax register, which contains the system call number. We can read this value from the child's USER segment by calling ptrace with the first argument PTRACE_PEEKUSER, as shown above.
After we are done examining the system call, the child can continue with a call to ptrace with the first argument PTRACE_CONT, which lets the system call continue.
ptrace Parameters
ptrace is called with four arguments:
long ptrace(enum __ptrace_request request, pid_t pid, void *addr, void *data);
The first argument determines the behaviour of ptrace and how other arguments are used. The value of request should be one of PTRACE_TRACEME, PTRACE_PEEKTEXT, PTRACE_PEEKDATA, PTRACE_PEEKUSER, PTRACE_POKETEXT, PTRACE_POKEDATA, PTRACE_POKEUSER, PTRACE_GETREGS, PTRACE_GETFPREGS, PTRACE_SETREGS, PTRACE_SETFPREGS, PTRACE_CONT, PTRACE_SYSCALL, PTRACE_SINGLESTEP, PTRACE_DETACH. The significance of each of these requests will be explained in the rest of the article.
By calling ptrace with PTRACE_PEEKUSER as the first argument, we can examine the contents of the USER area where register contents and other information are stored. The kernel stores the contents of registers in this area for the parent process to examine through ptrace.
Let's show this with an example:
#include <stdio.h>        /* For printf */
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <linux/user.h>
#include <sys/syscall.h>  /* For SYS_write etc */

int main()
{
    pid_t child;
    long orig_eax, eax;
    long params[3];
    int status;
    int insyscall = 0;

    child = fork();
    if(child == 0) {
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        execl("/bin/ls", "ls", NULL);
    }
    else {
        while(1) {
            wait(&status);
            if(WIFEXITED(status))
                break;
            orig_eax = ptrace(PTRACE_PEEKUSER, child, 4 * ORIG_EAX, NULL);
            if(orig_eax == SYS_write) {
                if(insyscall == 0) {
                    /* Syscall entry */
                    insyscall = 1;
                    params[0] = ptrace(PTRACE_PEEKUSER, child, 4 * EBX, NULL);
                    params[1] = ptrace(PTRACE_PEEKUSER, child, 4 * ECX, NULL);
                    params[2] = ptrace(PTRACE_PEEKUSER, child, 4 * EDX, NULL);
                    printf("Write called with "
                           "%ld, %ld, %ld\n",
                           params[0], params[1], params[2]);
                }
                else {
                    /* Syscall exit */
                    eax = ptrace(PTRACE_PEEKUSER, child, 4 * EAX, NULL);
                    printf("Write returned "
                           "with %ld\n", eax);
                    insyscall = 0;
                }
            }
            /* Resume the child until the next syscall entry or exit */
            ptrace(PTRACE_SYSCALL, child, NULL, NULL);
        }
    }
    return 0;
}
This program should print output similar to the following:
ppadala@linux:~/ptrace > ls
a.out        dummy.s      ptrace.txt
libgpm.html  registers.c  syscallparams.c
dummy        ptrace.html  simple.c
ppadala@linux:~/ptrace > ./a.out
Write called with 1, 1075154944, 48
a.out        dummy.s      ptrace.txt
Write returned with 48
Write called with 1, 1075154944, 59
libgpm.html  registers.c  syscallparams.c
Write returned with 59
Write called with 1, 1075154944, 30
dummy        ptrace.html  simple.c
Write returned with 30
Here we are tracing the write system calls, and ls makes three write system calls. The call to ptrace, with a first argument of PTRACE_SYSCALL, makes the kernel stop the child process whenever a system call entry or exit is made. It's equivalent to doing a PTRACE_CONT and stopping at the next system call entry/exit.
In the previous example, we used PTRACE_PEEKUSER to look into the arguments of the write system call. When a system call returns, the return value is placed in %eax, and it can be read as shown in that example.
The status variable in the wait call is used to check whether the child has exited. This is the typical way to check whether the child has been stopped by ptrace or was able to exit. For more details on macros like WIFEXITED, see the wait(2) man page.
Reading Register Values
If you want to read register values at the time of a syscall entry or exit, the procedure shown above can be cumbersome. Calling ptrace with a first argument of PTRACE_GETREGS retrieves all the registers in a single call.
The code to fetch register values looks like this:
#include <stdio.h>        /* For printf */
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <linux/user.h>
#include <sys/syscall.h>

int main()
{
    pid_t child;
    long orig_eax, eax;
    long params[3];
    int status;
    int insyscall = 0;
    struct user_regs_struct regs;

    child = fork();
    if(child == 0) {
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        execl("/bin/ls", "ls", NULL);
    }
    else {
        while(1) {
            wait(&status);
            if(WIFEXITED(status))
                break;
            orig_eax = ptrace(PTRACE_PEEKUSER, child, 4 * ORIG_EAX, NULL);
            if(orig_eax == SYS_write) {
                if(insyscall == 0) {
                    /* Syscall entry: fetch all registers at once */
                    insyscall = 1;
                    ptrace(PTRACE_GETREGS, child, NULL, &regs);
                    printf("Write called with "
                           "%ld, %ld, %ld\n",
                           regs.ebx, regs.ecx, regs.edx);
                }
                else {
                    /* Syscall exit */
                    eax = ptrace(PTRACE_PEEKUSER, child, 4 * EAX, NULL);
                    printf("Write returned "
                           "with %ld\n", eax);
                    insyscall = 0;
                }
            }
            ptrace(PTRACE_SYSCALL, child, NULL, NULL);
        }
    }
    return 0;
}
This code is similar to the previous example except for the call to ptrace with PTRACE_GETREGS. Here we have made use of the user_regs_struct defined in <linux/user.h> to read the register values.