April 2, 2013 — Rafał Cieślak
This post assumes some basic C skills.
Linux puts you in full control. Not everyone sees it that way, but a power user loves being in control. I’m going to show you a basic trick that lets you heavily influence the behavior of most applications, which is not only fun but, at times, also useful.
A motivational example
Let us begin with a simple example. Fun first, science later.
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(){
  srand(time(NULL));
  int i = 10;
  while(i--) printf("%d\n",rand()%100);
  return 0;
}
Simple enough, I believe. I compiled it with no special flags, just
gcc random_num.c -o random_num
I hope the resulting output is obvious – ten randomly selected numbers 0-99, hopefully different each time you run this program.
Now let’s pretend we don’t really have the source of this executable. Either delete the source file, or move it somewhere – we won’t need it. We will significantly modify this program’s behavior, without touching its source code or recompiling it.
For this, let’s create another simple C file:
int rand(){
  return 42; //the most random number in the universe
}
We’ll compile it into a shared library.
gcc -shared -fPIC unrandom.c -o unrandom.so
So what we have now is an application that outputs some random data, and a custom library, which implements the rand() function as a constant value of 42. Now… just run random_num this way, and watch the result:
LD_PRELOAD=$PWD/unrandom.so ./random_num
If you are lazy and did not do it yourself (and somehow fail to guess what might have happened), I’ll let you know – the output consists of ten 42’s.
This may be even more impressive if you first:
export LD_PRELOAD=$PWD/unrandom.so
and then run the program normally. An unchanged app run in an apparently usual manner seems to be affected by what we did in our tiny library…
Wait, what? What just happened?
Yup, you are right, our program failed to generate random numbers, because it did not use the “real” rand(), but the one we provided – which returns 42 every time.
But we *told* it to use the real one. We programmed it to use the real one. Besides, at the time we created that program, the fake rand() did not even exist!
This is not entirely true. We did not choose which rand() we want our program to use. We told it just to use rand().
When our program is started, certain libraries (that provide functionality needed by the program) are loaded. We can learn which ones using ldd:
$ ldd random_num
linux-vdso.so.1 => (0x00007fff4bdfe000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f48c03ec000)
/lib64/ld-linux-x86-64.so.2 (0x00007f48c07e3000)
What you see as the output is the list of libs that are needed by random_num. This list is built into the executable, and is determined when the program is built. The exact output might slightly differ on your machine, but a libc.so must be there – this is the file which provides core C functionality. That includes the “real” rand().
We can have a peek at which functions libc provides. I used the following to get a full list (substitute the libc path that ldd printed on your system):
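ldd shows this list from the outside. If you are curious how it looks from inside a running process, here is a minimal sketch, assuming glibc, which offers dl_iterate_phdr() in link.h for walking over all loaded objects. The names list_loaded_objects and print_object are made up for this illustration.

```c
#define _GNU_SOURCE
#include <link.h>
#include <stdio.h>

/* Called once per loaded object: prints its path and bumps the counter. */
static int print_object(struct dl_phdr_info *info, size_t size, void *data) {
    (void)size;
    int *count = data;
    /* The main executable is usually reported with an empty name. */
    const char *name = (info->dlpi_name && info->dlpi_name[0])
                           ? info->dlpi_name
                           : "(main executable)";
    printf("%d: %s\n", *count, name);
    (*count)++;
    return 0; /* a non-zero return value would stop the iteration */
}

/* Prints every object mapped into the current process.
   Returns how many objects were reported. */
int list_loaded_objects(void) {
    int count = 0;
    dl_iterate_phdr(print_object, &count);
    return count;
}
```

Call list_loaded_objects() from a test program of your own and compare what it prints with the ldd output for that program.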
nm -D /lib/libc.so.6
The nm command lists symbols found in a binary file. The -D flag tells it to look for dynamic symbols, which makes sense, as libc.so.6 is a dynamic library. The output is very long, but it indeed lists rand() among many other standard functions.
Now what happens when we set up the environment variable LD_PRELOAD? This variable forces the listed libraries to be loaded for a program, ahead of all others. In our case, it loads unrandom.so for random_num, even though the program itself does not ask for it. The following command may be interesting:
$ LD_PRELOAD=$PWD/unrandom.so ldd random_num
linux-vdso.so.1 => (0x00007fff369dc000)
/some/path/to/unrandom.so (0x00007f262b439000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f262b044000)
/lib64/ld-linux-x86-64.so.2 (0x00007f262b63d000)
Note that it lists our custom library. And indeed this is the reason why its code gets executed: random_num calls rand(), but if unrandom.so is loaded, it is our library that provides the implementation of rand(). Neat, isn’t it?
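You can also ask the dynamic linker directly which library wins the lookup. The sketch below is only an illustration and assumes glibc: dlsym() with RTLD_DEFAULT returns the first rand found in the normal search order, and dladdr() (a GNU extension) reports which file that address belongs to. The helper name rand_provider is made up here.

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stddef.h>

/* Returns the path of the object that provides the first "rand" symbol
   in this process's lookup order, or NULL if the lookup fails. */
const char *rand_provider(void) {
    Dl_info info;
    void *sym = dlsym(RTLD_DEFAULT, "rand"); /* first match in search order */
    if (sym == NULL || dladdr(sym, &info) == 0)
        return NULL;
    return info.dli_fname;
}
```

Print the result from a small test program, once normally and once with LD_PRELOAD set, and the path should switch from libc to the preloaded library. Older glibc versions need -ldl on the compiler command line for dlsym and dladdr.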
Being transparent
This is not enough. I’d like to be able to inject some code into an application in a similar manner, but in such a way that the application still functions normally. Clearly, if we implemented open() with a simple “return 0;”, the application we would like to hack would malfunction. The point is to be transparent, and to actually call the original open:
int open(const char *pathname, int flags){
  /* Some evil injected code goes here. */
  return open(pathname,flags); // Here we call the "real" open function, that is provided to us by libc.so
}
Hm. Not really. This won’t call the “original” open(…). Obviously, this is an endless recursive call.
How do we access the “real” open function? We need to use the programming interface to the dynamic linker. It’s simpler than it sounds. Have a look at this complete example, and then I’ll explain what happens there:
#define _GNU_SOURCE
#include <dlfcn.h>

typedef int (*orig_open_f_type)(const char *pathname, int flags);

int open(const char *pathname, int flags, ...)
{
    /* Some evil injected code goes here. */

    orig_open_f_type orig_open;
    orig_open = (orig_open_f_type)dlsym(RTLD_NEXT,"open");
    return orig_open(pathname,flags); // note: the optional third (mode) argument is not forwarded here
}
The dlfcn.h header is needed for the dlsym function we use later. That strange #define directive instructs the compiler to enable some non-standard stuff; we need it to enable RTLD_NEXT in dlfcn.h. That typedef just creates an alias for a complicated pointer-to-function type, with arguments just as the original open – the alias name is orig_open_f_type, which we’ll use later.
The body of our custom open(…) consists of some custom code. The last part of it creates a new function pointer orig_open which will point to the original open(…) function. In order to get the address of that function, we ask dlsym to find the next “open” function in the dynamic linker’s library search order. Finally, we call that function (passing the same arguments as were passed to our fake “open”), and return its return value as ours.
As the “evil injected code” I simply used:
printf("The victim used open(...) to access '%s'!!!\n",pathname); //remember to include stdio.h!
To compile it, I needed to slightly adjust compiler flags:
gcc -shared -fPIC inspect_open.c -o inspect_open.so -ldl
I had to append -ldl, so that this shared library is linked to libdl, which provides the dlsym function. (Nah, I am not going to create a fake version of dlsym, though this might be fun.)
So what do I have as a result? A shared library, which implements the open(…) function so that it behaves exactly as the real open(…)… except it has a side effect of printfing the file path :-)
If you are not convinced this is a powerful trick, it’s time you tried the following:
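The same dlsym(RTLD_NEXT, …) pattern works for any function, including the rand() from the first example. Here is a minimal sketch of a transparent rand(): it forwards to the real one and only reports the value on stderr. Caching the looked-up pointer in a static variable is my own addition for this illustration, so that dlsym runs only once.

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>

/* Transparent replacement for rand(): calls the real one and logs the result. */
int rand(void) {
    /* Looked up on the first call, then reused. */
    static int (*real_rand)(void) = NULL;
    if (real_rand == NULL)
        real_rand = (int (*)(void))dlsym(RTLD_NEXT, "rand");

    int value = real_rand();
    fprintf(stderr, "rand() returned %d\n", value);
    return value;
}
```

Built as a shared library and preloaded just like inspect_open.so, this should leave the numbers printed by random_num as random as before, with every call reported on stderr as well.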
LD_PRELOAD=$PWD/inspect_open.so gnome-calculator
I encourage you to see the result yourself, but basically it lists every file this application accesses. In real time.
I believe it’s not that hard to imagine why this might be useful for debugging or investigating unknown applications. Please note, however, that this particular trick is not quite complete, because open() is not the only function that opens files… For example, there is also open64() in the standard library, and for full investigation you would need to create a fake one too.
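One way to cover both functions without writing the same body twice is a small macro. This is a rough sketch under a few assumptions: a glibc system where open64 is exported as a separate symbol, and wrapper names (DEFINE_OPEN_WRAPPER, open_like_f_type) that I made up for the illustration. It also forwards the optional mode argument, which is only present when O_CREAT is set.

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <fcntl.h>
#include <stdarg.h>
#include <stdio.h>
#include <sys/types.h>

typedef int (*open_like_f_type)(const char *pathname, int flags, ...);

/* Expands to a logging wrapper around the libc function called `name`.
   The mode argument is read only when O_CREAT says it was passed. */
#define DEFINE_OPEN_WRAPPER(name)                                        \
    int name(const char *pathname, int flags, ...) {                     \
        static open_like_f_type real = NULL;                             \
        mode_t mode = 0;                                                 \
        if (flags & O_CREAT) {                                           \
            va_list ap;                                                  \
            va_start(ap, flags);                                         \
            mode = (mode_t)va_arg(ap, int);                              \
            va_end(ap);                                                  \
        }                                                                \
        if (real == NULL)                                                \
            real = (open_like_f_type)dlsym(RTLD_NEXT, #name);            \
        fprintf(stderr, "The victim used " #name                         \
                        "(...) to access '%s'!!!\n", pathname);          \
        return real(pathname, flags, mode);                              \
    }

DEFINE_OPEN_WRAPPER(open)
DEFINE_OPEN_WRAPPER(open64)
```

Even this is not the full picture: other entry points such as openat() exist, and calls that libc makes internally may never pass through these public symbols, so treat the output as a useful hint rather than a complete log.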
Possible uses
If you are still with me and enjoyed the above, let me suggest a bunch of ideas of what can be achieved using this trick. Keep in mind that you can do all the above without the source of the affected app!
These are only the ideas I came up with. I bet you can find some too. If you do, share them by commenting!