Tracing Objective-C messages

In my OS X programming of late, one debugging technique I thought would be handy is the ability log and trace Objective-C messages sent. It turns out this is not only possible, but quite easy. However my journey to this easy answer was quite long and took me deep into the Objective-C rabbit hole.

Apple's Cocoa programing environment is based on Objective-C, a Smalltalk-influenced object oriented extension to C. ObjC is a dynamic language, meaning method calls ("sending messages" in true OO parlance) are done completely at runtime, rather than "hardwired" at compile-time like in C++. This runtime dispatching is handled by a function named objc_msgSend under most circumstances (see the Objective-C Runtime Reference for more information). Thus, the compiler turns the following method call:

    [foo doIt];

Into this C-style function call, behind the scenes:

    objc_msgSend(foo, @selector(doIt));

Since most message sends go through this single entry point, tracing and logging can be implemented right here, in one central location. I figured I could create a custom version of objc_msgSend that logs the class and selector and then jumps to the real objc_msgSend. If implemented as a shared library, you could even replace the system library by using the DYLD_INSERT_LIBRARIES environment variable. The ability to replace or override a function in a dynamic library is known as interposing, and Apple even has some nice documentation on how to do this.

The variable argument signature of objc_msgSend makes interposing less than straight forward, though. As is common with most variable argument functions, a "v" variant that takes a pointer to the arguments is available, called objc_msgSendv. However, after some digging, it turns it using objc_msgSendv is a bit more involved than I wanted to deal with.

Surely someone has tried this before, so I turned to Google. Google, in turn, led me to OpenDarwin, Apple's open source repository. The whole ObjC runtime is open source, under the package name "objc4". A quick glance at the code is quite impressive. Someone has clearly spent a lot of time optimizing objc_msgSend. One bit of code that looked promising was the OBJC_HELP environment variable. When set, this variable shows all the tweaks you can do to the runtime. Here's the output when run on a simple ObjC program:

% OBJC_HELP=1 ./build/Debug/HelloWorld
objc: OBJC_HELP: describe Objective-C runtime environment variables
objc: OBJC_PRINT_OPTIONS: list which options are set
objc: OBJC_PRINT_IMAGES: log image and library names as the runtime loads them
objc: OBJC_PRINT_CONNECTION: log progress of class and category connections
objc: OBJC_PRINT_LOAD_METHODS: log class and category +load methods as they are called
objc: OBJC_PRINT_RTP: log initialization of the Objective-C runtime pages
objc: OBJC_PRINT_GC: log some GC operations
objc: OBJC_PRINT_SHARING: log cross-process memory sharing
objc: OBJC_PRINT_CXX_CTORS: log calls to C++ ctors and dtors for instance variables
objc: OBJC_DEBUG_UNLOAD: warn about poorly-behaving bundles when unloaded
objc: OBJC_DEBUG_FRAGILE_SUPERCLASSES: warn about subclasses that may have been broken by subsequent changes to superclasses
objc: OBJC_USE_INTERNAL_ZONE: allocate runtime data in a dedicated malloc zone
objc: OBJC_ALLOW_INTERPOSING: allow function interposing of objc_msgSend()
objc: OBJC_FORCE_GC: force GC ON, even if the executable wants it off
objc: OBJC_FORCE_NO_GC: force GC OFF, even if the executable wants it on
objc: OBJC_CHECK_FINALIZERS: warn about classes that implement -dealloc but not -finalize
2006-04-22 12:08:17.544 HelloWorld[4831] Hello, World!

Pretty cool! There are a few interesting options here. First is the suite of garbage collection (GC) variables. This is more evidence in support of GC in the next version of OS X, which has been speculated before. There was nothing about logging or tracing, however. The OBJC_ALLOW_INTERPOSING looked particularly interesting, though. This means that not only has someone tried to interpose objc_msgSend, but that Apple actually has support for this. Unfortunately, doing a Google search on this environment variable turned up nothing. But now at least I now knew interposing objc_msgSend could be done.

 

The next idea I had was to resort to assembly code. Surely, if I could just save the registers on the stack, I could log the information, then pop the stack and call the real objc_msgSend. By restoring the registers, I could bypass all the variable argument ugliness and it would all just work. Well a few minutes into my PPC assembly hacking, I started thinking that this really sucked. Although tracing was cool, and it was not really worth the effort it would take to complete. But again, I thought "surely someone else has done this". Now that I had the source code to the ObjC runtime, maybe I could create a custom version with logging or something.

 

One lucky search later (it's all a matter of searching with the right set of words), I stumbled across Technical Note TN2124: Max OS X Debugging Magic. Now I've read this article before, and it does indeed have a lot of good information. But apparently I missed one sentence buried away:

 

If you set the  NSObjCMessageLoggingEnabled environment variable to "YES", the Objective-C runtime will log all dispatched Objective-C messages to a file named  /tmp/msgSends-<pid>.

Holy crap! This is exactly what I was trying to implement! Sure enough, it works:

 

% NSObjCMessageLoggingEnabled=YES ./build/Debug/HelloWorld
2006-04-22 12:30:52.399 HelloWorld[4870] Hello, World!
% cat /tmp/msgSends-4870 
+ NSObject NSObject initialize
+ NSNotificationCenter NSObject initialize
+ NSNotificationCenter NSNotificationCenter defaultCenter
....

I told you it was quite easy, and it doesn't get much easier than this. All that crap about interposing and writing my own objc_msgSend was completely unnecessary. Okay, now I was curious. How was this being implemented? Even though I had my answer, I wanted to dig further. Well, this environment variable isn't part of the ObjC runtime (it's actually part of Foundation), but the runtime does have a function calledinstrumentObjcMessageSends, which enables and disables logging. You can do this in your code to selectively trace only a section of your code:

    instrumentObjcMessageSends(YES);
    // Do stuff...
    instrumentObjcMessageSends(NO);

You can even call this from GDB to enable it while debugging:

(gdb) call (void)instrumentObjcMessageSends(YES)

After more digging in the runtime, I found these functions:

typedef int (*ObjCLogProc)(BOOL, const char *, const char *, SEL);
void logObjcMessageSends(ObjCLogProc logProc);
static int LogObjCMessageSend(BOOL isClassMethod,
                              const char * objectsClass,
                              const char * implementingClass,
                              SEL selector);

Ah... you can customize your own logger! With this, you could do something like filter by class or method. It turns out using logObjcMessageSends is not that easy, though. You'll get an "undefined symbol" link error, even though the function is not defined as static. For some reason, Apple decided not to export this function from the shared object:

% nm /usr/lib/libobjc.dylib | egrep 'instrumentObjcMessageSends|logObjcMessageSends'
909c7cd4 T _instrumentObjcMessageSends
909c7d4c t _logObjcMessageSends

[Update 5-Aug-2006: Greg Miller describes how to override _logObjcMessageSends using nlist(3) inthis blog entry. Apparently nlist(3) can lookup private symbols. Thanks for the information, Greg!]

The lower-case "t" means this is a local symbol, not a global symbol. This is confirmed by looking at theobjc-exports-ppc file in the objc4 distribution. Only Apple knows why this isn't exported, but since we have the source, we can modify the export list to include this and rebuild. I had to use gcc 3.3 to compile objc4, as gcc 4.0 choked on some of the flags. One way to do this is to use gcc_select. I just edited theCC variable in the Makefile, so I can still use gcc 4.0 for everything else. However, it still won't link, as it's not easy to get the compiler to link against the hacked libobjc. Using -L didn't work for me, even thoughld(1) says "directories specified with -L are searched before the standard directories".

This requires resorting to some shared library trickery. We can use dlsym to lookup the address of this symbol at runtime, and assign it to a function pointer:

#include <dlfcn.h>

typedef int (*ObjCLogProc)(BOOL, const char *, const char *, SEL);
typedef void (*logObjcMessageSends_t)(ObjCLogProc logProc);
logObjcMessageSends_t logObjcMessageSends = 0;

int main (int argc, const char * argv[]) {
    logObjcMessageSends = dlsym(RTLD_DEFAULT, "logObjcMessageSends");

Now, to make sure that our app runs even against the standard libobjc, let's only use the pointer if it's not null:

    if (logObjcMessageSends != 0)
        logObjcMessageSends(MyLogObjCMessageSend);

You could even leave this in production code, if you wanted. Because of this trickery, you'll need to turn off ZeroLink. Also, to actually use our hacked libobj, you'll need to force dyld to use our library, instead of the system one:

% DYLD_INSERT_LIBRARIES=~/src/objc4-267.1/libobjc.A.dylib ./build/Debug/HelloWorld

Now, MyLogObjCMessageSend gets called and we can log where ever and whatever we like. A cool implementation would use some sort of config file with regular expressions. Logging everything withNSObjCMessageLoggingEnabled is good enough for me, though. I'll just stop right here, but it's good to know I could customize the logging if I needed to, as some future point.

I've since found this blog post mentioning instrumentObjcMessageSends. But it turns out usingNSObjCMessageLoggingEnabled is much simpler than the solution of using an input manager, too.

你可能感兴趣的:(Objective-C)