A little-known option in recent Java VMs holds a fair amount of mystery and quite a lot of power. That option is 'javaagent', and it holds the key to monitoring and profiling JVMs and to dynamically modifying Java classes as they're being loaded.
Tapping into this power is described in the java.lang.instrument documentation [1]. In a nutshell, all you need to become a startup Java agent is a Java class with a single method, packaged in a jar file with a custom manifest. The agent class must implement one of two methods: public static void premain(String agentArgs) or public static void premain(String agentArgs, java.lang.instrument.Instrumentation inst). Both methods are passed any options that appear on the -javaagent option, and the second form also receives a java.lang.instrument.Instrumentation object that allows you to interact with the JVM. Of particular interest is the Instrumentation method addTransformer(), which registers a callback that is invoked for each class as it is loaded, before the JVM defines it. Using this callback together with bytecode engineering libraries such as ASM [2] allows you to modify class structure (add/remove methods) or modify the actual Java bytecode of any method in any class loaded after the callback is registered.
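A minimal sketch of such an agent is shown below. It uses only java.lang.instrument, with no bytecode library. The class name ClassLoggingAgent and its SEEN queue are hypothetical. The transformer records the internal name of each class it is shown and returns null, which tells the JVM to keep the original bytes. To load it with -javaagent, it still needs the jar and manifest described under Packaging and Running the Code.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical example agent: records the name of every class it is shown
// and never changes any bytecode.
public class ClassLoggingAgent implements ClassFileTransformer {

    // Internal-form names such as "java/util/ArrayList". The JVM may call
    // transform() from several threads, hence a concurrent collection.
    static final Queue<String> SEEN = new ConcurrentLinkedQueue<>();

    // Entry point named by the Premain-Class attribute of the agent jar's manifest.
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new ClassLoggingAgent());
    }

    @Override
    public byte[] transform(ClassLoader loader, String className,
                            Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain,
                            byte[] classfileBuffer) {
        if (className != null) {   // defensive: the name can be null for some JVM-defined classes
            SEEN.add(className);
        }
        return null;               // null means "no transformation"; the original bytes are used
    }
}
```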
What Is Bytecode, Anyway?
The ASM project has an excellent guide to bytecode engineering [3], and there is an Eclipse plugin called 'Bytecode Outline' [4] that displays the bytecode for methods in the IDE. The Bytecode Outline plugin was extremely valuable when researching bytecode manipulation.
The Java virtual machine is in many ways analogous to a microprocessor. It has opcodes, operands, and a stack, and many of its opcodes have microprocessor counterparts: loads and stores, arithmetic functions, conditional and unconditional branches, stack operations, etc. It differs from typical microprocessors, though, in two ways. It has opcodes that invoke static, virtual, and 'special' methods on objects and that create new objects and arrays. It is also type-aware: the JVM knows the type of every primitive value and object reference on the stack or in the heap, and it can perform casts between compatible types.

For instance, the ICONST_0 instruction pushes the int value 0 onto the top of the stack, and the POP instruction removes the single-width item on the top of the stack and discards it. What is a single-width item, you ask? In a fit of asymmetry it was decided that long and double values would occupy two adjacent slots on the stack and that they must be manipulated as a unit. If a method tries to POP half of a long or double, the bytecode verifier rejects the whole class with a java.lang.VerifyError before any of its code runs. To pop a long or double you must use the POP2 instruction, and pushing a long (LCONST_0) or double (DCONST_0) value takes up two slots on the stack. Conceptually, then, a slot is 32 bits wide. Long and double are 64-bit types and need two slots. Single-width items fit in one slot: object references, int, float, and the smaller integer types, which are widened to int on the stack. How much memory a slot actually uses is up to the JVM implementation, particularly for references on a 64-bit JVM.
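The slot rules can be made concrete with a toy Java model of an operand stack counted in slots. It is an illustration only, not how a JVM is implemented. The class SlotStack and its methods are hypothetical, and the IllegalStateException it throws stands in for the verifier's VerifyError.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model (not a JVM implementation) of an operand stack measured in slots.
// int, float and references take one slot; long and double take two.
public class SlotStack {

    private enum Slot { SINGLE, FIRST_HALF, SECOND_HALF }

    private final Deque<Slot> slots = new ArrayDeque<>();

    // ICONST_0: pushes one single-width value.
    public void iconst0() { slots.push(Slot.SINGLE); }

    // LCONST_0: pushes a long, which occupies two adjacent slots.
    public void lconst0() {
        slots.push(Slot.FIRST_HALF);
        slots.push(Slot.SECOND_HALF);
    }

    // POP: legal only if the top slot holds a whole single-width value.
    public void pop() {
        if (slots.isEmpty()) throw new IllegalStateException("stack underflow");
        if (slots.peek() != Slot.SINGLE) throw new IllegalStateException("POP would split a long/double");
        slots.pop();
    }

    // POP2: removes either one long/double or two single-width values.
    public void pop2() {
        if (slots.size() < 2) throw new IllegalStateException("stack underflow");
        Slot top = slots.pop();
        Slot next = slots.pop();
        boolean oneWide = top == Slot.SECOND_HALF && next == Slot.FIRST_HALF;
        boolean twoSingles = top == Slot.SINGLE && next == Slot.SINGLE;
        if (!oneWide && !twoSingles) {
            slots.push(next);  // restore the stack before reporting the error
            slots.push(top);
            throw new IllegalStateException("POP2 would split a long/double");
        }
    }

    public int depth() { return slots.size(); }
}
```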
Almost every instruction interacts with the stack, whether to retrieve (pop) values from the top of the stack, or to store (push) values on to the top of the stack. If you're not familiar with stack-oriented processing, the ASM bytecode engineering guide [3] has an excellent primer. For example, to call System.currentTimeMillis(), you could use the following bytecode instruction:
```
INVOKESTATIC "java/lang/System" "currentTimeMillis" "()J"
```
The bytecode is 'INVOKESTATIC'. Its operands are the class to act on, the name of the method to call on that class, and the method's signature. The signature looks strange but is easy to decipher once you know how to read it. The signature "()J" tells the JVM to call the method named 'currentTimeMillis' that takes no parameters (the '()' part) and returns a long (the 'J' part). Why 'J' instead of 'L', you might ask? Apparently 'L' was already taken to indicate an object reference (as in "Ljava/lang/String;"), so longs were relegated to 'J'.

After this call, the value returned by System.currentTimeMillis() is on the top of the stack. In general, INVOKESTATIC is described concisely as: Stack Before: "...,v1,...,vn", Stack After: "...,c.m(v1,...,vn)". The '...' represents what's already on the stack, c is the class acted upon, m is the method called, and v1...vn are the arguments (if any), which are popped from the stack; a void method pushes nothing. currentTimeMillis() takes no arguments, so here the stack goes from "..." to "...,c.m()". As expected, the result of the call (a long) now occupies the top two slots of the stack.
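Descriptors are regular enough to decode with a few lines of Java. The hypothetical Descriptors class below is a sketch that assumes well-formed input; it is not a validator. It splits a method descriptor into its parameter and return types and counts the stack slots each one takes. In practice, ASM's org.objectweb.asm.Type class provides the same information through getArgumentTypes(), getReturnType(), and getSize().

```java
import java.util.ArrayList;
import java.util.List;

// Minimal decoder for JVM method descriptors such as "()J" or "(Ljava/lang/String;)V".
// Assumes well-formed input; it is a sketch, not a validator.
public final class Descriptors {

    private Descriptors() {}

    // Returns the parameter types in order, followed by the return type as the last element.
    public static List<String> parse(String desc) {
        if (!desc.startsWith("(")) throw new IllegalArgumentException("not a method descriptor: " + desc);
        List<String> parts = new ArrayList<>();
        int i = 1;
        while (desc.charAt(i) != ')') {
            int end = endOfType(desc, i);
            parts.add(desc.substring(i, end));
            i = end;
        }
        parts.add(desc.substring(i + 1)); // return type: V, a primitive, an object or an array
        return parts;
    }

    // Index just past the field type that starts at 'start'.
    private static int endOfType(String desc, int start) {
        int i = start;
        while (desc.charAt(i) == '[') i++;            // skip array dimensions
        if (desc.charAt(i) == 'L') {                  // object type: L<internal name>;
            int semi = desc.indexOf(';', i);
            if (semi < 0) throw new IllegalArgumentException("unterminated class name: " + desc);
            return semi + 1;
        }
        return i + 1;                                 // one-letter primitive: B C D F I J S Z
    }

    // Stack slots used by a value of this type: long (J) and double (D) need two.
    public static int slots(String type) {
        switch (type) {
            case "V": return 0;
            case "J":
            case "D": return 2;
            default:  return 1; // other primitives, object references and arrays
        }
    }

    // Total slots an invocation pops for its arguments (not counting any receiver object).
    public static int argumentSlots(String desc) {
        List<String> parts = parse(desc);
        int total = 0;
        for (String p : parts.subList(0, parts.size() - 1)) total += slots(p);
        return total;
    }
}
```

For "()J" this yields no parameters and a two-slot return value. For println's "(J)V" it yields a two-slot argument and a void return. An INVOKEVIRTUAL of that method therefore pops three slots in all: the long plus the PrintStream reference.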
To print the value of System.currentTimeMillis() to System.out, you could use the following bytecode sequence:
```
GETSTATIC "java/lang/System" "out" "Ljava/io/PrintStream;"
INVOKESTATIC "java/lang/System" "currentTimeMillis" "()J"
INVOKEVIRTUAL "java/io/PrintStream" "println" "(J)V"
```
Execution proceeds as follows:
- The GETSTATIC instruction reads the static field 'out' (of type 'Ljava/io/PrintStream;') from the class 'java/lang/System' and pushes the object reference onto the stack.
- The INVOKESTATIC instruction calls the static method currentTimeMillis() on the class 'java/lang/System', which returns a long that is pushed onto the stack.
- The INVOKEVIRTUAL instruction calls the instance method 'void println(long)' on the class 'java/io/PrintStream', popping the long argument and then the object reference to act on. Because println returns void, nothing is pushed.
While these instructions execute, the stack looks like this, where 'o' is an object reference and 'J' is a long (which occupies two slots):

| Instruction | Stack Before | Stack After |
| --- | --- | --- |
| GETSTATIC | ... | ...,o |
| INVOKESTATIC | ...,o | ...,o,J |
| INVOKEVIRTUAL | ...,o,J | ... |
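javac compiles System.out.println(System.currentTimeMillis()); to essentially this three-instruction sequence. The hypothetical StackDepth sketch below replays the sequence by tracking only how many slots each instruction pops and pushes, with the counts taken from the descriptors. It reproduces the depths in the table above, measured from an otherwise empty stack. It also shows where a method's maximum stack size comes from.

```java
// Toy tracer: each instruction is described only by the stack slots it pops and pushes.
public final class StackDepth {

    // Hypothetical helper type, not part of any library.
    public static final class Insn {
        final String text;
        final int pops;
        final int pushes;

        public Insn(String text, int pops, int pushes) {
            this.text = text;
            this.pops = pops;
            this.pushes = pushes;
        }
    }

    // System.out.println(System.currentTimeMillis()), slot counts taken from the descriptors.
    public static final Insn[] PRINT_MILLIS = {
        new Insn("GETSTATIC java/lang/System out Ljava/io/PrintStream;", 0, 1), // +1 reference
        new Insn("INVOKESTATIC java/lang/System currentTimeMillis ()J", 0, 2),  // +2 for the long
        new Insn("INVOKEVIRTUAL java/io/PrintStream println (J)V", 3, 0),       // -2 long, -1 receiver
    };

    // Depth in slots after each instruction, starting from an empty stack.
    public static int[] depths(Insn... code) {
        int[] result = new int[code.length];
        int depth = 0;
        for (int i = 0; i < code.length; i++) {
            depth -= code[i].pops;
            if (depth < 0) throw new IllegalStateException("stack underflow at " + code[i].text);
            depth += code[i].pushes;
            result[i] = depth;
        }
        return result;
    }

    // Largest depth reached: the value a method's max stack setting must cover.
    public static int maxDepth(Insn... code) {
        int max = 0;
        for (int d : depths(code)) max = Math.max(max, d);
        return max;
    }

    public static void main(String[] args) {
        int[] d = depths(PRINT_MILLIS);
        for (int i = 0; i < PRINT_MILLIS.length; i++) {
            System.out.println(PRINT_MILLIS[i].text + " -> depth " + d[i]);
        }
        System.out.println("max stack: " + maxDepth(PRINT_MILLIS));
    }
}
```

The depths come out as 1, 3, and 0, so the sequence needs a maximum stack of 3 slots. The agent later in this article inserts a different three-instruction sequence: GETSTATIC pushes 1 slot, LDC pushes 1, and println(Ljava/lang/String;)V pops 2. Fed those instructions, the same tracer reports a maximum of 2.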
Changing Bytecode on the Fly
The java.lang.instrument package provides the hooks to view and modify class definitions. It does not provide facilities to add or remove fields or methods, or to actually produce a byte array of valid bytecode. For those functions one must use a bytecode manipulation library such as ASM.
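The classfileBuffer a transformer receives is just the raw class file. Its first eight bytes form a fixed header: the magic number 0xCAFEBABE followed by the minor and major version numbers. The sketch below decodes only that header. Everything after it (the constant pool, fields, and methods) is what a library such as ASM is for. ClassFileHeader is a hypothetical name. Reading a JDK class's bytes through getResourceAsStream is simply a convenient way to get a real class file to inspect.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

// Decodes the fixed 8-byte header of a class file: u4 magic, u2 minor_version, u2 major_version.
public final class ClassFileHeader {

    public final int minor;
    public final int major;

    private ClassFileHeader(int minor, int major) {
        this.minor = minor;
        this.major = major;
    }

    public static ClassFileHeader parse(byte[] classfile) {
        if (classfile.length < 8) throw new IllegalArgumentException("too short for a class file");
        ByteBuffer buf = ByteBuffer.wrap(classfile);  // big-endian, as the class file format requires
        int magic = buf.getInt();
        if (magic != 0xCAFEBABE) {
            throw new IllegalArgumentException("bad magic number: " + Integer.toHexString(magic));
        }
        int minor = buf.getShort() & 0xFFFF;          // u2 values are unsigned
        int major = buf.getShort() & 0xFFFF;
        return new ClassFileHeader(minor, major);
    }

    // Raw class file bytes for a loaded class: the same kind of data transform() receives.
    public static byte[] bytesOf(Class<?> c) {
        String resource = "/" + c.getName().replace('.', '/') + ".class";
        try (InputStream in = c.getResourceAsStream(resource)) {
            if (in == null) throw new IllegalStateException("no class file found for " + c.getName());
            return in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ClassFileHeader h = parse(bytesOf(Object.class));
        System.out.println("java/lang/Object: major " + h.major + ", minor " + h.minor);
    }
}
```

Major version 52 corresponds to Java 8 class files, and each later Java release increases it by one.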
For example, the following class is painfully simple: its go() method prints "Hello World!" to the console.
```java
package com.davidtiller.test;

public class AddPrintlnTest {

    public static void main(String[] args) {
        AddPrintlnTest me = new AddPrintlnTest();
        me.go();
    }

    public void go() {
        System.out.println("Hello World!");
    }
}
```
If we run that application with the following agent on the command line, the go() method prints "I live!" before it prints "Hello World!".
```java
package com.davidtiller.test;

import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.IllegalClassFormatException;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.List;

import org.objectweb.asm.ClassReader;
import org.objectweb.asm.ClassVisitor;
import org.objectweb.asm.ClassWriter;
import org.objectweb.asm.Opcodes;
import org.objectweb.asm.tree.ClassNode;
import org.objectweb.asm.tree.FieldInsnNode;
import org.objectweb.asm.tree.InsnList;
import org.objectweb.asm.tree.LdcInsnNode;
import org.objectweb.asm.tree.MethodInsnNode;
import org.objectweb.asm.tree.MethodNode;

public class AddPrintlnAgent implements ClassFileTransformer {

    // Called by the JVM before main() when started with -javaagent.
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new AddPrintlnAgent());
    }

    // Called for each class loaded after premain(); returning null leaves the class unchanged.
    public byte[] transform(ClassLoader loader, String className,
            Class<?> classBeingRedefined, ProtectionDomain protectionDomain,
            byte[] classfileBuffer) throws IllegalClassFormatException {
        byte[] retVal = null;
        // className is in internal form (slashes) and may be null, so compare from the constant side.
        if ("com/davidtiller/test/AddPrintlnTest".equals(className)) {
            ClassWriter cw = new ClassWriter(0); // 0: ASM will not recompute max stack/locals
            ClassVisitor ca = new MyClassAdapter(cw);
            ClassReader cr = new ClassReader(classfileBuffer);
            cr.accept(ca, 0); // parse the class and feed it through the adapter into the writer
            retVal = cw.toByteArray();
        }
        return retVal;
    }

    // Tree API adapter (ASM guide, section 5.2.2). Written for ASM 3.x; with ASM 5+,
    // subclasses of ClassNode must call super(Opcodes.ASM5) or later.
    public static class MyClassAdapter extends ClassNode implements Opcodes {
        private ClassVisitor cv;

        public MyClassAdapter(ClassVisitor cv) {
            this.cv = cv;
        }

        @Override
        @SuppressWarnings("unchecked")
        public void visitEnd() {
            for (MethodNode mn : (List<MethodNode>) methods) {
                if (mn.name.equals("go")) {
                    // Equivalent to: System.out.println("I live!");
                    InsnList il = new InsnList();
                    il.add(new FieldInsnNode(GETSTATIC, "java/lang/System", "out", "Ljava/io/PrintStream;"));
                    il.add(new LdcInsnNode("I live!"));
                    il.add(new MethodInsnNode(INVOKEVIRTUAL, "java/io/PrintStream",
                            "println", "(Ljava/lang/String;)V"));
                    mn.instructions.insert(il); // prepend to the method's existing code
                    mn.maxStack += 2;           // the new code pushes two single-width values
                }
            }
            accept(cv); // replay the modified class into the ClassWriter
        }
    }
}
```
As you can see, the AddPrintlnAgent class implements ClassFileTransformer and has the requisite premain agent method signature. In premain() we register ourselves as a transformer and return. We could have saved a reference to the Instrumentation object if we needed to call other methods on it elsewhere in the code.

As a ClassFileTransformer, we must implement the method 'transform', which supplies lots of information about each class being loaded. Our transform method is called for every class the JVM loads after the transformer is registered. In this example we only want to modify the class 'com/davidtiller/test/AddPrintlnTest', so we check for that name explicitly. Class names arrive in internal form, with slashes instead of dots. For every other class we return null, which means "unchanged". If it is the class in question, we use ASM to create a ClassWriter, a custom class adapter, and a ClassReader to handle the actual class modification. These classes follow the visitor pattern [5] and have callbacks for each part of a class.

We chose the easy way out and used the Tree API and the pattern described in section 5.2.2 of the ASM guide. It relies on the visitEnd() callback, which is called after all of the other callbacks have finished. In visitEnd() we iterate over the existing methods and process the ones named 'go' (with any method signature). We first create a new InsnList (a special linked list of AbstractInsnNodes). We then add instructions that get a reference to System.out as before, push the String constant "I live!" onto the stack, and call void println(String) on the System.out reference pushed earlier. This leaves the stack in the same state as before, so no cleanup is necessary. Next we insert these instructions at the beginning of the method's existing instruction list.

We must also update the method's maximum stack size (maxStack). The inserted code pushes two single-width values onto the stack, and the ClassWriter was created with flags of 0, so it will not recompute the maximum for us. We therefore record that the maximum stack size can be 2 larger than before; the verifier checks this value when the class is loaded. Finally, we call accept() to replay the modified class into the ClassWriter, whose toByteArray() result is what transform() returns.
As expected, the output of the program is:
```
I live!
Hello World!
```
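As mentioned above, premain() could save the Instrumentation reference for use elsewhere, and that only takes a static field. The sketch below is a hypothetical AgentHolder class that is not part of the example code. It stores the reference in premain() and uses it for Instrumentation.getObjectSize(), which returns an implementation-specific estimate of an object's size. Without -javaagent, the field simply stays null.

```java
import java.lang.instrument.Instrumentation;

// Hypothetical holder that keeps the Instrumentation handed to premain()
// so that other code can use it later.
public final class AgentHolder {

    private static volatile Instrumentation instrumentation;

    private AgentHolder() {}

    public static void premain(String agentArgs, Instrumentation inst) {
        instrumentation = inst;
    }

    // True only when the JVM was started with -javaagent pointing at a jar for this class.
    public static boolean isLoaded() {
        return instrumentation != null;
    }

    // Example use of the saved reference: an implementation-specific size estimate.
    public static long sizeOf(Object o) {
        Instrumentation inst = instrumentation;
        if (inst == null) {
            throw new IllegalStateException("agent not loaded; start the JVM with -javaagent");
        }
        return inst.getObjectSize(o);
    }
}
```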
Packaging and Running the Code
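Actually getting the agent to run is relatively simple. The agent class must reside in a jar that has a special manifest file. For the example above the manifest file was:

```
Manifest-Version: 1.0
Premain-Class: com.davidtiller.test.AddPrintlnAgent
```

Make sure the last line of the manifest ends with a newline; the jar tool does not parse a final line that lacks one.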
The Java command line to run the test program was entirely normal except for this parameter: -javaagent:addprintlnagent.jar. This tells the JVM to load the class named on the manifest's Premain-Class line as a load-time agent and to call its premain() method before main(). Note that for this example the asm-3.3.1 and asm-tree-3.3.1 jars must be on the classpath. There are other ways to make libraries available to an agent. The manifest can name bootstrap class path entries with the Boot-Class-Path attribute, and the Instrumentation object provides appendToBootstrapClassLoaderSearch() and appendToSystemClassLoaderSearch() for adding jars at runtime. Since I used Eclipse, though, it was a simple matter of exporting the ASM and other libraries from the Agents project and making the AgentTests project depend on them.
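If you'd rather not maintain the manifest by hand, java.util.jar can produce it. The hypothetical AgentJar helper below builds the same two main attributes in memory, writes a jar that contains only that manifest, and reads Premain-Class back out. For brevity it leaves out adding the agent's .class files, which would use putNextEntry() and write().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.jar.Attributes;
import java.util.jar.JarInputStream;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

// Hypothetical helper that builds an agent manifest and jar in memory.
public final class AgentJar {

    private AgentJar() {}

    public static Manifest agentManifest(String premainClass) {
        Manifest manifest = new Manifest();
        Attributes main = manifest.getMainAttributes();
        // Manifest-Version must be set, or Manifest.write() skips the other main attributes.
        main.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        main.put(new Attributes.Name("Premain-Class"), premainClass);
        return manifest;
    }

    // A jar holding only the manifest; real code would add the agent's .class entries.
    public static byte[] jarWith(Manifest manifest) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (JarOutputStream jar = new JarOutputStream(bytes, manifest)) {
            // jar.putNextEntry(...) / jar.write(...) for each class file would go here
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads the Premain-Class attribute back out of jar bytes (null if absent).
    public static String premainClassOf(byte[] jarBytes) {
        try (JarInputStream in = new JarInputStream(new ByteArrayInputStream(jarBytes))) {
            Manifest m = in.getManifest();
            return m == null ? null : m.getMainAttributes().getValue("Premain-Class");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        agentManifest("com.davidtiller.test.AddPrintlnAgent").write(System.out);
    }
}
```

A jar built this way, once it also contains the agent classes, is used exactly as described above. For example: java -javaagent:addprintlnagent.jar -cp CLASSPATH com.davidtiller.test.AddPrintlnTest, where CLASSPATH is a placeholder that must include the test classes and the asm-3.3.1 and asm-tree-3.3.1 jars.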
Next Steps
In subsequent installments of this series [Part 2, Part 3, and Part 4], additional examples of how to use Java agents will be explored.
Examples
All of the source code for the first 3 parts of this series is available in the attached zip file. Contained within it are all of the classes and libraries needed to run the examples. I have included the appropriate ASM, log4j, and commons-cli runtime jars and their associated licenses.
Footnotes
[1] Oracle Package java.lang.instrument Documentation
[2] ASM project website
[3] ASM Bytecode Engineering Guide
[4] Bytecode Outline Eclipse plugin
[5] The visitor pattern