Introduction to smali syntax
Dalvik's bytecode has two major classes of types: primitive types and reference types. Reference types are objects and arrays; everything else is a primitive.
Primitives are represented by a single letter. I didn't come up with these abbreviations - they are what is actually stored in the dex file, in string form. They are specified in the dex-format.html document (dalvik/docs/dex-format.html in the AOSP repository)
V | void - can only be used for return types |
Z | boolean |
B | byte |
S | short |
C | char |
I | int |
J | long (64 bits) |
F | float |
D | double (64 bits) |
Objects take the form Lpackage/name/ObjectName; - where the leading L indicates that it is an object type, package/name/ is the package that the object is in, ObjectName is the name of the object, and ; denotes the end of the object name. This would be equivalent to package.name.ObjectName in java. Or for a more concrete example, Ljava/lang/String; is equivalent to java.lang.String.
Arrays take the form [I - this would be an array of ints with a single dimension. i.e. int[] in java. For arrays with multiple dimensions, you simply add more [ characters. [[I = int[][], [[[I = int[][][], etc. (Note: The maximum number of dimensions you can have is 255).
You can also have arrays of objects, [Ljava/lang/String; would be an array of Strings.
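To make the type notation concrete, here is a minimal Python sketch that converts a single type descriptor into its java-style name. The function name descriptor_to_java and the PRIMITIVES table are made up for this illustration; they are not part of smali or baksmali.

```python
# Illustrative helper only; the names here are hypothetical, not a smali/baksmali API.
PRIMITIVES = {
    "V": "void", "Z": "boolean", "B": "byte", "S": "short", "C": "char",
    "I": "int", "J": "long", "F": "float", "D": "double",
}

def descriptor_to_java(desc):
    """Convert one dex type descriptor (e.g. '[[I') to java notation ('int[][]')."""
    dims = len(desc) - len(desc.lstrip("["))  # number of leading '[' characters
    if dims > 255:
        raise ValueError("arrays can have at most 255 dimensions")
    base = desc[dims:]
    if base in PRIMITIVES:
        name = PRIMITIVES[base]
    elif len(base) > 2 and base.startswith("L") and base.endswith(";"):
        # Strip the leading 'L' and trailing ';', then use '.' as the separator.
        name = base[1:-1].replace("/", ".")
    else:
        raise ValueError("not a valid type descriptor: %r" % desc)
    if dims and name == "void":
        raise ValueError("void can only be used for return types")
    return name + "[]" * dims
```

For example, descriptor_to_java("[Ljava/lang/String;") gives java.lang.String[].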
Methods are always specified in a very verbose form that includes the type that contains the method, the method name, the types of the parameters and the return type. All this information is required for the virtual machine to be able to find the correct method, and to be able to perform static analysis on the bytecode (for verification/optimization purposes)
They take the form
Lpackage/name/ObjectName;->MethodName(III)Z
In this example, you should recognize Lpackage/name/ObjectName; as a type. MethodName is obviously the name of the method. (III)Z is the method's signature. III are the parameters (in this case, 3 ints), and Z is the return type (boolean).
The method parameters are listed one right after another, with no separators between them.
Here's a more complex example:
method(I[[IILjava/lang/String;[Ljava/lang/Object;)Ljava/lang/String;
In java, this would be
String method(int, int[][], int, String, Object[])
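Since the parameters are packed together with no separators, splitting them takes a small scan. The following Python sketch shows one way to do it. The function name parse_method_reference is hypothetical, and it assumes well-formed input (malformed input raises ValueError or IndexError).

```python
def parse_method_reference(ref):
    """Split a method reference into (owner, name, parameter list, return type).

    The owner is '' when the reference has no 'Lsome/Type;->' prefix.
    """
    owner, _, rest = ref.rpartition("->")
    open_paren = rest.index("(")
    close_paren = rest.index(")")
    name = rest[:open_paren]
    body = rest[open_paren + 1:close_paren]

    params = []
    i = 0
    while i < len(body):
        start = i
        while body[i] == "[":          # skip array dimensions
            i += 1
        if body[i] == "L":             # object type: runs up to the ';'
            i = body.index(";", i)
        i += 1                         # consume the primitive letter or the ';'
        params.append(body[start:i])
    return owner, name, params, rest[close_paren + 1:]
```

Applied to the complex example above, this yields the parameters I, [[I, I, Ljava/lang/String; and [Ljava/lang/Object;, with Ljava/lang/String; as the return type.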
Fields are likewise always specified in verbose form that includes the type that contains the field, the name of the field, and the type of the field. Again, this is to allow the virtual machine to be able to find the correct field, as well as to perform static analysis on the bytecode.
They take the form
Lpackage/name/ObjectName;->FieldName:Ljava/lang/String;
This should be pretty self-explanatory - it is the type that contains the field, the field name and the type of the field respectively.
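A field reference splits cleanly on -> and :, as this small Python sketch shows. The name parse_field_reference is made up for the illustration.

```python
def parse_field_reference(ref):
    """Split 'Lsome/Type;->fieldName:Lfield/Type;' into its three parts."""
    owner, arrow, rest = ref.partition("->")
    name, colon, field_type = rest.partition(":")
    if not (owner and arrow and name and colon and field_type):
        raise ValueError("not a valid field reference: %r" % ref)
    return owner, name, field_type
```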
In dalvik's bytecode, registers are always 32 bits, and can hold any type of value. 2 registers are used to hold 64 bit types (long and double).
There are two ways to specify how many registers are available in a method. The .registers directive specifies the total number of registers in the method, while the alternate .locals directive specifies the number of non-parameter registers in the method. The total number of registers would also include however many registers are needed to hold the method parameters.
When a method is invoked, the parameters to the method are placed into the last n registers. If a method has 2 arguments, and 5 registers (v0-v4), the arguments would be placed into the last 2 registers - v3 and v4.
The first parameter to a non-static method is always the object that the method is being invoked on.
For example, let's say you are writing a non-static method LMyObject;->callMe(II)V. This method has 2 integer parameters, but it also has an implicit LMyObject; parameter before both integer parameters, so there are a total of 3 arguments to the method.
Let's say you specify that there are 5 registers in the method (v0-v4), with either the .registers 5 directive or the .locals 2 directive (i.e. 2 local registers + 3 parameter registers). When the method is invoked, the object that the method is being invoked on (i.e. the this reference) will be in v2, the first integer parameter will be in v3, and the second integer parameter will be in v4.
For static methods it's the same thing, except there isn't an implicit this argument.
There are two naming schemes for registers - the normal v naming scheme and the p naming scheme for parameter registers. The first register in the p naming scheme is the first parameter register in the method. So let's go back to the previous example of a method with 3 arguments and 5 total registers. The following table shows the normal v name for each register, followed by the p name for the parameter registers
v0 | | the first local register |
v1 | | the second local register |
v2 | p0 | the first parameter register |
v3 | p1 | the second parameter register |
v4 | p2 | the third parameter register |
You can reference parameter registers by either name - it makes no difference.
The p naming scheme was introduced as a practical matter, to solve a common annoyance when editing smali code.
Say you have an existing method with a number of parameters and you are adding some code to the method, and you discover that you need an extra register. You think "No big deal, I'll just increase the number of registers specified in the .registers directive!".
Unfortunately, it isn't quite that easy. Keep in mind that the method parameters are stored in the last registers in the method. If you increase the number of registers - you change which registers the method arguments get put into. So you would have to change the .registers directive and renumber every parameter register.
But if the p naming scheme was used to reference parameter registers throughout the method, you can easily change the number of registers in the method, without having to worry about renumbering any existing registers.
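The renumbering problem is easy to see with a little arithmetic. This Python sketch (the function name parameter_register_map is hypothetical) computes which v register each p register refers to, given the total register count and the number of parameter registers.

```python
def parameter_register_map(total_registers, parameter_registers):
    """Map each p name to its v name; parameters occupy the last registers."""
    if parameter_registers > total_registers:
        raise ValueError("more parameter registers than total registers")
    first = total_registers - parameter_registers
    return {"p%d" % i: "v%d" % (first + i) for i in range(parameter_registers)}

# The 5-register method from the example: p0..p2 are v2..v4.
print(parameter_register_map(5, 3))  # {'p0': 'v2', 'p1': 'v3', 'p2': 'v4'}
# After bumping .registers to 6, the same p names now refer to v3..v5.
print(parameter_register_map(6, 3))  # {'p0': 'v3', 'p1': 'v4', 'p2': 'v5'}
```

The p names stay the same in both cases, while every v name for a parameter shifts by one.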
Note: by default baksmali will use the p naming scheme for parameter registers. If you want to disable this for some reason and force baksmali to always use the v naming scheme, you can use the -p/--no-parameter-registers option.
As mentioned previously, long and double primitives (J and D respectively) are 64 bit values, and require 2 registers. This is important to keep in mind when you are referencing method arguments. For example, let's say you have a (non-static) method LMyObject;->MyMethod(IJZ)V. The parameters to the method are LMyObject;, int, long, boolean. So this method would require 5 registers for all of its parameters.
p0 | this |
p1 | I |
p2, p3 | J |
p4 | Z |
Also, when you are invoking the method later on, you do have to specify both registers for any double-wide arguments in the register list for the invoke- instruction.
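The register layout for a parameter list can be worked out mechanically. Here is a Python sketch that does so. The function name assign_parameter_registers is made up, and it takes the parameter descriptors as an already-split list.

```python
WIDE_TYPES = {"J", "D"}  # long and double take two registers each

def assign_parameter_registers(param_types, is_static):
    """Return ([(register names, type), ...], total parameter registers)."""
    types = list(param_types)
    if not is_static:
        types.insert(0, "this")  # implicit first argument of non-static methods
    layout = []
    index = 0
    for t in types:
        # Arrays such as '[J' are references, so only bare J and D are wide.
        width = 2 if t in WIDE_TYPES else 1
        layout.append((["p%d" % (index + k) for k in range(width)], t))
        index += width
    return layout, index
```

For LMyObject;->MyMethod(IJZ)V, assign_parameter_registers(["I", "J", "Z"], False) reproduces the table above and reports 5 parameter registers.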
baksmali now has the ability to disassemble odex files, and optionally to deodex them.
NOTE: This page is only applicable for baksmali v0.96-v1.1. Newer versions of baksmali starting at v1.2 do not require (and cannot be used with) deodexerant. Instructions for the latest versions can be found at http://code.google.com/p/smali/wiki/DeodexInstructions
It requires the help of deodexerant, which is a small binary that runs on the phone and links to the dalvik libraries. Its purpose is to provide info to baksmali that can only be obtained from a running instance of dalvik (things like vtable indexes, field byte offsets, etc.).
The syntax for deodexerant is:
deodexerant <odex_file> <port>
And then baksmali has a new -x option to tell it to deodex the file. The syntax is
baksmali -x <host>:<port> <odex_file>
You can also use -x :<port> as a shortcut for -x localhost:<port>
The main thing you have to keep in mind when deodexing something is that you must use the same "bootclasspath" jars that were used when the odex was created. The easiest way to satisfy this is to actually be running the firmware that you are trying to deodex. It's also possible to setup a chroot environment if you need to deodex something from a different firmware, but I'll leave that as an exercise for the reader. Feel free to ping me on #smali on irc.freenode.com if you need any help with that.
So let's say you are running a rom based on an official t-mobile US image, and the apps in /system/app are odexed. A deodex session would go something like this:
adb push deodexerant /data/local
adb shell chmod 755 /data/local/deodexerant
adb forward tcp:1234 tcp:1234
adb pull /system/app/Calculator.odex .
adb shell /data/local/deodexerant /system/app/Calculator.odex 1234 &
java -jar baksmali.jar -x :1234 Calculator.odex
You will then have the usual set of .smali files in the out directory, ready to be re-assembled back into a classes.dex file with smali.
Note how I use adb forward so that the communication happens over usb. If you try to do this over wifi it is slow. The latency of the usb link is much less than the latency over wifi, and since baksmali makes a large number of smallish, synchronous requests to deodexerant, latency kills its performance.
When deodexing, there are several common issues you may run into.
The most common, which is almost guaranteed to happen if you're deodexing an entire build, is an error which is something like "Cannot find class Lfoo/bar; for common superclass lookup". This is caused when the odex file has an additional dependency, beyond the jars in the normal BOOTCLASSPATH. To resolve the issue, you need to find which jar contains the class mentioned in the error message, and then add that to the BOOTCLASSPATH environment variable (on the phone/device/emulator) before running deodexerant. You can usually guess which jar it is from the class name, but if not, you can disassemble the jars and find which one has that class.
Once you find the extra dependency, let's say /system/framework/com.google.android.maps.jar (which is one that is commonly needed), the deodexerant command would be
(on linux/mac)
adb shell BOOTCLASSPATH=\$BOOTCLASSPATH:/system/framework/com.google.android.maps.jar deodexerant blah.odex 1234 &
(on windows)
adb shell BOOTCLASSPATH=$BOOTCLASSPATH:/system/framework/com.google.android.maps.jar deodexerant blah.odex 1234 &
Another issue you may run into is when it runs out of heap memory. The error might look something like "java.lang.OutOfMemoryError: Java heap space". To resolve this, you can add the java parameter -Xmx512m to increase the heap size to 512mb. You can increase it further if needed, of course. If you are using something like "java -jar baksmali.jar" to run baksmali, then the command would instead be
java -Xmx512m -jar baksmali.jar -x :1234 blah.odex
Or if you are using the wrapper script,
baksmali -JXmx512m -x :1234 blah.odex
Another issue that comes up on some platforms (I think windows in particular) is running out of stack space. The error will look something like "java.lang.StackOverflowError". The stack trace for the error will likely also contain a lot of lines that look something like "at org.jf.dexlib.Util.DeodexUtil$insn.propagateRegisters(DeodexUtil.java:1396)". To fix this issue, you can increase the stack size with -Xss10m.
java -Xss10m -jar baksmali.jar -x :1234 blah.odex
or
baksmali -JXss10m -x :1234 blah.odex
As of v1.2, baksmali has the ability to load the BOOTCLASSPATH files directly and calculate all the information needed to deodex, so that deodexerant is no longer needed. Deodexing can be done completely on a computer, without needing a device/emulator.
Prior to v1.2, in order to deodex an odex file, baksmali required a helper binary named deodexerant that ran on the phone, and provided information to baksmali about the classes loaded by dalvik, dumps of the virtual method table for classes, field offsets, inline methods, superclasses, etc. The previous deodex instructions for deodexerant can be found at http://code.google.com/p/smali/wiki/DeodexInstructions_Deodexerant
In short, an odex file is an optimized version of a classes.dex file that has optimizations that are device specific. In particular, an odex file has dependencies on every "BOOTCLASSPATH" file that is loaded when it is generated. The odex file is only valid when used with these exact BOOTCLASSPATH files. dalvik enforces this by storing a checksum of each file that the odex file is dependent on, and ensuring that the checksum for each file matches when the odex file is loaded.
The BOOTCLASSPATH is simply a list of the jars/apks from which classes can be loaded, in addition to the main apk/jar that is loaded. A normal android system has 5 jars in its base BOOTCLASSPATH - core.jar, ext.jar, framework.jar, android.policy.jar and services.jar. These can all be found in /system/framework. However, some apks have dependencies on additional jar or apk files beyond the base 5 jars. For example, for applications that use google maps, com.google.android.maps.jar will be appended to the BOOTCLASSPATH for that application's apk.
These odex dependencies make life a bit difficult for a couple of reasons. For one - you can't take an apk+odex file from one system image and run it on another system image (unless the other system image uses the exact same framework files). Another problem is that if you make any changes to any of the BOOTCLASSPATH files, it will invalidate every odex that depends on that file - basically every apk/jar on the device.
The examples below assume you are using the baksmali wrapper script available on the downloads page to call baksmali. If you are calling the jar directly, you can replace "baksmali" with "java -jar baksmali.jar" instead
In order for baksmali to be able to deodex something, it has to load every BOOTCLASSPATH file that the odex depends on. By default, baksmali will try to find the 5 "core" BOOTCLASSPATH files in the current directory. It can use either a jar/apk (with a classes.dex inside), or an odex file (in which case the corresponding jar/apk isn't needed).
You can simply use the -x option to tell baksmali that it should deodex the input. If the input isn't an odex file, the option will be ignored. For example:
baksmali -x Calculator.odex
You can specify the BOOTCLASSPATH files to use with the -c option. This option has two modes of operation. If the option value does not start with a colon ':', then it is used as the entire BOOTCLASSPATH. Alternatively, if the value does start with a colon ':', then any entries are appended to the default BOOTCLASSPATH instead. In either case, multiple entries are separated with a colon ':'. For example, assuming "blah.odex" is an odex file that additionally depends on com.google.android.gtalkservice.jar:
baksmali -c :com.google.android.gtalkservice.jar -x blah.odex
which is equivalent to
baksmali -c core.jar:ext.jar:framework.jar:android.policy.jar:services.jar:com.google.android.gtalkservice.jar -x blah.odex
In either case, it will look for the 5 base BOOTCLASSPATH files along with gtalkservice in the current directory.
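The two modes of the -c option can be summarized in a few lines of Python. This is only a model of the behavior described above, not baksmali's actual code, and the names resolve_bootclasspath and DEFAULT_BOOTCLASSPATH are made up for the sketch.

```python
# The 5 base BOOTCLASSPATH jars named in the text above.
DEFAULT_BOOTCLASSPATH = [
    "core.jar", "ext.jar", "framework.jar", "android.policy.jar", "services.jar",
]

def resolve_bootclasspath(option_value=None):
    """Model how a -c value turns into the list of BOOTCLASSPATH entries."""
    if option_value is None:
        return list(DEFAULT_BOOTCLASSPATH)       # no -c given: use the defaults
    entries = [e for e in option_value.split(":") if e]
    if option_value.startswith(":"):
        return DEFAULT_BOOTCLASSPATH + entries   # leading ':' means append
    return entries                               # otherwise, replace entirely
```

With this model, resolve_bootclasspath(":com.google.android.gtalkservice.jar") returns the same 6 entries as passing the full colon-separated list.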
If one or more of the BOOTCLASSPATH files aren't in the current directory, you can add a directory to search with the -d option. You can't add multiple directories with a single -d option, but you can specify multiple -d options. For example, if you pulled the /system partition off of a phone, and you are currently in the system/app folder, then you would specify -d ../framework, to have it look in the correct place for the BOOTCLASSPATH files.
baksmali -d ../framework -x Calculator.odex
When you are deodexing files, there are a couple of common issues you may run into.
Missing BOOTCLASSPATH files
The most common issue, which is almost guaranteed to happen if you're deodexing an entire build, is an error which is something like "org.jf.dexlib.Code.Analysis.ValidationException: class Lcom/google/android/gtalkservice/IChatListener; cannot be resolved."
This is caused when the odex file has an additional dependency, beyond the jars in the normal BOOTCLASSPATH. To resolve the issue, you need to find which jar contains the class mentioned in the error message, and then add that to the BOOTCLASSPATH with the -c option. You can usually guess which jar it is in from the class name, but if not, you can disassemble the framework jars and find which one has that class.
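If guessing from the class name doesn't work, a rough way to narrow down the candidates without disassembling everything is to search each jar's classes.dex for the class descriptor as a raw string. The Python sketch below does that. The function name jars_mentioning_class is hypothetical, and the approach is only a heuristic: a jar that merely references the class will also match, and jars without a classes.dex (e.g. on an odexed build) are skipped.

```python
import zipfile

def jars_mentioning_class(jar_paths, descriptor):
    """Return the jars whose classes.dex contains the given type descriptor."""
    needle = descriptor.encode("utf-8")
    hits = []
    for path in jar_paths:
        try:
            with zipfile.ZipFile(path) as jar:
                data = jar.read("classes.dex")
        except (KeyError, zipfile.BadZipFile, OSError):
            continue  # no classes.dex inside, or not a readable zip file
        if needle in data:
            hits.append(path)
    return hits
```

You would call it with the list of jars pulled from /system/framework and the descriptor from the error message, e.g. Lcom/google/android/gtalkservice/IChatListener;, and then confirm the result by disassembling the matching jar.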
No classes.dex file
There is currently an issue in baksmali so that if it finds a BOOTCLASSPATH jar file without a classes.dex file, it will immediately bomb out, instead of continuing to look (i.e. for an odex file). If you run into this, the error message will be something like "org.jf.dexlib.Util.ExceptionWithContext: zip file core.jar does not contain a classes.dex file".
A temporary work around is to delete or rename the BOOTCLASSPATH jar files, leaving only the odex files.
Heap Space
Another issue you may run into is when it runs out of heap memory. The error might look something like "java.lang.OutOfMemoryError: Java heap space". To resolve this, you can use the -Xmx parameter to increase the heap size. A reasonable size to try would be 512m. You can increase it further if needed, of course. If you are using the wrapper script to call baksmali, you can use -JXmx instead. For example
baksmali -JXmx512m -x blah.odex
Or if you are calling the jar directly with something like "java -jar baksmali.jar":
java -Xmx512m -jar baksmali.jar -x blah.odex
When deodexing, in some of the smali files that baksmali produces, you might notice some cases where baksmali replaces an odexed instruction with either a throw or something else, with a comment like "Replaced unresolvable optimized instruction with a throw".
Take, for example, the following java code:
Object blah = null;
blah.toString();
The corresponding smali code would be:
const v0, 0
invoke-virtual {v0}, Ljava/lang/Object;->toString()Ljava/lang/String;
This will of course result in a null pointer exception - but it is valid code. In practice, these cases are a bit more disguised of course.
When the code is odexed, the invoke-virtual instruction would be replaced with an invoke-virtual-quick instruction, like so:
const v0, 0
invoke-virtual-quick {v0}, vtable@7
Where vtable@7 is the index into Object's virtual method table where the toString method is.
But notice that there isn't any mention of what class the method is in. Since v0 is always null, and we only have a vtable index, it is impossible for baksmali to know what class to look at. So it's impossible to deodex this instruction. However, it can do the next best thing: it can replace the instruction with something that has the exact same effect. Keep in mind that v0 is always null, so any sort of method invocation on it would just end up in an NPE being thrown. So baksmali just replaces the unresolvable odex instruction with something else that will also throw an NPE.
Additionally, any code that comes after that (up until another code path branches in) is effectively dead code that can never be reached, and in some cases, if the code depended on the result of the method that we couldn't resolve, then that code is also impossible to deodex. Since it can never be reached, it is just removed (commented out).
So, in short, these cases shouldn't affect the semantics/functionality of the bytecode. When you see "replaced unresolvable optimized instruction", it's typically nothing to be concerned about. Baksmali handles it in a way that doesn't affect the functionality/semantics of the code. (And if it doesn't, it's a bug in baksmali).
Currently, there is a known issue related to this, where if all the code in a try block is commented out because it follows an unresolvable odex instruction, the (empty) try block is left in, and when installed onto a device, the empty try block will cause dalvik to reject the dex file.
This is likely an incomplete list. If you encounter something that is missing, feel free to leave a comment and I'll add it.
All of the java libraries that are needed should be automatically downloaded by mvn
git clone https://jesusfreke%[email protected]/p/smali/
cd smali/util
mvn install
cd ../dexlib
mvn install
cd ../smali
mvn assembly:assembly
cd ../baksmali
mvn assembly:assembly
The jar files should be at
smali/target/smali-<version>-jar-with-dependencies.jar
and
baksmali/target/baksmali-<version>-jar-with-dependencies.jar
If you get any memory related exceptions while building (OutOfMemoryError, or whatever), you'll need to increase the maximum heap size for mvn.
If you're in a *nix environment
export MAVEN_OPTS=-Xmx512m
Or windows (unverified):
set MAVEN_OPTS=-Xmx512m