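0x01 Introduction
AsmJit is a complete JIT (just-in-time) assembler for C++. It generates native code for the x86 and x64 architectures and supports the whole x86/x64 instruction set, from legacy MMX to AVX2 (the newest extension at the time of writing). It also provides an API whose semantics can be checked at compile time. AsmJit places no restrictions on how it is used and is suitable for multimedia, virtual machine backends, remote code generation, and similar tasks.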
0x02 Features
0x03 Environment
1. Operating systems
2. C++ compilers
3. Backends
0x04 Code generation
0x05 Configuration and build
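AsmJit was designed from the start to be embeddable in any project. Macro definitions can be used to add or remove individual features of the library. The most direct way to build AsmJit is with cmake (www.cmake.org). If you only embed the AsmJit source code in your project, you can edit "asmjit/config.h" to turn specific features on or off. The simplest way to use AsmJit is to copy its source code into your project and define the "ASMJIT_STATIC" macro.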
1. Build type
2. Build mode
3. Architectures
4. Features
0x06 Usage
1. Namespaces
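AsmJit uses a single global namespace, "asmjit", which contains only the basics. Architecture-specific code is prefixed with the architecture name, and architecture-specific registers and operand builders live in a namespace of their own. For example, classes for both x86 and x64 code generation carry the "X86" prefix, enums carry the "kX86" prefix, and registers and operand builders are accessible through the "x86" namespace. This design differs from the first versions of AsmJit, but it currently looks like the most portable one.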
2. Runtime and code generators
3. Instruction operands
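"Operand" is the base class of all operands. It contains the interface shared by every operand type, and operands are mostly passed by value rather than by pointer. The classes "Reg", "Var", "Mem", "Label" and "Imm" all inherit from "Operand" and provide operand-specific functionality. Architecture-specific operands are prefixed with the architecture name, for example "X86Reg" and "X86Mem". Most architectures provide several kinds of registers. On X86/X64 these are "X86GpReg", "X86MmReg", "X86FpReg", "X86XmmReg" and "X86YmmReg", plus some extra segment registers and the "rip" register. When using a code generator, some operands have to be created explicitly through the AsmJit interface. For example, labels are created with the code generator's newLabel() method, and variables are created with architecture-specific methods such as "newGpVar()", "newMmVar()" and "newXmmVar()".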
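4. Function prototypes
AsmJit needs to know the prototype of every function it generates or calls. It contains a mapping between types and the registers used to represent them, and it uses this mapping to describe function prototypes. The function builder is a template class that builds a prototype from native C/C++ types describing the function arguments and the return value. It converts the native C/C++ types into AsmJit-specific IDs and makes these IDs accessible to the compiler.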
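To make the type-to-ID mapping concrete, here is a minimal sketch of how a template could translate native C++ types into identifiers. It is not AsmJit's implementation: every name in it (SketchTypeId, SketchTypeOf, SketchPrototype, sketchBuildPrototype) is hypothetical and only illustrates the idea.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical type identifiers; AsmJit's real IDs (kVarTypeInt32, ...) differ.
enum class SketchTypeId { Int32, Int64, IntPtr, Unknown };

// Primary template: any type without a specialization maps to Unknown.
template <typename T>
struct SketchTypeOf {
  static constexpr SketchTypeId id() { return SketchTypeId::Unknown; }
};

template <>
struct SketchTypeOf<std::int32_t> {
  static constexpr SketchTypeId id() { return SketchTypeId::Int32; }
};

template <>
struct SketchTypeOf<std::int64_t> {
  static constexpr SketchTypeId id() { return SketchTypeId::Int64; }
};

// Every pointer type maps to a pointer-sized integer ID.
template <typename T>
struct SketchTypeOf<T*> {
  static constexpr SketchTypeId id() { return SketchTypeId::IntPtr; }
};

// A prototype is a return-type ID plus a list of argument IDs.
struct SketchPrototype {
  SketchTypeId ret;
  std::vector<SketchTypeId> args;
};

// Builds a prototype from native types; the first type is the return value.
template <typename Ret, typename... Args>
SketchPrototype sketchBuildPrototype() {
  return SketchPrototype{ SketchTypeOf<Ret>::id(), { SketchTypeOf<Args>::id()... } };
}
```

With these definitions, sketchBuildPrototype<std::int32_t, const std::int32_t*, std::int64_t>() yields a return ID of Int32 and the argument IDs IntPtr and Int64, resolved entirely from the template arguments.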
5. Putting it together
    #include <stdio.h>
    #include <asmjit/asmjit.h>

    using namespace asmjit;

    int main(int argc, char* argv[]) {
      // Create JitRuntime and X86 Compiler.
      JitRuntime runtime;
      X86Compiler c(&runtime);

      // Build function having two arguments and a return value of type 'int'.
      // First type in function builder describes the return value. kFuncConvHost
      // tells compiler to use a host calling convention.
      c.addFunc(kFuncConvHost, FuncBuilder2<int, int, int>());

      // Create 32-bit variables (virtual registers) and assign some names to
      // them. Using names is purely optional and only greatly helps while
      // debugging.
      X86GpVar a(c, kVarTypeInt32, "a");
      X86GpVar b(c, kVarTypeInt32, "b");

      // Tell asmjit to use these variables as function arguments.
      c.setArg(0, a);
      c.setArg(1, b);

      // a = a + b;
      c.add(a, b);

      // Tell asmjit to return 'a'.
      c.ret(a);

      // Finalize the current function.
      c.endFunc();

      // Now the Compiler contains the whole function, but the code is not yet
      // generated. To tell compiler to generate the function make() has to be
      // called.

      // Make uses the JitRuntime passed to Compiler constructor to allocate a
      // buffer for the function and make it executable.
      void* funcPtr = c.make();

      // In order to run 'funcPtr' it has to be casted to the desired type.
      // Typedef is a recommended and safe way to create a function-type.
      typedef int (*FuncType)(int, int);

      // Using asmjit_cast is purely optional, it's basically a C-style cast
      // that tries to make it visible that a function-type is returned.
      FuncType func = asmjit_cast<FuncType>(funcPtr);

      // Finally, run it and do something with the result...
      int x = func(1, 2);
      printf("x=%d\n", x); // Outputs "x=3".

      // The function will remain in memory after Compiler is destroyed, but
      // will be destroyed together with Runtime. This is just simple example
      // where we can just destroy both at the end of the scope and that's it.
      // However, it's a good practice to clean-up resources after they are
      // not needed and using runtime.release() is the preferred way to free
      // a function added to JitRuntime.
      runtime.release((void*)func);

      // Runtime and Compiler will be destroyed at the end of the scope.
      return 0;
    }
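The comments in the code above are fairly self-explanatory, but a few details deserve a note. The function is generated and called with the "kFuncConvHost" calling convention. 32-bit x86 has a wide range of calling conventions, so it is important to know which one your C++ compiler uses; most compilers default to cdecl. 64-bit x86 has only two: the Win64 convention used on Windows and the AMD64 convention used on Unix-like systems. Depending on the processor architecture and the operating system, "kFuncConvHost" is therefore defined as Cdecl, Win64, or AMD64.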
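The default integer size is also platform specific. The virtual types "kVarTypeIntPtr" and "kVarTypeUIntPtr" make code more portable and should be used whenever a pointer-sized value is needed. When no type is specified, AsmJit always defaults to "kVarTypeIntPtr". The code above overrides that default and works with 32-bit integers ("kVarTypeInt32").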
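The way a "host" calling convention depends on the architecture and operating system can be sketched with ordinary preprocessor checks. This is only an illustration of the selection logic described above: sketchHostConvName is a hypothetical helper, not an AsmJit function, and the predefined macros it tests are the common compiler ones, not necessarily those AsmJit uses internally.

```cpp
#include <string>

// Hypothetical helper: names the calling convention a "host" setting would
// resolve to on the platform this file is compiled for.
inline std::string sketchHostConvName() {
#if defined(_WIN64)
  return "Win64";   // 64-bit Windows
#elif defined(__x86_64__) || defined(_M_X64)
  return "AMD64";   // 64-bit Unix-like systems
#elif defined(__i386__) || defined(_M_IX86)
  return "Cdecl";   // 32-bit x86, the usual compiler default
#else
  return "Unknown"; // not an x86/x64 target
#endif
}
```

Because the choice is made at compile time, the same source yields a different convention on each platform, which is why a generated function should be called through a pointer type that matches the host convention.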
6. Using labels
    #include <stdio.h>
    #include <asmjit/asmjit.h>

    using namespace asmjit;

    int main(int argc, char* argv[]) {
      JitRuntime runtime;
      X86Compiler c(&runtime);

      // This function uses 3 arguments.
      c.addFunc(kFuncConvHost, FuncBuilder3<int, int, int, int>());

      // New variable 'op' added.
      X86GpVar op(c, kVarTypeInt32, "op");
      X86GpVar a(c, kVarTypeInt32, "a");
      X86GpVar b(c, kVarTypeInt32, "b");

      c.setArg(0, op);
      c.setArg(1, a);
      c.setArg(2, b);

      // Create labels.
      Label L_Subtract(c);
      Label L_Skip(c);

      // If (op != 0)
      //   goto L_Subtract;
      c.test(op, op);
      c.jne(L_Subtract);

      // a = a + b;
      // goto L_Skip;
      c.add(a, b);
      c.jmp(L_Skip);

      // L_Subtract:
      // a = a - b;
      c.bind(L_Subtract);
      c.sub(a, b);

      // L_Skip:
      c.bind(L_Skip);

      c.ret(a);
      c.endFunc();

      // The prototype of the generated function changed also here.
      typedef int (*FuncType)(int, int, int);
      FuncType func = asmjit_cast<FuncType>(c.make());

      int x = func(0, 1, 2);
      int y = func(1, 1, 2);

      printf("x=%d\n", x); // Outputs "x=3".
      printf("y=%d\n", y); // Outputs "y=-1".

      runtime.release((void*)func);
      return 0;
    }
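The example above uses conditional and unconditional jumps together. A label is created explicitly by the "Compiler", either by passing a "Compiler" instance to the "Label" constructor or by calling "Label l = c.newLabel()". Each label has a unique ID, but the ID is not a string and there is no way to look up an existing "Label" instance by it. Like other operands, a label is moved by value: a copy of a label still refers to the same location, and changing the copy does not change the original.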
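For readers who find the jump structure easier to follow in plain C++, the sketch below mirrors the generated function's control flow with goto. It is a hand-written equivalent for illustration only, not something AsmJit produces; the function name sketchAddOrSub is hypothetical.

```cpp
// Plain C++ rendition of the control flow emitted in the labels example.
inline int sketchAddOrSub(int op, int a, int b) {
  if (op != 0) goto L_Subtract;  // test op, op ; jne L_Subtract
  a = a + b;                     // add a, b
  goto L_Skip;                   // jmp L_Skip
L_Subtract:
  a = a - b;                     // sub a, b
L_Skip:
  return a;                      // ret a
}
```

Calling sketchAddOrSub(0, 1, 2) takes the addition path and sketchAddOrSub(1, 1, 2) takes the subtraction path, matching the two calls made in the example.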
7. Memory addressing
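The x86/x64 architecture has several memory addressing modes that combine a base register, an index register and a displacement. AsmJit supports all of these forms. A memory operand can be created with "asmjit::X86Mem" or with the related non-member functions such as "asmjit::x86::ptr" and "asmjit::x86::ptr_abs". Use "ptr" to create a memory operand that has a base register and, optionally, an index register with a shift. Use "ptr_abs" to create a memory operand that refers to an absolute (32-bit) address in memory and optionally has an index register.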
    #include <stdio.h>
    #include <asmjit/asmjit.h>

    using namespace asmjit;

    int main(int argc, char* argv[]) {
      JitRuntime runtime;
      X86Compiler c(&runtime);

      // Function returning 'int' accepting pointer and two indexes.
      c.addFunc(kFuncConvHost, FuncBuilder3<int, const int*, intptr_t, intptr_t>());

      X86GpVar p(c, kVarTypeIntPtr, "p");
      X86GpVar aIndex(c, kVarTypeIntPtr, "aIndex");
      X86GpVar bIndex(c, kVarTypeIntPtr, "bIndex");

      c.setArg(0, p);
      c.setArg(1, aIndex);
      c.setArg(2, bIndex);

      X86GpVar a(c, kVarTypeInt32, "a");
      X86GpVar b(c, kVarTypeInt32, "b");

      // Read 'a' by using a memory operand having base register, index register
      // and scale. Translates to 'mov a, dword ptr [p + aIndex << 2]'.
      c.mov(a, ptr(p, aIndex, 2));

      // Read 'b' by using a memory operand having base register only. Variables
      // 'p' and 'bIndex' are both modified.

      // Shift bIndex by 2 (exactly the same as multiplying by 4).
      // And add scaled 'bIndex' to 'p' resulting in 'p = p + bIndex * 4'.
      c.shl(bIndex, 2);
      c.add(p, bIndex);

      // Read 'b'.
      c.mov(b, ptr(p));

      // a = a + b;
      c.add(a, b);

      c.ret(a);
      c.endFunc();

      // The prototype of the generated function changed also here.
      typedef int (*FuncType)(const int*, intptr_t, intptr_t);
      FuncType func = asmjit_cast<FuncType>(c.make());

      // Array passed to 'func'
      const int array[] = { 1, 2, 3, 5, 8, 13 };

      int x = func(array, 1, 2);
      int y = func(array, 3, 5);

      printf("x=%d\n", x); // Outputs "x=5".
      printf("y=%d\n", y); // Outputs "y=18".

      runtime.release((void*)func);
      return 0;
    }
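The address arithmetic behind the two loads can be written out in plain C++. The sketch below is a hand-written equivalent, not AsmJit code: sketchEffectiveAddress, sketchLoadInt32 and sketchSumTwo are hypothetical names, and it assumes non-negative indexes and 4-byte elements, as the example does.

```cpp
#include <cstdint>
#include <cstring>

// Effective address of an operand of the form [base + (index << shift)].
inline const unsigned char* sketchEffectiveAddress(const void* base,
                                                   std::intptr_t index,
                                                   unsigned shift) {
  return static_cast<const unsigned char*>(base) + (index << shift);
}

// Reads a 32-bit value from an address, like 'mov reg, dword ptr [addr]'.
inline std::int32_t sketchLoadInt32(const unsigned char* addr) {
  std::int32_t value;
  std::memcpy(&value, addr, sizeof value);
  return value;
}

// Mirrors the generated function: one scaled-index load, one base-only load.
inline std::int32_t sketchSumTwo(const std::int32_t* p,
                                 std::intptr_t aIndex,
                                 std::intptr_t bIndex) {
  // mov a, dword ptr [p + aIndex << 2]
  std::int32_t a = sketchLoadInt32(sketchEffectiveAddress(p, aIndex, 2));

  // shl bIndex, 2 ; add p, bIndex ; mov b, dword ptr [p]
  const unsigned char* q = reinterpret_cast<const unsigned char*>(p);
  bIndex <<= 2;
  q += bIndex;
  std::int32_t b = sketchLoadInt32(q);

  return a + b;
}
```

Both loads reach an element of the same array; the first lets the addressing mode do the scaling, while the second performs the shift and add as separate instructions and then uses a base-only operand.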
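8. Using the stack
The following example allocates 256 bytes on the stack, fills them with the byte values 0 to 255, and then iterates over them once more to compute the sum of all values.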
    #include <stdio.h>
    #include <asmjit/asmjit.h>

    using namespace asmjit;

    int main(int argc, char* argv[]) {
      JitRuntime runtime;
      X86Compiler c(&runtime);

      // Function returning 'int' without any arguments.
      c.addFunc(kFuncConvHost, FuncBuilder0<int>());

      // Allocate a function stack of size 256 aligned to 4 bytes.
      X86Mem stack = c.newStack(256, 4);

      X86GpVar p(c, kVarTypeIntPtr, "p");
      X86GpVar i(c, kVarTypeIntPtr, "i");

      // Load a stack address to 'p'. This step is purely optional and shows
      // that 'lea' is useful to load a memory operands address (even absolute)
      // to a general purpose register.
      c.lea(p, stack);

      // Clear 'i'. Notice that xor_() is used instead of xor(), because xor is
      // unfortunately a keyword in C++.
      c.xor_(i, i);

      // First loop, fill the stack allocated by a sequence of bytes from 0 to 255.
      Label L1(c);
      c.bind(L1);

      // Mov [p + i], i.
      //
      // Any operand can be cloned and modified. By cloning 'stack' and calling
      // 'setIndex' we created a new memory operand based on stack having an
      // index register set.
      c.mov(stack.clone().setIndex(i), i.r8());

      // if (++i < 256)
      //   goto L1;
      c.inc(i);
      c.cmp(i, 256);
      c.jb(L1);

      // Second loop, sum all bytes stored in 'stack'.
      X86GpVar a(c, kVarTypeInt32, "a");
      X86GpVar t(c, kVarTypeInt32, "t");

      c.xor_(i, i);
      c.xor_(a, a);

      Label L2(c);
      c.bind(L2);

      // Movzx t, byte ptr [stack + i]
      c.movzx(t, stack.clone().setIndex(i).setSize(1));
      // a += t;
      c.add(a, t);

      // if (++i < 256)
      //   goto L2;
      c.inc(i);
      c.cmp(i, 256);
      c.jb(L2);

      c.ret(a);
      c.endFunc();

      typedef int (*FuncType)(void);
      FuncType func = asmjit_cast<FuncType>(c.make());

      printf("a=%d\n", func()); // Outputs "a=32640".

      runtime.release((void*)func);
      return 0;
    }
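As a cross-check of the expected result, the same fill-and-sum logic can be written as ordinary C++. This is a hand-written equivalent under the assumption that the generated loops behave as their comments describe; sketchFillAndSum is a hypothetical name.

```cpp
#include <cstddef>

// Plain C++ equivalent of the two generated loops.
inline int sketchFillAndSum() {
  unsigned char stack[256];  // stands in for c.newStack(256, 4)

  // First loop: store the low byte of 'i' at stack[i].
  for (std::size_t i = 0; i < 256; ++i)
    stack[i] = static_cast<unsigned char>(i);

  // Second loop: zero-extend each byte and accumulate it (movzx + add).
  int a = 0;
  for (std::size_t i = 0; i < 256; ++i)
    a += stack[i];

  return a;
}
```

The sum of 0 through 255 is 255 * 256 / 2 = 32640, which is the value the generated function prints.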
0x07 Advanced features
1. Logging and error handling
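A Logger can be assigned to any code generator instance, and a single Logger instance can be shared by any number of code generators. Sharing one is not practical, however, when the code generators run in multiple threads. The "FileLogger" class writes to a standard C FILE* stream and is thread-safe, but the "StringLogger" class is not.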
    // Create logger logging to `stdout`. Logger life-time should always be
    // greater than lifetime of the code generator.
    FileLogger logger(stdout);

    // Create a code generator and assign our logger into it.
    X86Compiler c(...);
    c.setLogger(&logger);

    // ... Generate the code ...

    // Alternative: log into a string buffer instead of a FILE* stream.
    StringLogger logger;

    // Create a code generator and assign our logger into it.
    X86Compiler c(...);
    c.setLogger(&logger);

    // ... Generate the code ...

    printf("Logger Content:\n%s", logger.getString());

    // You can also use `logger.clearString()` if the logger
    // instance will be reused.
2. Code injection
    X86Compiler c(...);

    X86GpVar a(c, kVarTypeInt32, "a");
    X86GpVar b(c, kVarTypeInt32, "b");

    Node* here = c.getCursor();
    c.mov(b, 2);

    // Now, 'here' can be used to inject something before 'mov b, 2'. To inject
    // anything it's good to remember the current cursor so it can be set back
    // after the injecting is done. When setCursor() is called it returns the old
    // cursor.
    Node* oldCursor = c.setCursor(here);
    c.mov(a, 1);
    c.setCursor(oldCursor);

    // The resulting code is equivalent to:
    c.mov(a, 1);
    c.mov(b, 2);
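The cursor mechanism can be modelled with a linked list in which every new instruction is inserted right after the cursor. The sketch below is a toy model for illustration, not AsmJit's node list: SketchStream and sketchInjectDemo are hypothetical names, and instructions are plain strings.

```cpp
#include <iterator>
#include <list>
#include <string>
#include <vector>

// Toy instruction stream: emit() inserts after the cursor and advances it.
class SketchStream {
public:
  using Cursor = std::list<std::string>::iterator;

  // The list starts with one sentinel node so that a cursor always exists,
  // even before any instruction has been emitted.
  SketchStream() : nodes_(1, std::string()), cursor_(nodes_.begin()) {}
  SketchStream(const SketchStream&) = delete;
  SketchStream& operator=(const SketchStream&) = delete;

  void emit(const std::string& text) {
    cursor_ = nodes_.insert(std::next(cursor_), text);
  }

  Cursor getCursor() const { return cursor_; }

  // Moves the cursor and returns the previous one, like setCursor() above.
  Cursor setCursor(Cursor c) {
    Cursor old = cursor_;
    cursor_ = c;
    return old;
  }

  // Returns the emitted instructions in order, skipping the sentinel.
  std::vector<std::string> listing() const {
    return std::vector<std::string>(std::next(nodes_.begin()), nodes_.end());
  }

private:
  std::list<std::string> nodes_;
  Cursor cursor_;
};

// Replays the injection sequence from the snippet above.
inline std::vector<std::string> sketchInjectDemo() {
  SketchStream c;
  SketchStream::Cursor here = c.getCursor();
  c.emit("mov b, 2");
  SketchStream::Cursor oldCursor = c.setCursor(here);
  c.emit("mov a, 1");  // lands before 'mov b, 2'
  c.setCursor(oldCursor);
  return c.listing();
}
```

Restoring the old cursor matters: without it, later instructions would be inserted right after the injected one instead of at the end of the stream.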