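The luajit -bl command disassembles a LuaJIT bytecode file or a Lua source file and prints its bytecode instructions. I was curious how it manages to handle bytecode files and Lua scripts through the same path, so let's trace it step by step.
The standalone luajit executable is built from luajit.c. Start at its main function: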
int main(int argc, char **argv)
{
  int status;
  lua_State *L = lua_open();
  if (L == NULL) {
    l_message(argv[0], "cannot create state: not enough memory");
    return EXIT_FAILURE;
  }
  smain.argc = argc;
  smain.argv = argv;
  status = lua_cpcall(L, pmain, NULL);
  report(L, status);
  lua_close(L);
  return (status || smain.status > 0) ? EXIT_FAILURE : EXIT_SUCCESS;
}
main creates a Lua state and, if that succeeds, runs pmain in protected mode via lua_cpcall. The pmain function is:
static int pmain(lua_State *L)
{
  struct Smain *s = &smain;
  char **argv = s->argv;
  int argn;
  int flags = 0;
  globalL = L;
  if (argv[0] && argv[0][0]) progname = argv[0];
  LUAJIT_VERSION_SYM();  /* Linker-enforced version check. */
  argn = collectargs(argv, &flags);
  if (argn < 0) {  /* Invalid args? */
    print_usage();
    s->status = 1;
    return 0;
  }
  if ((flags & FLAGS_NOENV)) {
    lua_pushboolean(L, 1);
    lua_setfield(L, LUA_REGISTRYINDEX, "LUA_NOENV");
  }
  /* Stop collector during library initialization. */
  lua_gc(L, LUA_GCSTOP, 0);
  luaL_openlibs(L);
  lua_gc(L, LUA_GCRESTART, -1);
  createargtable(L, argv, s->argc, argn);
  if (!(flags & FLAGS_NOENV)) {
    s->status = handle_luainit(L);
    if (s->status != LUA_OK) return 0;
  }
  if ((flags & FLAGS_VERSION)) print_version();
  s->status = runargs(L, argv, argn);
  if (s->status != LUA_OK) {
    return 0;
  }
  if (s->argc > argn) {
    s->status = handle_script(L, argv + argn);
    if (s->status != LUA_OK) return 0;
  }
  if ((flags & FLAGS_INTERACTIVE)) {
    print_jit_status(L);
    dotty(L);
  } else if (s->argc == argn && !(flags & (FLAGS_EXEC|FLAGS_VERSION))) {
    if (lua_stdin_is_tty()) {
      print_version();
      print_jit_status(L);
      dotty(L);
    } else {
      dofile(L, NULL);  /* Executes stdin as a file. */
    }
  }
  return 0;
}
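pmain contains two key calls:
argn = collectargs(argv, &flags);
s->status = runargs(L, argv, argn);
collectargs scans and checks the command-line options. As the code above shows, it returns a negative value for invalid arguments. runargs then executes the options. runargs looks like this: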
static int runargs(lua_State *L, char **argv, int argn)
{
  int i;
  for (i = 1; i < argn; i++) {
    if (argv[i] == NULL) continue;
    lua_assert(argv[i][0] == '-');
    switch (argv[i][1]) {
    case 'e': {
      const char *chunk = argv[i] + 2;
      if (*chunk == '\0') chunk = argv[++i];
      lua_assert(chunk != NULL);
      if (dostring(L, chunk, "=(command line)") != 0)
        return 1;
      break;
      }
    case 'l': {
      const char *filename = argv[i] + 2;
      if (*filename == '\0') filename = argv[++i];
      lua_assert(filename != NULL);
      if (dolibrary(L, filename))
        return 1;
      break;
      }
    case 'j': {  /* LuaJIT extension. */
      const char *cmd = argv[i] + 2;
      if (*cmd == '\0') cmd = argv[++i];
      lua_assert(cmd != NULL);
      if (dojitcmd(L, cmd))
        return 1;
      break;
      }
    case 'O':  /* LuaJIT extension. */
      if (dojitopt(L, argv[i] + 2))
        return 1;
      break;
    case 'b':  /* LuaJIT extension. */
      return dobytecode(L, argv+i);
    default: break;
    }
  }
  return LUA_OK;
}
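runargs performs the action that belongs to each option. The one we care about here is -bl. runargs only looks at the character right after '-', so every option that begins with -b (-b, -bl, -bg, and so on) goes to dobytecode. runargs then returns immediately with dobytecode's result. dobytecode is: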
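The following C sketch illustrates this dispatch. The enum OptAction and the function classify_option are hypothetical names and are not part of luajit.c. The sketch shows that only the character after '-' decides the branch, which is why -b, -bl and -bg all take the bytecode path.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the switch on argv[i][1] in runargs(). */
typedef enum {
  ACT_EXEC_STRING,  /* -e chunk */
  ACT_LIBRARY,      /* -l name (luajit's own -l, unrelated to bcsave's -l) */
  ACT_JITCMD,       /* -j cmd */
  ACT_JITOPT,       /* -O... */
  ACT_BYTECODE,     /* -b...: the rest of argv is handed to dobytecode() */
  ACT_IGNORED       /* anything else falls through the default branch */
} OptAction;

static OptAction classify_option(const char *opt)
{
  assert(opt != NULL && opt[0] == '-');  /* runargs asserts the same */
  switch (opt[1]) {  /* only the first letter after '-' is inspected */
  case 'e': return ACT_EXEC_STRING;
  case 'l': return ACT_LIBRARY;
  case 'j': return ACT_JITCMD;
  case 'O': return ACT_JITOPT;
  case 'b': return ACT_BYTECODE;
  default:  return ACT_IGNORED;
  }
}
```

Note that luajit's own -l option (load a library) is unrelated to the l in -bl. The l in -bl is only interpreted later, by bcsave.lua.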
/* Save or list bytecode. */
static int dobytecode(lua_State *L, char **argv)
{
  int narg = 0;
  lua_pushliteral(L, "bcsave");
  if (loadjitmodule(L))
    return 1;
  if (argv[0][2]) {  // -b is followed by more option letters, e.g. -bl or -bg
    narg++;
    argv[0][1] = '-';
    lua_pushstring(L, argv[0]+1);
  }
  for (argv++; *argv != NULL; narg++, argv++) {
    lua_pushstring(L, *argv);
  }
  report(L, lua_pcall(L, narg, 0, 0));
  return -1;
}
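As its comment says, this function saves or lists bytecode, and it does this by running the bcsave.lua script. It first pushes the literal "bcsave" and calls loadjitmodule to obtain the module's entry function. If argv[0][2] is non-zero, -b is followed by more letters, as in -bl. In that case the 'b' is overwritten with '-', which turns "-bl" into "--l", so argv[0]+1 pushes "-l". The loop then pushes the remaining arguments (the file names), and lua_pcall calls the entry function with narg arguments. Leaving aside how loadjitmodule locates the bcsave script, we go straight to the bcsave.lua source: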
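The string manipulation in dobytecode is easy to misread. The minimal C sketch below isolates just that step and assumes a writable option string, as argv is in luajit.c. rewrite_b_option is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper mirroring dobytecode(): for "-b<letters>" the 'b'
   is overwritten in place and the string from offset 1 is returned,
   e.g. "-bl" -> "--l", returning "-l". A bare "-b" forwards nothing. */
static const char *rewrite_b_option(char *opt)
{
  if (opt[2] == '\0')  /* plain "-b": no option string is pushed */
    return NULL;
  opt[1] = '-';        /* modifies the argv string in place */
  return opt + 1;
}
```

For luajit -bl test.lua, bcsave's entry function therefore receives two arguments: "-l" and "test.lua".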
-- Public module functions.
return {
  start = docmd -- Process -b command line option.
}
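The module exports docmd under the name start, and start is the function that dobytecode calls. As the comment says, it processes the -b family of options. docmd: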
local function docmd(...)
  local arg = {...}
  local n = 1
  local list = false
  local ctx = {
    strip = true, arch = jit.arch, os = string.lower(jit.os),
    type = false, modname = false,
  }
  while n <= #arg do
    local a = arg[n]
    if type(a) == "string" and string.sub(a, 1, 1) == "-" and a ~= "-" then
      table.remove(arg, n)
      if a == "--" then break end
      for m=2,#a do
        local opt = string.sub(a, m, m)
        if opt == "l" then
          list = true
        elseif opt == "s" then
          ctx.strip = true
        elseif opt == "g" then
          ctx.strip = false
        else
          if arg[n] == nil or m ~= #a then usage() end
          if opt == "e" then
            if n ~= 1 then usage() end
            arg[1] = check(loadstring(arg[1]))
          elseif opt == "n" then
            ctx.modname = checkmodname(table.remove(arg, n))
          elseif opt == "t" then
            ctx.type = checkarg(table.remove(arg, n), map_type, "file type")
          elseif opt == "a" then
            ctx.arch = checkarg(table.remove(arg, n), map_arch, "architecture")
          elseif opt == "o" then
            ctx.os = checkarg(table.remove(arg, n), map_os, "OS name")
          else
            usage()
          end
        end
      end
    else
      n = n + 1
    end
  end
  if list then
    if #arg == 0 or #arg > 2 then usage() end
    bclist(arg[1], arg[2] or "-")
  else
    if #arg ~= 2 then usage() end
    bcsave(ctx, arg[1], arg[2])
  end
end
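With -l, docmd calls bclist; otherwise it calls bcsave. This article only follows -bl. By this point the option strings have been removed from arg. arg[1] is therefore the first argument after -bl, the input file name, and arg[2] is the optional output file. When arg[2] is nil, "-" is passed instead. bclist: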
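To make the per-character option loop concrete, here is a simplified C model of it. It is a sketch, not a port. Only the argument-less letters l, s and g are handled, and BcFlags and parse_flag_group are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of the inner "for m=2,#a" loop of docmd(). */
typedef struct {
  bool list;   /* l: disassemble instead of saving */
  bool strip;  /* defaults to true, like ctx.strip in bcsave.lua */
} BcFlags;

/* Each character after the leading '-' is an independent flag.
   Returns false for letters this sketch does not model (e, n, t, a, o
   take an extra argument in bcsave.lua). */
static bool parse_flag_group(const char *a, BcFlags *f)
{
  for (size_t m = 1; m < strlen(a); m++) {  /* C index 1 == Lua index 2 */
    switch (a[m]) {
    case 'l': f->list = true; break;
    case 's': f->strip = true; break;
    case 'g': f->strip = false; break;
    default: return false;
    }
  }
  return true;
}
```

Each letter is processed separately, so luajit -blg reaches docmd as "-lg" and sets both list and strip = false. In list mode the strip setting is simply unused, because bclist does not receive ctx.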
local function bclist(input, output)
  local f = readfile(input)
  require("jit.bc").dump(f, savefile(output, "w"), true)
end
It calls the dump function of jit/bc.lua and passes it the results of readfile and savefile. First, readfile:
local function readfile(input)
  if type(input) == "function" then return input end
  if input == "-" then input = nil end
  return check(loadfile(input))
end
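If the argument is already a function, readfile returns it unchanged. Otherwise "-" is mapped to nil, because loadfile(nil) reads from stdin. readfile then returns the result of the standard library function loadfile, and check aborts with an error message if loading failed. The key to handling source and bytecode uniformly is therefore loadfile. It is defined in lib_base.c: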
LJLIB_CF(loadfile)
{
  GCstr *fname = lj_lib_optstr(L, 1);
  GCstr *mode = lj_lib_optstr(L, 2);
  int status;
  lua_settop(L, 3);  /* Ensure env arg exists. */
  status = luaL_loadfilex(L, fname ? strdata(fname) : NULL,
                          mode ? strdata(mode) : NULL);
  return load_aux(L, status, 3);
}
loadfile calls luaL_loadfilex, which is defined as follows:
LUALIB_API int luaL_loadfilex(lua_State *L, const char *filename,
                              const char *mode)
{
  FileReaderCtx ctx;
  int status;
  const char *chunkname;
  if (filename) {
    ctx.fp = fopen(filename, "rb");
    if (ctx.fp == NULL) {
      lua_pushfstring(L, "cannot open %s: %s", filename, strerror(errno));
      return LUA_ERRFILE;
    }
    chunkname = lua_pushfstring(L, "@%s", filename);
  } else {
    ctx.fp = stdin;
    chunkname = "=stdin";
  }
  status = lua_loadx(L, reader_file, &ctx, chunkname, mode);
  if (ferror(ctx.fp)) {
    L->top -= filename ? 2 : 1;
    lua_pushfstring(L, "cannot read %s: %s", chunkname+1, strerror(errno));
    if (filename)
      fclose(ctx.fp);
    return LUA_ERRFILE;
  }
  if (filename) {
    L->top--;
    copyTV(L, L->top-1, L->top);
    fclose(ctx.fp);
  }
  return status;
}
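luaL_loadfilex opens the named file (or uses stdin when filename is NULL) and builds a chunk name. It then passes the reader callback reader_file to lua_loadx, which does the actual reading. lua_loadx is defined as follows: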
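The input-selection part of luaL_loadfilex fits in a few lines of C. This sketch simplifies the real code. make_chunkname is a hypothetical helper that writes into a caller-supplied buffer, whereas the real code builds the name on the Lua stack with lua_pushfstring. By the usual Lua convention, a leading '@' marks a file name and a leading '=' marks a name that error messages show as-is.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: NULL (what readfile produces for "-") selects
   stdin with the chunk name "=stdin"; otherwise "@" + filename. */
static void make_chunkname(const char *filename, char *out, size_t outsz)
{
  if (filename == NULL)
    snprintf(out, outsz, "=stdin");
  else
    snprintf(out, outsz, "@%s", filename);
}
```

The chunk name is only a label for error messages and debug info. The reader is the same in both cases, which is why source and bytecode can share this path.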
LUA_API int lua_loadx(lua_State *L, lua_Reader reader, void *data,
                      const char *chunkname, const char *mode)
{
  LexState ls;
  int status;
  ls.rfunc = reader;
  ls.rdata = data;
  ls.chunkarg = chunkname ? chunkname : "?";
  ls.mode = mode;
  lj_buf_init(L, &ls.sb);
  status = lj_vm_cpcall(L, NULL, &ls, cpparser);
  lj_lex_cleanup(L, &ls);
  lj_gc_check(L);
  return status;
}
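lua_loadx runs cpparser in protected mode. cpparser determines whether the input is a Lua script or a LuaJIT bytecode file. For source it calls lj_parse, which parses the script and compiles it into a function prototype. For bytecode it calls lj_bcread, which reads the prototypes back from the dump. Either way the result is a compiled function. For a closer look at cpparser, see the article on customizing LuaJIT bytecode:
LuaJIT custom modifications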
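The question this article started with comes down to the check made before cpparser picks lj_bcread or lj_parse. Based on our reading of the LuaJIT sources (lj_lex_setup and cpparser), the check looks at the first byte of the input. A LuaJIT bytecode dump starts with 0x1B (ESC) followed by 'L' and 'J', and anything else is treated as source. The optional mode argument of loadfile ("b", "t" or "bt") is then checked against the detected kind. The C sketch below models only this decision. It deliberately ignores the UTF-8 BOM and "#!" line handling that lj_lex_setup also performs, and ChunkKind and classify_chunk are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { CHUNK_SOURCE, CHUNK_BYTECODE, CHUNK_MODE_ERROR } ChunkKind;

#define BC_HEAD1 0x1b  /* ESC: first byte of a LuaJIT bytecode dump */

/* Simplified, hypothetical model: first byte decides bytecode vs. source,
   then the mode string (if any) must allow that kind. */
static ChunkKind classify_chunk(const unsigned char *buf, size_t len,
                                const char *mode)
{
  int is_bc = (len > 0 && buf[0] == BC_HEAD1);
  /* A NULL mode allows both kinds, like loadfile() without a mode. */
  if (mode != NULL && strchr(mode, is_bc ? 'b' : 't') == NULL)
    return CHUNK_MODE_ERROR;
  return is_bc ? CHUNK_BYTECODE : CHUNK_SOURCE;
}
```

Because the decision is made on the data rather than on the file name or extension, loadfile needs no extra flag to tell the two apart.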
Here is the savefile function:
local function savefile(name, mode)
  if name == "-" then return io.stdout end
  return check(io.open(name, mode))
end
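When the argument is "-", that is, when no output file follows the input file after -bl, savefile returns io.stdout, so the disassembly is printed to the console. Otherwise it returns an opened file handle.
bc.lua defines dump = bcdump. bcdump is: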
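For comparison with the stdin handling in luaL_loadfilex, here is savefile's "-" convention written as a C sketch. open_output is a hypothetical name. Unlike savefile, it returns NULL on failure instead of exiting through check.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical C counterpart of bcsave's savefile(). */
static FILE *open_output(const char *name, const char *mode)
{
  if (strcmp(name, "-") == 0)
    return stdout;          /* "-" means: print to the console */
  return fopen(name, mode); /* NULL if the file cannot be opened */
}
```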
-- Dump bytecode instructions of a function.
local function bcdump(func, out, all)
  if not out then out = stdout end
  local fi = funcinfo(func)
  if all and fi.children then
    for n=-1,-1000000000,-1 do
      local k = funck(func, n)
      if not k then break end
      if type(k) == "proto" then bcdump(k, out, true) end
    end
  end
  out:write(format("-- BYTECODE -- %s-%d\n", fi.loc, fi.lastlinedefined))
  local target = bctargets(func)
  for pc=1,1000000000 do
    local s = bcline(func, pc, target[pc] and "=>")
    if not s then break end
    out:write(s)
  end
  out:write("\n")
  out:flush()
end
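As the comment says, this function dumps bytecode instructions. It is recursive. First, funcinfo (the library function jit.util.funcinfo) fetches information about the function passed in. If that function has child prototypes (fi.children), bcdump walks its constants at the negative indices -1, -2, and so on via funck, and recurses into every constant of type "proto". Nested functions are therefore listed before their parent. bcdump then loops over pc and calls bcline for each instruction until bcline returns nil. bcline: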
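The traversal order of bcdump is easy to get wrong when reading the output, so here is a toy C model of it. Proto and dump_order are hypothetical. Each node simply lists its child prototypes in the order bcdump would find them at constant indices -1, -2, and so on.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a function prototype and its children. */
typedef struct Proto {
  const char *name;
  const struct Proto *const *children;  /* in -1, -2, ... order */
  size_t nchildren;
} Proto;

/* Post-order walk like bcdump(): children first, then the prototype
   itself. Writes names into out starting at index n; returns new count. */
static size_t dump_order(const Proto *p, const char **out, size_t n)
{
  for (size_t i = 0; i < p->nchildren; i++)
    n = dump_order(p->children[i], out, n);
  out[n++] = p->name;
  return n;
}
```

For a toy chunk whose children are given as f and then g, the sketch yields f, g, main. Whatever order the children are in, the enclosing chunk is listed after all of its nested functions, so the main chunk comes last in luajit -bl output.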
-- Return one bytecode line.
local function bcline(func, pc, prefix)
  local ins, m = funcbc(func, pc)
  if not ins then return end
  local ma, mb, mc = band(m, 7), band(m, 15*8), band(m, 15*128)
  local a = band(shr(ins, 8), 0xff)
  local oidx = 6*band(ins, 0xff)
  local op = sub(bcnames, oidx+1, oidx+6)
  local s = format("%04d %s %-6s %3s ",
    pc, prefix or "  ", op, ma == 0 and "" or a)
  local d = shr(ins, 16)
  if mc == 13*128 then -- BCMjump
    return format("%s=> %04d\n", s, pc+d-0x7fff)
  end
  if mb ~= 0 then
    d = band(d, 0xff)
  elseif mc == 0 then
    return s.."\n"
  end
  local kc
  if mc == 10*128 then -- BCMstr
    kc = funck(func, -d-1)
    kc = format(#kc > 40 and '"%.40s"~' or '"%s"', gsub(kc, "%c", ctlsub))
  elseif mc == 9*128 then -- BCMnum
    kc = funck(func, d)
    if op == "TSETM " then kc = kc - 2^52 end
  elseif mc == 12*128 then -- BCMfunc
    local fi = funcinfo(funck(func, -d-1))
    if fi.ffid then
      kc = vmdef.ffnames[fi.ffid]
    else
      kc = fi.loc
    end
  elseif mc == 5*128 then -- BCMuv
    kc = funcuvname(func, d)
  end
  if ma == 5 then -- BCMuv
    local ka = funcuvname(func, a)
    if kc then kc = ka.." ; "..kc else kc = ka end
  end
  if mb ~= 0 then
    local b = shr(ins, 24)
    if kc then return format("%s%3d %3d  ; %s\n", s, b, d, kc) end
    return format("%s%3d %3d\n", s, b, d)
  end
  if kc then return format("%s%3d      ; %s\n", s, d, kc) end
  if mc == 7*128 and d > 32767 then d = d - 65536 end -- BCMlits
  return format("%s%3d\n", s, d)
end
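As its comment says, bcline returns one line of bytecode output. It uses bcnames to map an opcode number to its name, and vmdef.ffnames to get the name of a called built-in library function. Both bcnames and ffnames are defined in vmdef.lua. That file is generated automatically by buildvm (built from buildvm.c), and it lists the opcode name for every instruction index and the symbolic name of every library function.
Summary: luajit -bl actually runs jit/bcsave.lua. bcsave.lua loads the input with the standard library function loadfile, which yields a compiled function whether the input is source or bytecode. bcdump in bc.lua then walks the prototypes recursively and decodes each instruction with bcline. bcline relies on the opcode names and library function names defined in vmdef.lua, which buildvm generates at build time. During disassembly, vmdef.lua is what maps opcode indices to opcode names.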
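The bit manipulation in bcline can be restated in C to show the instruction layout it assumes:

- the opcode sits in the low 8 bits;
- A sits in the next 8 bits;
- the upper 16 bits are used either as D or split into C (low byte) and B (high byte).

The jump offset is stored with a bias. With bc.lua's 1-based pc, the printed target is pc + d - 0x7fff. The name lookup mirrors the fixed 6-character slots of vmdef.bcnames. This is a sketch: BcFields and the bc_* functions are hypothetical names, and the short name string used in the test is illustrative, not the generated table.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct { unsigned op, a, b, c, d; } BcFields;

/* Split a 32-bit instruction the way bcline() does with band/shr. */
static BcFields bc_decode(uint32_t ins)
{
  BcFields f;
  f.op = ins & 0xff;        /* band(ins, 0xff) */
  f.a = (ins >> 8) & 0xff;  /* band(shr(ins, 8), 0xff) */
  f.d = ins >> 16;          /* shr(ins, 16) */
  f.c = f.d & 0xff;         /* band(d, 0xff) when a B operand exists */
  f.b = ins >> 24;          /* shr(ins, 24) */
  return f;
}

/* Jump target as printed by bcline() for BCMjump operands. */
static int bc_jump_target(int pc, unsigned d)
{
  return pc + (int)d - 0x7fff;
}

/* Copy the 6-character slot for op out of names, dropping trailing
   blanks (bcline itself keeps the padded 6-character string). */
static void bc_opname(const char *names, unsigned op, char out[7])
{
  memcpy(out, names + 6 * op, 6);
  out[6] = '\0';
  for (int i = 5; i >= 0 && out[i] == ' '; i--)
    out[i] = '\0';
}
```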