LuaJIT Analysis (6): Analyzing the luajit -bl Command

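The luajit -bl command disassembles a LuaJIT bytecode file or a Lua script file and prints a listing of its bytecode instructions. I was curious how bytecode files and Lua source files end up going through the same code path, so let's trace it step by step: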

The luajit executable is built from luajit.c. We start at its main function:

int main(int argc, char **argv)
{
  int status;
  lua_State *L = lua_open();
  if (L == NULL) {
    l_message(argv[0], "cannot create state: not enough memory");
    return EXIT_FAILURE;
  }
  smain.argc = argc;
  smain.argv = argv;
  status = lua_cpcall(L, pmain, NULL);
  report(L, status);
  lua_close(L);
  return (status || smain.status > 0) ? EXIT_FAILURE : EXIT_SUCCESS;
}

If the Lua state is created successfully, main calls pmain in protected mode through lua_cpcall. pmain is shown below:

static int pmain(lua_State *L)
{
  struct Smain *s = &smain;
  char **argv = s->argv;
  int argn;
  int flags = 0;
  globalL = L;
  if (argv[0] && argv[0][0]) progname = argv[0];
  LUAJIT_VERSION_SYM();  /* Linker-enforced version check. */
  argn = collectargs(argv, &flags);
  if (argn < 0) {  /* Invalid args? */
    print_usage();
    s->status = 1;
    return 0;
  }
  if ((flags & FLAGS_NOENV)) {
    lua_pushboolean(L, 1);
    lua_setfield(L, LUA_REGISTRYINDEX, "LUA_NOENV");
  }
  /* Stop collector during library initialization. */
  lua_gc(L, LUA_GCSTOP, 0);
  luaL_openlibs(L);
  lua_gc(L, LUA_GCRESTART, -1);
  createargtable(L, argv, s->argc, argn);
  if (!(flags & FLAGS_NOENV)) {
    s->status = handle_luainit(L);
    if (s->status != LUA_OK) return 0;
  }
  if ((flags & FLAGS_VERSION)) print_version();
  s->status = runargs(L, argv, argn);
  if (s->status != LUA_OK){
    return 0;
  }
  if (s->argc > argn) {
    s->status = handle_script(L, argv + argn);
    if (s->status != LUA_OK) return 0;
  }
  if ((flags & FLAGS_INTERACTIVE)) {
    print_jit_status(L);
    dotty(L);
  } else if (s->argc == argn && !(flags & (FLAGS_EXEC|FLAGS_VERSION))) {
    if (lua_stdin_is_tty()) {
      print_version();
      print_jit_status(L);
      dotty(L);
    } else {
      dofile(L, NULL);  /* Executes stdin as a file. */
    }
  }
  return 0;
}

Two calls in pmain matter here:

argn = collectargs(argv, &flags);
s->status = runargs(L, argv, argn);

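The first scans the command-line arguments and records the flags; the second executes the options that were found. runargs is shown below: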

static int runargs(lua_State *L, char **argv, int argn)
{
  int i;
  for (i = 1; i < argn; i++) {
    if (argv[i] == NULL) continue;
    lua_assert(argv[i][0] == '-');
    switch (argv[i][1]) {
    case 'e': {
      const char *chunk = argv[i] + 2;
      if (*chunk == '\0') chunk = argv[++i];
      lua_assert(chunk != NULL);
      if (dostring(L, chunk, "=(command line)") != 0)
        return 1;
      break;
      }
    case 'l': {
      const char *filename = argv[i] + 2;
      if (*filename == '\0') filename = argv[++i];
      lua_assert(filename != NULL);
      if (dolibrary(L, filename))
        return 1;
      break;
      }
    case 'j': {  /* LuaJIT extension. */
      const char *cmd = argv[i] + 2;
      if (*cmd == '\0') cmd = argv[++i];
      lua_assert(cmd != NULL);
      if (dojitcmd(L, cmd))
        return 1;
      break;
      }
    case 'O':  /* LuaJIT extension. */
      if (dojitopt(L, argv[i] + 2))
        return 1;
      break;
    case 'b':  /* LuaJIT extension. */
      return dobytecode(L, argv+i);
    default: break;
    }
  }
  return LUA_OK;
}
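
The -e, -l and -j cases above share one idiom: the option value is either attached to the option letter or taken from the next argument. The following is a minimal sketch of that idiom in isolation. The helper name option_value is an assumption made for this illustration and does not exist in luajit.c.

```c
#include <stddef.h>

/*
 * Return the value of a short option that may be written either attached
 * ("-ecode") or as the following argument ("-e code").
 * `option_value` is a hypothetical helper name, not part of luajit.c.
 * On return, *i indexes the last argv entry that was consumed.
 */
static const char *option_value(char **argv, int *i)
{
  const char *value = argv[*i] + 2;   /* text right after "-x" */
  if (*value == '\0')                 /* nothing attached: take next arg */
    value = argv[++(*i)];
  return value;                       /* may be NULL if argv ended */
}
```

With argv = { "luajit", "-e", "print(1)" } and i = 1, the helper returns "print(1)" and advances i to 2; with "-lfoo" it returns "foo" and leaves i unchanged.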

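runargs performs the action that matches each option. What we care about is -bl: whenever the character after the leading '-' is b, control passes to dobytecode, which therefore covers -b, -bl, -bg and so on. The function is shown below: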

/* Save or list bytecode. */
static int dobytecode(lua_State *L, char **argv)
{
  int narg = 0;
  lua_pushliteral(L, "bcsave");
  if (loadjitmodule(L))
    return 1;
  if (argv[0][2]) {      /* -b is followed by more option letters, e.g. -bl or -bg */
    narg++;
    argv[0][1] = '-';
    lua_pushstring(L, argv[0]+1);
  }
  for (argv++; *argv != NULL; narg++, argv++){
    lua_pushstring(L, *argv);
  }
  report(L, lua_pcall(L, narg, 0, 0));
  return -1;
}
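
The in-place option rewrite in dobytecode is easy to miss, so here it is as a standalone sketch. The function name forward_b_option is hypothetical, and the sketch assumes its argument always starts with "-b", which runargs guarantees in the original code.

```c
#include <stddef.h>

/*
 * Turn a combined option such as "-bl" into "-l" in place, the way
 * dobytecode does before handing the arguments to the Lua side.
 * Returns NULL for a bare "-b" (nothing extra to forward).
 * `forward_b_option` is a hypothetical name used only for this sketch.
 * Precondition: opt begins with "-b".
 */
static char *forward_b_option(char *opt)
{
  if (opt[2] == '\0')   /* just "-b" */
    return NULL;
  opt[1] = '-';         /* "-bl" becomes "--l" */
  return opt + 1;       /* skip the first '-', leaving "-l" */
}
```

Called on a writable buffer holding "-bl" this returns "-l", and on "-bgn" it returns "-gn", which is the string the Lua side then scans letter by letter.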

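As the comment says, this function saves or lists bytecode, and it delegates the actual work to the bcsave.lua script. A non-zero argv[0][2] means that -b is followed by more characters, as in -bl; in that case the code overwrites the 'b' with '-' and pushes argv[0]+1, so -bl is forwarded as -l. The loop then pushes the remaining arguments (the file names) with lua_pushstring, and lua_pcall invokes the script with narg arguments. Leaving aside for now how LuaJIT locates the bcsave script, let's go straight to the source of bcsave.lua: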

-- Public module functions.
return {
  start = docmd -- Process -b command line option.
}

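The module exports docmd as its start entry point, so calling start runs docmd. The comment also confirms that this file handles the -b family of options. docmd is shown below: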

local function docmd(...)
  local arg = {...}
  local n = 1
  local list = false
  local ctx = {
    strip = true, arch = jit.arch, os = string.lower(jit.os),
    type = false, modname = false,
  }
  while n <= #arg do
    local a = arg[n]
    if type(a) == "string" and string.sub(a, 1, 1) == "-" and a ~= "-" then
        table.remove(arg, n)
        if a == "--" then break end
        for m=2,#a do
            local opt = string.sub(a, m, m)
            if opt == "l" then
              list = true
            elseif opt == "s" then
              ctx.strip = true
            elseif opt == "g" then
              ctx.strip = false
            else
              if arg[n] == nil or m ~= #a then usage() end
                  if opt == "e" then
                      if n ~= 1 then usage() end
                      arg[1] = check(loadstring(arg[1]))
                  elseif opt == "n" then
                    ctx.modname = checkmodname(table.remove(arg, n))
                  elseif opt == "t" then
                    ctx.type = checkarg(table.remove(arg, n), map_type, "file type")
                  elseif opt == "a" then
                    ctx.arch = checkarg(table.remove(arg, n), map_arch, "architecture")
                  elseif opt == "o" then
                    ctx.os = checkarg(table.remove(arg, n), map_os, "OS name")
                  else
                    usage()
                  end
            end
        end
    else
      n = n + 1
    end
  end
  if list then
    if #arg == 0 or #arg > 2 then usage() end
    bclist(arg[1], arg[2] or "-")
  else
    if #arg ~= 2 then usage() end
    bcsave(ctx, arg[1], arg[2])
  end
end

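From this we can see that when the options include l, bclist is called; otherwise bcsave is called. This article only follows the -bl path. Here arg[1] is the argument after -bl, that is, the input file name, and when arg[2] is missing, "-" is passed as the output name. bclist is shown below: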

local function bclist(input, output)
  local f = readfile(input)
  require("jit.bc").dump(f, savefile(output, "w"), true)
end

It calls the dump function in jit/bc.lua, passing the results of readfile and savefile. First, readfile:

local function readfile(input)
  if type(input) == "function" then return input end
  if input == "-" then input = nil end
  return check(loadfile(input))
end

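In readfile, an argument that is already a function is returned unchanged. An input of "-" is replaced by nil, which makes loadfile read from standard input. In every other case the function returns the result of the library function loadfile. So the key to handling source and bytecode uniformly is loadfile, which is defined in lib_base.c: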

LJLIB_CF(loadfile)
{
  GCstr *fname = lj_lib_optstr(L, 1);
  GCstr *mode = lj_lib_optstr(L, 2);
  int status;
  lua_settop(L, 3);  /* Ensure env arg exists. */
  status = luaL_loadfilex(L, fname ? strdata(fname) : NULL,
        mode ? strdata(mode) : NULL);
  return load_aux(L, status, 3);
}

loadfile calls luaL_loadfilex, which is defined as follows:

LUALIB_API int luaL_loadfilex(lua_State *L, const char *filename,
            const char *mode)
{
  FileReaderCtx ctx;
  int status;
  const char *chunkname;
  if (filename) {
    ctx.fp = fopen(filename, "rb");
    if (ctx.fp == NULL) {
      lua_pushfstring(L, "cannot open %s: %s", filename, strerror(errno));
      return LUA_ERRFILE;
    }
    chunkname = lua_pushfstring(L, "@%s", filename);
  } else {
    ctx.fp = stdin;
    chunkname = "=stdin";
  }
  status = lua_loadx(L, reader_file, &ctx, chunkname, mode);
  if (ferror(ctx.fp)) {
    L->top -= filename ? 2 : 1;
    lua_pushfstring(L, "cannot read %s: %s", chunkname+1, strerror(errno));
    if (filename)
      fclose(ctx.fp);
    return LUA_ERRFILE;
  }
  if (filename) {
    L->top--;
    copyTV(L, L->top-1, L->top);
    fclose(ctx.fp);
  }
  return status;
}

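luaL_loadfilex opens the file named by filename (or uses stdin when filename is NULL) and then calls lua_loadx with reader_file as the reader callback. lua_loadx is defined as follows: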

LUA_API int lua_loadx(lua_State *L, lua_Reader reader, void *data,
          const char *chunkname, const char *mode)
{
  LexState ls;
  int status;
  ls.rfunc = reader;
  ls.rdata = data;
  ls.chunkarg = chunkname ? chunkname : "?";
  ls.mode = mode;
  lj_buf_init(L, &ls.sb);
  status = lj_vm_cpcall(L, NULL, &ls, cpparser);
  lj_lex_cleanup(L, &ls);
  lj_gc_check(L);
  return status;
}

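lua_loadx runs cpparser in protected mode. When the input is a Lua script, cpparser calls lj_parse to parse and compile it; when the input is a LuaJIT bytecode file, it calls lj_bcread to read the function information from the bytecode. Either way, the result is a compiled function. For a deeper trace of cpparser, see the article on customizing the bytecode: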

Customizing LuaJIT
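
To make the source-versus-bytecode decision concrete, here is a simplified sketch. As far as I understand LuaJIT's lexer setup, the choice is made from the first byte of the input: a bytecode dump starts with the ESC character (0x1b, the first byte of LUA_SIGNATURE), which cannot begin valid Lua source. The function name chunk_is_bytecode is hypothetical, and this sketch ignores details such as a leading byte-order mark or "#" line, so treat it as an illustration rather than the real logic.

```c
#include <stddef.h>

/* First byte of a bytecode dump: ESC (0x1b), as in LUA_SIGNATURE. */
#define CHUNK_SIGNATURE_BYTE 0x1b

/*
 * Simplified sketch: classify a chunk by its first byte.
 * Returns 1 for bytecode, 0 for source text (including empty input).
 * `chunk_is_bytecode` is a hypothetical name; the real check lives
 * inside LuaJIT's lexer setup and handles more cases.
 */
static int chunk_is_bytecode(const unsigned char *buf, size_t len)
{
  return len > 0 && buf[0] == CHUNK_SIGNATURE_BYTE;
}
```

A buffer beginning with the bytes 0x1b 'L' 'J' is classified as bytecode, while one beginning with "print" is classified as source.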

Here is the code of savefile:

local function savefile(name, mode)
  if name == "-" then return io.stdout end
  return check(io.open(name, mode))
end

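When the name is "-", which is what bclist passes when no output file follows the input file, io.stdout is returned, so the listing is printed to the console. Otherwise the function returns a file handle opened with the given mode.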
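
The "-" convention used by savefile is common in command-line tools. A minimal C analogue, under the assumption that we only need the name-to-stream mapping, might look like this. The helper name open_output is hypothetical and is not a LuaJIT function.

```c
#include <stdio.h>
#include <string.h>

/*
 * C analogue of savefile: "-" selects standard output, anything else
 * is opened as a regular file. Returns NULL if the file cannot be opened.
 * `open_output` is a hypothetical helper, not a LuaJIT function.
 */
static FILE *open_output(const char *name, const char *mode)
{
  if (strcmp(name, "-") == 0)
    return stdout;
  return fopen(name, mode);
}
```

A caller using this helper should call fclose only on handles that are not stdout.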

bc.lua defines dump = bcdump. bcdump is shown below:

-- Dump bytecode instructions of a function.
local function bcdump(func, out, all)
  if not out then out = stdout end
  local fi = funcinfo(func)
  if all and fi.children then
    for n=-1,-1000000000,-1 do
      local k = funck(func, n)
      if not k then break end
      if type(k) == "proto" then bcdump(k, out, true) end
    end
  end
  out:write(format("-- BYTECODE -- %s-%d\n", fi.loc, fi.lastlinedefined))
  local target = bctargets(func)
  for pc=1,1000000000 do
    local s = bcline(func, pc, target[pc] and "=>")
    if not s then break end
    out:write(s)
  end
  out:write("\n")
  out:flush()
end

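As the comment says, this function prints the bytecode instructions of a function, and it is recursive. It first calls funcinfo on its first argument to obtain information about the function; funcinfo is a library function, jit.util.funcinfo. If the function has children, bcdump walks its constants with funck and recurses into every constant of type "proto". It then loops over all instructions and calls bcline to format each line. bcline is shown below: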

-- Return one bytecode line.
local function bcline(func, pc, prefix)
  local ins, m = funcbc(func, pc)
  if not ins then return end
  local ma, mb, mc = band(m, 7), band(m, 15*8), band(m, 15*128)
  local a = band(shr(ins, 8), 0xff)
  local oidx = 6*band(ins, 0xff)
  local op = sub(bcnames, oidx+1, oidx+6)
  local s = format("%04d %s %-6s %3s ",
    pc, prefix or "  ", op, ma == 0 and "" or a)
  local d = shr(ins, 16)
  if mc == 13*128 then -- BCMjump
    return format("%s=> %04d\n", s, pc+d-0x7fff)
  end
  if mb ~= 0 then
    d = band(d, 0xff)
  elseif mc == 0 then
    return s.."\n"
  end
  local kc
  if mc == 10*128 then -- BCMstr
    kc = funck(func, -d-1)
    kc = format(#kc > 40 and '"%.40s"~' or '"%s"', gsub(kc, "%c", ctlsub))
  elseif mc == 9*128 then -- BCMnum
    kc = funck(func, d)
    if op == "TSETM " then kc = kc - 2^52 end
  elseif mc == 12*128 then -- BCMfunc
    local fi = funcinfo(funck(func, -d-1))
    if fi.ffid then
      kc = vmdef.ffnames[fi.ffid]
    else
      kc = fi.loc
    end
  elseif mc == 5*128 then -- BCMuv
    kc = funcuvname(func, d)
  end
  if ma == 5 then -- BCMuv
    local ka = funcuvname(func, a)
    if kc then kc = ka.." ; "..kc else kc = ka end
  end
  if mb ~= 0 then
    local b = shr(ins, 24)
    if kc then return format("%s%3d %3d  ; %s\n", s, b, d, kc) end
    return format("%s%3d %3d\n", s, b, d)
  end
  if kc then return format("%s%3d      ; %s\n", s, d, kc) end
  if mc == 7*128 and d > 32767 then d = d - 65536 end -- BCMlits
  return format("%s%3d\n", s, d)
end

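As the function comment says, bcline returns one line of the bytecode listing. It uses bcnames to get the name of the opcode and vmdef.ffnames to get the name of a called library function. Both bcnames and ffnames are defined in vmdef.lua, which is generated automatically by buildvm.c. vmdef.lua holds the opcode name for every opcode index and the symbol names of the library functions.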
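
The bit arithmetic in bcline can be restated in C as a sketch. The field layout below is read directly off the shr and band calls in bcline. The struct and function names are assumptions made for this illustration, and any short name table passed to opcode_name in an example is a stand-in, not the real contents of bcnames.

```c
#include <stdint.h>
#include <string.h>

/* Operand fields of one 32-bit instruction word, split as bcline does. */
typedef struct {
  unsigned op;  /* bits 0..7: opcode index */
  unsigned a;   /* bits 8..15 */
  unsigned c;   /* bits 16..23 (low byte of d) */
  unsigned b;   /* bits 24..31 */
  unsigned d;   /* bits 16..31, used instead of b and c */
} DecodedIns;

static DecodedIns decode_ins(uint32_t ins)
{
  DecodedIns r;
  r.op = ins & 0xff;
  r.a  = (ins >> 8) & 0xff;
  r.d  = ins >> 16;
  r.c  = r.d & 0xff;
  r.b  = ins >> 24;
  return r;
}

/* Jump operands are biased; same arithmetic as bcline's pc+d-0x7fff. */
static int jump_target(int pc, unsigned d)
{
  return pc + (int)d - 0x7fff;
}

/*
 * Copy the 6-character, space-padded name for opcode `op` out of a packed
 * name table. Returns 0 if `op` is outside the table, 1 on success.
 */
static int opcode_name(const char *names, unsigned op, char out[7])
{
  size_t off = (size_t)op * 6;
  if (off + 6 > strlen(names)) return 0;
  memcpy(out, names + off, 6);
  out[6] = '\0';
  return 1;
}
```

For example, decode_ins(0x03040112) yields op 0x12, a 1, b 3 and c 4, and jump_target(10, 0x8000) is 11. Packing the names into one fixed-width string is what lets the lookup be a single substring operation with no table of pointers.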

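Summary: running luajit -bl actually invokes jit/bcsave.lua, which loads the input through the loadfile library function. Whether the input is source or bytecode, LuaJIT turns it into a compiled function with its prototype data. bcdump in bc.lua then recurses through the prototypes and uses bcline to decode each instruction. bcline relies on the opcode names and library function names defined in vmdef.lua, which buildvm.c generates automatically during the build. During disassembly, vmdef.lua therefore serves to map opcode indexes to opcode names.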
