Hermes Source Code Analysis (2): Parsing Bytecode

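The previous part explained that bytecode is serialized into a binary file with a fixed format. In this part we look at how the source code handles that.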

I. Writing Bytecode

  1. Writing the header
    Start with BytecodeSerializer's serialize method. It initializes a BytecodeFileHeader object and writes it to the file with writeBinary.
void BytecodeSerializer::serialize(BytecodeModule &BM, const SHA1 &sourceHash) {
  bytecodeModule_ = &BM;
  uint32_t cjsModuleCount = BM.getBytecodeOptions().cjsModulesStaticallyResolved
      ? BM.getCJSModuleTableStatic().size()
      : BM.getCJSModuleTable().size();
  BytecodeFileHeader header{MAGIC,
                            BYTECODE_VERSION,
                            sourceHash,
                            fileLength_,
                            BM.getGlobalFunctionIndex(),
                            BM.getNumFunctions(),
                            static_cast<uint32_t>(BM.getStringKinds().size()),
                            BM.getIdentifierCount(),
                            BM.getStringTableSize(),
                            overflowStringEntryCount_,
                            BM.getStringStorageSize(),
                            static_cast<uint32_t>(BM.getRegExpTable().size()),
                            static_cast<uint32_t>(BM.getRegExpStorage().size()),
                            BM.getArrayBufferSize(),
                            BM.getObjectKeyBufferSize(),
                            BM.getObjectValueBufferSize(),
                            BM.getCJSModuleOffset(),
                            cjsModuleCount,
                            debugInfoOffset_,
                            BM.getBytecodeOptions()};
  writeBinary(header);
  // Sizes of file and function headers are tuned for good cache line packing.
  // If you reorder the format, try to avoid headers crossing cache lines.
  visitBytecodeSegmentsInOrder(*this);
  serializeFunctionsBytecode(BM);

  for (auto &entry : BM.getFunctionTable()) {
    serializeFunctionInfo(*entry);
  }

  serializeDebugInfo(BM);

  if (isLayout_) {
    finishLayout(BM);
    serialize(BM, sourceHash);
  }
}

The first field written is the magic number, whose value is:

const static uint64_t MAGIC = 0x1F1903C103BC1FC6;

Its binary form is shown in the figure below. Note that it is stored in little-endian byte order.


[Figure: magic]

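The second field is the bytecode version. In my build it is 74, which appears as 4a00 0000 in the figure above.
The third field is the hash of the source code. It uses SHA-1, which produces a 160-bit digest, so the field occupies 20 bytes.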


[Figure: source hash]

The fourth field is the file length, a 32-bit value. In the figure below it is 0x0aa030, which is 696368 in decimal. That matches the actual file size.


[Figure: file length]

[Figure: file size]
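The helpers below make the byte order concrete. toLittleEndian64 and readLE32 are hypothetical helpers, not Hermes functions. The first lays out the magic number least significant byte first. The second decodes the 32-bit version and file-length fields in the same byte order.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Magic value quoted above from the Hermes source.
constexpr uint64_t kHermesMagic = 0x1F1903C103BC1FC6;

// Hypothetical helper: lay out a 64-bit value in little-endian order
// (least significant byte first), independent of the host byte order.
std::array<uint8_t, 8> toLittleEndian64(uint64_t value) {
  std::array<uint8_t, 8> bytes{};
  for (std::size_t i = 0; i < bytes.size(); ++i) {
    bytes[i] = static_cast<uint8_t>(value >> (8 * i));
  }
  return bytes;
}

// Hypothetical helper: decode a 32-bit little-endian field starting at p.
// p[0] is the least significant byte.
uint32_t readLE32(const uint8_t *p) {
  return static_cast<uint32_t>(p[0]) |
         (static_cast<uint32_t>(p[1]) << 8) |
         (static_cast<uint32_t>(p[2]) << 16) |
         (static_cast<uint32_t>(p[3]) << 24);
}
```

toLittleEndian64(kHermesMagic) gives c6 1f bc 03 c1 03 19 1f, which is why the constant looks reversed in a hex dump. readLE32 turns 4a 00 00 00 into 74 and 30 a0 0a 00 into 696368 (0x0aa030).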

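The remaining fields work the same way, so I will not go through them one by one. The types of all header fields are defined in BytecodeFileHeader.h. Hermes fills in the fields according to this fixed memory layout and then serializes the struct, which produces the bytecode file we have been looking at.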

struct BytecodeFileHeader {
  uint64_t magic;
  uint32_t version;
  uint8_t sourceHash[SHA1_NUM_BYTES];
  uint32_t fileLength;
  uint32_t globalCodeIndex;
  uint32_t functionCount;
  uint32_t stringKindCount; // Number of string kind entries.
  uint32_t identifierCount; // Number of strings which are identifiers.
  uint32_t stringCount; // Number of strings in the string table.
  uint32_t overflowStringCount; // Number of strings in the overflow table.
  uint32_t stringStorageSize; // Bytes in the blob of string contents.
  uint32_t regExpCount;
  uint32_t regExpStorageSize;
  uint32_t arrayBufferSize;
  uint32_t objKeyBufferSize;
  uint32_t objValueBufferSize;
  uint32_t cjsModuleOffset; // The starting module ID in this segment.
  uint32_t cjsModuleCount; // Number of modules.
  uint32_t debugInfoOffset;
  BytecodeOptions options;
};
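Every field has a fixed width and position, so the byte offsets follow directly from the struct. The sketch below mirrors only the first six fields and checks their offsets at compile time. HeaderPrefix and readHeaderPrefix are hypothetical names. Reading the bytes straight into the struct assumes a little-endian host, matching the file.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kSha1NumBytes = 20;  // SHA-1 digest: 160 bits = 20 bytes

// Hypothetical mirror of the first fields of BytecodeFileHeader;
// only the field order and types are taken from the struct above.
struct HeaderPrefix {
  uint64_t magic;
  uint32_t version;
  uint8_t sourceHash[kSha1NumBytes];
  uint32_t fileLength;
  uint32_t globalCodeIndex;
  uint32_t functionCount;
};

// Each field starts right where the previous one ends, so there is no padding.
static_assert(offsetof(HeaderPrefix, magic) == 0, "magic at byte 0");
static_assert(offsetof(HeaderPrefix, version) == 8, "version at byte 8");
static_assert(offsetof(HeaderPrefix, sourceHash) == 12, "hash at byte 12");
static_assert(offsetof(HeaderPrefix, fileLength) == 32, "length at byte 32");
static_assert(offsetof(HeaderPrefix, functionCount) == 40, "count at byte 40");

// Copy the prefix out of a raw file buffer (assumes a little-endian host).
// memcpy avoids unaligned-access problems with arbitrary buffers.
bool readHeaderPrefix(const uint8_t *data, std::size_t size, HeaderPrefix &out) {
  if (size < sizeof(HeaderPrefix)) {
    return false;
  }
  std::memcpy(&out, data, sizeof(HeaderPrefix));
  return true;
}
```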
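  2. Writing the remaining segments in order
    Back in BytecodeSerializer::serialize: after writeBinary it calls visitBytecodeSegmentsInOrder, which writes the remaining segments using the visitor pattern. This function template calls the corresponding method on the visitor for each segment. Both BytecodeSerializer and BytecodeFileFields provide a visitor implementation. Here we only look at BytecodeSerializer's.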
template <typename Visitor>
void visitBytecodeSegmentsInOrder(Visitor &visitor) {
  visitor.visitFunctionHeaders();
  visitor.visitStringKinds();
  visitor.visitIdentifierTranslations();
  visitor.visitSmallStringTable();
  visitor.visitOverflowStringTable();
  visitor.visitStringStorage();
  visitor.visitArrayBuffer();
  visitor.visitObjectKeyBuffer();
  visitor.visitObjectValueBuffer();
  visitor.visitRegExpTable();
  visitor.visitRegExpStorage();
  visitor.visitCJSModuleTable();
}

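A lot of data is written here. Take the function headers as an example. visitFunctionHeaders pads the output to BYTECODE_ALIGNMENT and then calls serializeFunctionTable. That function takes each function's header from the bytecode module and writes it into the function table as a SmallFuncHeader. (I am not sure about this part, because I did not see it in the actual file.) Note that these segments must be written in this order, because they are read back in the same order.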

void BytecodeSerializer::visitFunctionHeaders() {
  pad(BYTECODE_ALIGNMENT);
  serializeFunctionTable(*bytecodeModule_);
}

void BytecodeSerializer::serializeFunctionTable(BytecodeModule &BM) {
  for (auto &entry : BM.getFunctionTable()) {
    if (options_.stripDebugInfoSection) {
      // Change flag on the actual BF, so it's seen by serializeFunctionInfo.
      entry->mutableFlags().hasDebugInfo = false;
    }
    FunctionHeader header = entry->getHeader();
    writeBinary(SmallFuncHeader(header));
  }
}
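Stripped of Hermes specifics, the write side comes down to two steps: pad to the alignment, then append fixed-size records in function-table order. The sketch below rests on simplifying assumptions:

- FakeFuncHeader, padTo, appendRaw and writeFunctionHeaders are hypothetical names.
- The alignment is passed in as a parameter rather than taken from BYTECODE_ALIGNMENT.
- The real SmallFuncHeader uses a more compact layout than two plain integers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for a per-function header record.
struct FakeFuncHeader {
  uint32_t offset;      // where the function's bytecode starts
  uint32_t paramCount;  // number of parameters
};

// Append zero bytes until out.size() is a multiple of alignment.
void padTo(std::vector<uint8_t> &out, std::size_t alignment) {
  while (out.size() % alignment != 0) {
    out.push_back(0);
  }
}

// Append the raw bytes of a trivially copyable value (host byte order).
template <typename T>
void appendRaw(std::vector<uint8_t> &out, const T &value) {
  static_assert(std::is_trivially_copyable<T>::value, "raw copy needs POD data");
  std::size_t start = out.size();
  out.resize(start + sizeof(T));
  std::memcpy(out.data() + start, &value, sizeof(T));
}

// Align once, then write one record per function in table order.
void writeFunctionHeaders(std::vector<uint8_t> &out,
                          const std::vector<FakeFuncHeader> &headers,
                          std::size_t alignment) {
  padTo(out, alignment);
  for (const auto &h : headers) {
    appendRaw(out, h);
  }
}
```

Every record has the same size, so a reader can locate function i at segmentStart + i * sizeof(record) without scanning. That is presumably why the headers are stored as a flat array.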

II. Reading Bytecode

React Native calls Hermes's prepareJavaScript method when it loads bytecode. So what does this method do?

std::shared_ptr<const jsi::PreparedJavaScript>
HermesRuntimeImpl::prepareJavaScript(
    const std::shared_ptr<const jsi::Buffer> &jsiBuffer,
    std::string sourceURL) {
  std::pair<std::unique_ptr<hbc::BCProvider>, std::string> bcErr{};
  auto buffer = std::make_unique(std::move(jsiBuffer));
  vm::RuntimeModuleFlags runtimeFlags{};
  runtimeFlags.persistent = true;

  bool isBytecode = isHermesBytecode(buffer->data(), buffer->size());
#ifdef HERMESVM_PLATFORM_LOGGING
  hermesLog(
      "HermesVM", "Prepare JS on %s.", isBytecode ? "bytecode" : "source");
#endif

  // Construct the BC provider either from buffer or source.
  if (isBytecode) {
    bcErr = hbc::BCProviderFromBuffer::createBCProviderFromBuffer(
        std::move(buffer));
  } else {
    compileFlags_.lazy =
        (buffer->size() >=
         ::hermes::hbc::kDefaultSizeThresholdForLazyCompilation);
#if defined(HERMESVM_LEAN)
    bcErr.second = "prepareJavaScript source compilation not supported";
#else
    bcErr = hbc::BCProviderFromSrc::createBCProviderFromSrc(
        std::move(buffer), sourceURL, compileFlags_);
#endif
  }
  if (!bcErr.first) {
    LOG_EXCEPTION_CAUSE(
        "Compiling JS failed: %s", bcErr.second.c_str());
    throw jsi::JSINativeException(
        "Compiling JS failed: \n" + std::move(bcErr.second));
  }
  return std::make_shared<const HermesPreparedJavaScript>(
      std::move(bcErr.first), runtimeFlags, std::move(sourceURL));
}

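It does two things:
1. It checks whether the buffer is bytecode. If it is, it calls createBCProviderFromBuffer; otherwise it calls createBCProviderFromSrc. We only follow createBCProviderFromBuffer here.
2. The BCProviderFromBuffer constructor reads the file header and function header information (through populateFromBuffer). The constructor is implemented as follows.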

BCProviderFromBuffer::BCProviderFromBuffer(
   std::unique_ptr buffer,
   BytecodeForm form)
   : buffer_(std::move(buffer)),
     bufferPtr_(buffer_->data()),
     end_(bufferPtr_ + buffer_->size()) {
 ConstBytecodeFileFields fields;
 if (!fields.populateFromBuffer(
         {bufferPtr_, buffer_->size()}, &errstr_, form)) {
   return;
 }
 const auto *fileHeader = fields.header;
 options_ = fileHeader->options;
 functionCount_ = fileHeader->functionCount;
 globalFunctionIndex_ = fileHeader->globalCodeIndex;
 debugInfoOffset_ = fileHeader->debugInfoOffset;
 functionHeaders_ = fields.functionHeaders.data();
 stringKinds_ = fields.stringKinds;
 identifierTranslations_ = fields.identifierTranslations;
 stringCount_ = fileHeader->stringCount;
 stringTableEntries_ = fields.stringTableEntries.data();
 overflowStringTableEntries_ = fields.stringTableOverflowEntries;
 stringStorage_ = fields.stringStorage;
 arrayBuffer_ = fields.arrayBuffer;
 objKeyBuffer_ = fields.objKeyBuffer;
 objValueBuffer_ = fields.objValueBuffer;
 regExpTable_ = fields.regExpTable;
 regExpStorage_ = fields.regExpStorage;
 cjsModuleOffset_ = fileHeader->cjsModuleOffset;
 cjsModuleTable_ = fields.cjsModuleTable;
 cjsModuleTableStatic_ = fields.cjsModuleTableStatic;
}

BytecodeFileFields::populateFromBuffer is also a template method. Note that the object calling populateFromBuffer here is a ConstBytecodeFileFields, which represents immutable (read-only) bytecode fields.

template <bool Mutable>
bool BytecodeFileFields<Mutable>::populateFromBuffer(
    Array<uint8_t> buffer,
    std::string *outError,
    BytecodeForm form) {
  if (!sanityCheck(buffer, form, outError)) {
    return false;
  }

  // Helper type which populates a BytecodeFileFields. This is nested inside the
  // function so we can leverage BytecodeFileFields template types.
  struct BytecodeFileFieldsPopulator {
    /// The fields being populated.
    BytecodeFileFields &f;

    /// Current buffer position.
    Pointer buf;

    /// A pointer to the bytecode file header.
    const BytecodeFileHeader *h;

    /// End of buffer.
    const uint8_t *end;

    BytecodeFileFieldsPopulator(
        BytecodeFileFields &fields,
        Pointer buffer,
        const uint8_t *bufEnd)
        : f(fields), buf(buffer), end(bufEnd) {
      f.header = castData(buf);
      h = f.header;
    }

    void visitFunctionHeaders() {
      align(buf);
      f.functionHeaders =
          castArrayRef(buf, h->functionCount, end);
    }
.....
}
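The Pointer and Array types used above presumably depend on BytecodeFileFields's template argument. That would let one definition serve both the read-only ConstBytecodeFileFields and a writable variant. The sketch below illustrates the idea and assumes a bool template parameter. FileFieldsView and its two aliases are hypothetical names, not Hermes types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical sketch: one template yields a read-only or a writable view
// of the same buffer, selected by a bool template parameter.
template <bool Mutable>
struct FileFieldsView {
  // uint8_t* for the mutable view, const uint8_t* for the const view.
  using Pointer = std::conditional_t<Mutable, uint8_t *, const uint8_t *>;

  Pointer data = nullptr;
  std::size_t size = 0;

  // Record where the buffer lives; nothing is copied.
  bool populateFromBuffer(Pointer buffer, std::size_t len) {
    if (buffer == nullptr || len == 0) {
      return false;
    }
    data = buffer;
    size = len;
    return true;
  }
};

using ConstFileFieldsView = FileFieldsView<false>;
using MutableFileFieldsView = FileFieldsView<true>;

static_assert(std::is_same<ConstFileFieldsView::Pointer, const uint8_t *>::value,
              "const view hands out read-only pointers");
static_assert(std::is_same<MutableFileFieldsView::Pointer, uint8_t *>::value,
              "mutable view hands out writable pointers");
```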

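Attentive readers will notice that the populator also has a visitFunctionHeaders method. This is mainly so that the logic of visitBytecodeSegmentsInOrder can be reused. The populator acts as a visitor that reads the buffer in the same segment order and loads the segments into BytecodeFileFields ahead of time. This reduces the parsing work needed later, when the bytecode is executed.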
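The reuse described above is the visitor pattern driven by a single ordering function. The sketch below assumes a made-up two-segment layout in which each segment is a one-byte length followed by its payload. visitSegmentsInOrder, Writer and Reader are hypothetical names. Both the writer and the reader are driven by the same template, so the read order cannot drift from the write order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical ordering function shared by the writer and the reader.
template <typename Visitor>
void visitSegmentsInOrder(Visitor &v) {
  v.visitFunctionHeaders();
  v.visitStringStorage();
}

// Writer visitor: appends each segment as [length byte][payload].
struct Writer {
  std::vector<uint8_t> out;
  std::string funcs, strings;

  // Assumes each segment is shorter than 256 bytes.
  void writeSegment(const std::string &s) {
    out.push_back(static_cast<uint8_t>(s.size()));
    out.insert(out.end(), s.begin(), s.end());
  }
  void visitFunctionHeaders() { writeSegment(funcs); }
  void visitStringStorage() { writeSegment(strings); }
};

// Reader visitor: walks the same buffer with a cursor, checking bounds.
struct Reader {
  explicit Reader(const std::vector<uint8_t> &buffer) : in(buffer) {}

  const std::vector<uint8_t> &in;
  std::size_t pos = 0;
  std::string funcs, strings;

  std::string readSegment() {
    std::size_t len = in.at(pos++);  // at() throws if the length byte is missing
    if (pos + len > in.size()) {
      throw std::out_of_range("segment runs past end of buffer");
    }
    std::string s(in.begin() + static_cast<std::ptrdiff_t>(pos),
                  in.begin() + static_cast<std::ptrdiff_t>(pos + len));
    pos += len;
    return s;
  }
  void visitFunctionHeaders() { funcs = readSegment(); }
  void visitStringStorage() { strings = readSegment(); }
};
```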

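After reading the bytecode, the Hermes engine parses the fields of the BytecodeFileHeader struct to obtain key information. This includes whether the bundle is in bytecode format, how many functions it contains, and whether the bytecode version matches. Note that at this point only the header is parsed and the segment locations are recorded. The bytecode as a whole is not parsed yet; the remaining parts are parsed later, when the bytecode is executed.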
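The header checks mentioned above can be sketched as follows. This is a hypothetical simplification, not Hermes's own check. It relies on these assumptions:

- The field offsets follow the header struct: magic at byte 0, version at byte 8, fileLength at byte 32.
- 74 is the only accepted version.
- fileLength only has to fit within the buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Values used for illustration: the magic quoted above and the version 74
// seen in this article.
constexpr uint64_t kMagic = 0x1F1903C103BC1FC6;
constexpr uint32_t kExpectedVersion = 74;

// Decode an n-byte little-endian integer starting at p.
uint64_t readLE(const uint8_t *p, std::size_t n) {
  uint64_t v = 0;
  for (std::size_t i = 0; i < n; ++i) {
    v |= static_cast<uint64_t>(p[i]) << (8 * i);
  }
  return v;
}

// Hypothetical header check run before trusting the rest of the buffer.
bool checkHeader(const uint8_t *data, std::size_t size, std::string *err) {
  if (size < 36) {  // must at least reach the end of fileLength
    *err = "buffer too small for header";
    return false;
  }
  if (readLE(data, 8) != kMagic) {
    *err = "not bytecode: bad magic";
    return false;
  }
  if (readLE(data + 8, 4) != kExpectedVersion) {
    *err = "bytecode version mismatch";
    return false;
  }
  if (readLE(data + 32, 4) > size) {
    *err = "fileLength exceeds buffer size";
    return false;
  }
  return true;
}
```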

III. Executing Bytecode

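evaluatePreparedJavaScript mainly calls runBytecode on the underlying VM runtime (runtime_). Here hermesPrep is the HermesPreparedJavaScript created in the previous step. Its bytecodeProvider() returns the BCProviderFromBuffer instance obtained while parsing the header.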

jsi::Value HermesRuntimeImpl::evaluatePreparedJavaScript(
    const std::shared_ptr<const jsi::PreparedJavaScript> &js) {
  return maybeRethrow([&] {
    assert(
        dynamic_cast<const HermesPreparedJavaScript *>(js.get()) &&
        "js must be an instance of HermesPreparedJavaScript");
    auto &stats = runtime_.getRuntimeStats();
    const vm::instrumentation::RAIITimer timer{
        "Evaluate JS", stats, stats.evaluateJS};
    const auto *hermesPrep =
        static_cast<const HermesPreparedJavaScript *>(js.get());
    vm::GCScope gcScope(&runtime_);
    auto res = runtime_.runBytecode(
        hermesPrep->bytecodeProvider(),
        hermesPrep->runtimeFlags(),
        hermesPrep->sourceURL(),
        vm::Runtime::makeNullHandle());
    checkStatus(res.getStatus());
    return valueFromHermesValue(*res);
  });
}

runBytecode is fairly long. It mainly does the following:

  1. Get globalFunctionIndex, the index of the global entry function (globalCodeIndex in the file header)
auto globalFunctionIndex = bytecode->getGlobalFunctionIndex();
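  2. Create a GCScope (which tracks the handles allocated during this call), a Domain used by the garbage collector, and the corresponding runtime module. Then use globalFunctionIndex to get the code block of the global entry function.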
GCScope scope(this);

  Handle<Domain> domain = makeHandle(Domain::create(this));

  auto runtimeModuleRes = RuntimeModule::create(
      this, domain, nextScriptId_++, std::move(bytecode), flags, sourceURL);
  if (LLVM_UNLIKELY(runtimeModuleRes == ExecutionStatus::EXCEPTION)) {
    return ExecutionStatus::EXCEPTION;
  }
  auto runtimeModule = *runtimeModuleRes;
  auto globalCode = runtimeModule->getCodeBlockMayAllocate(globalFunctionIndex);

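A note on Domain: a Domain is the GC-managed proxy for runtime modules. It is empty when created, runtime modules are added to it as they are created, and it stays alive for the whole lifetime of those runtime modules. Every function created in a Domain holds a strong reference to that Domain, so the Domain can only be collected once none of its functions are reachable. Once a Domain has been collected, none of the functions created in it can be used any more.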
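Hermes tracks a Domain with its garbage collector, but the ownership relationships described above can be approximated with reference counting. In the sketch below, std::shared_ptr stands in for the GC. RuntimeModule, Domain, Function and createFunction are hypothetical simplifications rather than Hermes types.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins; std::shared_ptr plays the role of the GC here.
struct RuntimeModule {
  std::string sourceURL;
};

// A domain owns the runtime modules that were added to it.
struct Domain {
  std::vector<std::unique_ptr<RuntimeModule>> modules;

  void addModule(std::string sourceURL) {
    modules.push_back(
        std::make_unique<RuntimeModule>(RuntimeModule{std::move(sourceURL)}));
  }
};

// Every function created in a domain holds a strong reference to it,
// so the domain (and its modules) live as long as any of its functions.
struct Function {
  std::shared_ptr<Domain> domain;
};

Function createFunction(const std::shared_ptr<Domain> &domain) {
  return Function{domain};
}
```

The domain outlives the local handle that created it as long as any Function still references it. It is freed, together with its modules, only after the last such function is gone.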

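  3. Execute the global entry. If the runtime module contains CommonJS modules (hasCJSModules or hasCJSModulesStatic), execution starts through runRequireCall. Otherwise a JSFunction is created for the global code and run by the interpreter (interpretFunction, or interpretFunctionWithRandomStack when memory-layout randomization is enabled).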
if (runtimeModule->hasCJSModules()) {
    auto requireContext = RequireContext::create(
        this, domain, getPredefinedStringHandle(Predefined::dotSlash));
    return runRequireCall(
        this, requireContext, domain, *domain->getCJSModuleOffset(this, 0));
  } else if (runtimeModule->hasCJSModulesStatic()) {
    return runRequireCall(
        this,
        makeNullHandle(),
        domain,
        *domain->getCJSModuleOffset(this, 0));
  } else {
    // Create a JSFunction which will reference count the runtime module.
    // Note that its handle gets registered in the scope, so we don't need to
    // save it. Also note that environment will often be null here, except if
    // this is local eval.
    auto func = JSFunction::create(
        this,
        domain,
        Handle<JSObject>::vmcast(&functionPrototype),
        environment,
        globalCode);

    ScopedNativeCallFrame newFrame{this,
                                   0,
                                   func.getHermesValue(),
                                   HermesValue::encodeUndefinedValue(),
                                   *thisArg};
    if (LLVM_UNLIKELY(newFrame.overflowed()))
      return raiseStackOverflow(StackOverflowKind::NativeStack);
    return shouldRandomizeMemoryLayout_
        ? interpretFunctionWithRandomStack(this, globalCode)
        : interpretFunction(globalCode);
  }
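In the final branch, a native call frame is pushed, and the function bails out with a stack-overflow error if that fails. The RAII idea behind such a scoped frame can be sketched as follows. RegisterStack and ScopedFrame are hypothetical simplifications, not the real ScopedNativeCallFrame. Judging by its constructor arguments, the real class also stores the callee and the this value in the frame.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical fixed-capacity stack of register slots.
class RegisterStack {
 public:
  explicit RegisterStack(std::size_t capacity) : capacity_(capacity) {}

  // Reserve slots if they fit; report failure instead of throwing.
  bool tryPush(std::size_t slots) {
    if (capacity_ - used_ < slots) {
      return false;
    }
    used_ += slots;
    return true;
  }
  void pop(std::size_t slots) { used_ -= slots; }
  std::size_t used() const { return used_; }

 private:
  std::size_t capacity_;
  std::size_t used_ = 0;
};

// RAII frame: reserves slots on construction and releases them on
// destruction; overflowed() tells the caller the reservation failed.
class ScopedFrame {
 public:
  ScopedFrame(RegisterStack &stack, std::size_t slots)
      : stack_(stack), slots_(slots), overflowed_(!stack.tryPush(slots)) {}
  ~ScopedFrame() {
    if (!overflowed_) {
      stack_.pop(slots_);
    }
  }
  ScopedFrame(const ScopedFrame &) = delete;
  ScopedFrame &operator=(const ScopedFrame &) = delete;

  bool overflowed() const { return overflowed_; }

 private:
  RegisterStack &stack_;
  std::size_t slots_;
  bool overflowed_;
};
```

Reporting overflow through a flag rather than an exception matches the calling code above, which checks newFrame.overflowed() and returns an error status itself.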

To be continued...
