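The previous section showed that bytecode is serialized to binary in a fixed format. Here we walk through how the source code handles it.
I. Writing bytecode
- Writing the header
Start with the serialize method of BytecodeSerializer. It initializes a BytecodeFileHeader object and writes it to the file through writeBinary.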
void BytecodeSerializer::serialize(BytecodeModule &BM, const SHA1 &sourceHash) {
  bytecodeModule_ = &BM;
  uint32_t cjsModuleCount = BM.getBytecodeOptions().cjsModulesStaticallyResolved
      ? BM.getCJSModuleTableStatic().size()
      : BM.getCJSModuleTable().size();
  BytecodeFileHeader header{MAGIC,
                            BYTECODE_VERSION,
                            sourceHash,
                            fileLength_,
                            BM.getGlobalFunctionIndex(),
                            BM.getNumFunctions(),
                            static_cast<uint32_t>(BM.getStringKinds().size()),
                            BM.getIdentifierCount(),
                            BM.getStringTableSize(),
                            overflowStringEntryCount_,
                            BM.getStringStorageSize(),
                            static_cast<uint32_t>(BM.getRegExpTable().size()),
                            static_cast<uint32_t>(BM.getRegExpStorage().size()),
                            BM.getArrayBufferSize(),
                            BM.getObjectKeyBufferSize(),
                            BM.getObjectValueBufferSize(),
                            BM.getCJSModuleOffset(),
                            cjsModuleCount,
                            debugInfoOffset_,
                            BM.getBytecodeOptions()};
  writeBinary(header);
  // Sizes of file and function headers are tuned for good cache line packing.
  // If you reorder the format, try to avoid headers crossing cache lines.
  visitBytecodeSegmentsInOrder(*this);
  serializeFunctionsBytecode(BM);
  for (auto &entry : BM.getFunctionTable()) {
    serializeFunctionInfo(*entry);
  }
  serializeDebugInfo(BM);
  if (isLayout_) {
    finishLayout(BM);
    serialize(BM, sourceHash);
  }
}
As you can see, the first field written is the magic number, whose value is:
const static uint64_t MAGIC = 0x1F1903C103BC1FC6;
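The corresponding bytes are shown in the figure below. Note the little-endian byte order: on disk the magic number appears as c6 1f bc 03 c1 03 19 1f.
The second field is the bytecode version. The author's version is 74, which is the 4a00 0000 in the figure above (0x4a = 74).
The third field is the hash of the source code. It uses the SHA-1 algorithm, which produces a 160-bit digest, so it occupies 20 bytes.
The fourth field is the file length. This field is 32 bits wide; in the figure below its value is 0x0aa030, which is 696368 in decimal and matches the actual file size.
The remaining fields work the same way, so we will not go through them one by one. The types of all header fields can be seen in BytecodeFileHeader.h. Hermes fills in the fields according to this fixed memory layout and then serializes them, which produces the bytecode file we see.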
struct BytecodeFileHeader {
  uint64_t magic;
  uint32_t version;
  uint8_t sourceHash[SHA1_NUM_BYTES];
  uint32_t fileLength;
  uint32_t globalCodeIndex;
  uint32_t functionCount;
  uint32_t stringKindCount; // Number of string kind entries.
  uint32_t identifierCount; // Number of strings which are identifiers.
  uint32_t stringCount; // Number of strings in the string table.
  uint32_t overflowStringCount; // Number of strings in the overflow table.
  uint32_t stringStorageSize; // Bytes in the blob of string contents.
  uint32_t regExpCount;
  uint32_t regExpStorageSize;
  uint32_t arrayBufferSize;
  uint32_t objKeyBufferSize;
  uint32_t objValueBufferSize;
  uint32_t cjsModuleOffset; // The starting module ID in this segment.
  uint32_t cjsModuleCount; // Number of modules.
  uint32_t debugInfoOffset;
  BytecodeOptions options;
  // ... (rest of the struct omitted in this excerpt)
};
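- Writing the other segments in order
Back in the serialize method of BytecodeSerializer: after writeBinary it calls visitBytecodeSegmentsInOrder, which writes the other segments using the visitor pattern. It is a function template that calls the corresponding method on the visitor for each segment. Both BytecodeSerializer and BytecodeFileFields provide their own visitor implementations; here we only look at the one in BytecodeSerializer.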
template <typename Visitor>
void visitBytecodeSegmentsInOrder(Visitor &visitor) {
  visitor.visitFunctionHeaders();
  visitor.visitStringKinds();
  visitor.visitIdentifierTranslations();
  visitor.visitSmallStringTable();
  visitor.visitOverflowStringTable();
  visitor.visitStringStorage();
  visitor.visitArrayBuffer();
  visitor.visitObjectKeyBuffer();
  visitor.visitObjectValueBuffer();
  visitor.visitRegExpTable();
  visitor.visitRegExpStorage();
  visitor.visitCJSModuleTable();
}
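A lot of data is written here. Take the function headers as an example: visitFunctionHeaders is called, which pads the output to BYTECODE_ALIGNMENT and then, through bytecodeModule_, takes each function's header from the function table and writes it out as a SmallFuncHeader (open question: the author did not find this part in the actual file). Note that these segments must be written in exactly this order, because the reader consumes them in the same order.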
void BytecodeSerializer::visitFunctionHeaders() {
  pad(BYTECODE_ALIGNMENT);
  serializeFunctionTable(*bytecodeModule_);
}
void BytecodeSerializer::serializeFunctionTable(BytecodeModule &BM) {
  for (auto &entry : BM.getFunctionTable()) {
    if (options_.stripDebugInfoSection) {
      // Change flag on the actual BF, so it's seen by serializeFunctionInfo.
      entry->mutableFlags().hasDebugInfo = false;
    }
    FunctionHeader header = entry->getHeader();
    writeBinary(SmallFuncHeader(header));
  }
}
II. Reading bytecode
We know that React Native has to call Hermes's prepareJavaScript method when it loads bytecode. So what does this method do?
std::shared_ptr<const jsi::PreparedJavaScript>
HermesRuntimeImpl::prepareJavaScript(
    const std::shared_ptr<const jsi::Buffer> &jsiBuffer,
    std::string sourceURL) {
  std::pair<std::unique_ptr<hbc::BCProvider>, std::string> bcErr{};
  auto buffer = std::make_unique<BufferAdapter>(std::move(jsiBuffer));
  vm::RuntimeModuleFlags runtimeFlags{};
  runtimeFlags.persistent = true;
  bool isBytecode = isHermesBytecode(buffer->data(), buffer->size());
#ifdef HERMESVM_PLATFORM_LOGGING
  hermesLog(
      "HermesVM", "Prepare JS on %s.", isBytecode ? "bytecode" : "source");
#endif
  // Construct the BC provider either from buffer or source.
  if (isBytecode) {
    bcErr = hbc::BCProviderFromBuffer::createBCProviderFromBuffer(
        std::move(buffer));
  } else {
    compileFlags_.lazy =
        (buffer->size() >=
         ::hermes::hbc::kDefaultSizeThresholdForLazyCompilation);
#if defined(HERMESVM_LEAN)
    bcErr.second = "prepareJavaScript source compilation not supported";
#else
    bcErr = hbc::BCProviderFromSrc::createBCProviderFromSrc(
        std::move(buffer), sourceURL, compileFlags_);
#endif
  }
  if (!bcErr.first) {
    LOG_EXCEPTION_CAUSE(
        "Compiling JS failed: %s", bcErr.second.c_str());
    throw jsi::JSINativeException(
        "Compiling JS failed: \n" + std::move(bcErr.second));
  }
  return std::make_shared<const HermesPreparedJavaScript>(
      std::move(bcErr.first), runtimeFlags, std::move(sourceURL));
}
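It does two things:
1. It checks whether the buffer is bytecode. If so it calls createBCProviderFromBuffer, otherwise it calls createBCProviderFromSrc. Here we only look at createBCProviderFromBuffer.
2. The BCProviderFromBuffer constructor obtains the file header and function header information (through the populateFromBuffer method). The constructor is shown below.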
BCProviderFromBuffer::BCProviderFromBuffer(
    std::unique_ptr<const Buffer> buffer,
    BytecodeForm form)
    : buffer_(std::move(buffer)),
      bufferPtr_(buffer_->data()),
      end_(bufferPtr_ + buffer_->size()) {
  ConstBytecodeFileFields fields;
  if (!fields.populateFromBuffer(
          {bufferPtr_, buffer_->size()}, &errstr_, form)) {
    return;
  }
  const auto *fileHeader = fields.header;
  options_ = fileHeader->options;
  functionCount_ = fileHeader->functionCount;
  globalFunctionIndex_ = fileHeader->globalCodeIndex;
  debugInfoOffset_ = fileHeader->debugInfoOffset;
  functionHeaders_ = fields.functionHeaders.data();
  stringKinds_ = fields.stringKinds;
  identifierTranslations_ = fields.identifierTranslations;
  stringCount_ = fileHeader->stringCount;
  stringTableEntries_ = fields.stringTableEntries.data();
  overflowStringTableEntries_ = fields.stringTableOverflowEntries;
  stringStorage_ = fields.stringStorage;
  arrayBuffer_ = fields.arrayBuffer;
  objKeyBuffer_ = fields.objKeyBuffer;
  objValueBuffer_ = fields.objValueBuffer;
  regExpTable_ = fields.regExpTable;
  regExpStorage_ = fields.regExpStorage;
  cjsModuleOffset_ = fileHeader->cjsModuleOffset;
  cjsModuleTable_ = fields.cjsModuleTable;
  cjsModuleTableStatic_ = fields.cjsModuleTableStatic;
}
The populateFromBuffer method of BytecodeFileFields is also templated. Note that the object calling populateFromBuffer here is a ConstBytecodeFileFields, which represents immutable bytecode fields.
template <bool Mutable>
bool BytecodeFileFields<Mutable>::populateFromBuffer(
    Array<uint8_t> buffer,
    std::string *outError,
    BytecodeForm form) {
  if (!sanityCheck(buffer, form, outError)) {
    return false;
  }
  // Helper type which populates a BytecodeFileFields. This is nested inside the
  // function so we can leverage BytecodeFileFields template types.
  struct BytecodeFileFieldsPopulator {
    /// The fields being populated.
    BytecodeFileFields &f;
    /// Current buffer position.
    Pointer<uint8_t> buf;
    /// A pointer to the bytecode file header.
    const BytecodeFileHeader *h;
    /// End of buffer.
    const uint8_t *end;
    BytecodeFileFieldsPopulator(
        BytecodeFileFields &fields,
        Pointer<uint8_t> buffer,
        const uint8_t *bufEnd)
        : f(fields), buf(buffer), end(bufEnd) {
      f.header = castData<BytecodeFileHeader>(buf);
      h = f.header;
    }
    void visitFunctionHeaders() {
      align(buf);
      f.functionHeaders =
          castArrayRef<hbc::SmallFuncHeader>(buf, h->functionCount, end);
    }
    .....
}
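Attentive readers will notice that there is a visitFunctionHeaders method here as well. Its purpose is to reuse the logic of visitBytecodeSegmentsInOrder: the populator acts as a visitor that reads the buffer in the same order and loads the results into BytecodeFileFields ahead of time, which reduces parsing time later when the bytecode is executed.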
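After reading the bytecode, Hermes parses the fields of the BytecodeFileHeader struct to obtain key information, for example whether the bundle really is in bytecode format, whether it contains any functions, and whether the bytecode version matches. Note that at this point only the header has been parsed and the segments located; the rest of the bytecode is not parsed until it is executed.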
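III. Executing bytecode
The evaluatePreparedJavaScript method mainly calls the runtime's runBytecode method. Here hermesPrep is the HermesPreparedJavaScript object returned by prepareJavaScript; its bytecodeProvider() is the BCProviderFromBuffer instance created in the previous step while parsing the header.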
jsi::Value HermesRuntimeImpl::evaluatePreparedJavaScript(
    const std::shared_ptr<const jsi::PreparedJavaScript> &js) {
  return maybeRethrow([&] {
    assert(
        dynamic_cast<const HermesPreparedJavaScript *>(js.get()) &&
        "js must be an instance of HermesPreparedJavaScript");
    auto &stats = runtime_.getRuntimeStats();
    const vm::instrumentation::RAIITimer timer{
        "Evaluate JS", stats, stats.evaluateJS};
    const auto *hermesPrep =
        static_cast<const HermesPreparedJavaScript *>(js.get());
    vm::GCScope gcScope(&runtime_);
    auto res = runtime_.runBytecode(
        hermesPrep->bytecodeProvider(),
        hermesPrep->runtimeFlags(),
        hermesPrep->sourceURL(),
        vm::Runtime::makeNullHandle<vm::Environment>());
    checkStatus(res.getStatus());
    return valueFromHermesValue(*res);
  });
}
The runBytecode method is fairly long. It mainly does the following:
- Gets the globalFunctionIndex
auto globalFunctionIndex = bytecode->getGlobalFunctionIndex();
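- Creates a GC scope (GCScope), a Domain used for garbage collection, and the corresponding runtime module, then uses globalFunctionIndex to obtain the code of the global entry point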
GCScope scope(this);
Handle<Domain> domain = makeHandle(Domain::create(this));
auto runtimeModuleRes = RuntimeModule::create(
    this, domain, nextScriptId_++, std::move(bytecode), flags, sourceURL);
if (LLVM_UNLIKELY(runtimeModuleRes == ExecutionStatus::EXCEPTION)) {
  return ExecutionStatus::EXCEPTION;
}
auto runtimeModule = *runtimeModuleRes;
auto globalCode = runtimeModule->getCodeBlockMayAllocate(globalFunctionIndex);
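A note on Domain: it is a garbage-collected proxy for runtime modules. A Domain is created empty, is populated as runtime modules are attached to it, and stays alive for the whole lifetime of those modules. Every function created under a Domain holds a strong reference to it. Conversely, once a Domain has been collected, none of the functions under it can be used any more.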
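- Runs the global entry function: through runRequireCall when the module contains CommonJS modules, otherwise by interpreting the global code directly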
if (runtimeModule->hasCJSModules()) {
  auto requireContext = RequireContext::create(
      this, domain, getPredefinedStringHandle(Predefined::dotSlash));
  return runRequireCall(
      this, requireContext, domain, *domain->getCJSModuleOffset(this, 0));
} else if (runtimeModule->hasCJSModulesStatic()) {
  return runRequireCall(
      this,
      makeNullHandle<RequireContext>(),
      domain,
      *domain->getCJSModuleOffset(this, 0));
} else {
  // Create a JSFunction which will reference count the runtime module.
  // Note that its handle gets registered in the scope, so we don't need to
  // save it. Also note that environment will often be null here, except if
  // this is local eval.
  auto func = JSFunction::create(
      this,
      domain,
      Handle<JSObject>::vmcast(&functionPrototype),
      environment,
      globalCode);
  ScopedNativeCallFrame newFrame{this,
                                 0,
                                 func.getHermesValue(),
                                 HermesValue::encodeUndefinedValue(),
                                 *thisArg};
  if (LLVM_UNLIKELY(newFrame.overflowed()))
    return raiseStackOverflow(StackOverflowKind::NativeStack);
  return shouldRandomizeMemoryLayout_
      ? interpretFunctionWithRandomStack(this, globalCode)
      : interpretFunction(globalCode);
}
To be continued...