Generally speaking, compiling source code into an executable involves the following key steps:
Source file ⇒ build the token stream ⇒ build the syntax tree ⇒ generate the executable
Take the following code as an example:
namespace space1 {
    class A {
        var int a, b;
        func __init__(int a, int b) { this.a = a, this.b = b; }
        public func int sum() { return a + b; }
    }
    func calc(int a, int b, int c, int d) {
        var A v1 = A(a, b), v2 = A(c, d);
        return math::max(v1.sum(), v2.sum());
    }
}
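Splitting the code into its smallest meaningful units is called building the token stream (tokenization). Each unit is a token, and tokens come in several kinds: keywords, identifiers, operators, the various brackets, semicolons, and so on.
For example, line 5 of the code above becomes the following token stream: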
[Keyword public] [Keyword func] [Identifier int] [Identifier sum] [SmallBracketL] [SmallBracketR]
[LargeBracketL] [Keyword return] [Identifier a] [Operator +] [Identifier b] [ExpressionEnd] [LargeBracketR]
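The token stream is then parsed and organized into a tree, so that the nesting of the code becomes parent-child relationships between nodes.
For the code above, the syntax tree looks like this: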
[Namespace space1]
  [Class A]
    [VariableDefinition]
      [Type int]
      [Identifier a]
      [Identifier b]
    [FunctionDefinition]
      [Accessibility public]
      [Type A]
      [Identifier A]
      [Argument]
        [Type int]
        [Identifier a]
      [Argument]
        [Type int]
        [Identifier b]
      [Block]
        [Expression]
          [Comma]
            [Assign]
              [CallMember]
                [Identifier this]
                [Identifier a]
              [Identifier a]
            [Assign]
              [CallMember]
                [Identifier this]
                [Identifier b]
              [Identifier b]
    [FunctionDefinition]
      [Accessibility public]
      [Type int]
      [Identifier sum]
      [Block]
        [Return]
          [Expression]
            [Add]
              [Identifier a]
              [Identifier b]
  ...
From the syntax tree, the corresponding bytecode is generated. For this I designed an instruction set modeled on Java's:
// and/or/xor/not/new are reserved in C++, hence _and, _or, _xor, _not, _new.
enum class CommandID {
    label,
    vbmov, vi32mov, vi64mov, vfmov, vomov, mbmov, mi32mov, mi64mov, mfmov, momov,
    add, sub, mul, _div, mod, ladd, lsub, lmul, _ldiv, lmod, fadd, fsub, fmul, fdiv, uadd, usub, umul, udiv, umod, badd, bsub, bmul, bdiv, bmod,
    eq, ne, gt, ge, ls, le, feq, fne, fgt, fge, fls, fle,
    _and, _or, _xor, _not, lmv, rmv, land, lor, lxor, lnot, llmv, lrmv, uand, uor, uxor, unot, ulmv, urmv, band, bor, bxor, bnot, blmv, brmv,
    ret, opop, pop,
    vbgvl, vi32gvl, vi64gvl, vfgvl, vogvl, mbgvl, mi32gvl, mi64gvl, mfgvl, mogvl,
    push0, push1,
    pvar0, pvar1, pvar2, pvar3, povar0, povar1, povar2, povar3,
    arrmem1, arromem1,
    pack, unpack,
    _new,
    jmp, jz, jp,
    setvar,
    poparg,
    push,
    pvar, povar, pglo, poglo, pstr,
    mem, omem,
    sys,
    arrnew, arrmem, arromem,
    call, ecall
};
I won't explain what each instruction means just yet.