前言
在自制编译器的Part1和Part2中,只想说明一下编译器的实现的大体逻辑和有趣的、值得探讨地方,不想对代码本身进行分析,所以对于代码的分析流程就放到了这里。
这篇文章主要是分析一下生成输出的语法树节点的过程。
函数体和流程语句被编译成中间代码的过程:
compileFunctionBody() :
public List compileFunctionBody(DefinedFunction f) {
stmts = new ArrayList<>();
scopeStack = new LinkedList<>();
breakStack = new LinkedList<>();
continueStack = new LinkedList<>();
jumpMap = new HashMap<>();
transformStmt(f.body());
checkJumpLinks(jumpMap);
return stmts;
}
刨去对于函数体作用域(ScopeStack)的压栈过程等等,剩下的逻辑可以用这张图表示:
虽然transformstmt(f.body())
,中的f.body()是一个StmtNode 但是这个StmtNode包含了很多内容,所以各种类型的节点还是会被Visitor访问到。
举例
在Part1中说过,因为流程控制语句只能定义在函数体中,所以了解了一个流程控制语句的算法就可以了解函数体的编译过程。通过一个实际的例子 来说明一下编译后的中间节点代码是如何处理流程控制语句的。
void fun(int a, char b) {
if (a == 5) {
printf("%d\n",a);
}else{
printf("%d\n",b);
}
while(b < 100) {
b ++;
if (b == 50)
break;
else
continue;
}
}
int main(int argc, char ** argv){
}
这个代码在Cflat编译器中会编译为如下的抽象语法树和中间代码语法树,两者可以对比起来看看:
抽象语法树
variables:
functions:
<> (test.cb:3)
name: "fun"
isPrivate: false
params:
parameters:
<> (test.cb:3)
name: "a"
typeNode: int
<> (test.cb:3)
name: "b"
typeNode: char
body:
<> (test.cb:3)
variables:
stmts:
<> (test.cb:4)
cond:
<> (test.cb:4)
operator: "=="
left:
<> (test.cb:4)
name: "a"
right:
<> (test.cb:4)
typeNode: int
value: 5
thenBody:
<> (test.cb:4)
variables:
stmts:
<> (test.cb:5)
expr:
<> (test.cb:5)
expr:
<> (test.cb:5)
name: "printf"
args:
<> (test.cb:5)
value: "%d\n"
<> (test.cb:5)
name: "a"
elseBody:
<> (test.cb:6)
variables:
stmts:
<> (test.cb:7)
expr:
<> (test.cb:7)
expr:
<> (test.cb:7)
name: "printf"
args:
<> (test.cb:7)
value: "%d\n"
<> (test.cb:7)
name: "b"
<> (test.cb:10)
cond:
<> (test.cb:10)
operator: "<"
left:
<> (test.cb:10)
name: "b"
right:
<> (test.cb:10)
typeNode: int
value: 100
body:
<> (test.cb:10)
variables:
stmts:
<> (test.cb:11)
expr:
<> (test.cb:11)
operator: "++"
expr:
<> (test.cb:11)
name: "b"
<> (test.cb:12)
cond:
<> (test.cb:12)
operator: "=="
left:
<> (test.cb:12)
name: "b"
right:
<> (test.cb:12)
typeNode: int
value: 50
thenBody:
<> (test.cb:13)
elseBody:
<> (test.cb:15)
<> (test.cb:20)
name: "main"
isPrivate: false
params:
parameters:
<> (test.cb:20)
name: "argc"
typeNode: int
<> (test.cb:20)
name: "argv"
typeNode: char**
body:
<> (test.cb:20)
variables:
stmts:
中间代码:
variables:
functions:
<> (test.cb:3)
name: fun
isPrivate: false
type: void(int, char)
body:
<> (test.cb:4)
cond:
<>
type: INT32
op: EQ
left:
<>
type: INT32
entity: a
right:
<>
type: INT32
value: 5
thenLabel: 26f0a63f
elseLabel: 4361bd48
<> (null)
label: 26f0a63f
<> (test.cb:5)
expr:
<>
type: INT32
expr:
<>
type: INT32
entity: printf
args:
<>
type: INT32
entry: net.loveruby.cflat.entity.ConstantEntry@53bd815b
<>
type: INT32
entity: a
<> (null)
label: 2401f4c3
<> (null)
label: 4361bd48
<> (test.cb:7)
expr:
<>
type: INT32
expr:
<>
type: INT32
entity: printf
args:
<>
type: INT32
entry: net.loveruby.cflat.entity.ConstantEntry@53bd815b
<>
type: INT32
op: S_CAST
expr:
<>
type: INT8
entity: b
<> (null)
label: 2401f4c3
<> (null)
//WhileNode 的 begLabel
label: 7637f22
<> (test.cb:10)
cond:
<>
type: INT32
op: S_LT
left:
<>
type: INT8
entity: b
right:
<>
type: INT32
value: 100
thenLabel: 4926097b
elseLabel: 762efe5d
<> (null)
label: 4926097b
<> (test.cb:11)
lhs:
<>
type: INT32
entity: b
rhs:
<>
type: INT8
op: ADD
left:
<>
type: INT8
entity: b
right:
<>
type: INT32
value: 1
<> (test.cb:12)
cond:
<>
type: INT32
op: EQ
left:
<>
type: INT8
entity: b
right:
<>
type: INT32
value: 50
thenLabel: 5d22bbb7
elseLabel: 41a4555e
<> (null)
// 正确的话就jump break 退出到endLabel
label: 5d22bbb7
<> (test.cb:13)
label: 762efe5d
<> (null)
label: 3830f1c0
<> (null)
//else 就跳回到begLabel
label: 41a4555e
<> (test.cb:15)
label: 7637f22
<> (null)
label: 3830f1c0
<> (null)
label: 7637f22
<> (null)
//endLabel
label: 762efe5d
<> (test.cb:20)
name: main
isPrivate: false
type: int(int, char**)
body:
明显,为了达成中间代码贴近于最后的汇编代码的目的,很多抽象语法树中的节点,比如IfNode都被处理成为了CJump(有条件跳转), Jump(无条件跳转)这样的中间代码节点,这些节点之间的跳转关系是通过条件表达式cond和通过十六进制字符串构成的label确定的。因为WhileNode和IfNode的逻辑比较相似,而且解析他们的算法已经在Part2中讲过了,所以这里只看一下IfNode的Visitor的处理逻辑:
public Void visit(IfNode node) {
Label thenLabel = new Label();
Label elseLabel = new Label();
Label endLabel = new Label();
Expr cond = transformExpr(node.cond());
if (node.elseBody() == null) {
cjump(node.location(), cond, thenLabel, endLabel);
label(thenLabel);
transformStmt(node.thenBody());
} else {
cjump(node.location(), cond, thenLabel, elseLabel);
label(thenLabel);
transformStmt(node.thenBody());
jump(endLabel);
label(elseLabel);
transformStmt(node.elseBody());
}
label(endLabel);
return null;
}
首先IfNode会判断是否有elseBody然后进行不同的处理方式。不论是调用cjump()还是label(),最终的目的都是向保存了语句结果的List
举cjump()为例:
private void cjump(Location loc, Expr cond, Label thenLabel, Label elseLabel) {
stmts.add(new CJump(loc, cond, thenLabel, elseLabel));
}
插入了一个CJump类。
中间代码语法树输出节点的过程
像CJump这样的中间代码树的遍历是由下一步的汇编代码生成器(CodeGenerator,这个类实现了IRVisitor)进行遍历的。我们可以发现在上文中的中间代码跳转Label还通过了16进制字符串来标识跳转到了具体的节点。这个16进制的字符串其实就是节点的hashCode():
CJump#_dump()
protected void _dump(Dumper d) {
d.printMember("cond", cond);
d.printMember("thenLabel", thenLabel);
d.printMember("elseLabel", elseLabel);
}
Dumper#printMember()
public void printMember(String name, Label memb) {
printPair(name, Integer.toHexString(memb.hashCode()));
}