《自制编译器》: How Function Bodies and Control-Flow Statements Are Compiled

Preface

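In Part 1 and Part 2 of this series I only wanted to describe the overall logic of the compiler's implementation and the interesting points worth discussing, not to walk through the code itself, so the code walkthrough ended up here.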

This article walks through how the syntax-tree nodes shown in the compiler's output are generated.

How function bodies and control-flow statements are compiled into intermediate code:

compileFunctionBody():

    public List<Stmt> compileFunctionBody(DefinedFunction f) {
        // Reset the per-function state.
        stmts = new ArrayList<>();
        scopeStack = new LinkedList<>();
        breakStack = new LinkedList<>();
        continueStack = new LinkedList<>();
        jumpMap = new HashMap<>();
        // Transform the body; the generated IR statements accumulate in stmts.
        transformStmt(f.body());
        checkJumpLinks(jumpMap);
        return stmts;
    }

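Leaving aside details such as pushing the function body's scope onto scopeStack, the remaining logic can be summarized by this diagram:

[Figure: processing flow of compileFunctionBody()]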
Although f.body() in transformStmt(f.body()) is a single StmtNode, that node contains many nested nodes, so the Visitor still ends up visiting nodes of every type.
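
To make this concrete, here is a minimal sketch of the visitor pattern at work. All class names in it (VisitOrderSketch, Block, If, ExprStmt) are hypothetical stand-ins rather than the Cflat classes. It only shows that a single transformStmt() call on the outermost statement reaches every nested node.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical miniature AST and visitor; not the Cflat classes.
class VisitOrderSketch {
    interface Visitor {
        void visit(Block node);
        void visit(If node);
        void visit(ExprStmt node);
    }

    static abstract class Stmt {
        abstract void accept(Visitor v);
    }

    static class Block extends Stmt {
        final List<Stmt> stmts;
        Block(List<Stmt> stmts) { this.stmts = stmts; }
        void accept(Visitor v) { v.visit(this); }
    }

    static class If extends Stmt {
        final Stmt thenBody;
        final Stmt elseBody; // may be null
        If(Stmt thenBody, Stmt elseBody) {
            this.thenBody = thenBody;
            this.elseBody = elseBody;
        }
        void accept(Visitor v) { v.visit(this); }
    }

    static class ExprStmt extends Stmt {
        final String text;
        ExprStmt(String text) { this.text = text; }
        void accept(Visitor v) { v.visit(this); }
    }

    // Records the order in which nodes are visited.
    static class Recorder implements Visitor {
        final List<String> visited = new ArrayList<>();

        void transformStmt(Stmt node) { node.accept(this); }

        public void visit(Block node) {
            visited.add("Block");
            for (Stmt s : node.stmts) transformStmt(s);
        }

        public void visit(If node) {
            visited.add("If");
            transformStmt(node.thenBody);
            if (node.elseBody != null) transformStmt(node.elseBody);
        }

        public void visit(ExprStmt node) {
            visited.add("Expr:" + node.text);
        }
    }

    static List<String> demo() {
        Stmt body = new Block(List.of(
            new If(new ExprStmt("then"), new ExprStmt("else")),
            new ExprStmt("after")));
        Recorder r = new Recorder();
        r.transformStmt(body); // one call on the outermost node
        return r.visited;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Running it prints [Block, If, Expr:then, Expr:else, Expr:after]: the block dispatches to its children, and each child dispatches to its own visit() overload.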

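Example

As mentioned in Part 1, control-flow statements can only appear inside a function body, so understanding how one control-flow statement is handled goes a long way toward understanding how a function body is compiled. A concrete example shows how the compiled intermediate-code nodes represent control-flow statements.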

void fun(int a, char b) {
    if (a == 5) {
        printf("%d\n", a);
    } else {
        printf("%d\n", b);
    }

    while (b < 100) {
        b++;
        if (b == 50)
            break;
        else
            continue;
    }
}


int main(int argc, char **argv) {

}

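The Cflat compiler turns this code into the following abstract syntax tree and intermediate-code tree; the two are worth comparing side by side:

Abstract syntax tree: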

variables:
functions:
    <> (test.cb:3)
    name: "fun"
    isPrivate: false
    params:
        parameters:
            <> (test.cb:3)
            name: "a"
            typeNode: int
            <> (test.cb:3)
            name: "b"
            typeNode: char
    body:
        <> (test.cb:3)
        variables:
        stmts:
            <> (test.cb:4)
            cond:
                <> (test.cb:4)
                operator: "=="
                left:
                    <> (test.cb:4)
                    name: "a"
                right:
                    <> (test.cb:4)
                    typeNode: int
                    value: 5
            thenBody:
                <> (test.cb:4)
                variables:
                stmts:
                    <> (test.cb:5)
                    expr:
                        <> (test.cb:5)
                        expr:
                            <> (test.cb:5)
                            name: "printf"
                        args:
                            <> (test.cb:5)
                            value: "%d\n"
                            <> (test.cb:5)
                            name: "a"
            elseBody:
                <> (test.cb:6)
                variables:
                stmts:
                    <> (test.cb:7)
                    expr:
                        <> (test.cb:7)
                        expr:
                            <> (test.cb:7)
                            name: "printf"
                        args:
                            <> (test.cb:7)
                            value: "%d\n"
                            <> (test.cb:7)
                            name: "b"
            <> (test.cb:10)
            cond:
                <> (test.cb:10)
                operator: "<"
                left:
                    <> (test.cb:10)
                    name: "b"
                right:
                    <> (test.cb:10)
                    typeNode: int
                    value: 100
            body:
                <> (test.cb:10)
                variables:
                stmts:
                    <> (test.cb:11)
                    expr:
                        <> (test.cb:11)
                        operator: "++"
                        expr:
                            <> (test.cb:11)
                            name: "b"
                    <> (test.cb:12)
                    cond:
                        <> (test.cb:12)
                        operator: "=="
                        left:
                            <> (test.cb:12)
                            name: "b"
                        right:
                            <> (test.cb:12)
                            typeNode: int
                            value: 50
                    thenBody:
                        <> (test.cb:13)
                    elseBody:
                        <> (test.cb:15)
    <> (test.cb:20)
    name: "main"
    isPrivate: false
    params:
        parameters:
            <> (test.cb:20)
            name: "argc"
            typeNode: int
            <> (test.cb:20)
            name: "argv"
            typeNode: char**
    body:
        <> (test.cb:20)
        variables:
        stmts:

Intermediate code:

variables:
functions:
    <> (test.cb:3)
    name: fun
    isPrivate: false
    type: void(int, char)
    body:
        <> (test.cb:4)
        cond:
            <>
            type: INT32
            op: EQ
            left:
                <>
                type: INT32
                entity: a
            right:
                <>
                type: INT32
                value: 5
        thenLabel: 26f0a63f
        elseLabel: 4361bd48
        <> (null)
        label: 26f0a63f
        <> (test.cb:5)
        expr:
            <>
            type: INT32
            expr:
                <>
                type: INT32
                entity: printf
            args:
                <>
                type: INT32
                entry: net.loveruby.cflat.entity.ConstantEntry@53bd815b
                <>
                type: INT32
                entity: a
        <> (null)
        label: 2401f4c3
        <> (null)
        label: 4361bd48
        <> (test.cb:7)
        expr:
            <>
            type: INT32
            expr:
                <>
                type: INT32
                entity: printf
            args:
                <>
                type: INT32
                entry: net.loveruby.cflat.entity.ConstantEntry@53bd815b
                <>
                type: INT32
                op: S_CAST
                expr:
                    <>
                    type: INT8
                    entity: b
        <> (null)
        label: 2401f4c3
        <> (null)

        // begLabel of the WhileNode
        label: 7637f22

        <> (test.cb:10)
        cond:
            <>
            type: INT32
            op: S_LT
            left:
                <>
                type: INT8
                entity: b
            right:
                <>
                type: INT32
                value: 100
        thenLabel: 4926097b
        elseLabel: 762efe5d
        <> (null)
        label: 4926097b
        <> (test.cb:11)
        lhs:
            <>
            type: INT32
            entity: b
        rhs:
            <>
            type: INT8
            op: ADD
            left:
                <>
                type: INT8
                entity: b
            right:
                <>
                type: INT32
                value: 1
        <> (test.cb:12)
        cond:
            <>
            type: INT32
            op: EQ
            left:
                <>
                type: INT8
                entity: b
            right:
                <>
                type: INT32
                value: 50
        thenLabel: 5d22bbb7
        elseLabel: 41a4555e
        <> (null)

        // if the condition holds, the break jumps out to endLabel
        label: 5d22bbb7
        <> (test.cb:13)
        label: 762efe5d

        <> (null)
        label: 3830f1c0
        <> (null)

        // otherwise, the continue jumps back to begLabel
        label: 41a4555e
        <> (test.cb:15)
        label: 7637f22

        <> (null)
        label: 3830f1c0
        <> (null)
        label: 7637f22

        <> (null)
        //endLabel
        label: 762efe5d
    <> (test.cb:20)
    name: main
    isPrivate: false
    type: int(int, char**)
    body:
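
To make the loop portion of the dump above easier to follow, here is a minimal sketch of how a while loop with break and continue can be flattened into labels and jumps. It is modeled on the shape of the dump, not copied from the Cflat source: WhileLoweringSketch, the string-based statements, and the readable label names (beg0, body1, end2) are all assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch: statements and labels are plain strings.
class WhileLoweringSketch {
    private final List<String> stmts = new ArrayList<>();
    private final Deque<String> breakStack = new ArrayDeque<>();
    private final Deque<String> continueStack = new ArrayDeque<>();
    private int nextLabel = 0;

    private String newLabel(String hint) {
        return hint + (nextLabel++);
    }

    void lowerWhile(String cond, List<String> body) {
        String begLabel = newLabel("beg");
        String bodyLabel = newLabel("body");
        String endLabel = newLabel("end");

        stmts.add("label " + begLabel);
        stmts.add("cjump " + cond + " ? " + bodyLabel + " : " + endLabel);
        stmts.add("label " + bodyLabel);
        // Inside the body, continue targets begLabel and break targets endLabel.
        continueStack.push(begLabel);
        breakStack.push(endLabel);
        for (String s : body) lowerSimple(s);
        breakStack.pop();
        continueStack.pop();
        stmts.add("jump " + begLabel); // loop back and re-test the condition
        stmts.add("label " + endLabel);
    }

    private void lowerSimple(String s) {
        if (s.equals("break")) {
            stmts.add("jump " + breakStack.peek());
        } else if (s.equals("continue")) {
            stmts.add("jump " + continueStack.peek());
        } else {
            stmts.add(s);
        }
    }

    static List<String> demo() {
        WhileLoweringSketch g = new WhileLoweringSketch();
        g.lowerWhile("b < 100", List.of("b = b + 1", "break"));
        return g.stmts;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

The stacks are what let a break or continue find its target: whichever loop is innermost has its labels on top, so nested loops resolve correctly without the statement knowing which loop encloses it.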

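As the dump shows, to bring the intermediate code closer to the final assembly, many AST nodes such as IfNode are turned into intermediate-code nodes such as CJump (conditional jump) and Jump (unconditional jump). The jump relationships between these nodes are determined by the condition expression cond and by labels, which the dump prints as hexadecimal strings. Because WhileNode and IfNode follow similar logic, and the algorithm for handling them was already covered in Part 2, we only look at the Visitor logic for IfNode here: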

    public Void visit(IfNode node) {
        Label thenLabel = new Label();
        Label elseLabel = new Label();
        Label endLabel = new Label();

        Expr cond = transformExpr(node.cond());
        if (node.elseBody() == null) {
            // No else branch: a false condition jumps straight to endLabel.
            cjump(node.location(), cond, thenLabel, endLabel);
            label(thenLabel);
            transformStmt(node.thenBody());
        } else {
            cjump(node.location(), cond, thenLabel, elseLabel);
            label(thenLabel);
            transformStmt(node.thenBody());
            // Skip over the else branch once the then branch has run.
            jump(endLabel);
            label(elseLabel);
            transformStmt(node.elseBody());
        }
        label(endLabel);
        return null;
    }

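The IfNode visitor first checks whether there is an elseBody and handles the two cases differently. Whether it calls cjump() or label(), the goal is the same: append an intermediate-code node object to the List (stmts) that holds the resulting statements.

Take cjump() as an example: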

    private void cjump(Location loc, Expr cond, Label thenLabel, Label elseLabel) {
        stmts.add(new CJump(loc, cond, thenLabel, elseLabel));
    }

It appends a CJump object to stmts.
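
The IfNode logic can be tried out in isolation with a small sketch. It mirrors the control flow of visit(IfNode) above, but IfLoweringSketch, the string-based statements, and the readable label names are assumptions made for illustration; the real code builds CJump, Jump, and label node objects instead of strings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: each IR statement is a plain string.
class IfLoweringSketch {
    private final List<String> stmts = new ArrayList<>();
    private int nextLabel = 0;

    private String newLabel(String hint) {
        return hint + (nextLabel++);
    }

    // Each helper only appends one statement to the list.
    private void cjump(String cond, String thenLabel, String elseLabel) {
        stmts.add("cjump " + cond + " ? " + thenLabel + " : " + elseLabel);
    }

    private void jump(String target) {
        stmts.add("jump " + target);
    }

    private void label(String name) {
        stmts.add("label " + name);
    }

    // elseBody may be null, as in the visitor above.
    void lowerIf(String cond, List<String> thenBody, List<String> elseBody) {
        String thenLabel = newLabel("then");
        String elseLabel = newLabel("else");
        String endLabel = newLabel("end");

        if (elseBody == null) {
            cjump(cond, thenLabel, endLabel);
            label(thenLabel);
            stmts.addAll(thenBody);
        } else {
            cjump(cond, thenLabel, elseLabel);
            label(thenLabel);
            stmts.addAll(thenBody);
            jump(endLabel);
            label(elseLabel);
            stmts.addAll(elseBody);
        }
        label(endLabel);
    }

    static List<String> demoWithElse() {
        IfLoweringSketch g = new IfLoweringSketch();
        g.lowerIf("a == 5", List.of("call printf(a)"), List.of("call printf(b)"));
        return g.stmts;
    }

    static List<String> demoWithoutElse() {
        IfLoweringSketch g = new IfLoweringSketch();
        g.lowerIf("a == 5", List.of("call printf(a)"), null);
        return g.stmts;
    }

    public static void main(String[] args) {
        demoWithElse().forEach(System.out::println);
    }
}
```

With an else branch the result has seven statements (cjump, label, then body, jump, label, else body, label), the same sequence as the first seven entries of the body of fun in the intermediate-code dump.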

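How the intermediate-code tree nodes are dumped

Intermediate-code trees built from nodes such as CJump are traversed in the next stage by the assembly code generator (CodeGenerator, a class that implements IRVisitor). Notice also that the jump labels in the intermediate-code dump above are identified by hexadecimal strings that show which node a jump leads to. That hexadecimal string is simply the hashCode() of the Label object:

CJump#_dump()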

    protected void _dump(Dumper d) {
        d.printMember("cond", cond);
        d.printMember("thenLabel", thenLabel);
        d.printMember("elseLabel", elseLabel);
    }

Dumper#printMember()

    public void printMember(String name, Label memb) {
        printPair(name, Integer.toHexString(memb.hashCode()));
    }
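
The following small sketch shows what this naming scheme implies. MiniLabel and dumpName() are hypothetical; the sketch assumes, as the dump suggests, that the label class does not override hashCode(), so the printed value is the object's identity hash.

```java
// Hypothetical sketch of naming labels by their hash code.
class LabelDumpSketch {
    // Stand-in for a label class that does not override hashCode().
    static final class MiniLabel {}

    static String dumpName(Object label) {
        return Integer.toHexString(label.hashCode());
    }

    public static void main(String[] args) {
        MiniLabel endLabel = new MiniLabel();

        // The same object always prints the same name within one run,
        // which is what lets a reader match a jump to its target label.
        System.out.println(dumpName(endLabel).equals(dumpName(endLabel)));

        // toHexString does not pad with leading zeros, so names can be
        // shorter than eight characters.
        System.out.println(dumpName(Integer.valueOf(255)));
    }
}
```

Two consequences follow. Identity hash codes generally differ from one run of the JVM to the next, so the label names in a dump are only meaningful within that dump. And because Integer.toHexString() does not pad, a seven-character name such as 7637f22 is normal.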
