[语法分析]SLR(1)分析预测表Action表中移进的构造

    在构造之前当然要先抽取接口,让其它语法分析器能愉快的跟LR分析器相处

/* syntax-analyser.h */
void initialLRStates(void);
void destructLRStates(void);

第一个函数是构造函数,第二个则是析构函数。这些东西跟LR分析器的构造函数是独立,因为分析预测表(也就是LR状态集)是一个静态数据结构,比如我曾经说在状态Sentence_LBraceBasicBlockRBrace_1中,可以委托一个新的LR分析器对内部的基本块进行分析,而不用当前LR分析器再去纠结于大括号嵌套,而无论当前有多少个LR分析器在分析器栈中,它们使用的分析预测表是同一个。所以在语法分析开始之前,要记得调用initialLRStates,而结束之后要记得destructLRStates

LR状态数据结构

    一个LR状态(也就是分析预测表的一行)由两部分构成,一边是Action表,另一边是Goto表。Goto表很简单,跟DFA差不多,只不过DFA接受的是字符,而LR状态接受的是非终结符,表中每一项是一个指针,指向该Goto的LR状态。而Action则稍许复杂,可以为移进,或者是规约,所以每一项不会是一个简单的值,而应该是一个函数入口地址。所以每个LR状态对应的数据结构为

typedef void (*Action)(struct LRAnalyser*, struct Token*);

struct LRState {
	Action actions[SKIP];
	struct LRState* gotoes[/* some value */];
};

Action是一个函数入口地址类型,Action表有SKIP项,SKIPAcceptType枚举中最后一个,除了表示一种可忽略的终结符类型,还可以表示不可忽略的终结符类型数量,所以这里就顺理成章地成为了数组大小。现在为了得到some value的值,也可以采取类似的方法,构造一个非终结符枚举出来

/* const.h */
typedef enum {
	JerryLRSyntaxRoot, BasicBlock = 0, Sentence,
	IfElseBranch, ElseBlock, WhileLoop, Declaration,
	Assignment, Variable, Initialization, VariableRegister,
	NR_GOTO_TABLE_COLUMN
} LRNonTerminal;

JerryLRSyntaxRoot表示LR语法中顶级产生式的左部,它是不用出现在Goto表中的,所以计数从它后面一个BasicBlock开始。但是,some value并不等于NR_GOTO_TABLE_COLUMN,而应该是

struct LRState {
	Action actions[SKIP];
	struct LRState* gotoes[NR_GOTO_TABLE_COLUMN + 1];
};

多出来的一个项目在实现LR分析器的consumeNonTerminal时再解释。

    除此之外,还需要一个枚举,告诉我们状态一共有多少个,一边确定分析预测表的大小

/* const.h */
typedef enum {
    LRState0 = 0, LRState1,
    BasicBlock_SentenceBasicBlock_1, BasicBlock_SentenceBasicBlock_2,
    Sentence_EoS_1,

    Sentence_IOAssignmentEoS_1, Sentence_IOAssignmentEoS_2,
    Sentence_IOAssignmentEoS_3,

    Sentence_BreakEoS_1, Sentence_BreakEoS_2,

    Sentence_AssignmentEoS_1, Sentence_AssignmentEoS_2,

    IfElseBranch_1, IfElseBranch_2, IfElseBranch_3, IfElseBranch_4,
    IfElseBranch_5, IfElseBranch_6, ElseBlock_1, ElseBlock_2,

    WhileLoop_1, WhileLoop_2, WhileLoop_3, WhileLoop_4, WhileLoop_5,

    Sentence_LBraceBasicBlockRBrace_1, Sentence_LBraceBasicBlockRBrace_2,
    Sentence_LBraceBasicBlockRBrace_3,

    Declaration_1, VariableRegister_VariableInitialization_1,
    VariableRegister_VariableInitialization_2,
    Initialization_AssignAssignment_1, Initialization_AssignAssignment_2,
    DeclarationVariableRegister, Declaration_3,

    NR_LR_STATE
} LRStateFamily;

这样在构造分析表的函数开头就应该是

struct LRState* jerryLRStates = NULL; // LR状态数组指针

void initialLRStates(void)
{
    jerryLRStates = (struct LRState*)
                            allocate(NR_LR_STATE * sizeof(struct LRState));
    memset(jerryLRStates, 0, NR_LR_STATE * sizeof(struct LRState));


    ...
}

移进

    每个移进函数应该是形如这样的:

 

    void some_shift(struct LRAnalyser* self, struct Token* token)

    {

      // 将 转移目标状态 压入 self 的状态栈

      // 将 token 压入 self 的符号栈

    }

 

现在有个小问题,需要见那个token压入符号栈吗?实际操作中会发现,大部分的token都是“无意义”的。这里的“无意义”是指,在以后规约时,它们对于规约后的非终结符的内容没有影响,如在规约

WhileLoop -> <WHILE> <LPARENT> Assignment <RPARENT> BasicBlock

时,规约函数没有必要去检查第一个符号是不是<WHILE>以及第二个符号是不是<LPARENT>——如果不是,分析器何以走到这个状态来呢?所以大部分情况下没有必要将token压入栈中。

    不过,有一些token还是有意义的,比如规约

    Sentence -> <IO> Assignment <EOS>

时,第一个符号也许是<READ>,也许是<WRITE>,这种特殊情况则采取特殊的方案:回头去看看IONode的构造函数,它只需要一个AcceptType作为参数,而无需后继的Assignment,也就是说当读入一个<IO>时,可以直接利用它来构造一个IONode,并压入符号栈;最后在状态Sentence_IOAssignmentEoS_3进行规约时,再将符号栈栈顶的Assignment弹出,追加到次栈顶IONode中(注:<EOS>是无意义符号,不入栈,因此栈顶是Assignment)。除此之外,还有一些有意义符号也可以这样办,比如产生式

    Declaration -> <TYPE> VariableRegister <EOS>

中的主角DeclarationNode,它在构造时也只需要传入一个类型参数,并且在构造实现中,这个类型参数还被转化过:

    node->dectype = type - INTEGER_TYPE + INTEGER;

从声明类型变为数据类型。

    那么,移进函数的实现分为这样两种形式:

 

    void normal_shift(struct LRAnalyser* self, struct Token* token)

    {

      self->stateStack->push(self->stateStack, /* 目标状态指针 */ );

    }

 

    void declare_shift(struct LRAnalyser* self, struct Token* token)

    {

      self->stateStack->push(self->stateStack, jerryLRStates + Declaration_1);

      self->symbolStack->push(self->stateStack, newDeclarationNode(token->type));

    }

 

    void io_shift(struct LRAnalyser* self, struct Token* token)

    {

      self->stateStack->push(self->stateStack, jerryLRStates + Sentence_IOAssignmentEoS_1);

      self->symbolStack->push(self->stateStack, newIONode(token->type));

    }

 

不过,如果手动去写每一个表项目对应的移进函数,那会很痛苦。幸而,C语言提供了捷径,这也是我喜欢C语言的部分原因——宏。可以设计一个简单的宏,然后利用展开这个宏则得到一个移进函数,这样就好多了。这里提供一种实现的参考

#define linkName(f, l) f ## l

#define shift(current, encounter, target)                                 \
    static void linkName(current, encounter)(struct LRAnalyser* self,     \
                                             struct Token* t)             \
    {                                                                     \
        self->stateStack->push(self->stateStack, jerryLRStates + target); \
    } /*******************************************************************/

第一个宏linkName用来产生函数名标识符,后面一个宏才是重要的移进函数模板,它接受3个参数,前两个参数并不在函数中出现,而仅仅传递给linkName来产生函数名,最后一个参数target应该是LR状态枚举中的一个,它告诉该函数应该将什么状态入栈。当然,仅仅有这一个宏是不够的,下面再来介绍一个更为强大的宏

#define shiftFirstSentence(current)                                         \
    shift(current, EOS,    Sentence_EoS_1)                                  \
    shift(current, LBRACE, Sentence_LBraceBasicBlockRBrace_1)               \
    shift(current, IF,     IfElseBranch_1)                                  \
    shift(current, WHILE,  WhileLoop_1)                                     \
    shift(current, BREAK,  Sentence_BreakEoS_1)                             \
    static void linkName(current, INTEGER_TYPE)(struct LRAnalyser* self,    \
                                                 struct Token* t)           \
    {                                                                       \
        self->symbolStack->push(self->symbolStack,                          \
                                newDeclarationNode(t->type));               \
        self->stateStack->push(self->stateStack,                            \
                               jerryLRStates + Declaration_1);              \
    }                                                                       \
                                                                            \
    static void linkName(current, REAL_TYPE)(struct LRAnalyser* self,       \
                                              struct Token* t)              \
    {                                                                       \
        self->symbolStack->push(self->symbolStack,                          \
                                newDeclarationNode(t->type));               \
        self->stateStack->push(self->stateStack,                            \
                               jerryLRStates + Declaration_1);              \
    }                                                                       \
                                                                            \
    static void linkName(current, READ)(struct LRAnalyser* self,            \
                                        struct Token* t)                    \
    {                                                                       \
        self->symbolStack->push(self->symbolStack, newIONode(t->type));     \
        self->stateStack->push(self->stateStack,                            \
                               jerryLRStates + Sentence_IOAssignmentEoS_1); \
    }                                                                       \
                                                                            \
    static void linkName(current, WRITE)(struct LRAnalyser* self,           \
                                         struct Token* t)                   \
    {                                                                       \
        self->symbolStack->push(self->symbolStack, newIONode(t->type));     \
        self->stateStack->push(self->stateStack,                            \
                               jerryLRStates + Sentence_IOAssignmentEoS_1); \
    } /*********************************************************************/

正如它的名字shiftFirstSentence喻示的那样,这个宏展开以后将得到一系列函数,这些函数将一个状态与First(Sentence)集合中的每一项联系起来,适用于这个宏的状态是那些即将接受一个语句的状态,包括LRState0BasicBlock_SentenceBasicBlock_1IfElseBranch_4WhileLoop_4ElseBlock_1

Sentence_LBraceBasicBlockRBrace_1这些状态,但是状态Sentence_LBraceBasicBlockRBrace_1如之前所说,它直接产生一个委托的LR分析器去分析内部基本块,因此用不到这个宏。

    当然,光生成了函数还不够,还得将函数与对应的表项目关联起来,这就涉及到initialLRStates内部的操作。为了省代码,这一过程也可以用宏来辅助

#define setAction(state, encounter)                                    \
    jerryLRStates[state].actions[encounter] = linkName(state, encounter)

#define setActionFirstAssignment(state) \
    setAction(state, PLUS);             \
    setAction(state, MINUS);            \
    setAction(state, NOT);              \
    setAction(state, INTEGER);          \
    setAction(state, REAL);             \
    setAction(state, IDENT);            \
    setAction(state, LPARENT) /*********/

#define setActionFirstSentence(state) \
    setAction(state, EOS);            \
    setAction(state, READ);           \
    setAction(state, WRITE);          \
    setAction(state, LBRACE);         \
    setAction(state, IF);             \
    setAction(state, WHILE);          \
    setAction(state, BREAK);          \
    setAction(state, INTEGER_TYPE);   \
    setAction(state, REAL_TYPE);      \
    setActionFirstAssignment(state) ///

第一个宏参数中的中stateLRStateFamily中的一个,encounter则是AcceptType中的一个。每次展开这个宏就可以得到一个赋值语句,关联表项与函数。而后面两个宏则分别对应于一次关联整个First(Assignment)和First(Sentence)的语句集。注意语句结尾的那些分号。

附:移进相关的代码实现

/*
 * 函数构造
 */
#define shift(current, encounter, target)                                 \
    static void linkName(current, encounter)(struct LRAnalyser* self,     \
                                             struct Token* t)             \
    {                                                                     \
        self->stateStack->push(self->stateStack, jerryLRStates + target); \
        printf( #current " encouters " #encounter "\n");                  \
    } /*******************************************************************/
shift(Sentence_AssignmentEoS_1, EOS, Sentence_AssignmentEoS_2)
shift(Sentence_BreakEoS_1, EOS, Sentence_BreakEoS_2)
shift(Sentence_LBraceBasicBlockRBrace_2, RBRACE,
      Sentence_LBraceBasicBlockRBrace_3)
shift(IfElseBranch_1, LPARENT, IfElseBranch_2)
shift(IfElseBranch_3, RPARENT, IfElseBranch_4)
shift(IfElseBranch_5, ELSE, ElseBlock_1)
shift(WhileLoop_1, LPARENT, WhileLoop_2)
shift(WhileLoop_3, RPARENT, WhileLoop_4)
shift(VariableRegister_VariableInitialization_1, ASSIGN,
      Initialization_AssignAssignment_1)
shift(DeclarationVariableRegister, EOS, Declaration_3)
shift(Sentence_IOAssignmentEoS_2, EOS, Sentence_IOAssignmentEoS_3)
#define shiftFirstSentence(current)                                         \
    shift(current, EOS,    Sentence_EoS_1)                                  \
    shift(current, LBRACE, Sentence_LBraceBasicBlockRBrace_1)               \
    shift(current, IF,     IfElseBranch_1)                                  \
    shift(current, WHILE,  WhileLoop_1)                                     \
    shift(current, BREAK,  Sentence_BreakEoS_1)                             \
    static void linkName(current, INTEGER_TYPE)(struct LRAnalyser* self,    \
                                                 struct Token* t)           \
    {                                                                       \
        self->symbolStack->push(self->symbolStack,                          \
                                newDeclarationNode(t->type));               \
        self->stateStack->push(self->stateStack,                            \
                               jerryLRStates + Declaration_1);              \
    }                                                                       \
                                                                            \
    static void linkName(current, REAL_TYPE)(struct LRAnalyser* self,       \
                                              struct Token* t)              \
    {                                                                       \
        self->symbolStack->push(self->symbolStack,                          \
                                newDeclarationNode(t->type));               \
        self->stateStack->push(self->stateStack,                            \
                               jerryLRStates + Declaration_1);              \
    }                                                                       \
                                                                            \
    static void linkName(current, READ)(struct LRAnalyser* self,            \
                                        struct Token* t)                    \
    {                                                                       \
        self->symbolStack->push(self->symbolStack, newIONode(t->type));     \
        self->stateStack->push(self->stateStack,                            \
                               jerryLRStates + Sentence_IOAssignmentEoS_1); \
    }                                                                       \
                                                                            \
    static void linkName(current, WRITE)(struct LRAnalyser* self,           \
                                         struct Token* t)                   \
    {                                                                       \
        self->symbolStack->push(self->symbolStack, newIONode(t->type));     \
        self->stateStack->push(self->stateStack,                            \
                               jerryLRStates + Sentence_IOAssignmentEoS_1); \
    } /*********************************************************************/
shiftFirstSentence(LRState0)
shiftFirstSentence(BasicBlock_SentenceBasicBlock_1)
shiftFirstSentence(IfElseBranch_4)
shiftFirstSentence(WhileLoop_4)
shiftFirstSentence(ElseBlock_1)
#undef shiftFirstSentence
#undef shift

/*
 * initialState 函数绑定
 */
void initialLRStates(void)
{
    jerryLRStates = (struct LRState*)
                            allocate(NR_LR_STATE * sizeof(struct LRState));
    memset(jerryLRStates, 0, NR_LR_STATE * sizeof(struct LRState));
    #define setAction(state, encounter)                                    \
        jerryLRStates[state].actions[encounter] = linkName(state, encounter)
    setAction(Sentence_AssignmentEoS_1, EOS);
    setAction(Sentence_BreakEoS_1, EOS);
    setAction(Sentence_LBraceBasicBlockRBrace_2, RBRACE);
    setAction(IfElseBranch_1, LPARENT);
    setAction(IfElseBranch_3, RPARENT);
    setAction(IfElseBranch_5, ELSE);
    setAction(WhileLoop_1, LPARENT);
    setAction(WhileLoop_3, RPARENT);
    setAction(VariableRegister_VariableInitialization_1, ASSIGN);
    setAction(DeclarationVariableRegister, EOS);
    setAction(Sentence_IOAssignmentEoS_2, EOS);

    #define setActionFirstAssignment(state) \
        setAction(state, PLUS);             \
        setAction(state, MINUS);            \
        setAction(state, NOT);              \
        setAction(state, INTEGER);          \
        setAction(state, REAL);             \
        setAction(state, IDENT);            \
        setAction(state, LPARENT) /*********/

    #define setActionFirstSentence(state) \
        setAction(state, EOS);            \
        setAction(state, READ);           \
        setAction(state, WRITE);          \
        setAction(state, LBRACE);         \
        setAction(state, IF);             \
        setAction(state, WHILE);          \
        setAction(state, BREAK);          \
        setAction(state, INTEGER_TYPE);   \
        setAction(state, REAL_TYPE);      \
        setActionFirstAssignment(state) ///
    setActionFirstSentence(LRState0);
    setActionFirstSentence(BasicBlock_SentenceBasicBlock_1);
    setActionFirstSentence(IfElseBranch_4);
    setActionFirstSentence(IfElseBranch_5);
    setActionFirstSentence(WhileLoop_4);
    setActionFirstSentence(ElseBlock_1);

    // 后面还有 规约 的部分实现和 Goto 表的填充
}

 

你可能感兴趣的:(数据结构,F#)