在构造之前当然要先抽取接口,让其它语法分析器能愉快的跟LR分析器相处
/* syntax-analyser.h */ void initialLRStates(void); void destructLRStates(void);
第一个函数是构造函数,第二个则是析构函数。这些东西跟LR分析器的构造函数是独立,因为分析预测表(也就是LR状态集)是一个静态数据结构,比如我曾经说在状态Sentence_LBraceBasicBlockRBrace_1中,可以委托一个新的LR分析器对内部的基本块进行分析,而不用当前LR分析器再去纠结于大括号嵌套,而无论当前有多少个LR分析器在分析器栈中,它们使用的分析预测表是同一个。所以在语法分析开始之前,要记得调用initialLRStates,而结束之后要记得destructLRStates。
一个LR状态(也就是分析预测表的一行)由两部分构成,一边是Action表,另一边是Goto表。Goto表很简单,跟DFA差不多,只不过DFA接受的是字符,而LR状态接受的是非终结符,表中每一项是一个指针,指向该Goto的LR状态。而Action则稍许复杂,可以为移进,或者是规约,所以每一项不会是一个简单的值,而应该是一个函数入口地址。所以每个LR状态对应的数据结构为
typedef void (*Action)(struct LRAnalyser*, struct Token*); struct LRState { Action actions[SKIP]; struct LRState* gotoes[/* some value */]; };
Action是一个函数入口地址类型,Action表有SKIP项,SKIP是AcceptType枚举中最后一个,除了表示一种可忽略的终结符类型,还可以表示不可忽略的终结符类型数量,所以这里就顺理成章地成为了数组大小。现在为了得到some value的值,也可以采取类似的方法,构造一个非终结符枚举出来
/* const.h */
typedef enum {
JerryLRSyntaxRoot, BasicBlock = 0, Sentence,
IfElseBranch, ElseBlock, WhileLoop, Declaration,
Assignment, Variable, Initialization, VariableRegister,
NR_GOTO_TABLE_COLUMN
} LRNonTerminal;
JerryLRSyntaxRoot表示LR语法中顶级产生式的左部,它是不用出现在Goto表中的,所以计数从它后面一个BasicBlock开始。但是,some value并不等于NR_GOTO_TABLE_COLUMN,而应该是
struct LRState { Action actions[SKIP]; struct LRState* gotoes[NR_GOTO_TABLE_COLUMN + 1]; };
多出来的一个项目在实现LR分析器的consumeNonTerminal时再解释。
除此之外,还需要一个枚举,告诉我们状态一共有多少个,一边确定分析预测表的大小
/* const.h */ typedef enum { LRState0 = 0, LRState1, BasicBlock_SentenceBasicBlock_1, BasicBlock_SentenceBasicBlock_2, Sentence_EoS_1, Sentence_IOAssignmentEoS_1, Sentence_IOAssignmentEoS_2, Sentence_IOAssignmentEoS_3, Sentence_BreakEoS_1, Sentence_BreakEoS_2, Sentence_AssignmentEoS_1, Sentence_AssignmentEoS_2, IfElseBranch_1, IfElseBranch_2, IfElseBranch_3, IfElseBranch_4, IfElseBranch_5, IfElseBranch_6, ElseBlock_1, ElseBlock_2, WhileLoop_1, WhileLoop_2, WhileLoop_3, WhileLoop_4, WhileLoop_5, Sentence_LBraceBasicBlockRBrace_1, Sentence_LBraceBasicBlockRBrace_2, Sentence_LBraceBasicBlockRBrace_3, Declaration_1, VariableRegister_VariableInitialization_1, VariableRegister_VariableInitialization_2, Initialization_AssignAssignment_1, Initialization_AssignAssignment_2, DeclarationVariableRegister, Declaration_3, NR_LR_STATE } LRStateFamily;
这样在构造分析表的函数开头就应该是
struct LRState* jerryLRStates = NULL; // LR状态数组指针 void initialLRStates(void) { jerryLRStates = (struct LRState*) allocate(NR_LR_STATE * sizeof(struct LRState)); memset(jerryLRStates, 0, NR_LR_STATE * sizeof(struct LRState)); ... }
每个移进函数应该是形如这样的:
void some_shift(struct LRAnalyser* self, struct Token* token)
{
// 将 转移目标状态 压入 self 的状态栈
// 将 token 压入 self 的符号栈
}
现在有个小问题,需要见那个token压入符号栈吗?实际操作中会发现,大部分的token都是“无意义”的。这里的“无意义”是指,在以后规约时,它们对于规约后的非终结符的内容没有影响,如在规约
WhileLoop -> <WHILE> <LPARENT> Assignment <RPARENT> BasicBlock
时,规约函数没有必要去检查第一个符号是不是<WHILE>以及第二个符号是不是<LPARENT>——如果不是,分析器何以走到这个状态来呢?所以大部分情况下没有必要将token压入栈中。
不过,有一些token还是有意义的,比如规约
Sentence -> <IO> Assignment <EOS>
时,第一个符号也许是<READ>,也许是<WRITE>,这种特殊情况则采取特殊的方案:回头去看看IONode的构造函数,它只需要一个AcceptType作为参数,而无需后继的Assignment,也就是说当读入一个<IO>时,可以直接利用它来构造一个IONode,并压入符号栈;最后在状态Sentence_IOAssignmentEoS_3进行规约时,再将符号栈栈顶的Assignment弹出,追加到次栈顶IONode中(注:<EOS>是无意义符号,不入栈,因此栈顶是Assignment)。除此之外,还有一些有意义符号也可以这样办,比如产生式
Declaration -> <TYPE> VariableRegister <EOS>
中的主角DeclarationNode,它在构造时也只需要传入一个类型参数,并且在构造实现中,这个类型参数还被转化过:
node->dectype = type - INTEGER_TYPE + INTEGER;
从声明类型变为数据类型。
那么,移进函数的实现分为这样两种形式:
void normal_shift(struct LRAnalyser* self, struct Token* token)
{
self->stateStack->push(self->stateStack, /* 目标状态指针 */ );
}
void declare_shift(struct LRAnalyser* self, struct Token* token)
{
self->stateStack->push(self->stateStack, jerryLRStates + Declaration_1);
self->symbolStack->push(self->stateStack, newDeclarationNode(token->type));
}
void io_shift(struct LRAnalyser* self, struct Token* token)
{
self->stateStack->push(self->stateStack, jerryLRStates + Sentence_IOAssignmentEoS_1);
self->symbolStack->push(self->stateStack, newIONode(token->type));
}
不过,如果手动去写每一个表项目对应的移进函数,那会很痛苦。幸而,C语言提供了捷径,这也是我喜欢C语言的部分原因——宏。可以设计一个简单的宏,然后利用展开这个宏则得到一个移进函数,这样就好多了。这里提供一种实现的参考
#define linkName(f, l) f ## l #define shift(current, encounter, target) \ static void linkName(current, encounter)(struct LRAnalyser* self, \ struct Token* t) \ { \ self->stateStack->push(self->stateStack, jerryLRStates + target); \ } /*******************************************************************/
第一个宏linkName用来产生函数名标识符,后面一个宏才是重要的移进函数模板,它接受3个参数,前两个参数并不在函数中出现,而仅仅传递给linkName来产生函数名,最后一个参数target应该是LR状态枚举中的一个,它告诉该函数应该将什么状态入栈。当然,仅仅有这一个宏是不够的,下面再来介绍一个更为强大的宏
#define shiftFirstSentence(current) \ shift(current, EOS, Sentence_EoS_1) \ shift(current, LBRACE, Sentence_LBraceBasicBlockRBrace_1) \ shift(current, IF, IfElseBranch_1) \ shift(current, WHILE, WhileLoop_1) \ shift(current, BREAK, Sentence_BreakEoS_1) \ static void linkName(current, INTEGER_TYPE)(struct LRAnalyser* self, \ struct Token* t) \ { \ self->symbolStack->push(self->symbolStack, \ newDeclarationNode(t->type)); \ self->stateStack->push(self->stateStack, \ jerryLRStates + Declaration_1); \ } \ \ static void linkName(current, REAL_TYPE)(struct LRAnalyser* self, \ struct Token* t) \ { \ self->symbolStack->push(self->symbolStack, \ newDeclarationNode(t->type)); \ self->stateStack->push(self->stateStack, \ jerryLRStates + Declaration_1); \ } \ \ static void linkName(current, READ)(struct LRAnalyser* self, \ struct Token* t) \ { \ self->symbolStack->push(self->symbolStack, newIONode(t->type)); \ self->stateStack->push(self->stateStack, \ jerryLRStates + Sentence_IOAssignmentEoS_1); \ } \ \ static void linkName(current, WRITE)(struct LRAnalyser* self, \ struct Token* t) \ { \ self->symbolStack->push(self->symbolStack, newIONode(t->type)); \ self->stateStack->push(self->stateStack, \ jerryLRStates + Sentence_IOAssignmentEoS_1); \ } /*********************************************************************/
正如它的名字shiftFirstSentence喻示的那样,这个宏展开以后将得到一系列函数,这些函数将一个状态与First(Sentence)集合中的每一项联系起来,适用于这个宏的状态是那些即将接受一个语句的状态,包括LRState0、BasicBlock_SentenceBasicBlock_1、IfElseBranch_4、WhileLoop_4、ElseBlock_1、
Sentence_LBraceBasicBlockRBrace_1这些状态,但是状态Sentence_LBraceBasicBlockRBrace_1如之前所说,它直接产生一个委托的LR分析器去分析内部基本块,因此用不到这个宏。
当然,光生成了函数还不够,还得将函数与对应的表项目关联起来,这就涉及到initialLRStates内部的操作。为了省代码,这一过程也可以用宏来辅助
#define setAction(state, encounter) \ jerryLRStates[state].actions[encounter] = linkName(state, encounter) #define setActionFirstAssignment(state) \ setAction(state, PLUS); \ setAction(state, MINUS); \ setAction(state, NOT); \ setAction(state, INTEGER); \ setAction(state, REAL); \ setAction(state, IDENT); \ setAction(state, LPARENT) /*********/ #define setActionFirstSentence(state) \ setAction(state, EOS); \ setAction(state, READ); \ setAction(state, WRITE); \ setAction(state, LBRACE); \ setAction(state, IF); \ setAction(state, WHILE); \ setAction(state, BREAK); \ setAction(state, INTEGER_TYPE); \ setAction(state, REAL_TYPE); \ setActionFirstAssignment(state) ///
第一个宏参数中的中state是LRStateFamily中的一个,encounter则是AcceptType中的一个。每次展开这个宏就可以得到一个赋值语句,关联表项与函数。而后面两个宏则分别对应于一次关联整个First(Assignment)和First(Sentence)的语句集。注意语句结尾的那些分号。
/* * 函数构造 */ #define shift(current, encounter, target) \ static void linkName(current, encounter)(struct LRAnalyser* self, \ struct Token* t) \ { \ self->stateStack->push(self->stateStack, jerryLRStates + target); \ printf( #current " encouters " #encounter "\n"); \ } /*******************************************************************/ shift(Sentence_AssignmentEoS_1, EOS, Sentence_AssignmentEoS_2) shift(Sentence_BreakEoS_1, EOS, Sentence_BreakEoS_2) shift(Sentence_LBraceBasicBlockRBrace_2, RBRACE, Sentence_LBraceBasicBlockRBrace_3) shift(IfElseBranch_1, LPARENT, IfElseBranch_2) shift(IfElseBranch_3, RPARENT, IfElseBranch_4) shift(IfElseBranch_5, ELSE, ElseBlock_1) shift(WhileLoop_1, LPARENT, WhileLoop_2) shift(WhileLoop_3, RPARENT, WhileLoop_4) shift(VariableRegister_VariableInitialization_1, ASSIGN, Initialization_AssignAssignment_1) shift(DeclarationVariableRegister, EOS, Declaration_3) shift(Sentence_IOAssignmentEoS_2, EOS, Sentence_IOAssignmentEoS_3) #define shiftFirstSentence(current) \ shift(current, EOS, Sentence_EoS_1) \ shift(current, LBRACE, Sentence_LBraceBasicBlockRBrace_1) \ shift(current, IF, IfElseBranch_1) \ shift(current, WHILE, WhileLoop_1) \ shift(current, BREAK, Sentence_BreakEoS_1) \ static void linkName(current, INTEGER_TYPE)(struct LRAnalyser* self, \ struct Token* t) \ { \ self->symbolStack->push(self->symbolStack, \ newDeclarationNode(t->type)); \ self->stateStack->push(self->stateStack, \ jerryLRStates + Declaration_1); \ } \ \ static void linkName(current, REAL_TYPE)(struct LRAnalyser* self, \ struct Token* t) \ { \ self->symbolStack->push(self->symbolStack, \ newDeclarationNode(t->type)); \ self->stateStack->push(self->stateStack, \ jerryLRStates + Declaration_1); \ } \ \ static void linkName(current, READ)(struct LRAnalyser* self, \ struct Token* t) \ { \ self->symbolStack->push(self->symbolStack, newIONode(t->type)); \ self->stateStack->push(self->stateStack, \ jerryLRStates + Sentence_IOAssignmentEoS_1); \ } \ \ static void linkName(current, WRITE)(struct LRAnalyser* self, \ struct Token* t) \ { \ self->symbolStack->push(self->symbolStack, newIONode(t->type)); \ self->stateStack->push(self->stateStack, \ jerryLRStates + Sentence_IOAssignmentEoS_1); \ } /*********************************************************************/ shiftFirstSentence(LRState0) shiftFirstSentence(BasicBlock_SentenceBasicBlock_1) shiftFirstSentence(IfElseBranch_4) shiftFirstSentence(WhileLoop_4) shiftFirstSentence(ElseBlock_1) #undef shiftFirstSentence #undef shift /* * initialState 函数绑定 */ void initialLRStates(void) { jerryLRStates = (struct LRState*) allocate(NR_LR_STATE * sizeof(struct LRState)); memset(jerryLRStates, 0, NR_LR_STATE * sizeof(struct LRState)); #define setAction(state, encounter) \ jerryLRStates[state].actions[encounter] = linkName(state, encounter) setAction(Sentence_AssignmentEoS_1, EOS); setAction(Sentence_BreakEoS_1, EOS); setAction(Sentence_LBraceBasicBlockRBrace_2, RBRACE); setAction(IfElseBranch_1, LPARENT); setAction(IfElseBranch_3, RPARENT); setAction(IfElseBranch_5, ELSE); setAction(WhileLoop_1, LPARENT); setAction(WhileLoop_3, RPARENT); setAction(VariableRegister_VariableInitialization_1, ASSIGN); setAction(DeclarationVariableRegister, EOS); setAction(Sentence_IOAssignmentEoS_2, EOS); #define setActionFirstAssignment(state) \ setAction(state, PLUS); \ setAction(state, MINUS); \ setAction(state, NOT); \ setAction(state, INTEGER); \ setAction(state, REAL); \ setAction(state, IDENT); \ setAction(state, LPARENT) /*********/ #define setActionFirstSentence(state) \ setAction(state, EOS); \ setAction(state, READ); \ setAction(state, WRITE); \ setAction(state, LBRACE); \ setAction(state, IF); \ setAction(state, WHILE); \ setAction(state, BREAK); \ setAction(state, INTEGER_TYPE); \ setAction(state, REAL_TYPE); \ setActionFirstAssignment(state) /// setActionFirstSentence(LRState0); setActionFirstSentence(BasicBlock_SentenceBasicBlock_1); setActionFirstSentence(IfElseBranch_4); setActionFirstSentence(IfElseBranch_5); setActionFirstSentence(WhileLoop_4); setActionFirstSentence(ElseBlock_1); // 后面还有 规约 的部分实现和 Goto 表的填充 }