Grammar production example (pseudocode for a predictive parser)
void stmt() {
    switch ( lookahead ) {
    case expr:
        match(expr); match(';'); break;
    case if:
        match(if); match('('); match(expr); match(')'); stmt();
        break;
    case for:
        match(for); match('(');
        optexpr(); match(';'); optexpr(); match(';'); optexpr();
        match(')'); stmt(); break;
    case other:
        match(other); break;
    default:
        report("syntax error");
    }
}
void optexpr() {
    /* optexpr may be empty: if no expr follows, consume nothing */
    if ( lookahead == expr ) match(expr);
}
void match(terminal t) {
    /* advance only when the lookahead is the expected terminal */
    if ( lookahead == t ) lookahead = nextTerminal;
    else report("syntax error");
}
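The pseudocode above is not directly compilable, so here is a minimal runnable sketch of the same idea in Python. It assumes that tokens arrive as plain strings, that "$" marks end of input, and that errors are raised as a hypothetical ParseError; none of these choices come from the original text.

```python
class ParseError(Exception):
    """Raised when the token stream does not match the grammar."""


class Parser:
    def __init__(self, tokens):
        # Assumption: tokens are plain strings and "$" marks end of input.
        self.tokens = list(tokens) + ["$"]
        self.pos = 0

    @property
    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, terminal):
        # Consume the lookahead only if it is the expected terminal.
        if self.lookahead == terminal:
            self.pos += 1
        else:
            raise ParseError(f"expected {terminal!r}, got {self.lookahead!r}")

    def optexpr(self):
        # optexpr -> expr | epsilon; the empty alternative consumes nothing.
        if self.lookahead == "expr":
            self.match("expr")

    def stmt(self):
        # One branch per production; the lookahead alone selects the branch.
        if self.lookahead == "expr":
            self.match("expr")
            self.match(";")
        elif self.lookahead == "if":
            self.match("if")
            self.match("(")
            self.match("expr")
            self.match(")")
            self.stmt()
        elif self.lookahead == "for":
            self.match("for")
            self.match("(")
            self.optexpr()
            self.match(";")
            self.optexpr()
            self.match(";")
            self.optexpr()
            self.match(")")
            self.stmt()
        elif self.lookahead == "other":
            self.match("other")
        else:
            raise ParseError(f"unexpected token {self.lookahead!r}")


def parse(tokens):
    """Return True if tokens form exactly one stmt, else raise ParseError."""
    parser = Parser(tokens)
    parser.stmt()
    parser.match("$")  # nothing may follow the statement
    return True
```

For example, parse(["for", "(", ";", "expr", ";", ")", "other"]) accepts a for statement whose first and third optional expressions are empty.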
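A recursive-descent parser loops forever when it meets a left-recursive production such as:
A --> Aα | β
The left recursion can be eliminated by rewriting the production with right recursion: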
A --> βR
R --> αR|ε
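The rewrite can be sketched as a small Python function. This is only an illustration for immediate left recursion: the function name, the representation of a production body as a tuple of symbols, and the use of the empty tuple for ε are assumptions made here, and every α is assumed to be non-empty.

```python
def eliminate_immediate_left_recursion(head, bodies, new_head):
    """Rewrite  head -> head alpha | beta  as
                head -> beta new_head
                new_head -> alpha new_head | epsilon

    bodies is a list of tuples of symbols; () stands for epsilon.
    """
    # Bodies that start with the head itself supply the alphas.
    alphas = [body[1:] for body in bodies if body and body[0] == head]
    # All remaining bodies are the betas.
    betas = [body for body in bodies if not body or body[0] != head]

    if not alphas:
        # Nothing to do: the productions are not left recursive.
        return {head: list(bodies)}

    return {
        head: [beta + (new_head,) for beta in betas],
        new_head: [alpha + (new_head,) for alpha in alphas] + [()],
    }


grammar = eliminate_immediate_left_recursion(
    "A", [("A", "alpha"), ("beta",)], new_head="R"
)
```

Here grammar maps "A" to [("beta", "R")] and "R" to [("alpha", "R"), ()], which is the pair of productions shown above.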
The methods commonly used in compilers can be classified as being either top-down or bottom-up. As implied by their names, top-down methods build parse trees from the top (root) to the bottom (leaves), while bottom-up methods
start from the leaves and work their way up to the root. In either case, the
input to the parser is scanned from left to right, one symbol at a time.
The construction of a parse tree can be made precise by taking a derivational
view, in which productions are treated as rewriting rules. Beginning with the
start symbol, each rewriting step replaces a nonterminal by the body of one of its
productions. This derivational view corresponds to the top-down construction
of a parse tree, but the precision afforded by derivations will be especially helpful
when bottom-up parsing is discussed.
The abstract language is L1 = {wcw | w is in (a|b)*}. L1 consists of
all words composed of a repeated string of a's and b's separated by c, such
as aabcaab. While it is beyond the scope of this book to prove it, the non-context-freedom of L1 directly implies the non-context-freedom of programming
languages like C and Java, which require declaration of identifiers before their
use and which allow identifiers of arbitrary length.
For this reason, a grammar for C or Java does not distinguish among identifiers that are different character strings. Instead, all identifiers are represented by a token such as id in the grammar. In a compiler for such a language,
the semantic-analysis phase checks that identifiers are declared before they are
used.
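As a rough illustration of that semantic check, the sketch below walks a flat list of identifier events and reports uses that precede any declaration. The (kind, name) event format and the single global scope are simplifying assumptions made here; a real compiler would work on a syntax tree with nested scopes.

```python
def undeclared_uses(events):
    """Return (position, name) for every use that precedes its declaration.

    events is a list of (kind, name) pairs, where kind is "decl" or "use".
    To the grammar every name is just the token id; the lexeme is only
    consulted here, in the semantic check.
    """
    declared = set()
    errors = []
    for position, (kind, name) in enumerate(events):
        if kind == "decl":
            declared.add(name)
        elif kind == "use" and name not in declared:
            errors.append((position, name))
    return errors


events = [("decl", "x"), ("use", "x"), ("use", "y"), ("decl", "y")]
problems = undeclared_uses(events)
```

With these events, problems contains the single entry (2, "y"), because y is used before it is declared.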
Predictive parsers, that is, recursive-descent parsers needing no backtracking,
can be constructed for a class of grammars called LL(1). The first "L" in LL(1)
stands for scanning the input from left to right, the second "L" for producing
a leftmost derivation, and the "1" for using one input symbol of lookahead at
each step to make parsing action decisions.
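The role of the single lookahead symbol can be seen in a table-driven sketch. The toy grammar (E -> T R, R -> + T R | ε, T -> id | ( E )) and its hand-built parsing table are assumptions chosen for illustration; they are not taken from the text above.

```python
NONTERMINALS = {"E", "R", "T"}

# Parsing table: (nonterminal, lookahead) -> production body.
# Each entry is chosen using exactly one symbol of lookahead.
TABLE = {
    ("E", "id"): ("T", "R"),
    ("E", "("): ("T", "R"),
    ("R", "+"): ("+", "T", "R"),
    ("R", ")"): (),            # R -> epsilon
    ("R", "$"): (),            # R -> epsilon
    ("T", "id"): ("id",),
    ("T", "("): ("(", "E", ")"),
}


def ll1_parse(tokens):
    """Return the productions used, in leftmost-derivation order."""
    tokens = list(tokens) + ["$"]   # "$" is an assumed end marker
    stack = ["$", "E"]              # start symbol on top
    pos = 0
    derivation = []
    while stack:
        top = stack.pop()
        look = tokens[pos]
        if top in NONTERMINALS:
            body = TABLE.get((top, look))
            if body is None:
                raise ValueError(f"syntax error at token {pos}: {look!r}")
            derivation.append((top, body))
            # Push the body reversed so its first symbol ends up on top.
            stack.extend(reversed(body))
        elif top == look:
            pos += 1                # terminal matches: consume it
        else:
            raise ValueError(f"syntax error at token {pos}: {look!r}")
    if pos != len(tokens):
        raise ValueError("input continues after end marker")
    return derivation


steps = ll1_parse(["id", "+", "id"])
```

Each entry of steps records which production expanded the leftmost nonterminal, so the list read in order is a leftmost derivation of the input.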
We can think of bottom-up parsing as the process of "reducing" a string w to
the start symbol of the grammar. At each reduction step, a specific substring
matching the body of a production is replaced by the nonterminal at the head
of that production.
The key decisions during bottom-up parsing are about when to reduce and
about what production to apply, as the parse proceeds.
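To make the reduction steps concrete, the sketch below replays a fixed sequence of reductions that turns id * id into the start symbol E. It assumes the usual expression grammar (E -> E + T | T, T -> T * F | F, F -> ( E ) | id). The decisions about where to reduce and by which production are supplied by hand in STEPS, since making those decisions automatically is exactly the job of a bottom-up parser.

```python
def reduce_at(form, start, head, body):
    """Replace the occurrence of body at index start with head."""
    end = start + len(body)
    if form[start:end] != list(body):
        raise ValueError(f"{body} does not occur at position {start}")
    return form[:start] + [head] + form[end:]


# Each step: (position of the substring, head, body of the production).
STEPS = [
    (0, "F", ["id"]),           # id * id  =>  F * id
    (0, "T", ["F"]),            # F * id   =>  T * id
    (2, "F", ["id"]),           # T * id   =>  T * F
    (0, "T", ["T", "*", "F"]),  # T * F    =>  T
    (0, "E", ["T"]),            # T        =>  E
]


def replay(tokens, steps):
    """Return every intermediate form produced by the reductions."""
    forms = [list(tokens)]
    for start, head, body in steps:
        forms.append(reduce_at(forms[-1], start, head, body))
    return forms


forms = replay(["id", "*", "id"], STEPS)
```

Note that after the second step the form T * id also contains T, the body of E -> T. Reducing it there would give E * id, which cannot be reduced to the start symbol, so choosing when to reduce matters as much as choosing the production.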
LR(k) parsing
The most prevalent type of bottom-up parser today is based on a concept called
LR(k) parsing; the "L" is for left-to-right scanning of the input, the "R" for
constructing a rightmost derivation in reverse, and the k for the number of
input symbols of lookahead that are used in making parsing decisions. The
cases k = 0 or k = 1 are of practical interest, and we shall only consider LR
parsers with k <= 1 here. When (k) is omitted, k is assumed to be 1.
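A minimal sketch of the k = 0 case follows. The toy grammar (S -> ( S ) | x) and its shift, reduce, and goto tables were built by hand for this illustration and are assumptions, not material from the text. Because the grammar is LR(0), a state that calls for a reduction does so without inspecting the next input symbol.

```python
# Hand-built LR(0) tables for:  (1) S -> ( S )    (2) S -> x
SHIFT = {(0, "("): 2, (0, "x"): 3, (2, "("): 2, (2, "x"): 3, (4, ")"): 5}
GOTO = {(0, "S"): 1, (2, "S"): 4}
# Reducing states: state -> (head, body length, printable production).
REDUCE = {3: ("S", 1, "S -> x"), 5: ("S", 3, "S -> ( S )")}
ACCEPT_STATE = 1


def lr0_parse(tokens):
    """Return the reductions performed, in the order they happen."""
    tokens = list(tokens) + ["$"]   # "$" is an assumed end marker
    stack = [0]                     # stack of automaton states
    pos = 0
    reductions = []
    while True:
        state = stack[-1]
        if state in REDUCE:
            # k = 0: reduce without consulting the lookahead.
            head, length, text = REDUCE[state]
            del stack[len(stack) - length:]
            stack.append(GOTO[(stack[-1], head)])
            reductions.append(text)
            continue
        look = tokens[pos]
        if state == ACCEPT_STATE and look == "$":
            return reductions
        next_state = SHIFT.get((state, look))
        if next_state is None:
            raise ValueError(f"syntax error at token {pos}: {look!r}")
        stack.append(next_state)    # shift the token
        pos += 1


trace = lr0_parse(["(", "x", ")"])
```

Read backwards, trace gives the rightmost derivation S => ( S ) => ( x ), which is the sense in which an LR parser constructs a rightmost derivation in reverse.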
Advantages of LR parsing
LR parsing is attractive for a variety of reasons:
• LR parsers can be constructed to recognize virtually all programming-language constructs for which context-free grammars can be written. Non-LR context-free grammars exist, but these can generally be avoided for
typical programming-language constructs.
• The LR-parsing method is the most general nonbacktracking shift-reduce
parsing method known, yet it can be implemented as efficiently as other,
more primitive shift-reduce methods (see the bibliographic notes).
• An LR parser can detect a syntactic error as soon as it is possible to do
so on a left-to-right scan of the input.
Building an LR parser by hand is a great deal of work, so tool support is preferable
The principal drawback of the LR method is that it is too much work to
construct an LR parser by hand for a typical programming-language grammar.
A specialized tool, an LR parser generator, is needed (for example, Yacc or Bison).