Why can LR(0) automata be used to make shift-reduce decisions? The LR(0) automaton for a grammar characterizes the strings of grammar symbols that can appear on the stack of a shift-reduce parser for the grammar. The stack contents must be a prefix of a right-sentential form. If the stack holds α and the rest of the input is x, then a sequence of reductions will take αx to S. In terms of derivations, S =>* αx by a rightmost derivation.
Not all prefixes of right-sentential forms can appear on the stack, however, since the parser must not shift past the handle. For example, suppose that, by rightmost derivations,
E =>* F * id => (E) * id
Then, at various times during the parse, the stack will hold (, (E, and (E), but it must not hold (E)*, since (E) is a handle, which the parser must reduce to F before shifting *.
The prefixes of right-sentential forms that can appear on the stack of a shift-reduce parser are called viable prefixes. They are defined as follows: a viable prefix is a prefix of a right-sentential form that does not continue past the right end of the rightmost handle of that sentential form. By this definition, it is always possible to add terminal symbols to the end of a viable prefix to obtain a right-sentential form.
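The definition can be read off directly as a small computation. The following Python sketch is illustrative only: the function name `viable_prefixes` and the convention of passing the handle's right end as an index are assumptions made here, not part of the text. It lists the viable prefixes witnessed by one right-sentential form whose rightmost handle is already known.

```python
def viable_prefixes(form, handle_end):
    """Return the prefixes of `form` that do not extend past the handle.

    form       -- a right-sentential form, as a list of grammar symbols
    handle_end -- index just past the right end of the rightmost handle
                  (hypothetical convention chosen for this sketch)
    """
    return [form[:n] for n in range(handle_end + 1)]


# The right-sentential form (E) * id, whose handle (E) ends at index 3.
form = ["(", "E", ")", "*", "id"]
for prefix in viable_prefixes(form, 3):
    print(" ".join(prefix) or "(empty)")
```

The prefix ( E ) * is not produced, which agrees with the observation that the parser must reduce (E) to F before shifting *.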
SLR parsing is based on the fact that LR(0) automata recognize viable prefixes. We say item A -> β1@β2 is valid for a viable prefix αβ1 if there is a rightmost derivation S' =>* αAω => αβ1β2ω. In general, an item will be valid for many viable prefixes.
The fact that A -> β1@β2 is valid for αβ1 tells us a lot about whether to shift or reduce when we find αβ1 on the parsing stack. In particular, if β2≠ε, then it suggests that we have not yet shifted the handle onto the stack, so shift is our move. If β2=ε, then it looks as if A -> β1 is the handle, and we should reduce by this production. Of course, two valid items may tell us to do different things for the same viable prefix. Some of these conflicts can be resolved by looking at the next input symbol, and others can be resolved by the methods of Section 4.8, but we should not suppose that all parsing action conflicts can be resolved if the LR method is applied to an arbitrary grammar.
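The rule of thumb in this paragraph can be sketched in a few lines of Python. The representation of an item as a `(head, body, dot)` tuple and the name `suggested_actions` are assumptions made for this illustration; the sketch only reports what each valid item suggests and does not attempt to resolve conflicts.

```python
def suggested_actions(valid_items):
    """Collect the moves suggested by a set of valid items.

    Each item is (head, body, dot): production head -> body with the dot
    placed before body[dot]. This encoding is hypothetical.
    """
    actions = set()
    for head, body, dot in valid_items:
        if dot < len(body):
            # β2 is nonempty: the handle is not yet fully on the stack.
            actions.add(("shift",))
        else:
            # β2 is empty: head -> body looks like the handle.
            actions.add(("reduce", head, body))
    return actions


# Two items valid for the viable prefix E + T in the expression grammar:
# E -> E+T@ suggests a reduction, while T -> T@*F suggests a shift.
items = {
    ("E", ("E", "+", "T"), 3),
    ("T", ("T", "*", "F"), 1),
}
actions = suggested_actions(items)
print(actions)
print("conflict" if len(actions) > 1 else "no conflict")
```

Here the two items disagree, which is the kind of conflict that may be resolved by looking at the next input symbol.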
We can easily compute the set of valid items for each viable prefix that can appear on the stack of an LR parser. In fact, it is a central theorem of LR-parsing theory that the set of valid items for a viable prefix γ is exactly the set of items reached from the initial state along the path labeled γ in the LR(0) automaton for the grammar. In essence, the set of valid items embodies all the useful information that can be gleaned from the stack. While we shall not prove this theorem here, we shall give an example.
Example 4.50: Let us consider the augmented expression grammar again, whose sets of items and GOTO function are exhibited in Fig. 4.31. Clearly, the string E + T* is a viable prefix of the grammar. The automaton of Fig. 4.31 will be in state 7 after having read E + T*. State 7 contains the items
T->T*@F
F->@(E)
F->@id
which are precisely the items valid for E + T*. To see why, consider the following three rightmost derivations:
E'=>E=>E+T=>E+T*F
E'=>E=>E+T=>E+T*F=>E+T*(E)
E'=>E=>E+T=>E+T*F=>E+T*id
The first derivation shows the validity of T -> T * @F, the second the validity of F -> @(E), and the third the validity of F -> @id. It can be shown that there are no other valid items for E + T*, although we shall not prove that fact here.
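The claim of this example can also be checked mechanically by following the path labeled E + T * through the LR(0) automaton. The Python sketch below builds the closure and GOTO operations for the augmented expression grammar; the tuple encoding of items and the helper names (`closure`, `goto`, `valid_items`, `show`) are choices made for this illustration, and `@` again stands for the dot.

```python
GRAMMAR = [
    ("E'", ("E",)),
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("T", "*", "F")),
    ("T", ("F",)),
    ("F", ("(", "E", ")")),
    ("F", ("id",)),
]
NONTERMINALS = {head for head, _ in GRAMMAR}


def closure(items):
    """Add B -> @γ for every nonterminal B that appears right after a dot."""
    result = set(items)
    work = list(items)
    while work:
        _, body, dot = work.pop()
        if dot < len(body) and body[dot] in NONTERMINALS:
            for head, rhs in GRAMMAR:
                item = (head, rhs, 0)
                if head == body[dot] and item not in result:
                    result.add(item)
                    work.append(item)
    return frozenset(result)


def goto(items, symbol):
    """Move the dot over `symbol` wherever possible, then take the closure."""
    moved = {(h, b, d + 1) for h, b, d in items if d < len(b) and b[d] == symbol}
    return closure(moved)


def valid_items(prefix):
    """Items reached from the initial state along the path labeled `prefix`."""
    state = closure({("E'", ("E",), 0)})
    for symbol in prefix:
        state = goto(state, symbol)
    return state


def show(item):
    head, body, dot = item
    return f"{head}->{''.join(body[:dot])}@{''.join(body[dot:])}"


for item in sorted(valid_items(["E", "+", "T", "*"]), key=show):
    print(show(item))
```

In this sketch an empty result means that the path does not exist in the automaton, so the string is not a viable prefix; `valid_items(["(", "E", ")", "*"])` is empty, for instance, matching the earlier remark that (E)* must never appear on the stack.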