In this section, we shall extend the previous LR parsing techniques to use one symbol of lookahead on the input. There are two different methods:
After introducing both these methods, we conclude with a discussion of how to compact LR parsing tables for environments with limited memory.
We shall now present the most general technique for constructing an LR parsing table from a grammar. Recall that in the SLR method, state i calls for reduction by A -> α if the set of items Ii contains item [A -> α·] and a is in FOLLOW(A). In some situations, however, when state i appears on top of the stack, the viable prefix βα on the stack is such that βA cannot be followed by a in any right-sentential form. Thus, the reduction by A -> α should be invalid on input a.
Example 4.51: Let us reconsider Example 4.48, where in state 2 we had item R -> L·, which could correspond to A -> α above, and a could be the = sign, which is in FOLLOW(R). Thus, the SLR parser calls for reduction by R -> L in state 2 with = as the next input (the shift action is also called for, because of item S -> L·=R in state 2). However, there is no right-sentential form of the grammar in Example 4.48 that begins R = ... . Thus state 2, which is the state corresponding to viable prefix L only, should not really call for reduction of that L to R.
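The claim that = is in FOLLOW(R) can be checked mechanically. The sketch below assumes that the grammar of Example 4.48 is S -> L = R | R, L -> * R | id, R -> L; that grammar is not reproduced in this section, so treat it as an assumption. The function names are illustrative, and the FIRST computation is simplified because the assumed grammar has no ε-productions.

```python
# Assumed grammar for Example 4.48 (not shown in this section).
GRAMMAR = {
    "S": [("L", "=", "R"), ("R",)],
    "L": [("*", "R"), ("id",)],
    "R": [("L",)],
}

def first_sets(grammar):
    """FIRST for a grammar without epsilon-productions (a simplification)."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                sym = body[0]
                add = first[sym] if sym in grammar else {sym}
                if not add <= first[head]:
                    first[head] |= add
                    changed = True
    return first

def follow_sets(grammar, start):
    """FOLLOW by fixed-point iteration, again assuming no epsilon-productions."""
    first = first_sets(grammar)
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")  # the right endmarker follows the start symbol
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                for i, sym in enumerate(body):
                    if sym not in grammar:
                        continue  # terminals have no FOLLOW set
                    if i + 1 < len(body):
                        nxt = body[i + 1]
                        add = first[nxt] if nxt in grammar else {nxt}
                    else:
                        add = follow[head]  # sym ends the body
                    if not add <= follow[sym]:
                        follow[sym] |= add
                        changed = True
    return follow

FOLLOW = follow_sets(GRAMMAR, "S")
```

Under these assumptions FOLLOW["R"] contains both = and $, which is why the SLR table places a reduce action on = in state 2 even though no right-sentential form begins R = ... .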
It is possible to carry more information in the state that will allow us to rule out some of these invalid reductions by A -> α. By splitting states when necessary, we can arrange to have each state of an LR parser indicate exactly which input symbols can follow a handle α for which there is a possible reduction to A.
The extra information is incorporated into the state by redefining items to include a terminal symbol as a second component. The general form of an item becomes [A -> α·β, a], where A -> αβ is a production and a is a terminal or the right endmarker $. We call such an object an LR(1) item. The 1 refers to the length of the second component, called the lookahead of the item.⁶ The lookahead has no effect in an item of the form [A -> α·β, a], where β is not ε, but an item of the form [A -> α·, a] calls for a reduction by A -> α only if the next input symbol is a. Thus, we are compelled to reduce by A -> α only on those input symbols a for which [A -> α·, a] is an LR(1) item in the state on top of the stack. The set of such a's will always be a subset of FOLLOW(A), but it could be a proper subset, as in Example 4.51.
⁶ Lookaheads that are strings of length greater than one are possible, of course, but we shall not consider such lookaheads here.
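One way to picture an LR(1) item is as a small record holding the production, the dot position, and the lookahead. The sketch below is illustrative only: the names LR1Item and reductions are hypothetical, and the two-item state is an assumption about what state 2 of Example 4.51 looks like once lookaheads are attached (both items carrying $).

```python
from typing import NamedTuple

class LR1Item(NamedTuple):
    head: str         # A in [A -> alpha . beta, a]
    body: tuple       # the symbols of alpha followed by beta
    dot: int          # position of the dot within body
    lookahead: str    # a terminal or the endmarker "$"

def reductions(state, next_input):
    """Productions the state reduces by on next_input.

    Only items with the dot at the right end matter, and only when
    their lookahead equals the next input symbol.
    """
    return [(it.head, it.body) for it in state
            if it.dot == len(it.body) and it.lookahead == next_input]

# Assumed contents of the state for viable prefix L.
state2 = [
    LR1Item("S", ("L", "=", "R"), 1, "$"),   # [S -> L . = R, $]
    LR1Item("R", ("L",), 1, "$"),            # [R -> L . , $]
]

on_equals = reductions(state2, "=")
on_end = reductions(state2, "$")
```

With these assumed lookaheads, no reduction is called for on =, so only the shift remains, while on $ the state reduces by R -> L.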
Formally, we say LR(1) item [A -> α·β, a] is valid for a viable prefix γ if there is a rightmost derivation S =>* δAω => δαβω, where
1. γ = δα, and
2. Either a is the first symbol of ω, or ω is ε and a is $.
Example 4.52: Let us consider the grammar
S->BB
B->aB|b
There is a rightmost derivation S =>* aaBab => aaaBab. We see that item [B -> a·B, a] is valid for a viable prefix γ = aaa by letting δ = aa, A = B, ω = ab, α = a, and β = B in the above definition. There is also a rightmost derivation S =>* BaB => BaaB. From this derivation we see that item [B -> a·B, $] is valid for viable prefix Baa.
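The two cases of this example can be replayed with a small helper. It reads the definition of validity as γ = δα, with the lookahead being the first symbol of ω, or $ when ω is empty; this reading is an assumption that matches both derivations above. The function name is hypothetical, and every grammar symbol is taken to be a single character so that plain strings can stand for symbol sequences.

```python
def item_from_derivation(delta, head, omega, alpha, beta):
    """Return (viable prefix, LR(1) item) for the step
    delta head omega => delta alpha beta omega.

    The item is the tuple (head, alpha, beta, lookahead), standing for
    [head -> alpha . beta, lookahead].
    """
    gamma = delta + alpha                      # viable prefix: delta then alpha
    lookahead = omega[0] if omega else "$"     # first symbol of omega, else $
    return gamma, (head, alpha, beta, lookahead)

# S =>* aaBab => aaaBab: delta = aa, omega = ab
first_case = item_from_derivation("aa", "B", "ab", "a", "B")

# S =>* BaB => BaaB: delta = Ba, omega is empty
second_case = item_from_derivation("Ba", "B", "", "a", "B")
```

The first call yields viable prefix aaa with lookahead a, and the second yields viable prefix Baa with lookahead $, in agreement with the example.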
The method for building the collection of sets of valid LR(1) items is essentially the same as the one for building the canonical collection of sets of LR(0) items. We need only to modify the two procedures CLOSURE and GOTO.
Figure 4.40: Sets-of-LR(1)-items construction for grammar G'
To appreciate the new definition of the CLOSURE operation, in particular, why b must be in FIRST(βa), consider an item of the form [A -> α·Bβ, a] in the set of items valid for some viable prefix γ. Then there is a rightmost derivation S =>* δAax => δαBβax, where γ = δα. Suppose βax derives terminal string by. Then for each production of the form B -> η for some η, we have derivation S =>* γBby => γηby. Thus, [B -> ·η, b] is valid for γ. Note that b can be the first terminal derived from β, or it is possible that β derives ε in the derivation βax =>* by, and b can therefore be a. To summarize both possibilities we say that b can be any terminal in FIRST(βax), where FIRST is the function from Section 4.4. Note that x cannot contain the first terminal of by, so FIRST(βax) = FIRST(βa). We now give the LR(1) sets of items construction.
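The following is a minimal sketch of CLOSURE, GOTO, and the main loop, following the reasoning above: for each item [A -> α·Bβ, a], every production B -> η contributes [B -> ·η, b] for each terminal b in FIRST(βa). It is not a transcription of the book's figure. The grammar is that of Example 4.52 augmented with S' -> S; the Python names and the use of "" for ε are illustrative choices.

```python
from typing import NamedTuple

class Item(NamedTuple):
    head: str
    body: tuple
    dot: int
    look: str

# Grammar of Example 4.52, augmented with S' -> S.
GRAMMAR = {"S'": [("S",)], "S": [("B", "B")], "B": [("a", "B"), ("b",)]}

def first_of_string(symbols, first, grammar):
    """FIRST of a symbol string; "" stands for epsilon."""
    out = set()
    for sym in symbols:
        sym_first = first[sym] if sym in grammar else {sym}
        out |= sym_first - {""}
        if "" not in sym_first:
            return out
    out.add("")          # every symbol could derive epsilon
    return out

def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                before = len(first[head])
                first[head] |= first_of_string(body, first, grammar)
                changed |= len(first[head]) != before
    return first

def closure(items, grammar, first):
    result, work = set(items), list(items)
    while work:
        it = work.pop()
        if it.dot == len(it.body) or it.body[it.dot] not in grammar:
            continue     # dot at the end, or in front of a terminal
        nonterminal = it.body[it.dot]
        beta_a = it.body[it.dot + 1:] + (it.look,)
        for b in first_of_string(beta_a, first, grammar):   # b in FIRST(beta a)
            for body in grammar[nonterminal]:
                new = Item(nonterminal, body, 0, b)
                if new not in result:
                    result.add(new)
                    work.append(new)
    return frozenset(result)

def goto(items, symbol, grammar, first):
    moved = {Item(i.head, i.body, i.dot + 1, i.look) for i in items
             if i.dot < len(i.body) and i.body[i.dot] == symbol}
    return closure(moved, grammar, first)

def lr1_items(grammar, start):
    first = compute_first(grammar)
    symbols = set(grammar) | {s for bs in grammar.values() for b in bs for s in b}
    initial = Item(start, grammar[start][0], 0, "$")
    collection = [closure({initial}, grammar, first)]
    for state in collection:            # the list grows while we iterate
        for symbol in sorted(symbols):
            target = goto(state, symbol, grammar, first)
            if target and target not in collection:
                collection.append(target)
    return collection

STATES = lr1_items(GRAMMAR, "S'")
```

For this grammar the sketch produces ten sets of items. The initial set contains [B -> ·aB, a] and [B -> ·aB, b] but not [B -> ·aB, $], because the B that follows in S -> ·BB restricts the lookahead to FIRST(B$) = {a, b}.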
Figure 4.41: The GOTO graph for grammar (4.55)