options {
  LOOKAHEAD = 1;
  CHOICE_AMBIGUITY_CHECK = 2;
  OTHER_AMBIGUITY_CHECK = 1;
  STATIC = true;
  DEBUG_PARSER = false;
  DEBUG_LOOKAHEAD = false;
  DEBUG_TOKEN_MANAGER = false;
  ERROR_REPORTING = true;
  JAVA_UNICODE_ESCAPE = false;
  UNICODE_INPUT = false;
  IGNORE_CASE = false;
  USER_TOKEN_MANAGER = false;
  USER_CHAR_STREAM = false;
  BUILD_PARSER = true;
  BUILD_TOKEN_MANAGER = true;
  SANITY_CHECK = true;
  FORCE_LA_CHECK = false;
}

PARSER_BEGIN(Simple1)

/** Simple brace matcher. */
public class Simple1 {

  /** Main entry point. */
  public static void main(String args[]) throws ParseException {
    Simple1 parser = new Simple1(System.in);
    parser.Input();
  }

}

PARSER_END(Simple1)

/** Root production. */
void Input() :
{}
{
  MatchedBraces() ("\n"|"\r")* <EOF>
}

/** Brace matching production. */
void MatchedBraces() :
{}
{
  "{" [ MatchedBraces() ] "}"
}
After reading example1.jj in JavaCC's simpleExamples and its README carefully, I am only just beginning to understand the basics of JavaCC grammar syntax.
Following this is a list of productions. In this example, there are
two productions, that define the non-terminals "Input" and
"MatchedBraces" respectively. In JavaCC grammars, non-terminals are
written and implemented (by JavaCC) as Java methods. When the
non-terminal is used on the left-hand side of a production, it is
considered to be declared and its syntax follows the Java syntax. On
the right-hand side its use is similar to a method call in Java.
The corresponding code in simple1.jj is:
void Input() :
{}
{
  MatchedBraces() ("\n"|"\r")* <EOF>
}

/** Brace matching production. */
void MatchedBraces() :
{}
{
  "{" [ MatchedBraces() ] "}"
}
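In short:
Next comes a list of productions. This example has two, which define the non-terminals "Input" and "MatchedBraces" respectively. In a JavaCC grammar, non-terminals are declared and implemented much like Java methods. When a non-terminal appears on the left-hand side of a production, it is being declared, and the syntax is the same as Java's. When it appears on the right-hand side, it resembles a Java method call.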
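To make the method analogy concrete, here is a minimal hand-written sketch in plain Java of what the two productions amount to. It is not the code JavaCC generates: the class name BraceSketch, the accepts helper, and the use of IllegalStateException are all assumptions made for illustration (the real parser reads tokens from a token manager and throws ParseException).

```java
/** Hand-written sketch, NOT JavaCC output. All names here are hypothetical. */
final class BraceSketch {
    private final String input;
    private int pos = 0;

    BraceSketch(String input) {
        this.input = input;
    }

    /** Mirrors: Input ::= MatchedBraces ("\n" | "\r")* <EOF> */
    void input() {
        matchedBraces(); // right-hand-side use of a non-terminal: a method call
        while (pos < input.length()
                && (input.charAt(pos) == '\n' || input.charAt(pos) == '\r')) {
            pos++;
        }
        if (pos != input.length()) {
            throw new IllegalStateException("expected end of input at position " + pos);
        }
    }

    /** Mirrors: MatchedBraces ::= "{" [ MatchedBraces ] "}" */
    void matchedBraces() {
        expect('{');
        // One character of lookahead decides whether to take the optional part.
        if (pos < input.length() && input.charAt(pos) == '{') {
            matchedBraces();
        }
        expect('}');
    }

    private void expect(char c) {
        if (pos >= input.length() || input.charAt(pos) != c) {
            throw new IllegalStateException("expected '" + c + "' at position " + pos);
        }
        pos++;
    }

    /** Convenience wrapper: true if the whole text matches Input. */
    static boolean accepts(String text) {
        try {
            new BraceSketch(text).input();
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts("{{}}\n"));
        System.out.println(accepts("{{}"));
    }
}
```

Each production becomes one method, and each reference to a non-terminal inside a production becomes a call to the matching method, which is why the recursion in MatchedBraces shows up as ordinary method recursion.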
Each production defines its left-hand side non-terminal followed by a
colon. This is followed by a bunch of declarations and statements
within braces (in both cases in the above example, there are no
declarations and hence this appears as "{}") which are generated as
common declarations and statements into the generated method. This is
then followed by a set of expansions also enclosed within braces.
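Each production names its left-hand-side non-terminal, followed by a colon. Next comes a block of declarations and statements in braces (neither production above declares anything, so the code shows only "{}"); these are copied into the generated method as ordinary declarations and statements.
After that comes a set of expansions, also enclosed in braces.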
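The role of the declarations block is easier to see when it is not empty. The sketch below is a hand-written Java illustration under my own assumptions, not generated code and not part of simple1.jj: it imagines a variant of MatchedBraces that declares a local counter and returns the nesting depth. The names DepthSketch, nestedCount, and depth are hypothetical.

```java
/** Hand-written sketch, NOT JavaCC output. All names here are hypothetical. */
final class DepthSketch {
    private final String input;
    private int pos = 0;

    DepthSketch(String input) {
        this.input = input;
    }

    /** Imagined variant of MatchedBraces that returns the nesting depth. */
    int matchedBraces() {
        // This is the kind of line the first "{ ... }" block of a production
        // would contribute: a plain local declaration at the top of the method.
        int nestedCount = 0;

        // Everything below corresponds to the expansions block.
        expect('{');
        if (pos < input.length() && input.charAt(pos) == '{') {
            nestedCount = matchedBraces();
        }
        expect('}');
        return nestedCount + 1;
    }

    private void expect(char c) {
        if (pos >= input.length() || input.charAt(pos) != c) {
            throw new IllegalStateException("expected '" + c + "' at position " + pos);
        }
        pos++;
    }

    /** Convenience wrapper: nesting depth of a brace string. */
    static int depth(String text) {
        return new DepthSketch(text).matchedBraces();
    }

    public static void main(String[] args) {
        System.out.println(depth("{{{}}}"));
    }
}
```

In simple1.jj both blocks are "{}" because neither production needs any local state; it only checks that the braces match.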
Lexical tokens (regular expressions) in a JavaCC input grammar are
either simple strings ("{", "}", "\n", and "\r" in the above example),
or a more complex regular expression. In our example above, there is
one such regular expression "<EOF>" which is matched by the end of
file. All complex regular expressions are enclosed within angular
brackets.
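Lexical tokens (regular expressions) in JavaCC are either simple strings or more complex regular expressions. This example has one such regular expression, "<EOF>", which matches the end of the file. All complex regular expressions are enclosed in angle brackets.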
The first production above says that the non-terminal "Input" expands
to the non-terminal "MatchedBraces" followed by zero or more line
terminators ("\n" or "\r") and then the end of file.
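The first production in this example says that the non-terminal "Input" consists of the non-terminal "MatchedBraces", followed by zero or more line terminators ("\n" or "\r"), and then the end of file.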
The second production above says that the non-terminal "MatchedBraces"
expands to the token "{" followed by an optional nested expansion of
"MatchedBraces" followed by the token "}". Square brackets [...]
in a JavaCC input file indicate that the ... is optional.
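The second production says that the non-terminal "MatchedBraces" is "{" followed by an optional "MatchedBraces" and then "}". Whatever appears inside square brackets [] is optional.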
[...] may also be written as (...)?. These two forms are equivalent.
Other structures that may appear in expansions are:
e1 | e2 | e3 | ... : A choice of e1, e2, e3, etc.
( e )+ : One or more occurrences of e
( e )* : Zero or more occurrences of e
Note that these may be nested within each other, so we can have
something like:
(( e1 | e2 )* [ e3 ] ) | e4
[...] can also be written as (...)?; the two forms are equivalent. The other structures are the ones listed above.
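One way to read these structures is as ordinary control flow in a hand-written parser: a choice becomes an if/else on the next symbol, (...)* a while loop, (...)+ a do/while loop, and [...] a single if. The sketch below illustrates that reading for the nested example, with the assumption that e1 to e4 are the single characters "a", "b", "c", and "d". It is my own illustration of the usual recursive-descent pattern, not JavaCC's generated code, and the class and method names are hypothetical.

```java
/** Hand-written sketch, NOT JavaCC output. All names here are hypothetical. */
final class ExpansionSketch {
    private final String input;
    private int pos = 0;

    ExpansionSketch(String input) {
        this.input = input;
    }

    /** One-character lookahead: is the next character c? */
    private boolean peek(char c) {
        return pos < input.length() && input.charAt(pos) == c;
    }

    /** Mirrors: (( "a" | "b" )* [ "c" ] ) | "d" */
    void nested() {
        if (peek('d')) {                       // outer choice, second alternative
            pos++;
        } else {                               // outer choice, first alternative
            while (peek('a') || peek('b')) {   // ( e1 | e2 )* : zero or more
                pos++;
            }
            if (peek('c')) {                   // [ e3 ] : optional
                pos++;
            }
        }
    }

    /** Mirrors: ( "a" )+ */
    void oneOrMore() {
        if (!peek('a')) {
            throw new IllegalStateException("expected 'a' at position " + pos);
        }
        do {                                   // ( e )+ : at least one occurrence
            pos++;
        } while (peek('a'));
    }

    /** True if the whole text matches the nested expansion. */
    static boolean nestedAccepts(String text) {
        ExpansionSketch s = new ExpansionSketch(text);
        s.nested();
        return s.pos == text.length();
    }

    /** True if the whole text matches ( "a" )+. */
    static boolean oneOrMoreAccepts(String text) {
        ExpansionSketch s = new ExpansionSketch(text);
        try {
            s.oneOrMore();
        } catch (IllegalStateException e) {
            return false;
        }
        return s.pos == text.length();
    }

    public static void main(String[] args) {
        System.out.println(nestedAccepts("abbac"));
        System.out.println(oneOrMoreAccepts("aaa"));
    }
}
```

Note that the first alternative of the nested example can match the empty string, since both the starred group and the bracketed part may be absent.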