ANTLR Reference: Book Excerpts

Broadly speaking, if an application computes or “executes” sentences, we call that application an interpreter. Examples include calculators, configuration file readers, and Python interpreters. If we’re converting sentences from one language to another, we call that application a translator. Examples include Java to C# converters and compilers.

Programs that recognize languages are called parsers or syntax analyzers. Syntax refers to the rules governing language membership, and in this book we’re going to build ANTLR grammars to specify language syntax. A grammar is just a set of rules, each one expressing the structure of a phrase. The ANTLR tool translates grammars to parsers that look remarkably similar to what an experienced programmer might build by hand. (ANTLR is a program that writes other programs.) Grammars themselves follow the syntax of a language optimized for specifying other languages: ANTLR’s meta-language.


The process of grouping characters into words or symbols (tokens) is called lexical analysis or simply tokenizing. We call a program that tokenizes the input a lexer. The lexer can group related tokens into token classes, or token types, such as INT (integers), ID (identifiers), FLOAT (floating-point numbers), and so on. The lexer groups vocabulary symbols into types when the parser cares only about the type, not the individual symbols. Tokens consist of at least two pieces of information: the token type (identifying the lexical structure) and the text matched for that token by the lexer. 
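To make the type-plus-text idea concrete, here is a minimal hand-written sketch of a lexer. It is not what ANTLR generates; the names TinyLexer, TokenType, and Token, and the choice of four token types, are assumptions made only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class TinyLexer {
    // Token types: the parser usually cares about these, not the exact text.
    enum TokenType { ID, INT, ASSIGN, SEMI }

    // A token carries at least its type and the text the lexer matched.
    static class Token {
        final TokenType type;
        final String text;
        Token(TokenType type, String text) { this.type = type; this.text = text; }
    }

    // Group characters into tokens, skipping whitespace.
    static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isLetter(c)) {
                int start = i;
                while (i < input.length() && Character.isLetterOrDigit(input.charAt(i))) i++;
                tokens.add(new Token(TokenType.ID, input.substring(start, i)));
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < input.length() && Character.isDigit(input.charAt(i))) i++;
                tokens.add(new Token(TokenType.INT, input.substring(start, i)));
            } else if (c == '=') {
                tokens.add(new Token(TokenType.ASSIGN, "="));
                i++;
            } else if (c == ';') {
                tokens.add(new Token(TokenType.SEMI, ";"));
                i++;
            } else {
                throw new IllegalArgumentException("Unexpected character '" + c + "' at index " + i);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : tokenize("sp = 100;")) {
            System.out.println(t.type + " '" + t.text + "'");
        }
    }
}
```

For the input sp = 100; this yields four tokens: an ID with text sp, an ASSIGN, an INT with text 100, and a SEMI. Any identifier would produce the same ID type, which is all a parser typically needs to know.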


The second stage is the actual parser, which feeds off these tokens to recognize the sentence structure, in this case an assignment statement. By default, ANTLR-generated parsers build a data structure called a parse tree or syntax tree that records how the parser recognized the structure of the input sentence and its component phrases. A diagram in the book illustrates the basic data flow of a language recognizer: characters go into the lexer, the lexer emits tokens, and the parser builds a parse tree from those tokens.
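As a rough sketch of this second stage, the hand-written parser below consumes already-split token texts for an assignment statement and records what it recognized as a small parse tree. The two rules (assign and expr), the Node class, and the regex-based match helper are assumptions for illustration only; ANTLR's generated parsers and tree classes are more elaborate.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TinyParser {
    // Parse tree node: a rule name for interior nodes, token text for leaves.
    static class Node {
        final String label;
        final List<Node> children = new ArrayList<>();
        Node(String label) { this.label = label; }
        Node add(Node child) { children.add(child); return this; }

        // LISP-style text form of the subtree rooted at this node.
        String toStringTree() {
            if (children.isEmpty()) return label;
            StringBuilder sb = new StringBuilder("(" + label);
            for (Node c : children) sb.append(' ').append(c.toStringTree());
            return sb.append(')').toString();
        }
    }

    private final List<String> tokens;
    private int pos = 0;

    TinyParser(List<String> tokens) { this.tokens = tokens; }

    // assign : ID '=' expr ';' ;
    Node assign() {
        Node n = new Node("assign");
        n.add(new Node(match("[A-Za-z]\\w*")));
        n.add(new Node(match("=")));
        n.add(expr());
        n.add(new Node(match(";")));
        return n;
    }

    // expr : INT ;
    Node expr() {
        return new Node("expr").add(new Node(match("[0-9]+")));
    }

    // Consume the current token if its text matches the pattern, else fail.
    private String match(String pattern) {
        if (pos >= tokens.size() || !tokens.get(pos).matches(pattern)) {
            throw new IllegalStateException("expected " + pattern + " at token " + pos);
        }
        return tokens.get(pos++);
    }

    public static void main(String[] args) {
        TinyParser p = new TinyParser(Arrays.asList("sp", "=", "100", ";"));
        System.out.println(p.assign().toStringTree());
    }
}
```

Each rule becomes one method, and the tree mirrors the call structure: running main prints (assign sp = (expr 100) ;).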


In the starter ArrayInit example, ANTLR generates the following files:

ArrayInitParser.java This file contains the parser class definition specific to grammar ArrayInit that recognizes our array language syntax. It contains a method for each rule in the grammar as well as some support code.

ArrayInitLexer.java ANTLR automatically extracts a separate parser and lexer specification from our grammar. This file contains the lexer class definition, which ANTLR generated by analyzing the lexical rules.

ArrayInit.tokens ANTLR assigns a token type number to each token we define and stores these values in this file. 

ArrayInitListener.java, ArrayInitBaseListener.java By default, ANTLR parsers build a tree from the input. By walking that tree, a tree walker can fire “events” (callbacks) to a listener object that we provide. ArrayInitListener is the interface that describes the callbacks we can implement. ArrayInitBaseListener is a set of empty default implementations. This class makes it easy for us to override just the callbacks we’re interested in.
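The listener arrangement can be sketched without the ANTLR runtime. Everything below is a hand-built stand-in: Node, Listener, BaseListener, walk, and ToUnicodeString are hypothetical names, and the tree is constructed manually rather than by a generated parser. The shape is the point: an interface of callbacks, a base class of empty implementations, and a subclass that overrides only what it needs.

```java
import java.util.ArrayList;
import java.util.List;

class ListenerSketch {
    // Hand-built stand-in for a parse tree node (not ANTLR's tree classes).
    static class Node {
        final String rule;   // "init" or "value"
        final String text;   // matched text for value nodes, null for init
        final List<Node> children = new ArrayList<>();
        Node(String rule, String text) { this.rule = rule; this.text = text; }
    }

    // The callbacks a tree walker can fire.
    interface Listener {
        void enterInit(Node n);
        void exitInit(Node n);
        void enterValue(Node n);
    }

    // Empty default implementations, so subclasses override selectively.
    static class BaseListener implements Listener {
        public void enterInit(Node n) {}
        public void exitInit(Node n) {}
        public void enterValue(Node n) {}
    }

    // Depth-first walk that fires enter/exit events on the listener.
    static void walk(Listener listener, Node n) {
        boolean isInit = n.rule.equals("init");
        if (isInit) listener.enterInit(n); else listener.enterValue(n);
        for (Node child : n.children) walk(listener, child);
        if (isInit) listener.exitInit(n);
    }

    // Turns an initializer of small integers into a quoted string of
    // four-digit hex escapes, one escape per value.
    static class ToUnicodeString extends BaseListener {
        final StringBuilder out = new StringBuilder();
        @Override public void enterInit(Node n) { out.append('"'); }
        @Override public void exitInit(Node n) { out.append('"'); }
        @Override public void enterValue(Node n) {
            out.append(String.format("\\u%04x", Integer.parseInt(n.text)));
        }
    }

    public static void main(String[] args) {
        // Tree for the input {99, 3, 451}, built by hand for this sketch.
        Node init = new Node("init", null);
        for (String v : new String[] {"99", "3", "451"}) {
            init.children.add(new Node("value", v));
        }
        ToUnicodeString translator = new ToUnicodeString();
        walk(translator, init);
        System.out.println(translator.out);
    }
}
```

Running main prints "\u0063\u0003\u01c3" (including the quotes). The walker knows nothing about the translation, and the listener knows nothing about how the tree is traversed, which is the decoupling the listener mechanism is meant to provide.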











