Parsers are powerful tools, and using ANTLR you can write all kinds of parsers, usable from many different languages.
In this complete tutorial we are going to:
- explain the basics: what a parser is and what it can be used for
- see how to set up ANTLR to be used from Javascript, Python, Java and C#
- discuss how to test your parser
- present the most advanced and useful features of ANTLR: you will learn all you need to parse all possible languages
- show tons of examples
Maybe you have read some tutorial that was too complicated, or so partial that it seemed to assume you already knew how to use a parser. This is not that kind of tutorial. We just expect you to know how to code and how to use a text editor or an IDE. That's it.
At the end of this tutorial:
- you will be able to write a parser to recognize different formats and languages
- you will be able to create all the rules you need to build a lexer and a parser
- you will know how to deal with the common problems you will encounter
- you will understand errors and you will know how to avoid them by testing your grammar.
In other words, we will start from the very beginning, and when we reach the end you will have learned all you could possibly need to learn about ANTLR.
ANTLR Mega Tutorial Giant List of Contents
What is ANTLR?
ANTLR is a parser generator, a tool that helps you create parsers. A parser takes a piece of text and transforms it into an organized structure, such as an Abstract Syntax Tree (AST). You can think of the AST as a story describing the content of the code, or also as its logical representation, created by putting together the various pieces.
Graphical representation of the AST of the Euclidean algorithm
What you need to do to get an AST:
- define a lexer and parser grammar
- invoke ANTLR: it will generate a lexer and a parser in your target language (e.g. Java, Python, C#, Javascript)
- use the generated lexer and parser: you invoke them passing the code to recognize, and they return to you an AST
So you need to start by defining a lexer and parser grammar for the thing you are analyzing. Usually the "thing" is a language, but it could also be a data format, a diagram, or any kind of structure that is represented with text.
Are Regular Expressions Enough?
If you are the typical programmer, you may ask yourself: why can't I just use a regular expression? A regular expression is quite useful, for instance when you want to find a number in a string of text, but it also has many limitations.
The most obvious one is the lack of recursion: you can't find a (regular) expression inside another one, unless you code it by hand for each level. Something that quickly becomes unmaintainable. But the larger problem is that it is not really scalable: if you put together even just a few regular expressions, you are going to create a fragile mess that will be hard to maintain.
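To make the recursion limit concrete, here is a small Python sketch (purely illustrative, unrelated to ANTLR's own implementation): a regular expression can describe one fixed level of parentheses, but a few lines of ordinary code are needed to check arbitrary nesting.

```python
import re

# A regex can describe one fixed level of nesting...
one_level = re.compile(r"^\(([^()]*)\)$")
print(bool(one_level.match("(abc)")))    # True
print(bool(one_level.match("((abc))")))  # False: nested parentheses defeat it

# ...but a tiny recursive-style check handles any depth.
def balanced(s: str) -> bool:
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0

print(balanced("((abc))"))  # True, at any nesting level
```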
It is not that easy to use regular expressions
Have you ever tried parsing HTML with a regular expression? It's a terrible idea, for one you risk summoning Cthulhu, but more importantly it doesn't really work. You don't believe me? Let's see: you want to find the elements of a table, so you try a regular expression like this one: <table>(.*?)</table>. Brilliant! You did it! Except somebody adds attributes to their table, such as style or id. It doesn't matter, you do <table.*?>(.*?)</table>, but you actually cared about the data inside the table, so you then need to parse tr and td, but they are full of tags.
So you need to eliminate those, too. And somebody even dares to use comments like <!— my comment &gtl->. Comments can be used everywhere, and that is not easy to treat with your regular expression. Is it?
So you forbid the internet from using comments in HTML: problem solved.
Or alternatively you use ANTLR, whatever seems simpler to you.
ANTLR vs writing your own parser by hand
Okay, you are convinced you need a parser, but why use a parser generator like ANTLR instead of building your own?
The main advantage of ANTLR is productivity
If you actually have to work with a parser all the time, because your language, or format, is evolving, you need to be able to keep up the pace, something you can't do if you also have to deal with the details of implementing a parser. Since you are not parsing for parsing's sake, you must have the chance to concentrate on accomplishing your goals, and ANTLR makes it much easier to do that, rapidly and cleanly.
As a second thing, once you have defined your grammar, you can ask ANTLR to generate multiple parsers in different languages. For example, you can get a parser in C# and one in Javascript to parse the same language in a desktop application and in a web application.
Some people argue that writing a parser by hand you can make it faster and you can produce better error messages. There is some truth in this, but in my experience parsers generated by ANTLR are always fast enough. You can tweak the grammar and improve performance and error handling by working on your grammar, if you really need to. And you can do that once you are happy with your grammar.
Table of Contents
Two small notes:
- in the companion repository of this tutorial you are going to find all the code with testing, even where we don't see it in the article
- the examples will be in different languages, but the knowledge is generally applicable to any language
Setup
- Setup ANTLR
- Javascript Setup
- Python Setup
- Java Setup
- C# Setup
Beginner
- Lexers and Parsers
- Creating a Grammar
- Designing a Data Format
- Lexer Rules
- Parser Rules
- Mistakes and Adjustments
Intermediate
- Setting Up the Chat Project with Javascript
- Antlr.js
- HtmlChatListener.js
- Working with a Listener
- Solving Ambiguities with Semantic Predicates
- Continuing the Chat in Python
- The Python Way of Working with a Listener
- Testing with Python
- Parsing Markup
- Lexical Modes
- Parser Grammars
Advanced
- The Markup Project in Java
- The Main App.java
- Transforming Code with ANTLR
- Joy and Pain of Transforming Code
- Advanced Testing
- Dealing with Expressions
- Parsing Spreadsheets
- The Spreadsheet Project in C#
- Excel is Doomed
- Testing Everything
Closing Remarks
- Tips and Tricks
- Conclusions
Setup
In this section we prepare our development environment to work with ANTLR: the parser generator tool, the supporting tools, and the runtimes for each language.
1. Setup ANTLR
ANTLR is actually made up of two main parts: the tool, used to generate the lexer and parser, and the runtime needed to run them.
The tool will be needed just by you, the language engineer, while the runtime will be included in the final software using your language.
The tool is always the same no matter which language you are targeting: it's a Java program that you need on your development machine. The runtime instead is different for every language and must be available both to the developer and to the user.
The only requirement for the tool is that you have installed at least Java 1.7. To install the Java program you need to download the latest version from the official site, which at the moment is:
http://www.antlr.org/download/antlr-4.6-complete.jar
Instructions
1. Copy the downloaded tool where you usually put third-party Java libraries (e.g. /usr/local/lib or C:\Program Files\Java\lib)
2. Add the tool to your CLASSPATH. Add it to your startup script (e.g. .bash_profile)
3. (optional) Also add aliases to your startup script to simplify the usage of ANTLR
Executing the instructions on Linux/Mac OS
// 1.
sudo cp antlr-4.6-complete.jar /usr/local/lib/
// 2. and 3.
// add this to your .bash_profile
export CLASSPATH=".:/usr/local/lib/antlr-4.6-complete.jar:$CLASSPATH"
// simplify the use of the tool to generate lexer and parser
alias antlr4='java -Xmx500M -cp "/usr/local/lib/antlr-4.6-complete.jar:$CLASSPATH" org.antlr.v4.Tool'
// simplify the use of the tool to test the generated code
alias grun='java org.antlr.v4.gui.TestRig'
Executing the instructions on Windows
// 1.
Go to System Properties dialog > Environment variables
-> Create or append to the CLASSPATH variable
// 2. and 3. Option A: use doskey
doskey antlr4=java org.antlr.v4.Tool $*
doskey grun =java org.antlr.v4.gui.TestRig $*
// 2. and 3. Option B: use batch files
// create antlr4.bat
java org.antlr.v4.Tool %*
// create grun.bat
java org.antlr.v4.gui.TestRig %*
// put them in the system path or any of the directories included in %path%
Typical workflow
When you use ANTLR you start by writing a grammar, a file with extension .g4 that contains the rules of the language that you are analyzing. You then use the antlr4 program to generate the files that your program will actually use, such as the lexer and the parser.
There are a couple of important options you can specify when running antlr4.
First, you can specify the target language, to generate a parser in Python or JavaScript or any other target different from Java (which is the default one). The other options are used to generate a visitor or a listener (don't worry if you don't know what these are, we are going to explain that later).
By default only the listener is generated, so to create the visitor you use the -visitor command line option, and -no-listener if you don't want to generate the listener. There are also the opposite options, -no-visitor and -listener, but they are the default values.
antlr4 -visitor <Grammar-file>
You can optimize your grammar testing using a little utility named TestRig (although, as we have seen, it's usually aliased to grun).
grun <grammar-name> <rule-to-test> <input-filename(s)>
The filename(s) are optional; you can instead analyze the input that you type on the console.
If you want to use the testing tool you need to generate a Java parser, even if your program is written in another language. This can be done just by selecting a different option with antlr4.
Grun is useful when manually testing the first draft of your grammar. As the grammar becomes more stable, you may want to move on to automated tests (we will see how to write them).
Grun also has a few useful options: -tokens, to show the tokens it detected, and -gui, to generate an image of the AST.
2. Javascript Setup
You can put your grammar in the same folder as your Javascript files. The file containing the grammar must have the same name as the grammar, which must be declared at the top of the file.
In the following example the name is Chat and the file is Chat.g4.
We can create the corresponding Javascript parser simply by specifying the correct option with the ANTLR4 Java program.
antlr4 -Dlanguage=JavaScript Chat.g4
Note that the option is case-sensitive, so pay attention to the uppercase 'S'. If you make a mistake you will receive a message like the following.
error(31): ANTLR cannot generate Javascript code as of version 4.6
ANTLR can be used both with node.js and in the browser. For the browser you need to use webpack or require.js. If you don't know how to use either of the two, you can look at the official documentation for some help, or read an antlr tutorial on the web. We are going to use node.js, for which you can install the ANTLR runtime simply by using the following standard command.
npm install antlr4
3. Python Setup
When you have a grammar, you put it in the same folder as your Python files. The file must have the same name as the grammar, which must be declared at the top of the file. In the following example the name is Chat and the file is Chat.g4.
We can create the corresponding Python parser simply by specifying the correct option with the ANTLR4 Java program. For Python, you also need to pay attention to the version of Python, 2 or 3.
antlr4 -Dlanguage=Python3 Chat.g4
The runtime is available from PyPi, so you can install it just with pip.
pip install antlr4-python3-runtime
Again, you just have to remember to specify the proper version of Python.
4. Java Setup
To set up your Java project using ANTLR you can do things manually. Or you can be a civilized person and use Gradle or Maven.
Also, you can look at ANTLR plugins for your IDE.
4.1 Java Setup Using Gradle
This is how I typically set up my Gradle project.
I use a Gradle plugin to invoke ANTLR, and I also use the IDEA plugin to generate the configuration for IntelliJ IDEA.
dependencies {
antlr "org.antlr:antlr4:4.5.1"
compile "org.antlr:antlr4-runtime:4.5.1"
testCompile 'junit:junit:4.12'
}
generateGrammarSource {
maxHeapSize = "64m"
arguments += ['-package', 'me.tomassetti.mylanguage']
outputDirectory = new File("generated-src/antlr/main/me/tomassetti/mylanguage".toString())
}
compileJava.dependsOn generateGrammarSource
sourceSets {
generated {
java.srcDir 'generated-src/antlr/main/'
}
}
compileJava.source sourceSets.generated.java, sourceSets.main.java
clean{
delete "generated-src"
}
idea {
module {
sourceDirs += file("generated-src/antlr/main")
}
}
I put my grammars under src/main/antlr/ and the Gradle configuration makes sure they are generated in the directory corresponding to their package. For example, if I want the parser to be in the package me.tomassetti.mylanguage it has to be generated into generated-src/antlr/main/me/tomassetti/mylanguage.
At this point I can simply run:
# Linux/Mac
./gradlew generateGrammarSource
# Windows
gradlew generateGrammarSource
And I get my lexer and parser generated from my grammar(s).
Then I can also run:
# Linux/Mac
./gradlew idea
# Windows
gradlew idea
And I have an IDEA project ready to be opened.
4.2 Java Setup Using Maven
First of all, we are going to specify in our POM that we need antlr4-runtime as a dependency. We will also use a Maven plugin to run ANTLR through Maven.
We can also specify whether we want ANTLR to generate visitors or listeners. To do that we define a couple of corresponding properties.
<project>
  <modelVersion>4.0.0</modelVersion>
  [..]
  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <antlr4.visitor>true</antlr4.visitor>
    <antlr4.listener>true</antlr4.listener>
  </properties>
  <dependencies>
    <dependency>
      <groupId>org.antlr</groupId>
      <artifactId>antlr4-runtime</artifactId>
      <version>4.6</version>
    </dependency>
    [..]
  </dependencies>
  <build>
    [..]
    <plugins>
      <plugin>
        <groupId>org.antlr</groupId>
        <artifactId>antlr4-maven-plugin</artifactId>
        <version>4.6</version>
        <executions>
          <execution>
            <goals>
              <goal>antlr4</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
    </plugins>
    [..]
  </build>
</project>
Now you have to put the *.g4 files of your grammar under src/main/antlr4/me/tomassetti/examples/MarkupParser.
Once you have written your grammars, you just run mvn package and all the magic happens: ANTLR is invoked, it generates the lexer and the parser, and they are compiled together with the rest of your code.
// use mvn to generate the package
mvn package
If you have never used Maven you can look at the official ANTLR documentation for the Java target, or the Maven website, to get started.
There is a clear advantage in using Java for developing ANTLR grammars: there are plugins for several IDEs, and it's the language that the main developer of the tool actually works on. So there are tools, like org.antlr.v4.gui.TestRig, that can be easily integrated into your workflow and are useful if you want to easily visualize the AST of an input.
5. C# Setup
There is support for .NET Framework and Mono 3.5, but there is no support for .NET Core. We are going to use Visual Studio to create our ANTLR project, because there is a nice extension for Visual Studio created by the same author of the C# target, called ANTLR Language Support. You can install it by going to Tools -> Extensions and Updates. This extension will automatically generate the parser, lexer and visitor/listener when you build your project.
Furthermore, the extension will allow you to create a new grammar file using the well-known menu to add a new item. Last, but not least, you can set up the options to generate the listener/visitor right in the properties of each grammar file.
Alternatively, if you prefer using an editor, you need to use the usual Java tool to generate everything. You can do that just by indicating the right language. In this example the grammar is called Spreadsheet.
antlr4 -Dlanguage=CSharp Spreadsheet.g4
Notice that the 'S' in CSharp is uppercase.
You still need the ANTLR4 runtime for your project, and you can install it with the good old nuget.
Beginner
In this section we lay the foundations you need to use ANTLR: what lexers and parsers are, the syntax to define them in a grammar, and the strategies you can use to create one. We also see the first examples to show how to use what you have learned. You can come back to this section if you don't remember how ANTLR works.
6. Lexers and Parsers
Before looking into parsers, we first need to look into lexers, also known as tokenizers. They are basically the first stepping stone toward a parser, and of course ANTLR allows you to build them too. A lexer transforms the individual characters into tokens, the atoms that the parser uses to create the logical structure.
Imagine this process applied to a natural language such as English. You are reading the single characters, putting them together until they make a word, and then you combine the different words to form a sentence.
Let's look at the following example and imagine that we are trying to parse a mathematical operation.
437 + 734
The lexer scans the text and finds '4', '3', '7' and then the space ' '. So it knows that the first characters actually represent a number. Then it finds a '+' symbol, so it knows that it represents an operator, and lastly it finds another number.
How does it know that? Because we tell it.
/*
* Parser Rules
*/
operation : NUMBER '+' NUMBER ;
/*
* Lexer Rules
*/
NUMBER : [0-9]+ ;
WHITESPACE : ' ' -> skip ;
This is not a complete grammar, but we can already see that lexer rules are all uppercase, while parser rules are all lowercase. Technically the rule about case applies only to the first character of their names, but usually they are all uppercase or lowercase for clarity.
Rules are typically written in this order: first the parser rules and then the lexer ones, although logically they are applied in the opposite order. It's also important to remember that lexer rules are analyzed in the order in which they appear, and they can be ambiguous.
The typical example is the identifier: in many programming languages it can be any string of letters, but certain combinations, such as "class" or "function", are forbidden because they indicate a class or a function. So the order of the rules solves the ambiguity by using the first match, and that's why the tokens identifying keywords such as class or function are defined first, while the one for the identifier is put last.
The basic syntax of a rule is easy: there is a name, a colon, the definition of the rule, and a terminating semicolon.
The definition of NUMBER contains a typical range of digits and a '+' symbol to indicate that one or more matches are allowed. These are all very typical indications, which I assume you are familiar with; if not, you can read more about the syntax of regular expressions.
The most interesting part is at the end, the lexer rule that defines the WHITESPACE token. It's interesting because it shows how to tell ANTLR to ignore something. Consider how ignoring whitespace simplifies the parser rules: if we couldn't say to ignore WHITESPACE, we would have to include it between every single sub-rule of the parser, to let the user put spaces where they want. Like this:
operation : WHITESPACE* NUMBER WHITESPACE* '+' WHITESPACE* NUMBER;
And the same typically applies to comments: they can appear everywhere and we do not want to handle them specifically in every single piece of our grammar, so we just ignore them (at least while parsing).
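To make the lexer's job concrete, here is a hand-written Python tokenizer that mimics the little grammar above: it emits NUMBER and '+' tokens and silently skips spaces, which is exactly what the skip action asks the generated lexer to do. This is only an illustration of the job ANTLR automates for you, not ANTLR-generated code.

```python
def tokenize(text):
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isdigit():
            # NUMBER : [0-9]+ — consume the longest run of digits
            j = i
            while j < len(text) and text[j].isdigit():
                j += 1
            tokens.append(("NUMBER", text[i:j]))
            i = j
        elif ch == "+":
            tokens.append(("PLUS", ch))
            i += 1
        elif ch == " ":
            # WHITESPACE -> skip: recognized but never handed to the parser
            i += 1
        else:
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    return tokens

print(tokenize("437 + 734"))
# [('NUMBER', '437'), ('PLUS', '+'), ('NUMBER', '734')]
```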
7. Creating a Grammar
Now that we have seen the basic syntax of a rule, we can take a look at the two different approaches to defining a grammar: top-down and bottom-up.
Top-down approach
This approach consists of starting from the general organization of a file written in your language.
What are the main sections of the file? What is their order? What is contained in each section?
For example, a Java file can be divided into three sections:
- package declaration
- imports
- type definitions
This approach works best when you already know the language or format you are designing a grammar for. It is probably the strategy preferred by people with a good theoretical background, or people who prefer to start with the "big plan".
When using this approach, you start by defining the rule representing the whole file. It will probably include other rules, to represent the main sections. You then define those rules, and you move from the most general, abstract rules to the low-level, practical ones.
Bottom-up approach
The bottom-up approach consists of focusing on the small elements first: defining how the tokens are captured, how the basic expressions are defined, and so on. Then we move to higher-level constructs until we define the rule representing the whole file.
I personally prefer to start from the bottom, the basic items, which are analyzed with the lexer. Then you grow naturally from there to the structure, which is dealt with by the parser. This approach permits focusing on a small piece of the grammar, building a grammar for it, making sure it works as expected, and then moving on to the next bit.
This approach mimics the way we learn. Furthermore, there is the advantage of starting with real code, which is actually quite common among many languages. In fact, most languages have things like identifiers, comments, whitespace, etc. Obviously you might have to tweak something, for example a comment in HTML is functionally the same as a comment in C#, but it has different delimiters.
The disadvantage of the bottom-up approach is that the parser is the thing you actually care about. You weren't asked to build a lexer, you were asked to build a parser that provides a specific functionality. So by starting with the last part, the lexer, you might end up doing some refactoring if you don't already know how the rest of the program will work.
8. Designing a Data Format
Designing a grammar for a new language is difficult. You have to create a language that is simple and intuitive for the user, but also unambiguous, to keep the grammar manageable. It must be concise, clear, natural, and it shouldn't get in the way of the user.
So we are starting with something limited: a grammar for a simple chat program.
Let's start with a better description of our objective:
- there are not going to be paragraphs, so we can use newlines as separators between messages
- we want to allow emoticons, mentions and links. We are not going to support HTML tags
- since our chat is going to be for annoying teenagers, we want to give users an easy way to SHOUT and to format the color of the text.
Finally, teenagers can shout, all in pink. What a time to be alive.
9. Lexer Rules
We start by defining the lexer rules for our chat language. Remember that lexer rules actually go at the end of the file.
/*
* Lexer Rules
*/
fragment A : ('A'|'a') ;
fragment S : ('S'|'s') ;
fragment Y : ('Y'|'y') ;
fragment H : ('H'|'h') ;
fragment O : ('O'|'o') ;
fragment U : ('U'|'u') ;
fragment T : ('T'|'t') ;
fragment LOWERCASE : [a-z] ;
fragment UPPERCASE : [A-Z] ;
SAYS : S A Y S ;
SHOUTS : S H O U T S;
WORD : (LOWERCASE | UPPERCASE | '_')+ ;
WHITESPACE : (' ' | '\t') ;
NEWLINE : ('\r'? '\n' | '\r')+ ;
TEXT : ~[\])]+ ;
In this example we use rule fragments: they are reusable building blocks for lexer rules. You define them and then you refer to them in a lexer rule. If you define them but don't include them in a lexer rule, they simply have no effect.
We define a fragment for the letters we want to use in keywords. Why is that? Because we want to support case-insensitive keywords. Apart from avoiding the repetition of the case of characters, fragments are also used when dealing with floating point numbers, to avoid repeating the digits before and after the dot/comma. As in the following example.
fragment DIGIT : [0-9] ;
NUMBER : DIGIT+ ([.,] DIGIT+)? ;
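Since the notation of lexer rules is close to that of regular expressions, you can sanity-check the shape of the NUMBER rule with a plain Python regex that has the same structure: one or more digits, optionally followed by a dot or comma and more digits. This is only an analogy for readers familiar with regexes; ANTLR does not use Python regexes internally.

```python
import re

# Same shape as: NUMBER : DIGIT+ ([.,] DIGIT+)? ;
NUMBER = re.compile(r"[0-9]+([.,][0-9]+)?")

for text in ["42", "3.14", "3,14", "3."]:
    print(text, "->", bool(NUMBER.fullmatch(text)))
# 42 -> True, 3.14 -> True, 3,14 -> True, 3. -> False (a trailing dot needs digits after it)
```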
The TEXT token shows how to capture everything except the characters that follow the tilde ('~'). We are excluding the closing square bracket ']', but since it is a character used to identify the end of a group of characters, we have to escape it by prefixing it with a backslash '\'.
The newline rule is formulated that way because there are different ways in which operating systems actually indicate a newline: some include a carriage return ('\r'), others a newline ('\n') character, or a combination of the two.
10. Parser Rules
We continue with the parser rules, which are the rules our program will interact with most directly.
/*
* Parser Rules
*/
chat : line+ EOF ;
line : name command message NEWLINE;
message : (emoticon | link | color | mention | WORD | WHITESPACE)+ ;
name : WORD ;
command : (SAYS | SHOUTS) ':' WHITESPACE ;
emoticon : ':' '-'? ')'
| ':' '-'? '('
;
link : '[' TEXT ']' '(' TEXT ')' ;
color : '/' WORD '/' message '/';
mention : '@' WORD ;
The first interesting part is message, not so much for what it contains, but for the structure it represents. We are saying that a message can be any of the listed rules, in any order. This is a simple way to solve the problem of dealing with whitespace without repeating it every time. Since we, as users, find whitespace irrelevant, we see something like WORD WORD mention, but the parser actually sees WORD WHITESPACE WORD WHITESPACE mention WHITESPACE.
Another way of dealing with whitespace, when you can't get rid of it, is more advanced: lexical modes. Basically, they allow you to specify two lexer parts: one for the structured part, the other for simple text. This is useful for parsing things like XML or HTML. We are going to show it later.
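To give an intuition of what lexical modes do before we meet them properly, here is a toy Python tokenizer (purely illustrative, with made-up token names) that keeps a current mode and applies a different rule set inside square brackets:

```python
def tokenize(text):
    """Toy mode-switching tokenizer: inside [...] everything is raw TEXT."""
    tokens, mode, buf = [], "STRUCTURE", ""
    for ch in text:
        if mode == "STRUCTURE":
            if ch == "[":
                tokens.append(("LBRACK", ch))
                mode = "TEXT"          # switch to the second rule set
            elif not ch.isspace():
                tokens.append(("CHAR", ch))
        else:  # TEXT mode: only ']' is special, everything else is content
            if ch == "]":
                if buf:
                    tokens.append(("TEXT", buf))
                    buf = ""
                tokens.append(("RBRACK", ch))
                mode = "STRUCTURE"     # switch back
            else:
                buf += ch
    return tokens

print(tokenize("a [some text] b"))
# [('CHAR', 'a'), ('LBRACK', '['), ('TEXT', 'some text'), ('RBRACK', ']'), ('CHAR', 'b')]
```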
The command rule is obvious; you just have to notice that there cannot be a space between the two options for command and the colon, but one WHITESPACE is needed after it. The emoticon rule shows another notation to indicate multiple choices: you can use the pipe character '|' without parentheses. We support only two emoticons, happy and sad, with or without the middle line.
Something that could be considered a bug, or a poor implementation, is the link rule; as we already said, in fact, TEXT captures everything apart from certain special characters. You may want to allow only WORD and WHITESPACE inside the parentheses, or to force a correct format for a link inside the square brackets. On the other hand, this allows the user to make a mistake in writing the link without making the parser complain.
You have to remember that the parser cannot check for semantics
For instance, it cannot know whether the WORD indicating the color actually represents a valid color. That is, it doesn't know that it's wrong to use "dog", but right to use "red". This must be checked by the logic of the program, which has access to the available colors. You have to find the right balance in dividing enforcement between the grammar and your own code.
The parser should only check the syntax. So the rule of thumb is that when in doubt you let the parser pass the content up to your program. Then, in your program, you check the semantics and make sure that the rule actually has a proper meaning.
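Concretely, the semantic check on colors would live in your application code, after parsing. Something along these lines, where the validate function and the set of valid colors are made-up examples, not part of any generated parser:

```python
VALID_COLORS = {"red", "green", "blue", "pink"}  # whatever the application supports

def validate_color(word: str) -> str:
    """Semantic check performed after parsing: the grammar only guarantees
    that `word` is a WORD token, not that it names a real color."""
    if word.lower() not in VALID_COLORS:
        raise ValueError(f"unknown color: {word!r}")
    return word.lower()

print(validate_color("red"))   # fine: the parser and the program both agree
# validate_color("dog")        # syntactically a WORD, but would raise ValueError here
```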
Let's look at the color rule: it can include a message, and it itself can be part of a message; this ambiguity will be solved by the context in which it is used.
11. Mistakes and Adjustments
Before trying our new grammar, we have to add a name for it at the beginning of the file. The name must be the same as the file name, which should have the .g4 extension.
grammar Chat;
You can find how to install everything for your platform in the official documentation. After everything is installed, we create the grammar, compile the generated Java code, and then we run the testing tool.
// lines preceded by $ are commands
// > are input to the tool
// - are output from the tool
$ antlr4 Chat.g4
$ javac Chat*.java
// grun is the testing tool, Chat is the name of the grammar, chat the rule that we want to parse
$ grun Chat chat
> john SAYS: hello @michael this will not work
// CTRL+D on Linux, CTRL+Z on Windows
> CTRL+D/CTRL+Z
- line 1:0 mismatched input 'john SAYS: hello @michael this will not work\n' expecting WORD
Okay, it doesn't work. Why is it expecting WORD? It's right there! Let's try to find out, using the option -tokens to make it show the tokens it recognizes.
$ grun Chat chat -tokens
> john SAYS: hello @michael this will not work
- [@0,0:44='john SAYS: hello @michael this will not work\n',<TEXT>,1:0]
- [@1,45:44='<EOF>',<EOF>,2:0]
So it only sees the TEXT token. But we put it at the end of the grammar, what happened? The problem is that it always tries to match the largest possible token. And all this text is one valid TEXT token. How do we solve this problem? There are many ways; the first, of course, is just getting rid of that token. But for now we are going to look at the second easiest.
[..]
link : TEXT TEXT ;
[..]
TEXT : ('['|'(') ~[\])]+ (']'|')');
We changed the problematic token to make it include a preceding parenthesis or square bracket. Note that this isn't exactly the same thing, because it would allow two series of parentheses or square brackets. But it is a first step, and we are learning here, after all.
Let's check whether it works:
$ grun Chat chat -tokens
> john SAYS: hello @michael this will not work
- [@0,0:3='john',<WORD>,1:0]
- [@1,4:4=' ',<WHITESPACE>,1:4]
- [@2,5:8='SAYS',<SAYS>,1:5]
- [@3,9:9=':',<':'>,1:9]
- [@4,10:10=' ',<WHITESPACE>,1:10]
- [@5,11:15='hello',<WORD>,1:11]
- [@6,16:16=' ',<WHITESPACE>,1:16]
- [@7,17:17='@',<'@'>,1:17]
- [@8,18:24='michael',<WORD>,1:18]
- [@9,25:25=' ',<WHITESPACE>,1:25]
- [@10,26:29='this',<WORD>,1:26]
- [@11,30:30=' ',<WHITESPACE>,1:30]
- [@12,31:34='will',<WORD>,1:31]
- [@13,35:35=' ',<WHITESPACE>,1:35]
- [@14,36:38='not',<WORD>,1:36]
- [@15,39:39=' ',<WHITESPACE>,1:39]
- [@16,40:43='work',<WORD>,1:40]
- [@17,44:44='\n',<NEWLINE>,1:44]
- [@18,45:44='<EOF>',<EOF>,2:0]
Using the -gui option we can also have a nice, and easier to understand, graphical representation.
The dot in mid-air represents whitespace.
This works, but it isn't very smart, nice, or organized. Don't worry, though, later we are going to see a better way. One positive aspect of this solution is that it allows us to show another trick.
TEXT : ('['|'(') .*? (']'|')');
This is an equivalent formulation of the token TEXT: the '.' matches any character, '*' says that the preceding match can be repeated any number of times, and '?' indicates that the previous match is non-greedy. That is to say, the previous sub-rule matches everything except what follows it, allowing the closing parenthesis or square bracket to be matched.
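Greedy versus non-greedy repetition behaves the same way in ordinary regular expressions, so you can observe the difference from plain Python before relying on it in a grammar:

```python
import re

text = "[first](second) and [third]"
# Greedy: .* grabs as much as possible, overshooting to the LAST closing bracket
print(re.findall(r"\[.*\]", text))    # ['[first](second) and [third]']
# Non-greedy: .*? stops at the FIRST closing bracket, like .*? in the TEXT token
print(re.findall(r"\[.*?\]", text))   # ['[first]', '[third]']
```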
Intermediate
In this section we see how to use ANTLR in your programs, the libraries and functions you need to use, how to test your parsers, and the like. We learn what a listener is and how to use one. We also build upon our knowledge of the basics by looking at more advanced concepts, such as semantic predicates. While our projects are mainly in Javascript and Python, the concepts are generally applicable to every language. You can come back to this section when you need to remember how to get your project organized.
12. Setting Up the Chat Project with Javascript
In the previous sections we have seen, piece by piece, how to build a grammar for a chat program. Let's now copy the grammar we just created into the same folder as our Javascript files.
grammar Chat;
/*
* Parser Rules
*/
chat : line+ EOF ;
line : name command message NEWLINE ;
message : (emoticon | link | color | mention | WORD | WHITESPACE)+ ;
name : WORD WHITESPACE;
command : (SAYS | SHOUTS) ':' WHITESPACE ;
emoticon : ':' '-'? ')'
| ':' '-'? '('
;
link : TEXT TEXT ;
color : '/' WORD '/' message '/';
mention : '@' WORD ;
/*
* Lexer Rules
*/
fragment A : ('A'|'a') ;
fragment S : ('S'|'s') ;
fragment Y : ('Y'|'y') ;
fragment H : ('H'|'h') ;
fragment O : ('O'|'o') ;
fragment U : ('U'|'u') ;
fragment T : ('T'|'t') ;
fragment LOWERCASE : [a-z] ;
fragment UPPERCASE : [A-Z] ;
SAYS : S A Y S ;
SHOUTS : S H O U T S ;
WORD : (LOWERCASE | UPPERCASE | '_')+ ;
WHITESPACE : (' ' | '\t')+ ;
NEWLINE : ('\r'? '\n' | '\r')+ ;
TEXT : ('['|'(') ~[\])]+ (']'|')');
We can create the corresponding Javascript parser simply by specifying the correct option with the ANTLR4 Java program.
antlr4 -Dlanguage=JavaScript Chat.g4
Now you will find some new files in the folder, with names such as ChatLexer.js, ChatParser.js, and there are also *.tokens files, none of which contains anything interesting for us, unless you want to understand the inner workings of ANTLR.
The file you want to look at is ChatListener.js. You are not going to modify anything in it, but it contains the methods and functions that we will override with our own listener. We are not going to modify it, because changes would be overwritten every time the grammar is regenerated.
Looking into it, you can see several enter/exit functions, a pair for each of our parser rules. These functions will be invoked when a piece of code matching the rule is encountered. This is the default implementation of the listener, and it allows you to override just the functions that you need on your derived listener, and leave the rest as is.
var antlr4 = require('antlr4/index');
// This class defines a complete listener for a parse tree produced by ChatParser.
function ChatListener() {
antlr4.tree.ParseTreeListener.call(this);
return this;
}
ChatListener.prototype = Object.create(antlr4.tree.ParseTreeListener.prototype);
ChatListener.prototype.constructor = ChatListener;
// Enter a parse tree produced by ChatParser#chat.
ChatListener.prototype.enterChat = function(ctx) {
};
// Exit a parse tree produced by ChatParser#chat.
ChatListener.prototype.exitChat = function(ctx) {
};
[..]
The alternative to creating a Listener is creating a Visitor. The main differences are that you can neither control the flow of a listener nor return anything from its functions, while you can do both with a visitor. So if you need to control how the nodes of the AST are entered, or to gather information from several of them, you probably want to use a visitor. This is useful, for example, for code generation, where some of the information needed to create new source code is spread across many parts. Both the listener and the visitor use depth-first search.
A depth-first search means that when a node is accessed, its children will be accessed, and if one of the child nodes has its own children, they will be accessed before continuing with the other children of the first node. The following image will make the concept simpler to understand.
So in the case of a listener, an enter event will be fired at the first encounter with the node, and an exit one will be fired after having exited all of its children. In the following image you can see an example of which functions will be fired when a listener meets a line node (for simplicity only the functions related to line are shown).
For a standard visitor the behavior will be analogous, except that, of course, only a single visit event will be fired for every node. In the following image you can see an example of which function will be fired when a visitor meets a line node (for simplicity only the function related to line is shown).
Remember that this is true for the default implementation of a visitor, and it's done by returning the children of each node in every function. If you override a method of your visitor, it's your responsibility to make it continue the journey or stop it right there.
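The depth-first traversal with enter/exit events can be sketched in a few lines of Python, with plain tuples standing in for parse-tree nodes; this is a simplification of what ANTLR's ParseTreeWalker does with a listener:

```python
def walk(node, events):
    """node is (rule_name, [children]); record enter/exit events in DFS order."""
    name, children = node
    events.append(("enter", name))      # enter fires at the first encounter
    for child in children:
        walk(child, events)             # every child is fully visited first
    events.append(("exit", name))       # exit fires after all children

tree = ("line", [("name", []), ("command", []),
                 ("message", [("emoticon", [])])])
events = []
walk(tree, events)
print(events)
# enter line, enter name, exit name, enter command, exit command,
# enter message, enter emoticon, exit emoticon, exit message, exit line
```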
13. Antlr.js
It is finally time to see what a typical ANTLR program looks like.
const http = require('http');
const antlr4 = require('antlr4/index');
const ChatLexer = require('./ChatLexer');
const ChatParser = require('./ChatParser');
const HtmlChatListener = require('./HtmlChatListener').HtmlChatListener;
http.createServer((req, res) => {
res.writeHead(200, {
'Content-Type': 'text/html',
});
res.write('<html><head><meta charset="UTF-8"/></head><body>');
var input = "john SHOUTS: hello @michael /pink/this will work/ :-) \n";
var chars = new antlr4.InputStream(input);
var lexer = new ChatLexer.ChatLexer(chars);
var tokens = new antlr4.CommonTokenStream(lexer);
var parser = new ChatParser.ChatParser(tokens);
parser.buildParseTrees = true;
var tree = parser.chat();
var htmlChat = new HtmlChatListener(res);
antlr4.tree.ParseTreeWalker.DEFAULT.walk(htmlChat, tree);
res.write('</body></html>');
res.end();
}).listen(1337);
At the beginning of the main file we import (using require) the necessary libraries and files: antlr4 (the runtime) and our generated parser, plus the listener that we are going to see later.
For simplicity we get the input from a string, while in a real scenario it would come from an editor.
Lines 16-19 show the foundation of every ANTLR program: you create the stream of chars from the input, you give it to the lexer, and the lexer transforms them into tokens, which are then interpreted by the parser.
It's useful to take a moment to reflect on this: the lexer works on the characters of the input, a copy of the input to be precise, while the parser works on the tokens generated by the lexer. The lexer doesn't work on the input directly, and the parser doesn't even see the characters.
This is important to remember in case you need to do something advanced, like manipulating the input. In this case the input is a string, but, of course, it could be any stream of content.
Line 20 is redundant, since the option already defaults to true, but that could change in future versions of the runtime, so you are better off specifying it.
Then, on line 21, we set the root node of the tree as the chat rule. You invoke the parser specifying a rule, which typically is the first rule. However, you can actually invoke any rule directly, like color.
Once we get the AST from the parser, we typically want to process it using a listener or a visitor. In this case we specify a listener. Our particular listener takes a parameter: the response object. We want to use it to put some text in the response to send to the user. After setting the listener up, we finally walk the tree with it.
14. HtmlChatListener.js
We continue by looking at the listener of our Chat project.
const antlr4 = require('antlr4/index');
const ChatLexer = require('./ChatLexer');
const ChatParser = require('./ChatParser');
var ChatListener = require('./ChatListener').ChatListener;
HtmlChatListener = function(res) {
this.Res = res;
ChatListener.call(this); // inherit default listener
return this;
};
// inherit default listener
HtmlChatListener.prototype = Object.create(ChatListener.prototype);
HtmlChatListener.prototype.constructor = HtmlChatListener;
// override default listener behavior
HtmlChatListener.prototype.enterName = function(ctx) {
this.Res.write("<strong>");
};
HtmlChatListener.prototype.exitName = function(ctx) {
this.Res.write(ctx.WORD().getText());
this.Res.write("</strong> ");
};
HtmlChatListener.prototype.exitEmoticon = function(ctx) {
var emoticon = ctx.getText();
if(emoticon == ':-)' || emoticon == ':)')
{
this.Res.write("🙂");
}
if(emoticon == ':-(' || emoticon == ':(')
{
this.Res.write("🙁");
}
};
HtmlChatListener.prototype.enterCommand = function(ctx) {
if(ctx.SAYS() != null)
this.Res.write(ctx.SAYS().getText() + ':' + '<p>');
if(ctx.SHOUTS() != null)
this.Res.write(ctx.SHOUTS().getText() + ':' + '<p style="text-transform: uppercase">');
};
HtmlChatListener.prototype.exitLine = function(ctx) {
this.Res.write("</p>");
};
exports.HtmlChatListener = HtmlChatListener;
After the require calls we make our HtmlChatListener extend ChatListener. The interesting stuff starts at line 17.
The ctx argument is an instance of a specific class context for the node that we are entering/exiting. So for enterName it's a NameContext, for exitEmoticon it's an EmoticonContext, etc. This specific context will have the proper elements for the rule, which makes it possible to easily access the respective tokens and subrules. For example, NameContext will contain fields like WORD() and WHITESPACE(); CommandContext will contain fields like WHITESPACE(), SAYS() and SHOUTS().
These functions, enter* and exit*, are called by the walker every time the corresponding nodes are entered or exited while it's traversing the AST that represents the program. A listener allows you to execute some code, but it's important to remember that you can't stop the execution of the walker and the execution of the functions.
On line 18, we start by printing a strong tag because we want the name to be bold, then on exitName we take the text from the token WORD and close the tag. Note that we ignore the WHITESPACE token: nothing says that we have to show everything. In this case we could have done everything either in the enter or in the exit function.
In the function exitEmoticon we simply transform the emoticon text into an emoji character. We get the text of the whole rule because there are no tokens defined for this parser rule. In enterCommand, instead, there could be either of two tokens, SAYS or SHOUTS, so we check which one is defined. We then alter the following text, transforming it to uppercase, if it's a SHOUT. Note that we close the p tag at the exit of the line rule, because the command, semantically speaking, alters all the text of the message.
All we have to do now is to launch node, with nodejs antlr.js, and point our browser at its address, usually http://localhost:1337/, and we will be greeted with the following image.
So all is good, we just have to add all the different listeners to handle the rest of the language. Let's start with color and message.
15. Working with a Listener
We have seen how to start defining a listener. Now let's get serious and see how to evolve it into a complete, robust listener. Let's start by adding support for color and checking the results of our hard work.
HtmlChatListener.prototype.enterColor = function(ctx) {
var color = ctx.WORD().getText();
this.Res.write('<span style="color: ' + color + '">');
};
HtmlChatListener.prototype.exitColor = function(ctx) {
this.Res.write("</span>");
};
HtmlChatListener.prototype.exitMessage = function(ctx) {
this.Res.write(ctx.getText());
};
exports.HtmlChatListener = HtmlChatListener;
Except that it doesn't work. Or maybe it works too much: we are writing some part of message twice ("this will work"): first when we check the specific nodes, children of message, and then at the end.
Luckily with Javascript we can dynamically alter objects, so we can take advantage of this fact to change the *Context objects themselves.
HtmlChatListener.prototype.exitColor = function(ctx) {
ctx.text += ctx.message().text;
ctx.text += '</span>';
};
HtmlChatListener.prototype.exitEmoticon = function(ctx) {
var emoticon = ctx.getText();
if(emoticon == ':-)' || emoticon == ':)')
{
ctx.text = "🙂";
}
if(emoticon == ':-(' || emoticon == ':(')
{
ctx.text = "🙁";
}
};
HtmlChatListener.prototype.exitMessage = function(ctx) {
var text = '';
for (var index = 0; index < ctx.children.length; index++ ) {
if(ctx.children[index].text != null)
text += ctx.children[index].text;
else
text += ctx.children[index].getText();
}
if(ctx.parentCtx instanceof ChatParser.ChatParser.LineContext == false)
{
ctx.text = text;
}
else
{
this.Res.write(text);
this.Res.write("</p>");
}
};
Only the modified parts are shown in the snippet above. We add a text field to every node that transforms its text, and then at the exit of every message we print the text if it's the primary message, the one that is directly a child of the line rule. If it's a message that is also a child of color, we add the text field to the node we are exiting and let color print it. We check this on line 30, where we look at the parent node to see whether it's an instance of the object LineContext. This is also further evidence of how each ctx argument corresponds to the proper type.
Between lines 23 and 27 we can see another field of every node of the generated tree: children, which obviously contains the child nodes. You can observe that if a text field exists we add it to the proper variable, otherwise we use the usual function to get the text of the node.
16. Solving Ambiguities with Semantic Predicates
So far we have seen how to build a parser for a chat language in Javascript. Let's continue working on this grammar, but switch to Python. Remember that all the code is available in the repository. Before that, we have to solve an annoying problem: the TEXT token. The solution we have is terrible, and furthermore, if we tried to get the text of the token we would have to trim the edges, parentheses or square brackets. So what can we do?
We can use a particular feature of ANTLR called semantic predicates. As the name implies, they are expressions that produce a boolean value. They selectively enable or disable the following rule and thus permit solving ambiguities. Another reason they could be used is to support different versions of the same language, for instance a version with a new construct or an old one without it.
Technically they are part of the larger group of actions, which allow you to embed arbitrary code into the grammar. The downside is that the grammar is no longer language independent, since the code in the action must be valid for the target language. For this reason, it's usually considered a good idea to only use semantic predicates when they can't be avoided, and to leave most of the code to the visitor/listener.
link : '[' TEXT ']' '(' TEXT ')';
TEXT : {self._input.LA(-1) == ord('[') or self._input.LA(-1) == ord('(')}? ~[\])]+ ;
We restored link to its original formulation, but we added a semantic predicate to the TEXT token, written inside curly brackets and followed by a question mark. We use self._input.LA(-1) to check the character before the current one; if this character is a square bracket or an open parenthesis, we activate the TEXT token. It's important to repeat that this must be valid code in our target language: it's going to end up in the generated lexer or parser, in our case in ChatLexer.py.
This matters not just for the syntax itself, but also because different targets might have different fields or methods; for instance, LA returns an int in Python, so we have to convert the char to an int.
Let's look at the equivalent form in other languages.
// C#. Notice that is .La and not .LA
TEXT : {_input.La(-1) == '[' || _input.La(-1) == '('}? ~[\])]+ ;
// Java
TEXT : {_input.LA(-1) == '[' || _input.LA(-1) == '('}? ~[\])]+ ;
// Javascript
TEXT : {this._input.LA(-1) == '[' || this._input.LA(-1) == '('}? ~[\])]+ ;
If you want to test for the preceding token, you can use _input.LT(-1), but you can only do that in parser rules. For example, you may want to enable the mention rule only if it is preceded by a WHITESPACE token.
// C#
mention: {_input.Lt(-1).Type == WHITESPACE}? '@' WORD ;
// Java
mention: {_input.LT(-1).getType() == WHITESPACE}? '@' WORD ;
// Python
mention: {self._input.LT(-1).text == ' '}? '@' WORD ;
// Javascript
mention: {this._input.LT(-1).text == ' '}? '@' WORD ;
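If the predicate still feels abstract, its effect can be imitated in a hand-written tokenizer: before emitting a TEXT token, peek at the previous character, which is all that `_input.LA(-1)` does. The helper below is a plain-Python illustration, not generated ANTLR code.

```python
def text_rule_enabled(input_text: str, pos: int) -> bool:
    """Mimics the predicate {LA(-1) == '[' or LA(-1) == '('}? —
    the TEXT rule is only enabled when the previous character
    opens one of the two parts of a link."""
    if pos == 0:
        return False  # no previous character at the start of the input
    return input_text[pos - 1] in "[("

s = "[label](url)"
print(text_rule_enabled(s, 1))   # True: the previous char is '['
print(text_rule_enabled(s, 0))   # False: there is no previous char
```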
17. Continuing the Chat in Python
Before seeing the Python example, we must modify our grammar and put the TEXT token before the WORD one. Otherwise ANTLR might assign the incorrect token in cases where the characters between parentheses or brackets are all valid for WORD, for instance if it were [this](link).
Using ANTLR in Python is not more difficult than with any other platform; you just need to pay attention to the version of Python, 2 or 3.
antlr4 -Dlanguage=Python3 Chat.g4
That's it. So, once you have run the command, inside the directory of your Python project there will be a newly generated parser and lexer. You may find it interesting to look at ChatLexer.py and in particular at the function TEXT_sempred (sempred stands for semantic predicate).
def TEXT_sempred(self, localctx:RuleContext, predIndex:int):
if predIndex == 0:
return self._input.LA(-1) == ord('[') or self._input.LA(-1) == ord('(')
You can see our predicate right in the code. This also means that you have to check that the correct libraries, for the functions used in the predicate, are available to the lexer.
18. The Python Way of Working with a Listener
The main file of a Python project is very similar to the Javascript one, mutatis mutandis of course. That is to say, we have to adapt the libraries and functions to the proper versions for a different language.
import sys
from antlr4 import *
from ChatLexer import ChatLexer
from ChatParser import ChatParser
from HtmlChatListener import HtmlChatListener
def main(argv):
input = FileStream(argv[1])
lexer = ChatLexer(input)
stream = CommonTokenStream(lexer)
parser = ChatParser(stream)
tree = parser.chat()
output = open("output.html","w")
htmlChat = HtmlChatListener(output)
walker = ParseTreeWalker()
walker.walk(htmlChat, tree)
output.close()
if __name__ == '__main__':
main(sys.argv)
We have also changed the input and output to files; this avoids the need to launch a server in Python, as well as the problem of using characters that are not supported in the terminal.
import sys
from antlr4 import *
from ChatParser import ChatParser
from ChatListener import ChatListener
class HtmlChatListener(ChatListener) :
def __init__(self, output):
self.output = output
self.output.write('<html><head><meta charset="UTF-8"/></head><body>')
def enterName(self, ctx:ChatParser.NameContext) :
self.output.write("<strong>")
def exitName(self, ctx:ChatParser.NameContext) :
self.output.write(ctx.WORD().getText())
self.output.write("</strong> ")
def enterColor(self, ctx:ChatParser.ColorContext) :
color = ctx.WORD().getText()
ctx.text = '<span style="color: ' + color + '">'
def exitColor(self, ctx:ChatParser.ColorContext):
ctx.text += ctx.message().text
ctx.text += '</span>'
def exitEmoticon(self, ctx:ChatParser.EmoticonContext) :
emoticon = ctx.getText()
if emoticon == ':-)' or emoticon == ':)' :
ctx.text = "🙂"
if emoticon == ':-(' or emoticon == ':(' :
ctx.text = "🙁"
def enterLink(self, ctx:ChatParser.LinkContext):
ctx.text = '<a href="%s">%s</a>' % (ctx.TEXT()[1], (ctx.TEXT()[0]))
def exitMessage(self, ctx:ChatParser.MessageContext):
text = ''
for child in ctx.children:
if hasattr(child, 'text'):
text += child.text
else:
text += child.getText()
if isinstance(ctx.parentCtx, ChatParser.LineContext) is False:
ctx.text = text
else:
self.output.write(text)
self.output.write("</p>")
def enterCommand(self, ctx:ChatParser.CommandContext):
if ctx.SAYS() is not None :
self.output.write(ctx.SAYS().getText() + ':' + '<p>')
if ctx.SHOUTS() is not None :
self.output.write(ctx.SHOUTS().getText() + ':' + '<p style="text-transform: uppercase">')
def exitChat(self, ctx:ChatParser.ChatContext):
self.output.write("</body></html>")
Apart from lines 35-36, where we introduce support for links, there is nothing new. Though you might notice that Python syntax is cleaner and, while it has dynamic typing, it is not as loosely typed as Javascript. The different types of *Context objects are explicitly written out. If only Python tools were as easy to use as the language itself. But of course we cannot just fly over Python like this, so we also introduce testing.
19. Testing with Python
While Visual Studio Code has a very nice extension for Python that also supports unit testing, we are going to use the command line for the sake of compatibility.
python3 -m unittest discover -s . -p ChatTests.py
That's how you run the tests, but before that we have to write them. Actually, even before that, we have to write an ErrorListener to manage the errors that we could find. While we could simply read the text output by the default error listener, there is an advantage in using our own implementation, namely that we can more easily control what happens.
import sys
from antlr4 import *
from ChatParser import ChatParser
from ChatListener import ChatListener
from antlr4.error.ErrorListener import *
import io
class ChatErrorListener(ErrorListener):
def __init__(self, output):
self.output = output
self._symbol = ''
def syntaxError(self, recognizer, offendingSymbol, line, column, msg, e):
self.output.write(msg)
self._symbol = offendingSymbol.text
@property
def symbol(self):
return self._symbol
Our class derives from ErrorListener and we simply have to implement syntaxError. We also add a property symbol to easily check which symbol might have caused an error.
from antlr4 import *
from ChatLexer import ChatLexer
from ChatParser import ChatParser
from HtmlChatListener import HtmlChatListener
from ChatErrorListener import ChatErrorListener
import unittest
import io

class TestChatParser(unittest.TestCase):

    def setup(self, text):
        lexer = ChatLexer(InputStream(text))
        stream = CommonTokenStream(lexer)
        parser = ChatParser(stream)

        self.output = io.StringIO()
        self.error = io.StringIO()

        parser.removeErrorListeners()
        errorListener = ChatErrorListener(self.error)
        parser.addErrorListener(errorListener)

        self.errorListener = errorListener

        return parser

    def test_valid_name(self):
        parser = self.setup("John ")
        tree = parser.name()

        htmlChat = HtmlChatListener(self.output)
        walker = ParseTreeWalker()
        walker.walk(htmlChat, tree)

        # let's check that there aren't any symbols in errorListener
        self.assertEqual(len(self.errorListener.symbol), 0)

    def test_invalid_name(self):
        parser = self.setup("Joh-")
        tree = parser.name()

        htmlChat = HtmlChatListener(self.output)
        walker = ParseTreeWalker()
        walker.walk(htmlChat, tree)

        # let's check the symbol in errorListener
        self.assertEqual(self.errorListener.symbol, '-')

if __name__ == '__main__':
    unittest.main()
The setup
method is used to ensure that everything is properly set up; there we also install our ChatErrorListener
, but first we remove the default one, otherwise it would still output errors on the standard output. We are listening to errors in the parser, but we could also catch errors generated by the lexer. It depends on what you want to test. You may want to check both.
The two proper test methods check for a valid and an invalid name. The checks are linked to the property symbol that we defined earlier: if it's empty everything is fine, otherwise it contains the symbol that caused the error. Notice that in test_valid_name there is a space at the end of the input string, because we have defined the rule name to end with a WHITESPACE token.
20. Parsing Markup
ANTLR can parse many things, including binary data; in that case tokens are made up of non-printable characters. But a more common problem is parsing markup languages such as XML or HTML. Markup is also a useful format to adopt for your own creations, because it allows you to mix unstructured text content with structured annotations. Such documents fundamentally represent a form of smart document, containing both text and structured data. The technical term that describes them is island languages . This type is not restricted to markup, and sometimes it's a matter of perspective.
For example, you may have to build a parser that ignores preprocessor directives. In that case, you have to find a way to distinguish proper code from directives, which obey different rules.
In any case, the problem with parsing such languages is that there is a lot of text that we don't actually have to parse, but that we cannot ignore or discard, because the text contains useful information for the user and is a structural part of the document. The solution is lexical modes , a way to parse structured content inside a larger sea of free text.
21. Lexical Modes
We are going to see how to use lexical modes, by starting with a new grammar.
lexer grammar MarkupLexer;
OPEN : '[' -> pushMode(BBCODE) ;
TEXT : ~('[')+ ;
// Parsing content inside tags
mode BBCODE;
CLOSE : ']' -> popMode ;
SLASH : '/' ;
EQUALS : '=' ;
STRING : '"' .*? '"' ;
ID : LETTERS+ ;
WS : [ \t\r\n] -> skip ;
fragment LETTERS : [a-zA-Z] ;
Looking at the first line you could notice a difference: we are defining a lexer grammar
, instead of the usual (combined) grammar
. You simply can't define a lexical mode together with a parser grammar: you can use lexical modes only in a lexer grammar, not in a combined grammar. The rest is not surprising; as you can see, we are defining a sort of BBCode markup, with tags delimited by square brackets.
In the OPEN, mode BBCODE and CLOSE definitions you will find basically all that you need to know about lexical modes: you define one or more tokens that can delimit the different modes and activate them.
The default mode is already implicitly defined; if you need to define yours you simply use mode
followed by a name. Other than for markup languages, lexical modes are typically used to deal with string interpolation, when a string literal can contain more than simple text, such as arbitrary expressions.
When we used a combined grammar we could define tokens implicitly: when in a parser rule we used a string like '=', that is what we did. Now that we are using separate lexer and parser grammars we cannot do that, which means that every single token has to be defined explicitly. So we have definitions like SLASH or EQUALS which typically could just be used directly in a parser rule. The concept is simple: in the lexer grammar we need to define all tokens, because they cannot be defined later in the parser grammar.
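To make the mode mechanics concrete, here is a minimal sketch in plain Python (not the ANTLR runtime; all names are illustrative) of a lexer that switches between a free-text mode and a tag mode, just like the grammar above:

```python
def tokenize(text):
    tokens = []
    mode = "DEFAULT"
    i = 0
    while i < len(text):
        c = text[i]
        if mode == "DEFAULT":
            if c == "[":                      # OPEN : '[' -> pushMode(BBCODE)
                tokens.append(("OPEN", c))
                mode = "BBCODE"
                i += 1
            else:                             # TEXT : ~('[')+
                j = i
                while j < len(text) and text[j] != "[":
                    j += 1
                tokens.append(("TEXT", text[i:j]))
                i = j
        else:                                 # inside mode BBCODE
            if c == "]":                      # CLOSE : ']' -> popMode
                tokens.append(("CLOSE", c))
                mode = "DEFAULT"
                i += 1
            elif c == "/":
                tokens.append(("SLASH", c))
                i += 1
            elif c.isalpha():                 # ID : LETTERS+
                j = i
                while j < len(text) and text[j].isalpha():
                    j += 1
                tokens.append(("ID", text[i:j]))
                i = j
            else:                             # '=', strings and whitespace omitted
                raise ValueError("unexpected character in tag: " + c)
    return tokens

print(tokenize("hello [b]world[/b]"))
```

A real ANTLR lexer generates all of this from the grammar; the sketch only shows the core idea of a mode variable deciding which set of rules is active at any moment.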
22. Parser Grammars
We look at the other side of a lexer grammar, so to speak.
parser grammar MarkupParser;
options { tokenVocab=MarkupLexer; }
file : element* ;
attribute : ID '=' STRING ;
content : TEXT ;
element : (content | tag) ;
tag : '[' ID attribute? ']' element* '[' '/' ID ']' ;
On the first line we define a parser grammar
. Since the tokens we need are defined in the lexer grammar, we need to use an option to say to ANTLR where it can find them. This is not necessary in combined grammars, since the tokens are defined in the same file.
There are many other options available, in the documentation .
There is almost nothing else to add, except that we define a content rule so that we can manage more easily the text that we find later in the program.
I just want to note that, as you can see, we don't need to explicitly use the tokens every time (e.g. SLASH), but instead we can use the corresponding text (e.g. '/').
ANTLR will automatically transform the text into the corresponding token, but this can happen only if the tokens are already defined. In short, it is as if we had written:
tag : OPEN ID attribute? CLOSE element* OPEN SLASH ID CLOSE ;
But we could not have used the implicit way if we hadn't already explicitly defined the tokens in the lexer grammar. Another way to look at this is: when we define a combined grammar, ANTLR defines for us all the tokens that we have not explicitly defined ourselves. When we use a separate lexer and a parser grammar, we have to define every token explicitly ourselves. Once we have done that, we can use them in every way we want.
Before moving to actual Java code, let's see the AST for a sample input.
You can easily notice that the element rule is sort of transparent: where you would expect to find it there is always going to be a tag or content . So why did we define it? There are two advantages: it avoids repetition in our grammar and it simplifies managing the results of the parsing. We avoid repetition because if we did not have the element rule we would have to repeat (content|tag) everywhere it is used. What if one day we add a new type of element? In addition to that, it simplifies the processing of the AST because it makes both the nodes that represent tag and content extend a common ancestor.
Advanced
In this section we deepen our understanding of ANTLR. We will look at more complex examples and situations we may have to handle in our parsing adventures. We will learn how to perform more advanced testing, to catch more bugs and ensure a better quality for our code. We will see what a visitor is and how to use it. Finally, we will see how to deal with expressions and the complexity they bring.
You can come back to this section when you need to deal with complex parsing problems.
23. The Markup Project in Java
You can follow the instructions in Java Setup or just copy the antlr-java
folder of the companion repository. Once the file pom.xml
is properly configured, this is how you build and execute the application.
# use mvn to generate the package
mvn package
# every time you need to execute the application
java -cp target/markup-example-1.0-jar-with-dependencies.jar me.tomassetti.examples.MarkupParser.App
As you can see, it isn't any different from any typical Maven project, although it's indeed more complicated than a typical Javascript or Python project. Of course, if you use an IDE you don't need to do anything different from your typical workflow.
24. The Main App.java
We are going to see how to write a typical ANTLR application in Java.
package me.tomassetti.examples.MarkupParser;

import org.antlr.v4.runtime.*;
import org.antlr.v4.runtime.tree.*;

public class App
{
    public static void main( String[] args )
    {
        ANTLRInputStream inputStream = new ANTLRInputStream(
            "I would like to [b][i]emphasize[/i][/b] this and [u]underline [b]that[/b][/u] ." +
            "Let's not forget to quote: [quote author=\"John\"]You're wrong![/quote]");
        MarkupLexer markupLexer = new MarkupLexer(inputStream);
        CommonTokenStream commonTokenStream = new CommonTokenStream(markupLexer);
        MarkupParser markupParser = new MarkupParser(commonTokenStream);

        MarkupParser.FileContext fileContext = markupParser.file();
        MarkupVisitor visitor = new MarkupVisitor();
        visitor.visit(fileContext);
    }
}
At this point the main java file should not come as a surprise, the only new development is the visitor. Of course, there are the obvious little differences in the names of the ANTLR classes and such. This time we are building a visitor, whose main advantage is the chance to control the flow of the program. While we are still dealing with text, we don't want to display it, we want to transform it from pseudo-BBCode to pseudo-Markdown.
25. Transforming Code with ANTLR
The first issue to deal with in our translation from pseudo-BBCode to pseudo-Markdown is a design decision. Our two languages are different and frankly neither of the two original ones is that well designed.
BBCode was created as a safety precaution, to make it possible to disallow the use of HTML but give some of its power to users. Markdown was created to be an easy-to-read-and-write format that could be translated into HTML. They both mimic HTML, and you can actually use HTML in a Markdown document. Let's start to look into how messy a real conversion would be.
package me.tomassetti.examples.MarkupParser;

import org.antlr.v4.runtime.*;
import org.antlr.v4.runtime.misc.*;
import org.antlr.v4.runtime.tree.*;

public class MarkupVisitor extends MarkupParserBaseVisitor<String>
{
    @Override
    public String visitFile(MarkupParser.FileContext context)
    {
        visitChildren(context);
        System.out.println("");
        return null;
    }

    @Override
    public String visitContent(MarkupParser.ContentContext context)
    {
        System.out.print(context.TEXT().getText());
        return visitChildren(context);
    }
}
The first version of our visitor prints all the text and ignores all the tags.
You can see how to control the flow, either by calling visitChildren
, or any other visit* function, and deciding what to return. We just need to override the methods that we want to change; otherwise, the default implementation would just do like visitContent
: visit the children nodes and allow the visitor to continue. Just like for a listener, the argument is the proper context type. If you want to stop the visitor, just return null, as we do in visitFile.
26. Joy and Pain of Transforming Code
Transforming code, even at a very simple level, comes with some complications. Let's start easy with some basic visitor methods.
@Override
public String visitContent(MarkupParser.ContentContext context)
{
    return context.getText();
}

@Override
public String visitElement(MarkupParser.ElementContext context)
{
    if(context.parent instanceof MarkupParser.FileContext)
    {
        if(context.content() != null)
            System.out.print(visitContent(context.content()));
        if(context.tag() != null)
            System.out.print(visitTag(context.tag()));
    }

    return null;
}
Before looking at the main method, let's look at the supporting ones. First, we have changed visitContent
by making it return its text instead of printing it. Second, we have overridden visitElement
so that it prints the text of its child, but only if it's a top element, and not inside a tag . In both cases, it achieves this by calling the proper visit* method. It knows which one to call because it checks whether it actually has a tag or content node.
@Override
public String visitTag(MarkupParser.TagContext context)
{
    String text = "";
    String startDelimiter = "", endDelimiter = "";

    String id = context.ID(0).getText();

    switch(id)
    {
        case "b":
            startDelimiter = endDelimiter = "**";
            break;
        case "u":
            startDelimiter = endDelimiter = "*";
            break;
        case "quote":
            String attribute = context.attribute().STRING().getText();
            attribute = attribute.substring(1, attribute.length() - 1);
            startDelimiter = System.lineSeparator() + "> ";
            endDelimiter = System.lineSeparator() + "> " + System.lineSeparator() + "> – "
                         + attribute + System.lineSeparator();
            break;
    }

    text += startDelimiter;

    for (MarkupParser.ElementContext node : context.element())
    {
        if(node.tag() != null)
            text += visitTag(node.tag());
        if(node.content() != null)
            text += visitContent(node.content());
    }

    text += endDelimiter;

    return text;
}
VisitTag
contains more code than any other method, because a tag can also contain other elements, including other tags that have to be managed themselves, and thus they cannot simply be printed. We save the text of the ID token; of course we don't need to check that the corresponding end tag matches, because the parser will ensure that, as long as the input is well formed.
The first complication starts with the case for u: as often happens when transforming one language into another, there isn't a perfect correspondence between the two. While BBCode tries to be a smarter and safer replacement for HTML, Markdown wants to accomplish the same objective as HTML: to create a structured document. So BBCode has an underline tag, while Markdown does not.
So we have to make a decision
Do we want to discard the information, directly print HTML, or something else? We choose something else and convert the underline to an italic. That might seem completely arbitrary, and indeed there is an element of choice in this decision. But the conversion forces us to lose some information, and since both are used for emphasis, we choose the closest thing in the new language.
The quote case forces us to make another choice. We can't maintain the information about the author of the quote in a structured way, so we choose to print the information in a way that will make sense to a human reader.
In the loop over the child elements we do our “magic”: we visit the children and gather their text, then we close with the endDelimiter . Finally we return the text that we have created.
That's how the visitor works
- every top element visits each child
- if it's a content node, it directly returns the text
- if it's a tag , it sets up the correct delimiters and then checks its children. It repeats step 2 for each child and then returns the gathered text
- it prints the returned text
It's obviously a simple example, but it shows how much freedom you have in managing the visitor once you have launched it. Together with the patterns that we have seen at the beginning of this section, you can see all of the options: return null to stop the visit, return the children to continue, or return something to perform an action ordered at a higher level of the tree.
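The steps above can be sketched in a few lines of plain Python over a hand-built tree (the tuples and the DELIMITERS map are illustrative stand-ins, not the ANTLR contexts):

```python
# Map each tag name to its Markdown delimiter, as visitTag does in its switch.
DELIMITERS = {"b": "**", "i": "*", "u": "*"}

def visit(node):
    if isinstance(node, str):               # content node: directly return the text
        return node
    tag, children = node                    # tag node: ("b", [child, child, ...])
    delimiter = DELIMITERS.get(tag, "")     # set up the correct delimiters
    inner = "".join(visit(child) for child in children)  # gather children's text
    return delimiter + inner + delimiter

tree = ["I would like to ", ("b", [("i", ["emphasize"])]), " this"]
# the top level prints the returned text
print("".join(visit(part) for part in tree))  # I would like to ***emphasize*** this
```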
27. Advanced Testing
The use of lexical modes permits the parsing of island languages, but it complicates testing.
We are not going to show MarkupErrorListener.java
because we did not change it; if you need it you can see it on the repository.
You can run the tests by using the following command.
mvn test
Now we are going to look at the test code. We are skipping the setup part, because that is also obvious: we just copy the process seen in the main file, and simply add our error listener to intercept the errors.
// private variables inside the class AppTest
private MarkupErrorListener errorListener;
private MarkupLexer markupLexer;

public void testText()
{
    MarkupParser parser = setup("anything in here");

    MarkupParser.ContentContext context = parser.content();

    assertEquals("", this.errorListener.getSymbol());
}

public void testInvalidText()
{
    MarkupParser parser = setup("[anything in here");

    MarkupParser.ContentContext context = parser.content();

    assertEquals("[", this.errorListener.getSymbol());
}

public void testWrongMode()
{
    MarkupParser parser = setup("author=\"john\"");

    MarkupParser.AttributeContext context = parser.attribute();
    TokenStream ts = parser.getTokenStream();

    assertEquals(MarkupLexer.DEFAULT_MODE, markupLexer._mode);
    assertEquals(MarkupLexer.TEXT, ts.get(0).getType());
    assertEquals("author=\"john\"", this.errorListener.getSymbol());
}

public void testAttribute()
{
    MarkupParser parser = setup("author=\"john\"");
    // we have to manually push the correct mode
    this.markupLexer.pushMode(MarkupLexer.BBCODE);

    MarkupParser.AttributeContext context = parser.attribute();
    TokenStream ts = parser.getTokenStream();

    assertEquals(MarkupLexer.ID, ts.get(0).getType());
    assertEquals(MarkupLexer.EQUALS, ts.get(1).getType());
    assertEquals(MarkupLexer.STRING, ts.get(2).getType());

    assertEquals("", this.errorListener.getSymbol());
}

public void testInvalidAttribute()
{
    MarkupParser parser = setup("author=/\"john\"");
    // we have to manually push the correct mode
    this.markupLexer.pushMode(MarkupLexer.BBCODE);

    MarkupParser.AttributeContext context = parser.attribute();

    assertEquals("/", this.errorListener.getSymbol());
}
The first two methods are exactly as before: we simply check that there are no errors, or that there is the correct one because the input itself is erroneous. In testWrongMode things start to get interesting: the issue is that by testing the rules one by one we don't give the parser the chance to switch automatically to the correct mode. So it always remains in DEFAULT_MODE, which in our case makes everything look like TEXT . This obviously makes the correct parsing of an attribute impossible.
The same test also shows how you can check the current mode that you are in, and the exact types of the tokens that are found by the parser, which we use to confirm that indeed all is wrong in this case.
While we could use a string of text to trigger the correct mode each time, that would make the tests dependent on several unrelated pieces of code, which is a no-no. So the solution is seen in testAttribute: we trigger the correct mode manually. Once you have done that, you can see that our attribute is recognized correctly.
28. Dealing with Expressions
So far we have written simple parser rules. Now we are going to see one of the most challenging parts of analyzing a real (programming) language: expressions. While rules for statements are usually larger, they are quite simple to deal with: you just need to write a rule that encapsulates the structure with all the different optional parts. For instance, a for
statement can include all other kinds of statements, but we can simply include them with something like statement*.
An expression, instead, can be combined in many different ways.
An expression usually contains other expressions. For example, the typical binary expression is composed of an expression on the left, an operator in the middle and another expression on the right. This can lead to ambiguities. Think, for example, of the expression 5 + 3 * 2
: for ANTLR this expression is ambiguous because there are two ways to parse it. It could either parse it as 5 + (3 * 2) or (5 + 3) * 2.
Until this moment we have avoided the problem simply because markup constructs surround the object on which they are applied. So there is no ambiguity in choosing which one to apply first: it's the most external one. Imagine if this expression was written with markup that wraps each operation, something like:
[add][int]5[/int][mul][int]3[/int][int]2[/int][/mul][/add]
That would make it obvious to ANTLR how to parse it.
These types of rules are called left-recursive rules. You might say: just parse whatever comes first. The problem with that is semantic: the addition comes first, but we know that multiplication has precedence over addition. Traditionally the way to solve this problem was to create a complex cascade of specific expressions like this:
expression : addition;
addition : multiplication ('+' multiplication)* ;
multiplication : atom ('*' atom)* ;
atom : NUMBER ;
This way ANTLR would know to search first for a number, then for multiplications and finally for additions. This is cumbersome and also counterintuitive, because the last expression is actually the first to be recognized. Luckily ANTLR4 can create a similar structure automatically, so we can use a much more natural syntax.
expression : expression '*' expression
| expression '+' expression
| NUMBER
;
In practice ANTLR considers the order in which we defined the alternatives to decide the precedence. By writing the rule in this way we are telling ANTLR that multiplication takes precedence over addition.
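To see why the cascade approach enforces precedence, here is a hand-written sketch in Python (the function names mirror the cascade rules; this is our illustration, not ANTLR-generated code), where each rule becomes a function that calls the next one down:

```python
def evaluate(tokens):
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def atom():                       # atom : NUMBER ;
        nonlocal pos
        value = int(tokens[pos])
        pos += 1
        return value

    def multiplication():             # multiplication : atom ('*' atom)* ;
        nonlocal pos
        value = atom()
        while peek() == '*':
            pos += 1                  # consume '*'
            value *= atom()
        return value

    def addition():                   # addition : multiplication ('+' multiplication)* ;
        nonlocal pos
        value = multiplication()
        while peek() == '+':
            pos += 1                  # consume '+'
            value += multiplication()
        return value

    return addition()                 # expression : addition ;

print(evaluate(['5', '+', '3', '*', '2']))  # prints 11: '*' binds tighter than '+'
```

Because addition can only combine complete multiplications, 3 * 2 is grouped before the + is applied, exactly the precedence we wanted.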
29. Parsing Spreadsheets
Now we are prepared to create our last application, in C#. We are going to build the parser of an Excel-like application. In practice, we want to manage the expressions you write in the cells of a spreadsheet.
grammar Spreadsheet;
expression : '(' expression ')' #parenthesisExp
| expression (ASTERISK|SLASH) expression #mulDivExp
| expression (PLUS|MINUS) expression #addSubExp
| <assoc=right> expression '^' expression #powerExp
| NAME '(' expression ')' #functionExp
| NUMBER #numericAtomExp
| ID #idAtomExp
;
fragment LETTER : [a-zA-Z] ;
fragment DIGIT : [0-9] ;
ASTERISK : '*' ;
SLASH : '/' ;
PLUS : '+' ;
MINUS : '-' ;
ID : LETTER DIGIT ;
NAME : LETTER+ ;
NUMBER : DIGIT+ ('.' DIGIT+)? ;
WHITESPACE : ' ' -> skip;
With all the knowledge you have acquired so far everything should be clear, except for possibly three things:
- why the parentheses alternative comes first,
- what the stuff on the right is,
- what is special about the exponentiation alternative.
The parentheses alternative comes first because its only role is to give the user a way to override the precedence of operators, if they need to do so. This graphical representation of the AST should make it clear.
The things on the right are labels : they are used to make ANTLR generate specific functions for the visitor or listener. So there will be a VisitFunctionExp
, a VisitPowerExp
, etc. This makes it possible to avoid writing one giant visitor method for the expression rule.
The exponentiation alternative is different because there are two possible ways to group two sequential expressions of the same type: execute the one on the left first and then the one on the right, or the inverse. This is called associativity . Usually the one that you want is left-associativity, which is the default option. Nonetheless exponentiation is right-associative , so we have to signal this to ANTLR with the <assoc=right> option on that alternative.
Another way to look at this is: if there are two expressions of the same type, which one has the precedence: the left one or the right one? Again, an image is worth a thousand words.
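Since Python's own ** operator happens to be right-associative, we can use plain arithmetic to show what the two groupings of 5 ^ 3 ^ 2 compute:

```python
# Left- vs right-associative grouping of 5 ^ 3 ^ 2 (^ = exponentiation).
left_grouping = (5 ** 3) ** 2     # left-associative reading: (5^3)^2
right_grouping = 5 ** (3 ** 2)    # right-associative reading: 5^(3^2)
print(left_grouping, right_grouping)  # 15625 1953125
```

The two readings give wildly different results, which is why the grammar has to state explicitly that exponentiation groups to the right.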
We also have support for functions, alphanumeric variables that represent cells, and real numbers.
30. The Spreadsheet Project in C#
You just need to follow the C# Setup : to install a nuget package for the runtime and an ANTLR4 extension for Visual Studio. The extension will automatically generate everything whenever you build your project: parser, listener and/or visitor.
After you have done that, you can also add grammar files just by using the usual menu Add -> New Item. Do exactly that to create a grammar called Spreadsheet.g4
and put in it the grammar we have just created. Now let's see the main Program.cs
.
using System;
using Antlr4.Runtime;

namespace AntlrTutorial
{
    class Program
    {
        static void Main(string[] args)
        {
            string input = "log(10 + A1 * 35 + (5.4 - 7.4))";

            AntlrInputStream inputStream = new AntlrInputStream(input);
            SpreadsheetLexer spreadsheetLexer = new SpreadsheetLexer(inputStream);
            CommonTokenStream commonTokenStream = new CommonTokenStream(spreadsheetLexer);
            SpreadsheetParser spreadsheetParser = new SpreadsheetParser(commonTokenStream);

            SpreadsheetParser.ExpressionContext expressionContext = spreadsheetParser.expression();
            SpreadsheetVisitor visitor = new SpreadsheetVisitor();

            Console.WriteLine(visitor.Visit(expressionContext));
        }
    }
}
There is not much to say, apart from the fact that, of course, you have to pay attention to yet another slight variation in the naming of things: pay attention to the casing. For instance, AntlrInputStream
, in the C# program, was ANTLRInputStream
in the Java program.
You can also notice that, this time, we output the result of our visitor on the screen, instead of writing it to a file.
31. Excel is Doomed
We are going to take a look at our visitor for the Spreadsheet project.
public class SpreadsheetVisitor : SpreadsheetBaseVisitor<double>
{
    private static DataRepository data = new DataRepository();

    public override double VisitNumericAtomExp(SpreadsheetParser.NumericAtomExpContext context)
    {
        return double.Parse(context.NUMBER().GetText(), System.Globalization.CultureInfo.InvariantCulture);
    }

    public override double VisitIdAtomExp(SpreadsheetParser.IdAtomExpContext context)
    {
        String id = context.ID().GetText();

        return data[id];
    }

    public override double VisitParenthesisExp(SpreadsheetParser.ParenthesisExpContext context)
    {
        return Visit(context.expression());
    }

    public override double VisitMulDivExp(SpreadsheetParser.MulDivExpContext context)
    {
        double left = Visit(context.expression(0));
        double right = Visit(context.expression(1));
        double result = 0;

        if (context.ASTERISK() != null)
            result = left * right;
        if (context.SLASH() != null)
            result = left / right;

        return result;
    }

    [..]

    public override double VisitFunctionExp(SpreadsheetParser.FunctionExpContext context)
    {
        String name = context.NAME().GetText();
        double result = 0;

        switch(name)
        {
            case "sqrt":
                result = Math.Sqrt(Visit(context.expression()));
                break;
            case "log":
                result = Math.Log10(Visit(context.expression()));
                break;
        }

        return result;
    }
}
VisitNumericAtomExp
and VisitIdAtomExp
return the actual numbers that are represented either by the literal number or the variable. In a real scenario DataRepository would contain methods to access the data in the proper cell, but in our example it is just a Dictionary with some keys and numbers. The other methods all work in the same way: they visit/call the containing expression(s). The only difference is what they do with the results.
Some perform an operation on the result, the binary operations combine two results in the proper way, and finally VisitParenthesisExp
just reports the result higher up the chain. Math is simple, when it's done by a computer.
32. Testing Everything
Up until now we have only tested the parser rules, that is to say we have tested only whether we have created the correct rules to parse our input. Now we are also going to test the visitor functions. This is the ideal chance because our visitor returns values that we can check individually. On other occasions, for instance if your visitor prints something to the screen, you may want to rewrite the visitor to write to a stream; then, at testing time, you can easily capture the output.
We are not going to show SpreadsheetErrorListener.cs
because it's the same as the previous one we have already seen; if you need it you can see it on the repository.
To perform unit testing on Visual Studio you need to create a specific project inside the solution. You can choose different formats, we opt for the xUnit version. To run them there is an aptly named section “TEST” on the menu bar.
[Fact]
public void testExpressionPow()
{
    setup("5^3^2");

    PowerExpContext context = parser.expression() as PowerExpContext;

    CommonTokenStream ts = (CommonTokenStream)parser.InputStream;

    Assert.Equal(SpreadsheetLexer.NUMBER, ts.Get(0).Type);
    Assert.Equal(SpreadsheetLexer.T__2, ts.Get(1).Type);
    Assert.Equal(SpreadsheetLexer.NUMBER, ts.Get(2).Type);
    Assert.Equal(SpreadsheetLexer.T__2, ts.Get(3).Type);
    Assert.Equal(SpreadsheetLexer.NUMBER, ts.Get(4).Type);
}

[Fact]
public void testVisitPowerExp()
{
    setup("4^3^2");

    PowerExpContext context = parser.expression() as PowerExpContext;

    SpreadsheetVisitor visitor = new SpreadsheetVisitor();
    double result = visitor.VisitPowerExp(context);

    Assert.Equal(double.Parse("262144"), result);
}

[..]

[Fact]
public void testWrongVisitFunctionExp()
{
    setup("logga(100)");

    FunctionExpContext context = parser.expression() as FunctionExpContext;

    SpreadsheetVisitor visitor = new SpreadsheetVisitor();
    double result = visitor.VisitFunctionExp(context);

    CommonTokenStream ts = (CommonTokenStream)parser.InputStream;

    Assert.Equal(SpreadsheetLexer.NAME, ts.Get(0).Type);
    Assert.Equal(null, errorListener.Symbol);
    Assert.Equal(0, result);
}

[Fact]
public void testCompleteExp()
{
    setup("log(5+6*7/8)");

    ExpressionContext context = parser.expression();

    SpreadsheetVisitor visitor = new SpreadsheetVisitor();
    double result = visitor.Visit(context);

    Assert.Equal("1.01072386539177", result.ToString(System.Globalization.CultureInfo.GetCultureInfo("en-US").NumberFormat));
}
The first test function is similar to the ones we have already seen: it checks that the correct tokens are selected. In testExpressionPow you may be surprised to see that weird T__2 token type; this happens because we didn't explicitly create a token for the '^' symbol, so one got automatically created for us. If you need to, you can see all the tokens by looking at the *.tokens file generated by ANTLR.
In testVisitPowerExp we visit our test node and get the result, which we then check. It's all very simple because our visitor is simple, and while unit testing should always be easy and made up of small parts, it really can't get easier than this.
The only thing to pay attention to is the format of the number. It's not a problem here, but look at testCompleteExp, where we test the result of a whole expression. There we need to make sure that the correct format is selected, because different countries use different symbols as the decimal mark.
There are some things that depends on the cultural context
If your computer was already set to the American English Culture this wouldn't be necessary, but to guarantee the correct testing results for everybody we have to specify it. Keep that in mind if you are testing things that are culture-dependent: such as grouping of digits, temperatures, etc.
In testWrongVisitFunctionExp you can see that when we check for the wrong function the parser actually works. That's because “logga” is indeed syntactically valid as a function name, but it's not semantically correct. The function “logga” doesn't exist, so our program doesn't know what to do with it. So when we visit it we get 0 as a result. As you recall this was our choice: since we initialize the result to 0 and we don't have a default
case in VisitFunctionExp,
if there is no matching function the result remains 0. A possible alternative would be to throw an exception.
Final Remarks
In this section we see tips and tricks that never came up in our examples, but that can be useful in your programs. We also suggest more resources you may find useful if you want to know more about ANTLR, both the practice and the theory, or if you need to deal with the most complex problems.
33. Tips and Tricks
Let's see a few tricks that could be useful from time to time. These were never needed in our examples, but they have been quite useful in other scenarios.
Catchall Rule
The first one is the ANY lexer rule. This is simply a rule in the following format.
ANY : . ;
This is a catchall rule that should be put at the end of your grammar. It matches any character that didn't find its place during lexing. So creating this rule can help you during development, when your grammar still has many holes that could cause distracting error messages. It's even useful in production, where it acts as a canary in a coal mine: if it shows up in your program, you know that something is wrong.
Channels
There is also something that we haven't talked about: channels . Their typical use case is handling comments. You don't really want to check for comments inside every one of your statements or expressions, so you usually throw them away with -> skip
. But there are some cases where you may want to preserve them, for instance if you are translating a program into another language. When this happens you use channels . There is already one called HIDDEN that you can use, but you can declare more of them at the top of your lexer grammar.
channels { UNIQUENAME }
// and you use them this way
COMMENTS : '//' ~[\r\n]+ -> channel(UNIQUENAME) ;
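The idea can be sketched in plain Python (the token tuples and channel numbers are illustrative): the parser consumes only the default channel, while the comments stay available on their own channel for later use.

```python
# Each token carries (type, text, channel); these constants are our own.
DEFAULT_CHANNEL, COMMENTS_CHANNEL = 0, 1

tokens = [
    ("ID", "x", DEFAULT_CHANNEL),
    ("COMMENT", "// the answer", COMMENTS_CHANNEL),
    ("EQUALS", "=", DEFAULT_CHANNEL),
    ("NUMBER", "42", DEFAULT_CHANNEL),
]

# The parser only sees the default channel...
parser_view = [t for t in tokens if t[2] == DEFAULT_CHANNEL]
# ...but a translator can still recover the comments afterwards.
comments = [t for t in tokens if t[2] == COMMENTS_CHANNEL]

print([t[0] for t in parser_view], [t[1] for t in comments])
```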
Rule Element Labels
There is another use of labels besides distinguishing among different alternatives of the same rule: they can be used to give a specific name, usually but not always of semantic value, to a common rule or to parts of a rule. The format is label=rule
, to be used inside another rule.
expression : left=expression (ASTERISK|SLASH) right=expression ;
This way left and right become fields in the ExpressionContext
nodes, and instead of using context.expression(0)
you can refer to the same entity using context.left
.
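Element labels also combine with the alternative labels (the # names) we used earlier, and you can label a set of tokens too; a sketch (the label and alternative names here are illustrative):

```antlr
// 'left', 'op' and 'right' become fields of BinaryExpressionContext;
// the # names still generate one context class per alternative
expression : left=expression op=(ASTERISK|SLASH) right=expression # binaryExpression
           | NUMBER                                              # numberExpression
           ;
```

In a listener or visitor you could then write context.left and context.op instead of the positional context.expression(0) and having to test which operator token is present.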
Problematic Tokens
In many real languages some symbols are reused in different ways, some of which may lead to ambiguities. A common problematic example are the angle brackets, used both for bitshift expression and to delimit parameterized types.
// bitshift expression, it assigns to x the value of y shifted by three bits
x = y >> 3;
// parameterized types, it define x as a list of dictionaries
List<Dictionary<string, int>> x;
The natural way of defining the bitshift operator token is as a single token for the double angle brackets, '>>'. But this might lead to confusing a nested parameterized definition with the bitshift operator, as in the second example shown above. While a simple way of solving the problem would be using semantic predicates, an excessive number of them would slow down the parsing phase. The solution is to avoid defining a bitshift operator token and instead use the single angle bracket token twice in the parser rule, so that the parser itself can choose the best interpretation for every occasion.
// from this
RIGHT_SHIFT : '>>' ;
expression : ID RIGHT_SHIFT NUMBER ;
// to this (where SHIFT is a lexer rule matching a single '>')
SHIFT : '>' ;
expression : ID SHIFT SHIFT NUMBER ;
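Fleshing this out, a sketch of how the single angle-bracket token can serve both constructs in a combined grammar (all rule and token names here are illustrative, and a real grammar would need more care with precedence):

```antlr
SHIFT : '>' ;
LT    : '<' ;

// the parser, not the lexer, decides whether two adjacent '>' tokens
// form a right shift or close two nested type-argument lists
shiftExpression   : ID SHIFT SHIFT NUMBER ;
parameterizedType : ID LT typeArgument (',' typeArgument)* SHIFT ;
typeArgument      : ID | parameterizedType ;
```

With this setup, List<Dictionary<string, int>> parses cleanly: each level of parameterizedType consumes one of the two adjacent '>' tokens, while x = y >> 3 still matches shiftExpression.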
34. Conclusions
We have learned a lot today:
- what a lexer and a parser are
- how to create lexer and parser rules
- how to use ANTLR to generate parsers in Java, C#, Python and JavaScript
- the fundamental kinds of problems you will encounter when parsing and how to solve them
- how to understand errors
- how to test your parsers
That's all you need to know to use ANTLR on your own. And I mean that literally: you may want to know more, but now you have a solid basis to explore on your own.
Where to look if you need more information about ANTLR:
- On this very website there is a whole category dedicated to ANTLR .
- The official ANTLR website is a good starting point to learn the general status of the project, the specialized development tools and related projects, like StringTemplate
- The ANTLR documentation on GitHub ; especially useful are the pages on targets and how to set it up for different languages .
- The ANTLR 4.6 API ; it's related to the Java version, so there might be some differences in other languages, but it's the best place to settle your doubts about the inner workings of this tool.
- For those very interested in the science behind ANTLR4, there is an academic paper: Adaptive LL(*) Parsing: The Power of Dynamic Analysis
- The Definitive ANTLR 4 Reference , by the man himself, Terence Parr , the creator of ANTLR. The resource you need if you want to know everything about ANTLR and a good deal about parsing languages in general.
Also, the book is the only place where you can find answers to questions like this one:
ANTLR v4 is the result of a minor detour (twenty-five years) I took in graduate school. I guess I'm going to have to change my motto slightly. Why program by hand in five days what you can spend twenty-five years of your life automating?
We worked quite hard to build the largest tutorial on ANTLR: the mega-tutorial! A post over 13,000 words long, or more than 30 pages, that tries to answer all your questions about ANTLR. Missing something? Contact us and let us know, we are here to help
Translated from: https://www.javacodegeeks.com/2017/03/antlr-mega-tutorial.html