ANTLR 4 Notes

overview

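ANTLR 4 is a powerful parser generator that handles lexical and syntactic analysis. It can be used to read, process, execute, or translate structured text: from a grammar, ANTLR generates a parser that can build and walk parse trees.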

additional

  1. The stock SqlBase grammar only recognizes uppercase keywords; it has been modified to accept keywords in any letter case.
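These notes do not say how case-insensitivity was added. One common approach is the one Spark's own parser driver uses with its UpperCaseCharStream. It wraps the input stream so that lookahead returns upper-cased characters. Keyword rules written in uppercase then match any spelling, and token text keeps the user's original case. The class below is a hypothetical, stdlib-only sketch of that idea. `CaseInsensitiveStream`, `la`, and `matchKeyword` are illustrative names, not ANTLR's CharStream API.

```java
// Hypothetical, stdlib-only sketch of case-insensitive keyword lexing:
// la() upper-cases every character it returns, so a keyword spelled 'SELECT'
// in the grammar matches "select", while getText() keeps the original text.
final class CaseInsensitiveStream {
    static final int EOF = -1;

    private final String text;
    private int index = 0;

    CaseInsensitiveStream(String text) {
        this.text = text;
    }

    /** 1-based lookahead in the style of ANTLR's LA(i); EOF past the end. */
    int la(int i) {
        int idx = index + i - 1;
        if (idx < 0 || idx >= text.length()) {
            return EOF;
        }
        return Character.toUpperCase(text.charAt(idx));
    }

    int index() {
        return index;
    }

    /** Original (case-preserved) text from start to stop, inclusive. */
    String getText(int start, int stop) {
        return text.substring(start, stop + 1);
    }

    /**
     * Matches an uppercase keyword at the current position and consumes it.
     * A failed match consumes nothing. There is no word-boundary check here.
     */
    boolean matchKeyword(String keyword) {
        for (int i = 0; i < keyword.length(); i++) {
            if (la(i + 1) != keyword.charAt(i)) {
                return false;
            }
        }
        index += keyword.length();
        return true;
    }

    public static void main(String[] args) {
        CaseInsensitiveStream in = new CaseInsensitiveStream("Select name from t");
        int start = in.index();
        boolean matched = in.matchKeyword("SELECT");
        System.out.println(matched + " " + in.getText(start, in.index() - 1)); // true Select
    }
}
```

The design choice is to normalize only what the lexer *compares*, not the text it stores. Identifiers and string literals therefore reach later stages with their original spelling.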

stage

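  1. Take the SQL text and turn it into a CharStream the lexer can read, using CharStreams.fromString(sql)
  2. Construct the SqlBaseLexer (the lexical analyzer)
  3. Build the token stream (CommonTokenStream) from the lexer
  4. Create the SqlBaseParser object from the token stream
  5. Walk the parse tree returned by parser.statement() with a ParseTreeWalker and a listener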
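import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.tree.ParseTreeWalker;
// sql: the SQL text to parse; SqlBaseLexer/SqlBaseParser are generated from SqlBase.g4
SqlBaseLexer lexer = new SqlBaseLexer(CharStreams.fromString(sql)); // characters -> tokens
CommonTokenStream tokenStream = new CommonTokenStream(lexer);       // buffers tokens for the parser
SqlBaseParser parser = new SqlBaseParser(tokenStream);              // tokens -> parse tree
ParseTreeWalker walker = new ParseTreeWalker();
// the author's own listener (presumably a subclass of the generated SqlBaseBaseListener)
MySqlBaseBaseListener mySqlBaseBaseListener = new MySqlBaseBaseListener();
walker.walk(mySqlBaseBaseListener, parser.statement());              // depth-first walk, firing enter/exit callbacks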
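This sketch is for readers new to listeners. `walker.walk(listener, tree)` does a depth-first traversal. It calls the listener's enter method for a rule node before visiting that node's children, and the exit method afterwards. The code models only that ordering, using hypothetical `Node` and `Listener` types. The real ANTLR walker also calls `visitTerminal` for tokens. It also dispatches to generated rule-specific methods, for example `enterUse`/`exitUse` for the `#use` alternative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, stdlib-only model of a listener-based tree walk.
// Node and Listener are illustrative types, not ANTLR's ParseTree/ParseTreeListener.
final class WalkDemo {
    interface Listener {
        void enter(Node node);
        void exit(Node node);
    }

    static final class Node {
        final String rule;
        final List<Node> children;

        Node(String rule, Node... children) {
            this.rule = rule;
            this.children = Arrays.asList(children);
        }
    }

    /** Depth-first: enter the node, walk each child in order, then exit the node. */
    static void walk(Listener listener, Node node) {
        listener.enter(node);
        for (Node child : node.children) {
            walk(listener, child);
        }
        listener.exit(node);
    }

    /** Records the callback order as strings such as "enter:statement". */
    static List<String> trace(Node root) {
        final List<String> events = new ArrayList<>();
        walk(new Listener() {
            public void enter(Node n) { events.add("enter:" + n.rule); }
            public void exit(Node n) { events.add("exit:" + n.rule); }
        }, root);
        return events;
    }

    public static void main(String[] args) {
        // Simplified tree for "use db1": statement -> identifier
        Node tree = new Node("statement", new Node("identifier"));
        System.out.println(trace(tree));
        // [enter:statement, enter:identifier, exit:identifier, exit:statement]
    }
}
```

Because exit fires only after all children have been visited, an exit callback is the natural place to combine information gathered from a node's subtree.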

code examples

create table

create table table1 (
gender string comment 'gender',
name string comment 'name',
age int comment 'age',
income double comment 'income'
) comment 'user info'
[Figure: create table]

For statements other than insert and select, the core is the statement rule (excerpt):

statement
    : query                                                            #statementDefault
    | USE db=identifier                                                #use
    | CREATE DATABASE (IF NOT EXISTS)? identifier
        (COMMENT comment=STRING)? locationSpec?
        (WITH DBPROPERTIES tablePropertyList)?                         #createDatabase
    | ALTER DATABASE identifier SET DBPROPERTIES tablePropertyList     #setDatabaseProperties
    | DROP DATABASE (IF EXISTS)? identifier (RESTRICT | CASCADE)?      #dropDatabase
    | createTableHeader ('(' colTypeList ')')? tableProvider
        ((OPTIONS options=tablePropertyList) |
        (PARTITIONED BY partitionColumnNames=identifierList) |
        bucketSpec |
        locationSpec |
        (COMMENT comment=STRING) |
        (TBLPROPERTIES tableProps=tablePropertyList))*
        (AS? query)?                                                   #createTable
    | createTableHeader ('(' columns=colTypeList ')')?

......
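To make the `'(' colTypeList ')'` part of the create-table alternatives concrete, here is a hypothetical recursive-descent sketch. It parses the column list of the create table example. It assumes a simplified `colTypeList`/`colType` grammar, written in the comment. The real SqlBase.g4 definitions are not shown above and differ between Spark versions. `ColTypeListDemo` and `Column` are illustrative names. Each grammar rule becomes one method, which mirrors the one-method-per-rule shape of ANTLR-generated parsers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; assumed simplified grammar (not the exact SqlBase.g4 rules):
//   colTypeList : colType (',' colType)* ;
//   colType     : identifier dataType (COMMENT STRING)? ;
// Parameterized types such as decimal(10,2) are not handled.
final class ColTypeListDemo {
    static final class Column {
        final String name, type, comment; // comment is null when absent
        Column(String name, String type, String comment) {
            this.name = name; this.type = type; this.comment = comment;
        }
        @Override public String toString() {
            return name + ":" + type + (comment == null ? "" : " '" + comment + "'");
        }
    }

    // Quoted strings, words, or one of ( ) , ; find() skips whitespace and anything else.
    private static final Pattern TOKEN = Pattern.compile("'[^']*'|[A-Za-z_][A-Za-z0-9_]*|[(),]");

    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    private ColTypeListDemo(String text) {
        Matcher m = TOKEN.matcher(text);
        while (m.find()) tokens.add(m.group());
    }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : null; }

    private String next() {
        if (pos >= tokens.size()) throw new IllegalStateException("unexpected end of input");
        return tokens.get(pos++);
    }

    private void expect(String t) {
        String got = next();
        if (!got.equalsIgnoreCase(t)) throw new IllegalStateException("expected " + t + " but got " + got);
    }

    /** colTypeList : colType (',' colType)* */
    private List<Column> colTypeList() {
        List<Column> cols = new ArrayList<>();
        cols.add(colType());
        while (",".equals(peek())) {
            next();
            cols.add(colType());
        }
        return cols;
    }

    /** colType : identifier dataType (COMMENT STRING)? */
    private Column colType() {
        String name = next();
        String type = next();
        String comment = null;
        if ("COMMENT".equalsIgnoreCase(peek())) {
            next();
            String s = next();
            comment = s.substring(1, s.length() - 1); // strip the surrounding quotes
        }
        return new Column(name, type, comment);
    }

    /** Parses "( colTypeList )" as it appears after the table name. */
    static List<Column> parseColumns(String parenthesized) {
        ColTypeListDemo p = new ColTypeListDemo(parenthesized);
        p.expect("(");
        List<Column> cols = p.colTypeList();
        p.expect(")");
        return cols;
    }

    public static void main(String[] args) {
        String cols = "(gender string comment 'gender', name string comment 'name',"
                + " age int comment 'age', income double comment 'income')";
        System.out.println(parseColumns(cols));
        // [gender:string 'gender', name:string 'name', age:int 'age', income:double 'income']
    }
}
```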

sub select

select 
name,
age,
sum(income) 
from 
(
select 
gender,
name,
age,
income 
from 
table1 
where 
name = 'allen'
) table2
group by 
name,age
[Figure: select tree]

For queries, the core is the querySpecification rule:

querySpecification
    : (((SELECT kind=TRANSFORM '(' namedExpressionSeq ')'
        | kind=MAP namedExpressionSeq
        | kind=REDUCE namedExpressionSeq))
       inRowFormat=rowFormat?
       (RECORDWRITER recordWriter=STRING)?
       USING script=STRING
       (AS (identifierSeq | colTypeList | ('(' (identifierSeq | colTypeList) ')')))?
       outRowFormat=rowFormat?
       (RECORDREADER recordReader=STRING)?
       fromClause?
       (WHERE where=booleanExpression)?)
    | ((kind=SELECT (hints+=hint)* setQuantifier? namedExpressionSeq fromClause?
       | fromClause (kind=SELECT setQuantifier? namedExpressionSeq)?)
       lateralView*
       (WHERE where=booleanExpression)?
       aggregation?
       (HAVING having=booleanExpression)?
       windows?)
    ;

It also depends on other rules, such as the extension rules for ordering and aggregation.

notes on some important rules

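Parsing starts from the SQL entry rule in SqlBase.g4 and recursively walks statement and every node beneath it. Whenever a leaf node is reached during matching, a TreeNode is constructed.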

When the singleTableIdentifier rule (a single-table identifier) is matched, it matches tableIdentifier followed by EOF:

singleTableIdentifier
 : tableIdentifier EOF
 ;

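It then recursively walks the corresponding tableIdentifier, whose definition is shown below. When tableIdentifier is matched, a TableIdentifier object is generated directly; this object is a kind of TreeNode.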

tableIdentifier
    : (db=identifier '.')? table=identifier
    ;
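Here is a minimal recursive-descent sketch of singleTableIdentifier and tableIdentifier. It shows how one token of lookahead resolves the optional `db '.'` prefix, and how the trailing EOF rejects extra input. `TableIdentifierDemo` and its `TableIdentifier` class are hypothetical stand-ins, not the object produced by the real SqlBase-based parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical recursive-descent sketch of
//   singleTableIdentifier : tableIdentifier EOF ;
//   tableIdentifier       : (db=identifier '.')? table=identifier ;
final class TableIdentifierDemo {
    /** Illustrative stand-in for the TableIdentifier object. */
    static final class TableIdentifier {
        final String db;    // null when no "db." prefix was given
        final String table;

        TableIdentifier(String db, String table) {
            this.db = db;
            this.table = table;
        }

        @Override
        public String toString() {
            return db == null ? table : db + "." + table;
        }
    }

    private static final String IDENT = "[A-Za-z_][A-Za-z0-9_]*";
    // Identifiers, '.', or any other single non-space character.
    private static final Pattern TOKEN = Pattern.compile(IDENT + "|\\.|\\S");

    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    private TableIdentifierDemo(String text) {
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
    }

    /** singleTableIdentifier : tableIdentifier EOF */
    static TableIdentifier singleTableIdentifier(String text) {
        TableIdentifierDemo p = new TableIdentifierDemo(text);
        TableIdentifier id = p.tableIdentifier();
        if (p.pos != p.tokens.size()) {
            throw new IllegalArgumentException("expected EOF but found " + p.tokens.get(p.pos));
        }
        return id;
    }

    /** tableIdentifier : (db=identifier '.')? table=identifier */
    private TableIdentifier tableIdentifier() {
        String first = identifier();
        // One token of lookahead: a following '.' means `first` was the database name.
        if (pos < tokens.size() && tokens.get(pos).equals(".")) {
            pos++;
            return new TableIdentifier(first, identifier());
        }
        return new TableIdentifier(null, first);
    }

    private String identifier() {
        if (pos >= tokens.size() || !tokens.get(pos).matches(IDENT)) {
            throw new IllegalArgumentException("expected identifier at token " + pos);
        }
        return tokens.get(pos++);
    }

    public static void main(String[] args) {
        System.out.println(singleTableIdentifier("db1.table1").db);     // db1
        System.out.println(singleTableIdentifier("table1").db == null); // true
    }
}
```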

antlr additional rule example

singleStatement
    : statement EOF
    ;

By default only a single SQL statement is parsed; to accept several, the rule can be extended to:

multiStatement
    : (statement SQL_SPLIT)* statement SQL_SPLIT? EOF
    ;

SQL_SPLIT
    : ';'+ | ([\r\n]* ';'+ [\r\n]*)+
    ;
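This is a rough, stdlib-only counterpart of what multiStatement and SQL_SPLIT achieve. It splits a script into statements at runs of `;`, trims surrounding newlines, and drops empty pieces. The ANTLR lexer matches a quoted literal as a single STRING token, so a `;` inside quotes never becomes SQL_SPLIT. The hypothetical `splitStatements` helper mimics this by tracking single and double quotes, keeping backslash-escaped characters inside literals. It does not handle comments or backtick-quoted identifiers.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stdlib-only counterpart of multiStatement/SQL_SPLIT:
// splits a script at ';' characters that are outside quoted literals.
final class SqlSplitDemo {
    static List<String> splitStatements(String script) {
        List<String> statements = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        char quote = 0; // 0 = outside a literal, otherwise the opening quote char
        for (int i = 0; i < script.length(); i++) {
            char c = script.charAt(i);
            if (quote != 0) {
                current.append(c);
                if (c == '\\' && i + 1 < script.length()) {
                    current.append(script.charAt(++i)); // keep the escaped char, e.g. \'
                } else if (c == quote) {
                    quote = 0; // literal closed
                }
            } else if (c == '\'' || c == '"') {
                quote = c;
                current.append(c);
            } else if (c == ';') {
                addIfNotBlank(statements, current); // ';' ends the current statement
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        addIfNotBlank(statements, current); // the last statement may lack a trailing ';'
        return statements;
    }

    // Trims whitespace/newlines and skips empty pieces such as the one between ";;".
    private static void addIfNotBlank(List<String> out, StringBuilder sb) {
        String s = sb.toString().trim();
        if (!s.isEmpty()) {
            out.add(s);
        }
    }

    public static void main(String[] args) {
        String script = "use db1;\nselect name from table1 where name = 'a;b';;\n";
        for (String s : splitStatements(script)) {
            System.out.println(s);
        }
    }
}
```

For the script in `main`, this prints `use db1` and `select name from table1 where name = 'a;b'` on separate lines. The empty piece between `;;` and the trailing newline are dropped.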

other

How can field-level (column) lineage be implemented?
