overview
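ANTLR 4 is a powerful parser generator that covers both lexical and syntactic analysis. It can be used to read, process, execute, or translate structured text. From a grammar, ANTLR generates a parser that can build and walk parse trees.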
additional
- The stock SqlBase grammar only accepts upper-case input; this version accepts both upper and lower case
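The notes do not say how case insensitivity was achieved. One common approach with ANTLR is to wrap the character stream so that the lexer's lookahead sees upper-case characters while the original text is preserved for token values. The sketch below shows only that idea in plain Java; `UpperCaseLookahead` and its methods are hypothetical names that loosely mirror `LA` and `getText` on ANTLR's `CharStream`, not ANTLR classes.

```java
/** Hypothetical stand-in for a case-insensitive character stream (not an ANTLR class). */
final class UpperCaseLookahead {
    private final String source;
    private int position = 0;

    UpperCaseLookahead(String source) {
        this.source = source;
    }

    /** Character the lexer would see at 1-based offset i, upper-cased; -1 at end of input. */
    int la(int i) {
        int index = position + i - 1;
        if (index < 0 || index >= source.length()) {
            return -1;
        }
        return Character.toUpperCase(source.charAt(index));
    }

    /** Advances by one character, stopping at the end of the input. */
    void consume() {
        if (position < source.length()) {
            position++;
        }
    }

    /** Original text between start and stop (inclusive), with its casing untouched. */
    String getText(int start, int stop) {
        return source.substring(start, stop + 1);
    }
}
```

Because only the lookahead is upper-cased, keywords written as `select` match the upper-case keyword rules, while identifiers and string literals keep the casing the user typed.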
stage
- Take the SQL text and turn it into a character stream the lexer can read, using CharStreams.fromString(sql)
- Construct the SqlBaseLexer lexer
- Construct the token stream
- Produce the final SqlBaseParser object
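import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.tree.ParseTreeWalker;
// Lexer over the raw SQL text, then the token stream the parser reads from
SqlBaseLexer lexer = new SqlBaseLexer(CharStreams.fromString(sql));
CommonTokenStream tokenStream = new CommonTokenStream(lexer);
SqlBaseParser parser = new SqlBaseParser(tokenStream);
// Parse from the statement rule and walk the resulting tree with the listener
ParseTreeWalker walker = new ParseTreeWalker();
MySqlBaseBaseListener mySqlBaseBaseListener = new MySqlBaseBaseListener();
walker.walk(mySqlBaseBaseListener, parser.statement());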
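To make the listener step less opaque, here is a minimal sketch of the order in which a walker fires callbacks: enter a node, visit its children left to right, then exit the node. It is a plain-Java model and does not use ANTLR; `MiniNode`, `RuleListener`, and `MiniWalker` are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical mini parse-tree node; stands in for ANTLR's rule contexts. */
final class MiniNode {
    final String rule;
    final List<MiniNode> children = new ArrayList<>();

    MiniNode(String rule, MiniNode... kids) {
        this.rule = rule;
        for (MiniNode kid : kids) {
            children.add(kid);
        }
    }
}

/** Hypothetical listener with the enter/exit shape of a generated listener. */
interface RuleListener {
    void enter(MiniNode node);
    void exit(MiniNode node);
}

final class MiniWalker {
    /** Depth-first walk: enter the node, walk its children in order, then exit. */
    static void walk(RuleListener listener, MiniNode node) {
        listener.enter(node);
        for (MiniNode child : node.children) {
            walk(listener, child);
        }
        listener.exit(node);
    }

    /** Records the sequence of events a listener would observe for the given tree. */
    static List<String> trace(MiniNode root) {
        final List<String> events = new ArrayList<>();
        walk(new RuleListener() {
            @Override
            public void enter(MiniNode n) {
                events.add("enter " + n.rule);
            }

            @Override
            public void exit(MiniNode n) {
                events.add("exit " + n.rule);
            }
        }, root);
        return events;
    }
}
```

A custom listener typically collects what it needs in the enter callbacks (for example table names) and finalizes results in the exit callbacks, once all children have been seen.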
code examples
create table
create table table1 (
gender string comment 'gender',
name string comment 'name',
age int comment 'age',
income double comment 'income'
) comment 'user info'
For statements other than insert and select, the core rule is:
statement
: query #statementDefault
| USE db=identifier #use
| CREATE DATABASE (IF NOT EXISTS)? identifier
(COMMENT comment=STRING)? locationSpec?
(WITH DBPROPERTIES tablePropertyList)? #createDatabase
| ALTER DATABASE identifier SET DBPROPERTIES tablePropertyList #setDatabaseProperties
| DROP DATABASE (IF EXISTS)? identifier (RESTRICT | CASCADE)? #dropDatabase
| createTableHeader ('(' colTypeList ')')? tableProvider
((OPTIONS options=tablePropertyList) |
(PARTITIONED BY partitionColumnNames=identifierList) |
bucketSpec |
locationSpec |
(COMMENT comment=STRING) |
(TBLPROPERTIES tableProps=tablePropertyList))*
(AS? query)? #createTable
| createTableHeader ('(' columns=colTypeList ')')?
......
sub select
select
name,
age,
sum(income)
from
(
select
gender,
name,
age,
income
from
table1
where
name = 'allen'
) table2
group by
name,age
For queries, the core is the following rule:
querySpecification
: (((SELECT kind=TRANSFORM '(' namedExpressionSeq ')'
| kind=MAP namedExpressionSeq
| kind=REDUCE namedExpressionSeq))
inRowFormat=rowFormat?
(RECORDWRITER recordWriter=STRING)?
USING script=STRING
(AS (identifierSeq | colTypeList | ('(' (identifierSeq | colTypeList) ')')))?
outRowFormat=rowFormat?
(RECORDREADER recordReader=STRING)?
fromClause?
(WHERE where=booleanExpression)?)
| ((kind=SELECT (hints+=hint)* setQuantifier? namedExpressionSeq fromClause?
| fromClause (kind=SELECT setQuantifier? namedExpressionSeq)?)
lateralView*
(WHERE where=booleanExpression)?
aggregation?
(HAVING having=booleanExpression)?
windows?)
;
It also depends on other supporting rules, such as those for sorting and aggregation.
some important rule instructions
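Matching starts at the SQL entry rule in SqlBase.g4 and recursively traverses statement and every node beneath it. During matching, a TreeNode is constructed whenever a leaf node is reached.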
singleTableIdentifier
: tableIdentifier EOF
;
When this rule (the identifier of a single table) is matched, TableIdentifier is matched.
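The traversal then recurses into the corresponding tableIdentifier, whose definition is shown below. When tableIdentifier is matched, a TableIdentifier object is generated directly, and that object is a kind of TreeNode.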
tableIdentifier
: (db=identifier '.')? table=identifier
;
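As an illustration of what this rule captures, the sketch below builds a small value object from text of the form `db.table` or `table`. It is hand-written Java rather than ANTLR-generated code; `TableId` is a hypothetical class, and the identifier pattern (letters, digits, underscore, unquoted) is a simplifying assumption rather than the grammar's actual identifier rule.

```java
/** Hypothetical value class mirroring the db and table labels of the rule. */
final class TableId {
    final String db;    // null when no "db." prefix was given
    final String table;

    TableId(String db, String table) {
        this.db = db;
        this.table = table;
    }

    /** Hand-rolled equivalent of (db=identifier '.')? table=identifier for unquoted names. */
    static TableId parse(String text) {
        // limit -1 keeps trailing empty parts, so "a." is rejected below
        String[] parts = text.trim().split("\\.", -1);
        for (String part : parts) {
            // Assumed identifier shape; the real grammar also allows quoted identifiers
            if (!part.matches("[A-Za-z_][A-Za-z0-9_]*")) {
                throw new IllegalArgumentException("not an identifier: '" + part + "'");
            }
        }
        if (parts.length == 1) {
            return new TableId(null, parts[0]);
        }
        if (parts.length == 2) {
            return new TableId(parts[0], parts[1]);
        }
        throw new IllegalArgumentException("too many name parts: " + text);
    }
}
```

For example, `TableId.parse("db1.table1")` yields db `db1` and table `table1`, while `TableId.parse("table1")` leaves db as null, matching the optional `(db=identifier '.')?` prefix.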
antlr additional rule example
singleStatement
: statement EOF
;
By default only a single SQL statement is parsed; the grammar can be extended as follows:
multiStatement
: statement (SQL_SPLIT statement)* SQL_SPLIT? EOF
;
SQL_SPLIT
: ';'+ | ([\r\n]* ';'+ [\r\n]*)+
;
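To show the kind of input a multi-statement rule is meant to accept, the sketch below splits a script into individual statements on semicolons. It is a plain string-based stand-in, not the ANTLR rule; `StatementSplitter` is a hypothetical helper, and skipping semicolons inside single-quoted literals is an added assumption. It does not handle escaped quotes or comments.

```java
import java.util.ArrayList;
import java.util.List;

final class StatementSplitter {
    /** Splits on ';' outside single-quoted literals and drops empty statements. */
    static List<String> split(String script) {
        List<String> statements = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inString = false;
        for (int i = 0; i < script.length(); i++) {
            char c = script.charAt(i);
            if (c == '\'') {
                inString = !inString; // toggle on every quote; escapes are not handled
            }
            if (c == ';' && !inString) {
                addIfNotBlank(statements, current);
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        addIfNotBlank(statements, current); // last statement may lack a trailing ';'
        return statements;
    }

    private static void addIfNotBlank(List<String> out, StringBuilder buffer) {
        String statement = buffer.toString().trim();
        if (!statement.isEmpty()) {
            out.add(statement);
        }
    }
}
```

Each resulting string could then be fed to the single-statement entry rule, which is an alternative to changing the grammar when the split only needs to happen at statement boundaries.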
other
How can column-level lineage be implemented?