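SQL is a DSL (domain-specific language) and can be thought of as the database's "programming language". As with C or Java, before this text can actually be executed it must go through lexical and syntax analysis, followed by semantic analysis; only then can a compiler or interpreter turn the string into a definite sequence of operations.
The SQL parsing engine performs the lexical and syntax analysis: it parses SQL into an abstract syntax tree (AST) that a high-level programming language can then read directly. Compared with languages such as C or Java, SQL is much simpler: it has no scopes, no classes, and no complex branching.
An abstract syntax tree (AST) is an abstract representation of the syntactic structure of source code. It expresses that structure as a tree in which every node represents one construct in the source.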
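To make the idea of an AST concrete, here is a minimal sketch in plain Java. The AstNode and AstDemo classes are hypothetical and exist only for this illustration; they are not ShardingSphere or antlr types, and the tree is built by hand rather than by a parser.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical, minimal AST node: a label plus an ordered list of children. */
final class AstNode {
    final String label;
    final List<AstNode> children = new ArrayList<>();

    AstNode(String label, AstNode... children) {
        this.label = label;
        this.children.addAll(Arrays.asList(children));
    }

    /** Renders the tree as an S-expression; a leaf renders as its bare label. */
    String render() {
        if (children.isEmpty()) {
            return label;
        }
        StringBuilder sb = new StringBuilder("(").append(label);
        for (AstNode child : children) {
            sb.append(' ').append(child.render());
        }
        return sb.append(')').toString();
    }
}

final class AstDemo {
    /** Hand-built tree for: SELECT id FROM t_order WHERE status = 'OK' */
    static AstNode sample() {
        return new AstNode("select",
                new AstNode("projections", new AstNode("id")),
                new AstNode("from", new AstNode("t_order")),
                new AstNode("where",
                        new AstNode("=", new AstNode("status"), new AstNode("'OK'"))));
    }

    public static void main(String[] args) {
        // prints (select (projections id) (from t_order) (where (= status 'OK')))
        System.out.println(sample().render());
    }
}
```

Each clause of the statement becomes a subtree, which is what lets later code read the projections, tables, and conditions without touching the SQL text again.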
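ShardingSphere's parsing engine has evolved through three generations:
First-generation SQL parser:
Before 1.4.x, sharding-jdbc used alibaba's druid (https://github.com/alibaba/druid). druid contains a hand-written SQL parser. Its advantage is speed; its drawback is that it is hard to extend, since the only way to do so is to modify its source code.
Second-generation SQL parser:
Starting with 1.5.x, ShardingSphere reimplemented a simplified SQL parsing engine. Because ShardingSphere does not need to turn SQL into a complete AST the way druid does, it "half-understands" the SQL and extracts only the context that data sharding cares about. While still meeting its needs, this further improved parsing performance and compatibility.
Third-generation SQL parser:
Starting with 3.0.x, ShardingSphere switched all of its SQL parsers to an antlr4-based implementation. The goal was more convenient and more complete SQL support, for example for complex expressions, recursion, and subqueries, because by then ShardingSphere's scope was no longer limited to data sharding.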
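antlr4 defines lexer and parser rules in .g4 files, and ShardingSphere keeps the two kinds of rules in separate files. For MySQL, for example, the lexer rule files are Alphabet.g4, Comments.g4, Keyword.g4, Literals.g4, MySQLKeyword.g4, and Symbol.g4, and the parser rule files are BaseRule.g4, DALStatement.g4, DCLStatement.g4, DDLStatement.g4, DMLStatement.g4, RLStatement.g4, and TCLStatement.g4. Each file defines one category of keywords or the rules for one type of SQL statement.
By ANTLR convention, lexer rule names start with an uppercase letter and parser rule names start with a lowercase letter. For how to use antlr4, see https://github.com/antlr/antlr4/blob/master/doc/index.md
antlr4 .g4 files
Keyword.g4 is a pure lexer rule file that defines the keywords common to SQL.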
lexer grammar Keyword;
import Alphabet;
/* Skip spaces, tabs, newlines */
WS
: [ \t\r\n] + ->skip
;
SELECT
: S E L E C T
;
INSERT
: I N S E R T
;
UPDATE
: U P D A T E
;
DELETE
: D E L E T E
;
CREATE
: C R E A T E
;
ALTER
: A L T E R
;
DROP
: D R O P
;
…
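Rules such as S E L E C T spell each keyword letter by letter, which is how the lexer matches keywords regardless of case. The sketch below mimics that idea with a Java regular expression. It assumes that each letter rule imported from Alphabet.g4 matches both the uppercase and the lowercase form of that letter; Alphabet.g4 is not shown in this article, so treat that as an assumption. KeywordPattern is a hypothetical helper, not part of ShardingSphere.

```java
import java.util.regex.Pattern;

/** Hypothetical helper that mirrors letter-by-letter, case-insensitive keyword rules. */
final class KeywordPattern {
    /** Turns a keyword made of letters, e.g. "SELECT", into "[Ss][Ee][Ll][Ee][Cc][Tt]". */
    static Pattern of(String keyword) {
        StringBuilder regex = new StringBuilder();
        for (char c : keyword.toCharArray()) {
            regex.append('[')
                 .append(Character.toUpperCase(c))
                 .append(Character.toLowerCase(c))
                 .append(']');
        }
        return Pattern.compile(regex.toString());
    }

    public static void main(String[] args) {
        Pattern select = of("SELECT");
        System.out.println(select.pattern());                     // [Ss][Ee][Ll][Ee][Cc][Tt]
        System.out.println(select.matcher("SeLeCt").matches());   // true
        System.out.println(select.matcher("SELECTS").matches());  // false
    }
}
```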
Symbol.g4 defines the arithmetic and predicate operators used in SQL, along with symbols such as parentheses and semicolons.
lexer grammar Symbol;
AND_: '&&';
OR_: '||';
NOT_: '!';
TILDE_: '~';
VERTICAL_BAR_: '|';
AMPERSAND_: '&';
SIGNED_LEFT_SHIFT_: '<<';
SIGNED_RIGHT_SHIFT_: '>>';
CARET_: '^';
MOD_: '%';
COLON_: ':';
PLUS_: '+';
MINUS_: '-';
ASTERISK_: '*';
SLASH_: '/';
BACKSLASH_: '\\';
…
Literals.g4 defines the rules for literal values in SQL.
lexer grammar Literals;
import Alphabet, Symbol;
IDENTIFIER_
: [A-Za-z_$0-9]*?[A-Za-z_$]+?[A-Za-z_$0-9]*
| BQ_ ~'`'+ BQ_
| (DQ_ ( '\\'. | '""' | ~('"'| '\\') )* DQ_)
;
STRING_
: (DQ_ ( '\\'. | '""' | ~('"'| '\\') )* DQ_)
| (SQ_ ('\\'. | '\'\'' | ~('\'' | '\\'))* SQ_)
;
NUMBER_
: INT_? DOT_? INT_ (E (PLUS_ | MINUS_)? INT_)?
;
…
MySQLKeyword.g4 defines the keywords specific to MySQL.
lexer grammar MySQLKeyword;
import Alphabet;
USE
: U S E
;
DESCRIBE
: D E S C R I B E
;
SHOW
: S H O W
;
DATABASES
: D A T A B A S E S
;
DATABASE
: D A T A B A S E
;
SCHEMAS
: S C H E M A S
;
TABLES
: T A B L E S
;
TABLESPACE
: T A B L E S P A C E
;
COLUMNS
: C O L U M N S
;
FIELDS
: F I E L D S
;
…
Comments.g4 defines the lexer rules for SQL comments.
lexer grammar Comments;
import Symbol;
BLOCK_COMMENT: '/*' .*? '*/' -> channel(HIDDEN);
INLINE_COMMENT: (('-- ' | '#') ~[\r\n]* ('\r'? '\n' | EOF) | '--' ('\r'? '\n' | EOF)) -> channel(HIDDEN);
BaseRule.g4 defines the parser rules for the various kinds of literals in SQL.
grammar BaseRule;
import Symbol, Keyword, MySQLKeyword, Literals;
parameterMarker
: QUESTION_
;
literals
: stringLiterals
| numberLiterals
| dateTimeLiterals
| hexadecimalLiterals
| bitValueLiterals
| booleanLiterals
| nullValueLiterals
;
stringLiterals
: characterSetName_? STRING_ collateClause_?
;
numberLiterals
: MINUS_? NUMBER_
;
dateTimeLiterals
: (DATE | TIME | TIMESTAMP) STRING_
| LBE_ identifier STRING_ RBE_
;
…
DMLStatement.g4 defines the parser rules for DML statements.
grammar DMLStatement;
import Symbol, Keyword, MySQLKeyword, Literals, BaseRule;
insert
: INSERT insertSpecification_ INTO? tableName partitionNames_? (insertValuesClause | setAssignmentsClause | insertSelectClause) onDuplicateKeyClause?
;
insertSpecification_
: (LOW_PRIORITY | DELAYED | HIGH_PRIORITY)? IGNORE?
;
insertValuesClause
: columnNames? (VALUES | VALUE) assignmentValues (COMMA_ assignmentValues)*
;
insertSelectClause
: columnNames? select
;
onDuplicateKeyClause
: ON DUPLICATE KEY UPDATE assignment (COMMA_ assignment)*
;
replace
: REPLACE replaceSpecification_? INTO? tableName partitionNames_? (insertValuesClause | setAssignmentsClause | insertSelectClause)
;
replaceSpecification_
: LOW_PRIORITY | DELAYED
;
update
: UPDATE updateSpecification_ tableReferences setAssignmentsClause whereClause? orderByClause? limitClause?
;
updateSpecification_
: LOW_PRIORITY? IGNORE?
;
assignment
: columnName EQ_ assignmentValue
;
setAssignmentsClause
: SET assignment (COMMA_ assignment)*
;
assignmentValues
: LP_ assignmentValue (COMMA_ assignmentValue)* RP_
| LP_ RP_
;
assignmentValue
: expr | DEFAULT | blobValue
;
blobValue
: UL_BINARY STRING_
;
delete
: DELETE deleteSpecification_ (singleTableClause | multipleTablesClause) whereClause?
;
deleteSpecification_
: LOW_PRIORITY? QUICK? IGNORE?
;
DDLStatement.g4 defines the parser rules for DDL statements.
grammar DDLStatement;
import Symbol, Keyword, MySQLKeyword, Literals, BaseRule, DMLStatement;
createTable
: CREATE createTableSpecification_? TABLE tableNotExistClause_ tableName (createDefinitionClause | createLikeClause)
;
alterTable
: ALTER TABLE tableName alterDefinitionClause?
;
dropTable
: DROP dropTableSpecification_ TABLE tableExistClause_ tableNames
;
dropIndex
: DROP INDEX dropIndexSpecification_? indexName (ON tableName)?
( ALGORITHM EQ_? (DEFAULT | INPLACE | COPY) | LOCK EQ_? (DEFAULT | NONE | SHARED | EXCLUSIVE) )*
;
truncateTable
: TRUNCATE TABLE? tableName
;
…
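For brevity, the remaining .g4 files are not reproduced here. These rule files make it easy to see which kinds of SQL ShardingSphere currently supports, and unsupported statements can be added by modifying or extending the rules, which is far more flexible than druid's approach of hard-coding the grammar in Java. The trade-off is performance: a generated parser is slower than a hand-written one, and according to the official documentation it is roughly 3 to 10 times slower than the second-generation in-house SQL parsing engine.
The antlr project on GitHub also provides a MySQL .g4 grammar at https://github.com/antlr/grammars-v4/tree/master/sql/mysql/Positive-Technologies. Unless you are building a general-purpose SQL processing tool like ShardingSphere, using that grammar directly is recommended. MySQL's official tool, mysql-workbench, also uses antlr4 for SQL parsing: https://github.com/mysql/mysql-workbench/tree/8.0/library/parsers/grammars.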
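SQL categories
Before reading the code, let's look at how SQL statements are categorized, because many places in the ShardingSphere code branch on this category to determine the type of a statement:
DML (Data Manipulation Language): data manipulation statements, including select, insert, update, delete, select for update, and call.
DAL (Data Administration Language): data administration statements, including use, show databases, show tables, show columns, and show create table.
DDL (Data Definition Language): data definition statements, including create table, alter table, drop table, and truncate table.
TCL (Transaction Control Language): transaction control statements, including set transaction, set autocommit, begin, commit, rollback, and savepoint.
DQL (Data Query Language): data query statements. In ShardingSphere's antlr4 files select belongs to DML, but some classes, such as ShardingDQLResultMerger, refer to select as DQL.
RL (Replication Language): replication statements, including change master to, start slave, and stop slave.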
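The categories above can be sketched as a small enum plus a keyword-based classifier. This is only an illustration of the grouping: SqlCategoryDemo is a hypothetical class, and looking at the leading keywords is a simplification. In the code shown later, ShardingSphere derives the statement type from the class of the parse tree root (VisitorRule.valueOf(parseTree.getClass())), not from string prefixes.

```java
import java.util.Locale;

/** Hypothetical classifier that maps a statement to one of the categories listed above. */
final class SqlCategoryDemo {
    enum SqlCategory { DML, DAL, DDL, TCL, RL, UNKNOWN }

    static SqlCategory classify(String sql) {
        // Normalize case and whitespace so that prefix checks are reliable.
        String normalized = sql.trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");
        if (normalized.startsWith("set transaction") || normalized.startsWith("set autocommit")) {
            return SqlCategory.TCL;
        }
        if (normalized.startsWith("change master to") || normalized.startsWith("start slave")
                || normalized.startsWith("stop slave")) {
            return SqlCategory.RL;
        }
        // First keyword only, with punctuation such as a trailing ';' removed.
        String first = normalized.split(" ")[0].replaceAll("[^a-z]", "");
        switch (first) {
            case "select": case "insert": case "update": case "delete": case "call":
                return SqlCategory.DML;
            case "use": case "show":
                return SqlCategory.DAL;
            case "create": case "alter": case "drop": case "truncate":
                return SqlCategory.DDL;
            case "begin": case "commit": case "rollback": case "savepoint":
                return SqlCategory.TCL;
            default:
                return SqlCategory.UNKNOWN;
        }
    }

    public static void main(String[] args) {
        System.out.println(classify("SELECT * FROM t_order FOR UPDATE")); // DML
        System.out.println(classify("show tables"));                      // DAL
        System.out.println(classify("SET autocommit = 0"));               // TCL
    }
}
```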
Code walkthrough
Back to the org.apache.shardingsphere.underlying.route.DataNodeRouter#createRouteContext method:
private RouteContext createRouteContext(final String sql, final List
In 5.x, parserEngine.parse is called in the constructor of ShardingSpherePreparedStatement in JDBC mode, and in MySQLComStmtPrepareExecutor#execute in Proxy mode.
org.apache.shardingsphere.driver.jdbc.core.statement.ShardingSpherePreparedStatement
private ShardingSpherePreparedStatement(final ShardingSphereConnection connection, final String sql,
final int resultSetType, final int resultSetConcurrency, final int resultSetHoldability, final boolean returnGeneratedKeys) throws SQLException {
if (Strings.isNullOrEmpty(sql)) {
throw new SQLException(SQLExceptionConstant.SQL_STRING_NULL_OR_EMPTY);
}
this.connection = connection;
schemaContexts = connection.getSchemaContexts();
this.sql = sql;
statements = new ArrayList<>();
parameterSets = new ArrayList<>();
sqlStatement = schemaContexts.getDefaultSchemaContext().getRuntimeContext().getSqlParserEngine().parse(sql, true);// parse the SQL
…
}
org.apache.shardingsphere.proxy.frontend.mysql.command.query.binary.prepare.MySQLComStmtPrepareExecutor
/**
* COM_STMT_PREPARE command executor for MySQL.
*/
public final class MySQLComStmtPrepareExecutor implements CommandExecutor {
private static final MySQLBinaryStatementRegistry PREPARED_STATEMENT_REGISTRY = MySQLBinaryStatementRegistry.getInstance();
private final MySQLComStmtPreparePacket packet;
private final SchemaContext schema;
public MySQLComStmtPrepareExecutor(final MySQLComStmtPreparePacket packet, final BackendConnection backendConnection) {
this.packet = packet;
schema = backendConnection.getSchema();
}
…
@Override
public Collection execute() {
Collection result = new LinkedList<>();
int currentSequenceId = 0;
SQLStatement sqlStatement = schema.getRuntimeContext().getSqlParserEngine().parse(packet.getSql(), true);
…
}
Back in version 4.1.1, go into the SQLParserEngine class, org.apache.shardingsphere.sql.parser.SQLParserEngine:
public final class SQLParserEngine {
private final String databaseTypeName;
private final SQLParseResultCache cache = new SQLParseResultCache();
/**
* Parse SQL.
*
* @param sql SQL
* @param useCache use cache or not
* @return SQL statement
*/
public SQLStatement parse(final String sql, final boolean useCache) {
ParsingHook parsingHook = new SPIParsingHook();// SQL parsing hook
parsingHook.start(sql);
try {
SQLStatement result = parse0(sql, useCache);// parse the SQL
parsingHook.finishSuccess(result);
return result;
// CHECKSTYLE:OFF
} catch (final Exception ex) {
// CHECKSTYLE:ON
parsingHook.finishFailure(ex);
throw ex;
}
}
private SQLStatement parse0(final String sql, final boolean useCache) {
if (useCache) {
Optional<SQLStatement> cachedSQLStatement = cache.getSQLStatement(sql);
if (cachedSQLStatement.isPresent()) {// reuse the cached parse result for this SQL, if there is one
return cachedSQLStatement.get();
}
}
ParseTree parseTree = new SQLParserExecutor(databaseTypeName, sql).execute().getRootNode();// parse the SQL into an AST; ParseTree is antlr's parse tree interface
SQLStatement result = (SQLStatement) ParseTreeVisitorFactory.newInstance(databaseTypeName, VisitorRule.valueOf(parseTree.getClass())).visit(parseTree);// use the visitor pattern to convert antlr's parse tree into a SQLStatement
if (useCache) {
cache.put(sql, result);
}
return result;
}
}
As shown, SQLParserEngine#parse does two things: 1. it creates a SQLParserExecutor to parse the SQL into an antlr ParseTree; 2. it uses the factory class ParseTreeVisitorFactory to create a ParseTreeVisitor that converts the antlr ParseTree into ShardingSphere's own SQLStatement object.
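The caching step in parse0 is a plain look-aside cache keyed by the SQL text. The sketch below shows that pattern in isolation. CachingParser is a hypothetical class; the real SQLParseResultCache is not shown in this article, so the ConcurrentHashMap used here is an assumption made for the example and says nothing about how the real cache stores or evicts entries.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical look-aside cache around an arbitrary parse function. */
final class CachingParser<T> {
    private final Map<String, T> cache = new ConcurrentHashMap<>();
    private final Function<String, T> parser;
    private final AtomicInteger parseCount = new AtomicInteger();

    CachingParser(Function<String, T> parser) {
        this.parser = parser;
    }

    T parse(String sql, boolean useCache) {
        if (useCache) {
            T cached = cache.get(sql);
            if (cached != null) {
                return cached; // cache hit: skip lexing and parsing entirely
            }
        }
        T result = parser.apply(sql); // the expensive step
        parseCount.incrementAndGet();
        if (useCache) {
            cache.put(sql, result);
        }
        return result;
    }

    /** Number of times the underlying parse function actually ran. */
    int getParseCount() {
        return parseCount.get();
    }

    public static void main(String[] args) {
        CachingParser<String> parser = new CachingParser<>(sql -> "parsed:" + sql);
        parser.parse("select 1", true);
        parser.parse("select 1", true);
        System.out.println(parser.getParseCount()); // 1
    }
}
```

Because the key is the raw SQL string, the cache pays off mainly for parameterized statements, where the same text is executed repeatedly with different parameter values.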
Next, let's look at SQLParserExecutor and ParseTreeVisitorFactory in turn:
org.apache.shardingsphere.sql.parser.core.parser.SQLParserExecutor
/**
* SQL parser executor.
*/
@RequiredArgsConstructor
public final class SQLParserExecutor {
private final String databaseTypeName;
private final String sql;
/**
* Execute to parse SQL.
*
* @return AST node
*/
public ParseASTNode execute() {
ParseASTNode result = towPhaseParse();
if (result.getRootNode() instanceof ErrorNode) {
throw new SQLParsingException(String.format("Unsupported SQL of `%s`", sql));
}
return result;
}
private ParseASTNode towPhaseParse() {// typo in the source: this should be twoPhaseParse; it was fixed in 5.x
SQLParser sqlParser = SQLParserFactory.newInstance(databaseTypeName, sql);// create the SQL parser for this database type
try {
((Parser) sqlParser).setErrorHandler(new BailErrorStrategy());
((Parser) sqlParser).getInterpreter().setPredictionMode(PredictionMode.SLL);
return (ParseASTNode) sqlParser.parse();
} catch (final ParseCancellationException ex) {
((Parser) sqlParser).reset();
((Parser) sqlParser).setErrorHandler(new DefaultErrorStrategy());
((Parser) sqlParser).getInterpreter().setPredictionMode(PredictionMode.LL);
return (ParseASTNode) sqlParser.parse();
}
}
}
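As shown above, the actual parsing is done by SQLParser, which is created by its factory class SQLParserFactory.
The code uses antlr4's two-stage parsing to improve performance, which is a standard antlr4 idiom (https://github.com/antlr/antlr4/issues/374#issuecomment-30952357). The first attempt uses the faster SLL prediction mode with BailErrorStrategy, which aborts on the first syntax error; only if that attempt throws ParseCancellationException is the parser reset and the input parsed again in full LL mode with DefaultErrorStrategy. For a more detailed description of two-stage parsing, see https://www.antlr.org/papers/allstar-techreport.pdf.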
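The control flow of two-stage parsing can be shown without antlr at all. In the sketch below, TwoStageDemo and FastPathFailed are hypothetical names: FastPathFailed stands in for antlr's ParseCancellationException, and the two suppliers stand in for the SLL and LL parse attempts. It only illustrates the try-fast-then-fall-back shape of towPhaseParse, not antlr's prediction algorithms.

```java
import java.util.function.Supplier;

/** Hypothetical, antlr-free illustration of the fast-path / fallback structure. */
final class TwoStageDemo {
    /** Stand-in for antlr's ParseCancellationException. */
    static final class FastPathFailed extends RuntimeException {
        private static final long serialVersionUID = 1L;
    }

    static <T> T twoStage(Supplier<T> fastAttempt, Runnable reset, Supplier<T> fullAttempt) {
        try {
            return fastAttempt.get();  // stage 1: cheap strategy that bails out on any problem
        } catch (FastPathFailed ex) {
            reset.run();               // rewind shared state before trying again
            return fullAttempt.get();  // stage 2: slower strategy that handles every input
        }
    }

    public static void main(String[] args) {
        String result = TwoStageDemo.<String>twoStage(
                () -> { throw new FastPathFailed(); },
                () -> System.out.println("reset parser state"),
                () -> "parsed with the full strategy");
        System.out.println(result);
    }
}
```

The design bets that most inputs succeed on the fast path, so the cost of parsing some inputs twice is outweighed by the time saved on the rest.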
org.apache.shardingsphere.sql.parser.core.parser.SQLParserFactory
/**
* SQL parser factory.
*/
@NoArgsConstructor(access = AccessLevel.PRIVATE)
public final class SQLParserFactory {
static {
NewInstanceServiceLoader.register(SQLParserConfiguration.class);
}
/**
* New instance of SQL parser.
*
* @param databaseTypeName name of database type
* @param sql SQL
* @return SQL parser
*/
public static SQLParser newInstance(final String databaseTypeName, final String sql) {
for (SQLParserConfiguration each : NewInstanceServiceLoader.newServiceInstances(SQLParserConfiguration.class)) {
if (each.getDatabaseTypeName().equals(databaseTypeName)) {// create the SQL parser for the matching database type
return createSQLParser(sql, each);
}
}
throw new UnsupportedOperationException(String.format("Cannot support database type '%s'", databaseTypeName));
}
@SneakyThrows
private static SQLParser createSQLParser(final String sql, final SQLParserConfiguration configuration) {// create the lexer and parser configured in SQLParserConfiguration
Lexer lexer = (Lexer) configuration.getLexerClass().getConstructor(CharStream.class).newInstance(CharStreams.fromString(sql));
return configuration.getParserClass().getConstructor(TokenStream.class).newInstance(new CommonTokenStream(lexer));
}
}
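As shown, creating a SQLParser instance requires the Lexer and SQLParser classes for the given database type. This mapping is encapsulated in the SQLParserConfiguration interface, which has one implementation class per database type.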
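The lookup-then-reflect pattern used by SQLParserFactory can be reduced to a few lines. Everything in the sketch below is hypothetical (DemoParser, DemoParserConfiguration, MySQLDemoParser, and so on), and a fixed list stands in for the configurations that the real code discovers through NewInstanceServiceLoader. Only the java.lang.reflect calls are real API.

```java
import java.util.Collections;
import java.util.List;

/** Hypothetical miniature of a configuration-driven, reflective parser factory. */
final class ReflectiveFactoryDemo {
    interface DemoParser {
        String parse();
    }

    interface DemoParserConfiguration {
        String getDatabaseTypeName();
        Class<? extends DemoParser> getParserClass();
    }

    public static final class MySQLDemoParser implements DemoParser {
        private final String sql;
        public MySQLDemoParser(String sql) {
            this.sql = sql;
        }
        @Override
        public String parse() {
            return "MySQL parse tree for: " + sql;
        }
    }

    static final class MySQLDemoConfiguration implements DemoParserConfiguration {
        @Override
        public String getDatabaseTypeName() {
            return "MySQL";
        }
        @Override
        public Class<? extends DemoParser> getParserClass() {
            return MySQLDemoParser.class;
        }
    }

    // Assumption: a fixed list replaces service-loader discovery.
    static final List<DemoParserConfiguration> CONFIGURATIONS =
            Collections.<DemoParserConfiguration>singletonList(new MySQLDemoConfiguration());

    static DemoParser newInstance(String databaseTypeName, String sql) {
        for (DemoParserConfiguration each : CONFIGURATIONS) {
            if (each.getDatabaseTypeName().equals(databaseTypeName)) {
                try {
                    // Reflectively call the public (String) constructor of the configured class.
                    return each.getParserClass().getConstructor(String.class).newInstance(sql);
                } catch (ReflectiveOperationException ex) {
                    throw new IllegalStateException(ex);
                }
            }
        }
        throw new UnsupportedOperationException(
                String.format("Cannot support database type '%s'", databaseTypeName));
    }

    public static void main(String[] args) {
        System.out.println(newInstance("MySQL", "select 1").parse());
    }
}
```

Adding another database in this style means adding one configuration class and one parser class; the factory itself does not change.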
Taking MySQL as an example, here is the MySQLParserConfiguration implementation:
org.apache.shardingsphere.sql.parser.mysql.MySQLParserConfiguration
/**
* SQL parser configuration for MySQL.
*/
public final class MySQLParserConfiguration implements SQLParserConfiguration {
@Override
public String getDatabaseTypeName() {
return "MySQL";
}
@Override
public Class<? extends SQLLexer> getLexerClass() {
return MySQLLexer.class;
}
@Override
public Class<? extends SQLParser> getParserClass() {
return MySQLParser.class;
}
@Override
public Class<? extends SQLVisitorFacade> getVisitorFacadeClass() {
return MySQLVisitorFacade.class;
}
}
As shown, it specifies the lexer class MySQLLexer, the parser class MySQLParser, and the visitor facade class MySQLVisitorFacade used for MySQL.
Here is the lexer class MySQLLexer:
org.apache.shardingsphere.sql.parser.mysql.lexer.MySQLLexer
/**
* SQL lexer for MySQL.
*/
public final class MySQLLexer extends MySQLStatementLexer implements SQLLexer {// MySQLStatementLexer is the MySQL lexer that antlr generates from the .g4 files
public MySQLLexer(final CharStream input) {
super(input);
}
}
As shown, the lexer simply extends the antlr-generated MySQLStatementLexer class and adds no logic of its own.
Next, the parser class MySQLParser:
org.apache.shardingsphere.sql.parser.mysql.parser.MySQLParser
/**
* SQL parser for MySQL.
*/
public final class MySQLParser extends MySQLStatementParser implements SQLParser {// MySQLStatementParser is the parser that antlr generates from the .g4 files
public MySQLParser(final TokenStream input) {
super(input);
}
@Override
public ASTNode parse() {
return new ParseASTNode(execute());// create a ParseASTNode from the ExecuteContext returned by antlr
}
}
Like MySQLLexer, MySQLParser extends the antlr-generated parser MySQLStatementParser. Its parse method calls MySQLStatementParser's execute method to obtain the ExecuteContext returned by antlr and wraps it in a ParseASTNode.
Next, the visitor facade for MySQL, MySQLVisitorFacade:
org.apache.shardingsphere.sql.parser.mysql.visitor.MySQLVisitorFacade
/**
* Visitor facade for MySQL.
*/
public final class MySQLVisitorFacade implements SQLVisitorFacade {
@Override
public Class<? extends DMLVisitor> getDMLVisitorClass() {
return MySQLDMLVisitor.class;
}
@Override
public Class<? extends DDLVisitor> getDDLVisitorClass() {
return MySQLDDLVisitor.class;
}
@Override
public Class<? extends TCLVisitor> getTCLVisitorClass() {
return MySQLTCLVisitor.class;
}
@Override
public Class<? extends DCLVisitor> getDCLVisitorClass() {
return MySQLDCLVisitor.class;
}
@Override
public Class<? extends DALVisitor> getDALVisitorClass() {
return MySQLDALVisitor.class;
}
@Override
public Class<? extends RLVisitor> getRLVisitorClass() {
return MySQLRLVisitor.class;
}
}
As shown, MySQLVisitorFacade exposes the visitor classes for DML, DDL, TCL, DCL, DAL, and RL. Let's look at the most commonly used one, the DML visitor MySQLDMLVisitor.
org.apache.shardingsphere.sql.parser.mysql.visitor.impl.MySQLDMLVisitor
/**
* DML visitor for MySQL.
*/
public final class MySQLDMLVisitor extends MySQLVisitor implements DMLVisitor {
…
// visit the antlr-generated SelectClauseContext and build a SelectStatement
public ASTNode visitSelectClause(final SelectClauseContext ctx) {
SelectStatement result = new SelectStatement();
result.setProjections((ProjectionsSegment) visit(ctx.projections()));// set the ProjectionsSegment
if (null != ctx.selectSpecification()) {
result.getProjections().setDistinctRow(isDistinct(ctx));
}
if (null != ctx.fromClause()) {
CollectionValue<TableReferenceSegment> tableReferences = (CollectionValue<TableReferenceSegment>) visit(ctx.fromClause());
for (TableReferenceSegment each : tableReferences.getValue()) {// add each TableReferenceSegment
result.getTableReferences().add(each);
}
}
if (null != ctx.whereClause()) {// set the WhereSegment
result.setWhere((WhereSegment) visit(ctx.whereClause()));
}
if (null != ctx.groupByClause()) {// set the GroupBySegment
result.setGroupBy((GroupBySegment) visit(ctx.groupByClause()));
}
if (null != ctx.orderByClause()) {// set the OrderBySegment
result.setOrderBy((OrderBySegment) visit(ctx.orderByClause()));
}
if (null != ctx.limitClause()) {// set the LimitSegment
result.setLimit((LimitSegment) visit(ctx.limitClause()));
}
if (null != ctx.lockClause()) {// set the LockSegment
result.setLock((LockSegment) visit(ctx.lockClause()));
}
return result;
}
…
@Override
public ASTNode visitWhereClause(final WhereClauseContext ctx) {// visit the WhereClauseContext and build a WhereSegment
WhereSegment result = new WhereSegment(ctx.getStart().getStartIndex(), ctx.getStop().getStopIndex());
ASTNode segment = visit(ctx.expr());
if (segment instanceof OrPredicateSegment) {
result.getAndPredicates().addAll(((OrPredicateSegment) segment).getAndPredicates());
} else if (segment instanceof PredicateSegment) {
AndPredicate andPredicate = new AndPredicate();
andPredicate.getPredicates().add((PredicateSegment) segment);
result.getAndPredicates().add(andPredicate);
}
return result;
}
@Override
public ASTNode visitGroupByClause(final GroupByClauseContext ctx) {// visit the GroupByClauseContext and build a GroupBySegment
Collection<OrderByItemSegment> items = new LinkedList<>();
for (OrderByItemContext each : ctx.orderByItem()) {
items.add((OrderByItemSegment) visit(each));
}
return new GroupBySegment(ctx.getStart().getStartIndex(), ctx.getStop().getStopIndex(), items);
}
@Override
public ASTNode visitLimitClause(final LimitClauseContext ctx) {// visit the LimitClauseContext and build a LimitSegment
if (null == ctx.limitOffset()) {
return new LimitSegment(ctx.getStart().getStartIndex(), ctx.getStop().getStopIndex(), null, (PaginationValueSegment) visit(ctx.limitRowCount()));
}
PaginationValueSegment rowCount;
PaginationValueSegment offset;
if (null != ctx.OFFSET()) {
rowCount = (PaginationValueSegment) visit(ctx.limitRowCount());
offset = (PaginationValueSegment) visit(ctx.limitOffset());
} else {
offset = (PaginationValueSegment) visit(ctx.limitOffset());
rowCount = (PaginationValueSegment) visit(ctx.limitRowCount());
}
return new LimitSegment(ctx.getStart().getStartIndex(), ctx.getStop().getStopIndex(), offset, rowCount);
}
}
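As shown, MySQLDMLVisitor extends MySQLVisitor. MySQLDMLVisitor visits the antlr-generated *Context classes related to DML and reads their information to build the corresponding Segment and SQLStatement objects.
MySQLVisitor, in turn, implements the parsing operations shared by DML, DDL, DAL, and the other statement types, such as values of each type, table names, and column names.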
org.apache.shardingsphere.sql.parser.mysql.visitor.MySQLVisitor
public abstract class MySQLVisitor extends MySQLStatementBaseVisitor<ASTNode> {
private int currentParameterIndex;
@Override
public final ASTNode visitParameterMarker(final ParameterMarkerContext ctx) {
return new ParameterMarkerValue(currentParameterIndex++);
}
…
@Override
public final ASTNode visitTableName(final TableNameContext ctx) {// visit the TableNameContext and build a SimpleTableSegment
SimpleTableSegment result = new SimpleTableSegment(new TableNameSegment(ctx.getStart().getStartIndex(), ctx.getStop().getStopIndex(), (IdentifierValue) visit(ctx.name())));
OwnerContext owner = ctx.owner();
if (null != owner) {
result.setOwner(new OwnerSegment(owner.getStart().getStartIndex(), owner.getStop().getStopIndex(), (IdentifierValue) visit(owner.identifier())));
}
return result;
}
@Override
public final ASTNode visitColumnName(final ColumnNameContext ctx) {// visit the ColumnNameContext and build a ColumnSegment
ColumnSegment result = new ColumnSegment(ctx.getStart().getStartIndex(), ctx.getStop().getStopIndex(), (IdentifierValue) visit(ctx.name()));
OwnerContext owner = ctx.owner();
if (null != owner) {
result.setOwner(new OwnerSegment(owner.getStart().getStartIndex(), owner.getStop().getStopIndex(), (IdentifierValue) visit(owner.identifier())));
}
return result;
}
…
}
MySQLStatementBaseVisitor, the base class of MySQLVisitor, is the visitor base class that antlr generates from the .g4 files, so it is not covered here.
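For readers unfamiliar with the visitor pattern that the generated base class relies on, here is a hand-written miniature. All names (VisitorDemo, NodeVisitor, TableNameNode, and so on) are hypothetical, and the "segments" are plain strings; it is a sketch of the double-dispatch idea, not of the code antlr generates.

```java
/** Hypothetical miniature of the visitor pattern: nodes dispatch to typed visit methods. */
final class VisitorDemo {
    interface NodeVisitor<R> {
        R visitTableName(TableNameNode node);
        R visitColumnName(ColumnNameNode node);
    }

    interface ParseNode {
        <R> R accept(NodeVisitor<R> visitor);
    }

    static final class TableNameNode implements ParseNode {
        final String owner; // may be null when the name is not qualified
        final String name;
        TableNameNode(String owner, String name) {
            this.owner = owner;
            this.name = name;
        }
        @Override
        public <R> R accept(NodeVisitor<R> visitor) {
            return visitor.visitTableName(this);
        }
    }

    static final class ColumnNameNode implements ParseNode {
        final String owner; // may be null when the name is not qualified
        final String name;
        ColumnNameNode(String owner, String name) {
            this.owner = owner;
            this.name = name;
        }
        @Override
        public <R> R accept(NodeVisitor<R> visitor) {
            return visitor.visitColumnName(this);
        }
    }

    /** Builds a textual "segment" per node kind, keeping the optional owner. */
    static final class SegmentBuildingVisitor implements NodeVisitor<String> {
        @Override
        public String visitTableName(TableNameNode node) {
            return "TableSegment[" + qualified(node.owner, node.name) + "]";
        }
        @Override
        public String visitColumnName(ColumnNameNode node) {
            return "ColumnSegment[" + qualified(node.owner, node.name) + "]";
        }
        private static String qualified(String owner, String name) {
            return null == owner ? name : owner + "." + name;
        }
    }

    public static void main(String[] args) {
        NodeVisitor<String> visitor = new SegmentBuildingVisitor();
        System.out.println(new TableNameNode("demo_ds", "t_order").accept(visitor));
        System.out.println(new ColumnNameNode(null, "order_id").accept(visitor));
    }
}
```

The tree classes stay free of conversion logic, so a different visitor can produce a different output from the same tree.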
Finally, back in SQLParserEngine#parse0: once the ParseTree has been obtained, ParseTreeVisitorFactory creates the matching visitor, which visits the tree and produces the SQLStatement. Here is ParseTreeVisitorFactory:
org.apache.shardingsphere.sql.parser.core.visitor.ParseTreeVisitorFactory
/**
* Parse tree visitor factory.
*/
@NoArgsConstructor(access = AccessLevel.PRIVATE)
public final class ParseTreeVisitorFactory {
/**
* New instance of SQL visitor.
*
* @param databaseTypeName name of database type
* @param visitorRule visitor rule
* @return parse tree visitor
*/
public static ParseTreeVisitor newInstance(final String databaseTypeName, final VisitorRule visitorRule) {
for (SQLParserConfiguration each : NewInstanceServiceLoader.newServiceInstances(SQLParserConfiguration.class)) {
if (each.getDatabaseTypeName().equals(databaseTypeName)) {
return createParseTreeVisitor(each, visitorRule.getType());// create the visitor for the antlr ParseTree
}
}
throw new UnsupportedOperationException(String.format("Cannot support database type '%s'", databaseTypeName));
}
@SneakyThrows
private static ParseTreeVisitor createParseTreeVisitor(final SQLParserConfiguration configuration, final SQLStatementType type) {
SQLVisitorFacade visitorFacade = configuration.getVisitorFacadeClass().getConstructor().newInstance();
switch (type) {
case DML:
return (ParseTreeVisitor) visitorFacade.getDMLVisitorClass().getConstructor().newInstance();
case DDL:
return (ParseTreeVisitor) visitorFacade.getDDLVisitorClass().getConstructor().newInstance();
case TCL:
return (ParseTreeVisitor) visitorFacade.getTCLVisitorClass().getConstructor().newInstance();
case DCL:
return (ParseTreeVisitor) visitorFacade.getDCLVisitorClass().getConstructor().newInstance();
case DAL:
return (ParseTreeVisitor) visitorFacade.getDALVisitorClass().getConstructor().newInstance();
case RL:
return (ParseTreeVisitor) visitorFacade.getRLVisitorClass().getConstructor().newInstance();
default:
throw new SQLParsingException("Can not support SQL statement type: `%s`", type);
}
}
}
As shown, ParseTreeVisitorFactory picks the SQLVisitorFacade method that matches the SQL category and instantiates the corresponding ParseTreeVisitor. The ParseTreeVisitor implementations were covered above using MySQLDMLVisitor as the example, so they are not repeated here.
Finally, back to org.apache.shardingsphere.underlying.route.DataNodeRouter#createRouteContext: after parserEngine.parse (analyzed above) returns, SQLStatementContextFactory.newInstance converts the SQLStatement into a SQLStatementContext.
private RouteContext createRouteContext(final String sql, final List
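SQLStatementContext can be seen as a second processing pass over SQLStatement, and it is also the context object passed between later stages such as routing and rewriting. Each kind of Context usually has a matching ContextEngine. Unlike SQLStatement, these Context objects already include some semantic analysis: for example, derived projection columns are generated when needed, count and sum columns are added for the avg aggregate function, and the pagination context computes the revised offset and row count.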
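The avg case deserves a short illustration. An average cannot be merged across shards from per-shard averages alone, so a count and a sum are requested alongside it and the overall average is computed as total sum divided by total count. The sketch below is a simplified, string-based illustration under stated assumptions: AvgDerivationDemo is a hypothetical class, projections are plain strings instead of segment objects, and the alias format is chosen for the example rather than taken from the code shown in this article.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: derive COUNT and SUM projections for every AVG projection. */
final class AvgDerivationDemo {
    private static final Pattern AVG = Pattern.compile("(?i)AVG\\((.+)\\)");

    /** Returns the extra projections to append; the alias format is illustrative only. */
    static List<String> derive(List<String> projections) {
        List<String> derived = new ArrayList<>();
        int index = 0;
        for (String each : projections) {
            Matcher matcher = AVG.matcher(each.trim());
            if (matcher.matches()) {
                String argument = matcher.group(1);
                derived.add("COUNT(" + argument + ") AS AVG_DERIVED_COUNT_" + index);
                derived.add("SUM(" + argument + ") AS AVG_DERIVED_SUM_" + index);
                index++;
            }
        }
        return derived;
    }

    /** Merges per-shard results: overall average = total sum / total count. */
    static double mergeAvg(long[] counts, double[] sums) {
        long totalCount = 0;
        double totalSum = 0;
        for (int i = 0; i < counts.length; i++) {
            totalCount += counts[i];
            totalSum += sums[i];
        }
        return totalSum / totalCount;
    }

    public static void main(String[] args) {
        System.out.println(derive(Arrays.asList("user_id", "AVG(price)")));
        // Shard A has 2 rows summing to 10, shard B has 1 row summing to 20.
        System.out.println(mergeAvg(new long[] {2, 1}, new double[] {10.0, 20.0})); // 10.0
    }
}
```

In the example, averaging the two per-shard averages (5 and 20) would give 12.5, while the correct overall average is 10, which is why the derived columns are needed.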
org.apache.shardingsphere.sql.parser.binder.SQLStatementContextFactory
/**
* SQL statement context factory.
*/
@NoArgsConstructor(access = AccessLevel.PRIVATE)
public final class SQLStatementContextFactory {
/**
* Create SQL statement context.
*
* @param schemaMetaData table meta data
* @param sql SQL
* @param parameters SQL parameters
* @param sqlStatement SQL statement
* @return SQL statement context
*/
@SuppressWarnings("unchecked")
public static SQLStatementContext newInstance(final SchemaMetaData schemaMetaData, final String sql, final List
As shown, newInstance creates the StatementContext that matches the type of the SQL statement. Here is the most commonly used one, SelectStatementContext:
org.apache.shardingsphere.sql.parser.binder.statement.dml.SelectStatementContext
/**
* Select SQL statement context.
*/
@Getter
@ToString(callSuper = true)
public final class SelectStatementContext extends CommonSQLStatementContext implements TableAvailable, WhereAvailable {
private final TablesContext tablesContext;
private final ProjectionsContext projectionsContext;
private final GroupByContext groupByContext;
private final OrderByContext orderByContext;
private final PaginationContext paginationContext;
private final boolean containsSubquery;
…
public SelectStatementContext(final SchemaMetaData schemaMetaData, final String sql, final List<Object> parameters, final SelectStatement sqlStatement) {
super(sqlStatement);
tablesContext = new TablesContext(sqlStatement.getSimpleTableSegments());// create the tables context
groupByContext = new GroupByContextEngine().createGroupByContext(sqlStatement);// create the group by context
orderByContext = new OrderByContextEngine().createOrderBy(sqlStatement, groupByContext);// create the order by context
projectionsContext = new ProjectionsContextEngine(schemaMetaData).createProjectionsContext(sql, sqlStatement, groupByContext, orderByContext);// create the projections context
paginationContext = new PaginationContextEngine().createPaginationContext(sqlStatement, projectionsContext, parameters);// create the pagination context
containsSubquery = containsSubquery();
}
…
}
As shown, the SelectStatementContext constructor creates all the context objects a select statement needs, including projectionsContext, tablesContext, and orderByContext.
The order by context engine, org.apache.shardingsphere.sql.parser.binder.segment.select.orderby.engine.OrderByContextEngine:
/**
* Order by context engine.
*/
public final class OrderByContextEngine {
/**
* Create order by context.
*
* @param selectStatement select statement
* @param groupByContext group by context
* @return order by context
*/
public OrderByContext createOrderBy(final SelectStatement selectStatement, final GroupByContext groupByContext) {
if (!selectStatement.getOrderBy().isPresent() || selectStatement.getOrderBy().get().getOrderByItems().isEmpty()) {
OrderByContext orderByItems = createOrderByContextForDistinctRowWithoutGroupBy(selectStatement, groupByContext);// with distinct and no group by, add an order by so that the query can be optimized as a streaming query
return null != orderByItems ? orderByItems : new OrderByContext(groupByContext.getItems(), !groupByContext.getItems().isEmpty());// if there is a group by, generate an order by on the group by columns
}
List<OrderByItem> orderByItems = new LinkedList<>();
for (OrderByItemSegment each : selectStatement.getOrderBy().get().getOrderByItems()) {// if the SQL already has an order by, keep its items in their original order
OrderByItem orderByItem = new OrderByItem(each);
if (each instanceof IndexOrderByItemSegment) {
orderByItem.setIndex(((IndexOrderByItemSegment) each).getColumnIndex());
}
orderByItems.add(orderByItem);
}
return new OrderByContext(orderByItems, false);
}
private OrderByContext createOrderByContextForDistinctRowWithoutGroupBy(final SelectStatement selectStatement, final GroupByContext groupByContext) {
if (groupByContext.getItems().isEmpty() && selectStatement.getProjections().isDistinctRow()) {// distinct without group by: add an order by on each projected column
int index = 0;
List<OrderByItem> orderByItems = new LinkedList<>();
for (ProjectionSegment projectionSegment : selectStatement.getProjections().getProjections()) {
if (projectionSegment instanceof ColumnProjectionSegment) {
ColumnProjectionSegment columnProjectionSegment = (ColumnProjectionSegment) projectionSegment;
ColumnOrderByItemSegment columnOrderByItemSegment = new ColumnOrderByItemSegment(columnProjectionSegment.getColumn(), OrderDirection.ASC);
OrderByItem item = new OrderByItem(columnOrderByItemSegment);
item.setIndex(index++);
orderByItems.add(item);
}
}
if (!orderByItems.isEmpty()) {
return new OrderByContext(orderByItems, true);
}
}
return null;
}
}
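The decision logic of createOrderBy can be restated compactly. The sketch below keeps the same three branches but replaces the ShardingSphere types with plain string lists, so OrderByDerivationDemo and its nested OrderBy class are hypothetical simplifications made for this illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical restatement of the order by derivation rules using plain strings. */
final class OrderByDerivationDemo {
    static final class OrderBy {
        final List<String> items;
        final boolean generated; // true when the items were derived rather than written in the SQL

        OrderBy(List<String> items, boolean generated) {
            this.items = items;
            this.generated = generated;
        }
    }

    static OrderBy createOrderBy(List<String> orderByItems, List<String> groupByItems,
                                 List<String> columnProjections, boolean distinctRow) {
        if (!orderByItems.isEmpty()) {
            return new OrderBy(orderByItems, false);     // explicit order by wins, kept as written
        }
        if (groupByItems.isEmpty() && distinctRow && !columnProjections.isEmpty()) {
            return new OrderBy(columnProjections, true); // distinct without group by: sort by projected columns
        }
        return new OrderBy(groupByItems, !groupByItems.isEmpty()); // otherwise sort by the group by columns, if any
    }

    public static void main(String[] args) {
        OrderBy result = createOrderBy(Collections.<String>emptyList(),
                Arrays.asList("user_id"), Arrays.asList("user_id"), false);
        System.out.println(result.items + " generated=" + result.generated); // [user_id] generated=true
    }
}
```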
The projections context engine, org.apache.shardingsphere.sql.parser.binder.segment.select.projection.engine.ProjectionsContextEngine:
/**
* Projections context engine.
*/
public final class ProjectionsContextEngine {
private final SchemaMetaData schemaMetaData;
private final ProjectionEngine projectionEngine;
public ProjectionsContextEngine(final SchemaMetaData schemaMetaData) {
this.schemaMetaData = schemaMetaData;
projectionEngine = new ProjectionEngine(schemaMetaData);
}
/**
* Create projections context.
*
* @param sql SQL
* @param selectStatement SQL statement
* @param groupByContext group by context
* @param orderByContext order by context
* @return projections context
*/
public ProjectionsContext createProjectionsContext(final String sql, final SelectStatement selectStatement, final GroupByContext groupByContext, final OrderByContext orderByContext) {
ProjectionsSegment projectionsSegment = selectStatement.getProjections();
Collection<Projection> projections = getProjections(sql, selectStatement.getSimpleTableSegments(), projectionsSegment);
ProjectionsContext result = new ProjectionsContext(projectionsSegment.getStartIndex(), projectionsSegment.getStopIndex(), projectionsSegment.isDistinctRow(), projections);
// if a group by or order by column is missing from the projections of the original SQL, it must be added as a derived column
result.getProjections().addAll(getDerivedGroupByColumns(projections, groupByContext, selectStatement));// add the derived projections for group by
result.getProjections().addAll(getDerivedOrderByColumns(projections, orderByContext, selectStatement));// add the derived projections for order by
return result;
}
…
}
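The derived-column rule can be illustrated with plain strings: any sort or group column that the select list does not already return gets appended under a generated alias, so that results from several shards can still be ordered or grouped after they come back. DerivedProjectionDemo is a hypothetical class, the alias prefix is an example value, and cases such as select * or aliased columns, which the real engine has to handle, are deliberately left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: find sort/group columns missing from the select list. */
final class DerivedProjectionDemo {
    static List<String> derive(List<String> projections, List<String> sortColumns, String aliasPrefix) {
        List<String> derived = new ArrayList<>();
        int index = 0;
        for (String column : sortColumns) {
            if (!containsIgnoreCase(projections, column)) {
                // Not selected by the original SQL: append it under a generated alias.
                derived.add(column + " AS " + aliasPrefix + index++);
            }
        }
        return derived;
    }

    private static boolean containsIgnoreCase(List<String> values, String target) {
        for (String each : values) {
            if (each.equalsIgnoreCase(target)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // SELECT id, name FROM t_order ORDER BY id, create_time
        System.out.println(derive(Arrays.asList("id", "name"),
                Arrays.asList("id", "create_time"), "ORDER_BY_DERIVED_"));
    }
}
```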
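Finally, the execution flow of the SQL parsing engine is summarized in the following flowchart:
[DB] stands for the database-specific class; each database has its own. For the lexer, for example, there are MySQLStatementLexer, OracleStatementLexer, SQLServerStatementLexer, and PostgreSQLStatementLexer.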