Reading ShardingSphere's Design from the Source: The Parsing Engine

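SQL is a DSL (domain-specific language) and can be thought of as the database's "programming language". As with C or Java, executing the text requires lexical and syntactic analysis first, then semantic analysis, before a compiler or interpreter can turn the string into a sequence of concrete operations.

The SQL parsing engine handles the lexical and syntactic analysis: it parses SQL into an abstract syntax tree (AST) that later stages can read directly from a high-level programming language. Compared with languages such as C or Java, SQL is much simpler: it has no scopes, classes, or complex branching.

An abstract syntax tree (AST) is an abstract representation of the syntactic structure of source code. It presents that structure as a tree in which every node represents one construct in the source.

[Figure: abstract syntax tree]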
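To make the tree shape concrete, here is a minimal sketch of a hand-built AST for `SELECT id FROM t_order WHERE id = 1`. The `AstNode` class and its labels are hypothetical and exist only for illustration; they are not ShardingSphere or antlr types.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical AST node: a label plus ordered children. Not a ShardingSphere or antlr type. */
final class AstNode {

    final String label;

    final List<AstNode> children = new ArrayList<>();

    AstNode(final String label, final AstNode... kids) {
        this.label = label;
        for (AstNode each : kids) {
            children.add(each);
        }
    }

    /** Render the tree, indenting each level by two spaces. */
    String render() {
        StringBuilder result = new StringBuilder();
        render(result, 0);
        return result.toString();
    }

    private void render(final StringBuilder out, final int depth) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(label).append('\n');
        for (AstNode each : children) {
            each.render(out, depth + 1);
        }
    }

    /** Hand-built tree for: SELECT id FROM t_order WHERE id = 1 */
    static AstNode sample() {
        return new AstNode("select",
                new AstNode("projections", new AstNode("column:id")),
                new AstNode("from", new AstNode("table:t_order")),
                new AstNode("where", new AstNode("=", new AstNode("column:id"), new AstNode("literal:1"))));
    }
}
```

Calling `AstNode.sample().render()` prints one node per line, so the nesting of the where clause under the select node is visible at a glance.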

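ShardingSphere's parsing engine has gone through three generations:

  • First-generation SQL parser
    Before 1.4.x, sharding-jdbc used alibaba's druid (https://github.com/alibaba/druid). druid ships a hand-written SQL parser. Its strength is speed; its weakness is that it is hard to extend, since the only way is to modify its source.

  • Second-generation SQL parser
    Starting with 1.5.x, ShardingSphere implemented its own simplified SQL parsing engine. ShardingSphere does not need to turn SQL into a complete AST the way druid does, so it took a "half-understanding" approach that extracts only the context that data sharding cares about. Within those requirements, the performance and compatibility of SQL parsing improved further.

  • Third-generation SQL parser
    Starting with 3.0.x, ShardingSphere switched all of its SQL parsers to an antlr4-based implementation. The goal was to support SQL more conveniently and more completely, for example complex expressions, recursion, and subqueries, because ShardingSphere's later positioning goes beyond data sharding.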

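antlr4 defines lexer and parser rules in .g4 files, and ShardingSphere keeps the lexer and parser grammars in separate files. For MySQL, the lexer rule files are Alphabet.g4, Comments.g4, Keyword.g4, Literals.g4, MySQLKeyword.g4, and Symbol.g4; the parser rule files are BaseRule.g4, DALStatement.g4, DCLStatement.g4, DDLStatement.g4, DMLStatement.g4, RLStatement.g4, and TCLStatement.g4. Each file defines one class of keywords or the rules for one category of SQL.

By ANTLR convention, lexer rule names start with an uppercase letter and parser rule names start with a lowercase letter. For how to use antlr4, see https://github.com/antlr/antlr4/blob/master/doc/index.md

[Figure: the antlr4 .g4 files]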

Keyword.g4 is a pure lexer grammar file that defines the keywords common across SQL dialects.

lexer grammar Keyword;

import Alphabet;
/* skip spaces, tabs, newlines */
WS
    : [ \t\r\n] + ->skip
    ;

SELECT
    : S E L E C T
    ;

INSERT
    : I N S E R T
    ;

UPDATE
    : U P D A T E
    ;

DELETE
    : D E L E T E
    ;

CREATE
    : C R E A T E
    ;

ALTER
    : A L T E R
    ;

DROP
    : D R O P
    ;
…

Symbol.g4 defines the arithmetic and predicate operators used in SQL, along with symbols such as parentheses and semicolons.

lexer grammar Symbol;

AND_:                '&&';
OR_:                 '||';
NOT_:                '!';
TILDE_:              '~';
VERTICAL_BAR_:       '|';
AMPERSAND_:          '&';
SIGNED_LEFT_SHIFT_:  '<<';
SIGNED_RIGHT_SHIFT_: '>>';
CARET_:              '^';
MOD_:                '%';
COLON_:              ':';
PLUS_:               '+';
MINUS_:              '-';
ASTERISK_:           '*';
SLASH_:              '/';
BACKSLASH_:          '\\';
…

Literals.g4 defines the lexer rules for literal values in SQL.

lexer grammar Literals;

import Alphabet, Symbol;

IDENTIFIER_
    : [A-Za-z_$0-9]*?[A-Za-z_$]+?[A-Za-z_$0-9]*
    |  BQ_ ~'`'+ BQ_
    | (DQ_ ( '\\'. | '""' | ~('"'| '\\') )* DQ_)
    ;

STRING_ 
    : (DQ_ ( '\\'. | '""' | ~('"'| '\\') )* DQ_)
    | (SQ_ ('\\'. | '\'\'' | ~('\'' | '\\'))* SQ_)
    ;

NUMBER_
    : INT_? DOT_? INT_ (E (PLUS_ | MINUS_)? INT_)?
    ;
…

MySQLKeyword.g4 defines the keywords specific to MySQL.

lexer grammar MySQLKeyword;

import Alphabet;

USE
    : U S E
    ;

DESCRIBE
    : D E S C R I B E
    ;

SHOW
    : S H O W
    ;

DATABASES
    : D A T A B A S E S
    ;

DATABASE
    : D A T A B A S E
    ;

SCHEMAS
    : S C H E M A S
    ;

TABLES
    : T A B L E S
    ;

TABLESPACE
    : T A B L E S P A C E
    ;

COLUMNS
    : C O L U M N S
    ;

FIELDS
    : F I E L D S
    ;
…

Comments.g4 defines the lexer rules for SQL comments.

lexer grammar Comments;

import Symbol;

BLOCK_COMMENT:  '/*' .*? '*/' -> channel(HIDDEN);
INLINE_COMMENT: (('-- ' | '#') ~[\r\n]* ('\r'? '\n' | EOF) | '--' ('\r'? '\n' | EOF)) -> channel(HIDDEN);

BaseRule.g4 defines the parser rules for the various kinds of literals in SQL.

grammar BaseRule;

import Symbol, Keyword, MySQLKeyword, Literals;

parameterMarker
    : QUESTION_
    ;

literals
    : stringLiterals
    | numberLiterals
    | dateTimeLiterals
    | hexadecimalLiterals
    | bitValueLiterals
    | booleanLiterals
    | nullValueLiterals
    ;

stringLiterals
    : characterSetName_? STRING_ collateClause_?
    ;

numberLiterals
   : MINUS_? NUMBER_
   ;

dateTimeLiterals
    : (DATE | TIME | TIMESTAMP) STRING_
    | LBE_ identifier STRING_ RBE_
    ;
…

DMLStatement.g4 defines the parser rules for DML statements.

grammar DMLStatement;

import Symbol, Keyword, MySQLKeyword, Literals, BaseRule;

insert
    : INSERT insertSpecification_ INTO? tableName partitionNames_? (insertValuesClause | setAssignmentsClause | insertSelectClause) onDuplicateKeyClause?
    ;

insertSpecification_
    : (LOW_PRIORITY | DELAYED | HIGH_PRIORITY)? IGNORE?
    ;

insertValuesClause
    : columnNames? (VALUES | VALUE) assignmentValues (COMMA_ assignmentValues)*
    ;

insertSelectClause
    : columnNames? select
    ;

onDuplicateKeyClause
    : ON DUPLICATE KEY UPDATE assignment (COMMA_ assignment)*
    ;

replace
    : REPLACE replaceSpecification_? INTO? tableName partitionNames_? (insertValuesClause | setAssignmentsClause | insertSelectClause)
    ;

replaceSpecification_
    : LOW_PRIORITY | DELAYED
    ;

update
    : UPDATE updateSpecification_ tableReferences setAssignmentsClause whereClause? orderByClause? limitClause?
    ;

updateSpecification_
    : LOW_PRIORITY? IGNORE?
    ;

assignment
    : columnName EQ_ assignmentValue
    ;

setAssignmentsClause
    : SET assignment (COMMA_ assignment)*
    ;

assignmentValues
    : LP_ assignmentValue (COMMA_ assignmentValue)* RP_
    | LP_ RP_
    ;

assignmentValue
    : expr | DEFAULT | blobValue
    ;

blobValue
    : UL_BINARY STRING_
    ;

delete
    : DELETE deleteSpecification_ (singleTableClause | multipleTablesClause) whereClause?
    ;

deleteSpecification_
    : LOW_PRIORITY? QUICK? IGNORE?
    ;

DDLStatement.g4 defines the parser rules for DDL statements.

grammar DDLStatement;

import Symbol, Keyword, MySQLKeyword, Literals, BaseRule, DMLStatement;

createTable
    : CREATE createTableSpecification_? TABLE tableNotExistClause_ tableName (createDefinitionClause | createLikeClause)
    ;

alterTable
    : ALTER TABLE tableName alterDefinitionClause?
    ;

dropTable
    : DROP dropTableSpecification_ TABLE tableExistClause_ tableNames
    ;

dropIndex
    : DROP INDEX dropIndexSpecification_? indexName (ON tableName)?
    ( ALGORITHM EQ_? (DEFAULT | INPLACE | COPY) | LOCK EQ_? (DEFAULT | NONE | SHARED | EXCLUSIVE) )*
    ;

truncateTable
    : TRUNCATE TABLE? tableName
    ;
…

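For reasons of space, the other .g4 files are not reproduced here. These grammar files make it easy to see which kinds of SQL ShardingSphere currently supports, and unsupported statements can be added by modifying or extending the rules in the .g4 files, which is far more flexible than druid's approach of hard-coding the grammar in source code. The trade-off is that a generated parser is slower than a hand-written one: according to the official documentation, it is roughly 3 to 10 times slower than the second-generation, self-developed SQL parsing engine.

antlr's grammar collection on GitHub actually provides a MySQL .g4 grammar, https://github.com/antlr/grammars-v4/tree/master/sql/mysql/Positive-Technologies. Unless you are building a general-purpose SQL processing tool like ShardingSphere, using that grammar directly is the recommended route. MySQL's official tool mysql-workbench also uses antlr4 for SQL parsing: https://github.com/mysql/mysql-workbench/tree/8.0/library/parsers/grammars.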

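SQL categories

Before reading the code, let's look at how SQL is categorized, because many places in the ShardingSphere code branch on these categories to determine the type of a SQL statement:

  • DML (Data Manipulation Language): data manipulation statements, including select, insert, update, delete, select for update, and call.

  • DAL (Data Administration Language): data administration statements, including use, show databases, show tables, show columns, and show create table.

  • DDL (Data Definition Language): data definition statements, including create table, alter table, drop table, and truncate table.

  • TCL (Transaction Control Language): transaction control statements, including set transaction, set autocommit, begin, commit, rollback, and savepoint.

  • DQL (Data Query Language): data query statements. In ShardingSphere's antlr4 files select belongs to DML, but some classes, such as ShardingDQLResultMerger, refer to select as DQL.

  • RL (Replication Language): replication statements, including change master to, start slave, and stop slave.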
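The categories above can be illustrated with a small classifier. This is only a sketch under a strong simplifying assumption: it looks at the leading keyword, whereas ShardingSphere derives the category from the grammar rule that matched. `SqlCategory` and `SqlCategorySketch` are hypothetical names, and only the statements named in the list are covered.

```java
import java.util.Locale;

/** Hypothetical category enum mirroring the list above; not ShardingSphere's own type. */
enum SqlCategory { DML, DAL, DDL, TCL, RL, UNKNOWN }

final class SqlCategorySketch {

    /** Classify by leading keyword only; anything not named in the list maps to UNKNOWN. */
    static SqlCategory classify(final String sql) {
        String text = sql.trim().toLowerCase(Locale.ROOT);
        String first = text.split("\\s+")[0];
        switch (first) {
            case "select": case "insert": case "update": case "delete": case "call":
                return SqlCategory.DML;
            case "use": case "show":
                return SqlCategory.DAL;
            case "create": case "alter": case "drop": case "truncate":
                return SqlCategory.DDL;
            case "begin": case "commit": case "rollback": case "savepoint":
                return SqlCategory.TCL;
            case "set":
                // Only the two SET forms from the list count as TCL in this sketch.
                return text.matches("set\\s+(transaction|autocommit).*") ? SqlCategory.TCL : SqlCategory.UNKNOWN;
            case "change":
                return text.matches("change\\s+master\\s+to.*") ? SqlCategory.RL : SqlCategory.UNKNOWN;
            case "start": case "stop":
                return text.matches("(start|stop)\\s+slave.*") ? SqlCategory.RL : SqlCategory.UNKNOWN;
            default:
                return SqlCategory.UNKNOWN;
        }
    }
}
```

A keyword switch like this breaks down quickly on real SQL (comments, hints, nested statements), which is one reason the categorization is better left to the parser.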

Code walkthrough

Back to the method org.apache.shardingsphere.underlying.route.DataNodeRouter#createRouteContext:

    private RouteContext createRouteContext(final String sql, final List parameters, final boolean useCache) {
        SQLStatement sqlStatement = parserEngine.parse(sql, useCache);// parse the SQL and build its AST
        try {
            SQLStatementContext sqlStatementContext = SQLStatementContextFactory.newInstance(metaData.getSchema(), sql, parameters, sqlStatement);// build the SQL statement context, which amounts to part of semantic analysis
            return new RouteContext(sqlStatementContext, parameters, new RouteResult());
            // TODO should pass parameters for master-slave
        } catch (final IndexOutOfBoundsException ex) {
            return new RouteContext(new CommonSQLStatementContext(sqlStatement), parameters, new RouteResult());
        }
    }
 
 

In 5.x, JDBC mode calls parserEngine.parse in the constructor of ShardingSpherePreparedStatement, while Proxy mode calls it in MySQLComStmtPrepareExecutor#execute.
org.apache.shardingsphere.driver.jdbc.core.statement.ShardingSpherePreparedStatement

    private ShardingSpherePreparedStatement(final ShardingSphereConnection connection, final String sql,
                                            final int resultSetType, final int resultSetConcurrency, final int resultSetHoldability, final boolean returnGeneratedKeys) throws SQLException {
        if (Strings.isNullOrEmpty(sql)) {
            throw new SQLException(SQLExceptionConstant.SQL_STRING_NULL_OR_EMPTY);
        }
        this.connection = connection;
        schemaContexts = connection.getSchemaContexts();
        this.sql = sql;
        statements = new ArrayList<>();
        parameterSets = new ArrayList<>();
        sqlStatement = schemaContexts.getDefaultSchemaContext().getRuntimeContext().getSqlParserEngine().parse(sql, true);// parse the SQL
…
    }

org.apache.shardingsphere.proxy.frontend.mysql.command.query.binary.prepare.MySQLComStmtPrepareExecutor

/**
 * COM_STMT_PREPARE command executor for MySQL.
 */
public final class MySQLComStmtPrepareExecutor implements CommandExecutor {
    
    private static final MySQLBinaryStatementRegistry PREPARED_STATEMENT_REGISTRY = MySQLBinaryStatementRegistry.getInstance();
    
    private final MySQLComStmtPreparePacket packet;
    
    private final SchemaContext schema;
    
    public MySQLComStmtPrepareExecutor(final MySQLComStmtPreparePacket packet, final BackendConnection backendConnection) {
        this.packet = packet;
        schema = backendConnection.getSchema();
    }
…
   @Override
    public Collection execute() {
        Collection result = new LinkedList<>();
        int currentSequenceId = 0;
        SQLStatement sqlStatement = schema.getRuntimeContext().getSqlParserEngine().parse(packet.getSql(), true);
…
}

Back in 4.1.1, step into the SQLParserEngine class, org.apache.shardingsphere.sql.parser.SQLParserEngine

public final class SQLParserEngine {
    private final String databaseTypeName;
    private final SQLParseResultCache cache = new SQLParseResultCache();
    /**
     * Parse SQL.
     *
     * @param sql SQL
     * @param useCache use cache or not
     * @return SQL statement
     */
    public SQLStatement parse(final String sql, final boolean useCache) {
        ParsingHook parsingHook = new SPIParsingHook();// SQL parsing hook
        parsingHook.start(sql);
        try {
            SQLStatement result = parse0(sql, useCache);// parse the SQL
            parsingHook.finishSuccess(result);
            return result;
            // CHECKSTYLE:OFF
        } catch (final Exception ex) {
            // CHECKSTYLE:ON
            parsingHook.finishFailure(ex);
            throw ex;
        }
    }
    
    private SQLStatement parse0(final String sql, final boolean useCache) {
        if (useCache) {
            Optional cachedSQLStatement = cache.getSQLStatement(sql);
            if (cachedSQLStatement.isPresent()) {// reuse the cached parse result for this SQL if there is one
                return cachedSQLStatement.get();
            }
        }
        ParseTree parseTree = new SQLParserExecutor(databaseTypeName, sql).execute().getRootNode();// parse the SQL into an AST; ParseTree is antlr's parse tree interface
        SQLStatement result = (SQLStatement) ParseTreeVisitorFactory.newInstance(databaseTypeName, VisitorRule.valueOf(parseTree.getClass())).visit(parseTree); // use the visitor pattern to convert antlr's parse tree into a SQLStatement
        if (useCache) {
            cache.put(sql, result);
        }
        return result;
    }

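As shown, SQLParserEngine#parse does two things: 1. it creates a SQLParserExecutor that parses the SQL into an antlr ParseTree; 2. it uses the parse tree visitor factory ParseTreeVisitorFactory to create a ParseTreeVisitor instance that converts the antlr ParseTree into ShardingSphere's own SQLStatement object.

Next, look at SQLParserExecutor and ParseTreeVisitorFactory in turn: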
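The cache-then-parse-then-visit flow of parse0 can be reduced to a few lines. The sketch below is a hypothetical stand-in, not the real SQLParserEngine: the type parameter `P` plays the role of the antlr ParseTree, `S` plays the role of SQLStatement, and the two functions are supplied by the caller. It is not thread-safe beyond what ConcurrentHashMap gives the cache itself.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical stand-in for the parse0 flow; P is the parse tree type, S the statement type. */
final class CachedParseSketch<P, S> {

    private final Map<String, S> cache = new ConcurrentHashMap<>();

    private final Function<String, P> parser;   // step 1: SQL text -> parse tree

    private final Function<P, S> visitor;       // step 2: parse tree -> statement object

    private int parseCount;                     // counts real parses, for demonstration only

    CachedParseSketch(final Function<String, P> parser, final Function<P, S> visitor) {
        this.parser = parser;
        this.visitor = visitor;
    }

    S parse(final String sql, final boolean useCache) {
        if (useCache) {
            S cached = cache.get(sql);
            if (null != cached) {
                return cached;                  // cache hit: skip both steps
            }
        }
        parseCount++;
        S result = visitor.apply(parser.apply(sql));
        if (useCache) {
            cache.put(sql, result);
        }
        return result;
    }

    int getParseCount() {
        return parseCount;
    }
}
```

Because the cache key is the raw SQL text, caching pays off mainly for parameterized SQL, where the same string with `?` placeholders is submitted repeatedly.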
org.apache.shardingsphere.sql.parser.core.parser.SQLParserExecutor

/**
 * SQL parser executor.
 */
@RequiredArgsConstructor
public final class SQLParserExecutor {
    
    private final String databaseTypeName;
    
    private final String sql;
    
    /**
     * Execute to parse SQL.
     *
     * @return AST node
     */
    public ParseASTNode execute() {
        ParseASTNode result = towPhaseParse();
        if (result.getRootNode() instanceof ErrorNode) {
            throw new SQLParsingException(String.format("Unsupported SQL of `%s`", sql));
        }
        return result;
    }
    
    private ParseASTNode towPhaseParse() {// typo in the method name: it should be twoPhaseParse, fixed in 5.x
        SQLParser sqlParser = SQLParserFactory.newInstance(databaseTypeName, sql);// create the SQL parser for this database type
        try {
            ((Parser) sqlParser).setErrorHandler(new BailErrorStrategy());
            ((Parser) sqlParser).getInterpreter().setPredictionMode(PredictionMode.SLL);
            return (ParseASTNode) sqlParser.parse();
        } catch (final ParseCancellationException ex) {
            ((Parser) sqlParser).reset();
            ((Parser) sqlParser).setErrorHandler(new DefaultErrorStrategy());
            ((Parser) sqlParser).getInterpreter().setPredictionMode(PredictionMode.LL);
            return (ParseASTNode) sqlParser.parse();
        }
    }
}

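As shown above, the actual parsing is done by SQLParser, which is created by its factory class SQLParserFactory.

The method uses antlr4's two-stage parsing to improve performance: the parser first runs in the faster SLL prediction mode with BailErrorStrategy and, only if that throws ParseCancellationException, resets and reparses in full LL mode. This is a standard antlr4 idiom, see https://github.com/antlr/antlr4/issues/374#issuecomment-30952357. For a more detailed description of two-stage parsing, see https://www.antlr.org/papers/allstar-techreport.pdf.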
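Stripped of the antlr types, the two-stage idea is "try the cheap strategy, fall back to the complete one only on failure". The sketch below shows that control flow with plain functions. `TwoStageSketch` and `FastPathFailed` are hypothetical names; `FastPathFailed` stands in for antlr's ParseCancellationException, and the two functions stand in for an SLL parse and an LL parse.

```java
import java.util.function.Function;

final class TwoStageSketch {

    /** Hypothetical signal that the fast strategy gave up; plays the role of ParseCancellationException. */
    static final class FastPathFailed extends RuntimeException {

        FastPathFailed(final String message) {
            super(message);
        }
    }

    /**
     * Run the fast strategy first and fall back to the full strategy only if the fast one bails out.
     * Any other exception propagates unchanged, as it would in the original method.
     */
    static <R> R parse(final String input, final Function<String, R> fast, final Function<String, R> full) {
        try {
            return fast.apply(input);
        } catch (final FastPathFailed ex) {
            return full.apply(input);
        }
    }
}
```

The design only pays off when most inputs succeed on the fast path, which is the usual situation for SQL that the grammar handles without needing full-context prediction.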

org.apache.shardingsphere.sql.parser.core.parser.SQLParserFactory

/**
 * SQL parser factory.
 */
@NoArgsConstructor(access = AccessLevel.PRIVATE)
public final class SQLParserFactory {
    
    static {
        NewInstanceServiceLoader.register(SQLParserConfiguration.class);
    }
    
    /** 
     * New instance of SQL parser.
     * 
     * @param databaseTypeName name of database type
     * @param sql SQL
     * @return SQL parser
     */
    public static SQLParser newInstance(final String databaseTypeName, final String sql) {
        for (SQLParserConfiguration each : NewInstanceServiceLoader.newServiceInstances(SQLParserConfiguration.class)) {
            if (each.getDatabaseTypeName().equals(databaseTypeName)) {// create the SQL parser for the matching database type
                return createSQLParser(sql, each);
            }
        }
        throw new UnsupportedOperationException(String.format("Cannot support database type '%s'", databaseTypeName));
    }
    
    @SneakyThrows
    private static SQLParser createSQLParser(final String sql, final SQLParserConfiguration configuration) {// create the lexer and parser named in the SQLParserConfiguration
        Lexer lexer = (Lexer) configuration.getLexerClass().getConstructor(CharStream.class).newInstance(CharStreams.fromString(sql));
        return configuration.getParserClass().getConstructor(TokenStream.class).newInstance(new CommonTokenStream(lexer));
    }
}

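As shown above, creating a SQLParser instance requires knowing the Lexer and parser classes for the database type. This mapping is encapsulated in the SQLParserConfiguration interface, which has one implementation class per database type.


Taking MySQL as an example, here is the implementation of MySQLParserConfiguration: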
org.apache.shardingsphere.sql.parser.mysql.MySQLParserConfiguration

/**
 * SQL parser configuration for MySQL.
 */
public final class MySQLParserConfiguration implements SQLParserConfiguration {
    
    @Override
    public String getDatabaseTypeName() {
        return "MySQL";
    }
    
    @Override
    public Class getLexerClass() {
        return MySQLLexer.class;
    }
    
    @Override
    public Class getParserClass() {
        return MySQLParser.class;
    }
    
    @Override
    public Class getVisitorFacadeClass() {
        return MySQLVisitorFacade.class;
    }
}

It declares the lexer class MySQLLexer, the parser class MySQLParser, and the visitor facade class MySQLVisitorFacade used for MySQL.

Here is the lexer class MySQLLexer:
org.apache.shardingsphere.sql.parser.mysql.lexer.MySQLLexer

/**
 * SQL lexer for MySQL.
 */
public final class MySQLLexer extends MySQLStatementLexer implements SQLLexer {// MySQLStatementLexer is the MySQL lexer that antlr generates from the .g4 files
    
    public MySQLLexer(final CharStream input) {
        super(input);
    }
}

The lexer simply extends the antlr-generated MySQLStatementLexer class and adds no logic of its own.
Next, the parser class MySQLParser:
org.apache.shardingsphere.sql.parser.mysql.parser.MySQLParser


/**
 * SQL parser for MySQL.
 */
public final class MySQLParser extends MySQLStatementParser implements SQLParser {// MySQLStatementParser is the parser that antlr generates from the .g4 files
    
    public MySQLParser(final TokenStream input) {
        super(input);
    }
    
    @Override
    public ASTNode parse() {
        return new ParseASTNode(execute());
    }// create a ParseASTNode from the ExecuteContext returned by antlr
}

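Like MySQLLexer, MySQLParser extends an antlr-generated class, here the parser MySQLStatementParser. Its parse method calls MySQLStatementParser's execute method to obtain the ExecuteContext from antlr, then wraps it in a ParseASTNode and returns it.

Next, the visitor facade for MySQL, MySQLVisitorFacade: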
org.apache.shardingsphere.sql.parser.mysql.visitor.MySQLVisitorFacade


/**
 * Visitor facade for MySQL.
 */
public final class MySQLVisitorFacade implements SQLVisitorFacade {
    
    @Override
    public Class getDMLVisitorClass() {
        return MySQLDMLVisitor.class;
    }
    
    @Override
    public Class getDDLVisitorClass() {
        return MySQLDDLVisitor.class;
    }
    
    @Override
    public Class getTCLVisitorClass() {
        return MySQLTCLVisitor.class;
    }
    
    @Override
    public Class getDCLVisitorClass() {
        return MySQLDCLVisitor.class;
    }
    
    @Override
    public Class getDALVisitorClass() {
        return MySQLDALVisitor.class;
    }
    
    @Override
    public Class getRLVisitorClass() {
        return MySQLRLVisitor.class;
    }
}

MySQLVisitorFacade supplies the visitor Class for each of DML, DDL, TCL, DCL, DAL, and RL. Next, step into the most commonly used one, the DML visitor MySQLDMLVisitor.

org.apache.shardingsphere.sql.parser.mysql.visitor.impl.MySQLDMLVisitor

/**
 * DML visitor for MySQL.
 */
public final class MySQLDMLVisitor extends MySQLVisitor implements DMLVisitor {
…
    // walk the antlr-generated SelectClauseContext and build a SelectStatement
    public ASTNode visitSelectClause(final SelectClauseContext ctx) {
        SelectStatement result = new SelectStatement();
        result.setProjections((ProjectionsSegment) visit(ctx.projections()));// set the ProjectionsSegment
        if (null != ctx.selectSpecification()) {
            result.getProjections().setDistinctRow(isDistinct(ctx));
        }
        if (null != ctx.fromClause()) {
            CollectionValue tableReferences = (CollectionValue) visit(ctx.fromClause());
            for (TableReferenceSegment each : tableReferences.getValue()) {// set the TableReferenceSegment
                result.getTableReferences().add(each);
            }
        }
        if (null != ctx.whereClause()) {// set the WhereSegment
            result.setWhere((WhereSegment) visit(ctx.whereClause()));
        }
        if (null != ctx.groupByClause()) {// set the GroupBySegment
            result.setGroupBy((GroupBySegment) visit(ctx.groupByClause()));
        }
        if (null != ctx.orderByClause()) {// set the OrderBySegment
            result.setOrderBy((OrderBySegment) visit(ctx.orderByClause()));
        }
        if (null != ctx.limitClause()) {// set the LimitSegment
            result.setLimit((LimitSegment) visit(ctx.limitClause()));
        }
        if (null != ctx.lockClause()) {// set the LockSegment
            result.setLock((LockSegment) visit(ctx.lockClause()));
        }
        return result;
    }
…
    @Override
    public ASTNode visitWhereClause(final WhereClauseContext ctx) {// visit the WhereClauseContext and build a WhereSegment
        WhereSegment result = new WhereSegment(ctx.getStart().getStartIndex(), ctx.getStop().getStopIndex());
        ASTNode segment = visit(ctx.expr());
        if (segment instanceof OrPredicateSegment) {
            result.getAndPredicates().addAll(((OrPredicateSegment) segment).getAndPredicates());
        } else if (segment instanceof PredicateSegment) {
            AndPredicate andPredicate = new AndPredicate();
            andPredicate.getPredicates().add((PredicateSegment) segment);
            result.getAndPredicates().add(andPredicate);
        }
        return result;
    }
    
    @Override
    public ASTNode visitGroupByClause(final GroupByClauseContext ctx) {// visit the GroupByClauseContext and build a GroupBySegment
        Collection items = new LinkedList<>();
        for (OrderByItemContext each : ctx.orderByItem()) {
            items.add((OrderByItemSegment) visit(each));
        }
        return new GroupBySegment(ctx.getStart().getStartIndex(), ctx.getStop().getStopIndex(), items);
    }
    
    @Override
    public ASTNode visitLimitClause(final LimitClauseContext ctx) {// visit the LimitClauseContext and build a LimitSegment
        if (null == ctx.limitOffset()) {
            return new LimitSegment(ctx.getStart().getStartIndex(), ctx.getStop().getStopIndex(), null, (PaginationValueSegment) visit(ctx.limitRowCount()));
        }
        PaginationValueSegment rowCount;
        PaginationValueSegment offset;
        if (null != ctx.OFFSET()) {
            rowCount = (PaginationValueSegment) visit(ctx.limitRowCount());
            offset = (PaginationValueSegment) visit(ctx.limitOffset());
        } else {
            offset = (PaginationValueSegment) visit(ctx.limitOffset());
            rowCount = (PaginationValueSegment) visit(ctx.limitRowCount());
        }
        return new LimitSegment(ctx.getStart().getStartIndex(), ctx.getStop().getStopIndex(), offset, rowCount);
    }
}

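MySQLDMLVisitor extends MySQLVisitor. The former visits the antlr-generated *Context classes related to DML and reads their contents to build the corresponding Segment and SQLStatement objects.
MySQLVisitor handles the parsing operations shared by DML, DDL, DAL, and the other categories, such as values of each type, table names, and column names.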
org.apache.shardingsphere.sql.parser.mysql.visitor.MySQLVisitor

public abstract class MySQLVisitor extends MySQLStatementBaseVisitor {
    
    private int currentParameterIndex;
    
    @Override
    public final ASTNode visitParameterMarker(final ParameterMarkerContext ctx) {
        return new ParameterMarkerValue(currentParameterIndex++);
    }
…
    @Override
    public final ASTNode visitTableName(final TableNameContext ctx) {// visit the TableNameContext and build a table segment
        SimpleTableSegment result = new SimpleTableSegment(new TableNameSegment(ctx.getStart().getStartIndex(), ctx.getStop().getStopIndex(), (IdentifierValue) visit(ctx.name())));
        OwnerContext owner = ctx.owner();
        if (null != owner) {
            result.setOwner(new OwnerSegment(owner.getStart().getStartIndex(), owner.getStop().getStopIndex(), (IdentifierValue) visit(owner.identifier())));
        }
        return result;
    }
    
    @Override
    public final ASTNode visitColumnName(final ColumnNameContext ctx) {// visit the ColumnNameContext and build a ColumnSegment
        ColumnSegment result = new ColumnSegment(ctx.getStart().getStartIndex(), ctx.getStop().getStopIndex(), (IdentifierValue) visit(ctx.name()));
        OwnerContext owner = ctx.owner();
        if (null != owner) {
            result.setOwner(new OwnerSegment(owner.getStart().getStartIndex(), owner.getStop().getStopIndex(), (IdentifierValue) visit(owner.identifier())));
        }
        return result;
    }
…
}

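MySQLStatementBaseVisitor, the base class of MySQLVisitor, is the visitor base class that antlr generates from the .g4 files; it is not covered further here.

Finally, back in SQLParserEngine#parse0: once the ParseTree is available, ParseTreeVisitorFactory creates the matching visitor, which visits the tree and produces the SQLStatement. Here is ParseTreeVisitorFactory: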
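For readers less familiar with the visitor pattern that these classes rely on, here is a minimal sketch of the mechanism. Every type in it is hypothetical: `SelectNode` and `TableNameNode` stand in for antlr's generated *Context classes, and `DescribeVisitor` stands in for a visitor such as MySQLDMLVisitor, producing a string instead of a SQLStatement.

```java
import java.util.List;

final class VisitorSketch {

    /** Hypothetical parse-tree node; each node dispatches to the visitor method for its own type. */
    interface Node {
        <T> T accept(Visitor<T> visitor);
    }

    interface Visitor<T> {
        T visitSelect(SelectNode node);

        T visitTableName(TableNameNode node);
    }

    static final class TableNameNode implements Node {

        final String name;

        TableNameNode(final String name) {
            this.name = name;
        }

        @Override
        public <T> T accept(final Visitor<T> visitor) {
            return visitor.visitTableName(this);
        }
    }

    static final class SelectNode implements Node {

        final List<String> columns;

        final TableNameNode table;

        SelectNode(final List<String> columns, final TableNameNode table) {
            this.columns = columns;
            this.table = table;
        }

        @Override
        public <T> T accept(final Visitor<T> visitor) {
            return visitor.visitSelect(this);
        }
    }

    /** Builds its result bottom-up from child results, the way a statement is assembled from segments. */
    static final class DescribeVisitor implements Visitor<String> {

        @Override
        public String visitSelect(final SelectNode node) {
            return "SelectStatement(projections=" + node.columns + ", table=" + node.table.accept(this) + ")";
        }

        @Override
        public String visitTableName(final TableNameNode node) {
            return "TableSegment(" + node.name + ")";
        }
    }
}
```

The parse tree stays free of conversion logic; adding a different output format means writing another `Visitor` implementation rather than touching the node classes.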
org.apache.shardingsphere.sql.parser.core.visitor.ParseTreeVisitorFactory

/**
 * Parse tree visitor factory.
 */
@NoArgsConstructor(access = AccessLevel.PRIVATE)
public final class ParseTreeVisitorFactory {
    
    /** 
     * New instance of SQL visitor.
     * 
     * @param databaseTypeName name of database type
     * @param visitorRule visitor rule
     * @return parse tree visitor
     */
    public static ParseTreeVisitor newInstance(final String databaseTypeName, final VisitorRule visitorRule) {
        for (SQLParserConfiguration each : NewInstanceServiceLoader.newServiceInstances(SQLParserConfiguration.class)) {
            if (each.getDatabaseTypeName().equals(databaseTypeName)) {
                return createParseTreeVisitor(each, visitorRule.getType());// create the visitor for the antlr ParseTree
            }
        }
        throw new UnsupportedOperationException(String.format("Cannot support database type '%s'", databaseTypeName));
    }
    
    @SneakyThrows
    private static ParseTreeVisitor createParseTreeVisitor(final SQLParserConfiguration configuration, final SQLStatementType type) {
        SQLVisitorFacade visitorFacade = configuration.getVisitorFacadeClass().getConstructor().newInstance();
        switch (type) {
            case DML:
                return (ParseTreeVisitor) visitorFacade.getDMLVisitorClass().getConstructor().newInstance();
            case DDL:
                return (ParseTreeVisitor) visitorFacade.getDDLVisitorClass().getConstructor().newInstance();
            case TCL:
                return (ParseTreeVisitor) visitorFacade.getTCLVisitorClass().getConstructor().newInstance();
            case DCL:
                return (ParseTreeVisitor) visitorFacade.getDCLVisitorClass().getConstructor().newInstance();
            case DAL:
                return (ParseTreeVisitor) visitorFacade.getDALVisitorClass().getConstructor().newInstance();
            case RL:
                return (ParseTreeVisitor) visitorFacade.getRLVisitorClass().getConstructor().newInstance();
            default:
                throw new SQLParsingException("Can not support SQL statement type: `%s`", type);
        }
    }
}

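ParseTreeVisitorFactory simply picks the SQLVisitorFacade method that matches the SQL category and instantiates the ParseTreeVisitor class it returns. The ParseTreeVisitor implementations were covered above using MySQLDMLVisitor as the example, so they are not repeated here.

Finally, return to org.apache.shardingsphere.underlying.route.DataNodeRouter#createRouteContext. After parserEngine.parse returns (analyzed above), SQLStatementContextFactory.newInstance converts the SQLStatement into a SQLStatementContext object.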

    private RouteContext createRouteContext(final String sql, final List parameters, final boolean useCache) {
        SQLStatement sqlStatement = parserEngine.parse(sql, useCache);// parse the SQL and build its AST
        try {
            SQLStatementContext sqlStatementContext = SQLStatementContextFactory.newInstance(metaData.getSchema(), sql, parameters, sqlStatement);// build the SQL statement context, which amounts to part of semantic analysis
            return new RouteContext(sqlStatementContext, parameters, new RouteResult());
            // TODO should pass parameters for master-slave
        } catch (final IndexOutOfBoundsException ex) {
            return new RouteContext(new CommonSQLStatementContext(sqlStatement), parameters, new RouteResult());
        }
    }
 
 

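SQLStatementContext is a second-pass wrapper around SQLStatement, and it is also the context object passed between later stages such as routing and rewriting. Each kind of Context usually has a matching ContextEngine. Unlike SQLStatement, these Context objects already contain some semantic-analysis logic: for example, they generate derived projection columns as needed, add count and sum columns for the avg aggregate function, and compute the revised offset and row count for the pagination context.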
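A small worked example shows why avg needs derived count and sum columns once data is split across shards: averaging the per-shard averages gives the wrong answer unless every shard holds the same number of rows. The sketch below is hypothetical code, not ShardingSphere's. The pagination rewrite it shows (fetch from offset 0 up to offset + rowCount on every shard) is an assumption about what "revised offset and row count" means, stated here for illustration only.

```java
final class DerivedColumnsSketch {

    /** Merge per-shard partial results for AVG; each entry is {sum, count} from one shard. */
    static double mergeAvg(final long[][] shardSumAndCount) {
        long sum = 0L;
        long count = 0L;
        for (long[] each : shardSumAndCount) {
            sum += each[0];
            count += each[1];
        }
        return 0L == count ? 0D : (double) sum / count;
    }

    /**
     * Assumed rewrite of LIMIT offset, rowCount for a multi-shard query: each shard must return
     * its first offset + rowCount rows, because the globally skipped rows may all come from one shard.
     * Returns {rewrittenOffset, rewrittenRowCount}.
     */
    static long[] rewriteLimit(final long offset, final long rowCount) {
        return new long[] {0L, offset + rowCount};
    }
}
```

With one shard reporting sum 10 over 1 row and another reporting sum 20 over 4 rows, `mergeAvg` yields 30 / 5 = 6.0, whereas averaging the two shard averages (10 and 5) would give 7.5.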


org.apache.shardingsphere.sql.parser.binder.SQLStatementContextFactory

/**
 * SQL statement context factory.
 */
@NoArgsConstructor(access = AccessLevel.PRIVATE)
public final class SQLStatementContextFactory {
    
    /**
     * Create SQL statement context.
     *
     * @param schemaMetaData table meta data
     * @param sql SQL
     * @param parameters SQL parameters
     * @param sqlStatement SQL statement
     * @return SQL statement context
     */
    @SuppressWarnings("unchecked")
    public static SQLStatementContext newInstance(final SchemaMetaData schemaMetaData, final String sql, final List parameters, final SQLStatement sqlStatement) {
        if (sqlStatement instanceof DMLStatement) {
            return getDMLStatementContext(schemaMetaData, sql, parameters, (DMLStatement) sqlStatement);
        }
        if (sqlStatement instanceof DDLStatement) {
            return getDDLStatementContext((DDLStatement) sqlStatement);
        }
        if (sqlStatement instanceof DCLStatement) {
            return getDCLStatementContext((DCLStatement) sqlStatement);
        }
        if (sqlStatement instanceof DALStatement) {
            return getDALStatementContext((DALStatement) sqlStatement);
        }
        return new CommonSQLStatementContext(sqlStatement);
    }
    
    private static SQLStatementContext getDMLStatementContext(final SchemaMetaData schemaMetaData, final String sql, final List parameters, final DMLStatement sqlStatement) {
        if (sqlStatement instanceof SelectStatement) {
            return new SelectStatementContext(schemaMetaData, sql, parameters, (SelectStatement) sqlStatement);
        }
        if (sqlStatement instanceof UpdateStatement) {
            return new UpdateStatementContext((UpdateStatement) sqlStatement);
        }
        if (sqlStatement instanceof DeleteStatement) {
            return new DeleteStatementContext((DeleteStatement) sqlStatement);
        }
        if (sqlStatement instanceof InsertStatement) {
            return new InsertStatementContext(schemaMetaData, parameters, (InsertStatement) sqlStatement);
        }
        throw new UnsupportedOperationException(String.format("Unsupported SQL statement `%s`", sqlStatement.getClass().getSimpleName()));
    }
…
}
 
 

newInstance creates the StatementContext instance that matches the SQL type. Here is the most commonly used one, SelectStatementContext:
org.apache.shardingsphere.sql.parser.binder.statement.dml.SelectStatementContext

/**
 * Select SQL statement context.
 */
@Getter
@ToString(callSuper = true)
public final class SelectStatementContext extends CommonSQLStatementContext implements TableAvailable, WhereAvailable {
    private final TablesContext tablesContext;
    
    private final ProjectionsContext projectionsContext;
    
    private final GroupByContext groupByContext;
    
    private final OrderByContext orderByContext;
    
    private final PaginationContext paginationContext;
    
    private final boolean containsSubquery;
…
    public SelectStatementContext(final SchemaMetaData schemaMetaData, final String sql, final List parameters, final SelectStatement sqlStatement) {
        super(sqlStatement);
        tablesContext = new TablesContext(sqlStatement.getSimpleTableSegments());// create the tables context
        groupByContext = new GroupByContextEngine().createGroupByContext(sqlStatement);// create the group by context
        orderByContext = new OrderByContextEngine().createOrderBy(sqlStatement, groupByContext);// create the order by context
        projectionsContext = new ProjectionsContextEngine(schemaMetaData).createProjectionsContext(sql, sqlStatement, groupByContext, orderByContext);// create the projections context
        paginationContext = new PaginationContextEngine().createPaginationContext(sqlStatement, projectionsContext, parameters);// create the pagination context
        containsSubquery = containsSubquery();
    }…
} 
 
 

The SelectStatementContext constructor creates all of the context information for a select statement, including projectionsContext, tablesContext, orderByContext, and so on.

The order-by context engine class: org.apache.shardingsphere.sql.parser.binder.segment.select.orderby.engine.OrderByContextEngine

/**
 * Order by context engine.
 */
public final class OrderByContextEngine {
    /**
     * Create order by context.
     * 
     * @param selectStatement select statement
     * @param groupByContext group by context
     * @return order by context
     */
    public OrderByContext createOrderBy(final SelectStatement selectStatement, final GroupByContext groupByContext) {
        if (!selectStatement.getOrderBy().isPresent() || selectStatement.getOrderBy().get().getOrderByItems().isEmpty()) {
            OrderByContext orderByItems = createOrderByContextForDistinctRowWithoutGroupBy(selectStatement, groupByContext);// with distinct and no group by, add an order by so the query can be optimized into a streaming merge
            return null != orderByItems ? orderByItems : new OrderByContext(groupByContext.getItems(), !groupByContext.getItems().isEmpty());// with group by, generate an order by on the group by columns
        }
        List orderByItems = new LinkedList<>();
        for (OrderByItemSegment each : selectStatement.getOrderBy().get().getOrderByItems()) {// if the SQL already has an order by, keep its items in their original order
            OrderByItem orderByItem = new OrderByItem(each);
            if (each instanceof IndexOrderByItemSegment) {
                orderByItem.setIndex(((IndexOrderByItemSegment) each).getColumnIndex());
            }
            orderByItems.add(orderByItem);
        }
        return new OrderByContext(orderByItems, false);
    }

    private OrderByContext createOrderByContextForDistinctRowWithoutGroupBy(final SelectStatement selectStatement, final GroupByContext groupByContext) {
        if (groupByContext.getItems().isEmpty() && selectStatement.getProjections().isDistinctRow()) {// distinct without group by: add an order by on each projected column
            int index = 0;
            List orderByItems = new LinkedList<>();
            for (ProjectionSegment projectionSegment : selectStatement.getProjections().getProjections()) {
                if (projectionSegment instanceof ColumnProjectionSegment) {
                    ColumnProjectionSegment columnProjectionSegment = (ColumnProjectionSegment) projectionSegment;
                    ColumnOrderByItemSegment columnOrderByItemSegment = new ColumnOrderByItemSegment(columnProjectionSegment.getColumn(), OrderDirection.ASC);
                    OrderByItem item = new OrderByItem(columnOrderByItemSegment);
                    item.setIndex(index++);
                    orderByItems.add(item);
                }
            }
            if (!orderByItems.isEmpty()) {
                return new OrderByContext(orderByItems, true);
            }
        }
        return null;
    }
}
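The decision logic of createOrderBy is easier to see with the segment types removed. The sketch below mirrors its three branches using plain column-name lists. `OrderBySketch` and `Result` are hypothetical names, and the sketch assumes every projection is a plain column, whereas the original only picks up ColumnProjectionSegment instances.

```java
import java.util.List;

final class OrderBySketch {

    /** Effective sort columns plus a flag saying whether they were generated rather than user-written. */
    static final class Result {

        final List<String> items;

        final boolean generated;

        Result(final List<String> items, final boolean generated) {
            this.items = items;
            this.generated = generated;
        }
    }

    static Result derive(final List<String> orderBy, final List<String> groupBy,
                         final List<String> projections, final boolean distinct) {
        if (!orderBy.isEmpty()) {
            return new Result(orderBy, false);            // explicit ORDER BY wins, original order kept
        }
        if (groupBy.isEmpty() && distinct && !projections.isEmpty()) {
            return new Result(projections, true);         // DISTINCT without GROUP BY: sort by the projected columns
        }
        return new Result(groupBy, !groupBy.isEmpty());   // otherwise sort by the GROUP BY columns, if any
    }
}
```

In each generated case the point is the same: if every shard returns rows sorted on the same key, the results can be merged as sorted streams instead of being loaded into memory first.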

The projections context engine class: org.apache.shardingsphere.sql.parser.binder.segment.select.projection.engine.ProjectionsContextEngine

/**
 * Projections context engine.
 */
public final class ProjectionsContextEngine {
    
    private final SchemaMetaData schemaMetaData;
    
    private final ProjectionEngine projectionEngine;
    
    public ProjectionsContextEngine(final SchemaMetaData schemaMetaData) {
        this.schemaMetaData = schemaMetaData;
        projectionEngine = new ProjectionEngine(schemaMetaData);
    }
    
    /**
     * Create projections context.
     *
     * @param sql SQL
     * @param selectStatement SQL statement
     * @param groupByContext group by context
     * @param orderByContext order by context
     * @return projections context
     */
    public ProjectionsContext createProjectionsContext(final String sql, final SelectStatement selectStatement, final GroupByContext groupByContext, final OrderByContext orderByContext) {
        ProjectionsSegment projectionsSegment = selectStatement.getProjections();
        Collection projections = getProjections(sql, selectStatement.getSimpleTableSegments(), projectionsSegment);
        ProjectionsContext result = new ProjectionsContext(projectionsSegment.getStartIndex(), projectionsSegment.getStopIndex(), projectionsSegment.isDistinctRow(), projections);
        // if a group by or order by column is missing from the original SQL's projections, add it as a derived column
        result.getProjections().addAll(getDerivedGroupByColumns(projections, groupByContext, selectStatement));// add the derived projections for group by
        result.getProjections().addAll(getDerivedOrderByColumns(projections, orderByContext, selectStatement));// add the derived projections for order by
        return result;
    }
…
}
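The derived-column step can be sketched as a set difference: any group by or order by column that the select list does not already return must be appended, because the merge stage needs those values in every row to sort or group across shards. The code below is a hypothetical illustration; the alias prefix is a caller-supplied assumption, and name matching is exact and case-sensitive, which is simpler than what a real implementation needs.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class DerivedProjectionSketch {

    /**
     * Return the extra projections needed for the given sort or group columns.
     * A column already present in the select list, or already derived, is not added twice.
     */
    static List<String> derive(final List<String> projections, final List<String> columns, final String aliasPrefix) {
        Set<String> present = new LinkedHashSet<>(projections);
        List<String> result = new ArrayList<>();
        int index = 0;
        for (String each : columns) {
            if (present.add(each)) {   // add() returns true only for a column not seen before
                result.add(each + " AS " + aliasPrefix + index++);
            }
        }
        return result;
    }
}
```

For `SELECT id FROM t_order ORDER BY id, create_time`, calling `derive` with projections `[id]`, columns `[id, create_time]`, and a prefix such as `ORDER_BY_DERIVED_` returns a single extra projection for `create_time`.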

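To wrap up, the figure below summarizes the execution flow of the SQL parsing engine:


[Figure: SQL parsing execution flow]

[DB] means that each database has its own class; for example, the Lexer classes are MySQLStatementLexer, OracleStatementLexer, SQLServerStatementLexer, and PostgreSQLLexer.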
