(4) PrestoDB Source Code Analysis (Part 2)

Execution Plan Module

Overview

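  1. After PrestoDB parses an incoming SQL statement, it generates an execution plan for it. This module walks through how PrestoDB produces that plan.
  2. (a) The user submits a SQL statement through JDBC or the presto-cli client. The SQL is sent over HTTP to the createQuery method of the StatementResource endpoint on the Coordinator, which is where plan generation begins.
    (b) SqlQueryManager.createQuery drives lexical and syntax analysis. The actual work is done by sqlParser.createStatement, which is called through queryPreparer.prepareQuery. Lexical analysis splits the SQL text into tokens, handling details such as keyword case and function names. Syntax analysis assembles those tokens into a Statement object that wraps the parsed structure of the statement (its abstract syntax tree).
    (c) The Statement's type selects the matching factory. Inside queryExecutionFactory.createQueryExecution, the SqlQueryExecution constructor calls analyzer.analyze to perform semantic analysis. This step resolves the fields the statement refers to, such as its tables and columns.
    (d) Once the query starts executing, logicalPlanner.plan(analysis) in SqlQueryExecution builds an execution plan suited to the statement type. The plan is then optimized and split into fragments.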
  3. Overall flow: [Figure 1: overall flow of execution-plan generation]
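The lexical step in (b) can be pictured with a toy tokenizer. This is a hypothetical sketch, not Presto's lexer: Presto generates its lexer and parser from an ANTLR grammar. The class name `ToyLexer` and its tiny keyword set are assumptions made for illustration. The sketch splits SQL text into tokens and upper-cases keywords, so `select` and `SELECT` produce the same token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical toy lexer: an illustration of tokenization, not Presto's ANTLR-generated lexer.
class ToyLexer {
    // Assumed, deliberately tiny keyword set.
    private static final Set<String> KEYWORDS = Set.of("SELECT", "FROM", "WHERE", "AND");

    static List<String> tokenize(String sql) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < sql.length()) {
            char c = sql.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            }
            else if (Character.isLetter(c) || c == '_') {
                int start = i;
                while (i < sql.length() && (Character.isLetterOrDigit(sql.charAt(i)) || sql.charAt(i) == '_')) {
                    i++;
                }
                String word = sql.substring(start, i);
                String upper = word.toUpperCase(Locale.ROOT);
                // Keywords are case-insensitive, so normalize them; other words keep their spelling.
                tokens.add(KEYWORDS.contains(upper) ? upper : word);
            }
            else if (Character.isDigit(c)) {
                int start = i;
                while (i < sql.length() && Character.isDigit(sql.charAt(i))) {
                    i++;
                }
                tokens.add(sql.substring(start, i));
            }
            else {
                // Single-character punctuation or operators such as ',', '*', '=', '(' and ')'.
                tokens.add(String.valueOf(c));
                i++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("select a, b from t where a = 1"));
    }
}
```

For example, `ToyLexer.tokenize("select a, b from t where a = 1")` yields `[SELECT, a, ,, b, FROM, t, WHERE, a, =, 1]`. Only the keywords are normalized.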

Source Code

  1. Entry point for syntax analysis: the SqlQueryManager class (parsing is initiated from this class)
package com.facebook.presto.execution;

import com.facebook.presto.ExceededCpuLimitException;
import com.facebook.presto.Session;
.............

// Entry point for syntax analysis
@ThreadSafe
public class SqlQueryManager
        implements QueryManager
{
    private static final Logger log = Logger.get(SqlQueryManager.class);

    private final QueryPreparer queryPreparer;

    private final EmbedVersion embedVersion;
    private final ExecutorService queryExecutor;
    private final ThreadPoolExecutorMBean queryExecutorMBean;
    private final ResourceGroupManager resourceGroupManager;
    private final ClusterMemoryManager memoryManager;
    ...............

  2. The main syntax-analysis method: SqlQueryManager.createQuery
    // Performs lexical and syntax analysis asynchronously on queryExecutor
    @Override
    public ListenableFuture<?> createQuery(QueryId queryId, SessionContext sessionContext, String query)
    {
        QueryCreationFuture queryCreationFuture = new QueryCreationFuture();
        queryExecutor.submit(embedVersion.embedVersion(() -> {
            try {
                // run lexical and syntax analysis
                createQueryInternal(queryId, sessionContext, query, resourceGroupManager);
                queryCreationFuture.set(null);
            }
            catch (Throwable e) {
                queryCreationFuture.setException(e);
            }
        }));
        return queryCreationFuture;
    }
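createQuery returns a future immediately and runs the real work on `queryExecutor`. The caller learns about success or failure only through that future. The sketch below reproduces the pattern with the JDK's `CompletableFuture` in place of Presto's `QueryCreationFuture`, which is a Guava `ListenableFuture`. `AsyncCreateSketch` and its stand-in `createQueryInternal`, which only rejects an empty query, are hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: return a future right away, do the work on an executor.
class AsyncCreateSketch {
    static CompletableFuture<Void> createQuery(ExecutorService executor, String query) {
        CompletableFuture<Void> creationFuture = new CompletableFuture<>();
        executor.submit(() -> {
            try {
                createQueryInternal(query);               // stand-in for parsing and analysis
                creationFuture.complete(null);            // success: complete with no value
            }
            catch (Throwable e) {
                creationFuture.completeExceptionally(e);  // failure: surface it to the caller
            }
        });
        return creationFuture;
    }

    // Hypothetical stand-in that only validates its input.
    private static void createQueryInternal(String query) {
        if (query.isEmpty()) {
            throw new IllegalArgumentException("query must not be empty string");
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            createQuery(executor, "SELECT 1").get();
            System.out.println("created");
        }
        finally {
            executor.shutdown();
            executor.awaitTermination(5, TimeUnit.SECONDS);
        }
    }
}
```

Because errors travel through the future, a failure during creation never escapes on the executor thread. The caller sees it when it inspects the result.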
  3. The method that performs lexical and syntax analysis: createQueryInternal
    private void createQueryInternal(QueryId queryId, SessionContext sessionContext, String query, ResourceGroupManager resourceGroupManager)
    {
        requireNonNull(queryId, "queryId is null");
        requireNonNull(sessionContext, "sessionContext is null");
        requireNonNull(query, "query is null");
        checkArgument(!query.isEmpty(), "query must not be empty string");
        checkArgument(!queryTracker.tryGetQuery(queryId).isPresent(), "query %s already exists", queryId);

        Session session = null;
        SelectionContext selectionContext = null;
        QueryExecution queryExecution;
        PreparedQuery preparedQuery;
        Optional<QueryType> queryType = Optional.empty();
        try {
            clusterSizeMonitor.verifyInitialMinimumWorkersRequirement();

            if (query.length() > maxQueryLength) {
                int queryLength = query.length();
                query = query.substring(0, maxQueryLength);
                throw new PrestoException(QUERY_TEXT_TOO_LARGE, format("Query text length (%s) exceeds the maximum length (%s)", queryLength, maxQueryLength));
            }

            // decode session
            session = sessionSupplier.createSession(queryId, sessionContext);

            WarningCollector warningCollector = warningCollectorFactory.create();

            // prepare query
            // lexical and syntax analysis
            preparedQuery = queryPreparer.prepareQuery(session, query, warningCollector);

            // select resource group
            queryType = getQueryType(preparedQuery.getStatement().getClass());
            selectionContext = resourceGroupManager.selectGroup(new SelectionCriteria(
                    sessionContext.getIdentity().getPrincipal().isPresent(),
                    sessionContext.getIdentity().getUser(),
                    Optional.ofNullable(sessionContext.getSource()),
                    sessionContext.getClientTags(),
                    sessionContext.getResourceEstimates(),
                    queryType.map(Enum::name)));

            // apply system defaults for query
            session = sessionPropertyDefaults.newSessionWithDefaultProperties(session, queryType.map(Enum::name), selectionContext.getResourceGroupId());

            // mark existing transaction as active
            transactionManager.activateTransaction(session, isTransactionControlStatement(preparedQuery.getStatement()), accessControl);

            // look up the execution factory that matches the Statement type
            // create query execution
            // executionFactories is a map from each Statement class to its factory
            QueryExecutionFactory queryExecutionFactory = executionFactories.get(preparedQuery.getStatement().getClass());
            if (queryExecutionFactory == null) {
                throw new PrestoException(NOT_SUPPORTED, "Unsupported statement type: " + preparedQuery.getStatement().getClass().getSimpleName());
            }
            // semantic analysis runs inside createQueryExecution (in the SqlQueryExecution constructor)
            queryExecution = queryExecutionFactory.createQueryExecution(
                    query,
                    session,
                    preparedQuery,
                    selectionContext.getResourceGroupId(),
                    warningCollector,
                    queryType);
        }
        catch (RuntimeException e) {
            // This is intentionally not a method, since after the state change listener is registered
            // it's not safe to do any of this, and we had bugs before where people reused this code in a method

            // if session creation failed, create a minimal session object
            if (session == null) {
                session = Session.builder(new SessionPropertyManager())
                        .setQueryId(queryId)
                        .setIdentity(sessionContext.getIdentity())
                        .setPath(new SqlPath(Optional.empty()))
                        .build();
            }
            QUERY_STATE_LOG.debug(e, "Query %s failed", session.getQueryId());

            // query failure fails the transaction
            session.getTransactionId().ifPresent(transactionManager::fail);

            QueryExecution execution = new FailedQueryExecution(
                    session,
                    query,
                    locationFactory.createQueryLocation(queryId),
                    Optional.ofNullable(selectionContext).map(SelectionContext::getResourceGroupId),
                    queryType,
                    queryExecutor,
                    e);

            try {
                queryTracker.addQuery(execution);

                BasicQueryInfo queryInfo = execution.getBasicQueryInfo();
                queryMonitor.queryCreatedEvent(queryInfo);
                queryMonitor.queryImmediateFailureEvent(queryInfo, toFailure(e));
                stats.queryQueued();
                stats.queryStarted();
                stats.queryStopped();
                stats.queryFinished(execution.getQueryInfo());
            }
            finally {
                // execution MUST be added to the expiration queue or there will be a leak
                queryTracker.expireQuery(queryId);
            }

            return;
        }

        queryMonitor.queryCreatedEvent(queryExecution.getBasicQueryInfo());

        queryExecution.addFinalQueryInfoListener(finalQueryInfo -> {
            try {
                stats.queryFinished(finalQueryInfo);
                queryMonitor.queryCompletedEvent(finalQueryInfo);
            }
            finally {
                // execution MUST be added to the expiration queue or there will be a leak
                queryTracker.expireQuery(queryId);
            }
        });

        addStatsListeners(queryExecution);

        if (!queryTracker.addQuery(queryExecution)) {
            // query already created, so just exit
            return;
        }

        // start the query in the background
        try {
            resourceGroupManager.submit(preparedQuery.getStatement(), queryExecution, selectionContext, queryExecutor);
        }
        catch (Throwable e) {
            failQuery(queryId, e);
        }
    }
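The factory lookup in createQueryInternal is a plain map keyed by the statement's concrete class. When no factory is registered, it raises a NOT_SUPPORTED error. The sketch below is simplified and hypothetical: the toy `Statement` records, the one-argument factory interface and the descriptive strings are all assumptions. A JDK `UnsupportedOperationException` stands in for `PrestoException`.

```java
import java.util.Map;

// Hypothetical sketch of executionFactories.get(statement.getClass()).
class FactoryLookupSketch {
    interface Statement {}
    record Query(String sql) implements Statement {}
    record CreateTable(String name) implements Statement {}
    record Unknown(String sql) implements Statement {}

    interface QueryExecution {
        String describe();
    }

    // Simplified: the real factory also receives the session, prepared query, resource group, etc.
    interface QueryExecutionFactory {
        QueryExecution createQueryExecution(Statement statement);
    }

    // Keyed by the concrete statement class.
    static final Map<Class<? extends Statement>, QueryExecutionFactory> EXECUTION_FACTORIES = Map.of(
            Query.class, statement -> () -> "SqlQueryExecution for " + statement,
            CreateTable.class, statement -> () -> "DataDefinitionExecution for " + statement);

    static QueryExecution create(Statement statement) {
        QueryExecutionFactory factory = EXECUTION_FACTORIES.get(statement.getClass());
        if (factory == null) {
            // No factory registered for this statement type.
            throw new UnsupportedOperationException(
                    "Unsupported statement type: " + statement.getClass().getSimpleName());
        }
        return factory.createQueryExecution(statement);
    }

    public static void main(String[] args) {
        System.out.println(create(new Query("SELECT 1")).describe());
    }
}
```

Keying on the exact class keeps the lookup a single map access. The trade-off is that a statement subclass must be registered explicitly.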
  4. queryPreparer.prepareQuery, called from createQueryInternal, performs the actual parsing
   public PreparedQuery prepareQuery(Session session, String query, WarningCollector warningCollector)
            throws ParsingException, PrestoException, SemanticException
    {
        // lexical and syntax analysis
        Statement wrappedStatement = sqlParser.createStatement(query, createParsingOptions(session, warningCollector));
        return prepareQuery(session, wrappedStatement, warningCollector);
    }
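sqlParser.createStatement turns the query text into a `Statement` tree. As a rough illustration only, the toy recursive-descent parser below accepts just `SELECT col, ... FROM table` and returns a small AST record. `ToyParser` and `Select` are hypothetical and far simpler than Presto's grammar and its `com.facebook.presto.sql.tree` node classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical toy parser: "SELECT a, b FROM t" -> Select([a, b], t).
class ToyParser {
    interface Statement {}
    record Select(List<String> columns, String table) implements Statement {}

    private final String[] tokens;
    private int pos;

    private ToyParser(String sql) {
        // Minimal lexing: pad commas with spaces, then split on whitespace.
        this.tokens = sql.trim().replace(",", " , ").split("\\s+");
    }

    static Statement createStatement(String sql) {
        return new ToyParser(sql).parseSelect();
    }

    private Select parseSelect() {
        expectKeyword("SELECT");
        List<String> columns = new ArrayList<>();
        columns.add(next());
        while (pos < tokens.length && tokens[pos].equals(",")) {
            pos++; // consume ','
            columns.add(next());
        }
        expectKeyword("FROM");
        String table = next();
        if (pos != tokens.length) {
            throw new IllegalArgumentException("unexpected token: " + tokens[pos]);
        }
        return new Select(List.copyOf(columns), table);
    }

    private String next() {
        if (pos >= tokens.length) {
            throw new IllegalArgumentException("unexpected end of statement");
        }
        return tokens[pos++];
    }

    private void expectKeyword(String keyword) {
        String token = next();
        if (!token.toUpperCase(Locale.ROOT).equals(keyword)) {
            throw new IllegalArgumentException("expected " + keyword + " but found " + token);
        }
    }

    public static void main(String[] args) {
        System.out.println(createStatement("select a, b from t"));
    }
}
```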
  5. The semantic-analysis path: queryExecutionFactory.createQueryExecution

    // Defined as a nested (inner) interface
    interface QueryExecutionFactory<T extends QueryExecution>
    {
        T createQueryExecution(
                String query,
                Session session,
                PreparedQuery preparedQuery,
                ResourceGroupId resourceGroup,
                WarningCollector warningCollector,
                Optional<QueryType> queryType);
    }
    

    The implementing class's version of this method:

 @Override
        public QueryExecution createQueryExecution(
                String query,
                Session session,
                PreparedQuery preparedQuery,
                ResourceGroupId resourceGroup,
                WarningCollector warningCollector,
                Optional<QueryType> queryType)
        {
            String executionPolicyName = SystemSessionProperties.getExecutionPolicy(session);
            ExecutionPolicy executionPolicy = executionPolicies.get(executionPolicyName);
            // report the requested name; executionPolicy itself is null whenever this check fails
            checkArgument(executionPolicy != null, "No execution policy %s", executionPolicyName);

            SqlQueryExecution execution = new SqlQueryExecution(
                    query,
                    session,
                    locationFactory.createQueryLocation(session.getQueryId()),
                    resourceGroup,
                    queryType,
                    preparedQuery,
                    clusterSizeMonitor,
                    transactionManager,
                    metadata,
                    accessControl,
                    sqlParser,
                    splitManager,
                    nodePartitioningManager,
                    nodeScheduler,
                    planOptimizers,
                    planFragmenter,
                    remoteTaskFactory,
                    locationFactory,
                    scheduleSplitBatchSize,
                    queryExecutor,
                    schedulerExecutor,
                    failureDetector,
                    nodeTaskMap,
                    queryExplainer,
                    executionPolicy,
                    schedulerStats,
                    statsCalculator,
                    costCalculator,
                    warningCollector);

            return execution;
        }
    }
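createQueryExecution resolves the execution policy by name from a map and fails fast when the name is unknown. The sketch below formats the error message with the requested name rather than the looked-up value, which is always null on that path. The policy names are assumed to mirror Presto's `all-at-once` and `phased` policies. The `ExecutionPolicy` interface and its descriptions are placeholders. `IllegalArgumentException` is thrown directly, which is what Guava's `checkArgument` would throw.

```java
import java.util.Map;

// Hypothetical sketch of resolving an execution policy by name.
class ExecutionPolicySketch {
    // Placeholder for Presto's ExecutionPolicy interface.
    interface ExecutionPolicy {
        String describe();
    }

    // Names assumed to mirror Presto's built-in policies; descriptions are illustrative only.
    static final Map<String, ExecutionPolicy> EXECUTION_POLICIES = Map.of(
            "all-at-once", () -> "schedule every stage immediately",
            "phased", () -> "schedule stages phase by phase");

    static ExecutionPolicy resolve(String executionPolicyName) {
        ExecutionPolicy executionPolicy = EXECUTION_POLICIES.get(executionPolicyName);
        if (executionPolicy == null) {
            // Use the requested name in the message; the looked-up value is null here.
            throw new IllegalArgumentException(String.format("No execution policy %s", executionPolicyName));
        }
        return executionPolicy;
    }

    public static void main(String[] args) {
        System.out.println(resolve("phased").describe());
    }
}
```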
This method instantiates SqlQueryExecution. Its constructor:
   private SqlQueryExecution(
            String query,
            Session session,
            URI self,
            ResourceGroupId resourceGroup,
            Optional<QueryType> queryType,
            PreparedQuery preparedQuery,
            ClusterSizeMonitor clusterSizeMonitor,
            TransactionManager transactionManager,
            Metadata metadata,
            AccessControl accessControl,
            SqlParser sqlParser,
            SplitManager splitManager,
            NodePartitioningManager nodePartitioningManager,
            NodeScheduler nodeScheduler,
            List<PlanOptimizer> planOptimizers,
            PlanFragmenter planFragmenter,
            RemoteTaskFactory remoteTaskFactory,
            LocationFactory locationFactory,
            int scheduleSplitBatchSize,
            ExecutorService queryExecutor,
            ScheduledExecutorService schedulerExecutor,
            FailureDetector failureDetector,
            NodeTaskMap nodeTaskMap,
            QueryExplainer queryExplainer,
            ExecutionPolicy executionPolicy,
            SplitSchedulerStats schedulerStats,
            StatsCalculator statsCalculator,
            CostCalculator costCalculator,
            WarningCollector warningCollector)
    {
        try (SetThreadName ignored = new SetThreadName("Query-%s", session.getQueryId())) {
            this.clusterSizeMonitor = requireNonNull(clusterSizeMonitor, "clusterSizeMonitor is null");
            this.metadata = requireNonNull(metadata, "metadata is null");
            this.sqlParser = requireNonNull(sqlParser, "sqlParser is null");
            this.splitManager = requireNonNull(splitManager, "splitManager is null");
            this.nodePartitioningManager = requireNonNull(nodePartitioningManager, "nodePartitioningManager is null");
            this.nodeScheduler = requireNonNull(nodeScheduler, "nodeScheduler is null");
            this.planOptimizers = requireNonNull(planOptimizers, "planOptimizers is null");
            this.planFragmenter = requireNonNull(planFragmenter, "planFragmenter is null");
            this.locationFactory = requireNonNull(locationFactory, "locationFactory is null");
            this.queryExecutor = requireNonNull(queryExecutor, "queryExecutor is null");
            this.schedulerExecutor = requireNonNull(schedulerExecutor, "schedulerExecutor is null");
            this.failureDetector = requireNonNull(failureDetector, "failureDetector is null");
            this.nodeTaskMap = requireNonNull(nodeTaskMap, "nodeTaskMap is null");
            this.executionPolicy = requireNonNull(executionPolicy, "executionPolicy is null");
            this.schedulerStats = requireNonNull(schedulerStats, "schedulerStats is null");
            this.statsCalculator = requireNonNull(statsCalculator, "statsCalculator is null");
            this.costCalculator = requireNonNull(costCalculator, "costCalculator is null");

            checkArgument(scheduleSplitBatchSize > 0, "scheduleSplitBatchSize must be greater than 0");
            this.scheduleSplitBatchSize = scheduleSplitBatchSize;

            requireNonNull(query, "query is null");
            requireNonNull(session, "session is null");
            requireNonNull(self, "self is null");
            this.stateMachine = QueryStateMachine.begin(
                    query,
                    session,
                    self,
                    resourceGroup,
                    queryType,
                    false,
                    transactionManager,
                    accessControl,
                    queryExecutor,
                    metadata,
                    warningCollector);

            // analyze query
            requireNonNull(preparedQuery, "preparedQuery is null");
            Analyzer analyzer = new Analyzer(
                    stateMachine.getSession(),
                    metadata,
                    sqlParser,
                    accessControl,
                    Optional.of(queryExplainer),
                    preparedQuery.getParameters(),
                    warningCollector);

            try {
                // semantic analysis of the incoming SQL happens right here, in the constructor
                this.analysis = analyzer.analyze(preparedQuery.getStatement());
            }
            catch (RuntimeException e) {
                stateMachine.transitionToFailed(e);
                throw e;
            }

            stateMachine.setUpdateType(analysis.getUpdateType());

            // when the query finishes cache the final query info, and clear the reference to the output stage
            AtomicReference<SqlQueryScheduler> queryScheduler = this.queryScheduler;
            stateMachine.addStateChangeListener(state -> {
                if (!state.isDone()) {
                    return;
                }

                // query is now done, so abort any work that is still running
                SqlQueryScheduler scheduler = queryScheduler.get();
                if (scheduler != null) {
                    scheduler.abort();
                }
            });

            this.remoteTaskFactory = new MemoryTrackingRemoteTaskFactory(requireNonNull(remoteTaskFactory, "remoteTaskFactory is null"), stateMachine);
        }
    }
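Semantic analysis, as described in the overview, resolves the fields a statement refers to. A toy analyzer makes the idea concrete: it checks that the table exists and maps each selected column to its type. It fails on anything it cannot resolve. `ToyAnalyzer`, its records and the in-memory catalog are hypothetical. Presto's `Analyzer` also handles scopes, functions, access control and type coercion.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy analyzer: resolves selected columns against a table schema.
class ToyAnalyzer {
    record Select(List<String> columns, String table) {}
    record Analysis(String table, Map<String, String> columnTypes) {}

    // catalog: table name -> (column name -> type name)
    static Analysis analyze(Select select, Map<String, Map<String, String>> catalog) {
        Map<String, String> schema = catalog.get(select.table());
        if (schema == null) {
            throw new IllegalArgumentException("Table '" + select.table() + "' does not exist");
        }
        Map<String, String> columnTypes = new LinkedHashMap<>(); // keeps SELECT order
        for (String column : select.columns()) {
            String type = schema.get(column);
            if (type == null) {
                throw new IllegalArgumentException("Column '" + column + "' cannot be resolved");
            }
            columnTypes.put(column, type);
        }
        return new Analysis(select.table(), columnTypes);
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> catalog = Map.of("orders", Map.of("orderkey", "bigint"));
        System.out.println(analyze(new Select(List.of("orderkey"), "orders"), catalog));
    }
}
```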
  6. Generating the execution plan from the analysis result
 @Override
    public void start()
    {
        if (stateMachine.transitionToWaitingForResources()) {
            waitForMinimumWorkers();
        }
    }

    private void waitForMinimumWorkers()
    {
        ListenableFuture<?> minimumWorkerFuture = clusterSizeMonitor.waitForMinimumWorkers();
        addSuccessCallback(minimumWorkerFuture, () -> queryExecutor.submit(this::startExecution));
        addExceptionCallback(minimumWorkerFuture, throwable -> queryExecutor.submit(() -> stateMachine.transitionToFailed(throwable)));
    }
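waitForMinimumWorkers registers two callbacks. On success it submits startExecution to the query executor; on failure it submits a transition to FAILED. The sketch below expresses the same wiring with `CompletableFuture.whenComplete` instead of the `addSuccessCallback`/`addExceptionCallback` helpers. The `State` enum and `state` field are a hypothetical stand-in for the query state machine.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of the minimum-workers callbacks.
class MinimumWorkersSketch {
    enum State { WAITING_FOR_RESOURCES, PLANNING, FAILED }

    final AtomicReference<State> state = new AtomicReference<>(State.WAITING_FOR_RESOURCES);

    void waitForMinimumWorkers(CompletableFuture<?> minimumWorkerFuture, Executor queryExecutor) {
        minimumWorkerFuture.whenComplete((ignored, throwable) -> {
            if (throwable == null) {
                // success callback: continue with planning on the query executor
                queryExecutor.execute(this::startExecution);
            }
            else {
                // failure callback: mark the query as failed
                queryExecutor.execute(() -> state.set(State.FAILED));
            }
        });
    }

    private void startExecution() {
        state.compareAndSet(State.WAITING_FOR_RESOURCES, State.PLANNING);
    }

    public static void main(String[] args) {
        MinimumWorkersSketch sketch = new MinimumWorkersSketch();
        CompletableFuture<Void> workersReady = new CompletableFuture<>();
        sketch.waitForMinimumWorkers(workersReady, Runnable::run);
        workersReady.complete(null);
        System.out.println(sketch.state.get());
    }
}
```

Passing `Runnable::run` as the executor runs the callbacks inline, which keeps a demonstration deterministic.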

    // starts executing a query (non-DDL statements)
    private void startExecution()
    {
        try (SetThreadName ignored = new SetThreadName("Query-%s", stateMachine.getQueryId())) {
            try {
                // transition to planning
                if (!stateMachine.transitionToPlanning()) {
                    // query already started or finished
                    return;
                }

                // analyze query
                // generate the query execution plan
                PlanRoot plan = analyzeQuery();

                metadata.beginQuery(getSession(), plan.getConnectors());

                // plan distribution of query
                planDistribution(plan);

                // transition to starting
                if (!stateMachine.transitionToStarting()) {
                    // query already started or finished
                    return;
                }

                // if query is not finished, start the scheduler, otherwise cancel it
                SqlQueryScheduler scheduler = queryScheduler.get();

                if (!stateMachine.isDone()) {
                    scheduler.start();
                }
            }
            catch (Throwable e) {
                fail(e);
                throwIfInstanceOf(e, Error.class);
            }
        }
    }
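startExecution relies on state transitions that succeed only once. If transitionToPlanning or transitionToStarting returns false, another thread has already moved the query on, so the method simply returns. The sketch below shows that behaviour with `AtomicReference.compareAndSet`. The class and its reduced set of states are hypothetical; Presto's QueryStateMachine is considerably richer.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical, heavily simplified query state machine.
class ToyQueryStateMachine {
    enum QueryState {
        QUEUED, WAITING_FOR_RESOURCES, PLANNING, STARTING, RUNNING, FINISHED, FAILED;

        boolean isDone() {
            return this == FINISHED || this == FAILED;
        }
    }

    private final AtomicReference<QueryState> state = new AtomicReference<>(QueryState.QUEUED);

    boolean transitionToWaitingForResources() {
        return advance(QueryState.QUEUED, QueryState.WAITING_FOR_RESOURCES);
    }

    boolean transitionToPlanning() {
        return advance(QueryState.WAITING_FOR_RESOURCES, QueryState.PLANNING);
    }

    boolean transitionToStarting() {
        return advance(QueryState.PLANNING, QueryState.STARTING);
    }

    // Failure is allowed from any state that is not already terminal.
    void transitionToFailed() {
        state.updateAndGet(current -> current.isDone() ? current : QueryState.FAILED);
    }

    QueryState getState() {
        return state.get();
    }

    // compareAndSet lets exactly one caller win a given transition, even under concurrency.
    private boolean advance(QueryState expected, QueryState next) {
        return state.compareAndSet(expected, next);
    }

    public static void main(String[] args) {
        ToyQueryStateMachine machine = new ToyQueryStateMachine();
        machine.transitionToWaitingForResources();
        System.out.println(machine.transitionToPlanning() + " " + machine.transitionToPlanning());
    }
}
```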
    private PlanRoot analyzeQuery()
    {
        try {
            return doAnalyzeQuery();
        }
        catch (StackOverflowError e) {
            throw new PrestoException(NOT_SUPPORTED, "statement is too large (stack overflow during analysis)", e);
        }
    }
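analyzeQuery catches a StackOverflowError, which deeply nested statements can trigger in recursive planning code, and converts it into an ordinary NOT_SUPPORTED error so the Error does not escape. The toy recursive routine below shows the same guard. It is hypothetical and only counts leading `(` characters. `UnsupportedOperationException` stands in for `PrestoException`.

```java
// Hypothetical sketch of turning a StackOverflowError into a regular exception.
class StackOverflowGuardSketch {
    // Toy recursive-descent routine: returns the number of leading '(' characters.
    static int nesting(String text, int pos) {
        if (pos < text.length() && text.charAt(pos) == '(') {
            return 1 + nesting(text, pos + 1);
        }
        return 0;
    }

    static int analyze(String text) {
        try {
            return nesting(text, 0);
        }
        catch (StackOverflowError e) {
            // Report an oversized statement instead of propagating the Error.
            throw new UnsupportedOperationException("statement is too large (stack overflow during analysis)", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(analyze("((()))"));
    }
}
```

Catching the Error right at the analysis boundary works because the stack has already unwound when the catch block runs, so building the replacement exception is safe.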

    private PlanRoot doAnalyzeQuery()
    {
        // time analysis phase
        stateMachine.beginAnalysis();

        // plan query
        PlanNodeIdAllocator idAllocator = new PlanNodeIdAllocator();
        LogicalPlanner logicalPlanner = new LogicalPlanner(false, stateMachine.getSession(), planOptimizers, idAllocator, metadata, sqlParser, statsCalculator, costCalculator, stateMachine.getWarningCollector());
        // generate the query execution plan
        Plan plan = logicalPlanner.plan(analysis);
        queryPlan.set(plan);

        // extract inputs
        List<Input> inputs = new InputExtractor(metadata, stateMachine.getSession()).extractInputs(plan.getRoot());
        stateMachine.setInputs(inputs);

        // extract output
        Optional<Output> output = new OutputExtractor().extractOutput(plan.getRoot());
        stateMachine.setOutput(output);

        // fragment the plan
        SubPlan fragmentedPlan = planFragmenter.createSubPlans(stateMachine.getSession(), plan, false, idAllocator, stateMachine.getWarningCollector());

        // record analysis time
        stateMachine.endAnalysis();

        boolean explainAnalyze = analysis.getStatement() instanceof Explain && ((Explain) analysis.getStatement()).isAnalyze();
        return new PlanRoot(fragmentedPlan, !explainAnalyze, extractConnectors(analysis));
    }
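planFragmenter.createSubPlans cuts the optimized plan into fragments that are later scheduled as separate stages. The toy below is a hypothetical simplification. It cuts a plan tree at every exchange node and keeps the exchange in the parent fragment as the placeholder for remote input. It also records which child fragments feed each fragment. Presto's PlanFragmenter works on real PlanNode types and also decides partitioning; none of that is modelled here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy fragmenter: one fragment per region between exchange boundaries.
class ToyFragmenter {
    record PlanNode(String name, boolean isExchange, List<PlanNode> sources) {
        static PlanNode of(String name, PlanNode... sources) {
            return new PlanNode(name, false, List.of(sources));
        }

        static PlanNode exchange(PlanNode source) {
            return new PlanNode("Exchange", true, List.of(source));
        }
    }

    record Fragment(int id, List<String> nodes, List<Integer> childFragments) {}

    static List<Fragment> fragment(PlanNode root) {
        List<Fragment> fragments = new ArrayList<>();
        build(root, fragments);
        return fragments;
    }

    // Creates the fragment rooted at `root` and returns its id.
    private static int build(PlanNode root, List<Fragment> fragments) {
        int id = fragments.size();
        fragments.add(null); // reserve the slot so a parent gets a lower id than its children
        List<String> nodes = new ArrayList<>();
        List<Integer> children = new ArrayList<>();
        collect(root, nodes, children, fragments);
        fragments.set(id, new Fragment(id, List.copyOf(nodes), List.copyOf(children)));
        return id;
    }

    // Adds nodes to the current fragment; the input of an exchange starts a new fragment.
    private static void collect(PlanNode node, List<String> nodes, List<Integer> children, List<Fragment> fragments) {
        nodes.add(node.name());
        for (PlanNode source : node.sources()) {
            if (node.isExchange()) {
                children.add(build(source, fragments));
            }
            else {
                collect(source, nodes, children, fragments);
            }
        }
    }

    public static void main(String[] args) {
        PlanNode plan = PlanNode.of("Output",
                PlanNode.of("FinalAggregation",
                        PlanNode.exchange(
                                PlanNode.of("PartialAggregation",
                                        PlanNode.of("TableScan")))));
        fragment(plan).forEach(System.out::println);
    }
}
```

Consider the plan `Output -> FinalAggregation -> Exchange -> PartialAggregation -> TableScan`. The result is two fragments. Fragment 0 holds `Output`, `FinalAggregation` and `Exchange`, and it depends on fragment 1. Fragment 1 holds `PartialAggregation` and `TableScan`.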
