2021-10-14

Chapter 3: A glance at SparkSQL

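The goal of the logical planning phase is to turn SQL into a complete logical operator tree (LogicalPlan). It goes through three stages: building the unresolved logical operator tree, producing the logical operator tree with node information bound, and producing the optimized logical operator tree.

Once the logical operator tree is complete, the physical planning phase begins and eventually produces RDDs. It goes through three stages as well: building several candidate physical operator trees, selecting one of them, and finally obtaining the Prepared SparkPlan.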

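-- InternalRow hierarchy: the data model

-- TreeNode hierarchy: the abstract interface for operator tree nodes

    transformDown/transformUp/transformChildren

    Origin: line position of a node in the SQL text

-- Expression hierarchy: expression evaluation nodes

-- QueryPlan hierarchy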
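
To make the difference between transformDown and transformUp concrete, here is a minimal sketch in plain Scala. `MiniNode`, `visitOrder`, and the enclosing object are hypothetical names invented for this note; this is a simplified model of the idea, not Spark's TreeNode implementation.

``` Scala
import scala.collection.mutable.ListBuffer

object TransformOrderSketch {
  // Hypothetical, simplified tree node (not Spark's TreeNode).
  final case class MiniNode(name: String, children: List[MiniNode] = Nil) {

    // Pre-order: apply the rule to this node first, then recurse into the children of the result.
    def transformDown(rule: PartialFunction[MiniNode, MiniNode]): MiniNode = {
      val afterRule = rule.applyOrElse(this, (n: MiniNode) => n)
      afterRule.copy(children = afterRule.children.map(_.transformDown(rule)))
    }

    // Post-order: rewrite the children first, then apply the rule to the rebuilt node.
    def transformUp(rule: PartialFunction[MiniNode, MiniNode]): MiniNode = {
      val afterChildren = copy(children = children.map(_.transformUp(rule)))
      rule.applyOrElse(afterChildren, (n: MiniNode) => n)
    }
  }

  // Project -> Filter -> Relation, a typical shape for a simple query.
  val tree: MiniNode =
    MiniNode("Project", List(MiniNode("Filter", List(MiniNode("Relation")))))

  // Runs a traversal with a rule that only records the order in which nodes are visited.
  def visitOrder(run: PartialFunction[MiniNode, MiniNode] => MiniNode): List[String] = {
    val seen = ListBuffer.empty[String]
    run { case n => seen += n.name; n }
    seen.toList
  }

  val downOrder: List[String] = visitOrder(tree.transformDown) // root first
  val upOrder: List[String] = visitOrder(tree.transformUp)     // leaves first
}
```

The order matters when a rule depends on what has already been rewritten: a rule that needs resolved children fits a bottom-up pass, while a rule that can short-circuit a whole subtree fits a top-down pass.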

Chapter 4: ANTLR

But why is the parsed AST designed this way?

Chapter 5: LogicalPlan

1. QueryPlan

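A logical plan is one implementation of the QueryPlan abstraction. QueryPlan mainly contains:

-- The input and output columns (for example output: Seq[Attribute], where an Attribute denotes a column)

-- Property information (schema, aliasMap)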

-- Constraints
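
The relationship between output columns, input columns, and schema can be sketched as follows. `Attr`, `MiniQueryPlan`, `Relation`, and `Project` here are hypothetical, heavily simplified stand-ins written for this note; the sketch only assumes that the schema is derived from `output` and that a node's inputs are its children's outputs.

``` Scala
object QueryPlanSketch {
  // Hypothetical stand-in for a column; data types are plain strings to keep the sketch small.
  final case class Attr(name: String, dataType: String, nullable: Boolean = true)

  // Hypothetical, simplified model of the QueryPlan abstraction.
  trait MiniQueryPlan {
    def children: Seq[MiniQueryPlan]
    // Columns this node produces.
    def output: Seq[Attr]
    // Columns this node can read: everything its children produce.
    def inputSet: Set[Attr] = children.flatMap(_.output).toSet
    // The schema is derived from the output columns (name and type pairs).
    def schema: Seq[(String, String)] = output.map(a => a.name -> a.dataType)
  }

  // Leaf node: a table with a fixed list of columns.
  final case class Relation(output: Seq[Attr]) extends MiniQueryPlan {
    val children: Seq[MiniQueryPlan] = Nil
  }

  // Projection: outputs only the selected columns of its child.
  final case class Project(projectList: Seq[Attr], child: MiniQueryPlan) extends MiniQueryPlan {
    val children: Seq[MiniQueryPlan] = Seq(child)
    val output: Seq[Attr] = projectList
  }

  val table: Relation = Relation(Seq(Attr("name", "string"), Attr("age", "int")))
  val projected: Project = Project(Seq(Attr("name", "string")), table)
}
```

Here `projected.inputSet` holds both columns of the table, while `projected.schema` only contains `name`.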

2. AST -> Unresolved Logical Plan (AstBuilder.class)

SparkSession#sql -> ParseDriver#parsePlan

``` Scala
  // Parse the SQL text and convert the ANTLR parse tree into a LogicalPlan.
  override def parsePlan(sqlText: String): LogicalPlan = parse(sqlText) { parser =>
    astBuilder.visitSingleStatement(parser.singleStatement()) match {
      case plan: LogicalPlan => plan
      case _ =>
        // Anything that does not yield a LogicalPlan is rejected.
        val position = Origin(None, None)
        throw new ParseException(Option(sqlText), "Unsupported SQL statement", position, position)
    }
  }
```

AstBuilder#visitSingleStatement traverses the AST until it reaches visitQuerySpecification, where construction of the Logical Plan for the query part begins.

``` Scala
  /**
   * Create a logical plan using a query specification.
   */
  override def visitQuerySpecification(
      ctx: QuerySpecificationContext): LogicalPlan = withOrigin(ctx) {
    val from = OneRowRelation().optional(ctx.fromClause) {
      visitFromClause(ctx.fromClause)
    }
    withQuerySpecification(ctx, from)
  }
```

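visitQuerySpecification visits the FROM clause first, then uses withQuerySpecification to attach the aggregation, the Filter clause, and the Project (projection) clause, which completes the Logical Plan for the query part.

// What does the model-building flow look like?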
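
The build order described above (FROM first, then Filter, then Project on top) can be sketched with a toy model. The case classes below borrow Spark's node names but are hypothetical simplifications: conditions and columns are plain strings, `QuerySpec` stands in for an already-parsed query, and aggregation is left out.

``` Scala
object QuerySpecSketch {
  // Hypothetical, simplified plan nodes.
  sealed trait Plan
  case object OneRowRelation extends Plan // used when the query has no FROM clause
  final case class UnresolvedRelation(table: String) extends Plan
  final case class Filter(condition: String, child: Plan) extends Plan
  final case class Project(columns: Seq[String], child: Plan) extends Plan

  // Hypothetical container for the already-parsed pieces of a query specification.
  final case class QuerySpec(select: Seq[String], from: Option[String], where: Option[String])

  def buildPlan(spec: QuerySpec): Plan = {
    // 1. FROM comes first; fall back to a one-row relation when it is absent.
    val from: Plan = spec.from.map(UnresolvedRelation(_)).getOrElse(OneRowRelation)
    // 2. WHERE, if present, wraps the relation in a Filter.
    val filtered: Plan = spec.where.fold(from)(cond => Filter(cond, from))
    // 3. The SELECT list becomes a Project at the top of the tree.
    Project(spec.select, filtered)
  }

  // SELECT name FROM t WHERE age > 18
  val withFrom: Plan = buildPlan(QuerySpec(Seq("name"), Some("t"), Some("age > 18")))
  // SELECT 1
  val withoutFrom: Plan = buildPlan(QuerySpec(Seq("1"), None, None))
}
```

Note that the relation is still unresolved at this point: the tree only records the table name, and nothing has been checked against the catalog yet.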

3. Unresolved Logical Plan -> Analyzed Logical Plan

Catalog + Rule
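
A minimal sketch of how a catalog and a set of rules might cooperate during analysis. Everything here is a hypothetical toy written for this note: the catalog is a plain `Map`, a rule is a function from plan to plan, and `fixedPoint` re-applies the rules until the plan stops changing, assuming that is roughly how a batch of rules is driven.

``` Scala
import scala.annotation.tailrec

object AnalyzerSketch {
  // Hypothetical, simplified plan nodes.
  sealed trait Plan
  final case class UnresolvedRelation(table: String) extends Plan
  final case class ResolvedRelation(table: String, columns: Seq[String]) extends Plan
  final case class Project(columns: Seq[String], child: Plan) extends Plan

  type Catalog = Map[String, Seq[String]] // table name -> column names
  type Rule = Plan => Plan

  // Rule 1: look up unresolved tables in the catalog.
  def resolveRelations(catalog: Catalog): Rule = {
    def go(plan: Plan): Plan = plan match {
      case UnresolvedRelation(t) if catalog.contains(t) => ResolvedRelation(t, catalog(t))
      case Project(cols, child)                         => Project(cols, go(child))
      case other                                        => other
    }
    plan => go(plan)
  }

  // Rule 2: expand `*`, which is only possible once the child relation is resolved.
  val expandStar: Rule = {
    case Project(Seq("*"), child @ ResolvedRelation(_, cols)) => Project(cols, child)
    case other                                                => other
  }

  // Apply all rules repeatedly until nothing changes or the iteration budget runs out.
  @tailrec
  def fixedPoint(plan: Plan, rules: Seq[Rule], maxIterations: Int = 100): Plan = {
    val next = rules.foldLeft(plan)((p, rule) => rule(p))
    if (next == plan || maxIterations <= 1) next
    else fixedPoint(next, rules, maxIterations - 1)
  }

  val catalog: Catalog = Map("t" -> Seq("id", "name"))
  // SELECT * FROM t
  val unresolved: Plan = Project(Seq("*"), UnresolvedRelation("t"))
  val analyzed: Plan = fixedPoint(unresolved, Seq(expandStar, resolveRelations(catalog)))
}
```

With the rules in this order, the first pass can only resolve the relation and the star is expanded on the second pass, which is why a single pass over the rules is not enough.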

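Take constant folding as an example of a Rule (ConstantFolding itself is applied by the Optimizer, but it has the same Rule[LogicalPlan] shape as the Analyzer's rules):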

``` Scala
object ConstantFolding extends Rule[LogicalPlan] {
  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    case q: LogicalPlan => q transformExpressionsDown {
      // Skip redundant folding of literals. This rule is technically not necessary. Placing this
      // here avoids running the next rule for Literal values, which would create a new Literal
      // object and running eval unnecessarily.
      case l: Literal => l
      // Fold expressions that are foldable.
      case e if e.foldable => Literal.create(e.eval(EmptyRow), e.dataType)
    }
  }
}
```
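
To see what the rule does to an expression, here is a toy version in plain Scala. `Expr`, `Lit`, `Col`, `Add`, and `fold` are hypothetical names for this note, and values are restricted to `Int`; the sketch only mirrors the two cases of the rule above (leave literals alone, replace any foldable expression by its evaluated literal).

``` Scala
object ConstantFoldingSketch {
  // Hypothetical, simplified expression tree.
  sealed trait Expr {
    def foldable: Boolean // true when the value can be computed without any input row
    def eval: Int         // only meaningful when foldable is true
  }

  final case class Lit(value: Int) extends Expr {
    val foldable = true
    def eval: Int = value
  }

  final case class Col(name: String) extends Expr {
    val foldable = false
    def eval: Int =
      throw new UnsupportedOperationException(s"column $name needs an input row")
  }

  final case class Add(left: Expr, right: Expr) extends Expr {
    // An addition is foldable only if both operands are.
    def foldable: Boolean = left.foldable && right.foldable
    def eval: Int = left.eval + right.eval
  }

  // Top-down folding: check the current node first, then descend into its children.
  def fold(e: Expr): Expr = e match {
    case l: Lit                   => l               // already a literal, nothing to do
    case other if other.foldable  => Lit(other.eval) // evaluate the whole subtree once
    case Add(l, r)                => Add(fold(l), fold(r))
    case other                    => other
  }

  // x + (1 + 2)
  val before: Expr = Add(Col("x"), Add(Lit(1), Lit(2)))
  val after: Expr = fold(before) // the constant subtree becomes a single literal
}
```

Because `foldable` is defined recursively, checking it at the top of a subtree is enough to replace the whole subtree in one step, so `1 + 2` collapses to `Lit(3)` while the column reference is left untouched.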
