Spark SQL Execution Plans

Spark SQL Architecture

(Figure: Spark SQL architecture)

Worked Example

The spark-sql session below runs EXPLAIN EXTENDED on an inner join with a filter. The output walks through all four stages of query planning: the parsed logical plan (unresolved, straight from the parser), the analyzed logical plan (names and types resolved against the catalog), the optimized logical plan (note how the Catalyst optimizer pushes the `deptno > 10` predicate down to both sides of the join and adds `isnotnull` checks), and finally the physical plan (here a BroadcastHashJoin, with the smaller `dept` table broadcast).

spark-sql> explain extended select * from emp e inner join dept d on e.deptno = d.deptno where e.deptno > 10;
20/02/04 20:16:31 INFO CodeGenerator: Code generated in 22.286318 ms
== Parsed Logical Plan ==
'Project [*]
+- 'Filter ('e.deptno > 10)
   +- 'Join Inner, ('e.deptno = 'd.deptno)
      :- 'SubqueryAlias `e`
      :  +- 'UnresolvedRelation `emp`
      +- 'SubqueryAlias `d`
         +- 'UnresolvedRelation `dept`

== Analyzed Logical Plan ==
empno: int, ename: string, position: string, managerid: int, hiredate: string, salary: double, allowance: double, deptno: int, deptno: int, ename: string, dname: string, city: int
Project [empno#18, ename#19, position#20, managerid#21, hiredate#22, salary#23, allowance#24, deptno#25, deptno#26, ename#27, dname#28, city#29]
+- Filter (deptno#25 > 10)
   +- Join Inner, (deptno#25 = deptno#26)
      :- SubqueryAlias `e`
      :  +- SubqueryAlias `h_demo`.`emp`
      :     +- HiveTableRelation `h_demo`.`emp`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [empno#18, ename#19, position#20, managerid#21, hiredate#22, salary#23, allowance#24, deptno#25]
      +- SubqueryAlias `d`
         +- SubqueryAlias `h_demo`.`dept`
            +- HiveTableRelation `h_demo`.`dept`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [deptno#26, ename#27, dname#28, city#29]

== Optimized Logical Plan ==
Join Inner, (deptno#25 = deptno#26)
:- Filter (isnotnull(deptno#25) && (deptno#25 > 10))
:  +- HiveTableRelation `h_demo`.`emp`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [empno#18, ename#19, position#20, managerid#21, hiredate#22, salary#23, allowance#24, deptno#25]
+- Filter ((deptno#26 > 10) && isnotnull(deptno#26))
   +- HiveTableRelation `h_demo`.`dept`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [deptno#26, ename#27, dname#28, city#29]

== Physical Plan ==
*(2) BroadcastHashJoin [deptno#25], [deptno#26], Inner, BuildRight
:- *(2) Filter (isnotnull(deptno#25) && (deptno#25 > 10))
:  +- Scan hive h_demo.emp [empno#18, ename#19, position#20, managerid#21, hiredate#22, salary#23, allowance#24, deptno#25], HiveTableRelation `h_demo`.`emp`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [empno#18, ename#19, position#20, managerid#21, hiredate#22, salary#23, allowance#24, deptno#25]
+- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, false] as bigint)))
   +- *(1) Filter ((deptno#26 > 10) && isnotnull(deptno#26))
      +- Scan hive h_demo.dept [deptno#26, ename#27, dname#28, city#29], HiveTableRelation `h_demo`.`dept`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [deptno#26, ename#27, dname#28, city#29]
Time taken: 0.352 seconds, Fetched 1 row(s)
20/02/04 20:16:31 INFO SparkSQLCLIDriver: Time taken: 0.352 seconds, Fetched 1 row(s)
