关于hive中的with as 语法

网络上搜索到的结果,都是说with as可以有缓存数据,减少表扫描,优化速度的作业,其实并不是这样,以下通过实验分析和证明

写一个很简单的语句:

with t as(
select regexp_replace(reflect("java.util.UUID", "randomUUID"), "-", "") AS id --生成一个随机id
,'andy' as name
) 
select * from t
union all
select * from t
;

执行结果:

关于hive中的with as 语法_第1张图片

 

结果中可以看到,产生了两个不一样的id,说明reflect函数被执行了两次,即with as中的子查询被执行了两次。

再来看下执行计划:

hive (bi)> 
         > explain 
         > with t as(
         > select regexp_replace(reflect("java.util.UUID", "randomUUID"), "-", "") AS id --生成一个随机id
         > ,'andy' as name
         > ) 
         > select * from t
         > union all
         > select * from t
         > ;
OK
Explain
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Spark
      DagName: root_20190620153535_78057156-80df-4c9b-8c8e-f896ad4d74ed:53
      Vertices:
        Map 1 
            Map Operator Tree:
                TableScan
                  alias: _dummy_table
                  Row Limit Per Split: 1
                  Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: COMPLETE
                  Select Operator
                    expressions: regexp_replace(reflect('java.util.UUID','randomUUID'), '-', '') (type: string), 'andy' (type: string)
                    outputColumnNames: _col0, _col1
                    Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: COMPLETE
                    File Output Operator
                      compressed: true
                      Statistics: Num rows: 2 Data size: 2 Basic stats: COMPLETE Column stats: COMPLETE
                      table:
                          input format: org.apache.hadoop.mapred.TextInputFormat
                          output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                          serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
        Map 2 
            Map Operator Tree:
                TableScan
                  alias: _dummy_table
                  Row Limit Per Split: 1
                  Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: COMPLETE
                  Select Operator
                    expressions: regexp_replace(reflect('java.util.UUID','randomUUID'), '-', '') (type: string), 'andy' (type: string)
                    outputColumnNames: _col0, _col1
                    Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: COMPLETE
                    File Output Operator
                      compressed: true
                      Statistics: Num rows: 2 Data size: 2 Basic stats: COMPLETE Column stats: COMPLETE
                      table:
                          input format: org.apache.hadoop.mapred.TextInputFormat
                          output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                          serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

Time taken: 0.063 seconds, Fetched: 50 row(s)

从执行计划中可以看到,产生了两个map。

所以说,with as是不能减少表扫描的

为了对比,将以上语句稍作修改,然后在oracle中执行。结果如下:

关于hive中的with as 语法_第2张图片

可以看到在oracle中,sys_guid()函数只被执行了一次。

 

 

你可能感兴趣的:(hive)