Hive carries the title of a big-data data warehouse, so data analysis is its most common use case, and within data analysis aggregate functions and window functions are the two most important building blocks. Aggregate functions were already covered in the post Hive从入门到放弃——Hive 用户内置函数简介(十一); window functions, because of how flexibly they can be applied, get a separate introduction here.
Some newcomers confuse aggregate functions with window functions. An aggregate function groups rows by some chosen dimensions and computes the statistical measures the analysis needs, one result per group. A window function treats the distinct values of the chosen dimensions as partitions and computes statistics of the measures within each partition while keeping every detail row. Take a table that records the Chinese-language scores of every class in a school: aggregate functions are basically used to compute the school-wide average, total, maximum and minimum score, or the same figures per class. Window functions can produce all of those as well, but on top of that they can rank the scores within each class, rank them across the whole school, accumulate head counts starting from class 1, and so on, which makes it much easier to report statistics for groups along different dimensions. With that distinction in place, let's get to the window functions themselves.
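To make the contrast concrete, here is a minimal sketch against a hypothetical scores table (the table name class_score and its columns are invented purely for illustration). The first query collapses the rows with GROUP BY and returns one row per class; the second keeps every student row and attaches per-class and school-wide statistics and rankings to it:
-- Aggregate: one row per class
SELECT class_id
      ,avg(score) AS class_avg
      ,max(score) AS class_max
FROM class_score
GROUP BY class_id;
-- Window: every row is kept, statistics and rankings are attached to it
SELECT student_id
      ,class_id
      ,score
      ,avg(score) OVER(PARTITION BY class_id) AS class_avg
      ,rank() OVER(PARTITION BY class_id ORDER BY score DESC) AS rank_in_class
      ,rank() OVER(ORDER BY score DESC) AS rank_in_school
FROM class_score;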
Here we create a table named ods_sale_order_producttion_amount (daily total sales amount of products at the ODS layer). The DDL, the data load, and a preview of the data are shown below.
-- DDL: create the table
CREATE TABLE `ods_sale_order_producttion_amount`(
`month_key` string COMMENT '月份',
`date_key` string COMMENT '日期',
`production_amount` decimal(18,2) COMMENT '产品总值')
COMMENT 'ods产品每天销售总额'
PARTITIONED BY (
`event_year` int,
`event_week` int,
`event_day` string,
`event_hour` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
'hdfs://dw-test-cluster/hive/warehouse/ods/sale/ods_sale_order_producttion_amount'
TBLPROPERTIES ('parquet.compression'='snappy');
-- Load the sample data
insert into ods_sale_order_producttion_amount partition(event_year=2020,event_week=30,event_day='20200731',event_hour='00')
select '202005','20200501',199000.00
union all
select '202005','20200502',185000.00
union all
select '202005','20200503',199000.00
union all
select '202005','20200504',138500.00
union all
select '202005','20200505',196540.00
union all
select '202005','20200506',138500.00
union all
select '202005','20200507',159840.00
union all
select '202005','20200508',189462.00
union all
select '202005','20200509',200000.00
union all
select '202005','20200510',198540.00
union all
select '202006','20200601',189000.00
union all
select '202006','20200602',185000.00
union all
select '202006','20200603',189000.00
union all
select '202006','20200604',158500.00
union all
select '202006','20200605',200140.00
union all
select '202006','20200606',158500.00
union all
select '202006','20200607',198420.00
union all
select '202006','20200608',158500.00
union all
select '202006','20200609',200100.00
union all
select '202006','20200610',135480.00;
-- Data preview
month_key date_key production_amount event_year event_week event_day event_hour
202005 20200501 199000.00 2020 30 20200731 00
202005 20200502 185000.00 2020 30 20200731 00
202005 20200503 199000.00 2020 30 20200731 00
202005 20200504 138500.00 2020 30 20200731 00
202005 20200505 196540.00 2020 30 20200731 00
202005 20200506 138500.00 2020 30 20200731 00
202005 20200507 159840.00 2020 30 20200731 00
202005 20200508 189462.00 2020 30 20200731 00
202005 20200509 200000.00 2020 30 20200731 00
202005 20200510 198540.00 2020 30 20200731 00
202006 20200601 189000.00 2020 30 20200731 00
202006 20200602 185000.00 2020 30 20200731 00
202006 20200603 189000.00 2020 30 20200731 00
202006 20200604 158500.00 2020 30 20200731 00
202006 20200605 200140.00 2020 30 20200731 00
202006 20200606 158500.00 2020 30 20200731 00
202006 20200607 198420.00 2020 30 20200731 00
202006 20200608 158500.00 2020 30 20200731 00
202006 20200609 200100.00 2020 30 20200731 00
202006 20200610 135480.00 2020 30 20200731 00
Time taken: 0.233 seconds, Fetched: 20 row(s)
ROW_NUMBER
ROW_NUMBER() OVER([PARTITION BY col1,col2,…] ORDER BY col1,col2,… ASC|DESC)
assigns a consecutive sequence number within each partition, with no ties and no gaps, so the largest value in a partition equals the total number of rows being ranked in that partition;
RANK
RANK() OVER([PARTITION BY col1,col2,…] ORDER BY col1,col2,… ASC|DESC)
gives tied rows the same rank and then skips as many positions as there were tied rows, so gaps can appear in the sequence; the rank values can still run up to the total number of rows being ranked in the partition;
DENSE_RANK
DENSE_RANK() OVER([PARTITION BY col1,col2,…] ORDER BY col1,col2,… ASC|DESC)
gives tied rows the same rank but does not skip any positions, so the final rank in a partition is less than or equal to the total number of rows being ranked in that partition;
CUME_DIST
CUME_DIST() OVER([PARTITION BY col1,col2,…] ORDER BY col1,col2,… ASC|DESC)
returns the cumulative distribution, in the range (0,1] (greater than 0, less than or equal to 1); rows with the same ordering value get the same percentage. It is computed as the number of rows in the partition that sort at or before the current row divided by the total number of rows in the partition, so the last value in every partition is always exactly 1.0;
PERCENT_RANK
PERCENT_RANK() OVER([PARTITION BY col1,col2,…] ORDER BY col1,col2,… ASC|DESC)
returns a relative rank, in the range [0,1] (greater than or equal to 0, less than or equal to 1); it always starts at 0, and rows with the same ordering value get the same percentage. It is computed as (RANK of the current row - 1) / (total rows in the partition - 1); for example, a row with RANK 3 in a 10-row partition gets (3 - 1) / (10 - 1) ≈ 0.2222. The last value in a partition is therefore not always 1.0: when the largest RANK value in the partition is itself shared by tied rows it stays below 1.0, otherwise it ends at exactly 1.0;
NTILE
NTILE(num) OVER([PARTITION BY col1,col2,…] ORDER BY col1,col2,… ASC|DESC)
distributes the rows of each partition into num buckets as evenly as possible and returns the bucket number of each row, so the returned value is always less than or equal to num;
hive> set hive.cli.print.header=true;
hive> select
>
> ROW_NUMBER() OVER(PARTITION BY month_key order by production_amount DESC) row_number_result
> ,RANK() OVER(PARTITION BY month_key order by production_amount DESC) rank_result
> ,DENSE_RANK() OVER(PARTITION BY month_key order by production_amount DESC) dense_rank_result
> ,month_key
> ,date_key
> ,production_amount
> from ods_sale_order_producttion_amount where event_day='20200731';
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20200721163446_108a897b-ddf1-46ae-9bd8-a81a2afec8c9
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1592876386879_0123, Tracking URL = http://dw-test-cluster-007:8088/proxy/application_1592876386879_0123/
Kill Command = /usr/local/tools/hadoop/current//bin/hadoop job -kill job_1592876386879_0123
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2020-07-21 16:35:03,990 Stage-1 map = 0%, reduce = 0%
2020-07-21 16:35:12,456 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.76 sec
2020-07-21 16:35:17,720 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 6.95 sec
MapReduce Total cumulative CPU time: 6 seconds 950 msec
Ended Job = job_1592876386879_0123
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 6.95 sec HDFS Read: 14952 HDFS Write: 970 SUCCESS
Total MapReduce CPU Time Spent: 6 seconds 950 msec
OK
Time taken: 32.507 seconds, Fetched: 20 row(s)
To make the result easier to read, I have pulled it into a properly formatted table; the detailed result is shown in Table 1.
row_number_result | rank_result | dense_rank_result | cume_dist_result | percent_rank_result | ntile2_result | ntile3_result | ntile4_result | ntile7_result | month_key | date_key | production_amount |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | 1 | 1 | 0.1 | 0.0 | 1 | 1 | 1 | 1 | 202006 | 20200605 | 200140 |
2 | 2 | 2 | 0.2 | 0.1111111111111111 | 1 | 1 | 1 | 1 | 202006 | 20200609 | 200100 |
3 | 3 | 3 | 0.3 | 0.2222222222222222 | 1 | 1 | 1 | 2 | 202006 | 20200607 | 198420 |
4 | 4 | 4 | 0.5 | 0.3333333333333333 | 1 | 1 | 2 | 2 | 202006 | 20200601 | 189000 |
5 | 4 | 4 | 0.5 | 0.3333333333333333 | 1 | 2 | 2 | 3 | 202006 | 20200603 | 189000 |
6 | 6 | 5 | 0.6 | 0.5555555555555556 | 2 | 2 | 2 | 3 | 202006 | 20200602 | 185000 |
7 | 7 | 6 | 0.9 | 0.6666666666666666 | 2 | 2 | 3 | 4 | 202006 | 20200604 | 158500 |
8 | 7 | 6 | 0.9 | 0.6666666666666666 | 2 | 3 | 3 | 5 | 202006 | 20200606 | 158500 |
9 | 7 | 6 | 0.9 | 0.6666666666666666 | 2 | 3 | 4 | 6 | 202006 | 20200608 | 158500 |
10 | 10 | 7 | 1.0 | 1.0 | 2 | 3 | 4 | 7 | 202006 | 20200610 | 135480 |
1 | 1 | 1 | 0.1 | 0.0 | 1 | 1 | 1 | 1 | 202005 | 20200509 | 200000 |
2 | 2 | 2 | 0.3 | 0.1111111111111111 | 1 | 1 | 1 | 1 | 202005 | 20200501 | 199000 |
3 | 2 | 2 | 0.3 | 0.1111111111111111 | 1 | 1 | 1 | 2 | 202005 | 20200503 | 199000 |
4 | 4 | 3 | 0.4 | 0.3333333333333333 | 1 | 1 | 2 | 2 | 202005 | 20200510 | 198540 |
5 | 5 | 4 | 0.5 | 0.4444444444444444 | 1 | 2 | 2 | 3 | 202005 | 20200505 | 196540 |
6 | 6 | 5 | 0.6 | 0.5555555555555556 | 2 | 2 | 2 | 3 | 202005 | 20200508 | 189462 |
7 | 7 | 6 | 0.7 | 0.6666666666666666 | 2 | 2 | 3 | 4 | 202005 | 20200502 | 185000 |
8 | 8 | 7 | 0.8 | 0.7777777777777778 | 2 | 3 | 3 | 5 | 202005 | 20200507 | 159840 |
9 | 9 | 8 | 1.0 | 0.8888888888888888 | 2 | 3 | 4 | 6 | 202005 | 20200504 | 138500 |
10 | 9 | 8 | 1.0 | 0.8888888888888888 | 2 | 3 | 4 | 7 | 202005 | 20200506 | 138500 |
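The transcript above only shows the ROW_NUMBER, RANK and DENSE_RANK columns, while Table 1 also contains CUME_DIST, PERCENT_RANK and NTILE results. A sketch of the kind of query that would produce every column of Table 1, assuming the same month_key partition and the same descending production_amount ordering as above, looks like this:
select
 ROW_NUMBER() OVER(PARTITION BY month_key order by production_amount DESC) row_number_result
,RANK() OVER(PARTITION BY month_key order by production_amount DESC) rank_result
,DENSE_RANK() OVER(PARTITION BY month_key order by production_amount DESC) dense_rank_result
,CUME_DIST() OVER(PARTITION BY month_key order by production_amount DESC) cume_dist_result
,PERCENT_RANK() OVER(PARTITION BY month_key order by production_amount DESC) percent_rank_result
,NTILE(2) OVER(PARTITION BY month_key order by production_amount DESC) ntile2_result
,NTILE(3) OVER(PARTITION BY month_key order by production_amount DESC) ntile3_result
,NTILE(4) OVER(PARTITION BY month_key order by production_amount DESC) ntile4_result
,NTILE(7) OVER(PARTITION BY month_key order by production_amount DESC) ntile7_result
,month_key
,date_key
,production_amount
from ods_sale_order_producttion_amount where event_day='20200731';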
The ordinary aggregate functions can also be used as window functions; with only a PARTITION BY clause the statistic is computed over the whole partition and attached to every row:
COUNT
COUNT(*) OVER([PARTITION BY col1,col2,…]): number of rows in the partition;
SUM
SUM(coln) OVER([PARTITION BY col1,col2,…]): sum of coln over the partition;
MIN
MIN(coln) OVER([PARTITION BY col1,col2,…]): minimum of coln over the partition;
MAX
MAX(coln) OVER([PARTITION BY col1,col2,…]): maximum of coln over the partition;
AVG
AVG(coln) OVER([PARTITION BY col1,col2,…]): average of coln over the partition;
hive> set hive.cli.print.header=true;
hive> select month_key,date_key,production_amount
> ,count(3) over(partition by month_key) as `每月份数`
> ,sum(production_amount) OVER(partition by month_key) as `每月总和`
> ,min(production_amount) OVER(partition by month_key) as `每月最小收入`
> ,max(production_amount) OVER(partition by month_key) as `每月最大收入`
> ,avg(production_amount) OVER(partition by month_key) as `每月平均收入`
> from dw.ods_sale_order_producttion_amount
> ;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20200729105357_e57ed4d0-f210-431c-8889-5a138cbb0032
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1592876386879_0139, Tracking URL = http://dw-test-cluster-007:8088/proxy/application_1592876386879_0139/
Kill Command = /usr/local/tools/hadoop/current//bin/hadoop job -kill job_1592876386879_0139
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2020-07-29 10:54:10,089 Stage-1 map = 0%, reduce = 0%
2020-07-29 10:54:19,185 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 5.93 sec
2020-07-29 10:54:25,548 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 8.35 sec
MapReduce Total cumulative CPU time: 8 seconds 350 msec
Ended Job = job_1592876386879_0139
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 8.35 sec HDFS Read: 18912 HDFS Write: 1807 SUCCESS
Total MapReduce CPU Time Spent: 8 seconds 350 msec
OK
month_key date_key production_amount 每月份数 每月总和 每月最小收入 每月最大收入 每月平均收入
202005 20200501 199000.00 10 1804382.00 138500.00 200000.00 180438.200000
202005 20200510 198540.00 10 1804382.00 138500.00 200000.00 180438.200000
202005 20200509 200000.00 10 1804382.00 138500.00 200000.00 180438.200000
202005 20200508 189462.00 10 1804382.00 138500.00 200000.00 180438.200000
202005 20200507 159840.00 10 1804382.00 138500.00 200000.00 180438.200000
202005 20200506 138500.00 10 1804382.00 138500.00 200000.00 180438.200000
202005 20200505 196540.00 10 1804382.00 138500.00 200000.00 180438.200000
202005 20200504 138500.00 10 1804382.00 138500.00 200000.00 180438.200000
202005 20200503 199000.00 10 1804382.00 138500.00 200000.00 180438.200000
202005 20200502 185000.00 10 1804382.00 138500.00 200000.00 180438.200000
202006 20200610 135480.00 10 1772640.00 135480.00 200140.00 177264.000000
202006 20200609 200100.00 10 1772640.00 135480.00 200140.00 177264.000000
202006 20200608 158500.00 10 1772640.00 135480.00 200140.00 177264.000000
202006 20200607 198420.00 10 1772640.00 135480.00 200140.00 177264.000000
202006 20200606 158500.00 10 1772640.00 135480.00 200140.00 177264.000000
202006 20200605 200140.00 10 1772640.00 135480.00 200140.00 177264.000000
202006 20200604 158500.00 10 1772640.00 135480.00 200140.00 177264.000000
202006 20200603 189000.00 10 1772640.00 135480.00 200140.00 177264.000000
202006 20200602 185000.00 10 1772640.00 135480.00 200140.00 177264.000000
202006 20200601 189000.00 10 1772640.00 135480.00 200140.00 177264.000000
Time taken: 29.364 seconds, Fetched: 20 row(s)
LEAD
LEAD(col,n,DEFAULT)
returns the value of col on the nth row after the current row within the window. The first argument is the column name, the second is the offset n (optional, defaults to 1), and the third is the default value to use when the value n rows ahead is NULL, for example because that row does not exist (if not specified, NULL is returned);
LAG
LAG(col,n,DEFAULT)
returns the value of col on the nth row before the current row within the window. The first argument is the column name, the second is the offset n (optional, defaults to 1), and the third is the default value to use when the value n rows back is NULL (if not specified, NULL is returned);
FIRST_VALUE
FIRST_VALUE(coln) over (partition by col1,col2 order by col1,col2)
returns the first value of coln in the partition after ordering, up to the current row;
LAST_VALUE
LAST_VALUE(coln) over (partition by col1,col2 order by col1,col2)
returns the last value of coln up to the current row after ordering;
hive> set hive.cli.print.header=true;
hive> select month_key,date_key,production_amount
> ,FIRST_VALUE(production_amount) OVER(partition by month_key order by date_key) as `月初数据`
> ,LAST_VALUE(production_amount) OVER(partition by month_key) as `总月末数据`
> ,LAST_VALUE(production_amount) OVER(partition by month_key order by date_key) as `截至当前月末数据`
> ,LAG(date_key) OVER(partition by month_key order by date_key) as `LAG默认参数`
> ,LAG(date_key,1) OVER(partition by month_key order by date_key) as `LAG取date_key值向上移动1位`
> ,LAG(date_key,2,'19000101') OVER(partition by month_key order by date_key) as `LAG取date_key值向上移动2位,取到null用19000101代替`
> ,LEAD(date_key) OVER(partition by month_key order by date_key) as `LEAD默认参数`
> ,LEAD(date_key,1) OVER(partition by month_key order by date_key) as `LEAD取date_key值向上移动1位`
> ,LEAD(date_key,2,'19000101') OVER(partition by month_key order by date_key) as `LEAD取date_key值向上移动1位,取到null用19000101代替`
> from dw.ods_sale_order_producttion_amount
> ;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20200729113052_9e57c0fe-5c46-4589-9406-f394787885bc
Total jobs = 2
Launching Job 1 out of 2
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1592876386879_0144, Tracking URL = http://dw-test-cluster-007:8088/proxy/application_1592876386879_0144/
Kill Command = /usr/local/tools/hadoop/current//bin/hadoop job -kill job_1592876386879_0144
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2020-07-29 11:30:59,931 Stage-1 map = 0%, reduce = 0%
2020-07-29 11:31:06,206 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 5.27 sec
2020-07-29 11:31:12,498 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 7.62 sec
MapReduce Total cumulative CPU time: 7 seconds 620 msec
Ended Job = job_1592876386879_0144
Launching Job 2 out of 2
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1592876386879_0145, Tracking URL = http://dw-test-cluster-007:8088/proxy/application_1592876386879_0145/
Kill Command = /usr/local/tools/hadoop/current//bin/hadoop job -kill job_1592876386879_0145
Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 1
2020-07-29 11:31:24,159 Stage-2 map = 0%, reduce = 0%
2020-07-29 11:31:30,442 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 2.7 sec
2020-07-29 11:31:36,698 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 4.97 sec
MapReduce Total cumulative CPU time: 4 seconds 970 msec
Ended Job = job_1592876386879_0145
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 7.62 sec HDFS Read: 20982 HDFS Write: 2084 SUCCESS
Stage-Stage-2: Map: 1 Reduce: 1 Cumulative CPU: 4.97 sec HDFS Read: 21609 HDFS Write: 2499 SUCCESS
Total MapReduce CPU Time Spent: 12 seconds 590 msec
OK
month_key date_key production_amount 月初数据 总月末数据 截至当前月末数据 lag默认参数 lag取date_key值向上移动1位 lag取date_key值向上移动2位,取到null用19000101代替 lead默认参数 lead取date_key值向上移动1位 lead取date_key值向上移动1位,取到null用19000101代替
202005 20200501 199000.00 199000.00 185000.00 199000.00 NULL NULL 19000101 20200502 20200502 20200503
202005 20200510 198540.00 199000.00 185000.00 198540.00 20200509 20200509 20200508 NULL NULL 19000101
202005 20200509 200000.00 199000.00 185000.00 200000.00 20200508 20200508 20200507 20200510 20200510 19000101
202005 20200508 189462.00 199000.00 185000.00 189462.00 20200507 20200507 20200506 20200509 20200509 20200510
202005 20200507 159840.00 199000.00 185000.00 159840.00 20200506 20200506 20200505 20200508 20200508 20200509
202005 20200506 138500.00 199000.00 185000.00 138500.00 20200505 20200505 20200504 20200507 20200507 20200508
202005 20200505 196540.00 199000.00 185000.00 196540.00 20200504 20200504 20200503 20200506 20200506 20200507
202005 20200504 138500.00 199000.00 185000.00 138500.00 20200503 20200503 20200502 20200505 20200505 20200506
202005 20200503 199000.00 199000.00 185000.00 199000.00 20200502 20200502 20200501 20200504 20200504 20200505
202005 20200502 185000.00 199000.00 185000.00 185000.00 20200501 20200501 19000101 20200503 20200503 20200504
202006 20200610 135480.00 189000.00 189000.00 135480.00 20200609 20200609 20200608 NULL NULL 19000101
202006 20200609 200100.00 189000.00 189000.00 200100.00 20200608 20200608 20200607 20200610 20200610 19000101
202006 20200608 158500.00 189000.00 189000.00 158500.00 20200607 20200607 20200606 20200609 20200609 20200610
202006 20200607 198420.00 189000.00 189000.00 198420.00 20200606 20200606 20200605 20200608 20200608 20200609
202006 20200606 158500.00 189000.00 189000.00 158500.00 20200605 20200605 20200604 20200607 20200607 20200608
202006 20200605 200140.00 189000.00 189000.00 200140.00 20200604 20200604 20200603 20200606 20200606 20200607
202006 20200604 158500.00 189000.00 189000.00 158500.00 20200603 20200603 20200602 20200605 20200605 20200606
202006 20200603 189000.00 189000.00 189000.00 189000.00 20200602 20200602 20200601 20200604 20200604 20200605
202006 20200602 185000.00 189000.00 189000.00 185000.00 20200601 20200601 19000101 20200603 20200603 20200604
202006 20200601 189000.00 189000.00 189000.00 189000.00 NULL NULL 19000101 20200602 20200602 20200603
Time taken: 45.744 seconds, Fetched: 20 row(s)
As the output shows, the second argument of LAG and LEAD defaults to 1 and the third defaults to NULL.
With OVER(partition by col1 order by col2), adding a finer-grained ORDER BY column restricts the computation to the rows up to the current one, which the extension below looks at in more detail.
Another typical use is accumulating, within each month, the running total (or running average, running maximum, and so on) of the amounts up to the current row. Taking the running total as the example, this is the MTD (month-to-date) figure we often talk about: order the rows by amount from smallest to largest and sum everything up to the current row, as the example below shows.
This is where RANGE and ROWS come in. RANGE and ROWS do behave slightly differently, even though the difference is not very obvious with this data set: RANGE treats rows that share the same partition and ORDER BY value as peers and gives them the same final result, while ROWS does not (compare the two tied 138500.00 rows in the output below). The Hive CLI code and its result are as follows.
hive> set hive.cli.print.header=true;
hive> select month_key,date_key,production_amount
> ,sum(production_amount) OVER(partition by month_key order by production_amount RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as `RANGE每月总和`
> ,sum(production_amount) OVER(partition by month_key order by production_amount ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as `ROWS每月总和`
>
> ,min(production_amount) OVER(partition by month_key order by production_amount RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as `RANGE每月最小收入`
> ,max(production_amount) OVER(partition by month_key order by production_amount RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as `RANGE每月最大收入`
> ,avg(production_amount) OVER(partition by month_key order by production_amount RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as `RANGE每月平均收入`
> ,min(production_amount) OVER(partition by month_key order by production_amount ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as `ROWS每月最小收入`
> ,max(production_amount) OVER(partition by month_key order by production_amount ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as `ROWS每月最大收入`
> ,avg(production_amount) OVER(partition by month_key order by production_amount ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as `ROWS每月平均收入`
> from dw.ods_sale_order_producttion_amount
> ;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20200729151835_4e818a96-4447-4fad-aae5-8e14371e2ba6
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1592876386879_0152, Tracking URL = http://dw-test-cluster-007:8088/proxy/application_1592876386879_0152/
Kill Command = /usr/local/tools/hadoop/current//bin/hadoop job -kill job_1592876386879_0152
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2020-07-29 15:18:43,001 Stage-1 map = 0%, reduce = 0%
2020-07-29 15:18:53,853 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 5.91 sec
2020-07-29 15:19:01,158 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 8.56 sec
MapReduce Total cumulative CPU time: 8 seconds 560 msec
Ended Job = job_1592876386879_0152
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 8.56 sec HDFS Read: 22462 HDFS Write: 2646 SUCCESS
Total MapReduce CPU Time Spent: 8 seconds 560 msec
OK
month_key date_key production_amount range每月总和 rows每月总和 range每月最小收入 range每月最大收入 range每月平均收入 rows每月最小收入 rows每月最大收入 rows每月平均收入
202005 20200504 138500.00 277000.00 138500.00 138500.00 138500.00 138500.000000 138500.00 138500.00 138500.000000
202005 20200506 138500.00 277000.00 277000.00 138500.00 138500.00 138500.000000 138500.00 138500.00 138500.000000
202005 20200507 159840.00 436840.00 436840.00 138500.00 159840.00 145613.333333 138500.00 159840.00 145613.333333
202005 20200502 185000.00 621840.00 621840.00 138500.00 185000.00 155460.000000 138500.00 185000.00 155460.000000
202005 20200508 189462.00 811302.00 811302.00 138500.00 189462.00 162260.400000 138500.00 189462.00 162260.400000
202005 20200505 196540.00 1007842.00 1007842.00 138500.00 196540.00 167973.666667 138500.00 196540.00 167973.666667
202005 20200510 198540.00 1206382.00 1206382.00 138500.00 198540.00 172340.285714 138500.00 198540.00 172340.285714
202005 20200501 199000.00 1604382.00 1405382.00 138500.00 199000.00 178264.666667 138500.00 199000.00 175672.750000
202005 20200503 199000.00 1604382.00 1604382.00 138500.00 199000.00 178264.666667 138500.00 199000.00 178264.666667
202005 20200509 200000.00 1804382.00 1804382.00 138500.00 200000.00 180438.200000 138500.00 200000.00 180438.200000
202006 20200610 135480.00 135480.00 135480.00 135480.00 135480.00 135480.000000 135480.00 135480.00 135480.000000
202006 20200604 158500.00 610980.00 293980.00 135480.00 158500.00 152745.000000 135480.00 158500.00 146990.000000
202006 20200606 158500.00 610980.00 452480.00 135480.00 158500.00 152745.000000 135480.00 158500.00 150826.666667
202006 20200608 158500.00 610980.00 610980.00 135480.00 158500.00 152745.000000 135480.00 158500.00 152745.000000
202006 20200602 185000.00 795980.00 795980.00 135480.00 185000.00 159196.000000 135480.00 185000.00 159196.000000
202006 20200601 189000.00 1173980.00 984980.00 135480.00 189000.00 167711.428571 135480.00 189000.00 164163.333333
202006 20200603 189000.00 1173980.00 1173980.00 135480.00 189000.00 167711.428571 135480.00 189000.00 167711.428571
202006 20200607 198420.00 1372400.00 1372400.00 135480.00 198420.00 171550.000000 135480.00 198420.00 171550.000000
202006 20200609 200100.00 1572500.00 1572500.00 135480.00 200100.00 174722.222222 135480.00 200100.00 174722.222222
202006 20200605 200140.00 1772640.00 1772640.00 135480.00 200140.00 177264.000000 135480.00 200140.00 177264.000000
Time taken: 26.882 seconds, Fetched: 20 row(s)
The common frame parameters that follow ROWS and RANGE are listed in Table 2:
Frame parameter | Meaning |
---|---|
UNBOUNDED PRECEDING | The window starts at the first row of the partition. |
UNBOUNDED FOLLOWING | The window ends at the last row of the partition. |
CURRENT ROW | The window begins at the current row or ends at the current row. |
n PRECEDING or n FOLLOWING | The window starts or ends n rows before or after the current row. For example, ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING means that the window goes from the first row of the partition to the row that stands (in the ordered set) immediately before the current row. |
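All of the window frames used in this post run from UNBOUNDED PRECEDING to CURRENT ROW. As a further sketch (not run against the test cluster above), n PRECEDING and n FOLLOWING make fixed-size moving windows possible, for example a 3-day moving average and a centered 3-day sum per month:
select month_key,date_key,production_amount
,avg(production_amount) OVER(partition by month_key order by date_key ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) as `三天移动平均`
,sum(production_amount) OVER(partition by month_key order by date_key ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) as `前后三天总和`
from dw.ods_sale_order_producttion_amount;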
Beyond the window functions themselves, Hive also supports OLAP-style enhanced aggregation, rolling measures up and drilling them down along different dimensions, which is exactly what dimensional analysis needs. The commonly used constructs are the following.
GROUPING SETS and GROUPING__ID
hive> set hive.cli.print.header=true;
hive> select month_key,date_key
> ,sum(production_amount) `生产总值`
> ,GROUPING__ID
> from dw.ods_sale_order_producttion_amount
> group by month_key,date_key
> grouping sets(month_key,date_key)
> order by GROUPING__ID
> ;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20200729154417_180a7883-ed66-49e3-a52e-bdebfbf61507
Total jobs = 2
Launching Job 1 out of 2
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1592876386879_0156, Tracking URL = http://dw-test-cluster-007:8088/proxy/application_1592876386879_0156/
Kill Command = /usr/local/tools/hadoop/current//bin/hadoop job -kill job_1592876386879_0156
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2020-07-29 15:44:24,616 Stage-1 map = 0%, reduce = 0%
2020-07-29 15:44:32,981 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 5.61 sec
2020-07-29 15:44:38,197 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 7.62 sec
MapReduce Total cumulative CPU time: 7 seconds 620 msec
Ended Job = job_1592876386879_0156
Launching Job 2 out of 2
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1592876386879_0157, Tracking URL = http://dw-test-cluster-007:8088/proxy/application_1592876386879_0157/
Kill Command = /usr/local/tools/hadoop/current//bin/hadoop job -kill job_1592876386879_0157
Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 1
2020-07-29 15:44:49,818 Stage-2 map = 0%, reduce = 0%
2020-07-29 15:44:56,078 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 2.34 sec
2020-07-29 15:45:02,288 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 4.51 sec
MapReduce Total cumulative CPU time: 4 seconds 510 msec
Ended Job = job_1592876386879_0157
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 7.62 sec HDFS Read: 11977 HDFS Write: 796 SUCCESS
Stage-Stage-2: Map: 1 Reduce: 1 Cumulative CPU: 4.51 sec HDFS Read: 7133 HDFS Write: 877 SUCCESS
Total MapReduce CPU Time Spent: 12 seconds 130 msec
OK
month_key date_key 生产总值 grouping__id
202005 NULL 1804382.00 1
202006 NULL 1772640.00 1
NULL 20200601 189000.00 2
NULL 20200610 135480.00 2
NULL 20200609 200100.00 2
NULL 20200608 158500.00 2
NULL 20200607 198420.00 2
NULL 20200606 158500.00 2
NULL 20200605 200140.00 2
NULL 20200604 158500.00 2
NULL 20200603 189000.00 2
NULL 20200602 185000.00 2
NULL 20200510 198540.00 2
NULL 20200509 200000.00 2
NULL 20200508 189462.00 2
NULL 20200507 159840.00 2
NULL 20200506 138500.00 2
NULL 20200505 196540.00 2
NULL 20200504 138500.00 2
NULL 20200503 199000.00 2
NULL 20200502 185000.00 2
NULL 20200501 199000.00 2
Time taken: 45.797 seconds, Fetched: 22 row(s)
Result analysis: the query reports the measure at different dimension levels. The rows with GROUPING__ID 1 are grouped by month_key and the rows with GROUPING__ID 2 are grouped by date_key, each with its own production total. Note that GROUPING__ID, the column that identifies the grouping, is spelled with two underscores; its values follow the order declared in grouping sets(month_key,date_key), so 1 stands for the month_key grouping and 2 for the date_key grouping.
CUBE
hive> set hive.cli.print.header=true;
hive> select month_key,date_key
> ,sum(production_amount) `生产总值`
> ,GROUPING__ID
> from dw.ods_sale_order_producttion_amount
> group by month_key,date_key
> WITH CUBE
> ORDER BY GROUPING__ID;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20200729155154_4b3bf262-89b1-4834-a6b9-a8ed0a9eafda
Total jobs = 2
Launching Job 1 out of 2
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1592876386879_0159, Tracking URL = http://dw-test-cluster-007:8088/proxy/application_1592876386879_0159/
Kill Command = /usr/local/tools/hadoop/current//bin/hadoop job -kill job_1592876386879_0159
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2020-07-29 15:52:14,628 Stage-1 map = 0%, reduce = 0%
2020-07-29 15:52:20,895 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 5.17 sec
2020-07-29 15:52:27,147 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 7.0 sec
MapReduce Total cumulative CPU time: 7 seconds 0 msec
Ended Job = job_1592876386879_0159
Launching Job 2 out of 2
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1592876386879_0161, Tracking URL = http://dw-test-cluster-007:8088/proxy/application_1592876386879_0161/
Kill Command = /usr/local/tools/hadoop/current//bin/hadoop job -kill job_1592876386879_0161
Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 1
2020-07-29 15:53:01,847 Stage-2 map = 0%, reduce = 0%
2020-07-29 15:53:08,105 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 2.92 sec
2020-07-29 15:53:14,350 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 5.13 sec
MapReduce Total cumulative CPU time: 5 seconds 130 msec
Ended Job = job_1592876386879_0161
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 7.0 sec HDFS Read: 12026 HDFS Write: 1599 SUCCESS
Stage-Stage-2: Map: 1 Reduce: 1 Cumulative CPU: 5.13 sec HDFS Read: 7936 HDFS Write: 1708 SUCCESS
Total MapReduce CPU Time Spent: 12 seconds 130 msec
OK
month_key date_key 生产总值 grouping__id
202005 20200501 199000.00 0
202006 20200610 135480.00 0
202006 20200608 158500.00 0
202006 20200607 198420.00 0
202006 20200606 158500.00 0
202006 20200605 200140.00 0
202006 20200604 158500.00 0
202006 20200603 189000.00 0
202006 20200602 185000.00 0
202006 20200601 189000.00 0
202006 20200609 200100.00 0
202005 20200510 198540.00 0
202005 20200509 200000.00 0
202005 20200508 189462.00 0
202005 20200507 159840.00 0
202005 20200506 138500.00 0
202005 20200505 196540.00 0
202005 20200504 138500.00 0
202005 20200503 199000.00 0
202005 20200502 185000.00 0
202005 NULL 1804382.00 1
202006 NULL 1772640.00 1
NULL 20200610 135480.00 2
NULL 20200609 200100.00 2
NULL 20200608 158500.00 2
NULL 20200607 198420.00 2
NULL 20200606 158500.00 2
NULL 20200605 200140.00 2
NULL 20200604 158500.00 2
NULL 20200603 189000.00 2
NULL 20200602 185000.00 2
NULL 20200601 189000.00 2
NULL 20200510 198540.00 2
NULL 20200509 200000.00 2
NULL 20200508 189462.00 2
NULL 20200507 159840.00 2
NULL 20200506 138500.00 2
NULL 20200505 196540.00 2
NULL 20200504 138500.00 2
NULL 20200503 199000.00 2
NULL 20200502 185000.00 2
NULL 20200501 199000.00 2
NULL NULL 3577022.00 3
Time taken: 80.686 seconds, Fetched: 43 row(s)
Result analysis: CUBE aggregates over every combination of the GROUP BY dimensions and reports the measure for each of those combinations, including the grand total.
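For the two dimensions used here, WITH CUBE is shorthand for listing all four grouping sets explicitly, including the empty set for the grand total; a sketch of the equivalent query:
select month_key,date_key
,sum(production_amount) `生产总值`
,GROUPING__ID
from dw.ods_sale_order_producttion_amount
group by month_key,date_key
grouping sets((month_key,date_key),(month_key),(date_key),())
order by GROUPING__ID;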
ROLLUP
-- month_key as the leftmost dimension
hive> set hive.cli.print.header=true;
hive> select month_key,date_key
> ,sum(production_amount) `生产总值`
> ,GROUPING__ID
> from dw.ods_sale_order_producttion_amount
> group by month_key,date_key
> WITH ROLLUP
> ORDER BY GROUPING__ID;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20200729155836_b7c2669b-9763-469f-8337-03cc5b71f68e
Total jobs = 2
Launching Job 1 out of 2
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1592876386879_0163, Tracking URL = http://dw-test-cluster-007:8088/proxy/application_1592876386879_0163/
Kill Command = /usr/local/tools/hadoop/current//bin/hadoop job -kill job_1592876386879_0163
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2020-07-29 15:58:56,098 Stage-1 map = 0%, reduce = 0%
2020-07-29 15:59:03,410 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 5.12 sec
2020-07-29 15:59:08,635 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 7.25 sec
MapReduce Total cumulative CPU time: 7 seconds 250 msec
Ended Job = job_1592876386879_0163
Launching Job 2 out of 2
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1592876386879_0165, Tracking URL = http://dw-test-cluster-007:8088/proxy/application_1592876386879_0165/
Kill Command = /usr/local/tools/hadoop/current//bin/hadoop job -kill job_1592876386879_0165
Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 1
2020-07-29 15:59:42,370 Stage-2 map = 0%, reduce = 0%
2020-07-29 15:59:48,660 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 2.13 sec
2020-07-29 15:59:53,878 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 4.1 sec
MapReduce Total cumulative CPU time: 4 seconds 100 msec
Ended Job = job_1592876386879_0165
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 7.25 sec HDFS Read: 12022 HDFS Write: 959 SUCCESS
Stage-Stage-2: Map: 1 Reduce: 1 Cumulative CPU: 4.1 sec HDFS Read: 7296 HDFS Write: 988 SUCCESS
Total MapReduce CPU Time Spent: 11 seconds 350 msec
OK
month_key date_key 生产总值 grouping__id
202006 20200610 135480.00 0
202006 20200609 200100.00 0
202006 20200608 158500.00 0
202006 20200607 198420.00 0
202006 20200606 158500.00 0
202006 20200605 200140.00 0
202006 20200604 158500.00 0
202006 20200603 189000.00 0
202006 20200602 185000.00 0
202006 20200601 189000.00 0
202005 20200510 198540.00 0
202005 20200509 200000.00 0
202005 20200508 189462.00 0
202005 20200507 159840.00 0
202005 20200506 138500.00 0
202005 20200505 196540.00 0
202005 20200504 138500.00 0
202005 20200503 199000.00 0
202005 20200502 185000.00 0
202005 20200501 199000.00 0
202005 NULL 1804382.00 1
202006 NULL 1772640.00 1
NULL NULL 3577022.00 3
Time taken: 78.751 seconds, Fetched: 23 row(s)
-- date_key as the leftmost dimension
hive> set hive.cli.print.header=true;
hive> select month_key,date_key
> ,sum(production_amount) `生产总值`
> ,GROUPING__ID
> from dw.ods_sale_order_producttion_amount
> group by date_key,month_key
> WITH ROLLUP
> ORDER BY GROUPING__ID;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20200729160309_ad0a688d-edd1-4017-ab14-3407c90d2054
Total jobs = 2
Launching Job 1 out of 2
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1592876386879_0166, Tracking URL = http://dw-test-cluster-007:8088/proxy/application_1592876386879_0166/
Kill Command = /usr/local/tools/hadoop/current//bin/hadoop job -kill job_1592876386879_0166
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2020-07-29 16:03:19,204 Stage-1 map = 0%, reduce = 0%
2020-07-29 16:03:25,534 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.87 sec
2020-07-29 16:03:30,752 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 6.77 sec
MapReduce Total cumulative CPU time: 6 seconds 770 msec
Ended Job = job_1592876386879_0166
Launching Job 2 out of 2
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1592876386879_0167, Tracking URL = http://dw-test-cluster-007:8088/proxy/application_1592876386879_0167/
Kill Command = /usr/local/tools/hadoop/current//bin/hadoop job -kill job_1592876386879_0167
Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 1
2020-07-29 16:03:41,661 Stage-2 map = 0%, reduce = 0%
2020-07-29 16:03:47,991 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 3.47 sec
2020-07-29 16:03:53,245 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 5.59 sec
MapReduce Total cumulative CPU time: 5 seconds 590 msec
Ended Job = job_1592876386879_0167
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 6.77 sec HDFS Read: 11941 HDFS Write: 1539 SUCCESS
Stage-Stage-2: Map: 1 Reduce: 1 Cumulative CPU: 5.59 sec HDFS Read: 7876 HDFS Write: 1638 SUCCESS
Total MapReduce CPU Time Spent: 12 seconds 360 msec
OK
month_key date_key 生产总值 grouping__id
202006 20200610 135480.00 0
202006 20200609 200100.00 0
202006 20200608 158500.00 0
202006 20200607 198420.00 0
202006 20200606 158500.00 0
202006 20200605 200140.00 0
202006 20200604 158500.00 0
202006 20200603 189000.00 0
202006 20200602 185000.00 0
202006 20200601 189000.00 0
202005 20200510 198540.00 0
202005 20200509 200000.00 0
202005 20200508 189462.00 0
202005 20200507 159840.00 0
202005 20200506 138500.00 0
202005 20200505 196540.00 0
202005 20200504 138500.00 0
202005 20200503 199000.00 0
202005 20200502 185000.00 0
202005 20200501 199000.00 0
NULL 20200505 196540.00 1
NULL 20200510 198540.00 1
NULL 20200610 135480.00 1
NULL 20200502 185000.00 1
NULL 20200609 200100.00 1
NULL 20200509 200000.00 1
NULL 20200608 158500.00 1
NULL 20200504 138500.00 1
NULL 20200607 198420.00 1
NULL 20200508 189462.00 1
NULL 20200606 158500.00 1
NULL 20200605 200140.00 1
NULL 20200507 159840.00 1
NULL 20200604 158500.00 1
NULL 20200503 199000.00 1
NULL 20200603 189000.00 1
NULL 20200506 138500.00 1
NULL 20200602 185000.00 1
NULL 20200501 199000.00 1
NULL 20200601 189000.00 1
NULL NULL 3577022.00 3
Time taken: 44.948 seconds, Fetched: 41 row(s)
Result analysis: ROLLUP treats the leftmost dimension as the primary one and aggregates hierarchically from it, so apart from the detail rows it only returns the aggregation levels led by the first dimension after GROUP BY (plus the grand total), not every combination of the dimensions.
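Like CUBE, WITH ROLLUP can be read as shorthand for a GROUPING SETS list, only hierarchical instead of exhaustive; a sketch of the equivalent form of the first ROLLUP query above (month_key leftmost):
select month_key,date_key
,sum(production_amount) `生产总值`
,GROUPING__ID
from dw.ods_sale_order_producttion_amount
group by month_key,date_key
grouping sets((month_key,date_key),(month_key),())
order by GROUPING__ID;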
Window functions also support more complex statistical analysis, such as correlation and linear-regression functions and standard-deviation and variance functions. Since they are not used very often, I will not list them one by one here; when you need one, follow the link, look up the function's form and rewrite it as HiveQL. Link: 复杂窗口函数.
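As one small example from that family, and only as a sketch on my part (stddev_pop and var_pop are Hive built-in aggregate functions, and I am assuming here that they accept an OVER clause the same way SUM and AVG do above; verify against your Hive version before relying on it):
select month_key,date_key,production_amount
,stddev_pop(production_amount) OVER(partition by month_key) as `每月标准差`
,var_pop(production_amount) OVER(partition by month_key) as `每月方差`
from dw.ods_sale_order_producttion_amount;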