Hive window functions: sum, avg, min, max

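Window functions are both common and important in Hive statistical analysis.
This post covers sum, avg, min and max used as window functions. I'll write up the other commonly used ones in later posts.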

First, create a mock call-record table with three fields: the calling number, the call time, and the call duration in minutes.

> create table `call_test` (
    `pone_number` string,
    `createtime` string,   -- call time, e.g. '2018-12-10 13:00:00' (one call per day in this example)
    `call_minute` int
    );
OK
Time taken: 0.369 seconds

Check the table structure:

> desc call_test;
OK
pone_number         	string              	                    
createtime          	string              	                    
call_minute         	int                 	                    
Time taken: 0.149 seconds, Fetched: 3 row(s)

Insert the mock data:

insert into call_test values('18600000000', '2018-12-10 13:00:00', 1);
insert into call_test values('18600000000', '2018-12-11 13:00:00', 6);
insert into call_test values('18600000000', '2018-12-12 13:00:00', 8);
insert into call_test values('18600000000', '2018-12-13 13:00:00', 4);
insert into call_test values('18600000000', '2018-12-14 13:00:00', 7);
insert into call_test values('18600000000', '2018-12-15 13:00:00', 1);
insert into call_test values('18600000000', '2018-12-16 13:00:00', 6);
insert into call_test values('18600000000', '2018-12-17 13:00:00', 8);
insert into call_test values('18600000000', '2018-12-18 13:00:00', 2);
insert into call_test values('18600000000', '2018-12-19 13:00:00', 4);
insert into call_test values('18600000000', '2018-12-20 13:00:00', 7);
insert into call_test values('18600000000', '2018-12-21 13:00:00', 1);
insert into call_test values('18600000000', '2018-12-22 13:00:00', 6);
insert into call_test values('18600000000', '2018-12-23 13:00:00', 8);
insert into call_test values('15600000000', '2018-12-10 13:00:00', 2);
insert into call_test values('15600000000', '2018-12-11 13:00:00', 4);
insert into call_test values('15600000000', '2018-12-12 13:00:00', 7);
insert into call_test values('15600000000', '2018-12-13 13:00:00', 1);
insert into call_test values('15600000000', '2018-12-14 13:00:00', 6);
insert into call_test values('15600000000', '2018-12-15 13:00:00', 8);
insert into call_test values('15600000000', '2018-12-16 13:00:00', 2);
insert into call_test values('15600000000', '2018-12-17 13:00:00', 4);
insert into call_test values('15600000000', '2018-12-18 13:00:00', 7);

SUM (note: the result depends on the ORDER BY inside OVER(), which sorts in ascending order by default)

> select pone_number,
createtime,
call_minute,
sum(call_minute) OVER(partition by pone_number order by createtime) as call_minute1, -- default frame: from the first row to the current row
sum(call_minute) OVER(partition by pone_number order by createtime rows between unbounded preceding and current row) as call_minute2, -- from the first row to the current row; same result as call_minute1
sum(call_minute) OVER(partition by pone_number) as call_minute3, -- all rows in the partition
sum(call_minute) OVER(partition by pone_number order by createtime rows between 3 preceding and current row) as call_minute4,   -- current row + 3 preceding rows
sum(call_minute) OVER(partition by pone_number order by createtime rows between 3 preceding and 1 following) as call_minute5,    -- current row + 3 preceding rows + 1 following row
sum(call_minute) OVER(partition by pone_number order by createtime rows between current row and unbounded following) as call_minute6   -- current row + all following rows
FROM call_test;
Query ID = hdfs_20181211000153_8870b5b2-ecaf-46aa-90f2-49a73e9e4ddf
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1541064601030_38864)

----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED  
----------------------------------------------------------------------------------------------
Map 1 .......... container     SUCCEEDED      1          1        0        0       0       0  
Reducer 2 ...... container     SUCCEEDED      1          1        0        0       0       0  
Reducer 3 ...... container     SUCCEEDED      1          1        0        0       0       0  
----------------------------------------------------------------------------------------------
VERTICES: 03/03  [==========================>>] 100%  ELAPSED TIME: 0.66 s     
----------------------------------------------------------------------------------------------
OK
+--------------+----------------------+--------------+---------------+---------------+---------------+---------------+---------------+---------------+--+
| pone_number  |      createtime      | call_minute  | call_minute1  | call_minute2  | call_minute3  | call_minute4  | call_minute5  | call_minute6  |
+--------------+----------------------+--------------+---------------+---------------+---------------+---------------+---------------+---------------+--+
| 15600000000  | 2018-12-14 13:00:00  | 6            | 20            | 20            | 41            | 18            | 26            | 27            |
| 15600000000  | 2018-12-13 13:00:00  | 1            | 14            | 14            | 41            | 14            | 20            | 28            |
| 15600000000  | 2018-12-12 13:00:00  | 7            | 13            | 13            | 41            | 13            | 14            | 35            |
| 15600000000  | 2018-12-11 13:00:00  | 4            | 6             | 6             | 41            | 6             | 13            | 39            |
| 15600000000  | 2018-12-10 13:00:00  | 2            | 2             | 2             | 41            | 2             | 6             | 41            |
| 15600000000  | 2018-12-18 13:00:00  | 7            | 41            | 41            | 41            | 21            | 21            | 7             |
| 15600000000  | 2018-12-17 13:00:00  | 4            | 34            | 34            | 41            | 20            | 27            | 11            |
| 15600000000  | 2018-12-16 13:00:00  | 2            | 30            | 30            | 41            | 17            | 21            | 13            |
| 15600000000  | 2018-12-15 13:00:00  | 8            | 28            | 28            | 41            | 22            | 24            | 21            |
| 18600000000  | 2018-12-23 13:00:00  | 8            | 69            | 69            | 69            | 22            | 22            | 8             |
| 18600000000  | 2018-12-22 13:00:00  | 6            | 61            | 61            | 69            | 18            | 26            | 14            |
| 18600000000  | 2018-12-21 13:00:00  | 1            | 55            | 55            | 69            | 14            | 20            | 15            |
| 18600000000  | 2018-12-20 13:00:00  | 7            | 54            | 54            | 69            | 21            | 22            | 22            |
| 18600000000  | 2018-12-19 13:00:00  | 4            | 47            | 47            | 69            | 20            | 27            | 26            |
| 18600000000  | 2018-12-18 13:00:00  | 2            | 43            | 43            | 69            | 17            | 21            | 28            |
| 18600000000  | 2018-12-17 13:00:00  | 8            | 41            | 41            | 69            | 22            | 24            | 36            |
| 18600000000  | 2018-12-16 13:00:00  | 6            | 33            | 33            | 69            | 18            | 26            | 42            |
| 18600000000  | 2018-12-15 13:00:00  | 1            | 27            | 27            | 69            | 20            | 26            | 43            |
| 18600000000  | 2018-12-14 13:00:00  | 7            | 26            | 26            | 69            | 25            | 26            | 50            |
| 18600000000  | 2018-12-13 13:00:00  | 4            | 19            | 19            | 69            | 19            | 26            | 54            |
| 18600000000  | 2018-12-11 13:00:00  | 6            | 7             | 7             | 69            | 7             | 15            | 68            |
| 18600000000  | 2018-12-10 13:00:00  | 1            | 1             | 1             | 69            | 1             | 7             | 69            |
| 18600000000  | 2018-12-12 13:00:00  | 8            | 15            | 15            | 69            | 15            | 19            | 62            |
+--------------+----------------------+--------------+---------------+---------------+---------------+---------------+---------------+---------------+--+
Time taken: 1.14 seconds, Fetched: 23 row(s)

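Explanation:
call_minute1: running total of call_minute within the partition, from the first row up to the current row. E.g. Dec 11 = Dec 10 + Dec 11, Dec 12 = Dec 10 + Dec 11 + Dec 12.
call_minute2: same as call_minute1.
call_minute3: total of all call_minute values in the partition.
call_minute4: current row plus up to 3 preceding rows. E.g. Dec 11 = Dec 10 + Dec 11, Dec 12 = Dec 10 + Dec 11 + Dec 12, Dec 13 = Dec 10 + Dec 11 + Dec 12 + Dec 13, Dec 14 = Dec 11 + Dec 12 + Dec 13 + Dec 14.
call_minute5: current row plus 3 preceding rows and 1 following row. E.g. Dec 14 = Dec 11 + Dec 12 + Dec 13 + Dec 14 + Dec 15.
call_minute6: current row plus all following rows in the partition. E.g. for 15600000000, Dec 13 = Dec 13 + Dec 14 + ... + Dec 18 (28), Dec 14 = Dec 14 + ... + Dec 18 (27).
Note that the outer query has no ORDER BY, so the rows are printed in no particular order; each window value is still computed in createtime order within its partition.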
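To make the frame arithmetic concrete, here is a minimal Python sketch (Python is only used for illustration; Hive is not involved) that recomputes the SUM columns for the 15600000000 partition. It assumes the rows are already sorted by createtime, and `frame_bounds`/`rows_sum` are hypothetical helper names that turn a ROWS frame into a slice of row indexes.

```python
# call_minute values for pone_number '15600000000', sorted by createtime
# (2018-12-10 .. 2018-12-18), copied from the INSERT statements above.
minutes = [2, 4, 7, 1, 6, 8, 2, 4, 7]


def frame_bounds(i, n, preceding, following):
    """Return [lo, hi] row indexes of a ROWS frame around row i.

    preceding/following are row counts; None means UNBOUNDED, 0 means CURRENT ROW.
    The frame is clipped to the partition edges.
    """
    lo = 0 if preceding is None else max(0, i - preceding)
    hi = n - 1 if following is None else min(n - 1, i + following)
    return lo, hi


def rows_sum(values, preceding, following):
    """sum(...) OVER (ORDER BY ... ROWS BETWEEN <preceding> AND <following>)."""
    n = len(values)
    out = []
    for i in range(n):
        lo, hi = frame_bounds(i, n, preceding, following)
        out.append(sum(values[lo:hi + 1]))
    return out


call_minute2 = rows_sum(minutes, None, 0)     # unbounded preceding .. current row
call_minute3 = [sum(minutes)] * len(minutes)  # no ORDER BY: whole partition
call_minute4 = rows_sum(minutes, 3, 0)        # 3 preceding .. current row
call_minute5 = rows_sum(minutes, 3, 1)        # 3 preceding .. 1 following
call_minute6 = rows_sum(minutes, 0, None)     # current row .. unbounded following

print(call_minute4)  # [2, 6, 13, 14, 18, 22, 17, 20, 21]
print(call_minute6)  # [41, 39, 35, 28, 27, 21, 13, 11, 7]
```

Sorting the Hive output by createtime gives the same numbers, which is a quick way to sanity-check a frame definition before running it on a cluster.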

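If ORDER BY is given but rows between is not, the frame defaults to running from the first row of the partition to the current row. (Strictly, Hive's default in this case is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so rows that tie on createtime are all included; this data has no ties, which is why call_minute1 and call_minute2 are identical.)
If ORDER BY is not given either, all values in the partition are aggregated.
The key is understanding rows between, also called the window clause:
preceding: rows before the current row
following: rows after the current row
current row: the current row
unbounded: no limit; unbounded preceding means from the first row of the partition, unbounded following means up to the last row of the partition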
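Hive documents the default frame for an ORDER BY without a window clause as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which differs from the explicit ROWS frame only when ORDER BY values tie. The Python sketch below shows that difference; the tied rows are hypothetical and are not taken from call_test.

```python
# Hypothetical (createtime, call_minute) rows, sorted by createtime,
# with two calls at the same time on 2018-12-11 (peer rows).
rows = [
    ("2018-12-10 13:00:00", 1),
    ("2018-12-11 13:00:00", 6),
    ("2018-12-11 13:00:00", 8),
    ("2018-12-12 13:00:00", 4),
]


def rows_running_sum(rows):
    """ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW: stop at this physical row."""
    out, total = [], 0
    for _, minute in rows:
        total += minute
        out.append(total)
    return out


def range_running_sum(rows):
    """RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW: include every row whose
    ORDER BY key is <= the current key, i.e. all peers of the current row."""
    return [sum(m for k, m in rows if k <= key) for key, _ in rows]


print(rows_running_sum(rows))   # [1, 7, 15, 19]
print(range_running_sum(rows))  # [1, 15, 15, 19]
```

With ties, the two tied rows get different ROWS totals but the same RANGE total, so spelling out `rows between ...` is the safer choice when a strict running total is wanted.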

avg, min and max are used exactly the same way as sum.
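One way to see why avg, min and max work the same way: the frame only decides which rows are visible, and the aggregate is applied to that slice. A minimal Python sketch, where `rows_agg` is a hypothetical helper and the data is the 18600000000 partition sorted by createtime:

```python
# call_minute values for pone_number '18600000000', sorted by createtime
# (2018-12-10 .. 2018-12-23), copied from the INSERT statements.
minutes = [1, 6, 8, 4, 7, 1, 6, 8, 2, 4, 7, 1, 6, 8]


def rows_agg(values, preceding, following, agg):
    """agg(...) OVER (ORDER BY ... ROWS BETWEEN <preceding> AND <following>).

    None means UNBOUNDED, 0 means CURRENT ROW; agg receives the frame as a list.
    """
    n = len(values)
    out = []
    for i in range(n):
        lo = 0 if preceding is None else max(0, i - preceding)
        hi = n - 1 if following is None else min(n - 1, i + following)
        out.append(agg(values[lo:hi + 1]))
    return out


# Same frame (3 preceding .. current row); only the aggregate changes.
sums = rows_agg(minutes, 3, 0, sum)
mins = rows_agg(minutes, 3, 0, min)
maxs = rows_agg(minutes, 3, 0, max)

print(mins)  # [1, 1, 1, 1, 4, 1, 1, 1, 1, 2, 2, 1, 1, 1]
print(maxs)  # [1, 6, 8, 8, 8, 8, 7, 8, 8, 8, 8, 7, 7, 8]
```

These lists match the call_minute4 columns for 18600000000 in the SUM, MIN and MAX outputs once those rows are sorted by createtime.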

AVG

> select pone_number,
createtime,
call_minute,
round(avg(call_minute) OVER(partition by pone_number order by createtime), 2) as call_minute1, -- default frame: from the first row to the current row
round(avg(call_minute) OVER(partition by pone_number order by createtime rows between unbounded preceding and current row), 2) as call_minute2, -- from the first row to the current row; same result as call_minute1
round(avg(call_minute) OVER(partition by pone_number), 2) as call_minute3, -- all rows in the partition
round(avg(call_minute) OVER(partition by pone_number order by createtime rows between 3 preceding and current row), 2) as call_minute4,   -- current row + 3 preceding rows
round(avg(call_minute) OVER(partition by pone_number order by createtime rows between 3 preceding and 1 following), 2) as call_minute5,    -- current row + 3 preceding rows + 1 following row
round(avg(call_minute) OVER(partition by pone_number order by createtime rows between current row and unbounded following), 2) as call_minute6   -- current row + all following rows
FROM call_test; 
Query ID = hdfs_20181211000203_53ab6fb6-628c-4ac8-81aa-244c73b701f0
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1541064601030_38864)

----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED  
----------------------------------------------------------------------------------------------
Map 1 .......... container     SUCCEEDED      1          1        0        0       0       0  
Reducer 2 ...... container     SUCCEEDED      1          1        0        0       0       0  
Reducer 3 ...... container     SUCCEEDED      1          1        0        0       0       0  
----------------------------------------------------------------------------------------------
VERTICES: 03/03  [==========================>>] 100%  ELAPSED TIME: 4.04 s     
----------------------------------------------------------------------------------------------
OK
+--------------+----------------------+--------------+---------------+---------------+---------------+---------------+---------------+---------------+--+
| pone_number  |      createtime      | call_minute  | call_minute1  | call_minute2  | call_minute3  | call_minute4  | call_minute5  | call_minute6  |
+--------------+----------------------+--------------+---------------+---------------+---------------+---------------+---------------+---------------+--+
| 15600000000  | 2018-12-14 13:00:00  | 6            | 4.0           | 4.0           | 4.56          | 4.5           | 5.2           | 5.4           |
| 15600000000  | 2018-12-13 13:00:00  | 1            | 3.5           | 3.5           | 4.56          | 3.5           | 4.0           | 4.67          |
| 15600000000  | 2018-12-12 13:00:00  | 7            | 4.33          | 4.33          | 4.56          | 4.33          | 3.5           | 5.0           |
| 15600000000  | 2018-12-11 13:00:00  | 4            | 3.0           | 3.0           | 4.56          | 3.0           | 4.33          | 4.88          |
| 15600000000  | 2018-12-10 13:00:00  | 2            | 2.0           | 2.0           | 4.56          | 2.0           | 3.0           | 4.56          |
| 15600000000  | 2018-12-18 13:00:00  | 7            | 4.56          | 4.56          | 4.56          | 5.25          | 5.25          | 7.0           |
| 15600000000  | 2018-12-17 13:00:00  | 4            | 4.25          | 4.25          | 4.56          | 5.0           | 5.4           | 5.5           |
| 15600000000  | 2018-12-16 13:00:00  | 2            | 4.29          | 4.29          | 4.56          | 4.25          | 4.2           | 4.33          |
| 15600000000  | 2018-12-15 13:00:00  | 8            | 4.67          | 4.67          | 4.56          | 5.5           | 4.8           | 5.25          |
| 18600000000  | 2018-12-23 13:00:00  | 8            | 4.93          | 4.93          | 4.93          | 5.5           | 5.5           | 8.0           |
| 18600000000  | 2018-12-22 13:00:00  | 6            | 4.69          | 4.69          | 4.93          | 4.5           | 5.2           | 7.0           |
| 18600000000  | 2018-12-21 13:00:00  | 1            | 4.58          | 4.58          | 4.93          | 3.5           | 4.0           | 5.0           |
| 18600000000  | 2018-12-20 13:00:00  | 7            | 4.91          | 4.91          | 4.93          | 5.25          | 4.4           | 5.5           |
| 18600000000  | 2018-12-19 13:00:00  | 4            | 4.7           | 4.7           | 4.93          | 5.0           | 5.4           | 5.2           |
| 18600000000  | 2018-12-18 13:00:00  | 2            | 4.78          | 4.78          | 4.93          | 4.25          | 4.2           | 4.67          |
| 18600000000  | 2018-12-17 13:00:00  | 8            | 5.13          | 5.13          | 4.93          | 5.5           | 4.8           | 5.14          |
| 18600000000  | 2018-12-16 13:00:00  | 6            | 4.71          | 4.71          | 4.93          | 4.5           | 5.2           | 5.25          |
| 18600000000  | 2018-12-15 13:00:00  | 1            | 4.5           | 4.5           | 4.93          | 5.0           | 5.2           | 4.78          |
| 18600000000  | 2018-12-14 13:00:00  | 7            | 5.2           | 5.2           | 4.93          | 6.25          | 5.2           | 5.0           |
| 18600000000  | 2018-12-13 13:00:00  | 4            | 4.75          | 4.75          | 4.93          | 4.75          | 5.2           | 4.91          |
| 18600000000  | 2018-12-11 13:00:00  | 6            | 3.5           | 3.5           | 4.93          | 3.5           | 5.0           | 5.23          |
| 18600000000  | 2018-12-10 13:00:00  | 1            | 1.0           | 1.0           | 4.93          | 1.0           | 3.5           | 4.93          |
| 18600000000  | 2018-12-12 13:00:00  | 8            | 5.0           | 5.0           | 4.93          | 5.0           | 4.75          | 5.17          |
+--------------+----------------------+--------------+---------------+---------------+---------------+---------------+---------------+---------------+--+
Time taken: 4.55 seconds, Fetched: 23 row(s)
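When checking these averages by hand, note the rounding: for 18600000000 on Dec 17 the running average is 41 / 8 = 5.125 and the output shows 5.13, i.e. the tie was rounded up. Python's built-in `round` rounds ties to even (`round(41 / 8, 2)` gives 5.12), so this sketch uses `decimal` with `ROUND_HALF_UP` to reproduce call_minute1. The helper name `running_avg` is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

# call_minute values for pone_number '18600000000', sorted by createtime.
minutes = [1, 6, 8, 4, 7, 1, 6, 8, 2, 4, 7, 1, 6, 8]


def running_avg(values):
    """round(avg(...) OVER (ORDER BY ...), 2), rounding ties half-up."""
    out, total = [], 0
    for count, v in enumerate(values, start=1):
        total += v
        avg = Decimal(total) / Decimal(count)  # exact enough: 28-digit context
        out.append(float(avg.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)))
    return out


print(running_avg(minutes))
# [1.0, 3.5, 5.0, 4.75, 5.2, 4.5, 4.71, 5.13, 4.78, 4.7, 4.91, 4.58, 4.69, 4.93]
```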

MIN

> select pone_number,
createtime,
call_minute,
min(call_minute) OVER(partition by pone_number order by createtime) as call_minute1, -- default frame: from the first row to the current row
min(call_minute) OVER(partition by pone_number order by createtime rows between unbounded preceding and current row) as call_minute2, -- from the first row to the current row; same result as call_minute1
min(call_minute) OVER(partition by pone_number) as call_minute3, -- all rows in the partition
min(call_minute) OVER(partition by pone_number order by createtime rows between 3 preceding and current row) as call_minute4,   -- current row + 3 preceding rows
min(call_minute) OVER(partition by pone_number order by createtime rows between 3 preceding and 1 following) as call_minute5,    -- current row + 3 preceding rows + 1 following row
min(call_minute) OVER(partition by pone_number order by createtime rows between current row and unbounded following) as call_minute6   -- current row + all following rows
FROM call_test;
Query ID = hdfs_20181211000210_2e8b0633-0e95-4ace-a964-79ed946da362
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1541064601030_38864)

----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED  
----------------------------------------------------------------------------------------------
Map 1 .......... container     SUCCEEDED      1          1        0        0       0       0  
Reducer 2 ...... container     SUCCEEDED      1          1        0        0       0       0  
Reducer 3 ...... container     SUCCEEDED      1          1        0        0       0       0  
----------------------------------------------------------------------------------------------
VERTICES: 03/03  [==========================>>] 100%  ELAPSED TIME: 0.31 s     
----------------------------------------------------------------------------------------------
OK
+--------------+----------------------+--------------+---------------+---------------+---------------+---------------+---------------+---------------+--+
| pone_number  |      createtime      | call_minute  | call_minute1  | call_minute2  | call_minute3  | call_minute4  | call_minute5  | call_minute6  |
+--------------+----------------------+--------------+---------------+---------------+---------------+---------------+---------------+---------------+--+
| 15600000000  | 2018-12-14 13:00:00  | 6            | 1             | 1             | 1             | 1             | 1             | 2             |
| 15600000000  | 2018-12-13 13:00:00  | 1            | 1             | 1             | 1             | 1             | 1             | 1             |
| 15600000000  | 2018-12-12 13:00:00  | 7            | 2             | 2             | 1             | 2             | 1             | 1             |
| 15600000000  | 2018-12-11 13:00:00  | 4            | 2             | 2             | 1             | 2             | 2             | 1             |
| 15600000000  | 2018-12-10 13:00:00  | 2            | 2             | 2             | 1             | 2             | 2             | 1             |
| 15600000000  | 2018-12-18 13:00:00  | 7            | 1             | 1             | 1             | 2             | 2             | 7             |
| 15600000000  | 2018-12-17 13:00:00  | 4            | 1             | 1             | 1             | 2             | 2             | 4             |
| 15600000000  | 2018-12-16 13:00:00  | 2            | 1             | 1             | 1             | 1             | 1             | 2             |
| 15600000000  | 2018-12-15 13:00:00  | 8            | 1             | 1             | 1             | 1             | 1             | 2             |
| 18600000000  | 2018-12-23 13:00:00  | 8            | 1             | 1             | 1             | 1             | 1             | 8             |
| 18600000000  | 2018-12-22 13:00:00  | 6            | 1             | 1             | 1             | 1             | 1             | 6             |
| 18600000000  | 2018-12-21 13:00:00  | 1            | 1             | 1             | 1             | 1             | 1             | 1             |
| 18600000000  | 2018-12-20 13:00:00  | 7            | 1             | 1             | 1             | 2             | 1             | 1             |
| 18600000000  | 2018-12-19 13:00:00  | 4            | 1             | 1             | 1             | 2             | 2             | 1             |
| 18600000000  | 2018-12-18 13:00:00  | 2            | 1             | 1             | 1             | 1             | 1             | 1             |
| 18600000000  | 2018-12-17 13:00:00  | 8            | 1             | 1             | 1             | 1             | 1             | 1             |
| 18600000000  | 2018-12-16 13:00:00  | 6            | 1             | 1             | 1             | 1             | 1             | 1             |
| 18600000000  | 2018-12-15 13:00:00  | 1            | 1             | 1             | 1             | 1             | 1             | 1             |
| 18600000000  | 2018-12-14 13:00:00  | 7            | 1             | 1             | 1             | 4             | 1             | 1             |
| 18600000000  | 2018-12-13 13:00:00  | 4            | 1             | 1             | 1             | 1             | 1             | 1             |
| 18600000000  | 2018-12-11 13:00:00  | 6            | 1             | 1             | 1             | 1             | 1             | 1             |
| 18600000000  | 2018-12-10 13:00:00  | 1            | 1             | 1             | 1             | 1             | 1             | 1             |
| 18600000000  | 2018-12-12 13:00:00  | 8            | 1             | 1             | 1             | 1             | 1             | 1             |
+--------------+----------------------+--------------+---------------+---------------+---------------+---------------+---------------+---------------+--+
Time taken: 0.823 seconds, Fetched: 23 row(s)
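For a fixed-size frame like `rows between 3 preceding and current row`, a min can be computed in one pass with a monotonic deque instead of rescanning every frame. This is a general technique shown as a Python sketch; it is not necessarily how Hive evaluates min internally, and `sliding_min` is a hypothetical name.

```python
from collections import deque


def sliding_min(values, preceding):
    """min(...) OVER (ORDER BY ... ROWS BETWEEN <preceding> PRECEDING AND CURRENT ROW).

    The deque holds row indexes whose values increase from front to back,
    so the front is always the minimum of the current frame.
    """
    window = deque()
    out = []
    for i, v in enumerate(values):
        # A value >= v can never be the frame minimum again once v has arrived.
        while window and values[window[-1]] >= v:
            window.pop()
        window.append(i)
        # Drop indexes that have slid out of the frame.
        while window[0] < i - preceding:
            window.popleft()
        out.append(values[window[0]])
    return out


# pone_number '18600000000', sorted by createtime
print(sliding_min([1, 6, 8, 4, 7, 1, 6, 8, 2, 4, 7, 1, 6, 8], 3))
# [1, 1, 1, 1, 4, 1, 1, 1, 1, 2, 2, 1, 1, 1]
```

The result matches call_minute4 for 18600000000 above, including the 4 on Dec 14, where the frame is Dec 11 to Dec 14 (6, 8, 4, 7).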

MAX

> select pone_number,
createtime,
call_minute,
max(call_minute) OVER(partition by pone_number order by createtime) as call_minute1, -- default frame: from the first row to the current row
max(call_minute) OVER(partition by pone_number order by createtime rows between unbounded preceding and current row) as call_minute2, -- from the first row to the current row; same result as call_minute1
max(call_minute) OVER(partition by pone_number) as call_minute3, -- all rows in the partition
max(call_minute) OVER(partition by pone_number order by createtime rows between 3 preceding and current row) as call_minute4,   -- current row + 3 preceding rows
max(call_minute) OVER(partition by pone_number order by createtime rows between 3 preceding and 1 following) as call_minute5,    -- current row + 3 preceding rows + 1 following row
max(call_minute) OVER(partition by pone_number order by createtime rows between current row and unbounded following) as call_minute6   -- current row + all following rows
FROM call_test;
Query ID = hdfs_20181211000216_bdde124f-79b2-4d0f-b3c2-a8b7339a02a6
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1541064601030_38864)

----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED  
----------------------------------------------------------------------------------------------
Map 1 .......... container     SUCCEEDED      1          1        0        0       0       0  
Reducer 2 ...... container     SUCCEEDED      1          1        0        0       0       0  
Reducer 3 ...... container     SUCCEEDED      1          1        0        0       0       0  
----------------------------------------------------------------------------------------------
VERTICES: 03/03  [==========================>>] 100%  ELAPSED TIME: 0.34 s     
----------------------------------------------------------------------------------------------
OK
+--------------+----------------------+--------------+---------------+---------------+---------------+---------------+---------------+---------------+--+
| pone_number  |      createtime      | call_minute  | call_minute1  | call_minute2  | call_minute3  | call_minute4  | call_minute5  | call_minute6  |
+--------------+----------------------+--------------+---------------+---------------+---------------+---------------+---------------+---------------+--+
| 15600000000  | 2018-12-14 13:00:00  | 6            | 7             | 7             | 8             | 7             | 8             | 8             |
| 15600000000  | 2018-12-13 13:00:00  | 1            | 7             | 7             | 8             | 7             | 7             | 8             |
| 15600000000  | 2018-12-12 13:00:00  | 7            | 7             | 7             | 8             | 7             | 7             | 8             |
| 15600000000  | 2018-12-11 13:00:00  | 4            | 4             | 4             | 8             | 4             | 7             | 8             |
| 15600000000  | 2018-12-10 13:00:00  | 2            | 2             | 2             | 8             | 2             | 4             | 8             |
| 15600000000  | 2018-12-18 13:00:00  | 7            | 8             | 8             | 8             | 8             | 8             | 7             |
| 15600000000  | 2018-12-17 13:00:00  | 4            | 8             | 8             | 8             | 8             | 8             | 7             |
| 15600000000  | 2018-12-16 13:00:00  | 2            | 8             | 8             | 8             | 8             | 8             | 7             |
| 15600000000  | 2018-12-15 13:00:00  | 8            | 8             | 8             | 8             | 8             | 8             | 8             |
| 18600000000  | 2018-12-23 13:00:00  | 8            | 8             | 8             | 8             | 8             | 8             | 8             |
| 18600000000  | 2018-12-22 13:00:00  | 6            | 8             | 8             | 8             | 7             | 8             | 8             |
| 18600000000  | 2018-12-21 13:00:00  | 1            | 8             | 8             | 8             | 7             | 7             | 8             |
| 18600000000  | 2018-12-20 13:00:00  | 7            | 8             | 8             | 8             | 8             | 8             | 8             |
| 18600000000  | 2018-12-19 13:00:00  | 4            | 8             | 8             | 8             | 8             | 8             | 8             |
| 18600000000  | 2018-12-18 13:00:00  | 2            | 8             | 8             | 8             | 8             | 8             | 8             |
| 18600000000  | 2018-12-17 13:00:00  | 8            | 8             | 8             | 8             | 8             | 8             | 8             |
| 18600000000  | 2018-12-16 13:00:00  | 6            | 8             | 8             | 8             | 7             | 8             | 8             |
| 18600000000  | 2018-12-15 13:00:00  | 1            | 8             | 8             | 8             | 8             | 8             | 8             |
| 18600000000  | 2018-12-14 13:00:00  | 7            | 8             | 8             | 8             | 8             | 8             | 8             |
| 18600000000  | 2018-12-13 13:00:00  | 4            | 8             | 8             | 8             | 8             | 8             | 8             |
| 18600000000  | 2018-12-11 13:00:00  | 6            | 6             | 6             | 8             | 6             | 8             | 8             |
| 18600000000  | 2018-12-10 13:00:00  | 1            | 1             | 1             | 8             | 1             | 6             | 8             |
| 18600000000  | 2018-12-12 13:00:00  | 8            | 8             | 8             | 8             | 8             | 8             | 8             |
+--------------+----------------------+--------------+---------------+---------------+---------------+---------------+---------------+---------------+--+
Time taken: 0.832 seconds, Fetched: 23 row(s)
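The `current row and unbounded following` frame (call_minute6) is a suffix aggregate, so it can be computed by scanning the sorted partition backwards once. A minimal Python sketch with a hypothetical `suffix_max` helper:

```python
def suffix_max(values):
    """max(...) OVER (ORDER BY ... ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING).

    Walk from the last row to the first, carrying the largest value seen so far.
    """
    out = [None] * len(values)
    best = None
    for i in range(len(values) - 1, -1, -1):
        best = values[i] if best is None else max(best, values[i])
        out[i] = best
    return out


# pone_number '15600000000', sorted by createtime (Dec 10 .. Dec 18)
print(suffix_max([2, 4, 7, 1, 6, 8, 2, 4, 7]))
# [8, 8, 8, 8, 8, 8, 7, 7, 7]
```

This matches call_minute6 for 15600000000 above: once Dec 15 (value 8) is behind the current row, the maximum of the remaining rows drops to 7.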

I'll keep writing up Hive's other window functions in later posts.
