hive窗口函数

建立一张测试表

CREATE TABLE test (cookieid STRING, create_time STRING, pv INT) 
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';

查看表数据

select * from test;

+----------------+-------------------+----------+--+
| test.cookieid  | test.create_time  | test.pv  |
+----------------+-------------------+----------+--+
| cookieid1      | 2019-01-01        | 1        |
| cookieid1      | 2019-01-02        | 2        |
| cookieid1      | 2019-01-03        | 3        |
| cookieid1      | 2019-01-03        | 3        |
| cookieid1      | 2019-01-04        | 4        |
| cookieid2      | 2019-01-01        | 1        |
| cookieid2      | 2019-01-02        | 2        |
| cookieid2      | 2019-01-03        | 3        |
| cookieid2      | 2019-01-03        | 3        |
| cookieid2      | 2019-01-04        | 4        |
+----------------+-------------------+----------+--+
SUM(), MIN(),MAX(),AVG()等聚合函数
  • SUM(col) 对窗口期的指定列进行求和
  • MIN(col) 取窗口期的指令列的最小值
  • MAX(col) 取窗口期的指定列的最大值
  • AVG(col) 取窗口期的指定列的平均值

查询示例

select *,
SUM(pv) over (partition by cookieid order by create_time ROWS between 2 preceding and current row),
SUM(pv) over (partition by cookieid order by create_time ROWS between 2 preceding and 1 following),
SUM(pv) over (partition by cookieid order by create_time),
MIN(pv) over (partition by cookieid order by create_time),
MAX(pv) over (partition by cookieid order by create_time),
AVG(pv) over (partition by cookieid order by create_time)
 from test;

+----------------+-------------------+----------+---------+---------+---------+---------+---------+---------+--+
| test.cookieid  | test.create_time  | test.pv  | _wcol0  | _wcol1  | _wcol2  | _wcol3  | _wcol4  | _wcol5  |
+----------------+-------------------+----------+---------+---------+---------+---------+---------+---------+--+
| cookieid1      | 2019-01-01        | 1        | 1       | 3       | 1       | 1       | 1       | 1.0     |
| cookieid1      | 2019-01-02        | 2        | 3       | 6       | 3       | 1       | 2       | 1.5     |
| cookieid1      | 2019-01-03        | 3        | 6       | 9       | 9       | 1       | 3       | 2.25    |
| cookieid1      | 2019-01-03        | 3        | 8       | 12      | 9       | 1       | 3       | 2.25    |
| cookieid1      | 2019-01-04        | 4        | 10      | 10      | 13      | 1       | 4       | 2.6     |
| cookieid2      | 2019-01-01        | 1        | 1       | 3       | 1       | 1       | 1       | 1.0     |
| cookieid2      | 2019-01-02        | 2        | 3       | 6       | 3       | 1       | 2       | 1.5     |
| cookieid2      | 2019-01-03        | 3        | 6       | 9       | 9       | 1       | 3       | 2.25    |
| cookieid2      | 2019-01-03        | 3        | 8       | 12      | 9       | 1       | 3       | 2.25    |
| cookieid2      | 2019-01-04        | 4        | 10      | 10      | 13      | 1       | 4       | 2.6     |
+----------------+-------------------+----------+---------+---------+---------+---------+---------+---------+--+
NTILE(), ROW_NUMBER(), RANK(), DENSE_RANK()

NTILE(num) 根据over里的partition by 来切分分区,并将分区切分为num类
ROW_NUMBER() 根据over里的partition by 来切分分区,并为每个分区数据添加行号[不对重复数据做特殊处理]
DENSE_RANK() 根据over里的partition by 来切分分区,并为每个分区数据添加行号[重复数据取相同行号,但保证行号连续]
RANK()根据over里的partition by 来切分分区,并为每个分区数据添加行号[重复数据取相同行号,如果有n行数据重复,则后一行数据行号为 x+n ]

查询示例

select *,
NTILE(2) over (partition by cookieid order by create_time),
ROW_NUMBER() over (partition by cookieid order by create_time),
RANK() over (partition by cookieid order by create_time),
DENSE_RANK() over (partition by cookieid order by create_time)
from test;

+----------------+-------------------+----------+---------+---------+---------+---------+--+
| test.cookieid  | test.create_time  | test.pv  | _wcol0  | _wcol1  | _wcol2  | _wcol3  |
+----------------+-------------------+----------+---------+---------+---------+---------+--+
| cookieid1      | 2019-01-01        | 1        | 1       | 1       | 1       | 1       |
| cookieid1      | 2019-01-02        | 2        | 1       | 2       | 2       | 2       |
| cookieid1      | 2019-01-03        | 3        | 1       | 3       | 3       | 3       |
| cookieid1      | 2019-01-03        | 3        | 2       | 4       | 3       | 3       |
| cookieid1      | 2019-01-04        | 4        | 2       | 5       | 5       | 4       |
| cookieid2      | 2019-01-01        | 1        | 1       | 1       | 1       | 1       |
| cookieid2      | 2019-01-02        | 2        | 1       | 2       | 2       | 2       |
| cookieid2      | 2019-01-03        | 3        | 1       | 3       | 3       | 3       |
| cookieid2      | 2019-01-03        | 3        | 2       | 4       | 3       | 3       |
| cookieid2      | 2019-01-04        | 4        | 2       | 5       | 5       | 4       |
+----------------+-------------------+----------+---------+---------+---------+---------+--+
LAG, LEAD, FIRST_VALUE, LAST_VALUE

LAG(col,num) 对某一行进行向后错行
LEAD(col,num)对某一行进行向前错行
FIRST_VALUE(col) 取窗口期内第一次看到该列的值
LAST_VALUE(col) 取窗口期内最后一次看到该列的值

查询示例

select *,
LAG(pv,2) over (partition by cookieid order by create_time ),
LEAD(pv,2) over (partition by cookieid order by create_time),
FIRST_VALUE(pv) over (partition by cookieid order by create_time),
FIRST_VALUE(pv) over (partition by cookieid order by create_time rows between 1 preceding and current row),
LAST_VALUE(pv) over (partition by cookieid order by create_time)
from test;

+----------------+-------------------+----------+---------+---------+---------+---------+---------+--+
| test.cookieid  | test.create_time  | test.pv  | _wcol0  | _wcol1  | _wcol2  | _wcol3  | _wcol4  |
+----------------+-------------------+----------+---------+---------+---------+---------+---------+--+
| cookieid1      | 2019-01-01        | 1        | NULL    | 3       | 1       | 1       | 1       |
| cookieid1      | 2019-01-02        | 2        | NULL    | 3       | 1       | 1       | 2       |
| cookieid1      | 2019-01-03        | 3        | 1       | 4       | 1       | 2       | 3       |
| cookieid1      | 2019-01-03        | 3        | 2       | NULL    | 1       | 3       | 3       |
| cookieid1      | 2019-01-04        | 4        | 3       | NULL    | 1       | 3       | 4       |
| cookieid2      | 2019-01-01        | 1        | NULL    | 3       | 1       | 1       | 1       |
| cookieid2      | 2019-01-02        | 2        | NULL    | 3       | 1       | 1       | 2       |
| cookieid2      | 2019-01-03        | 3        | 1       | 4       | 1       | 2       | 3       |
| cookieid2      | 2019-01-03        | 3        | 2       | NULL    | 1       | 3       | 3       |
| cookieid2      | 2019-01-04        | 4        | 3       | NULL    | 1       | 3       | 4       |
+----------------+-------------------+----------+---------+---------+---------+---------+---------+--+

你可能感兴趣的:(hive)