Create a test table:
CREATE TABLE test (cookieid STRING, create_time STRING, pv INT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';
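The notes do not show how the rows got into the table. The following is a minimal sketch that inserts the same ten rows returned by the SELECT that follows. It assumes Hive 0.14 or later, where INSERT ... VALUES is supported. Because the table is comma-delimited text, loading a CSV file with LOAD DATA would work as well.

```sql
-- Hypothetical loading step: the original does not say how the data was loaded.
-- Assumes INSERT ... VALUES support (Hive 0.14+).
INSERT INTO TABLE test VALUES
  ('cookieid1', '2019-01-01', 1),
  ('cookieid1', '2019-01-02', 2),
  ('cookieid1', '2019-01-03', 3),
  ('cookieid1', '2019-01-03', 3),
  ('cookieid1', '2019-01-04', 4),
  ('cookieid2', '2019-01-01', 1),
  ('cookieid2', '2019-01-02', 2),
  ('cookieid2', '2019-01-03', 3),
  ('cookieid2', '2019-01-03', 3),
  ('cookieid2', '2019-01-04', 4);
```

Each cookieid deliberately has two rows on 2019-01-03. These tied rows are what make the window functions below behave differently from one another.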
View the table data:
SELECT * FROM test;
+---------------+------------------+---------+
| test.cookieid | test.create_time | test.pv |
+---------------+------------------+---------+
| cookieid1     | 2019-01-01       | 1       |
| cookieid1     | 2019-01-02       | 2       |
| cookieid1     | 2019-01-03       | 3       |
| cookieid1     | 2019-01-03       | 3       |
| cookieid1     | 2019-01-04       | 4       |
| cookieid2     | 2019-01-01       | 1       |
| cookieid2     | 2019-01-02       | 2       |
| cookieid2     | 2019-01-03       | 3       |
| cookieid2     | 2019-01-03       | 3       |
| cookieid2     | 2019-01-04       | 4       |
+---------------+------------------+---------+
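Example query (window expressions without an alias appear in the output as _wcol0, _wcol1, ..., numbered in SELECT order):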
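SELECT *,
  -- _wcol0: the current row plus the 2 rows before it
  SUM(pv) OVER (PARTITION BY cookieid ORDER BY create_time ROWS BETWEEN 2 PRECEDING AND CURRENT ROW),
  -- _wcol1: the 2 rows before, the current row, and the 1 row after
  SUM(pv) OVER (PARTITION BY cookieid ORDER BY create_time ROWS BETWEEN 2 PRECEDING AND 1 FOLLOWING),
  -- _wcol2 to _wcol5: no frame clause, so the frame defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  SUM(pv) OVER (PARTITION BY cookieid ORDER BY create_time),
  MIN(pv) OVER (PARTITION BY cookieid ORDER BY create_time),
  MAX(pv) OVER (PARTITION BY cookieid ORDER BY create_time),
  AVG(pv) OVER (PARTITION BY cookieid ORDER BY create_time)
FROM test;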
+---------------+------------------+---------+--------+--------+--------+--------+--------+--------+
| test.cookieid | test.create_time | test.pv | _wcol0 | _wcol1 | _wcol2 | _wcol3 | _wcol4 | _wcol5 |
+---------------+------------------+---------+--------+--------+--------+--------+--------+--------+
| cookieid1     | 2019-01-01       | 1       | 1      | 3      | 1      | 1      | 1      | 1.0    |
| cookieid1     | 2019-01-02       | 2       | 3      | 6      | 3      | 1      | 2      | 1.5    |
| cookieid1     | 2019-01-03       | 3       | 6      | 9      | 9      | 1      | 3      | 2.25   |
| cookieid1     | 2019-01-03       | 3       | 8      | 12     | 9      | 1      | 3      | 2.25   |
| cookieid1     | 2019-01-04       | 4       | 10     | 10     | 13     | 1      | 4      | 2.6    |
| cookieid2     | 2019-01-01       | 1       | 1      | 3      | 1      | 1      | 1      | 1.0    |
| cookieid2     | 2019-01-02       | 2       | 3      | 6      | 3      | 1      | 2      | 1.5    |
| cookieid2     | 2019-01-03       | 3       | 6      | 9      | 9      | 1      | 3      | 2.25   |
| cookieid2     | 2019-01-03       | 3       | 8      | 12     | 9      | 1      | 3      | 2.25   |
| cookieid2     | 2019-01-04       | 4       | 10     | 10     | 13     | 1      | 4      | 2.6    |
+---------------+------------------+---------+--------+--------+--------+--------+--------+--------+
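In _wcol2, both 2019-01-03 rows get the same value (9). This happens because, when ORDER BY is given without a frame clause, Hive uses RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. A RANGE frame includes every row that ties with the current row on create_time. The following is a minimal sketch that writes that default out explicitly and contrasts it with a ROWS frame, which stops at the current physical row. The aliases sum_range and sum_rows are illustrative, and the sketch assumes your Hive version accepts an explicit RANGE frame on this STRING sort key, as it evidently does for the implicit one.

```sql
SELECT cookieid, create_time, pv,
  -- Same frame as _wcol2 above, written out explicitly: tied rows are included
  SUM(pv) OVER (PARTITION BY cookieid ORDER BY create_time
                RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS sum_range,
  -- Running total over physical rows: a tied row does not see the tie after it
  SUM(pv) OVER (PARTITION BY cookieid ORDER BY create_time
                ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS sum_rows
FROM test;
```

For each cookieid, sum_range should be 1, 3, 9, 9, 13 (matching _wcol2) and sum_rows should be 1, 3, 6, 9, 13. The two tied rows have the same pv, so it does not matter which one Hive places first.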
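NTILE(num): splits each partition (defined by PARTITION BY in the OVER clause) into num buckets that are as equal in size as possible, following the ORDER BY order, and returns each row's bucket number. When the rows do not divide evenly, the earlier buckets get the extra rows.
ROW_NUMBER(): numbers the rows of each partition 1, 2, 3, ... in ORDER BY order. Tied rows still get distinct numbers.
DENSE_RANK(): ranks the rows of each partition. Tied rows share a rank, and the ranks stay consecutive with no gaps.
RANK(): ranks the rows of each partition. Tied rows share a rank, and if n rows tie at rank x, the next row gets rank x+n, which leaves a gap.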
Example query:
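SELECT *,
  -- _wcol0: bucket number when each partition is split into 2 buckets
  NTILE(2) OVER (PARTITION BY cookieid ORDER BY create_time),
  -- _wcol1: sequential row number; tied rows get distinct numbers
  ROW_NUMBER() OVER (PARTITION BY cookieid ORDER BY create_time),
  -- _wcol2: rank with a gap after ties
  RANK() OVER (PARTITION BY cookieid ORDER BY create_time),
  -- _wcol3: rank without gaps
  DENSE_RANK() OVER (PARTITION BY cookieid ORDER BY create_time)
FROM test;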
+---------------+------------------+---------+--------+--------+--------+--------+
| test.cookieid | test.create_time | test.pv | _wcol0 | _wcol1 | _wcol2 | _wcol3 |
+---------------+------------------+---------+--------+--------+--------+--------+
| cookieid1     | 2019-01-01       | 1       | 1      | 1      | 1      | 1      |
| cookieid1     | 2019-01-02       | 2       | 1      | 2      | 2      | 2      |
| cookieid1     | 2019-01-03       | 3       | 1      | 3      | 3      | 3      |
| cookieid1     | 2019-01-03       | 3       | 2      | 4      | 3      | 3      |
| cookieid1     | 2019-01-04       | 4       | 2      | 5      | 5      | 4      |
| cookieid2     | 2019-01-01       | 1       | 1      | 1      | 1      | 1      |
| cookieid2     | 2019-01-02       | 2       | 1      | 2      | 2      | 2      |
| cookieid2     | 2019-01-03       | 3       | 1      | 3      | 3      | 3      |
| cookieid2     | 2019-01-03       | 3       | 2      | 4      | 3      | 3      |
| cookieid2     | 2019-01-04       | 4       | 2      | 5      | 5      | 4      |
+---------------+------------------+---------+--------+--------+--------+--------+
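A common use of ROW_NUMBER() is to keep one row (or the top N rows) per group. The following is a minimal sketch that keeps each cookieid's latest row by numbering rows newest-first and filtering on the number. The aliases rn and t are illustrative.

```sql
SELECT cookieid, create_time, pv
FROM (
  SELECT cookieid, create_time, pv,
         -- Number rows within each cookieid, newest create_time first
         ROW_NUMBER() OVER (PARTITION BY cookieid ORDER BY create_time DESC) AS rn
  FROM test
) t
WHERE rn = 1;  -- keep only the newest row per cookieid
```

With the test data, this should return one row per cookieid: 2019-01-04 with pv 4. Changing the filter to rn <= N keeps the N newest rows. ROW_NUMBER() breaks ties arbitrarily, so if several rows could share the latest create_time, either add a tiebreaker column to the ORDER BY or switch to RANK(), which keeps every tied row.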
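LAG(col, num): returns the value of col from the row num rows before the current row in the same partition, or NULL if no such row exists.
LEAD(col, num): returns the value of col from the row num rows after the current row in the same partition, or NULL if no such row exists.
FIRST_VALUE(col): returns the value of col in the first row of the window frame.
LAST_VALUE(col): returns the value of col in the last row of the window frame. With ORDER BY and no explicit frame, the frame ends at the current row (plus any rows tied with it), so the result is usually the current row's own value, not the last value of the partition.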
Example query:
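SELECT *,
  -- _wcol0: pv from 2 rows earlier
  LAG(pv, 2) OVER (PARTITION BY cookieid ORDER BY create_time),
  -- _wcol1: pv from 2 rows later
  LEAD(pv, 2) OVER (PARTITION BY cookieid ORDER BY create_time),
  -- _wcol2: first pv in the default frame (partition start to current row)
  FIRST_VALUE(pv) OVER (PARTITION BY cookieid ORDER BY create_time),
  -- _wcol3: first pv in a 2-row frame (previous row and current row)
  FIRST_VALUE(pv) OVER (PARTITION BY cookieid ORDER BY create_time ROWS BETWEEN 1 PRECEDING AND CURRENT ROW),
  -- _wcol4: last pv in the default frame, i.e. the current row or its last tied peer
  LAST_VALUE(pv) OVER (PARTITION BY cookieid ORDER BY create_time)
FROM test;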
+---------------+------------------+---------+--------+--------+--------+--------+--------+
| test.cookieid | test.create_time | test.pv | _wcol0 | _wcol1 | _wcol2 | _wcol3 | _wcol4 |
+---------------+------------------+---------+--------+--------+--------+--------+--------+
| cookieid1     | 2019-01-01       | 1       | NULL   | 3      | 1      | 1      | 1      |
| cookieid1     | 2019-01-02       | 2       | NULL   | 3      | 1      | 1      | 2      |
| cookieid1     | 2019-01-03       | 3       | 1      | 4      | 1      | 2      | 3      |
| cookieid1     | 2019-01-03       | 3       | 2      | NULL   | 1      | 3      | 3      |
| cookieid1     | 2019-01-04       | 4       | 3      | NULL   | 1      | 3      | 4      |
| cookieid2     | 2019-01-01       | 1       | NULL   | 3      | 1      | 1      | 1      |
| cookieid2     | 2019-01-02       | 2       | NULL   | 3      | 1      | 1      | 2      |
| cookieid2     | 2019-01-03       | 3       | 1      | 4      | 1      | 2      | 3      |
| cookieid2     | 2019-01-03       | 3       | 2      | NULL   | 1      | 3      | 3      |
| cookieid2     | 2019-01-04       | 4       | 3      | NULL   | 1      | 3      | 4      |
+---------------+------------------+---------+--------+--------+--------+--------+--------+
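In _wcol4, LAST_VALUE with the default frame just echoes the current row (or its last tie). To get the last value of the whole partition, the frame has to extend to UNBOUNDED FOLLOWING. The following is a minimal sketch showing that fix. It also passes LAG a third argument, which is the default value Hive returns instead of NULL when the offset row does not exist. The aliases last_pv and lag2_or_0 are illustrative.

```sql
SELECT cookieid, create_time, pv,
  -- Frame covers the whole partition, so every row sees the partition's last pv
  LAST_VALUE(pv) OVER (PARTITION BY cookieid ORDER BY create_time
                       ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_pv,
  -- Third argument: value returned when there is no row 2 positions back
  LAG(pv, 2, 0) OVER (PARTITION BY cookieid ORDER BY create_time) AS lag2_or_0
FROM test;
```

For each cookieid, last_pv should be 4 on every row, and lag2_or_0 should be 0, 0, 1, 2, 3, which is the same as _wcol0 with the NULLs replaced by 0.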