For the other window functions, see:
Window functions: sum, avg, max, min
Window functions: row_number, rank, dense_rank
Sample data (table nt):
id,crtime,pv
cookie1,2015-04-10,1
cookie1,2015-04-11,5
cookie1,2015-04-12,7
cookie1,2015-04-13,3
cookie1,2015-04-14,2
cookie1,2015-04-15,4
cookie1,2015-04-16,4
cookie2,2015-04-10,2
cookie2,2015-04-11,3
cookie2,2015-04-12,5
cookie2,2015-04-13,6
cookie2,2015-04-14,3
cookie2,2015-04-15,9
cookie2,2015-04-16,7
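ntile(n) splits the ordered rows of each partition into slices, where n is the number of slices. It divides the data into n roughly equal parts. If the rows do not divide evenly, the leftover rows are handed out one at a time, starting from the first slice.
For example, if there is 1 leftover row, it goes to the first slice.
If there are 2 leftover rows, the first and second slices get one each.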
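The bucket rule above can be sketched in a few lines of Python. This is a minimal illustration, not Hive's implementation. It assumes the rows of one partition are already sorted by the `order by` key. `ntile_buckets` is a hypothetical helper name.

```python
def ntile_buckets(num_rows: int, n: int) -> list:
    """Return the ntile(n) slice number (1-based) for each of num_rows already-ordered rows.

    Hypothetical helper illustrating the rule: equal base size, leftovers go to the first slices.
    """
    base, extra = divmod(num_rows, n)  # every slice gets `base` rows; `extra` rows are left over
    buckets = []
    for bucket in range(1, n + 1):
        # each of the first `extra` slices takes one leftover row
        size = base + (1 if bucket <= extra else 0)
        buckets.extend([bucket] * size)
    return buckets


if __name__ == "__main__":
    # cookie1 has 7 rows; compare with the n2..n5 columns of the query output
    for n in (2, 3, 4, 5):
        print(n, ntile_buckets(7, n))
```

For 7 rows this yields `[1, 1, 1, 1, 2, 2, 2]` for n=2 and `[1, 1, 2, 2, 3, 4, 5]` for n=5, which match the corresponding columns for cookie1.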
select id,crtime,pv,
ntile(2) over(partition by id order by crtime) n2, -- 2 slices
ntile(3) over(partition by id order by crtime) n3, -- 3 slices
ntile(4) over(partition by id order by crtime) n4, -- 4 slices
ntile(5) over(partition by id order by crtime) n5 -- 5 slices
from nt;
->
id crtime pv n2 n3 n4 n5
cookie1 2015-04-10 1 1 1 1 1
cookie1 2015-04-11 5 1 1 1 1
cookie1 2015-04-12 7 1 1 2 2
cookie1 2015-04-13 3 1 2 2 2
cookie1 2015-04-14 2 2 2 3 3
cookie1 2015-04-15 4 2 3 3 4
cookie1 2015-04-16 4 2 3 4 5
cookie2 2015-04-10 2 1 1 1 1
cookie2 2015-04-11 3 1 1 1 1
cookie2 2015-04-12 5 1 1 2 2
cookie2 2015-04-13 6 1 2 2 2
cookie2 2015-04-14 3 2 2 3 3
cookie2 2015-04-15 9 2 3 3 4
cookie2 2015-04-16 7 2 3 4 5
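As the output shows, cookie1 has 7 rows. With 2 slices, 7/2 leaves a remainder of 1, which goes to slice 1. That gives four 1s and three 2s.
With 3 slices, 7/3 leaves a remainder of 1, which goes to slice 1. That gives three 1s, two 2s and two 3s.
With 4 slices, 7/4 leaves a remainder of 3, which go to slices 1, 2 and 3. That gives two 1s, two 2s, two 3s and one 4.
With 5 slices, 7/5 leaves a remainder of 2, which go to slices 1 and 2. That gives two 1s, two 2s, one 3, one 4 and one 5.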
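Requirement: for each cookie, what is the total pv over the first third of its days?
Approach: split each cookie's rows into three slices with ntile(3), then sum the pv of the rows whose ntile value is 1. Because 7 rows do not divide evenly, slice 1 holds the first 3 of the 7 days.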
select t.id,sum(t.pv) spv from
(select id,crtime,pv,ntile(3) over(partition by id order by crtime) nt3 from nt) t
where t.nt3 = 1
group by t.id;
->
id spv
cookie1 13
cookie2 10
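To try the query without a Hive cluster, the sketch below runs the same SQL against an in-memory SQLite database. It assumes SQLite 3.25 or newer, which is the first version with window-function support. SQLite is a stand-in here, not Hive, so other queries may behave differently. The names `COOKIE_PV`, `NT_ROWS`, `FIRST_THIRD_SQL` and `load_nt` are hypothetical.

```python
import sqlite3

# Sample data from the nt table: pv per cookie for 2015-04-10 .. 2015-04-16
COOKIE_PV = {
    "cookie1": [1, 5, 7, 3, 2, 4, 4],
    "cookie2": [2, 3, 5, 6, 3, 9, 7],
}
NT_ROWS = [
    (cookie, f"2015-04-{10 + day}", pv)
    for cookie, pvs in COOKIE_PV.items()
    for day, pv in enumerate(pvs)
]

# Same query as in the text; the trailing ORDER BY only makes the output order stable.
FIRST_THIRD_SQL = """
    SELECT t.id, SUM(t.pv) AS spv
    FROM (SELECT id, crtime, pv,
                 ntile(3) OVER (PARTITION BY id ORDER BY crtime) AS nt3
          FROM nt) t
    WHERE t.nt3 = 1
    GROUP BY t.id
    ORDER BY t.id
"""


def load_nt() -> sqlite3.Connection:
    """Create an in-memory SQLite database whose table nt holds the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE nt (id TEXT, crtime TEXT, pv INTEGER)")
    conn.executemany("INSERT INTO nt VALUES (?, ?, ?)", NT_ROWS)
    return conn


if __name__ == "__main__":
    conn = load_nt()
    print(conn.execute(FIRST_THIRD_SQL).fetchall())  # [('cookie1', 13), ('cookie2', 10)]
```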
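The following functions are commonly used on time-series data. In Hive, lag and lead (like ntile) do not accept a window clause (rows between ...), but first_value and last_value do. Without a window clause, an over() that has an order by uses the default frame from the start of the partition to the current row. An over() with no order by uses the whole partition.
lag(col,n,default): the value of col n rows above the current row in the window; default is returned when no such row exists.
lead(col,n,default): the value of col n rows below the current row in the window; default is returned when no such row exists.
first_value(col): after partitioning and ordering, the first value from the start of the partition up to the current row.
last_value(col): after partitioning and ordering, the last value from the start of the partition up to the current row.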
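The definitions above can be mimicked on a plain Python list that stands for one partition already sorted by crtime. This is a sketch of the semantics, not Hive code. The helper names are hypothetical. The last_value helper assumes the order-by key has no duplicates, as in the sample data. With ties, the default range frame would extend to the last peer row.

```python
from typing import Any, List, Sequence


def lag(values: Sequence[Any], n: int = 1, default: Any = None) -> List[Any]:
    """For each row, the value n rows above it in the ordered partition, or default."""
    return [values[i - n] if i - n >= 0 else default for i in range(len(values))]


def lead(values: Sequence[Any], n: int = 1, default: Any = None) -> List[Any]:
    """For each row, the value n rows below it in the ordered partition, or default."""
    return [values[i + n] if i + n < len(values) else default for i in range(len(values))]


def first_value_to_current(values: Sequence[Any]) -> List[Any]:
    """first_value over the frame [partition start, current row]: always the first row."""
    return [values[0] for _ in values]


def last_value_to_current(values: Sequence[Any]) -> List[Any]:
    """last_value over [partition start, current row]: the current row itself.

    Assumes no duplicate ORDER BY keys in the partition.
    """
    return list(values)


if __name__ == "__main__":
    # cookie1, already sorted by crtime
    crtime = [f"2015-04-{d}" for d in range(10, 17)]
    pv = [1, 5, 7, 3, 2, 4, 4]
    print(lag(crtime, 1, "a"))          # lagc column
    print(lead(crtime, 2, "b"))         # leadc column
    print(first_value_to_current(pv))   # fpv column
    print(last_value_to_current(pv))    # lpv column
```

With the defaults n=1 and default=None, `lag` and `lead` correspond to the one-argument forms lag(pv) and lead(pv), where a missing row gives NULL.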
select *,
lag(crtime,1,'a') over(partition by id order by crtime) lagc,
lead(crtime,2,'b') over(partition by id order by crtime) leadc,
first_value(pv) over(partition by id order by crtime) fpv,
last_value(pv) over(partition by id order by crtime) lpv
from nt;
->
id crtime pv lagc leadc fpv lpv
cookie1 2015-04-10 1 a 2015-04-12 1 1
cookie1 2015-04-11 5 2015-04-10 2015-04-13 1 5
cookie1 2015-04-12 7 2015-04-11 2015-04-14 1 7
cookie1 2015-04-13 3 2015-04-12 2015-04-15 1 3
cookie1 2015-04-14 2 2015-04-13 2015-04-16 1 2
cookie1 2015-04-15 4 2015-04-14 b 1 4
cookie1 2015-04-16 4 2015-04-15 b 1 4
cookie2 2015-04-10 2 a 2015-04-12 2 2
cookie2 2015-04-11 3 2015-04-10 2015-04-13 2 3
cookie2 2015-04-12 5 2015-04-11 2015-04-14 2 5
cookie2 2015-04-13 6 2015-04-12 2015-04-15 2 6
cookie2 2015-04-14 3 2015-04-13 2015-04-16 2 3
cookie2 2015-04-15 9 2015-04-14 b 2 9
cookie2 2015-04-16 7 2015-04-15 b 2 7
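To attach each cookie's pv on its latest day to every row, sort crtime in descending order and take first_value:
select *,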
first_value(pv) over(partition by id order by crtime desc) newpv
from nt;
->
id crtime pv newpv
cookie1 2015-04-16 4 4
cookie1 2015-04-15 4 4
cookie1 2015-04-14 2 4
cookie1 2015-04-13 3 4
cookie1 2015-04-12 7 4
cookie1 2015-04-11 5 4
cookie1 2015-04-10 1 4
cookie2 2015-04-16 7 7
cookie2 2015-04-15 9 7
cookie2 2015-04-14 3 7
cookie2 2015-04-13 6 7
cookie2 2015-04-12 5 7
cookie2 2015-04-11 3 7
cookie2 2015-04-10 2 7
However, crtime is now listed in descending order. To list it in ascending order, add order by id,crtime at the end of the query:
select *,
first_value(pv) over(partition by id order by crtime desc) newpv
from nt
order by id,crtime;
->
id crtime pv newpv
cookie1 2015-04-10 1 4
cookie1 2015-04-11 5 4
cookie1 2015-04-12 7 4
cookie1 2015-04-13 3 4
cookie1 2015-04-14 2 4
cookie1 2015-04-15 4 4
cookie1 2015-04-16 4 4
cookie2 2015-04-10 2 7
cookie2 2015-04-11 3 7
cookie2 2015-04-12 5 7
cookie2 2015-04-13 6 7
cookie2 2015-04-14 3 7
cookie2 2015-04-15 9 7
cookie2 2015-04-16 7 7
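Because first_value and last_value accept a window clause, the same newpv column can also be obtained without reversing the sort. Keep the ascending order and widen the frame to the whole partition. The sketch below is an assumption-labeled example, not the author's query. It runs in SQLite 3.25+ as a stand-in for Hive, and the names `COOKIE_PV`, `NT_ROWS`, `LAST_PV_SQL` and `load_nt` are hypothetical. Hive accepts the same `rows between unbounded preceding and unbounded following` frame for last_value.

```python
import sqlite3

# Sample data from the nt table: pv per cookie for 2015-04-10 .. 2015-04-16
COOKIE_PV = {
    "cookie1": [1, 5, 7, 3, 2, 4, 4],
    "cookie2": [2, 3, 5, 6, 3, 9, 7],
}
NT_ROWS = [
    (cookie, f"2015-04-{10 + day}", pv)
    for cookie, pvs in COOKIE_PV.items()
    for day, pv in enumerate(pvs)
]

# last_value over an explicit whole-partition frame returns the pv of the latest crtime
LAST_PV_SQL = """
    SELECT id, crtime, pv,
           last_value(pv) OVER (
               PARTITION BY id ORDER BY crtime
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
           ) AS newpv
    FROM nt
    ORDER BY id, crtime
"""


def load_nt() -> sqlite3.Connection:
    """Create an in-memory SQLite database whose table nt holds the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE nt (id TEXT, crtime TEXT, pv INTEGER)")
    conn.executemany("INSERT INTO nt VALUES (?, ?, ?)", NT_ROWS)
    return conn


if __name__ == "__main__":
    for row in load_nt().execute(LAST_PV_SQL):
        print(row)
```

Each row should carry newpv 4 for cookie1 and 7 for cookie2, the same values as in the result of the descending first_value query.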
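Without an order by inside over(), the rows in each partition have no guaranteed order: crtime is neither ascending nor descending. The results below therefore depend on the order in which the rows happen to be read.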
select *,
lag(pv) over(partition by id) lagc, -- defaults to the previous row's value; NULL if there is no previous row
lead(pv) over(partition by id) leadc -- defaults to the next row's value; NULL if there is no next row
from nt;
->
id crtime pv lagc leadc
cookie1 2015-04-10 1 NULL 4
cookie1 2015-04-16 4 1 4
cookie1 2015-04-15 4 4 2
cookie1 2015-04-14 2 4 3
cookie1 2015-04-13 3 2 7
cookie1 2015-04-12 7 3 5
cookie1 2015-04-11 5 7 NULL
cookie2 2015-04-16 7 NULL 9
cookie2 2015-04-15 9 7 3
cookie2 2015-04-14 3 9 6
cookie2 2015-04-13 6 3 5
cookie2 2015-04-12 5 6 3
cookie2 2015-04-11 3 5 2
cookie2 2015-04-10 2 3 NULL
select *,
first_value(pv) over(partition by id) fpv, -- first value in the partition
last_value(pv) over(partition by id) lpv -- last value in the partition
from nt;
->
id crtime pv fpv lpv
cookie1 2015-04-10 1 1 5
cookie1 2015-04-16 4 1 5
cookie1 2015-04-15 4 1 5
cookie1 2015-04-14 2 1 5
cookie1 2015-04-13 3 1 5
cookie1 2015-04-12 7 1 5
cookie1 2015-04-11 5 1 5
cookie2 2015-04-16 7 7 2
cookie2 2015-04-15 9 7 2
cookie2 2015-04-14 3 7 2
cookie2 2015-04-13 6 7 2
cookie2 2015-04-12 5 7 2
cookie2 2015-04-11 3 7 2
cookie2 2015-04-10 2 7 2