0307-Hive Window Function Exercises

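  • 1. Case 1
    • 1.1 Data preparation
    • 1.2 Requirements
    • 1.3 Analysis and implementation
      • 1.3.1 Find the customers who made a purchase in April 2017 and the total number of such customers
      • 1.3.2 Show each customer's purchase details together with the monthly purchase total
      • 1.3.3 In the scenario above, accumulate each customer's cost by date
      • 1.3.4 Find each customer's previous purchase date
      • 1.3.5 Find the orders in the earliest 20% by date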

1. Case 1

1.1 Data preparation

name,orderdate,cost

jack,2017-01-01,10
tony,2017-01-02,15
jack,2017-02-03,23
tony,2017-01-04,29
jack,2017-01-05,46
jack,2017-04-06,42
tony,2017-01-07,50
jack,2017-01-08,55
mart,2017-04-08,62
mart,2017-04-09,68
neil,2017-05-10,12
mart,2017-04-11,75
neil,2017-06-12,80
mart,2017-04-13,94
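
The original does not show how the data is loaded. A minimal sketch, assuming the rows above (without the header line) are saved to a local comma-separated file; the table name business is taken from the queries below, while the file path and the column types are assumptions made for illustration:

```sql
-- Hypothetical setup: column types and the file path are assumptions.
-- orderdate is kept as STRING so that substring() and month() both work on it.
CREATE TABLE IF NOT EXISTS business (
    name      STRING,
    orderdate STRING,
    cost      INT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- The file should contain only the 14 data rows, not the name,orderdate,cost header.
LOAD DATA LOCAL INPATH '/tmp/business.txt' INTO TABLE business;
```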

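1.2 Requirements

(1) Find the customers who made a purchase in April 2017 and the total number of such customers
(2) Show each customer's purchase details together with the monthly purchase total
(3) In the scenario above, accumulate each customer's cost by date
(4) Find each customer's previous purchase date
(5) Find the orders in the earliest 20% by date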

1.3 Analysis and implementation

1.3.1 Find the customers who made a purchase in April 2017 and the total number of such customers

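SELECT name,
       count(*) OVER () AS customer_cnt  -- evaluated after GROUP BY, so it counts customers, not orders
FROM business
WHERE substring(orderdate, 1, 7) = '2017-04'  -- string positions are 1-based
GROUP BY name;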
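
The query works because the window function is evaluated after GROUP BY has collapsed the April rows to one row per customer. The same logic can be sketched with an explicit subquery, which makes the two steps visible; the aliases t and customer_cnt are illustrative names, not part of the original:

```sql
-- Step 1 (inner query): one row per customer who bought in April 2017.
-- Step 2 (outer query): count those rows with an empty window.
SELECT t.name,
       count(*) OVER () AS customer_cnt
FROM (
    SELECT name
    FROM business
    WHERE substring(orderdate, 1, 7) = '2017-04'
    GROUP BY name
) t;
```

On the sample data only jack and mart have April orders, so both forms should return two rows, each with a count of 2.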

1.3.2 Show each customer's purchase details together with the monthly purchase total

No row-collapsing aggregation is needed here; instead, a monthly purchase total column is added to the original rows (partitioned by month).

Expected result:

name	orderdate	cost	sum_window_0
jack	2017-01-01	10	205
tony	2017-01-02	15	205
tony	2017-01-04	29	205
jack	2017-01-05	46	205
tony	2017-01-07	50	205
jack	2017-01-08	55	205
jack	2017-02-03	23	23
mart	2017-04-13	94	341
mart	2017-04-08	62	341
mart	2017-04-09	68	341
mart	2017-04-11	75	341
jack	2017-04-06	42	341
neil	2017-05-10	12	12
neil	2017-06-12	80	80

SELECT name,
       orderdate,
       cost,
       sum(cost) OVER (PARTITION BY month(orderdate))  -- total of all orders in the same month
FROM business;
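
One caveat: month(orderdate) returns only the month number, so orders from the same month of different years would land in the same partition. The sample data covers 2017 only, so this makes no difference here. A sketch of a variant that partitions by year and month, assuming orderdate is a yyyy-MM-dd string; the alias month_total is illustrative:

```sql
-- Partition by the 'yyyy-MM' prefix so that, for example,
-- 2017-04 and 2018-04 would be totalled separately.
SELECT name,
       orderdate,
       cost,
       sum(cost) OVER (PARTITION BY substring(orderdate, 1, 7)) AS month_total
FROM business;
```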

1.3.3 In the scenario above, accumulate each customer's cost by date

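The query below compares several window frames. sample3 and sample4 are the per-customer running totals that the requirement asks for; the other columns are shown for contrast.

SELECT name
    , orderdate
    , cost
    , sum(cost) OVER() AS sample1  -- total over all rows
    , sum(cost) OVER(PARTITION BY name) AS sample2  -- total per customer
    , sum(cost) OVER(PARTITION BY name ORDER BY orderdate) AS sample3  -- running total per customer (default frame)
    , sum(cost) OVER(PARTITION BY name ORDER BY orderdate ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS sample4  -- running total with the frame spelled out
    , sum(cost) OVER(PARTITION BY name ORDER BY orderdate ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS sample5  -- previous row + current row
    , sum(cost) OVER(PARTITION BY name ORDER BY orderdate ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS sample6  -- previous + current + next row
    , sum(cost) OVER(PARTITION BY name ORDER BY orderdate ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS sample7  -- current row through the last row
FROM business;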
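
To make the running total concrete, the following sketch restricts the query to a single customer and keeps only the cumulative column. The alias running_total and the filter on jack are illustrative choices; the expected values in the comments are worked out by hand from the sample data:

```sql
-- Running total of cost for one customer, ordered by purchase date.
SELECT name,
       orderdate,
       cost,
       sum(cost) OVER (PARTITION BY name
                       ORDER BY orderdate
                       ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
FROM business
WHERE name = 'jack'
ORDER BY orderdate;
-- jack  2017-01-01  10   10
-- jack  2017-01-05  46   56
-- jack  2017-01-08  55  111
-- jack  2017-02-03  23  134
-- jack  2017-04-06  42  176
```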

1.3.4 Find each customer's previous purchase date

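SELECT name
    , orderdate
    , cost
    , lag(orderdate, 1, '1900-01-01') OVER (PARTITION BY name ORDER BY orderdate) AS time1  -- previous purchase date; '1900-01-01' if there is none
    , lag(orderdate, 2) OVER (PARTITION BY name ORDER BY orderdate) AS time2  -- purchase date two orders back; NULL if there is none
FROM business;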
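
A common follow-up is to turn the previous purchase date into a gap in days. The sketch below combines lag with Hive's datediff for one customer; the aliases prev_date and days_since_prev are illustrative, and the expected values in the comments are computed by hand from the sample data:

```sql
-- Days elapsed since the customer's previous order.
-- lag() without a default returns NULL on the first order, so datediff() is NULL there too.
SELECT name,
       orderdate,
       lag(orderdate, 1) OVER (PARTITION BY name ORDER BY orderdate) AS prev_date,
       datediff(orderdate,
                lag(orderdate, 1) OVER (PARTITION BY name ORDER BY orderdate)) AS days_since_prev
FROM business
WHERE name = 'jack'
ORDER BY orderdate;
-- jack  2017-01-01  NULL        NULL
-- jack  2017-01-05  2017-01-01  4
-- jack  2017-01-08  2017-01-05  3
-- jack  2017-02-03  2017-01-08  26
-- jack  2017-04-06  2017-02-03  62
```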

1.3.5 Find the orders in the earliest 20% by date

SELECT *
FROM (
    SELECT name
         , orderdate
         , cost
         , ntile(5) OVER (ORDER BY orderdate) AS sorted  -- split the date-ordered rows into 5 buckets
    FROM business
) t
WHERE sorted = 1;  -- bucket 1 holds the earliest ~20% of orders
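
With 14 rows, ntile(5) cannot split evenly: the buckets hold 3, 3, 3, 3 and 2 rows, so sorted = 1 returns the three earliest orders (about 21% of the data). If a strict cutoff is preferred, one possible alternative is cume_dist(), which gives each row its cumulative fraction of the ordered set. This is a sketch of that alternative, not the approach used above; the alias pct is illustrative:

```sql
-- Keep only rows whose cumulative position is within the first 20%.
-- On the 14 sample rows this keeps 2 rows (2/14 ≈ 0.143; the third row is at 3/14 ≈ 0.214).
SELECT *
FROM (
    SELECT name
         , orderdate
         , cost
         , cume_dist() OVER (ORDER BY orderdate) AS pct
    FROM business
) t
WHERE pct <= 0.2;
```

Which form fits depends on whether "top 20%" should round up to a full bucket (ntile) or never exceed the threshold (cume_dist).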
