需求:
1、求用户明细并统计每天的用户总数
2、计算从第一天到现在的所有 score 大于80分的用户总数
3、计算每个用户到当前日期分数大于80的天数
test_window.txt数据:
20191020,11111,85
20191020,22222,83
20191020,33333,86
20191021,11111,87
20191021,22222,65
20191021,33333,98
20191022,11111,67
20191022,22222,34
20191022,33333,88
20191023,11111,99
20191023,22222,33
建表:
0: jdbc:hive2://hadoop:11240> create table test_window(logday string,userid string,score int)
. . . . . . . . . . . . . . > row format delimited
. . . . . . . . . . . . . . > fields terminated by ',';
0: jdbc:hive2://hadoop:11240> load data local inpath '/home/xiaokang/hivedata/test_window.txt'
. . . . . . . . . . . . . . > into table test_window;
0: jdbc:hive2://hadoop:11240> select * from test_window;
+---------------------+---------------------+--------------------+
| test_window.logday | test_window.userid | test_window.score |
+---------------------+---------------------+--------------------+
| 20191020 | 11111 | 85 |
| 20191020 | 22222 | 83 |
| 20191020 | 33333 | 86 |
| 20191021 | 11111 | 87 |
| 20191021 | 22222 | 65 |
| 20191021 | 33333 | 98 |
| 20191022 | 11111 | 67 |
| 20191022 | 22222 | 34 |
| 20191022 | 33333 | 88 |
| 20191023 | 11111 | 99 |
| 20191023 | 22222 | 33 |
+---------------------+---------------------+--------------------+
1、求用户明细并统计每天的用户总数
0: jdbc:hive2://hadoop:11240> select *,count()over(partition by logday)as day_total from test_window;
+---------------------+---------------------+--------------------+------------+
| test_window.logday | test_window.userid | test_window.score | day_total |
+---------------------+---------------------+--------------------+------------+
| 20191020 | 33333 | 86 | 3 |
| 20191020 | 22222 | 83 | 3 |
| 20191020 | 11111 | 85 | 3 |
| 20191021 | 33333 | 98 | 3 |
| 20191021 | 22222 | 65 | 3 |
| 20191021 | 11111 | 87 | 3 |
| 20191022 | 33333 | 88 | 3 |
| 20191022 | 22222 | 34 | 3 |
| 20191022 | 11111 | 67 | 3 |
| 20191023 | 22222 | 33 | 2 |
| 20191023 | 11111 | 99 | 2 |
+---------------------+---------------------+--------------------+------------+
2、计算从第一天到现在的所有 score 大于80分的用户总数
0: jdbc:hive2://hadoop:11240> select *,count()over(order by logday rows between unbounded preceding and current row) as total from test_window where score>80;
+---------------------+---------------------+--------------------+--------+
| test_window.logday | test_window.userid | test_window.score | total |
+---------------------+---------------------+--------------------+--------+
| 20191020 | 33333 | 86 | 1 |
| 20191020 | 22222 | 83 | 2 |
| 20191020 | 11111 | 85 | 3 |
| 20191021 | 33333 | 98 | 4 |
| 20191021 | 11111 | 87 | 5 |
| 20191022 | 33333 | 88 | 6 |
| 20191023 | 11111 | 99 | 7 |
+---------------------+---------------------+--------------------+--------+
3、计算每个用户到当前日期分数大于80的天数
0: jdbc:hive2://hadoop:11240> select *,count()over(partition by userid order by logday rows between unbounded preceding and current row) as total
. . . . . . . . . . . . . . > from test_window where score>80 order by logday,userid;
+---------------------+---------------------+--------------------+--------+
| test_window.logday | test_window.userid | test_window.score | total |
+---------------------+---------------------+--------------------+--------+
| 20191020 | 11111 | 85 | 1 |
| 20191020 | 22222 | 83 | 1 |
| 20191020 | 33333 | 86 | 1 |
| 20191021 | 11111 | 87 | 2 |
| 20191021 | 33333 | 98 | 2 |
| 20191022 | 33333 | 88 | 3 |
| 20191023 | 11111 | 99 | 3 |
+---------------------+---------------------+--------------------+--------+
需求:
1、查询在2017年4月份购买过的顾客及总人数
2、查询顾客的购买明细及月购买总额
3、查询顾客的购买明细及到目前为止每个顾客购买总金额
4、查询顾客上次的购买时间----lag()over()偏移量分析函数的运用
数据:
jack,2017-01-01,10
tony,2017-01-02,15
jack,2017-02-03,23
tony,2017-01-04,29
jack,2017-01-05,46
jack,2017-04-06,42
tony,2017-01-07,50
jack,2017-01-08,55
mart,2017-04-08,62
mart,2017-04-09,68
neil,2017-05-10,12
mart,2017-04-11,75
neil,2017-06-12,80
mart,2017-04-13,94
建表:
0: jdbc:hive2://hadoop:11240> create table business(name string,orderdate string,cost int)
. . . . . . . . . . . . . . > row format delimited
. . . . . . . . . . . . . . > fields terminated by ',';
0: jdbc:hive2://hadoop:11240> load data local inpath "/home/xiaokang/hivedata/business.txt"
. . . . . . . . . . . . . . > into table business;
0: jdbc:hive2://hadoop:11240> select * from business;
+----------------+---------------------+----------------+
| business.name | business.orderdate | business.cost |
+----------------+---------------------+----------------+
| jack | 2017-01-01 | 10 |
| tony | 2017-01-02 | 15 |
| jack | 2017-02-03 | 23 |
| tony | 2017-01-04 | 29 |
| jack | 2017-01-05 | 46 |
| jack | 2017-04-06 | 42 |
| tony | 2017-01-07 | 50 |
| jack | 2017-01-08 | 55 |
| mart | 2017-04-08 | 62 |
| mart | 2017-04-09 | 68 |
| neil | 2017-05-10 | 12 |
| mart | 2017-04-11 | 75 |
| neil | 2017-06-12 | 80 |
| mart | 2017-04-13 | 94 |
+----------------+---------------------+----------------+
1、查询在2017年4月份购买过的顾客及总人数
在本例中:
0: jdbc:hive2://hadoop:11240> select name,count(*) over()
. . . . . . . . . . . . . . > from business
. . . . . . . . . . . . . . > where substring(orderdate,1,7)='2017-04'
+-------+-----------------+
| name | count_window_0 |
+-------+-----------------+
| mart | 5 |
| mart | 5 |
| mart | 5 |
| mart | 5 |
| jack | 5 |
+-------+-----------------+
2、查询顾客的购买明细及所有顾客的购买总额
所有人的花费求和
0: jdbc:hive2://hadoop:11240> select *,sum(cost)over()
. . . . . . . . . . . . . . > from business;
+----------------+---------------------+----------------+---------------+
| business.name | business.orderdate | business.cost | sum_window_0 |
+----------------+---------------------+----------------+---------------+
| mart | 2017-04-13 | 94 | 661 |
| neil | 2017-06-12 | 80 | 661 |
| mart | 2017-04-11 | 75 | 661 |
| neil | 2017-05-10 | 12 | 661 |
| mart | 2017-04-09 | 68 | 661 |
| mart | 2017-04-08 | 62 | 661 |
| jack | 2017-01-08 | 55 | 661 |
| tony | 2017-01-07 | 50 | 661 |
| jack | 2017-04-06 | 42 | 661 |
| jack | 2017-01-05 | 46 | 661 |
| tony | 2017-01-04 | 29 | 661 |
| jack | 2017-02-03 | 23 | 661 |
| tony | 2017-01-02 | 15 | 661 |
| jack | 2017-01-01 | 10 | 661 |
+----------------+---------------------+----------------+---------------+
3、查询顾客的购买明细以及每位顾客的总花费
按人分组求和
0: jdbc:hive2://hadoop:11240> select *,sum(cost) over(distribute by name)
. . . . . . . . . . . . . . > from business;
+----------------+---------------------+----------------+---------------+
| business.name | business.orderdate | business.cost | sum_window_0 |
+----------------+---------------------+----------------+---------------+
| jack | 2017-01-05 | 46 | 176 |
| jack | 2017-01-08 | 55 | 176 |
| jack | 2017-01-01 | 10 | 176 |
| jack | 2017-04-06 | 42 | 176 |
| jack | 2017-02-03 | 23 | 176 |
| mart | 2017-04-13 | 94 | 299 |
| mart | 2017-04-11 | 75 | 299 |
| mart | 2017-04-09 | 68 | 299 |
| mart | 2017-04-08 | 62 | 299 |
| neil | 2017-05-10 | 12 | 92 |
| neil | 2017-06-12 | 80 | 92 |
| tony | 2017-01-04 | 29 | 94 |
| tony | 2017-01-02 | 15 | 94 |
| tony | 2017-01-07 | 50 | 94 |
+----------------+---------------------+----------------+---------------+
4、查询顾客的购买明细及到目前为止每个顾客购买总金额
按人分组,按时间排序,花费累加
# 方法一:
0: jdbc:hive2://hadoop:11240> select * ,sum(cost) over(distribute by name sort by orderdate)
. . . . . . . . . . . . . . > from business;
+----------------+---------------------+----------------+---------------+
| business.name | business.orderdate | business.cost | sum_window_0 |
+----------------+---------------------+----------------+---------------+
| jack | 2017-01-01 | 10 | 10 |
| jack | 2017-01-05 | 46 | 56 |
| jack | 2017-01-08 | 55 | 111 |
| jack | 2017-02-03 | 23 | 134 |
| jack | 2017-04-06 | 42 | 176 |
| mart | 2017-04-08 | 62 | 62 |
| mart | 2017-04-09 | 68 | 130 |
| mart | 2017-04-11 | 75 | 205 |
| mart | 2017-04-13 | 94 | 299 |
| neil | 2017-05-10 | 12 | 12 |
| neil | 2017-06-12 | 80 | 92 |
| tony | 2017-01-02 | 15 | 15 |
| tony | 2017-01-04 | 29 | 44 |
| tony | 2017-01-07 | 50 | 94 |
+----------------+---------------------+----------------+---------------+
#方法二:
0: jdbc:hive2://hadoop:11240> select *,
. . . . . . . . . . . . . . > sum(cost)
. . . . . . . . . . . . . . > over(partition by name
. . . . . . . . . . . . . . > order by orderdate rows between unbounded preceding and current row)
. . . . . . . . . . . . . . > as total_amount
. . . . . . . . . . . . . . > from business;
+----------------+---------------------+----------------+---------------+
| business.name | business.orderdate | business.cost | total_amount |
+----------------+---------------------+----------------+---------------+
| jack | 2017-01-01 | 10 | 10 |
| jack | 2017-01-05 | 46 | 56 |
| jack | 2017-01-08 | 55 | 111 |
| jack | 2017-02-03 | 23 | 134 |
| jack | 2017-04-06 | 42 | 176 |
| mart | 2017-04-08 | 62 | 62 |
| mart | 2017-04-09 | 68 | 130 |
| mart | 2017-04-11 | 75 | 205 |
| mart | 2017-04-13 | 94 | 299 |
| neil | 2017-05-10 | 12 | 12 |
| neil | 2017-06-12 | 80 | 92 |
| tony | 2017-01-02 | 15 | 15 |
| tony | 2017-01-04 | 29 | 44 |
| tony | 2017-01-07 | 50 | 94 |
+----------------+---------------------+----------------+---------------+
5、查询顾客上次的购买时间----lag()over()偏移量分析函数的运用
0: jdbc:hive2://hadoop:11240> select *,
#如果上次的购买时间为null,将其处理为1970-01-01
. . . . . . . . . . . . . . > lag(orderdate,1,'1970-01-01') over(partition by name order by orderdate) last_date
. . . . . . . . . . . . . . > from business;
+----------------+---------------------+----------------+-------------+
| business.name | business.orderdate | business.cost | last_date |
+----------------+---------------------+----------------+-------------+
| jack | 2017-01-01 | 10 | 1970-01-01 |
| jack | 2017-01-05 | 46 | 2017-01-01 |
| jack | 2017-01-08 | 55 | 2017-01-05 |
| jack | 2017-02-03 | 23 | 2017-01-08 |
| jack | 2017-04-06 | 42 | 2017-02-03 |
| mart | 2017-04-08 | 62 | 1970-01-01 |
| mart | 2017-04-09 | 68 | 2017-04-08 |
| mart | 2017-04-11 | 75 | 2017-04-09 |
| mart | 2017-04-13 | 94 | 2017-04-11 |
| neil | 2017-05-10 | 12 | 1970-01-01 |
| neil | 2017-06-12 | 80 | 2017-05-10 |
| tony | 2017-01-02 | 15 | 1970-01-01 |
| tony | 2017-01-04 | 29 | 2017-01-02 |
| tony | 2017-01-07 | 50 | 2017-01-04 |
+----------------+---------------------+----------------+-------------+
6、查询顾客下一次的购买时间
0: jdbc:hive2://hadoop:11240> select *,
. . . . . . . . . . . . . . > lead(orderdate,1,'9999-99-99') over(partition by name order by orderdate) last_date
. . . . . . . . . . . . . . > from business;
+----------------+---------------------+----------------+-------------+
| business.name | business.orderdate | business.cost | last_date |
+----------------+---------------------+----------------+-------------+
| jack | 2017-01-01 | 10 | 2017-01-05 |
| jack | 2017-01-05 | 46 | 2017-01-08 |
| jack | 2017-01-08 | 55 | 2017-02-03 |
| jack | 2017-02-03 | 23 | 2017-04-06 |
| jack | 2017-04-06 | 42 | 9999-99-99 |
| mart | 2017-04-08 | 62 | 2017-04-09 |
| mart | 2017-04-09 | 68 | 2017-04-11 |
| mart | 2017-04-11 | 75 | 2017-04-13 |
| mart | 2017-04-13 | 94 | 9999-99-99 |
| neil | 2017-05-10 | 12 | 2017-06-12 |
| neil | 2017-06-12 | 80 | 9999-99-99 |
| tony | 2017-01-02 | 15 | 2017-01-04 |
| tony | 2017-01-04 | 29 | 2017-01-07 |
| tony | 2017-01-07 | 50 | 9999-99-99 |
+----------------+---------------------+----------------+-------------+
7、查询顾客购买明细及月购买总额
先按姓名进行分组,再按月份进行分组
hive> select *,
. . > sum(cost)over(partition by name,substr(orderdate,1,7)) as total
. . > from business;
hive> select *,
. . > ntile(5)over(order by orderdate) as sortgroup_num
. . > from business;
再查询出分组号为1的记录
hive> select * from
. . > (select *,
. . > ntile(5)over(order by orderdate) as sortgroup_num
. . > from business) t
. . > where t.sortgroup_num = 1;
需求:
1、每门学科学生成绩排名(是否并列排名、空位排名三种实现)
2、每门学科成绩排名top n的学生
score.txt
name subject score
孙悟空 语文 87
孙悟空 数学 95
孙悟空 英语 68
大海 语文 94
大海 数学 56
大海 英语 84
宋宋 语文 64
宋宋 数学 86
宋宋 英语 84
婷婷 语文 65
婷婷 数学 85
婷婷 英语 78
建表:
0: jdbc:hive2://hadoop:11240> create table score(name string,subject string,score int)
. . . . . . . . . . . . . . > row format delimited fields terminated by "\t";
0: jdbc:hive2://hadoop:11240> load data local inpath '/home/xiaokang/hivedata/score.txt' into table score;
0: jdbc:hive2://hadoop:11240> select * from score;
+-------------+----------------+--------------+
| score.name | score.subject | score.score |
+-------------+----------------+--------------+
| 孙悟空 | 语文 | 87 |
| 孙悟空 | 数学 | 95 |
| 孙悟空 | 英语 | 68 |
| 大海 | 语文 | 94 |
| 大海 | 数学 | 56 |
| 大海 | 英语 | 84 |
| 宋宋 | 语文 | 64 |
| 宋宋 | 数学 | 86 |
| 宋宋 | 英语 | 84 |
| 婷婷 | 语文 | 65 |
| 婷婷 | 数学 | 85 |
| 婷婷 | 英语 | 78 |
+-------------+----------------+--------------+
1、每门学科学生成绩排名(是否并列排名、空位排名三种实现)
0: jdbc:hive2://hadoop:11240> select *,
. . . . . . . . . . . . . . > row_number()over(partition by subject order by score desc) as row_number_method,
. . . . . . . . . . . . . . > rank()over(partition by subject order by score desc) as rank_method,
. . . . . . . . . . . . . . > dense_rank()over(partition by subject order by score desc) as dense_rank_method
. . . . . . . . . . . . . . > from score;
+-------------+----------------+--------------+--------------------+--------------+--------------------+
| score.name | score.subject | score.score | row_number_method | rank_method | dense_rank_method |
+-------------+----------------+--------------+--------------------+--------------+--------------------+
| 孙悟空 | 数学 | 95 | 1 | 1 | 1 |
| 宋宋 | 数学 | 86 | 2 | 2 | 2 |
| 婷婷 | 数学 | 85 | 3 | 3 | 3 |
| 大海 | 数学 | 56 | 4 | 4 | 4 |
| 宋宋 | 英语 | 84 | 1 | 1 | 1 |
| 大海 | 英语 | 84 | 2 | 1 | 1 |
| 婷婷 | 英语 | 78 | 3 | 3 | 2 |
| 孙悟空 | 英语 | 68 | 4 | 4 | 3 |
| 大海 | 语文 | 94 | 1 | 1 | 1 |
| 孙悟空 | 语文 | 87 | 2 | 2 | 2 |
| 婷婷 | 语文 | 65 | 3 | 3 | 3 |
| 宋宋 | 语文 | 64 | 4 | 4 | 4 |
+-------------+----------------+--------------+--------------------+--------------+--------------------+
2、每门学科成绩排名前三的学生
0: jdbc:hive2://hadoop:11240> select * from (
. . . . . . . . . . . . . . > select *,
. . . . . . . . . . . . . . > row_number() over(partition by subject order by score desc) as rmp
. . . . . . . . . . . . . . > from score
. . . . . . . . . . . . . . > ) as t
. . . . . . . . . . . . . . > where t.rmp<=3;
+---------+------------+----------+--------+
| t.name | t.subject | t.score | t.rmp |
+---------+------------+----------+--------+
| 孙悟空 | 数学 | 95 | 1 |
| 宋宋 | 数学 | 86 | 2 |
| 婷婷 | 数学 | 85 | 3 |
| 宋宋 | 英语 | 84 | 1 |
| 大海 | 英语 | 84 | 2 |
| 婷婷 | 英语 | 78 | 3 |
| 大海 | 语文 | 94 | 1 |
| 孙悟空 | 语文 | 87 | 2 |
| 婷婷 | 语文 | 65 | 3 |
+---------+------------+----------+--------+