Hive Practice Exercises

Create the table

create table t_access_times(username string,month string,salary int)
row format delimited fields terminated by ',';

Inspect the table just created

desc formatted t_access_times;
[Figure 1: screenshot of the desc formatted output, with the table's storage path boxed in red]

Create the test data

A,2015-01,5
A,2015-01,15
B,2015-01,5
A,2015-01,8
B,2015-01,25
A,2015-01,5
A,2015-02,4
A,2015-02,6
B,2015-02,10
B,2015-02,5

Load the test data

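-- "overwrite into table" replaces the existing data; "into table" appends to it.
-- The file is copied into the table directory boxed in red in the figure above.
-- To view the loaded test data, run: select * from t_access_times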
load data local inpath '/home/hadoop/t_access_times.dat' into table t_access_times;
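
The two load modes differ only in the overwrite keyword. A minimal sketch of the overwrite form, assuming the same local file as above:

```sql
-- Replace whatever the table currently holds with the contents of the file
load data local inpath '/home/hadoop/t_access_times.dat' overwrite into table t_access_times;

-- With overwrite, the row count should match the file's line count no matter how often the load is repeated
select count(*) from t_access_times;
```

Running the plain "into table" form a second time would instead leave two copies of every row in the table.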

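Because this is a managed (internal) table, dropping it removes the table's metadata and also deletes the data files under the path boxed in red above.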

dfs -ls /user/hive/warehouse/t_access_times;
dfs -cat /user/hive/warehouse/t_access_times/t_access_times.dat;
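
For contrast, here is a sketch of an external table, whose data files are left in place when the table is dropped. The table name t_access_times_ext and the location /tmp/t_access_times_ext are hypothetical and appear nowhere in the original exercise:

```sql
-- Hypothetical table name and HDFS location, for illustration only
create external table t_access_times_ext(username string,month string,salary int)
row format delimited fields terminated by ','
location '/tmp/t_access_times_ext';

-- Removes only the table definition from the metastore; files under the location are kept
drop table t_access_times_ext;
```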

Step 1: compute each user's total amount per month

select username,month,sum(salary) as salary from t_access_times group by username,month;

+-----------+----------+---------+--+
| username  |  month   | salary  |
+-----------+----------+---------+--+
| A         | 2015-01  | 33      |
| A         | 2015-02  | 10      |
| B         | 2015-01  | 30      |
| B         | 2015-02  | 15      |
+-----------+----------+---------+--+

Step 2: join the monthly totals table with itself on username

+-------------+----------+-----------+-------------+----------+-----------+--+
| a.username  | a.month  | a.salary  | b.username  | b.month  | b.salary  |
+-------------+----------+-----------+-------------+----------+-----------+--+
| A           | 2015-01  | 33        | A           | 2015-01  | 33        |
| A           | 2015-01  | 33        | A           | 2015-02  | 10        |
| A           | 2015-02  | 10        | A           | 2015-01  | 33        |
| A           | 2015-02  | 10        | A           | 2015-02  | 10        |
| B           | 2015-01  | 30        | B           | 2015-01  | 30        |
| B           | 2015-01  | 30        | B           | 2015-02  | 15        |
| B           | 2015-02  | 15        | B           | 2015-01  | 30        |
| B           | 2015-02  | 15        | B           | 2015-02  | 15        |
+-------------+----------+-----------+-------------+----------+-----------+--+
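
Only the joined result is shown for this step. A query along the following lines should produce it. The lowercase aliases a and b are chosen to match the column headers in the table above, and the row order is not guaranteed without an order by:

```sql
-- Pair every monthly total of a user with every monthly total of the same user
select a.*, b.*
from
(select username,month,sum(salary) as salary from t_access_times group by username,month) a
inner join
(select username,month,sum(salary) as salary from t_access_times group by username,month) b
on a.username=b.username;
```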

Step 3: group the result of the previous step by a.username and a.month

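To get the running monthly total, sum every b.salary whose b.month <= a.month. Within each group a.salary has the same value on every row, so max(A.salary) simply returns that month's total.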
select A.username,A.month,max(A.salary) as salary,sum(B.salary) as accumulate
from 
(select username,month,sum(salary) as salary from t_access_times group by username,month) A 
inner join 
(select username,month,sum(salary) as salary from t_access_times group by username,month) B
on
A.username=B.username
where B.month <= A.month
group by A.username,A.month
order by A.username,A.month;

+-----------+----------+---------+-------------+--+
| username  |  month   | salary  | accumulate  |
+-----------+----------+---------+-------------+--+
| A         | 2015-01  | 33      | 33          |
| A         | 2015-02  | 10      | 43          |
| B         | 2015-01  | 30      | 30          |
| B         | 2015-02  | 15      | 45          |
+-----------+----------+---------+-------------+--+
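
The self-join is the approach used in this exercise. As an alternative sketch, and not the method the exercise follows, a window function can compute the same running total without joining. This assumes a Hive version with windowing support (0.11 or later); the subquery alias t is arbitrary:

```sql
-- Running total per user, ordered by month, over the monthly totals from step 1
select username, month, salary,
       sum(salary) over (partition by username
                         order by month
                         rows between unbounded preceding and current row) as accumulate
from
(select username,month,sum(salary) as salary from t_access_times group by username,month) t
order by username, month;
```

Ordering by month works here because the values are strings in yyyy-MM form, which sort in chronological order.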

Other exercises

https://blog.csdn.net/forgetthatnight/article/details/79632364#t32
Section: (4) Basic Hive operations: DDL operations
That article is too long, so the test data is kept here instead.

student.txt
95001,李勇,男,20,CS
95002,刘晨,女,19,IS
95003,王敏,女,22,MA
95004,张立,男,19,IS
95005,刘刚,男,18,MA
95006,孙庆,男,23,CS
95007,易思玲,女,19,MA
95008,李娜,女,18,CS
95009,梦圆圆,女,18,MA
95010,孔小涛,男,19,CS
95011,包小柏,男,18,MA
95012,孙花,女,20,CS
95013,冯伟,男,21,CS
95014,王小丽,女,19,CS
95015,王君,男,18,MA
95016,钱国,男,21,MA
95017,王风娟,女,18,IS
95018,王一,女,19,IS
95019,邢小丽,女,19,IS
95020,赵钱,男,21,IS
95021,周二,男,17,MA
95022,郑明,男,20,MA
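
A minimal sketch of loading this file, so the data can be used with the linked exercises. The column names (sno, sname, sex, sage, sdept) and the local path are assumptions based on the five comma-separated fields; the linked article may define the table differently:

```sql
-- Assumed column names and types; adjust them to match the linked article
create table student(sno int,sname string,sex string,sage int,sdept string)
row format delimited fields terminated by ',';

-- Hypothetical local path to the file above
load data local inpath '/home/hadoop/student.txt' into table student;

-- Quick check: number of students per department
select sdept,count(*) as cnt from student group by sdept;
```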
