创建表
create table t_access_times(username string,month string,salary int)
row format delimited fields terminated by ',';
查看刚刚创建的表
desc formatted t_access_times;
造测试数据
A,2015-01,5
A,2015-01,15
B,2015-01,5
A,2015-01,8
B,2015-01,25
A,2015-01,5
A,2015-02,4
A,2015-02,6
B,2015-02,10
B,2015-02,5
加载测试数据
//overvrite table是覆盖数据 into table是插入数据 数据会导入到指定目录上图红色框内,
//select * from t_access_times;查看加载的测试数据
load data local inpath '/home/hadoop/t_access_times.dat' into table t_access_times;
由于是内部表上图红色路径中的元数据在drop表的同时也会被清除
dfs -ls /user/hive/warehouse/t_access_times;
dfs -cat /user/hive/warehouse/t_access_times/t_access_times.dat;
第一步,先求个用户的月总金额
select username,month,sum(salary) as salary from t_access_times group by username,month
+-----------+----------+---------+--+
| username | month | salary |
+-----------+----------+---------+--+
| A | 2015-01 | 33 |
| A | 2015-02 | 10 |
| B | 2015-01 | 30 |
| B | 2015-02 | 15 |
+-----------+----------+---------+--+
第二步,将月总金额表 自己连接 自己连接
+-------------+----------+-----------+-------------+----------+-----------+--+
| a.username | a.month | a.salary | b.username | b.month | b.salary |
+-------------+----------+-----------+-------------+----------+-----------+--+
| A | 2015-01 | 33 | A | 2015-01 | 33 |
| A | 2015-01 | 33 | A | 2015-02 | 10 |
| A | 2015-02 | 10 | A | 2015-01 | 33 |
| A | 2015-02 | 10 | A | 2015-02 | 10 |
| B | 2015-01 | 30 | B | 2015-01 | 30 |
| B | 2015-01 | 30 | B | 2015-02 | 15 |
| B | 2015-02 | 15 | B | 2015-01 | 30 |
| B | 2015-02 | 15 | B | 2015-02 | 15 |
+-------------+----------+-----------+-------------+----------+-----------+--+
第三步,从上一步的结果中进行分组查询,分组的字段是a.username a.month
求月累计值: 将b.month <= a.month的所有b.salary求和即可
select A.username,A.month,max(A.salary) as salary,sum(B.salary) as accumulate
from
(select username,month,sum(salary) as salary from t_access_times group by username,month) A
inner join
(select username,month,sum(salary) as salary from t_access_times group by username,month) B
on
A.username=B.username
where B.month <= A.month
group by A.username,A.month
order by A.username,A.month;
+-----------+----------+---------+--+
| username | month | salary |
+-----------+----------+---------+--+
| A | 2015-01 | 33 | 33
| A | 2015-02 | 10 | 43
| B | 2015-01 | 30 | 30
| B | 2015-02 | 15 | 45
+-----------+----------+---------+--+
其他练习
https://blog.csdn.net/forgetthatnight/article/details/79632364#t32
目录:(四)Hive的基本操作--DDL操作
篇幅太长了,所以测试数据放在这
student.txt
95001,李勇,男,20,CS
95002,刘晨,女,19,IS
95003,王敏,女,22,MA
95004,张立,男,19,IS
95005,刘刚,男,18,MA
95006,孙庆,男,23,CS
95007,易思玲,女,19,MA
95008,李娜,女,18,CS
95009,梦圆圆,女,18,MA
95010,孔小涛,男,19,CS
95011,包小柏,男,18,MA
95012,孙花,女,20,CS
95013,冯伟,男,21,CS
95014,王小丽,女,19,CS
95015,王君,男,18,MA
95016,钱国,男,21,MA
95017,王风娟,女,18,IS
95018,王一,女,19,IS
95019,邢小丽,女,19,IS
95020,赵钱,男,21,IS
95021,周二,男,17,MA
95022,郑明,男,20,MA