A Hive SQL interview question

Data

username month salary

A,2015-01,5
A,2015-01,15
B,2015-01,5
A,2015-01,8
B,2015-01,25
A,2015-01,5
A,2015-02,4
A,2015-02,6
B,2015-02,10
B,2015-02,5
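
To follow along, the sample rows can be loaded into a table named time, which is the name the queries in this post use. The sketch below is only one possible setup: the column types and the comma-delimited row format are assumptions, not something the original post specifies. The table name is backtick-quoted because newer Hive versions may treat time as a reserved word.

```sql
-- Hypothetical setup: column types and storage format are assumptions.
CREATE TABLE IF NOT EXISTS `time` (
  username STRING,
  month    STRING,  -- kept as a zero-padded 'yyyy-MM' string
  salary   INT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- INSERT ... VALUES requires Hive 0.14 or later;
-- on older versions, LOAD DATA from a comma-separated file instead.
INSERT INTO TABLE `time` VALUES
  ('A', '2015-01', 5),
  ('A', '2015-01', 15),
  ('B', '2015-01', 5),
  ('A', '2015-01', 8),
  ('B', '2015-01', 25),
  ('A', '2015-01', 5),
  ('A', '2015-02', 4),
  ('A', '2015-02', 6),
  ('B', '2015-02', 10),
  ('B', '2015-02', 5);
```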

Expected result (username, month, salary total for the month, cumulative salary total)

A    2015-01    33    33
A    2015-02    10    43
B    2015-01    30    30
B    2015-02    15    45

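In other words, this is a cumulative (running total) query. For example:

A's salary total is 33 in January and 10 in February, so for February the last column is the sum over months 1 and 2, which is 43.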

 

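When a SQL problem looks complicated, break it down and solve it one step at a time.

The approach: inner join the table to itself, so that each month's row is paired with the rows of the other months and the running total can be built from those pairs.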


1 For each user and each month, compute the total salary by grouping on (username, month)

select username,month,sum(salary) as salary from time group by username,month;

+-----------+----------+---------+--+
| username  |  month   | salary  |
+-----------+----------+---------+--+
| A         | 2015-01  | 33      |
| A         | 2015-02  | 10      |
| B         | 2015-01  | 30      |
| B         | 2015-02  | 15      |
+-----------+----------+---------+--+

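2 Join the result of step 1 to itself. A left, right, or inner join all return the same rows here, because every row matches at least itself.

With the join condition that username is equal, table A's January row for a user joins to table B's January and February rows for that same user,

and likewise for every other row.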

select A.*,B.*
from
(select username,month,sum(salary) as salary from time group by username,month) A
inner join
(select username,month,sum(salary) as salary from time group by username,month) B
on A.username=B.username;


+-------------+----------+-----------+-------------+----------+-----------+--+
| a.username  | a.month  | a.salary  | b.username  | b.month  | b.salary  |
+-------------+----------+-----------+-------------+----------+-----------+--+
| A           | 2015-01  | 33        | A           | 2015-01  | 33        |
| A           | 2015-01  | 33        | A           | 2015-02  | 10        |
| A           | 2015-02  | 10        | A           | 2015-01  | 33        |
| A           | 2015-02  | 10        | A           | 2015-02  | 10        |
| B           | 2015-01  | 30        | B           | 2015-01  | 30        |
| B           | 2015-01  | 30        | B           | 2015-02  | 15        |
| B           | 2015-02  | 15        | B           | 2015-01  | 30        |
| B           | 2015-02  | 15        | B           | 2015-02  | 15        |
+-------------+----------+-----------+-------------+----------+-----------+--+

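3 Treat the result of step 2 as a table and group it by table A's username and then month.

Within one group, a.salary is the same on every row, as in the two rows below, so taking the max of A's salary over the group is enough to pick it out.

| A           | 2015-01  | 33        | A           | 2015-01  | 33        |
| A           | 2015-01  | 33        | A           | 2015-02  | 10        |

Add a where clause that keeps only the B rows whose month is less than or equal to the current A month. Because the months are zero-padded yyyy-MM strings, string comparison matches chronological order. For A in 2015-02, for example, this keeps the b.salary values 33 and 10, which sum to 43.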
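
Before aggregating, it may help to look at which joined rows survive the filter. The query below is only an illustrative sketch: it reuses the step 2 join and adds the month condition, with column aliases b_month and b_salary chosen here for readability. It backtick-quotes time on the assumption that some Hive versions reserve that word.

```sql
-- Step 2 join plus the month filter, before any grouping.
SELECT A.username, A.month, A.salary, B.month AS b_month, B.salary AS b_salary
FROM
  (SELECT username, month, SUM(salary) AS salary FROM `time` GROUP BY username, month) A
INNER JOIN
  (SELECT username, month, SUM(salary) AS salary FROM `time` GROUP BY username, month) B
ON A.username = B.username
WHERE B.month <= A.month;
-- Of the eight joined rows from step 2, six remain:
-- one row for each user's 2015-01, and two rows for each user's 2015-02.
```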


select A.username, A.month, max(A.salary) as salary, sum(B.salary) as result
from
(select username,month,sum(salary) as salary from time group by username,month) A
inner join
(select username,month,sum(salary) as salary from time group by username,month) B
on A.username=B.username
where B.month<=A.month
group by A.username, A.month;
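
As a point of comparison, and not the method this post walks through, the same running total can usually be written with a window function, which Hive has supported since version 0.11. The sketch below assumes the same table and columns; the subquery alias t is introduced here only for illustration.

```sql
-- Alternative sketch: cumulative sum via a window function instead of a self join.
SELECT
  username,
  month,
  salary,
  SUM(salary) OVER (
    PARTITION BY username
    ORDER BY month
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS result
FROM (
  -- Same per-user, per-month totals as step 1.
  SELECT username, month, SUM(salary) AS salary
  FROM `time`
  GROUP BY username, month
) t;
```

The self join pairs every month with every earlier month of the same user, so its intermediate result grows quadratically with the number of months per user, while the window version makes a single ordered pass over each user's rows.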
