点赞收藏呦!
默认mysql老版本没有支持,在最新的8.0版本中支持, Oracle和Hive中都支持窗口函数,下列题目皆在Hive中实现
第一题:
了解哪些窗口函数,都是什么功能?找一个在某个业务中的应用? 手写窗口函数及功能意义,同时随便写一个带窗口函数的sql,并说明其sql的含义。
窗口函数:通常格式为 可用函数+over()函数
-- 查询所有明细
select * from t_order;
# 查询总量
select count(*) from t_order;
等价于:
select *, count(*) over() from t_order;
数据: video表
uid channel min
1 1 23
2 1 12
3 1 12
4 1 32
5 1 342
6 2 13
7 2 34
8 2 13
9 2 134
create table video(
uid int,
channel string,
min int
)
row format delimited
fields terminated by ' '
;
load data local inpath './data/video' into table video;
答案:
select channel,count(*) count,sum(min) total from video group by channel;
数据:
userid,month,visits
A,2015-01,5
A,2015-01,15
B,2015-01,5
A,2015-01,8
B,2015-01,25
A,2015-01,5
A,2015-02,4
A,2015-02,6
B,2015-02,10
B,2015-02,5
A,2015-03,16
A,2015-03,22
B,2015-03,23
B,2015-03,10
B,2015-03,1
drop table visits;
create table visits(
userid string,
month string,
visits int
)
row format delimited
fields terminated by ','
;
load data local inpath './hivedata/visits' overwrite into table visits;
完成需求:每个用户截止到每月为止的最大单月访问次数和累计到该月的总访问次数,结果数据格式如下:
+---------+----------+---------+-------------+---------------+--+
| userid | month | visits | max_visits | total_visits |
+---------+----------+---------+-------------+---------------+--+
| A | 2015-01 | 33 | 33 | 33 |
| A | 2015-02 | 10 | 33 | 43 |
| A | 2015-03 | 38 | 38 | 81 |
| B | 2015-01 | 30 | 30 | 30 |
| B | 2015-02 | 15 | 30 | 45 |
| B | 2015-03 | 34 | 34 | 79 |
+---------+----------+---------+-------------+---------------+--+
select userid,month,
visits,
max(visits) over(distribute by userid sort by month) max_visits,
sum(visits) over(distribute by userid sort by month) total_visits
from (
select userid,month,
sum(visits) visits
from visits
group by userid,month
) t;
数据: t1表
Uid dt login_status(1登录成功,0异常)
1 2019-07-11 1
1 2019-07-12 1
1 2019-07-13 1
1 2019-07-14 1
1 2019-07-15 1
1 2019-07-16 1
1 2019-07-17 1
1 2019-07-18 1
2 2019-07-11 1
2 2019-07-12 1
2 2019-07-13 0
2 2019-07-14 1
2 2019-07-15 1
2 2019-07-16 0
2 2019-07-17 1
2 2019-07-18 0
2 2019-07-19 1
2 2019-07-20 0
2 2019-07-21 1
2 2019-07-22 0
2 2019-07-23 1
2 2019-07-24 0
3 2019-07-11 1
3 2019-07-12 1
3 2019-07-13 1
3 2019-07-14 1
3 2019-07-15 1
3 2019-07-16 1
3 2019-07-17 1
3 2019-07-18 1
drop table login;
create table login(
Uid int,
dt string,
login_status int
)
row format delimited
fields terminated by ' '
;
load data local inpath './hivedata/login' into table login;
select count(*)
from
(
select distinct uid
from(
select uid,dt,lag(dt,6) over(partition by uid order by dt) pre_dt,
sum(login_status) over(partition by uid order by dt rows between 6 preceding and current row) total
from login
) t
where date_sub(dt,6)=pre_dt and t.total=7) t1;
row_number从1开始,按照顺序,生成分组内记录的序列,row_number()的值不会存在重复,当排序的值相同时,按照表中记录的顺序进行排列
RANK() 生成数据项在分组中的排名,排名相等会在名次中留下空位
DENSE_RANK() 生成数据项在分组中的排名,排名相等会在名次中不会留下空位
数据: stu表
Stu_no class score
1 1901 90
2 1901 90
3 1901 83
4 1901 60
5 1902 66
6 1902 23
7 1902 99
8 1902 67
9 1902 87
drop table stu;
create table stu(
Stu_no int,
class string,
score int
)
row format delimited
fields terminated by '\t'
;
load data local inpath './hivedata/stu' into table stu;
编写sql实现,结果如下:
+--------+---------+--------+-----+----------+--+
| class | stu_no | score | rn | rn_diff |
+--------+---------+--------+-----+----------+--+
| 1901 | 2 | 90 | 1 | 90 |
| 1901 | 1 | 90 | 2 | 0 |
| 1901 | 3 | 83 | 3 | -7 |
| 1902 | 7 | 99 | 1 | 99 |
| 1902 | 9 | 87 | 2 | -12 |
| 1902 | 8 | 67 | 3 | -20 |
+--------+---------+--------+-----+----------+--+
select class,stu_no,score,rn,rn_diff
from(
select class,stu_no,score,
row_number() over(partition by class order by score desc) rn,
score-nvl(lag(score,1) over(partition by class order by score desc),0) rn_diff
from stu
) t
where t.rn<4;
collect_set,collect_list,explode.....
id sid subject int
1,001,语文,90
2,001,数学,92
3,001,英语,80
4,002,语文,88
5,002,数学,90
6,002,英语,75.5
7,003,语文,70
8,003,数学,85
9,003,英语,90
10,003,政治,82
编写sql实现,得到结果如下:
+---------+--------+--------+--------+--------+-----------+--+
| sid | 语文 | u2.数学 | u2.英语 | u2.政治 | u2.total |
+---------+--------+--------+--------+--------+-----------+--+
| 001 | 90.0 | 92.0 | 80.0 | 0.0 | 262.0 |
| 002 | 88.0 | 90.0 | 75.5 | 0.0 | 253.5 |
| 003 | 70.0 | 85.0 | 90.0 | 82.0 | 327.0 |
| total | 248.0 | 267.0 | 245.5 | 82.0 | 842.5 |
+---------+--------+--------+--------+--------+-----------+--+
drop table score;
create table score(
id int,
sid string,
subject string,
score double
)
row format delimited
fields terminated by ','
;
load data local inpath './hivedata/score' into table score;
select
sid,
sum(if(subject="语文",score,0)) as `语文`,
sum(if(subject="数学",score,0)) as `数学`,
sum(case when subject="英语" then score else 0 end) as `英语`,
sum(case when subject="政治" then score else 0 end) as `政治`,
sum(score) total
from score
group by sid
union
select "total",sum(`语文`),sum(`数学`),sum(`英语`),sum(`政治`),sum(total) from
(
select
sid,
sum(if(subject="语文",score,0)) as `语文`,
sum(if(subject="数学",score,0)) as `数学`,
sum(case when subject="英语" then score else 0 end) as `英语`,
sum(case when subject="政治" then score else 0 end) as `政治`,
sum(score) total
from score
group by sid
)t;
数据: t1表
uid tags
1 1,2,3
2 2,3
3 1,2
编写sql实现如下结果:
uid tag
1 1
1 2
1 3
2 2
2 3
3 1
3 2
create table t1(
uid int,
tags string
)
row format delimited
fields terminated by '\t'
;
load data local inpath './hivedata/t1' into table t1;
select uid,tag
from t1 lateral view explode(split(tags,",")) t as tag;
数据: T2表:
Tags
1,2,3
1,2
2,3
T3表:
id lab
1 A
2 B
3 C
根据T2和T3表的数据,编写sql实现如下结果:
+--------+--------+--+
| tags | labs |
+--------+--------+--+
| 1,2 | A,B |
| 1,2,3 | A,B,C |
| 2,3 | B,C |
+--------+--------+--+
create table t2(
tags string
);
load data local inpath './hivedata/t2' overwrite into table t2;
create table t3(
id int,
lab string
)
row format delimited
fields terminated by ' '
;
load data local inpath './hivedata/t3' overwrite into table t3;
select tags,
concat_ws(",",collect_set(lab)) labs
from
(select tags,lab
from
(select tags,tag
from t2 lateral view explode(split(tags,",")) A as tag) B
join t3 on B.tag = t3.id) C
group by tags;
数据: t4表:
id tag flag
a b 2
a b 1
a b 3
c d 6
c d 8
c d 8
编写sql实现如下结果:
id tag flag
a b 1|2|3
c d 6|8
create table t4(
id string,
tag string,
flag int
)
row format delimited
fields terminated by ' '
;
load data local inpath './hivedata/t4' overwrite into table t4;
select id,tag,concat_ws("|",collect_set(cast(flag as string))) from t4 group by id,tag;
数据: t5表
uid name tags
1 goudan chihuo,huaci
2 mazi sleep
3 laotie paly
编写sql实现如下结果:
uid name tag
1 goudan chihuo
1 goudan huaci
2 mazi sleep
3 laotie paly
create table t5(
uid string,
name string,
tags string
)
row format delimited
fields terminated by '\t' ;
load data local inpath './hivedata/t5' overwrite into table t5;
select
uid,
name,
tag
from t5
lateral view explode(split(tags,",")) t1 as tag ;
数据: content表:
uid contents
1 i|love|china
2 china|is|good|i|i|like
统计结果如下,如果出现次数一样,则按照content名称排序:
+----------+------+--+
| content | num |
+----------+------+--+
| i | 3 |
| china | 2 |
| good | 1 |
| is | 1 |
| like | 1 |
| love | 1 |
+----------+------+--+
drop table content;
create table content(
uid int,
contents string
)
row format delimited
fields terminated by '\t'
;
load data local inpath './hivedata/content' overwrite into table content;
select content,count(1) num
from content lateral view explode(split(contents,"\\|")) tmp as content
group by content
order by num desc,content;
数据: course1表
id course
1,a
1,b
1,c
1,e
2,a
2,c
2,d
2,f
3,a
3,b
3,c
3,e
根据编写sql,得到结果如下(表中的1表示选修,表中的0表示未选修):
+-----+----+----+----+----+----+----+--+
| id | a | b | c | d | e | f |
+-----+----+----+----+----+----+----+--+
| 1 | 1 | 1 | 1 | 0 | 1 | 0 |
| 2 | 1 | 0 | 1 | 1 | 0 | 1 |
| 3 | 1 | 1 | 1 | 0 | 1 | 0 |
+-----+----+----+----+----+----+----+--+
create table course(
id int,
course string
)
row format delimited
fields terminated by ','
;
load data local inpath './hivedata/course' overwrite into table course;
select
id,
sum(case when c.course="a" then 1 else 0 end) as `a`,
sum(case when c.course="b" then 1 else 0 end) as `b`,
sum(case when c.course="c" then 1 else 0 end) as `c`,
sum(case when c.course="d" then 1 else 0 end) as `d`,
sum(case when c.course="e" then 1 else 0 end) as `e`,
sum(case when c.course="f" then 1 else 0 end) as `f`
from course c
group by id ;
获取当前时间戳:
select unix_timestamp();
获取"2019-07-31 11:57:25"对应的时间戳:
select unix_timestamp("2019-07-31 11:57:25");
获取"2019-07-31 11:57"对应的时间戳:
select unix_timestamp("2019-07-31 11:57","yyyy-MM-dd HH:mm");
获取时间戳:1564545445所对应的日期和时分秒:
select from_unixtime(1564545445);
获取时间戳:1564545446所对应的日期和小时(yyyy/MM/dd HH):
select from_unixtime(1564545445,"yyyy/mm/dd HH");
数据: dt表
20190730
20190731
编写sql实现如下的结果:
2019-07-30
2019-07-31
create table dt(
dt string
);
load data local inpath './hivedata/dt' overwrite into table dt;
select from_unixtime(unix_timestamp(dt,"yyyyMMdd"),"yyyy-MM-dd") from dt;
sid month money
a,01,150
a,01,200
b,01,1000
b,01,800
c,01,250
c,01,220
b,01,6000
a,02,2000
a,02,3000
b,02,1000
b,02,1500
c,02,350
c,02,280
a,03,350
a,03,250
create table store(
sid string,
month string,
money int
)
row format delimited
fields terminated by ','
;
load data local inpath './hivedata/store' overwrite into table store;
编写Hive的HQL语句求出每个店铺的当月销售额和累计到当月的总销售额?
select sid,month,sales,
sum(sales) over(partition by sid order by month) total
from (
select sid,month,sum(money) sales from store group by sid,month
) t;
Table A 是一个用户登陆时间记录表,当月每次登陆一次会记录一条记录。A表如下:
log_time uid
2018-10-01 12:34:11 123
2018-10-02 13:21:08 123
2018-10-02 14:21:08 123
2018-10-02 14:08:09 456
2018-10-04 05:10:22 123
2018-10-04 21:38:38 456
2018-10-04 22:38:38 456
2018-10-05 09:57:32 123
2018-10-06 13:22:56 123
2018-11-01 12:34:11 123
2018-11-02 13:21:08 123
2018-11-02 14:21:08 123
2018-11-02 14:08:09 456
2018-11-04 05:10:22 123
2018-11-04 21:38:38 456
2018-11-05 09:57:32 123
2018-11-06 13:22:56 123
需计算出每个用户本月最大连续登陆天数。如表A样例数据中,用户123最大连续登陆天数为3,而用户456最大连续登陆天数为1
drop table login_time;
create table login_time(
log_time timestamp,
uid string
)
row format delimited
fields terminated by '\t';
load data local inpath './hivedata/login_time' overwrite into table login_time;
-- 注意:可能需要对原始数据做清洗,保证每个用户每天只有一条登录信息
select uid,max(cnt)
from
(select uid,date_format(dt,"yyyy-MM"),count(1) cnt
from
(select uid,dt,date_sub(dt,row_number() over(partition by uid order by dt)) grp_dt
from
(
select distinct uid,date_format(log_time,"yyyy-MM-dd") as dt
from login_time
)a
)b group by uid,grp_dt,date_format(dt,"yyyy-MM")
)c
group by uid;
需求:查询顾客下次购买的时间
select name,orderdate,cost,
lead(orderdate,1) over(partition by name order by orderdate ) as time1
from t_order;
数据 diff_t1表:
id name
1 zs
2 ls
diff_t2表:
id name
1 zs
3 ww
结果如下:
id name
2 ls
3 ww
create table diff_t1(
id string,
name string
)
row format delimited
fields terminated by ' '
;
load data local inpath './hivedata/diff_t1.txt' overwrite into table diff_t1;
create table diff_t2(
id string,
name string
)
row format delimited
fields terminated by ' '
;
load data local inpath './hivedata/diff_t2.txt' overwrite into table diff_t2;
select t1.id id, t1.name name
from diff_t1 t1
left join diff_t2 t2 on t1.id=t2.id
where t2.id is null
union
select t2.id id, t2.name name
from diff_t1 t1
right join diff_t2 t2 on t1.id=t2.id
where t1.id is null ;
现有某网站购买记录字段如下
orderid,userid,productid,price,timestamp,date
121,张三,3,100,1535945356,2018-08-07
122,张三,3,200,1535945356,2018-08-08
123,李四,3,200,1535945356,2018-08-08
124,王五,1,200,1535945356,2018-08-08
125,张三,3,200,1535945356,2018-08-09
126,张三,2,200,1535945356,2018-08-09
127,李四,3,200,1535945356,2018-08-09
128,李四,3,200,1535945356,2018-08-10
129,李四,3,200,1535945356,2018-08-11
用sql统计今日及昨日都购买过商品productid为3的用户及其昨日消费。
drop table product;
create table product(
orderid string,
userid string,
productid int,
price int,
tamp int,
dt date
)
row format delimited
fields terminated by ',';
load data local inpath './hivedata/product.txt' overwrite into table product;
select *
from(
select userid,dt,
lag(dt,1) over(partition by userid order by dt) yestoday,
case when
datediff(dt,lag(dt,1) over(partition by userid order by dt))=1
then lag(total_price) over(partition by userid order by dt)
else null end yestoday_price
from
(
select userid,dt,productid,sum(price) over(partition by userid,dt order by dt) total_price
from product order by userid,dt
)A
where productid = 3
)B where yestoday_price is not null;
表user_action_log用户行为故据
uid time action
1 Time1 Read
3 Time2 Comment
1 Time3 Share
2 Time4 Like
1 Time5 Write
2 Time6 like
3 Time7 Write
2 Time8 Read
分析用户行为习惯找到毎一个用户在表中的第一次行为
drop table user_action_log;
create table user_action_log(
uid int,
time string,
action string
)
row format delimited
fields terminated by '\t';
load data local inpath './hivedata/user_action_log.txt' overwrite into table user_action_log;
使用代码实现
select uid,time,action
from (
select uid,time,action,row_number() over(partition by uid order by time) rn
from user_action_log
)A
where rn = 1;
数据: user_login表
uid,dt
1,2019-08-01
1,2019-08-02
1,2019-08-03
2,2019-08-01
2,2019-08-02
3,2019-08-01
3,2019-08-03
4,2019-07-28
4,2019-07-29
4,2019-08-01
4,2019-08-02
4,2019-08-03
结果如下:
uid cnt_days
1 3
2 2
3 1
4 3
create table user_login(
uid int,
dt date
)
row format delimited
fields terminated by ',';
load data local inpath './hivedata/user_login.txt' overwrite into table user_login;
select uid,max(cnt)
from
(select uid,dt_sub,count(1) cnt
from
(select uid,dt,
date_sub(dt,row_number() over(partition by uid order by dt)) dt_sub
from user_login
)A
group by uid, dt_sub
)B group by uid;
数据:
t1表
uid dt url
1 2019-08-06 http://www.baidu.com
2 2019-08-06 http://www.baidu.com
3 2019-08-06 http://www.baidu.com
3 2019-08-06 http://www.soho.com
3 2019-08-06 http://www.meituan.com
3 2019-08-06
结果如下:
dt uv pv
2019-08-6 3 5
create table user_net_log(
uid int,
dt date,
url string
)
row format delimited
fields terminated by ' ';
load data local inpath './hivedata/user_net_log.txt' overwrite into table user_net_log;
select dt,count(distinct uid),count(url)
from user_net_log
group by dt;
自行扩展吧
pk_moba表
id names
1 亚索,挖据机,艾瑞莉亚,洛,卡莎
2 亚索,盖伦,奥巴马,牛头,皇子
3 亚索,盖伦,艾瑞莉亚,宝石,琴女
4 亚索,盖伦,赵信,老鼠,锤石
请用HiveSQL计算出出场次数最多的top3英雄及其Pick率(=出现场数/总场数)
create table pk_moba(
id int,
names array
)
row format delimited
fields terminated by '\t'
collection items terminated by ',';
load data local inpath './hivedata/pk_moba.txt' overwrite into table pk_moba;
select name,cnt,rn,concat(round(cnt/4*100,0),"%") lv
from
(
select name,cnt,dense_rank() over(sort by cnt desc) rn
from (
select name,count(name) cnt
from pk_moba lateral view explode(names) tmp as name
group by name
)A
)B where rn<4;
区域(district) 区域中有两个字段分别是区域Id(disid)和区域名称(disname)
城市(city) 城市有两个字段分别是城市ID(cityid)和区域ID(disid)
订单(order) 订单有四个字段分别是订单ID(orderid)、用户ID(userid)、城市ID(cityid)和消费金额(amount)。
district表:
disid disname
1 华中
2 西南
create table district(
disid int,
disname string
)
row format delimited
fields terminated by ' ';
load data local inpath './hivedata/district.txt' overwrite into table district;
city表:
cityid disid
1 1
2 1
3 2
4 2
5 2
create table city(
cityid int,
disid int
)
row format delimited
fields terminated by ' ';
load data local inpath './hivedata/city.txt' overwrite into table city;
order表:
oid userid cityid amount
1 1 1 1223.9
2 1 1 9999.9
3 2 2 2322
4 2 2 8909
5 2 3 6789
6 2 3 798
7 3 4 56786
8 4 5 78890
create table order_t(
oid int,
userid int,
cityid int,
amount float
)
row format delimited
fields terminated by ' ';
load data local inpath './hivedata/order.txt' overwrite into table order_t;
高消费者是消费金额大于1W的用户,使用hive hql生成如下报表:
区域名 高消费者人数 消费总额
select disname,count(1),sum(amount)
from(
select disname,userid,sum(amount) amount from district A join city B on A.disid = B.disid
join order_t C on B.cityid = C.cityid
group by disname,userid
)D
where amount>10000
group by disname;
(1)、每天整体的访问UV、PV?
select log_time,count(distinct user_id) uv,count(1) pv
from access_log
group by log_time
(2)、每天每个类型的访问UV、PV?
select log_time,user_type,count(distinct user_id) uv,count(1) pv
from access_log
group by log_time;
(3)、每天每个类型中最早访问时间和最晚访问时间?
select log_time,user_type,min(log_time),max(log_time)
from access_log
group by log_time;
(4)、每天每个类型中访问次数最高的10个用户?
select dt,user_type,cnt,rn
from(
select dt,user_type,cnt,row_number() over(partition by dt,user_type order by cnt desc) rn
from
(select substr(log_time,1,7) dt,user_type,count(1) cnt
from access_log
group by substr(log_time,1,7),user_type
)A
)B where rn <11;
它把有序的数据集合 平均分配 到 指定的数量(num)个桶中, 将桶号分配给每一行。如果不能平均分配,则优先分配较小编号的桶,并且各个桶中能放的行数最多相差1。
例子:
select name,orderdate,cost,
ntile(3) over(partition by name) # 按照name进行分组,在分组内将数据切成3份
from t_order;
mart 2018-04-13 94 1
mart 2018-04-11 75 1
mart 2018-04-09 68 2
mart 2018-04-08 62 3
neil 2018-06-12 80 1
neil 2018-05-10 12 2
saml 2018-01-01 10 1
saml 2018-01-08 55 1
saml 2018-04-06 42 2
saml 2018-01-05 46 2
saml 2018-02-03 23 3
tony 2018-01-07 50 1
tony 2018-01-02 15 2
tony 2018-01-04 29 3
表login_a(登录表):
ds user_id
2019-08-06 1
2019-08-06 2
2019-08-06 3
2019-08-06 4
create table login_a(
ds date,
user_id int
)
row format delimited
fields terminated by ' ';
load data local inpath './hivedata/login_a.txt' overwrite into table login_a;
表read_b(阅读表):
ds user_id read_num
2019-08-06 1 2
2019-08-06 2 3
2019-08-06 3 6
create table read_b(
ds date,
user_id int,
read_num int
)
row format delimited
fields terminated by ' ';
load data local inpath './hivedata/read_b.txt' overwrite into table read_b;
表cost_c(付费表):
ds user_id price
2019-08-06 1 55.6
2019-08-06 2 55.8
create table cost_c(
ds date,
user_id int,
price float
)
row format delimited
fields terminated by ' ';
load data local inpath './hivedata/cost_c.txt' overwrite into table cost_c;
基于上述三张表,请使用hive的hql语句实现如下需求:
(1)、用户登录并且当天有个阅读的用户数,已经阅读书籍数量
select A.ds,count(distinct A.user_id),sum(B.read_num)
from login_a A join read_b B on A.user_id = B.user_id and A.ds = B.ds
group by A.ds;
(2)、用户登录并且阅读,但是没有付费的用户数
select A.ds,count(1)
from login_a A join read_b B on A.user_id = B.user_id and A.ds = B.ds
left join cost_c C on B.user_id = C.user_id and B.ds = C.ds
where C.price is null
group by A.ds;
(3)、用户登录并且付费,付费用户数量和金额总数
select A.ds,count(1),sum(price)
from login_a A join read_b B on A.user_id = B.user_id and A.ds = B.ds
left join cost_c C on B.user_id = C.user_id and B.ds = C.ds
group by A.ds;
left semi join,我们称之为半开连接。它是left join的一种优化形式,只能查询左表的信息,主要用于解决hive中左表的数据是否存在的问题
create table order(
order_id long,
user_id long comment '用户id',
amount double comment '订单金额',
channel string comment '渠道',
time string comment '订单时间,yyyy-MM-dd HH:mi:ss'
)
partition by (dt string comment '天,yyyy-MM-dd');
请使用hive hql查询出2019-08-06号 每个渠道的下单用户数、订单总金额。
hql语句实现,结果表头如下: channel user_num order_amount
select channel,count(user_id) user_num,sum(amount) order_acount
from order
where to_date(time) = '2019-08-06'
group by channel;
设计数据库表,用来存放学生基本信息,课程信息,学生的课程及成绩,并给出查询语句,查询平均成绩大于85的所有学生。
create table stu_1(
id string,
name string,
age int,
addr string
)
row format delimited
fields terminated by ','
;
create table course_1(
cid string,
cname string
)
row format delimited
fields terminated by ',' ;
create table course_sc(
id string,
cid string,
score int
)
row format delimited
fields terminated by ','
;
load data local inpath '/hivedata/course_1.txt' overwrite into table course_1;
load data local inpath '/hivedata/stu_1.txt' overwrite into table stu_1;
load data local inpath '/hivedata/course_sc.txt' overwrite into table course_sc;
select cs.id, avg(score) avgscore
from course_sc cs
group by cs.id
having avg(score)>85 ;
有用户表user(uid,name) 以及黑名单BanUser(uid)
create table u(
id string,
name string
)
row format delimited
fields terminated by ','
;
create table banuser(
id string
);
load data local inpath '/hivedata/banuser.txt' overwrite into table banuser;
load data local inpath '/hivedata/u.txt' overwrite into table u;
select u.*
from u left join banuser on u.id = banuser.id
where banuser.id is null;
select u.*
from u
where not exists (select 1 from banuser where banuser.id = u.id);
course_score表数据:
1,zhangsan,数学,80,2015
2,lisi,语文,90,2016
3,lisi,数学,70,2016
4,wangwu,化学,80,2017
5,zhangsan,语文,85,2015
6,zhangsan,化学,90,2015
create table course_score(
id string,
name string,
course string,
score int,
year string
)
row format delimited
fields terminated by ','
;
load data local inpath './hivedata/course_score.txt' overwrite into table course_score;
1、查出每个学期每门课程最高分记录(包含全部5个字段)
select id,name,course,score,year,max(score) over(partition by year,course)
from course_score;
select year,course,max(score)
from course_score
group by year,course order by year;
2、查出单个学期中语文课在90分以上的学生的数学成绩记录(包含全部字段)
select A.id,A.name,A.course,A.score,A.year
from course_score A join course_score B on A.name = B.name
where B.course='语文' and B.score>=90 and A.course='数学';
select s.id, s.name, s.course, s.score, s.year
from course_score s
join
(select
course,
year,
max(score) score
from course_score
group by course,year) t1
on s.course=t1.course
and
s.year=t1.year
and
s.score=t1.score
;
select s.id, s.name, s.course, s.score, s.year
from course_score s
join
(select
id, name, course, score, year from course_score
where score >=90 and course="语文") t1
on s.name=t1.name
where s.course="数学"
;
t1表:
name course score
aa English 75
bb math 85
aa math 90
create table t1_1(
name string,
course string,
score int
)
row format delimited
fields terminated by ' ';
load data local inpath './hivedata/t1_1.txt' overwrite into table t1_1;
使用hql输出以下结果
name English math
aa 75 90
bb 0 85
select name,
max(if(course="English",score,0)) English,
max(if(course="math",score,0)) math
from t1_1
group by name;
t1表:
用户 商品
A P1
B P1
A P2
B P3
请你使用hql变成如下结果:
用户 P1 P2 P3
A 1 1 0
B 1 0 1
select username,
max(if(product="p1",1,0)) p1,
max(if(product="p2",1,0)) p2,
max(if(product="p3",1,0)) p3
from t1
group by username;
dpt部门
dpt_id dpt_name
1 产品
2 技术
User用户表
User_id Dpt_id
1 1
2 1
3 2
4 2
5 3
result表
user_id dpt_id dpt_name
1 1 产品
2 1 产品
3 2 技术
4 2 技术
5 3 其他部门
原数据表时user和dpt表,请使用hql实现result表中的结果.
select user.user_id,user.dpt_id,dpt.dpt_id
from user
join dpt
on user.dpt_id = dpt.dpt_id;
t1_order表:
order_id order_type order_time
111 N 10:00
111 A 10:05
111 B 10:10
create table t1_order(
order_id string,
order_type string,
order_time string
)
row format delimited
fields terminated by ' ';
load data local inpath './hivedata/t1_order.txt' overwrite into table t1_order;
是用hql获取结果如下:
order_id order_type_1 order_type_2 order_time_1 order_time_2
111 N A 10:00 10:05
111 A B 10:05 10:10
select order_id,order_type_1,order_type_2,order_time_1,order_time_2
from(
select order_id,order_type order_type_1,
lead(order_type,1) over(sort by order_time) order_type_2,
order_time order_time_1,
lead(order_time,1) over(sort by order_time) order_time_2
from t1_order
)A
where order_type_2 is not null;
t1_hobby表
name sex hobby
janson 男 打乒乓球、游泳、看电影
tom 男 打乒乓球、看电影
drop table t1_hobby;
create table t1_hobby(
name string,
sex string,
hobby string
)
row format delimited
fields terminated by ' ';
load data local inpath './hivedata/t1_hobby.txt' overwrite into table t1_hobby;
hobby最多3个值,使用hql实现结果如下:
name sex hobby1 hobby2 hobby3
janson 男 打乒乓球 游泳 看电影
tom 男 打乒乓球 看电影
select name,sex,split(hobby,"、")[0] hobby1,
split(hobby,"、")[1] hobby2,
nvl(split(hobby,"、")[2],"") hobby3
from t1_hobby;
收集编写不易!来个收藏点赞呗