hiveSQL基本语句三-----连接查询--inner join、left join、full join、union all、union

一、(inner) join

连接2个表,取出公共部分

  • 必须重命名
  • on后连接条件键值唯一
  • 连接前注意去重,提高效率
  • inner可省略
select * from user_list_1 as a 
inner join user_list_2 as b
on a.user_id=b.user_id;

三表连接

select a.user_name 
from
(select distinct user_name from data1)as a 
inner join (select distinct user_name from data2) as b
on a.user_name=b.user_name
inner join (select distinct user_name from data3) as c 
on b.user_name=c.user_name;

二、left /right join

left join:以左表为全集,返回能匹配上的匹配结果,没匹配上的显示NULL;
应用:取出表1中存在,表2中不存在的项

select *
from user_list_1 a
left join user_list_2 b
on a.user_id=b.user_id;

hiveSQL基本语句三-----连接查询--inner join、left join、full join、union all、union_第1张图片

三、full join全连接

并列将2个表的信息展示

select *
from user_list_1 a
full join user_list_2 b
on a.user_id=b.user_id;

hiveSQL基本语句三-----连接查询--inner join、left join、full join、union all、union_第2张图片

coalesce函数
coalesce(expression_1,expression_2,…,expression_n)
依次取值,遇到非空立即返回并停止对后面参数取值。若所有参数都为空,最终返回一个空值

select coalesce(a.user_name,b.user_name)
from user_list_1 a
full join user_list_2 b
on a.user_id=b.user_id;

hiveSQL基本语句三-----连接查询--inner join、left join、full join、union all、union_第3张图片

四、union all/union联合

增加行

  • 字段名称必须一致
  • 字段顺序必须一致
  • 无连接条件

== union all和union的区别:==
union会去重且排序

hiveSQL基本语句三-----连接查询--inner join、left join、full join、union all、union_第4张图片

  • union all
select a.user_id,a.user_name
from user_list_1 a
union all
select b.user_id,b.user_name
from user_list_2 b;

hiveSQL基本语句三-----连接查询--inner join、left join、full join、union all、union_第5张图片

  • union
select a.user_id,a.user_name
from user_list_1 a
union
select b.user_id,b.user_name
from user_list_2 b;

hiveSQL基本语句三-----连接查询--inner join、left join、full join、union all、union_第6张图片

练习
练习1:2019年购买后又退款的用户性别分布(user_trade/user_refund/user_info)

select c.sex,count(c.user_name)
from (select distinct user_name from user_trade where year(dt) =2019) a
join (select distinct user_name from user_refund where year(dt)=2019) b
on a.user_name=b.user_name
join user_info c
on b.user_name=c.user_name
group by c.sex;

hiveSQL基本语句三-----连接查询--inner join、left join、full join、union all、union_第7张图片
练习2:在2018年购买,但是没有在2019年购买的用户城市分布(user_trade、user_info)

select d.city,count(d.user_name)
from
(select a.user_name
from
(select distinct user_name from user_trade where year(dt) =2018) a
left join (select distinct user_name from user_trade where year(dt) =2019) b
on a.user_name=b.user_name
where b.user_name is NULL) c
join user_info d
on c.user_name=d.user_name
group by d.city;

hiveSQL基本语句三-----连接查询--inner join、left join、full join、union all、union_第8张图片
练习3:2017-2019年,有交易但是没有退款的手机品牌分布(trade_2017/trade_2018/trade_2019/user_refund/user_info)

select d.phone_brand,count(d.user_name)
from 
(select a.user_name
from
(select user_name from trade_2017
union
select user_name from trade_2018
union
select user_name from trade_2019) a
left join
(select distinct user_name from user_refund where dt > '0') b
on a.user_name=b.user_name
where b.user_name is NULL) c
join
(select user_name,get_json_object(extra1,'$.phonebrand') phone_brand from user_info) d
on c.user_name=d.user_name
group by d.phone_brand;

hiveSQL基本语句三-----连接查询--inner join、left join、full join、union all、union_第9张图片
最开始的思路,user_info不用提前处理,由于’ . p h o n e b r a n d ′ 中 .phonebrand'中 .phonebrand后没加点失败,加了后成功,注意注意

select get_json_object(d.extra1,'$.phonebrand'),count(d.user_name)
from 
(select a.user_name
from
(select user_name from trade_2017
union
select user_name from trade_2018
union
select user_name from trade_2019) a
left join
(select distinct user_name from user_refund where year(dt) >= 2017 and year(dt)<= 2019) b
on a.user_name=b.user_name
where b.user_name is NULL) c
join user_info d
on c.user_name=d.user_name
group by get_json_object(d.extra1,'$.phonebrand');

你可能感兴趣的:(hive,语法)