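On a social app, how can you tell whether the bloggers you follow also follow you back? Suppose there is a table fans (the follower table) whose two key fields are from_user and to_user; each row means that from_user follows to_user.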
create table if not exists fans
(
    from_user   string comment 'following user',
    to_user     string comment 'followed user',
    `timestamp` string comment 'follow time'
)
comment 'follow table';
INSERT overwrite table fans
VALUES ("A","B","2022-11-28 12:12:12"),
("A","C","2022-11-28 12:12:13"),
("A","D","2022-11-28 12:12:14"),
("B","A","2022-11-28 12:12:15"),
("B","E","2022-11-28 12:12:16"),
("C","A","2022-11-28 12:12:17");
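Approach 1: for mutual follows, the obvious idea is a self-join on the table, matching each row against its reversed pair. The SQL is as follows:
-- Method 1
select
    tmp1.from_user,
    tmp1.to_user,
    if(tmp2.from_user is not null, 'yes', 'no') as is_friend
from fans tmp1
left join fans tmp2
    on tmp1.from_user = tmp2.to_user
    and tmp1.to_user = tmp2.from_user;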
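If only the mutual pairs are needed, rather than a flag on every row, the same self-join idea can be written as a semi join. The following is a minimal sketch, assuming Hive's LEFT SEMI JOIN syntax and the sample fans data above; the aliases f1 and f2 are only illustrative.

```sql
-- Sketch: keep only the rows of fans whose reversed pair also exists.
-- In a LEFT SEMI JOIN the right-hand table (f2) may be referenced
-- only inside the ON clause, not in the select list.
select
    f1.from_user,
    f1.to_user
from fans f1
left semi join fans f2
    on f1.from_user = f2.to_user
    and f1.to_user = f2.from_user;
```

On the sample data this should return the four rows (A,B), (A,C), (B,A) and (C,A). Unlike the left join, a semi join emits each left-hand row at most once, even if fans were to contain the reverse edge more than once.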
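Approach 2: use union all to stack the table on top of a copy of itself with the two columns swapped. Assuming fans has no duplicate rows, a pair that then appears twice is a mutual follow.
-- Method 2
select
    u1,
    u2
from (
    select
        from_user u1,
        to_user u2
    from fans
    union all
    select
        to_user u1,
        from_user u2
    from fans
) tmp1
group by u1, u2
having count(1) = 2;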
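Method 2 returns every mutual pair in both directions, and its count(1) = 2 test relies on fans having no duplicate rows and no self-follows. A hedged variant is sketched below: it tags each branch with a hypothetical direction column, keeps only the ordering u1 < u2, and requires both directions to be present.

```sql
-- Sketch: one output row per mutual pair.
-- direction = 1 marks a row taken as-is, direction = 2 a swapped row
-- (the column name and its values are made up for this example).
select
    u1,
    u2
from (
    select from_user as u1, to_user as u2, 1 as direction from fans
    union all
    select to_user as u1, from_user as u2, 2 as direction from fans
) t
where u1 < u2                           -- keep a single ordering per pair
group by u1, u2
having count(distinct direction) = 2;   -- both u1->u2 and u2->u1 exist
```

On the sample data this should return (A,B) and (A,C). The where clause also drops self-follows, which the original query would report as mutual.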
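Both queries above satisfy the requirement. However, once the user count is on the order of 100 million and the follow relationships on the order of 10 billion, the join has to shuffle both copies of the table underneath, and query efficiency becomes very low.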
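Approach 3: scan the table once, give each pair an order-independent key, and count the rows that share that key with a window function:
select
    from_user,
    to_user,
    -- sum() over (partition by ...) groups rows by the pair key feature and aggregates over the whole group
    if(sum(1) over (partition by feature) > 1, 1, 0) as is_friend
from
(
    select
        from_user,
        to_user,
        -- order-independent key; the '|' separator (assumed not to occur in user ids)
        -- keeps pairs such as ('AB','C') and ('A','BC') from colliding
        if(from_user > to_user, concat(to_user, '|', from_user), concat(from_user, '|', to_user)) as feature
    from fans
) tmp1;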
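As an alternative to a concatenated string key, the pair can be normalized into two columns. The sketch below assumes the built-in least and greatest functions are available in the Hive version in use; the column names user_lo and user_hi are made up for illustration.

```sql
-- Sketch: partition on the normalized pair (smaller id, larger id)
-- instead of on a concatenated string.
select
    from_user,
    to_user,
    if(count(1) over (partition by user_lo, user_hi) > 1, 1, 0) as is_friend
from
(
    select
        from_user,
        to_user,
        least(from_user, to_user)    as user_lo,  -- smaller of the two ids
        greatest(from_user, to_user) as user_hi   -- larger of the two ids
    from fans
) tmp1;
```

On the sample data, the rows A-B, A-C, B-A and C-A should get is_friend = 1, while A-D and B-E get 0. Like the query above, this still assumes that fans holds at most one row per (from_user, to_user) pair.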
When a problem has several solutions, choose the best one whenever possible.