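HiveSQL Problem: Mutual Follows (Mutual Friends)

0 Problem Description

    How can a social app tell whether a blogger you follow also follows you back? Suppose there is a table fans (the follow table) whose key fields are from_user and to_user; each row means that from_user follows to_user.

1 Data Preparation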

create table if not exists fans
(
    from_user   string comment 'following user',
    to_user     string comment 'followed user',
   `timestamp`  string comment 'follow time'
)
    comment 'follow table';
INSERT overwrite table fans
VALUES ("A","B","2022-11-28 12:12:12"),
       ("A","C","2022-11-28 12:12:13"),
       ("A","D","2022-11-28 12:12:14"),
       ("B","A","2022-11-28 12:12:15"),
       ("B","E","2022-11-28 12:12:16"),
       ("C","A","2022-11-28 12:12:17");

2 Data Analysis

   Approach 1: mutual follows naturally suggest a self-join on the table. The SQL is as follows:

-- Method 1
    select
        tmp1.from_user,
        tmp1.to_user,
        if(tmp2.from_user is not null, 'yes', 'no') as is_friend
    from fans tmp1
             left join fans tmp2
                       on tmp1.from_user = tmp2.to_user
                           and tmp1.to_user = tmp2.from_user;
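
   If only the mutual pairs are needed, rather than a flag on every row, the same self-join idea can be written as a semi join. The following is a minimal sketch, assuming fans holds no duplicate rows; the aliases a and b and the from_user < to_user filter (which keeps each pair once) are illustrative additions, not part of the original query.

```sql
-- Sketch: list each mutually following pair exactly once
select
    a.from_user,
    a.to_user
from fans a
         -- keep a row of a only if the reverse relationship exists in b
         left semi join fans b
                        on a.from_user = b.to_user
                            and a.to_user = b.from_user
-- each mutual pair appears in both directions; keep one of them
where a.from_user < a.to_user;
```

   On the sample data above this returns the pairs (A, B) and (A, C).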


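 Approach 2: use union all to stack the table on top of a copy with the two columns swapped; a (u1, u2) pair that then appears twice is followed in both directions

-- Method 2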
    select
        u1,
        u2
    from (
             select
                 from_user u1,
                 to_user   u2
             from fans
             union all
             select
                 to_user   u1,
                 from_user u2
             from fans
         ) tmp1
    group by u1, u2
    having count(1) = 2;
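
     The count(1) = 2 test only works if each follow relationship is stored once and nobody follows themselves. Where that cannot be guaranteed, a defensive variant could clean the input first. This is a sketch under those assumptions; the CTE name dedup is hypothetical.

```sql
-- Sketch: Method 2 with defensive cleanup of the input
with dedup as (
    -- drop repeated follow rows and self-follows, which would distort the count
    select distinct
        from_user,
        to_user
    from fans
    where from_user <> to_user
)
select
    u1,
    u2
from (
         select from_user u1, to_user u2 from dedup
         union all
         select to_user u1, from_user u2 from dedup
     ) tmp1
group by u1, u2
having count(1) = 2;
```

     Like the original query, this returns every mutual pair in both orientations, for example both (A, B) and (B, A) on the sample data.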

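     Both queries above meet the requirement, but once the user base reaches the hundreds of millions and the follow relationships reach the tens of billions, the join triggers a shuffle under the hood and the query becomes very slow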
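
     One way to check where a query shuffles is to look at its execution plan. The sketch below prefixes a trimmed-down version of Method 1 with explain; the plan that comes back depends on the execution engine and on settings such as automatic map-join conversion, so it is best read on the actual cluster.

```sql
-- Sketch: print the execution plan of the self-join without running it
explain
select
    tmp1.from_user,
    tmp1.to_user,
    tmp2.from_user as followed_back_by
from fans tmp1
         left join fans tmp2
                   on tmp1.from_user = tmp2.to_user
                       and tmp1.to_user = tmp2.from_user;
```

     The column alias followed_back_by is illustrative only. Reduce-side operators in the plan indicate stages where data is shuffled across the network.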

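Approach 3: build a key for each pair of users that does not depend on the follow direction, then use a window function to count the rows sharing that key, which avoids the self-join: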

select 
      from_user,
      to_user,
  -- Window range of sum() over (partition by ...): rows are grouped by the relationship key feature, and the sum covers the whole group
      if(sum(1) over (partition by feature) > 1, 1, 0) as is_friend
from 
(
  select 
       from_user,
       to_user,
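       -- Order-independent key for the pair; the '|' separator (assumed never to occur in user ids) keeps pairs such as ('AB','C') and ('A','BC') from colliding
       if(from_user > to_user, concat(to_user, '|', from_user), concat(from_user, '|', to_user)) as feature
   from fans 
)tmp1;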
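
The pair key can also be kept as two columns instead of one concatenated string, which sidesteps any ambiguity in the concatenation. The following is a sketch, not the original author's query: it assumes the built-in least and greatest functions are available in the Hive version in use, the column names user_lo and user_hi are hypothetical, and self-follows are filtered out. Comparing min and max of from_user inside the group also stays correct when the same follow row is stored more than once.

```sql
-- Sketch: two-column pair key, robust to duplicate follow rows
select
    from_user,
    to_user,
    -- both directions exist only if from_user takes two different values within the pair
    if(min(from_user) over (partition by user_lo, user_hi)
           <> max(from_user) over (partition by user_lo, user_hi), 1, 0) as is_friend
from (
         select
             from_user,
             to_user,
             least(from_user, to_user)    as user_lo,  -- smaller user id of the pair
             greatest(from_user, to_user) as user_hi   -- larger user id of the pair
         from fans
         where from_user <> to_user
     ) tmp1;
```

On the sample data this flags the rows A→B, B→A, A→C and C→A with 1, and A→D and B→E with 0.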

3 Summary

   When a problem has several possible solutions, choose the most efficient one wherever possible
