Classic Hive interview question: finding consecutive login days with SQL

1. Partition the data by user id and number the rows within each partition in ascending order of login date

SQL:

SELECT
   id,
   login_date,
   row_number() over(partition by id order by login_date asc) as rn 
FROM data;

Result:

+---+----------+---+
|id |login_date|rn |
+---+----------+---+
|01 |2021-02-28|1  |
|01 |2021-03-01|2  |
|01 |2021-03-02|3  |
|01 |2021-03-04|4  |
|01 |2021-03-05|5  |
|01 |2021-03-06|6  |
|01 |2021-03-08|7  |
|02 |2021-03-01|1  |
|02 |2021-03-02|2  |
|02 |2021-03-03|3  |
|02 |2021-03-06|4  |
|03 |2021-03-06|1  |
+---+----------+---+
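
To make the numbering step concrete outside Hive, here is a minimal Python sketch that mimics row_number() over(partition by id order by login_date asc). The function name add_row_numbers and the shuffled subset of sample rows are illustrative assumptions, not part of the original query.

```python
from itertools import groupby

# A shuffled subset of the (id, login_date) rows from the table above.
rows = [
    ("02", "2021-03-06"), ("01", "2021-03-02"), ("01", "2021-02-28"),
    ("02", "2021-03-01"), ("01", "2021-03-01"), ("03", "2021-03-06"),
]

def add_row_numbers(rows):
    """Number each user's rows 1, 2, 3, ... in ascending login_date order."""
    # Sorting tuples orders by id first, then by date; ISO date strings
    # sort chronologically, so no date parsing is needed here.
    ordered = sorted(rows)
    out = []
    for _, group in groupby(ordered, key=lambda r: r[0]):
        for rn, (uid, login_date) in enumerate(group, start=1):
            out.append((uid, login_date, rn))
    return out

numbered = add_row_numbers(rows)
for row in numbered:
    print(row)
```

The numbering restarts at 1 for every id, which is what the partition by clause does in the SQL.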

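2. Subtract rn days from the login date with date_sub. Within a run of consecutive days the login date and rn both increase by 1 per row, so the result (diff_date) stays the same; rows of one user that share a diff_date therefore belong to the same run of consecutive days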

SQL:

SELECT
     t1.id,
     t1.login_date,
     date_sub(t1.login_date, rn) as diff_date
FROM
    (
       SELECT
           id,
           login_date,
           row_number() over(partition by id order by login_date asc) as rn 
       FROM data
    ) t1;

Result:

+---+----------+----------+
|id |login_date|diff_date |
+---+----------+----------+
|01 |2021-02-28|2021-02-27|
|01 |2021-03-01|2021-02-27|
|01 |2021-03-02|2021-02-27|
|01 |2021-03-04|2021-02-28|
|01 |2021-03-05|2021-02-28|
|01 |2021-03-06|2021-02-28|
|01 |2021-03-08|2021-03-01|
|02 |2021-03-01|2021-02-28|
|02 |2021-03-02|2021-02-28|
|02 |2021-03-03|2021-02-28|
|02 |2021-03-06|2021-03-02|
|03 |2021-03-06|2021-03-05|
+---+----------+----------+
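
The arithmetic behind diff_date can be checked with a short Python sketch. The helper diff_date below is a hypothetical stand-in for Hive's date_sub, and the (login_date, rn) pairs are copied from the step 1 result for user 01.

```python
from datetime import date, timedelta

def diff_date(login_date, rn):
    """Subtract rn days from an ISO date string, like date_sub(login_date, rn)."""
    return (date.fromisoformat(login_date) - timedelta(days=rn)).isoformat()

# (login_date, rn) pairs for user 01, taken from the step 1 result.
user_01 = [
    ("2021-02-28", 1), ("2021-03-01", 2), ("2021-03-02", 3),
    ("2021-03-04", 4), ("2021-03-05", 5), ("2021-03-06", 6),
    ("2021-03-08", 7),
]

anchors = [diff_date(d, rn) for d, rn in user_01]
for (d, rn), anchor in zip(user_01, anchors):
    print(d, rn, anchor)
```

Every gap in the login dates moves the anchor forward by the size of the gap, so each run of consecutive days gets its own distinct diff_date.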

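3. Group by id and the difference date diff_date; the number of consecutive login days in each run is count(1) for the group, and keeping only groups with a count of at least 3 leaves the runs of 3 or more days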

SQL:

SELECT
 t2.id,
 count(1)           as login_times,
 min(t2.login_date) as start_date,
 max(t2.login_date) as end_date
FROM
(
    SELECT
     t1.id,
     t1.login_date,
     date_sub(t1.login_date,rn) as diff_date
    FROM
    (
        SELECT
         id,
         login_date,
         row_number() over(partition by id order by login_date asc) as rn 
        FROM data
    ) t1
) t2
GROUP BY t2.id, t2.diff_date
HAVING count(1) >= 3;

Result:

+---+-----------+----------+----------+
|id |login_times|start_date|end_date  |
+---+-----------+----------+----------+
|01 |3          |2021-02-28|2021-03-02|
|01 |3          |2021-03-04|2021-03-06|
|02 |3          |2021-03-01|2021-03-03|
+---+-----------+----------+----------+
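
The three steps can also be sketched end to end in Python as a cross-check of the final result. The function name consecutive_runs and its min_days parameter are illustrative assumptions. One addition goes beyond the SQL above: the sketch removes duplicate (id, login_date) rows first, because the technique relies on one row per user per day. A repeated date would receive two different rn values and split a run.

```python
from collections import defaultdict
from datetime import date, timedelta

def consecutive_runs(rows, min_days=3):
    """Return (id, login_times, start_date, end_date) for each long enough run."""
    dates_by_user = defaultdict(set)  # a set drops duplicate (id, date) rows
    for uid, login_date in rows:
        dates_by_user[uid].add(date.fromisoformat(login_date))

    result = []
    for uid in sorted(dates_by_user):
        runs = defaultdict(list)  # key: login_date minus rn days (diff_date)
        for rn, d in enumerate(sorted(dates_by_user[uid]), start=1):
            runs[d - timedelta(days=rn)].append(d)
        for days in runs.values():  # dates inside each run are already ascending
            if len(days) >= min_days:
                result.append(
                    (uid, len(days), days[0].isoformat(), days[-1].isoformat())
                )
    return result

# The sample data from step 1, plus one hypothetical duplicate row.
logins = [
    ("01", "2021-02-28"), ("01", "2021-03-01"), ("01", "2021-03-02"),
    ("01", "2021-03-04"), ("01", "2021-03-05"), ("01", "2021-03-06"),
    ("01", "2021-03-08"),
    ("02", "2021-03-01"), ("02", "2021-03-02"), ("02", "2021-03-03"),
    ("02", "2021-03-06"),
    ("03", "2021-03-06"),
    ("01", "2021-03-01"),  # duplicate, added only to exercise deduplication
]

for run in consecutive_runs(logins):
    print(run)
```

In the SQL, the equivalent safeguard would be to select distinct id and login_date pairs in the innermost subquery before applying row_number().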
