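Problem description: The data below records the carbon-emission reductions that users collected in Ant Forest. Find the users whose daily total reduction exceeded 100 on at least 3 consecutive days.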
id | dt | lowcarbon |
---|---|---|
1001 | 2021-12-12 | 123 |
1002 | 2021-12-12 | 45 |
1001 | 2021-12-13 | 43 |
1001 | 2021-12-13 | 45 |
1001 | 2021-12-13 | 23 |
1002 | 2021-12-14 | 45 |
1001 | 2021-12-14 | 230 |
1002 | 2021-12-15 | 45 |
1001 | 2021-12-15 | 23 |
The table-creation and data-insertion statements are as follows:
DROP TABLE IF EXISTS `carbon`;
CREATE TABLE `carbon` (
`id` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
`dt` date NULL DEFAULT NULL,
`lowcarbon` int(11) NULL DEFAULT NULL
) ENGINE = InnoDB CHARACTER SET = utf8mb4 COLLATE = utf8mb4_general_ci ROW_FORMAT = Dynamic;
INSERT INTO `carbon` VALUES ('1001', '2021-12-12', 123);
INSERT INTO `carbon` VALUES ('1002', '2021-12-12', 45);
INSERT INTO `carbon` VALUES ('1001', '2021-12-13', 43);
INSERT INTO `carbon` VALUES ('1001', '2021-12-13', 45);
INSERT INTO `carbon` VALUES ('1001', '2021-12-13', 23);
INSERT INTO `carbon` VALUES ('1002', '2021-12-14', 45);
INSERT INTO `carbon` VALUES ('1001', '2021-12-14', 230);
INSERT INTO `carbon` VALUES ('1002', '2021-12-15', 45);
INSERT INTO `carbon` VALUES ('1001', '2021-12-15', 23);
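1. Group the rows by user id and date to get each user's total reduction per day, and keep only the days whose total exceeds 100. Filtering on sum(lowcarbon) rather than the alias avoids any ambiguity with the lowcarbon column. The result serves as intermediate table t. The SQL is as follows:
select
id, dt, sum(lowcarbon) lowcarbon
from
carbon
group by id, dt
having sum(lowcarbon) > 100;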
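To check step 1 by hand, here is a plain-Python sketch that mirrors `GROUP BY id, dt HAVING sum(lowcarbon) > 100` on the sample rows. It is an illustration only, not part of the SQL solution, and the function name `daily_totals_over` is hypothetical.

```python
from collections import defaultdict

# Sample rows of the carbon table: (id, dt, lowcarbon)
ROWS = [
    ("1001", "2021-12-12", 123), ("1002", "2021-12-12", 45),
    ("1001", "2021-12-13", 43), ("1001", "2021-12-13", 45),
    ("1001", "2021-12-13", 23), ("1002", "2021-12-14", 45),
    ("1001", "2021-12-14", 230), ("1002", "2021-12-15", 45),
    ("1001", "2021-12-15", 23),
]


def daily_totals_over(rows, threshold=100):
    """Mirror of GROUP BY id, dt HAVING sum(lowcarbon) > threshold."""
    totals = defaultdict(int)
    for uid, dt, amount in rows:
        totals[(uid, dt)] += amount  # sum(lowcarbon) per (id, dt)
    # HAVING: keep only the days whose total exceeds the threshold
    return {key: total for key, total in totals.items() if total > threshold}


print(daily_totals_over(ROWS))
```

Only user 1001 qualifies, on 2021-12-12 (123), 2021-12-13 (43 + 45 + 23 = 111), and 2021-12-14 (230).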
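2. Apply ROW_NUMBER() to intermediate table t to number each user's qualifying days in date order as rk. The SQL is as follows; the result serves as intermediate table t1.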
select id, dt, lowcarbon,
ROW_NUMBER() over (partition by id ORDER BY dt) rk
from
(select
id, dt, sum(lowcarbon) lowcarbon
from
carbon
group by id, dt
having sum(lowcarbon) > 100)t;
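3. In t1, subtract rk days from dt to obtain a new column new_dt. Within an unbroken run, dt and rk both increase by 1 from one day to the next, so all days of the run share the same new_dt. The SQL is as follows; the result serves as intermediate table t2.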
select id, dt, lowcarbon, rk,
DATE_SUB(dt, INTERVAL rk DAY) new_dt
from
(
select id, dt, lowcarbon,
ROW_NUMBER() over (partition by id ORDER BY dt) rk
from
(select
id, dt, sum(lowcarbon) lowcarbon
from
carbon
group by id, dt
having sum(lowcarbon) > 100)t
)t1;
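The dt - rk trick can be seen in a small Python sketch for a single user. It assumes the user's qualifying days are already one entry per date, which step 1 guarantees. The date 2021-12-17 is a hypothetical extra day, not in the sample data; it is added only to show what happens after a gap. The helper `run_anchors` is also hypothetical.

```python
from datetime import date, timedelta


def run_anchors(days):
    """Mirror of ROW_NUMBER() plus DATE_SUB(dt, INTERVAL rk DAY) for one user.

    Assumes `days` holds one entry per date (step 1 already guarantees this).
    """
    rows = []
    for rk, dt in enumerate(sorted(days), start=1):  # ROW_NUMBER() starts at 1
        rows.append((dt, rk, dt - timedelta(days=rk)))  # new_dt = dt - rk days
    return rows


# 2021-12-12..14 are user 1001's qualifying days; 2021-12-17 is hypothetical.
days = [date(2021, 12, 12), date(2021, 12, 13), date(2021, 12, 14), date(2021, 12, 17)]
for dt, rk, new_dt in run_anchors(days):
    print(dt, rk, new_dt)
```

The three consecutive days share new_dt 2021-12-11. The day after the gap gets 2021-12-13. Grouping by new_dt therefore separates the runs.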
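4. Group t2 by id and new_dt. A group with 3 or more rows is a run of at least 3 consecutive qualifying days, so its user meets the requirement. DISTINCT prevents a user with several such runs from being listed more than once. The SQL is as follows:
select distinct id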
from
(
select id, dt, lowcarbon, rk,
DATE_SUB(dt, INTERVAL rk DAY) new_dt
from
(
select id, dt, lowcarbon,
ROW_NUMBER() over (partition by id ORDER BY dt) rk
from
(select
id, dt, sum(lowcarbon) lowcarbon
from
carbon
group by id, dt
having sum(lowcarbon) > 100)t
)t1
)t2 group by id, new_dt having count(id) >= 3;
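The whole pipeline (steps 1 to 4) can be sketched in plain Python to confirm the expected answer on the sample data. This is an illustrative re-implementation, not the author's method. The function name `users_with_streak` and its `threshold`/`min_days` parameters are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta

# Sample rows of the carbon table: (id, dt, lowcarbon)
ROWS = [
    ("1001", "2021-12-12", 123), ("1002", "2021-12-12", 45),
    ("1001", "2021-12-13", 43), ("1001", "2021-12-13", 45),
    ("1001", "2021-12-13", 23), ("1002", "2021-12-14", 45),
    ("1001", "2021-12-14", 230), ("1002", "2021-12-15", 45),
    ("1001", "2021-12-15", 23),
]


def users_with_streak(rows, threshold=100, min_days=3):
    # Step 1 (table t): daily totals per user, keeping days above the threshold
    totals = defaultdict(int)
    for uid, dt, amount in rows:
        totals[(uid, date.fromisoformat(dt))] += amount
    days_by_user = defaultdict(list)
    for (uid, dt), total in totals.items():
        if total > threshold:
            days_by_user[uid].append(dt)

    # Steps 2-4: number the days (rk), anchor them at dt - rk, count per anchor
    qualified = set()
    for uid, days in days_by_user.items():
        run_lengths = defaultdict(int)
        for rk, dt in enumerate(sorted(days), start=1):
            run_lengths[dt - timedelta(days=rk)] += 1
        if any(n >= min_days for n in run_lengths.values()):
            qualified.add(uid)  # the set plays the role of SELECT DISTINCT id
    return sorted(qualified)


print(users_with_streak(ROWS))
```

User 1001 is the only answer, because its run stops at 3 days: the 2021-12-15 total is 23.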
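Problem description: The data below records user visit times on an e-commerce platform. Consecutive visits by the same user belong to the same group when the interval between them is less than 60 seconds.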
id | ts (seconds) |
---|---|
1001 | 17523641234 |
1001 | 17523641253 |
1002 | 17523641278 |
1001 | 17523641334 |
1002 | 17523641434 |
1001 | 17523641534 |
1001 | 17523641544 |
1002 | 17523641634 |
1001 | 17523641638 |
1001 | 17523641654 |
The table-creation statement is as follows:
DROP TABLE IF EXISTS `group_table`;
CREATE TABLE `group_table` (
`id` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
`ts` bigint NULL DEFAULT NULL
) ENGINE = InnoDB CHARACTER SET = utf8mb4 COLLATE = utf8mb4_general_ci ROW_FORMAT = DYNAMIC;
The data-insertion statements are as follows:
INSERT INTO `group_table` VALUES ('1001', 17523641234);
INSERT INTO `group_table` VALUES ('1001', 17523641253);
INSERT INTO `group_table` VALUES ('1001', 17523641334);
INSERT INTO `group_table` VALUES ('1001', 17523641534);
INSERT INTO `group_table` VALUES ('1001', 17523641544);
INSERT INTO `group_table` VALUES ('1001', 17523641638);
INSERT INTO `group_table` VALUES ('1001', 17523641654);
INSERT INTO `group_table` VALUES ('1002', 17523641278);
INSERT INTO `group_table` VALUES ('1002', 17523641434);
INSERT INTO `group_table` VALUES ('1002', 17523641634);
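1. For each user, ordered by ts, use LAG to fetch the previous visit's timestamp and subtract it from ts to get the interval diff_ts. With the default value 0, the first visit's diff_ts equals its own ts, which is far above 60. Every user's first visit therefore starts a new group. The result serves as table t1. The SQL is as follows:
select id , ts,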
ts - LAG(ts, 1, 0) over (partition by id order by ts) diff_ts
from
group_table;
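A plain-Python sketch of step 1 shows the diff_ts values on the sample data, including the effect of the default 0 on each user's first row. The helper `lag_diffs` is hypothetical and only illustrates the SQL.

```python
from collections import defaultdict

# Sample rows of group_table: (id, ts in seconds)
VISITS = [
    ("1001", 17523641234), ("1001", 17523641253), ("1002", 17523641278),
    ("1001", 17523641334), ("1002", 17523641434), ("1001", 17523641534),
    ("1001", 17523641544), ("1002", 17523641634), ("1001", 17523641638),
    ("1001", 17523641654),
]


def lag_diffs(visits):
    """Mirror of ts - LAG(ts, 1, 0) OVER (PARTITION BY id ORDER BY ts)."""
    by_user = defaultdict(list)
    for uid, ts in visits:
        by_user[uid].append(ts)
    result = []
    for uid in sorted(by_user):
        prev = 0  # LAG's default value for the first row of each partition
        for ts in sorted(by_user[uid]):
            result.append((uid, ts, ts - prev))
            prev = ts
    return result


for row in lag_diffs(VISITS):
    print(row)
```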
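2. Convert the diff_ts column of t1: values of 60 or more become 1, and values below 60 become 0. Accumulate the converted values from each user's first row up to the current row; the running total is that row's group id. The SQL is as follows: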
select id, ts,diff_ts,
sum(if(diff_ts >= 60, 1, 0)) over(partition by id order by ts) groupid
FROM
(
select id , ts,
ts - LAG(ts, 1, 0) over (partition by id order by ts) diff_ts
from
group_table
)t1;
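The "flag, then running sum" idea can be checked with a short Python sketch on the sample data. It is a hypothetical helper (`assign_groups`), not the SQL itself. The running counter plays the role of `sum(if(diff_ts >= 60, 1, 0)) over(partition by id order by ts)`.

```python
from collections import defaultdict

# Sample rows of group_table: (id, ts in seconds)
VISITS = [
    ("1001", 17523641234), ("1001", 17523641253), ("1002", 17523641278),
    ("1001", 17523641334), ("1002", 17523641434), ("1001", 17523641534),
    ("1001", 17523641544), ("1002", 17523641634), ("1001", 17523641638),
    ("1001", 17523641654),
]


def assign_groups(visits, gap=60):
    """Mirror of sum(if(diff_ts >= gap, 1, 0)) OVER (PARTITION BY id ORDER BY ts)."""
    by_user = defaultdict(list)
    for uid, ts in visits:
        by_user[uid].append(ts)
    result = []
    for uid in sorted(by_user):
        prev, group_id = 0, 0  # prev = 0 mirrors LAG(ts, 1, 0)
        for ts in sorted(by_user[uid]):
            if ts - prev >= gap:  # a gap of 60 s or more opens a new group
                group_id += 1     # running sum of the 0/1 flags
            result.append((uid, ts, group_id))
            prev = ts
    return result


for row in assign_groups(VISITS):
    print(row)
```

User 1001's seven visits fall into groups 1, 1, 2, 3, 3, 4, 4. User 1002's three visits are each far enough apart to form groups 1, 2, 3.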
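Problem description: A game company records each user's daily logins. Compute each user's longest streak of consecutive login days, where a gap of one day is allowed. For example, a user who logs in on days 1, 3, 5, and 6 is regarded as having logged in for 6 consecutive days. The data is as follows: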
id | dt |
---|---|
1001 | 2021-12-12 |
1002 | 2021-12-12 |
1001 | 2021-12-13 |
1001 | 2021-12-14 |
1001 | 2021-12-16 |
1002 | 2021-12-16 |
1001 | 2021-12-19 |
1002 | 2021-12-17 |
1001 | 2021-12-20 |
The table-creation statement is as follows:
DROP TABLE IF EXISTS `lianxu_table`;
CREATE TABLE `lianxu_table` (
`id` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci NULL DEFAULT NULL,
`dt` date NULL DEFAULT NULL
) ENGINE = InnoDB CHARACTER SET = utf8mb4 COLLATE = utf8mb4_0900_ai_ci ROW_FORMAT = Dynamic;
The data-insertion statements are as follows:
INSERT INTO `lianxu_table` VALUES ('1001', '2021-12-19');
INSERT INTO `lianxu_table` VALUES ('1002', '2021-12-17');
INSERT INTO `lianxu_table` VALUES ('1001', '2021-12-13');
INSERT INTO `lianxu_table` VALUES ('1001', '2021-12-16');
INSERT INTO `lianxu_table` VALUES ('1001', '2021-12-14');
INSERT INTO `lianxu_table` VALUES ('1002', '2021-12-16');
INSERT INTO `lianxu_table` VALUES ('1001', '2021-12-12');
INSERT INTO `lianxu_table` VALUES ('1002', '2021-12-12');
INSERT INTO `lianxu_table` VALUES ('1001', '2021-12-20');
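1. For each user, ordered by dt, use LAG to get the previous login date pre_dt. The first login gets the default '2000-10-10', a date far enough in the past that the first login always starts a new streak. The result serves as t1. The SQL is as follows:
select id, dt ,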
lag(dt, 1, '2000-10-10') over (partition by id order by dt) pre_dt
from lianxu_table;
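2. In t1, use DATEDIFF to compute the number of days between dt and pre_dt as diff_day. The result serves as t2. The SQL is as follows:
select id, dt, pre_dt,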
DATEDIFF(dt,pre_dt) diff_day
from
(
select id, dt ,
lag(dt, 1, '2000-10-10') over (partition by id order by dt) pre_dt
from lianxu_table
)t1;
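A plain-Python sketch of steps 1 and 2 makes the diff_day values on the sample data concrete. The helper `login_gaps` is hypothetical; `(dt - pre_dt).days` stands in for `DATEDIFF(dt, pre_dt)`.

```python
from collections import defaultdict
from datetime import date

# Sample rows of lianxu_table: (id, dt)
LOGINS = [
    ("1001", "2021-12-19"), ("1002", "2021-12-17"), ("1001", "2021-12-13"),
    ("1001", "2021-12-16"), ("1001", "2021-12-14"), ("1002", "2021-12-16"),
    ("1001", "2021-12-12"), ("1002", "2021-12-12"), ("1001", "2021-12-20"),
]


def login_gaps(logins, default="2000-10-10"):
    """Mirror of DATEDIFF(dt, LAG(dt, 1, default) OVER (PARTITION BY id ORDER BY dt))."""
    by_user = defaultdict(list)
    for uid, dt in logins:
        by_user[uid].append(date.fromisoformat(dt))
    result = []
    for uid in sorted(by_user):
        pre_dt = date.fromisoformat(default)  # LAG default for the first login
        for dt in sorted(by_user[uid]):
            result.append((uid, dt, pre_dt, (dt - pre_dt).days))  # diff_day
            pre_dt = dt
    return result


for row in login_gaps(LOGINS):
    print(row)
```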
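3. In t2, a diff_day greater than 2 means more than one day was missed, which breaks the streak, so map it to 1 and every other value to 0. The running sum of these flags per user is the streak's groupid. The result serves as t3. The SQL is as follows:
select id, dt, pre_dt, diff_day,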
sum(if(diff_day > 2, 1, 0)) over(partition by id order by dt) groupid
from
(
select id, dt, pre_dt,
DATEDIFF(dt,pre_dt) diff_day
from
(
select id, dt ,
lag(dt, 1, '2000-10-10') over (partition by id order by dt) pre_dt
from lianxu_table
)t1
)t2;
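Here is a short Python sketch of step 3 on the sample data. `login_groups` is a hypothetical helper, and `max_gap=2` corresponds to the `diff_day > 2` condition.

```python
from collections import defaultdict
from datetime import date

# Sample rows of lianxu_table: (id, dt)
LOGINS = [
    ("1001", "2021-12-19"), ("1002", "2021-12-17"), ("1001", "2021-12-13"),
    ("1001", "2021-12-16"), ("1001", "2021-12-14"), ("1002", "2021-12-16"),
    ("1001", "2021-12-12"), ("1002", "2021-12-12"), ("1001", "2021-12-20"),
]


def login_groups(logins, max_gap=2):
    """Mirror of sum(if(diff_day > max_gap, 1, 0)) OVER (PARTITION BY id ORDER BY dt)."""
    by_user = defaultdict(list)
    for uid, dt in logins:
        by_user[uid].append(date.fromisoformat(dt))
    result = []
    for uid in sorted(by_user):
        pre_dt, group_id = date(2000, 10, 10), 0  # same LAG default as the SQL
        for dt in sorted(by_user[uid]):
            if (dt - pre_dt).days > max_gap:  # more than one missed day breaks the streak
                group_id += 1
            result.append((uid, dt, group_id))
            pre_dt = dt
    return result


for row in login_groups(LOGINS):
    print(row)
```

For user 1001, 2021-12-16 stays in group 1 because only 2021-12-15 is missing. 2021-12-19 starts group 2.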
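4. Group t3 by user id and groupid. Because a one-day gap is allowed, the length of a streak is the span from its first to its last login, DATEDIFF(max(dt), min(dt)) + 1, rather than the number of login rows: days 1, 3, 5, 6 give 4 rows but a span of 6 days. Then take the maximum span per user. The query is as follows:
select id, max(lianxu_day) max_lianxu_day
from
(
select id, groupid, DATEDIFF(max(dt), min(dt)) + 1 lianxu_day
from
(
select id, dt, pre_dt, diff_day,
sum(if(diff_day > 2, 1, 0)) over(partition by id order by dt) groupid
from
(
select id, dt, pre_dt,
DATEDIFF(dt,pre_dt) diff_day
from
(
select id, dt ,
lag(dt, 1, '2000-10-10') over (partition by id order by dt) pre_dt
from lianxu_table
)t1
)t2
)t3 GROUP BY id, groupid
)t4 GROUP BY id;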
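The final answer can be sanity-checked with a plain-Python sketch that measures each streak as last date minus first date plus 1 and keeps the maximum per user. The helper `longest_streak` and the user id "x" in the second example are hypothetical. The "x" dates simply encode the 1, 3, 5, 6 example from the problem statement.

```python
from collections import defaultdict
from datetime import date

# Sample rows of lianxu_table: (id, dt)
LOGINS = [
    ("1001", "2021-12-19"), ("1002", "2021-12-17"), ("1001", "2021-12-13"),
    ("1001", "2021-12-16"), ("1001", "2021-12-14"), ("1002", "2021-12-16"),
    ("1001", "2021-12-12"), ("1002", "2021-12-12"), ("1001", "2021-12-20"),
]


def longest_streak(logins, max_gap=2):
    by_user = defaultdict(list)
    for uid, dt in logins:
        by_user[uid].append(date.fromisoformat(dt))
    best = {}
    for uid, days in by_user.items():
        start = prev = None
        longest = 0
        for dt in sorted(days):
            if prev is None or (dt - prev).days > max_gap:
                start = dt  # a new streak (a new groupid) begins here
            # DATEDIFF(max(dt), min(dt)) + 1 for the streak so far
            longest = max(longest, (dt - start).days + 1)
            prev = dt
        best[uid] = longest
    return best


print(longest_streak(LOGINS))

# The example from the problem statement: days 1, 3, 5, 6 count as 6 days
example = [("x", f"2021-12-0{d}") for d in (1, 3, 5, 6)]
print(longest_streak(example))
```

User 1001's first streak runs from 2021-12-12 to 2021-12-16: 5 days over only 4 logins. Counting rows would under-report it.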
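Problem description: The data below records product promotions on a platform; the columns are brand, discount start date, and discount end date. Compute the total number of discounted days for each brand. Take care with overlapping date ranges: a day covered by several promotions counts only once.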
brand | sdt | edt |
---|---|---|
oppo | 2021-06-05 | 2021-06-09 |
oppo | 2021-06-11 | 2021-06-21 |
vivo | 2021-06-05 | 2021-06-15 |
vivo | 2021-06-09 | 2021-06-21 |
redmi | 2021-06-05 | 2021-06-21 |
redmi | 2021-06-09 | 2021-06-15 |
redmi | 2021-06-17 | 2021-06-26 |
huawei | 2021-06-05 | 2021-06-26 |
huawei | 2021-06-09 | 2021-06-15 |
huawei | 2021-06-17 | 2021-06-21 |
The table-creation statement is as follows:
DROP TABLE IF EXISTS `dazhe_tablle`;
CREATE TABLE `dazhe_tablle` (
`brand` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci NULL DEFAULT NULL,
`sdt` date NULL DEFAULT NULL,
`edt` date NULL DEFAULT NULL
) ENGINE = InnoDB CHARACTER SET = utf8mb4 COLLATE = utf8mb4_0900_ai_ci ROW_FORMAT = Dynamic;
The data-insertion statements are as follows:
INSERT INTO `dazhe_tablle` VALUES ('oppo', '2021-06-05', '2021-06-09');
INSERT INTO `dazhe_tablle` VALUES ('oppo', '2021-06-11', '2021-06-21');
INSERT INTO `dazhe_tablle` VALUES ('vivo', '2021-06-05', '2021-06-15');
INSERT INTO `dazhe_tablle` VALUES ('vivo', '2021-06-09', '2021-06-21');
INSERT INTO `dazhe_tablle` VALUES ('redmi', '2021-06-05', '2021-06-21');
INSERT INTO `dazhe_tablle` VALUES ('redmi', '2021-06-09', '2021-06-15');
INSERT INTO `dazhe_tablle` VALUES ('redmi', '2021-06-17', '2021-06-26');
INSERT INTO `dazhe_tablle` VALUES ('huawei', '2021-06-05', '2021-06-26');
INSERT INTO `dazhe_tablle` VALUES ('huawei', '2021-06-09', '2021-06-15');
INSERT INTO `dazhe_tablle` VALUES ('huawei', '2021-06-17', '2021-06-21');
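1. For each brand, order the promotions by start date sdt and compute the maximum edt over all preceding rows (excluding the current row) as maxEdt. Ordering by sdt matters. Every earlier promotion starts no later than the current one, so whenever sdt <= maxEdt, the promotion ending at maxEdt already covers every day from sdt through maxEdt. The result serves as t1. The SQL is as follows:
select brand, sdt, edt,
max(edt) over (partition by brand order by sdt rows BETWEEN unbounded preceding and 1 preceding) maxEdt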
from dazhe_tablle;
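A plain-Python sketch of step 1 shows the maxEdt each row sees when rows are ordered by sdt. The helper `with_prev_max_edt` is hypothetical. `None` stands in for the NULL that `max()` returns over the empty frame of each brand's first row.

```python
from collections import defaultdict
from datetime import date

# Sample rows of dazhe_tablle: (brand, sdt, edt)
PROMOS = [
    ("oppo", "2021-06-05", "2021-06-09"), ("oppo", "2021-06-11", "2021-06-21"),
    ("vivo", "2021-06-05", "2021-06-15"), ("vivo", "2021-06-09", "2021-06-21"),
    ("redmi", "2021-06-05", "2021-06-21"), ("redmi", "2021-06-09", "2021-06-15"),
    ("redmi", "2021-06-17", "2021-06-26"), ("huawei", "2021-06-05", "2021-06-26"),
    ("huawei", "2021-06-09", "2021-06-15"), ("huawei", "2021-06-17", "2021-06-21"),
]


def with_prev_max_edt(promos):
    """Mirror of max(edt) OVER (PARTITION BY brand ORDER BY sdt
    ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)."""
    by_brand = defaultdict(list)
    for brand, sdt, edt in promos:
        by_brand[brand].append((date.fromisoformat(sdt), date.fromisoformat(edt)))
    result = []
    for brand, spans in by_brand.items():
        max_edt = None  # empty frame on the first row, so the SQL yields NULL
        for sdt, edt in sorted(spans):  # sorted by sdt (ties broken by edt)
            result.append((brand, sdt, edt, max_edt))
            max_edt = edt if max_edt is None else max(max_edt, edt)
    return result


for row in with_prev_max_edt(PROMOS):
    print(row)
```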
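2. In t1, compute edt - max(sdt, maxEdt), and add 1 when maxEdt is NULL or sdt is later than maxEdt. In that case the promotion does not overlap earlier ones, so both endpoints count. The result is the number of new days the current promotion contributes, recorded as dazhe_day_num. When a promotion lies entirely within earlier ones (edt <= maxEdt), the difference is zero or negative, so GREATEST(..., 0) clamps it to 0. The intermediate result is t2. The SQL is as follows:
select brand, sdt, edt, maxEdt,
GREATEST(DATEDIFF(edt,
CASE
WHEN maxEdt is NULL THEN DATE_SUB(sdt,INTERVAL 1 DAY)
WHEN sdt > maxEdt THEN DATE_SUB(sdt,INTERVAL 1 DAY)
ELSE maxEdt
END
), 0) dazhe_day_num
from
(
select brand, sdt, edt,
max(edt) over (partition by brand order by sdt rows BETWEEN unbounded preceding and 1 preceding) maxEdt
from dazhe_tablle
)t1;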
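The per-row formula of step 2 can be written as a small Python function and checked against the three redmi rows. The function name `new_days` is hypothetical. The maxEdt values in the example are the ones step 1 produces when rows are ordered by sdt.

```python
from datetime import date, timedelta


def new_days(sdt, edt, max_edt):
    """Days of [sdt, edt] not yet counted, given max_edt of earlier rows ordered by sdt."""
    if max_edt is None or sdt > max_edt:
        start = sdt - timedelta(days=1)  # DATE_SUB(sdt, INTERVAL 1 DAY): count both ends
    else:
        start = max_edt  # days up to max_edt were already counted by earlier rows
    return max((edt - start).days, 0)  # GREATEST(..., 0) for fully covered rows


# redmi's rows in sdt order, with the maxEdt that step 1 produces for them
redmi = [
    (date(2021, 6, 5), date(2021, 6, 21), None),
    (date(2021, 6, 9), date(2021, 6, 15), date(2021, 6, 21)),
    (date(2021, 6, 17), date(2021, 6, 26), date(2021, 6, 21)),
]
print([new_days(*row) for row in redmi])
```

The three values are 17, 0, and 5, which sum to 22: the length of 2021-06-05 to 2021-06-26. Without the clamp, the middle row would contribute -6.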
3. Group t2 by brand and sum dazhe_day_num to get each brand's total number of discounted days. The SQL is as follows:
select brand, sum(dazhe_day_num) total_day
from
(
select brand, sdt, edt, maxEdt,
GREATEST(DATEDIFF(edt,
CASE
WHEN maxEdt is NULL THEN DATE_SUB(sdt,INTERVAL 1 DAY)
WHEN sdt > maxEdt THEN DATE_SUB(sdt,INTERVAL 1 DAY)
ELSE maxEdt
END
), 0) dazhe_day_num
from
(
select brand, sdt, edt,
max(edt) over (partition by brand order by sdt rows BETWEEN unbounded preceding and 1 preceding) maxEdt
from dazhe_tablle
)t1
)t2 GROUP BY brand;
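To gain confidence in the running-max approach, the following Python sketch computes the totals and cross-checks them against a brute-force count of distinct calendar days. Both function names and the `order_by` parameter are hypothetical. Passing `order_by="edt"` shows that ordering the window by end date undercounts brands whose promotions overlap in a nested way.

```python
from collections import defaultdict
from datetime import date, timedelta

# Sample rows of dazhe_tablle: (brand, sdt, edt)
PROMOS = [
    ("oppo", "2021-06-05", "2021-06-09"), ("oppo", "2021-06-11", "2021-06-21"),
    ("vivo", "2021-06-05", "2021-06-15"), ("vivo", "2021-06-09", "2021-06-21"),
    ("redmi", "2021-06-05", "2021-06-21"), ("redmi", "2021-06-09", "2021-06-15"),
    ("redmi", "2021-06-17", "2021-06-26"), ("huawei", "2021-06-05", "2021-06-26"),
    ("huawei", "2021-06-09", "2021-06-15"), ("huawei", "2021-06-17", "2021-06-21"),
]


def total_discount_days(promos, order_by="sdt"):
    """Running-max approach: count only the days after maxEdt of earlier rows."""
    by_brand = defaultdict(list)
    for brand, sdt, edt in promos:
        by_brand[brand].append((date.fromisoformat(sdt), date.fromisoformat(edt)))
    idx = 0 if order_by == "sdt" else 1
    totals = {}
    for brand, spans in by_brand.items():
        total, max_edt = 0, None
        for sdt, edt in sorted(spans, key=lambda s: s[idx]):
            if max_edt is None or sdt > max_edt:
                start = sdt - timedelta(days=1)
            else:
                start = max_edt
            total += max((edt - start).days, 0)
            max_edt = edt if max_edt is None else max(max_edt, edt)
        totals[brand] = total
    return totals


def brute_force(promos):
    """Cross-check: count the distinct calendar days each brand is on sale."""
    days = defaultdict(set)
    for brand, sdt, edt in promos:
        day, end = date.fromisoformat(sdt), date.fromisoformat(edt)
        while day <= end:
            days[brand].add(day)
            day += timedelta(days=1)
    return {brand: len(s) for brand, s in days.items()}


print(total_discount_days(PROMOS))
print(total_discount_days(PROMOS) == brute_force(PROMOS))
print(total_discount_days(PROMOS, order_by="edt"))
```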
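Problem description: The data below records when streamers on a platform start and end their broadcasts. Use it to compute the peak number of streamers online at the same time. The data is as follows: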
id | sdt | edt |
---|---|---|
1001 | 2021-06-14 12:12:12 | 2021-06-14 18:12:12 |
1003 | 2021-06-14 13:12:12 | 2021-06-14 16:12:12 |
1004 | 2021-06-14 13:15:12 | 2021-06-14 20:12:12 |
1002 | 2021-06-14 15:12:12 | 2021-06-14 16:12:12 |
1005 | 2021-06-14 15:18:12 | 2021-06-14 20:12:12 |
1001 | 2021-06-14 20:12:12 | 2021-06-14 23:12:12 |
1006 | 2021-06-14 21:12:12 | 2021-06-14 23:15:12 |
1007 | 2021-06-14 22:12:12 | 2021-06-14 23:10:12 |
The table-creation statement is as follows:
DROP TABLE IF EXISTS `zaixian_table`;
CREATE TABLE `zaixian_table` (
`id` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci NULL DEFAULT NULL,
`sdt` datetime NULL DEFAULT NULL,
`edt` datetime NULL DEFAULT NULL
) ENGINE = InnoDB CHARACTER SET = utf8mb4 COLLATE = utf8mb4_0900_ai_ci ROW_FORMAT = Dynamic;
The data-insertion statements are as follows:
INSERT INTO `zaixian_table` VALUES ('1001', '2021-06-14 12:12:12', '2021-06-14 18:12:12');
INSERT INTO `zaixian_table` VALUES ('1003', '2021-06-14 13:12:12', '2021-06-14 16:12:12');
INSERT INTO `zaixian_table` VALUES ('1004', '2021-06-14 13:15:12', '2021-06-14 20:12:12');
INSERT INTO `zaixian_table` VALUES ('1002', '2021-06-14 15:12:12', '2021-06-14 16:12:12');
INSERT INTO `zaixian_table` VALUES ('1005', '2021-06-14 15:18:12', '2021-06-14 20:12:12');
INSERT INTO `zaixian_table` VALUES ('1001', '2021-06-14 20:12:12', '2021-06-14 23:12:12');
INSERT INTO `zaixian_table` VALUES ('1006', '2021-06-14 21:12:12', '2021-06-14 23:15:12');
INSERT INTO `zaixian_table` VALUES ('1007', '2021-06-14 22:12:12', '2021-06-14 23:10:12');
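1. Split every record into two events: the start time sdt with flag 1 (one more streamer online) and the end time edt with flag -1 (one fewer). UNION ALL is used instead of UNION so that identical events are never merged into one and no deduplication step is needed. The result serves as t1. The SQL is as follows:
select id, sdt dt, 1 as flag
from zaixian_table
union all
select id, edt dt, -1 as flag
from zaixian_table;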
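Here is a minimal Python sketch of step 1, with a hypothetical helper `to_events`. Like UNION ALL, a plain list keeps every event, so each session contributes exactly one +1 and one -1.

```python
# Sample rows of zaixian_table: (id, sdt, edt); ISO strings sort chronologically
SESSIONS = [
    ("1001", "2021-06-14 12:12:12", "2021-06-14 18:12:12"),
    ("1003", "2021-06-14 13:12:12", "2021-06-14 16:12:12"),
    ("1004", "2021-06-14 13:15:12", "2021-06-14 20:12:12"),
    ("1002", "2021-06-14 15:12:12", "2021-06-14 16:12:12"),
    ("1005", "2021-06-14 15:18:12", "2021-06-14 20:12:12"),
    ("1001", "2021-06-14 20:12:12", "2021-06-14 23:12:12"),
    ("1006", "2021-06-14 21:12:12", "2021-06-14 23:15:12"),
    ("1007", "2021-06-14 22:12:12", "2021-06-14 23:10:12"),
]


def to_events(sessions):
    """Mirror of the UNION ALL: one (+1) start event and one (-1) end event per row."""
    events = [(sid, sdt, 1) for sid, sdt, _ in sessions]
    events += [(sid, edt, -1) for sid, _, edt in sessions]
    return events


events = to_events(SESSIONS)
print(len(events), sum(flag for _, _, flag in events))
```

The 8 sessions yield 16 events, and the flags cancel out to 0 once every session has ended.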
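2. Run a window over t1 in ascending dt order; the running sum of flag is the number of streamers online at that moment, recorded as cnt. With ORDER BY and no explicit frame, the window defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. All events at the same timestamp are therefore summed together, and a streamer going offline while another goes online in the same second are not counted as overlapping. The result serves as t2. The SQL is as follows:
select dt, sum(flag) over(order by dt) cnt
from
(
select id, sdt dt, 1 as flag
from zaixian_table
union all
select id, edt dt, -1 as flag
from zaixian_table
)t1;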
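3. Take the maximum of cnt in t2; that is the peak number of streamers online simultaneously. The SQL is as follows:
select max(cnt)
from
(
select dt, sum(flag) over(order by dt) cnt
from
(
select id, sdt dt, 1 as flag
from zaixian_table
union all
select id, edt dt, -1 as flag
from zaixian_table
)t1
)t2;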
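The complete sweep-line computation can be sketched in Python to confirm the expected peak on the sample data. The helper `peak_online` is hypothetical. Grouping events by timestamp with `itertools.groupby` imitates the default RANGE frame, which sums all peers at the same dt before the count is read.

```python
from itertools import groupby

# Sample rows of zaixian_table: (id, sdt, edt); ISO strings sort chronologically
SESSIONS = [
    ("1001", "2021-06-14 12:12:12", "2021-06-14 18:12:12"),
    ("1003", "2021-06-14 13:12:12", "2021-06-14 16:12:12"),
    ("1004", "2021-06-14 13:15:12", "2021-06-14 20:12:12"),
    ("1002", "2021-06-14 15:12:12", "2021-06-14 16:12:12"),
    ("1005", "2021-06-14 15:18:12", "2021-06-14 20:12:12"),
    ("1001", "2021-06-14 20:12:12", "2021-06-14 23:12:12"),
    ("1006", "2021-06-14 21:12:12", "2021-06-14 23:15:12"),
    ("1007", "2021-06-14 22:12:12", "2021-06-14 23:10:12"),
]


def peak_online(sessions):
    """Sweep line: +1 at each start, -1 at each end, track the running total."""
    events = sorted([(sdt, 1) for _, sdt, _ in sessions] +
                    [(edt, -1) for _, _, edt in sessions])
    online = peak = 0
    # Apply every event sharing a timestamp before reading the count, like the
    # default RANGE frame of sum(flag) over(order by dt) treating peers together.
    for _, same_time in groupby(events, key=lambda e: e[0]):
        online += sum(flag for _, flag in same_time)
        peak = max(peak, online)
    return peak


print(peak_online(SESSIONS))
```

The peak of 5 is reached at 15:18:12, when 1001, 1003, 1004, 1002, and 1005 are all live.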