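MySQL video course practice exercises
Two tables:
- order_info_utf.csv
- user_info_utf.csv
Import both into a MySQL database.
Exercises:
1. Count the number of users who placed orders in each month.
⚠️ "Number of users" here means how many distinct people placed orders, not how many orders were placed, so put distinct inside count() to remove duplicates. The filter isPaid = "已支付" used throughout keeps only paid orders ("已支付" means "paid").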
select month(paidTime), count(distinct userId)
from test1.orderinfo
where isPaid = "已支付"
group by month(paidTime);
+-----------------+------------------------+
| month(paidTime) | count(distinct userId) |
+-----------------+------------------------+
|               3 |                  54799 |
|               4 |                  43967 |
|               5 |                      6 |
+-----------------+------------------------+
3 rows in set (1.23 sec)
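The difference between counting orders and counting people can be seen on a tiny example. This is only a sketch: the three rows below are made-up toy data (the derived table toy is hypothetical, not part of the dataset), in which user 1 pays twice in March and user 2 pays once.

```sql
-- Toy data only: user 1 has two paid orders in March, user 2 has one.
select
    month(paidTime)        as m,
    count(userId)          as order_count,  -- counts rows, i.e. orders
    count(distinct userId) as buyer_count   -- counts distinct people
from (
    select 1 as userId, '2016-03-01 10:00:00' as paidTime
    union all select 1, '2016-03-15 09:30:00'
    union all select 2, '2016-03-20 18:45:00'
) as toy
group by month(paidTime);
```

For month 3 this returns order_count = 3 but buyer_count = 2, which is the distinction the exercise asks for.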
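2. Compute the return-purchase rate and the repeat-purchase rate for March.
2.1 Repeat-purchase rate: the number of users who bought more than once in the month, divided by the total number of users who bought in the month.
select userid, count(userid) as ct
from test1.orderinfo
where isPaid = "已支付" and month(paidTime) = 3
group by userid
having ct > 1;
This returns a list: the id of every user who bought more than once in the month, together with that user's number of purchases.
The having clause is what filters for users with more than one purchase.
A better approach is for the query to return two columns, holding "the number of users who bought more than once this month" and "the total number of users who bought this month".
To do that, wrap the query in an outer layer and drop the having clause:
select count(if(ct > 1, 1, null)), count(1)
from (
    select userid, count(userid) as ct
    from test1.orderinfo
    where isPaid = "已支付" and month(paidTime) = 3
    group by userid
) as t;
This gives the two numbers 16916 and 54799, so the repeat-purchase rate is 16916 / 54799 = 30.87%.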
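The division can also be left to MySQL instead of being done by hand. The following is a minimal sketch that adds a third column to the same query; the column aliases (repeat_buyers, all_buyers, repeat_rate) are names chosen here for illustration.

```sql
select
    count(if(ct > 1, 1, null))            as repeat_buyers,  -- users with more than one paid order
    count(1)                              as all_buyers,     -- all users with a paid order
    count(if(ct > 1, 1, null)) / count(1) as repeat_rate     -- repeat-purchase rate as a fraction
from (
    select userid, count(userid) as ct
    from test1.orderinfo
    where isPaid = "已支付" and month(paidTime) = 3
    group by userid
) as t;
```

On the same data the third column should agree with the 30.87% computed above.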
2.2 Return-purchase rate: a user who bought in March and bought again in April counts as one return purchase. Let x be the number of such users; x divided by the number of March buyers is the March return-purchase rate.
Number of March buyers: 54799
select count(1)
from (
    select userid
    from test1.orderinfo
    where isPaid = "已支付" and month(paidTime) = 3
    group by userid
) as user_3;
Number of March buyers who bought again in April: 13119
select count(1)
from (
    select userid
    from test1.orderinfo
    where isPaid = "已支付" and month(paidTime) = 3
    group by userid
) as user_3
inner join (
    select userid
    from test1.orderinfo
    where isPaid = "已支付" and month(paidTime) = 4
    group by userid
) as user_4 on user_3.userid = user_4.userid;
So 13119 / 54799 = 23.94%: the March return-purchase rate is 23.94%.
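Both numbers and their ratio can be obtained in one pass by switching the inner join to a left join, so that March buyers without an April order are kept with a null on the April side. This is a sketch rather than the method used in the notes; the column aliases are illustrative.

```sql
select
    count(user_3.userid)                        as march_buyers,       -- denominator
    count(user_4.userid)                        as returned_in_april,  -- count() skips nulls, so only matched users
    count(user_4.userid) / count(user_3.userid) as return_rate
from (
    select userid
    from test1.orderinfo
    where isPaid = "已支付" and month(paidTime) = 3
    group by userid
) as user_3
left join (
    select userid
    from test1.orderinfo
    where isPaid = "已支付" and month(paidTime) = 4
    group by userid
) as user_4 on user_3.userid = user_4.userid;
```

Because each derived table holds one row per user, the join cannot duplicate users, and the result should match the 13119 / 54799 figure above.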
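2.3 Compute the return-purchase rate for every month
The code from 2.2 needs to be modified.
select *
from (
    select userid, date_format(paidTime, "%Y-%m-01") as m, count(isPaid)
    from test1.orderinfo
    where isPaid = "已支付"
    group by userid, m
) as a
left join (
    select userid, date_format(paidTime, "%Y-%m-01") as m, count(isPaid)
    from test1.orderinfo
    where isPaid = "已支付"
    group by userid, m
) as b on a.userid = b.userid;
The code above joins two identical derived tables. Explanation:
1. Each subquery groups by userid and date_format(paidTime, "%Y-%m-01"), so each row is one user's number of purchases within one month.
2. date_format() rewrites the timestamp with the day set to 01, so that the later DATE_SUB(start_date, INTERVAL expr unit) calculation works on month boundaries.
The remaining points refer to the complete query that follows them:
3. ⚠️ The join is a left join.
4. Because it is a left join, an extra condition can be added to the on clause: and a.m = date_sub(b.m, interval 1 month)
- That is, a's month = b's month - 1. Each row then pairs the same user's March record with their April record, likewise April with May, June with July, and so on. Adjacent months are linked so that the return-purchase rate can be computed.
- Where no row of b satisfies the condition, b's columns are null.
5. Finally, the outer query groups by a.m and uses aggregate functions to count the buyers in each month of a and the matched buyers from b. Since the condition pairs adjacent months, these two counts are the denominator and the numerator of the return-purchase rate.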
select a.m, count(a.m), count(b.m)
from (
    select userid, date_format(paidTime, "%Y-%m-01") as m, count(isPaid)
    from test1.orderinfo
    where isPaid = "已支付"
    group by userid, m
) as a
left join (
    select userid, date_format(paidTime, "%Y-%m-01") as m, count(isPaid)
    from test1.orderinfo
    where isPaid = "已支付"
    group by userid, m
) as b on a.userid = b.userid and a.m = date_sub(b.m, interval 1 month)
group by a.m;
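To make the month arithmetic in the join condition concrete, here is a small sketch that can be run on its own. The two timestamps are made-up values, not rows from the dataset, and the column aliases are illustrative.

```sql
select
    -- truncate a timestamp to the first day of its month
    date_format('2016-03-17 14:05:00', '%Y-%m-01') as a_m,
    -- a later purchase by the same user, truncated the same way
    date_format('2016-04-09 08:00:00', '%Y-%m-01') as b_m,
    -- stepping b's month back by one lands on a's month, so the on clause matches
    date_sub(date_format('2016-04-09 08:00:00', '%Y-%m-01'), interval 1 month) as b_m_minus_1;
```

Both a_m and b_m_minus_1 come out as 2016-03-01, which is why a March row in a pairs with an April row in b.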
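⚠️ 6. The condition must be added with and, not where:
- where filters the rows after left join ... on ... has produced them, so rows that fail the condition are removed.
- and sits inside the on clause, so rows of a without a matching row in b are kept, with b's columns set to null.
and a.m = date_sub(b.m, interval 1 month)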
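The effect of placing the condition in the on clause can be checked on toy data. In this sketch the three rows are invented for illustration: user 1 buys in March and April, user 2 buys only in March. The same rows are used for both a and b, mirroring the self-join above.

```sql
select a.m, count(a.m) as buyers, count(b.m) as returned
from (
    select 1 as userid, '2016-03-01' as m
    union all select 1, '2016-04-01'
    union all select 2, '2016-03-01'
) as a
left join (
    select 1 as userid, '2016-03-01' as m
    union all select 1, '2016-04-01'
    union all select 2, '2016-03-01'
) as b
    on a.userid = b.userid
   and a.m = date_sub(b.m, interval 1 month)  -- condition inside on: unmatched a rows survive with null b
group by a.m;
-- 2016-03-01: buyers = 2, returned = 1
-- 2016-04-01: buyers = 1, returned = 0
```

If the date condition is moved into a where clause instead, only user 1's March row survives the filter, so March shows buyers = 1 and April disappears: the denominator is lost.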
3. Is there a difference in purchase frequency between men and women?
Let x be the number of male consumers and x_order their total number of purchases; the male purchase frequency is then x_order / x.
select sex, avg(ct)
from (
    select o.userid, t.sex, count(1) as ct
    from test1.orderinfo as o
    inner join (
        select * from test1.userinfo where sex <> ""
    ) t on o.userid = t.userid and o.ispaid = "已支付"
    group by o.userid, t.sex
) t2
group by sex;
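The definition x_order / x can also be written out directly, without the intermediate per-user count. This sketch assumes that userid is unique in userinfo (one row per user); if it is not, the join would multiply order rows. The column aliases follow the notation of the definition.

```sql
select
    t.sex,
    count(1)                            as x_order,   -- total paid orders for this sex
    count(distinct o.userid)            as x,         -- distinct buyers for this sex
    count(1) / count(distinct o.userid) as frequency  -- orders per buyer
from test1.orderinfo as o
inner join test1.userinfo as t
    on o.userid = t.userid
where o.isPaid = "已支付"
  and t.sex <> ""
group by t.sex;
```

Under that assumption the frequency column equals avg(ct) from the query above, and the extra columns show the numerator and denominator behind it.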
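4. For users with more than one purchase, what is the interval between their first and last purchase?
(This roughly corresponds to a consumer's purchasing cycle.)
First, find each user's first and last purchase time with the max() and min() functions:
select userid, max(paidtime), min(paidtime)
from test1.orderinfo
where ispaid = "已支付"
group by userid
having count(1) > 1;
Next, subtracting min() from max() directly does not give a usable day count (for DATETIME values MySQL subtracts their numeric YYYYMMDDhhmmss forms), so use the datediff() function, which returns the difference in days:
select userid, max(paidtime), min(paidtime),
    datediff(max(paidtime), min(paidtime)) as interval_day
from test1.orderinfo
where ispaid = "已支付"
group by userid
having count(1) > 1;
Finally, add one more layer:
select avg(df)
from (
    select userid, max(paidtime), min(paidtime),
        datediff(max(paidtime), min(paidtime)) as df
    from test1.orderinfo
    where ispaid = "已支付"
    group by userid
    having count(1) > 1
) t;
The result is 15.6484 days.
Taking the mean is a crude calculation and not very accurate. Depending on the user type, a few users with very high purchase frequency are outliers and arguably should be left out of the sample.
Alternatively, use the median instead.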
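MySQL has no built-in median aggregate, so one possible way to get the median is with window functions. This is a sketch that assumes MySQL 8.0 or later (row_number() and count(*) over () are not available before that); the aliases rn, n, ranked, and median_days are chosen here for illustration.

```sql
select avg(df) as median_days
from (
    select
        df,
        row_number() over (order by df) as rn,  -- position of each interval in sorted order
        count(*)     over ()            as n    -- total number of multi-purchase users
    from (
        select datediff(max(paidtime), min(paidtime)) as df
        from test1.orderinfo
        where ispaid = "已支付"
        group by userid
        having count(1) > 1
    ) as t
) as ranked
-- odd n: both expressions pick the single middle row;
-- even n: they pick the two middle rows, which avg() then averages
where rn in (floor((n + 1) / 2), ceil((n + 1) / 2));
```

The median is less sensitive than the mean to the few high-frequency users mentioned above.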
5. Does spending differ between age groups? (Spending is defined here as the per-capita spend and the total spend of each age group.)
First, build a table t1 holding each user's age group.
Then, from orderinfo, build a table t2 holding each user's total spend.
Finally, inner join t1 and t2, group by age_group, and use the aggregate function avg() to compute the per-capita spend of each age group.
select t1.age_group, cast(avg(t2.price) as decimal(10,2)) as avg_price
from (
    select userid, ceil((year(now()) - year(birth)) / 10) as age_group
    from userinfo
    where birth > '1901-01-01'  # '1901-01-01' filters out some dirty data
) t1
inner join (
    select userid, sum(price) as price
    from test1.orderinfo
    where isPaid = '已支付'
    group by userid
) t2 on t1.userid = t2.userid
group by t1.age_group
order by age_group;
A better approach is to use case when:
select t1.age_group, cast(avg(t2.price) as decimal(10,2)) as avg_price
from (
    select userid,
        case
            when (year(now()) - year(birth)) <= 10 then "<=10"
            when (year(now()) - year(birth)) between 11 and 20 then "10 to 20"
            when (year(now()) - year(birth)) between 21 and 30 then "20 to 30"
            when (year(now()) - year(birth)) between 31 and 40 then "30 to 40"
            when (year(now()) - year(birth)) between 41 and 50 then "40 to 50"
            when (year(now()) - year(birth)) between 51 and 70 then "50 to 70"
            when (year(now()) - year(birth)) >= 71 then ">=71"
        end as age_group
    from userinfo
    where birth > '1901-01-01'
) t1
inner join (
    select userid, sum(price) as price
    from test1.orderinfo
    where isPaid = '已支付'
    group by userid
) t2 on t1.userid = t2.userid
group by t1.age_group
order by t1.age_group;
Result:
+-----------+-----------+
| age_group | avg_price |
+-----------+-----------+
| 10 to 20  |    846.63 |
| 20 to 30  |   1003.96 |
| 30 to 40  |   1178.61 |
| 40 to 50  |   1183.59 |
| 50 to 70  |   1099.86 |
| <=10      |   1322.29 |
| >=71      |   3269.92 |
+-----------+-----------+
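The exercise defines spending as both the per-capita and the total amount per age group, while the queries above return only the average. A possible extension, sketched here with the simpler ceil() grouping, adds the total and the number of users per group; the aliases users and total_price and the decimal(14,2) width are choices made for this illustration.

```sql
select
    t1.age_group,
    count(1)                             as users,        -- buyers in this age group
    cast(sum(t2.price) as decimal(14,2)) as total_price,  -- total spend of the group
    cast(avg(t2.price) as decimal(10,2)) as avg_price     -- per-capita spend
from (
    select userid, ceil((year(now()) - year(birth)) / 10) as age_group
    from test1.userinfo
    where birth > '1901-01-01'  -- drop implausible birth dates, as in the queries above
) t1
inner join (
    select userid, sum(price) as price
    from test1.orderinfo
    where isPaid = '已支付'
    group by userid
) t2 on t1.userid = t2.userid
group by t1.age_group
order by t1.age_group;
```

The users column also helps judge how much weight to give a group's average: a group with few buyers can show a high avg_price driven by a handful of orders.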
6. The 80/20 rule of spending: how much of the total do the top 20% of users contribute?
First, compute each user's spend, then the overall total @total and the number of paying users @count.
Then, compute how many users make up the top 20%, and use limit to get the table t covering those users.
Finally, sum the spend over t to get @top20_price and divide it by @total, which gives 0.85. That is, the top 20% of users contribute 85% of the total.
select @count := count(userid), @total := sum(price) as total
from (
    select userid, sum(price) as price
    from test1.orderinfo
    where isPaid = '已支付'
    group by userid
    order by price desc
) t;  # total = 318503081.54, count = 85649
select @top20_percent := ceil(@count * 0.2);  # 17130
select @top20_price := sum(t.price)
from (
    select userid, sum(price) as price
    from test1.orderinfo
    where isPaid = '已支付'
    group by userid
    order by price desc
    limit 17130
) t;  # 272203711.45
select @top20_price / @total;  # 0.85
Or:
select @count := count(userid), @total := sum(price) as total
from (
    select userid, sum(price) as price
    from test1.orderinfo
    where isPaid = '已支付'
    group by userid
    order by price desc
) t;  # total = 318503081.54, count = 85649
select @top20_percent := ceil(@count * 0.2);  # 17130
select sum(price) / @total
from (
    select row_number() over (order by price desc) as rk, userid, price
    from (
        select userid, sum(price) as price
        from test1.orderinfo
        where isPaid = '已支付'
        group by userid
        order by price desc
    ) as t1
) as t2
where t2.rk <= @top20_percent;
Here the row_number() window function (available from MySQL 8.0) assigns each user a rank by spend; the where clause keeps the top 20% of users from t2, and the outer query computes their share of the total.
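The user variables can be avoided altogether by carrying the user count along as a window column. This is a sketch assuming MySQL 8.0 or later; the aliases rk, n, and top20_share are illustrative, and the 0.2 cutoff is the same 20% used above.

```sql
select
    -- spend of the top 20% of users divided by the spend of all users
    sum(if(rk <= ceil(n * 0.2), price, 0)) / sum(price) as top20_share
from (
    select
        price,
        row_number() over (order by price desc) as rk,  -- 1 = highest spender
        count(*)     over ()                    as n    -- number of paying users
    from (
        select userid, sum(price) as price
        from test1.orderinfo
        where isPaid = '已支付'
        group by userid
    ) as t1
) as t2;
```

Since everything happens in one statement, the result does not depend on variables set earlier in the session; on the same data it should reproduce the share of about 0.85.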