Sales data download link (if it has expired, @ me in the comments):
https://pan.baidu.com/s/1-thB0rrXYmkSoey30lwYLg extraction code: kmr8
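Use MySQL to answer the six questions above. First, import the local CSV files into MySQL. There are two ways to do this. The first is a direct import, as shown in the figure below:
Note: the CSV file must have a header row, otherwise the import will fail.
However, this method is very slow, so I used the second method instead: importing with code, which is much faster.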
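The code for this second method isn't shown here. One common fast route in MySQL is the LOAD DATA LOCAL INFILE statement, so below is a minimal sketch of a hypothetical helper, build_load_data_sql, that only builds that statement. It assumes comma-separated files with exactly one header row, UTF-8 text (pass charset="gbk" if the files were saved as GBK) and Unix line endings, and that local_infile is enabled on the server.

```python
def build_load_data_sql(csv_path: str, table: str, charset: str = "utf8mb4") -> str:
    """Build a MySQL LOAD DATA statement for a comma-separated file with one header row.

    Hypothetical helper: it only returns the SQL text; run the result with a MySQL client.
    """
    # MySQL accepts forward slashes in Windows paths; raw backslashes would need escaping.
    path = csv_path.replace("\\", "/")
    return (
        f"LOAD DATA LOCAL INFILE '{path}'\n"
        f"INTO TABLE {table}\n"
        f"CHARACTER SET {charset}\n"
        "FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'\n"
        "LINES TERMINATED BY '\\n'\n"  # assumes Unix line endings; Windows files may need '\r\n'
        "IGNORE 1 LINES;"  # skip the header row the CSV must have
    )


print(build_load_data_sql("C:\\data\\order_info.csv", "order_info"))
```

Paste the printed statement into a client started with `mysql --local-infile=1`. The target table must already exist with its columns in the same order as the CSV.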
Two tables are imported, order_info and user_info. order_info contains the following fields:
user_info contains the following fields:
-- Distinct purchasing users per month (every status except '未支付', i.e. unpaid, is counted)
select month(paidtime),count(distinct userid)
from order_info
where status !='未支付'
group by month(paidtime);
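To see what this query does on a handful of rows, here is a small Python mirror of it. The (userid, status, paidtime) tuple layout and the sample orders are made up for illustration; the real table has more columns.

```python
from collections import defaultdict
from datetime import datetime


def monthly_buyers(orders):
    """orders: iterable of (userid, status, paidtime). Returns {month: number of distinct users}."""
    users_by_month = defaultdict(set)
    for userid, status, paidtime in orders:
        if status != "未支付":  # same filter as the query: everything except unpaid counts
            users_by_month[paidtime.month].add(userid)  # a set keeps each user once, like count(distinct)
    return {month: len(users) for month, users in sorted(users_by_month.items())}


orders = [
    (1, "已支付", datetime(2016, 3, 1)),
    (1, "已支付", datetime(2016, 3, 5)),
    (2, "已支付", datetime(2016, 3, 9)),
    (2, "未支付", None),
    (3, "已支付", datetime(2016, 4, 2)),
]
print(monthly_buyers(orders))  # {3: 2, 4: 1}
```

Like month(paidtime), this merges the same month of different years, which is only safe when the data covers a single year.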
-- Number of paid orders per user in March
select userid,count(userid)
from order_info
where status ='已支付'
and month(paidtime)=3
group by userid;
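First compute how many times each user bought in March. The repurchase rate is then the number of users who bought more than once divided by the total number of users who bought.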
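-- ct = each user's paid orders in March
-- 购买人数 = buyers, 回购人数 = users with ct > 1, 复购率 = repurchase rate
select count(ct) '购买人数',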
count(if(ct>1,1,null)) '回购人数' ,
count(if(ct>1,1,null))/count(ct) '复购率'
from(
select userid,Count(userid) as ct from order_info
where status = '已支付'
and month(paidTime) = 3
group by userid
order by userid) t;
The result is as follows:
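To sanity-check the repurchase-rate definition itself, here is a minimal Python sketch on made-up toy data. It assumes the input is already limited to paid orders in March, one userid per order.

```python
from collections import Counter


def repurchase_rate(march_paid_userids):
    """march_paid_userids: one userid per paid order in March.
    Returns (buyers, repeat_buyers, repurchase_rate)."""
    ct = Counter(march_paid_userids)                 # ct[userid] = paid orders in March
    buyers = len(ct)                                 # count(ct)
    repeat = sum(1 for n in ct.values() if n > 1)    # count(if(ct>1,1,null))
    return buyers, repeat, (repeat / buyers if buyers else 0.0)


print(repurchase_rate([1, 1, 2, 3, 3, 3, 4]))  # (4, 2, 0.5)
```

Users 1 and 3 bought more than once out of four buyers, so the rate is 2 / 4.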
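The return-purchase rate (回购率) is the share of users who buy in one period and buy again in the next. Here the period is a calendar month, so it is the fraction of a month's buyers who also pay for an order in the following month.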
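-- t1 and t2: one row per user per month with a paid order.
-- A t1 row finds a match in t2 when the same user also paid in the following month.
-- 消费总数 = buyers in the month, 回购数 = those who bought again next month, 回购率 = return-purchase rate
select t1.m '月份',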
count(t1.m) '消费总数' ,
count(t2.m) '回购数' ,
count(t2.m)/count(t1.m) '回购率'
from (
select userid,date_format(paidtime,'%Y-%m-01') as m
from order_info
where status = '已支付'
group by userid,date_format(paidtime,'%Y-%m-01')
) t1
left join (
select userid,date_format(paidtime,'%Y-%m-01') as m
from order_info
where status = '已支付'
group by userid,date_format(paidtime,'%Y-%m-01')
) t2
on t1.userid=t2.userid
and t1.m = date_sub(t2.m,interval 1 month)
group by t1.m;
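The self-join logic can be mirrored in Python as a minimal sketch. The input is assumed to be a set of (userid, (year, month)) pairs, one per user per month with a paid order, which is what the t1 and t2 subqueries produce; the sample pairs are made up.

```python
def next_month(ym):
    """(year, month) -> the following (year, month), rolling December over to January."""
    year, month = ym
    return (year + 1, 1) if month == 12 else (year, month + 1)


def return_rate(user_months):
    """user_months: set of (userid, (year, month)).
    Returns {(year, month): (buyers, returned_next_month, rate)}."""
    result = {}
    for ym in sorted({m for _, m in user_months}):
        buyers = {u for u, m in user_months if m == ym}
        # the LEFT JOIN matches a user when the same user also appears in the next month
        returned = {u for u in buyers if (u, next_month(ym)) in user_months}
        result[ym] = (len(buyers), len(returned), len(returned) / len(buyers))
    return result


pairs = {(1, (2016, 3)), (1, (2016, 4)), (2, (2016, 3)), (3, (2016, 4)), (3, (2016, 5))}
for ym, row in return_rate(pairs).items():
    print(ym, row)
```

As in the SQL, the last month in the data always shows a rate of 0, because there is no following month to match against.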
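-- Average number of paid orders per user, by sex (users with an empty sex are excluded).
-- userid exists in both tables, so GROUP BY must qualify it to avoid an "ambiguous column" error.
select sex,avg(ct) from(
select o.userid,t.sex,count(o.userid) as ct from order_info o
inner join
(select * from user_info
where sex != '') t
on o.userid = t.userid
where o.status = '已支付'
group by o.userid,t.sex)t2
group by sex;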
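A toy Python version of the per-sex average, as a minimal sketch. The function name, the {userid: sex} mapping, the sample data and the assumption that the input list holds only paid orders are all made up for illustration.

```python
from collections import defaultdict


def avg_orders_by_sex(paid_order_userids, sex_of):
    """paid_order_userids: one userid per paid order; sex_of: {userid: sex}.
    Users with an empty or unknown sex are skipped, like the inner join on sex != ''."""
    per_user = defaultdict(int)
    for uid in paid_order_userids:
        if sex_of.get(uid):  # None (no user_info row) or '' (blank sex) is falsy and skipped
            per_user[uid] += 1
    sums = defaultdict(lambda: [0, 0])  # sex -> [total orders, number of users]
    for uid, ct in per_user.items():
        sums[sex_of[uid]][0] += ct
        sums[sex_of[uid]][1] += 1
    return {sex: total / users for sex, (total, users) in sums.items()}


print(avg_orders_by_sex([1, 1, 2, 3, 3, 3, 4], {1: "男", 2: "女", 3: "男", 4: ""}))  # {'男': 2.5, '女': 1.0}
```

The two-level structure matches the SQL: count orders per user first, then average those counts within each sex.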
-- For users with more than one paid order: last and first payment time, and the days between them
select userid,max(paidtime),min(paidtime),datediff(max(paidtime),min(paidtime))
from order_info
where status = '已支付'
group by userid
having count(1)>1;
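The same interval calculation as a minimal Python sketch, assuming the input is already limited to paid orders as (userid, paidtime) pairs; the sample data is made up.

```python
from datetime import datetime


def purchase_span_days(paid_orders):
    """paid_orders: iterable of (userid, paidtime). Returns {userid: days from first to last paid order}
    for users with more than one paid order (the HAVING count(1) > 1 condition)."""
    times = {}
    for uid, paidtime in paid_orders:
        times.setdefault(uid, []).append(paidtime)
    return {
        uid: (max(ts).date() - min(ts).date()).days  # DATEDIFF compares only the date parts
        for uid, ts in times.items()
        if len(ts) > 1
    }


orders = [
    (1, datetime(2016, 3, 1, 23, 0)),
    (1, datetime(2016, 3, 10, 1, 0)),
    (2, datetime(2016, 4, 1, 12, 0)),
]
print(purchase_span_days(orders))  # {1: 9}
```

User 2 has only one order and drops out, just as HAVING filters it in SQL. The times of day are ignored, so user 1's span is 9 days even though fewer than 9 × 24 hours passed.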
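-- Total spend and number of paid orders per age group.
-- age = ceil((current year - birth year) / 10): 1 = ages 1-10, 2 = 11-20, 3 = 21-30, ...
-- The date must be quoted: unquoted, 1901-00-00 is arithmetic and evaluates to the number 1901.
-- Users without a valid birth date fall into the NULL age group via the LEFT JOIN.
select t.age,sum(o.price),count(o.userid) as ct
from order_info o
left join (
select userid,ceil((year(now())-year(birth))/10) as age
from user_info
where birth > '1901-01-01') t
on o.userid=t.userid
where o.status = '已支付'
group by t.age;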
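The age bucketing can be checked with a small Python sketch. It uses a fixed current_year parameter instead of now() so the result is reproducible. Treating births on or before 1901-01-01 as placeholder values is an assumption about what the birth filter is for. The function names and sample data are hypothetical.

```python
import math
from datetime import date


def age_bucket(birth, current_year):
    """ceil((current_year - birth year) / 10): 1 = ages 1-10, 2 = 11-20, 3 = 21-30, ..."""
    return math.ceil((current_year - birth.year) / 10)


def spend_by_bucket(paid_orders, births, current_year):
    """paid_orders: (userid, price) pairs; births: {userid: date}.
    Returns {bucket: (total spend, order count)}. Users without a usable birth date
    go to the None bucket, like the NULL group produced by a LEFT JOIN."""
    result = {}
    for uid, price in paid_orders:
        birth = births.get(uid)
        # assumption: births on or before 1901-01-01 are placeholders, not real dates
        valid = birth is not None and birth > date(1901, 1, 1)
        bucket = age_bucket(birth, current_year) if valid else None
        total, count = result.get(bucket, (0.0, 0))
        result[bucket] = (total + price, count + 1)
    return result


births = {1: date(1990, 5, 1), 2: date(1900, 1, 1), 3: date(1995, 1, 1)}
orders = [(1, 100.0), (1, 50.0), (2, 20.0), (3, 30.0), (4, 10.0)]
print(spend_by_bucket(orders, births, current_year=2020))
```

Because only the year difference is used, someone born in December counts as a full year older than they are for most of the year. That is fine for 10-year buckets.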
First, compute the total spending:
-- Number of paying users and their combined spending
select count(userid),sum(total)
from (
select userid,sum(price) as total
from order_info
where status = '已支付'
group by userid
order by total DESC
) as t;
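Next, find the top 20% of users by spending. The first query below returns 20% of the number of paying users. That count (17129 here) is then hard-coded into LIMIT in the second query, because MySQL's LIMIT takes a constant rather than a subquery: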
-- 20% of the number of paying users (a sum(total) column here would just repeat the grand total)
select count(userid)*0.2
from (
select userid,sum(price) as total
from order_info
where status = '已支付'
group by userid
order by total DESC
) as t;
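-- Spending of the highest-spending users; ORDER BY total DESC inside the subquery makes LIMIT keep the top spenders
select count(userid),sum(total)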
from (
select userid,sum(price) as total
from order_info
where status = '已支付'
group by userid
order by total DESC
limit 17129
) as t;
Finally, divide the top-20% spending by the total spending from the first query to get the top 20% of users' share of total spending.
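The three steps fit in one small Python function, shown here as a minimal sketch on made-up per-user totals. Rounding the 20% count down is an assumption. The post does not say how a fractional count was rounded before it was typed into LIMIT.

```python
def top_share(spend_by_user, fraction=0.2):
    """spend_by_user: {userid: total paid amount}. Returns the share of all spending
    contributed by the top `fraction` of users ranked by spending."""
    totals = sorted(spend_by_user.values(), reverse=True)  # order by total DESC
    k = int(len(totals) * fraction)                        # count(userid)*0.2, rounded down (assumption)
    return sum(totals[:k]) / sum(totals)                   # LIMIT k, then divide by the grand total


print(top_share({1: 500, 2: 300, 3: 100, 4: 50, 5: 50}))  # 0.5
```

Computing k inside the same function avoids hard-coding the count, which in the SQL version has to be updated by hand whenever the data changes.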
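Internship hunting has been pretty dispiriting lately. I hope I can soon find a good internship at a company I actually want to join, so I don't have to feel this down every day.
I really wish I had a friend in the data analytics industry who could give me some pointers... Is a master's from a "双非" university (neither a 985 nor a 211 school) really that bad... or is it just me...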