使用MySQL进行销售数据分析(练习)

数据说明

由两张表组成,其中orderinfo表记录了3月-5月的订单情况,字段包括orderid、userid、ispaid(支付状态)、price和paidtime(支付时间),共539411行。

userinfo记录了所有用户情况,字段包括userid、sex和birth,共101535行。

分析问题

1.统计不同月份的下单人数

2.统计用户三月份的回购率和复购率

3.统计男女用户的消费频次是否有差异

4.统计多次消费的用户,第一次和最后一次消费间隔是多少?

5.统计不同年龄段,用户的消费金额是否有差异?

6.统计消费的二八法则,消费的top20%用户,贡献了多少额度

导入处理

  1. 在数据库中建表
create table orderinfo(
orderid    int   primary key not null ,
userid      int,
isPaid      varchar(10),
price        float,
paidTime   varchar(30));

create table userinfo(
    userid   int primary key,
    sex     varchar(10),
    birth    date);

2.从cvs导入数据到mysql

load data local  infile 'C:/Users/47207/Desktop/order_info_utf.csv' into table orderinfo  fields terminated by ',';
load data local  infile 'C:/Users/47207/Desktop/user_info_utf.csv' into table userinfo  fields terminated by ',';

注:这里fields terminated by ','是指文件内容按‘逗号为分割填入sql表中

  1. 观察数据,将数据格式转化为统一的sql日期格式


    image.png

1、先把时间格式标准化为 1993-02-27
update orderinfo set paidtime=replace(paidtime,'/','-') where paidtime is not null;
2、然后更新字符串为日期格式,然后才能使用日期函数进行操作,
update orderinfo set paidtime=str_to_date(paidtime,'%Y-%m-%d %H:%i') where paidtime is not null;
预处理完成如下:

image.png

问题解决

1.统计不同月份的下单人数

select DATE_FORMAT(paidtime,'%Y-%M') as month, count(distinct userid)
from orderinfo
where ispaid='已支付'
group by month;
image.png

2.统计用户三月份的回购率复购率
这两个指标逻辑不同:
(1).回购率是三月购买的userid四月依旧购买。

select count(distinct userid)
from orderinfo 
where month(paidtime)=4 and ispaid='已支付' and userid in (
select userid
from orderinfo
where ispaid='已支付' and month(paidtime)=3
group by userid);

上述语句得出了4月依旧购买的人数,除以3月购买人数即可
image.png

方法二:

select count(b.april_user)/count(a.march_user)
from (select userid as march_user
from orderinfo
where isPaid='已支付' and month(paidTime)=3
group by userid) a left join (select userid as april_user
from orderinfo
where isPaid='已支付' and month(paidTime)=4
group by userid) b on a.march_user=b.april_user;
image.png

每个月的复购率可以如下计算:

 select t1.m,count(t1.m),count(t2.m),count(t2.m)/count(t1.m) from 
    (select userid,date_format(paidtime,"%Y-%m-01") m from orderinfo where isPaid = "已支付" group by userid,date_format(paidtime,"%Y-%m-01")) t1
    left join 
    (select userid,date_format(paidtime,"%Y-%m-01") m from orderinfo where isPaid = "已支付" group by userid,date_format(paidtime,"%Y-%m-01")) t2
    on t1.userid = t2.userid and t2.m=date_add(t1.m,interval 1 month) group by t1.m;
image.png

(2).复购率是三月所有消费者中,购买2次及以上消费者的比例。

select count(purchase_num), SUM(IF(purchase_num>1,1,NULL)), SUM(IF(purchase_num>1,1,NULL))/count(purchase_num) as rate
FROM (select userid, count(*) as purchase_num 
from orderinfo
where ispaid='已支付' and month(paidtime)=3
group by userid) a;
image.png

3.统计男女用户的消费频次是否有差异(平均消费次数)

select u.sex, count(o.userid)/count(distinct u.userid)
from orderinfo o, userinfo u
where o.userid=u.userid and o.isPaid='已支付' and u.sex <> ''
group by u.sex;
image.png

4.统计多次消费的用户,第一次和最后一次消费间隔是多少?

select userid, min(paidtime), max(paidtime), DATEDIFF(max(paidtime),min(paidtime))
from orderinfo
where ispaid='已支付'
group by userid
having count(*)>1
order by userid asc;
image.png

5.统计不同年龄段,用户的消费金额是否有差异?
思路:去除有问题日期数据后,将年龄分段(ceil函数向上取整)。再求出每个用户的年龄段、订单数、总金额。最后求出不同年龄段的订单平均金额。

select a.age_level, sum(a.sm)/sum(a.ct)
from 
(select u.userid, u.age_level, count(u.userid) as ct, sum(price) as sm
from orderinfo o join
(select userid, CEIL((year(now())-year(birth))/10) as age_level
from userinfo
where birth > '1901-00-00') u on o.userid=u.userid
where o.isPaid='已支付'
group by u.userid, u.age_level) a
group by a.age_level
order by a.age_level asc;
image.png

6.统计消费的二八法则,消费的top20%用户,贡献了多少额度
思路:先求出总消费金额和用户数。

select count(*), sum(a.sum_p)
from(
select userid, sum(price) as sum_p
from orderinfo
where isPaid='已支付'
group by userid
order by sum(price) desc) a;
image.png

前百分之20%的用户为17130

select count(*), sum(a.sum_p)
from(
select userid, sum(price) as sum_p
from orderinfo
where isPaid='已支付'
group by userid
order by sum(price) desc
limit 17130) a;
image.png

贡献了额度的272201811.46/318501846.54=85%,基本符合二八定律。

你可能感兴趣的:(使用MySQL进行销售数据分析(练习))