User Behavior Analysis (SQL)

I. Introduction

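I previously wrote a blog post analyzing user behavior in Python. Later I decided to redo it in SQL as practice. The data is the "UserBehavior.csv" file from Tianchi, dated 2020-04-13.
The dataset contains every behavior of roughly one million random users who were active between November 25 and December 3, 2017. The behaviors are click (`pv`), purchase (`buy`), add-to-cart (`cart`) and favorite (`fav`). It is organized like MovieLens-20M: each row is one user behavior, made up of a user ID, item ID, category ID, behavior type and timestamp, separated by commas.
I use MySQL 5.5 and Navicat 10.1, so window functions are not available (MySQL added them in 8.0). I also did not split the table this time. All later processing and queries run against the single table `ub`, although in real work you should follow normalization.
Below is my workflow. I won't repeat the conclusions here, since this post only uses SQL to pull the numbers for each analysis module of the earlier article.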

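II. Analysis Plan

As before, the analysis covers four areas:

1.1 User behavior over time

PV and UV.
Retention rate.

1.2 Purchase behavior

Purchase counts per period.
Conversion between behavior types.
Repeat-purchase rate.
Return-purchase rate.

1.3 Customer value

RFM model.

1.4 Product analysis

Relationship between products and behaviors.
Top products.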

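III. Data Processing

The time field in the source data is a Unix timestamp. I convert it and add a date column and an hour column for later analysis. Because of my computer's limits I only loaded about 800,000 rows, and even that was painfully slow...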

1. Add columns

Add the converted datetime column date (%Y-%m-%d %H:%i:%s):
alter table ub add date VARCHAR(32);
-- FROM_UNIXTIME() renders the timestamp in the session time zone
UPDATE ub SET date = FROM_UNIXTIME(time,'%Y-%m-%d %H:%i:%s');
SELECT * from ub limit 10;

Add a date-only column day (%Y-%m-%d):
alter table ub add day VARCHAR(32);
UPDATE ub SET day = cast(date as DATE);

Add an hour column:
alter table ub add hour VARCHAR(32);
UPDATE ub SET hour = hour(date);

-- change the helper columns to proper types once they are filled
alter table ub modify column date datetime;
alter table ub modify column day datetime;
alter table ub modify column hour INT;
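One detail worth checking here: MySQL's FROM_UNIXTIME() renders the timestamp in the session time zone, so the date, day and hour columns shift if the server is not running on China time. Assuming the timestamps are meant to be read as UTC+8, running `SET time_zone = '+08:00';` before the UPDATE should pin this down. The Python sketch below does the same conversion with an explicit offset. `split_timestamp` is a hypothetical helper, not part of any library, and the UTC+8 offset is an assumption.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the raw timestamps are Unix seconds meant to be read as
# China Standard Time (UTC+8). Change the offset if your data differs.
UTC_PLUS_8 = timezone(timedelta(hours=8))


def split_timestamp(ts):
    """Return (datetime text, date text, hour) for one Unix timestamp,
    mirroring the date, day and hour columns added above."""
    dt = datetime.fromtimestamp(ts, tz=UTC_PLUS_8)
    return dt.strftime("%Y-%m-%d %H:%M:%S"), dt.strftime("%Y-%m-%d"), dt.hour


print(split_timestamp(1511539200))  # ('2017-11-25 00:00:00', '2017-11-25', 0)
```

Note that MySQL's format string uses %i for minutes, while Python's strftime uses %M.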

2. Check for missing values

SELECT COUNT(1) FROM ub
WHERE user_id IS NULL OR item_id IS NULL OR cate_id IS NULL OR type IS NULL OR time IS NULL

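3. Back up the table

Better safe than sorry. In MySQL, SELECT ... INTO only writes into variables or files (SELECT * INTO new_table is SQL Server syntax), so make the copy with CREATE TABLE ... AS SELECT:

CREATE TABLE ub_orig AS SELECT * FROM ub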

4. Remove out-of-range dates

Besides the records from 2017-11-25 to 2017-12-03, the raw data also contains a few scattered records outside this range, so I deleted them:

DELETE FROM ub WHERE day < '2017-11-25' OR day >= '2017-12-04'

IV. Extracting the Data for Each Analysis

1. User behavior over time

1.1 Daily PV

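SELECT 
day,
SUM(CASE WHEN type = 'pv' THEN 1 ELSE 0 END) AS PV,  -- page-view records only
COUNT(DISTINCT user_id) AS UV                        -- distinct active users
FROM ub
GROUP BY day 
ORDER BY day ASC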
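Before running on the full table, you can sanity-check the PV/UV definitions on a handful of rows. The sketch below uses Python's built-in sqlite3 module rather than MySQL, with hypothetical data. It counts only `type = 'pv'` records as PV and distinct users as UV, so a user who views twice on one day adds 2 to PV but 1 to UV.

```python
import sqlite3

# Hypothetical toy rows (user_id, type, day); not taken from the real dataset.
ROWS = [
    (1, "pv", "2017-11-25"), (1, "pv", "2017-11-25"), (1, "buy", "2017-11-25"),
    (2, "pv", "2017-11-25"), (2, "cart", "2017-11-26"), (3, "pv", "2017-11-26"),
]


def daily_pv_uv(rows):
    """Return [(day, pv, uv), ...] computed by SQLite over the given rows."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE ub (user_id INTEGER, type TEXT, day TEXT)")
    con.executemany("INSERT INTO ub VALUES (?, ?, ?)", rows)
    result = con.execute(
        """
        SELECT day,
               SUM(CASE WHEN type = 'pv' THEN 1 ELSE 0 END) AS pv,  -- page-view records
               COUNT(DISTINCT user_id) AS uv                        -- distinct active users
        FROM ub
        GROUP BY day
        ORDER BY day
        """
    ).fetchall()
    con.close()
    return result


print(daily_pv_uv(ROWS))  # [('2017-11-25', 3, 2), ('2017-11-26', 1, 2)]
```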

1.2 Hourly PV

SELECT hour, SUM(CASE WHEN type = 'pv' THEN 1 ELSE 0 END) AS PV  -- page-view records only
from ub
GROUP BY hour
ORDER BY hour asc

1.3 Retention

Users are grouped into cohorts by their first active day (min_day). day_n is the number of users in that cohort who were active n - 1 days later, so day_1 is the cohort size. The inner query keeps one row per user per active day, so the sums count users rather than behavior records.

SELECT 
diff_day.min_day as min_day,
sum(case when diff_day.to_fday=0 then 1 else 0 end) as day_1,
sum(case when diff_day.to_fday=1 then 1 else 0 end) as day_2,
sum(case when diff_day.to_fday=2 then 1 else 0 end) as day_3,
sum(case when diff_day.to_fday=3 then 1 else 0 end) as day_4,
sum(case when diff_day.to_fday=4 then 1 else 0 end) as day_5,
sum(case when diff_day.to_fday=5 then 1 else 0 end) as day_6,
sum(case when diff_day.to_fday=6 then 1 else 0 end) as day_7,
sum(case when diff_day.to_fday=7 then 1 else 0 end) as day_8,
sum(case when diff_day.to_fday=8 then 1 else 0 end) as day_9
from 
		(SELECT DISTINCT a.user_id, a.day, b.min_day, DATEDIFF(a.day,b.min_day) as to_fday
		from ub as a
		LEFT JOIN
						(SELECT user_id, min(day) as min_day
						 from ub
						 GROUP BY user_id) as b 
		on a.user_id=b.user_id) as diff_day
GROUP BY diff_day.min_day
ORDER BY diff_day.min_day
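Retention should count each user at most once per active day. Summing over raw behavior rows instead weights every cell by how active the users were. The pure-Python sketch below, on hypothetical records, builds the same cohort table: each cohort is a user's first active day, and the columns are days since then. It works from a per-user set of active days, then divides by the day-0 cohort size to get rates. The function names are illustrative only.

```python
from collections import defaultdict
from datetime import date


def retention_matrix(activity):
    """activity: iterable of (user_id, day) behavior records; duplicates are fine.

    Returns {cohort_day: {offset_in_days: distinct users active at that offset}},
    where cohort_day is each user's first active day (min_day in the query).
    """
    days_by_user = defaultdict(set)
    for user_id, day in activity:
        days_by_user[user_id].add(day)  # keep one entry per user per day

    matrix = defaultdict(lambda: defaultdict(int))
    for days in days_by_user.values():
        first = min(days)
        for d in sorted(days):
            matrix[first][(d - first).days] += 1
    return {cohort: dict(row) for cohort, row in matrix.items()}


def retention_rates(matrix):
    """Divide each cell by the cohort size (offset 0) to get a retention rate."""
    return {c: {k: n / row[0] for k, n in row.items()} for c, row in matrix.items()}


records = [
    (1, date(2017, 11, 25)), (1, date(2017, 11, 25)), (1, date(2017, 11, 26)),
    (2, date(2017, 11, 25)), (2, date(2017, 11, 27)),
    (3, date(2017, 11, 26)), (3, date(2017, 11, 27)),
]
m = retention_matrix(records)
print(m)
print(retention_rates(m))
```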

2. Purchase behavior analysis

2.1 Daily trend of each behavior type

Rows outside 2017-11-25 to 2017-12-03 were deleted in data-processing step 4, so only those nine days are pivoted. Columns for later dates would always be 0.

SELECT
gp_day_type.type,
max(case when gp_day_type.day = '2017-11-25 00:00:00' then gp_day_type.type_count ELSE 0 end) as `2017-11-25`,
max(case when gp_day_type.day = '2017-11-26 00:00:00' then gp_day_type.type_count ELSE 0 end) as `2017-11-26`,
max(case when gp_day_type.day = '2017-11-27 00:00:00' then gp_day_type.type_count ELSE 0 end) as `2017-11-27`,
max(case when gp_day_type.day = '2017-11-28 00:00:00' then gp_day_type.type_count ELSE 0 end) as `2017-11-28`,
max(case when gp_day_type.day = '2017-11-29 00:00:00' then gp_day_type.type_count ELSE 0 end) as `2017-11-29`,
max(case when gp_day_type.day = '2017-11-30 00:00:00' then gp_day_type.type_count ELSE 0 end) as `2017-11-30`,
max(case when gp_day_type.day = '2017-12-01 00:00:00' then gp_day_type.type_count ELSE 0 end) as `2017-12-01`,
max(case when gp_day_type.day = '2017-12-02 00:00:00' then gp_day_type.type_count ELSE 0 end) as `2017-12-02`,
max(case when gp_day_type.day = '2017-12-03 00:00:00' then gp_day_type.type_count ELSE 0 end) as `2017-12-03`
from
		(SELECT day, type, count(user_id) as type_count
		from ub
		GROUP BY day, type) as gp_day_type
GROUP BY gp_day_type.type
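Writing one CASE column per day by hand is tedious and easy to get wrong, for example by pivoting dates that were already deleted. A small Python helper can generate the column list for any date range instead. `build_pivot_sql` is a hypothetical name. It only builds MySQL text for you to paste into the client; nothing is executed here.

```python
from datetime import date, timedelta


def build_pivot_sql(start, end):
    """Return the day-by-type pivot query text for every day in [start, end].

    Table and alias names (ub, gp_day_type) follow the query above; the
    generated text targets MySQL and is only built here, not executed.
    """
    columns = []
    d = start
    while d <= end:
        columns.append(
            f"max(case when gp_day_type.day = '{d} 00:00:00' "
            f"then gp_day_type.type_count else 0 end) as `{d}`"
        )
        d += timedelta(days=1)
    return (
        "SELECT gp_day_type.type,\n  "
        + ",\n  ".join(columns)
        + "\nFROM (SELECT day, type, count(user_id) AS type_count\n"
        "      FROM ub GROUP BY day, type) AS gp_day_type\n"
        "GROUP BY gp_day_type.type"
    )


sql = build_pivot_sql(date(2017, 11, 25), date(2017, 12, 3))
print(sql)
```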

2.2 Behavior conversion

Record count of each behavior type as a percentage of page-view (pv) records:

SELECT 
type,
count(user_id) as count,
round(count(user_id)/(SELECT count(user_id) from ub where type='pv')*100, 1) as proportion
from ub
GROUP BY type
ORDER BY proportion asc

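2.3 UV conversion rate

Distinct users per behavior type as a percentage of all distinct users (UV). The columns are listed explicitly here, because a.* should be avoided in practice.

SELECT a.type, a.count, a.proportion
from
		(SELECT 
				 type,
				 count(DISTINCT user_id) as count,
				 round(count(DISTINCT user_id)/(SELECT count(DISTINCT user_id) from ub)*100, 1) as proportion
		 from ub
		 GROUP BY type
		 UNION all
		 SELECT	
				 'uv' as type, 
				 count(DISTINCT user_id) as count,
				 100 as proportion  -- a number, not the string '100', so ORDER BY sorts numerically
		 FROM ub) as a
ORDER BY a.proportion asc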
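Percentages of distinct users only compare cleanly when every type shares one base. The sketch below, on hypothetical records, uses all distinct users (UV) as that base, which is why the 'uv' row is exactly 100. If you prefer page-view users as the base, replace `total` with the size of the pv set. The key detail is counting each user once per type (a set), not once per record. `uv_by_type` is an illustrative name.

```python
from collections import defaultdict


def uv_by_type(records):
    """records: iterable of (user_id, type) behavior records.

    Returns {type: (distinct users, % of all distinct users)} plus a 'uv' row
    holding the total number of distinct users at 100%.
    """
    users_by_type = defaultdict(set)
    all_users = set()
    for user_id, behavior in records:
        users_by_type[behavior].add(user_id)
        all_users.add(user_id)
    total = len(all_users)
    result = {
        t: (len(users), round(len(users) / total * 100, 1))
        for t, users in users_by_type.items()
    }
    result["uv"] = (total, 100.0)
    return result


records = [(1, "pv"), (1, "pv"), (1, "buy"), (2, "pv"), (2, "cart"), (3, "pv"), (4, "fav")]
print(uv_by_type(records))
```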

2.4 Repeat-purchase rate

Here I define it as the number of users who bought at least twice divided by the number of users who bought at all.

SELECT
sum(case when buy_o.buy_count >= 2 then 1 else 0 end)  -- users with at least two purchases
/ count(1) * 100 as repeat_purchase_rate                 -- divided by all buying users
from
	(SELECT
		user_id,
		count(user_id) as buy_count
	from ub
	where type='buy'
	group by user_id) as buy_o
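You can check the definition above on a few hypothetical purchases. Summing purchase counts instead of counting users answers a different question: what share of purchases come from repeat buyers. Don't mix the two numbers. Both function names below are illustrative.

```python
from collections import Counter


def repeat_purchase_rate(buyer_ids):
    """buyer_ids: one user_id per 'buy' record.

    Returns the percentage of buying users who bought at least twice.
    """
    per_user = Counter(buyer_ids)
    if not per_user:
        return 0.0
    repeaters = sum(1 for n in per_user.values() if n >= 2)
    return repeaters / len(per_user) * 100


def repeat_purchase_share(buyer_ids):
    """Share of all purchases made by users who bought at least twice (a different metric)."""
    per_user = Counter(buyer_ids)
    total = sum(per_user.values())
    if total == 0:
        return 0.0
    return sum(n for n in per_user.values() if n >= 2) / total * 100


buys = [1, 1, 2, 3, 3, 3, 4]  # hypothetical: user 1 buys twice, user 3 three times
print(repeat_purchase_rate(buys))   # 50.0
print(repeat_purchase_share(buys))
```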

2.5 Repurchase interval distribution

For each user with at least two purchases, the first query gives the span in days between the first and last purchase day. The second gives the number of users per span:

-- per user
SELECT
user_id,
min(day) as min_day,
max(day) as max_day,
DATEDIFF(max(day), min(day)) as day_span
FROM ub
WHERE type='buy'
GROUP BY user_id
HAVING count(1) >= 2;

-- distribution over users
SELECT g.day_span, count(1) as users
FROM
	(SELECT user_id, DATEDIFF(max(day), min(day)) as day_span
	 FROM ub
	 WHERE type='buy'
	 GROUP BY user_id
	 HAVING count(1) >= 2) as g
GROUP BY g.day_span
ORDER BY g.day_span;
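The query above measures the span between each user's first and last purchase day. That differs from the gap between consecutive purchases: a user buying on days 1, 2 and 9 has a span of 8 but gaps of 1 and 7. Consecutive gaps need LAG() in MySQL 8.0+ or a self-join in 5.5. The Python sketch below computes the consecutive-gap distribution on hypothetical data. Treating several purchases on the same day as one purchase day is my assumption, and the function name is illustrative.

```python
from collections import Counter, defaultdict
from datetime import date


def purchase_gap_distribution(purchases):
    """purchases: iterable of (user_id, day) for 'buy' records.

    Returns a Counter {gap_in_days: occurrences} over consecutive purchase
    days of each user. Several purchases on the same day count as one
    purchase day (an assumption of this sketch).
    """
    days_by_user = defaultdict(set)
    for user_id, day in purchases:
        days_by_user[user_id].add(day)

    gaps = Counter()
    for days in days_by_user.values():
        ordered = sorted(days)
        for prev, nxt in zip(ordered, ordered[1:]):  # consecutive pairs
            gaps[(nxt - prev).days] += 1
    return gaps


buys = [
    (1, date(2017, 11, 25)), (1, date(2017, 11, 25)), (1, date(2017, 11, 27)),
    (2, date(2017, 11, 25)), (2, date(2017, 11, 26)), (2, date(2017, 11, 30)),
    (3, date(2017, 11, 28)),
]
print(purchase_gap_distribution(buys))
```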

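3. Customer value analysis (RFM model)

My approach: for each user, take the most recent purchase time (R), the purchase frequency (F) and the total amount spent (M), and compute the average of each. Then subtract the average from each user's value and set the flag to 1 if the result is greater than 0, otherwise 0.
The source data has no purchase amount, so only R and F are available, which splits users into four groups (2 × 2).

3.1 R: recency (days since the last purchase)

3.1.1 Average R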

SELECT
avg(a.datediff)
FROM
(SELECT user_id, 
			 datediff('2017-12-03 00:00:00',max(day)) as datediff
from ub
where type='buy'
GROUP BY user_id) as a

3.1.2 Latest date in the data (used as the reference date 2017-12-03 for R)

SELECT max(day) from ub

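3.1.3 R column

-- 2.5076 is the average from 3.1.1 on my data; r = 1 means the last purchase is older than average
SELECT user_id, 
			 if(datediff('2017-12-03 00:00:00',max(day))-2.5076>0,1,0 ) as r
from ub
where type='buy'
GROUP BY user_id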

3.2 F: purchase frequency

3.2.1 Average purchase frequency

SELECT
avg(a.count)
FROM
	(select user_id,count(1) as count
	from ub
	where type = 'buy'
	GROUP BY user_id) as a

3.2.2 F column

-- 3.0346 is the average from 3.2.1 on my data; f = 1 means more purchases than average
select user_id,
if(count(1)-3.0346 >0 ,1, 0) as f
from ub
where type = 'buy'
GROUP BY user_id

3.3 RF model

R and F both come from the same per-user aggregation over purchase records, so they can be computed in one pass instead of joining two subqueries:

SELECT
user_id,
CONCAT(
	if(datediff('2017-12-03 00:00:00', max(day)) - 2.5076 > 0, 1, 0),  -- r
	if(count(1) - 3.0346 > 0, 1, 0)                                    -- f
) as rf
FROM ub
WHERE type='buy'
GROUP BY user_id
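The thresholds 2.5076 and 3.0346 appear to be the averages returned by 3.1.1 and 3.2.1 on my sample, so they go stale as soon as the data changes. One way around that is to compare each user with the averages inside the same statement and then count users per segment. The sketch below runs on SQLite through Python's sqlite3 with hypothetical rows. julianday() stands in for MySQL's DATEDIFF(). When porting back to MySQL, || must become CONCAT(), since || means logical OR there by default.

```python
import sqlite3

# Hypothetical rows (user_id, type, day); user 5 never buys and must be ignored.
ROWS = [
    (1, "buy", "2017-12-03"), (1, "buy", "2017-12-02"), (1, "buy", "2017-12-01"),
    (2, "buy", "2017-11-25"),
    (3, "buy", "2017-12-01"), (3, "buy", "2017-12-01"),
    (4, "buy", "2017-11-28"),
    (5, "pv", "2017-12-03"),
]

# Per-user recency (days before the reference date) and purchase frequency.
PER_USER = """
    SELECT user_id,
           julianday('2017-12-03') - julianday(MAX(day)) AS recency,
           COUNT(1) AS freq
    FROM ub
    WHERE type = 'buy'
    GROUP BY user_id
"""

# Compare each user with the averages in the same statement, then count per segment.
QUERY = f"""
SELECT rf, COUNT(1) AS users
FROM (
    SELECT (CASE WHEN p.recency > a.r_avg THEN '1' ELSE '0' END)
           || (CASE WHEN p.freq > a.f_avg THEN '1' ELSE '0' END) AS rf
    FROM ({PER_USER}) AS p
    CROSS JOIN (SELECT AVG(recency) AS r_avg, AVG(freq) AS f_avg
                FROM ({PER_USER}) AS q) AS a
) AS s
GROUP BY rf
ORDER BY rf
"""


def rf_segments(rows):
    """Return [(rf code, number of users), ...] for the given rows."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE ub (user_id INTEGER, type TEXT, day TEXT)")
    con.executemany("INSERT INTO ub VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


print(rf_segments(ROWS))  # [('01', 2), ('10', 2)]
```

With these rows the averages are 3.75 days and 1.75 purchases, so users 1 and 3 land in '01' and users 2 and 4 in '10'.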

4. Product analysis

4.1 Top 10 categories by purchases: page views vs. purchases (Transform = purchases / page views × 100)

SELECT buy.cate_id,pv.pv_count,buy.buy_count,round(buy.buy_count/pv.pv_count,3)*100 as Transform
from 
		(SELECT cate_id,count(1) as buy_count
		from ub
		where type='buy'
		GROUP BY cate_id) as buy
join
		(SELECT cate_id,count(1) as pv_count
		from ub
		where type='pv'
		GROUP BY cate_id) as pv
on buy.cate_id = pv.cate_id
ORDER BY buy.buy_count desc
LIMIT 10
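Because the query uses an inner join, any category that was bought but never viewed, or viewed but never bought, drops out of the list. The Python sketch below reproduces the same ranking on hypothetical (cate_id, type) pairs to make that behavior visible. `top_categories` is an illustrative name. A LEFT JOIN from the purchase side would keep purchase-only categories, with a NULL page-view count.

```python
from collections import Counter


def top_categories(records, n=10):
    """records: iterable of (cate_id, type) behavior records.

    Returns up to n tuples (cate_id, pv_count, buy_count, conversion %),
    ordered by buy_count descending. Like the inner join, only categories
    with both page views and purchases are kept.
    """
    records = list(records)  # iterated twice below
    pv = Counter(c for c, t in records if t == "pv")
    buy = Counter(c for c, t in records if t == "buy")
    rows = [
        (c, pv[c], b, round(b / pv[c] * 100, 1))
        for c, b in buy.items()
        if pv[c] > 0  # drops purchase-only categories, as the join does
    ]
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows[:n]


records = (
    [(10, "pv")] * 4 + [(10, "buy")] * 2
    + [(20, "pv")] * 10 + [(20, "buy")]
    + [(30, "buy")] * 3          # bought but never viewed
    + [(40, "pv")] * 5           # viewed but never bought
)
print(top_categories(records))
```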

4.2 Top 10 items by purchases: page views vs. purchases

SELECT buy.item_id,pv.pv_count,buy.buy_count,round(buy.buy_count/pv.pv_count,3)*100 as Transform
from 
		(SELECT item_id,count(1) as buy_count
		from ub
		where type='buy'
		GROUP BY item_id) as buy
join
		(SELECT item_id,count(1) as pv_count
		from ub
		where type='pv'
		GROUP BY item_id) as pv
on buy.item_id = pv.item_id
ORDER BY buy.buy_count desc
LIMIT 10
