Data source: https://www.kaggle.com/mehdidag/black-friday/version/1
The data is the BlackFriday.csv file from Kaggle, with about 540,000 records and 12 fields.
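Field descriptions:
User_ID: user ID, uniquely identifies a user
Product_ID: product ID, uniquely identifies a product
Gender: gender (F = female, M = male)
Age: age group, one of 7 bands (0-17, 18-25, 26-35, 36-45, 46-50, 51-55, 55+)
Occupation: occupation, coded as the integers 0-20 (21 categories)
City_Category: city category (A, B or C)
Stay_In_Current_City_Years: years lived in the current city, one of 5 categories (0, 1, 2, 3, 4+)
Marital_Status: marital status (0 = unmarried, 1 = married)
Product_Category_1: product category 1 (numeric code, never empty)
Product_Category_2: product category 2 (numeric code, may be empty)
Product_Category_3: product category 3 (numeric code, may be empty)
Purchase: purchase amount (in US dollars)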
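Before importing, the field list above can be checked against the raw file. The sketch below is minimal and assumes the CSV header uses the field names listed here. It counts data rows and, for each column, how many fields are empty; that is how missing Product_Category_2 and Product_Category_3 values appear in a CSV. The function name `profile_csv` and the sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter


def profile_csv(fileobj):
    """Count data rows and empty fields per column in a CSV with a header row."""
    reader = csv.DictReader(fileobj)
    row_count = 0
    empty_counts = Counter()
    for row in reader:
        row_count += 1
        for column, value in row.items():
            # Missing values in the raw CSV appear as empty strings.
            if value == "":
                empty_counts[column] += 1
    return row_count, empty_counts


# Made-up rows using a subset of the columns, for illustration only.
SAMPLE = """User_ID,Product_ID,Gender,Product_Category_1,Product_Category_2,Product_Category_3,Purchase
1,P0001,F,3,,,8000
1,P0002,F,1,6,14,15000
2,P0003,M,12,,,1000
"""

row_count, empty_counts = profile_csv(io.StringIO(SAMPLE))
print(row_count, dict(empty_counts))

# For the real file (path assumed):
# with open("BlackFriday.csv", newline="") as f:
#     print(profile_csv(f))
```

On the full file, the empty counts for Product_Category_2 and Product_Category_3 show how much of the data lacks those codes.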
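Analysis plan:
Which products sold the most during Black Friday?
Which product categories sold the most?
How do sales differ across city categories?
How does spending differ by gender, age and occupation?
Visualize the results in a PPT presentation.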
Analysis steps:
① Import the data through Navicat into a table named blackfriday (step omitted)
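② Top 10 best-selling products
Here "sales volume" is the number of purchase records per product (count(*)), not the number of units sold, since the data has no quantity field.
(Code)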
-- sales_volume = number of purchase records; sale = total purchase amount (USD)
SELECT product_id, count(*) AS sales_volume, sum(purchase) AS sale
FROM blackfriday
GROUP BY product_id
ORDER BY sales_volume DESC
LIMIT 10;
(PPT)
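The query can be tried without the original database. The sketch below uses Python's built-in sqlite3 module as a stand-in for the database accessed through Navicat, and loads a few made-up rows into a table with the same name, `blackfriday`. It also adds `product_id` as a tie-breaker in ORDER BY, which the original query does not have, so that products with equal counts come out in a fixed order.

```python
import sqlite3

# Same query as above, plus product_id as a tie-breaker (an addition).
TOP_PRODUCTS_SQL = """
SELECT product_id, count(*) AS sales_volume, sum(purchase) AS sale
FROM blackfriday
GROUP BY product_id
ORDER BY sales_volume DESC, product_id
LIMIT 10
"""

# Made-up rows: (user_id, product_id, purchase)
ROWS = [
    (1, "P1", 100), (2, "P1", 120), (3, "P1", 90),
    (1, "P2", 500), (2, "P2", 450),
    (3, "P3", 70),
]


def top_products(rows):
    """Load rows into an in-memory table and run the top-10 query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE blackfriday (user_id INTEGER, product_id TEXT, purchase INTEGER)"
    )
    conn.executemany("INSERT INTO blackfriday VALUES (?, ?, ?)", rows)
    result = conn.execute(TOP_PRODUCTS_SQL).fetchall()
    conn.close()
    return result


for row in top_products(ROWS):
    print(row)
```

On these toy rows P1 ranks first by sales_volume, but P2 has the larger sale total. Ranking by number of purchase records and ranking by amount can disagree.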
③ Top 10 best-selling product categories (by Product_Category_1)
(Code)
-- The 'T' prefix turns the numeric category code into a text label for the chart
SELECT concat('T', product_category_1) AS category, count(*) AS sales_volume, sum(purchase) AS sale
FROM blackfriday
GROUP BY product_category_1
ORDER BY sales_volume DESC
LIMIT 10;
(PPT)
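One way to cross-check the category ranking outside the database is to recompute it in plain Python. This sketch assumes the rows are already available as (product_category_1, purchase) pairs. The function name `top_categories` and the sample data are hypothetical, and the `T` prefix mirrors the `concat('T', ...)` label in the query. Ties are broken by category code so the output order is fixed.

```python
from collections import Counter, defaultdict


def top_categories(rows, n=10):
    """Rank product_category_1 by number of purchase records.

    rows: iterable of (product_category_1, purchase) pairs.
    Returns (label, sales_volume, sale) tuples, like the SQL query.
    """
    volume = Counter()
    amount = defaultdict(int)
    for category, purchase in rows:
        volume[category] += 1
        amount[category] += purchase
    # Most records first; smaller category code first on ties.
    ranked = sorted(volume, key=lambda c: (-volume[c], c))
    return [(f"T{c}", volume[c], amount[c]) for c in ranked[:n]]


# Made-up sample rows: (product_category_1, purchase)
sample = [(5, 50), (5, 60), (1, 300), (8, 40), (5, 55), (1, 280)]
print(top_categories(sample))
```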
④ Purchases by city category, split by gender
(Code)
SELECT city_category, count( CASE WHEN gender = 'F' THEN 1 END ) AS f_buy,
count( CASE WHEN gender = 'M' THEN 1 END ) AS m_buy, count( * ) AS buy
FROM blackfriday
GROUP BY city_category
ORDER BY count( * ) DESC;
(PPT)
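The city query relies on COUNT skipping NULLs. `CASE WHEN gender = 'F' THEN 1 END` has no ELSE branch, so it yields NULL for male rows and COUNT only counts the female ones. The small check below uses Python's sqlite3 module on made-up rows, with SQLite standing in for the original database. It shows that f_buy + m_buy equals buy whenever gender only takes the values F and M.

```python
import sqlite3

CITY_SQL = """
SELECT city_category,
       count(CASE WHEN gender = 'F' THEN 1 END) AS f_buy,
       count(CASE WHEN gender = 'M' THEN 1 END) AS m_buy,
       count(*) AS buy
FROM blackfriday
GROUP BY city_category
ORDER BY buy DESC
"""

# Made-up rows: (city_category, gender)
ROWS = [("A", "M"), ("A", "F"), ("B", "M"), ("B", "M"), ("B", "F"), ("C", "F")]


def city_breakdown(rows):
    """Load rows into an in-memory table and run the city query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE blackfriday (city_category TEXT, gender TEXT)")
    conn.executemany("INSERT INTO blackfriday VALUES (?, ?)", rows)
    result = conn.execute(CITY_SQL).fetchall()
    conn.close()
    return result


for city, f_buy, m_buy, buy in city_breakdown(ROWS):
    # Only 'F' and 'M' occur, so the two conditional counts add up to the total.
    print(city, f_buy, m_buy, buy, f_buy + m_buy == buy)
```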
⑤ Number of purchases by gender
(Code)
SELECT gender, count(*) AS sales_volume
FROM blackfriday
GROUP BY gender;
(PPT)
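Raw counts are easier to compare as shares. The sketch below is minimal and assumes SQLite via Python's sqlite3 module and a made-up sample. Each gender's count is divided by the total number of rows, taken from a scalar subquery; MySQL also accepts this form. The column name `pct` is an illustrative choice.

```python
import sqlite3

GENDER_SHARE_SQL = """
SELECT gender,
       count(*) AS sales_volume,
       round(count(*) * 100.0 / (SELECT count(*) FROM blackfriday), 2) AS pct
FROM blackfriday
GROUP BY gender
ORDER BY gender
"""

# Made-up rows: one gender value per purchase record
ROWS = [("M",), ("M",), ("M",), ("F",)]


def gender_share(rows):
    """Return (gender, sales_volume, percentage of all records) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE blackfriday (gender TEXT)")
    conn.executemany("INSERT INTO blackfriday VALUES (?)", rows)
    result = conn.execute(GENDER_SHARE_SQL).fetchall()
    conn.close()
    return result


for row in gender_share(ROWS):
    print(row)
```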
⑥ Best-selling product categories among male shoppers (top 4)
(Code)
SELECT concat('T', product_category_1) AS category, count(*) AS sales_volume
FROM blackfriday
WHERE gender = 'M'
GROUP BY product_category_1
ORDER BY sales_volume DESC
LIMIT 4;
(PPT)
⑦ Best-selling product categories among female shoppers
(Code)
SELECT CONCAT('T',product_category_1), count( * ) AS sales_volume
FROM blackfriday
WHERE gender = 'F'
GROUP BY product_category_1
ORDER BY count( * ) DESC;
(PPT)
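The male and female queries differ only in their WHERE clause. If the database supports window functions (MySQL 8.0 and later, SQLite 3.25 and later), one query can produce both rankings by numbering categories within each gender with ROW_NUMBER(). The sketch runs on made-up rows through Python's sqlite3 module. It uses SQLite's `||` operator in place of `concat()` and takes the top-N cutoff as a parameter. The function name is hypothetical.

```python
import sqlite3

# Count purchases per (gender, category), then number the categories
# within each gender by descending count and keep the first n.
TOP_BY_GENDER_SQL = """
SELECT gender, category, sales_volume
FROM (
    SELECT gender, category, sales_volume,
           row_number() OVER (
               PARTITION BY gender
               ORDER BY sales_volume DESC, category
           ) AS rn
    FROM (
        SELECT gender,
               'T' || product_category_1 AS category,
               count(*) AS sales_volume
        FROM blackfriday
        GROUP BY gender, product_category_1
    ) AS counts
) AS ranked
WHERE rn <= ?
ORDER BY gender, rn
"""

# Made-up rows: (gender, product_category_1)
ROWS = [
    ("M", 1), ("M", 1), ("M", 1), ("M", 5), ("M", 5), ("M", 8),
    ("F", 5), ("F", 5), ("F", 5), ("F", 8), ("F", 8), ("F", 1),
]


def top_categories_by_gender(rows, n):
    """Return the top-n categories per gender as (gender, category, count)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE blackfriday (gender TEXT, product_category_1 INTEGER)")
    conn.executemany("INSERT INTO blackfriday VALUES (?, ?)", rows)
    result = conn.execute(TOP_BY_GENDER_SQL, (n,)).fetchall()
    conn.close()
    return result


for row in top_categories_by_gender(ROWS, n=2):
    print(row)
```

With `n=4`, the male part matches the LIMIT 4 used for men. The female query above has no LIMIT, so reproducing it would need an n large enough to cover every category.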
⑧ Spending by occupation (top 10 by total purchase amount)
(Code)
SELECT concat('J', occupation) AS job, sum(purchase) AS buy
FROM blackfriday
GROUP BY occupation
ORDER BY sum( purchase ) DESC
LIMIT 10;
(PPT)
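Total spend per occupation mixes two things: how many shoppers an occupation has and how much each of them spends. The sketch below is a hedged extension, not part of the original analysis. It also reports the number of distinct users and the average spend per user. It runs through Python's sqlite3 module on made-up rows and uses `||` in place of `concat()`.

```python
import sqlite3

OCCUPATION_SQL = """
SELECT 'J' || occupation AS job,
       sum(purchase) AS buy,
       count(DISTINCT user_id) AS users,
       round(sum(purchase) * 1.0 / count(DISTINCT user_id), 2) AS buy_per_user
FROM blackfriday
GROUP BY occupation
ORDER BY buy DESC
LIMIT 10
"""

# Made-up rows: (user_id, occupation, purchase)
ROWS = [
    (1, 4, 100), (2, 4, 100), (3, 4, 100), (4, 4, 100),
    (5, 7, 150), (5, 7, 150),
]


def occupation_spend(rows):
    """Return (job, total, distinct users, average spend per user) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE blackfriday (user_id INTEGER, occupation INTEGER, purchase INTEGER)"
    )
    conn.executemany("INSERT INTO blackfriday VALUES (?, ?, ?)", rows)
    result = conn.execute(OCCUPATION_SQL).fetchall()
    conn.close()
    return result


for row in occupation_spend(ROWS):
    print(row)
```

In the toy data, J4 leads on total spend only because it has four shoppers. Per user, J7 spends more.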
⑨ Spending by age group (total purchase amount, split by gender)
(Code)
SELECT age,
sum( CASE WHEN gender = 'M' THEN purchase END ) AS buy_m,
sum( CASE WHEN gender = 'F' THEN purchase END ) AS buy_f
FROM blackfriday
GROUP BY age
ORDER BY age;
(PPT)
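`sum(CASE WHEN gender = 'M' THEN purchase END)` returns NULL, not 0, for an age band with no male rows, and that can leave gaps in a chart. The minimal sketch below wraps each sum in `coalesce(..., 0)`, which both MySQL and SQLite support. It runs through Python's sqlite3 module on made-up rows.

```python
import sqlite3

AGE_SQL = """
SELECT age,
       coalesce(sum(CASE WHEN gender = 'M' THEN purchase END), 0) AS buy_m,
       coalesce(sum(CASE WHEN gender = 'F' THEN purchase END), 0) AS buy_f
FROM blackfriday
GROUP BY age
ORDER BY age
"""

# Made-up rows: (age, gender, purchase)
ROWS = [
    ("0-17", "M", 100), ("0-17", "F", 80),
    ("26-35", "M", 300), ("26-35", "M", 50),
    ("55+", "F", 200),
]


def spend_by_age(rows):
    """Return (age, male total, female total) with 0 instead of NULL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE blackfriday (age TEXT, gender TEXT, purchase INTEGER)")
    conn.executemany("INSERT INTO blackfriday VALUES (?, ?, ?)", rows)
    result = conn.execute(AGE_SQL).fetchall()
    conn.close()
    return result


for row in spend_by_age(ROWS):
    print(row)
```

ORDER BY age sorts the labels as text. For this dataset's bands (0-17, 18-25, 26-35, 36-45, 46-50, 51-55, 55+), text order happens to match numeric order.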
⑩ Conclusions
(PPT)