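This post is a monthly report for Adventure Works. It records the requirements analysis and implementation of the whole project. The main task is to query and compute the data with Python and then connect the results to Power BI for visualization. I have added some personal thoughts along the way, and the finished analysis is presented at the end.
Power BI deliverable: November 2019 online bicycle business analysis report (Power BI)
PPT deliverable: November 2019 online bicycle business analysis report (PPT)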
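Case background:
Adventure Works Cycles is the fictional company behind the Adventure Works sample database. In December 2019 the business team lead has to report November 2019 bicycle sales to management, to provide data support for fine-grained operations and to help pinpoint target customer groups.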
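Analysis goals
1. The company's main concern is how to set sales strategy and adjust the product mix so that it keeps growing quickly, earns more revenue, and gains market share.
2. By continuously monitoring and analyzing the company's bicycle sales, the report tracks sales status and trends, giving the client a basis for setting, adjusting, and reviewing sales strategy and for improving the product mix.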
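Analysis process
The analysis covers five angles:
1. Overall: overall bicycle sales performance from January to November 2019
2. Region: November sales volume by region, and November sales volume for the top 10 cities
3. Product: November sales volume by product category and by individual product
4. Hot products: November top 10 products by sales volume and top 10 by sales growth
5. Users: November age distribution and product preference by age group, and November male/female ratio with product preference by gender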
First import the libraries that may be needed:
# Import modules
import pandas as pd
import numpy as np
import pymysql
pymysql.install_as_MySQLdb()
from sqlalchemy import create_engine
1. Overall bicycle sales performance
Connect to the database and load the dataset:
engine = create_engine('mysql+pymysql://****@***/***')
sql_cmd='select * from dw_customer_order'
gather_customer_order=pd.read_sql_query(sql=sql_cmd,con=engine)
Then inspect the data:
gather_customer_order.head()
Add a create_year_month field for use in month-level analysis.
# Add the create_year_month field, used for month-level analysis
gather_customer_order['create_year_month']=pd.to_datetime(gather_customer_order['create_date']).dt.strftime('%Y-%m')
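A minimal sketch of the month-key step on a toy frame. Only the column names come from the post; the sample rows are made up, and it assumes `create_date` may arrive either as strings or as datetimes, which is why it is parsed first.

```python
import pandas as pd

# Hypothetical sample rows; only the column names follow the post
orders = pd.DataFrame({"create_date": ["2019-10-31", "2019-11-01", "2019-11-30"]})

# Parse to datetime first, then format each date as a 'YYYY-MM' key
orders["create_year_month"] = pd.to_datetime(orders["create_date"]).dt.strftime("%Y-%m")
print(orders["create_year_month"].tolist())
```

Because the key is a zero-padded string, sorting it alphabetically also sorts it chronologically, which the later month-over-month steps rely on.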
Because the analysis is about bicycle sales, keep only the bicycle rows
# Keep rows whose product category is bicycle ('自行车')
gather_customer_order=gather_customer_order[gather_customer_order['cplb_zw'] == '自行车']
1.2 Overall bicycle sales volume
Aggregate units sold and sales amount for each month of 2019
# Monthly totals: units sold and sales amount
overall_sales_performance = gather_customer_order.groupby('create_year_month').agg({'order_num':'sum','sum_amount':'sum'}).sort_values('create_year_month').reset_index()
overall_sales_performance.head()
Compute the month-over-month (MoM) change in units sold and in sales amount for each month
# MoM change = (this month - previous month) / previous month; rows are sorted by month ascending
# The first month has no previous month, so its change is set to 0
overall_sales_performance['order_num_diff'] = overall_sales_performance['order_num'].pct_change().fillna(0)
overall_sales_performance['sum_amount_diff'] = overall_sales_performance['sum_amount'].pct_change().fillna(0)
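Month-over-month change is (current month - previous month) / previous month. A minimal sketch with made-up numbers, using `Series.pct_change` on a frame sorted by month; the column names follow the post, the values are illustrative only.

```python
import pandas as pd

# Made-up monthly totals for illustration
monthly = pd.DataFrame({
    "create_year_month": ["2019-09", "2019-10", "2019-11"],
    "order_num": [100, 200, 250],
})

# Sort ascending so each row is compared with the month before it
monthly = monthly.sort_values("create_year_month").reset_index(drop=True)

# pct_change gives (current - previous) / previous; the first row has no
# previous month and is filled with 0
monthly["order_num_diff"] = monthly["order_num"].pct_change().fillna(0)
print(monthly)
```

With this layout the November row carries the October-to-November change, so the chart can read it directly from the row for that month.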
Sort and tidy the result
overall_sales_performance.sort_values('create_year_month',inplace=True)
overall_sales_performance = overall_sales_performance.reset_index(drop=True)
overall_sales_performance.head()
Then either write the data to the corresponding database table or export it to Excel so that Power BI can chart it. This post writes it to the database.
engine=create_engine('mysql+pymysql://root:***@127.0.0.1:3306/adventure')
overall_sales_performance.to_sql('pt_overall_sale_performance',con=engine,index=False,if_exists='append')
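One thing worth checking when rerunning the script is the `if_exists` argument of `to_sql`. The sketch below uses an in-memory SQLite connection as a stand-in for the MySQL engine (an assumption made only so the example runs without a server) and a hypothetical table name `demo_table`. It shows that `'append'` adds the rows again on every run, while `'replace'` rebuilds the table.

```python
import sqlite3
import pandas as pd

# In-memory SQLite stands in for the MySQL engine used in the post
conn = sqlite3.connect(":memory:")
df = pd.DataFrame({"create_year_month": ["2019-10", "2019-11"], "order_num": [200, 250]})

def row_count(connection):
    """Return the number of rows currently in demo_table."""
    result = pd.read_sql_query("select count(*) as n from demo_table", connection)
    return int(result["n"].iloc[0])

# Running the same export twice with 'append' duplicates the rows
df.to_sql("demo_table", con=conn, index=False, if_exists="append")
df.to_sql("demo_table", con=conn, index=False, if_exists="append")
rows_after_append = row_count(conn)

# 'replace' drops and recreates the table, so only one copy remains
df.to_sql("demo_table", con=conn, index=False, if_exists="replace")
rows_after_replace = row_count(conn)

print(rows_after_append, rows_after_replace)
```

If the Power BI totals look doubled after a rerun, duplicated rows from `'append'` are a likely cause.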
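Then connect Power BI to the database to fetch the data and build the visuals
The charts show that November 2019 had the highest units sold and sales amount, and that November units sold grew 7.11% over the previous month
2. Bicycle sales by region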
gather_customer_order.head()
Keep only the October and November data so the growth rate of each region can be compared
# Filter October and November bicycle data
gather_customer_order_10_11=gather_customer_order[(gather_customer_order['create_year_month']=='2019-10')|(gather_customer_order['create_year_month']=='2019-11')]
Sum order quantity and sales amount by region and month
# Group by region and month, then sum order quantity and sales amount
gather_customer_order_10_11_group = gather_customer_order_10_11.groupby(['chinese_territory','create_year_month']).agg({'order_num':'sum','sum_amount':'sum'}).reset_index()
gather_customer_order_10_11_group.head()
Compute the MoM change for each region
# October-to-November MoM change per region
# groupby(...).pct_change() computes the change within each region; rows are already ordered by region, then month
gather_customer_order_10_11_group['order_diff'] = gather_customer_order_10_11_group.groupby('chinese_territory')['order_num'].pct_change()
gather_customer_order_10_11_group['amount_diff'] = gather_customer_order_10_11_group.groupby('chinese_territory')['sum_amount'].pct_change()
# October rows have no previous month, so fill them with 0
gather_customer_order_10_11_group=gather_customer_order_10_11_group.fillna(0)
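The per-region change can be computed without an explicit loop. A minimal sketch on made-up data, assuming one row per region and month; the column names follow the post, the region labels and numbers are illustrative.

```python
import pandas as pd

# Made-up region totals for October and November
region_month = pd.DataFrame({
    "chinese_territory": ["East", "East", "South", "South"],
    "create_year_month": ["2019-10", "2019-11", "2019-10", "2019-11"],
    "order_num": [100, 110, 50, 60],
})

# Order rows by region, then month, so each row follows its previous month
region_month = region_month.sort_values(["chinese_territory", "create_year_month"])

# pct_change runs inside each region; the first month of a region is NaN
# and is filled with 0
region_month["order_diff"] = (
    region_month.groupby("chinese_territory")["order_num"].pct_change().fillna(0)
)
print(region_month)
```

Grouping before `pct_change` keeps the calculation from comparing the first month of one region with the last month of another.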
Write the data to the database for charting:
engine=create_engine('mysql+pymysql://root:***@127.0.0.1:3306/adventure')
gather_customer_order_10_11_group.to_sql('pt_bicy_november_territory',con=engine,index=False,if_exists='append')
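The chart shows:
1. East China (华东) is far ahead of the other regions in sales volume, with 1,854 bikes sold
2. South China (华南) has the fastest order growth, with order quantity up 14.51%
2.2 MoM change for the top 10 cities by bicycle sales volume, November 2019
Filter the November bicycle data and find the top 10 cities by sales volume
# Filter November bicycle transactions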
gather_customer_order_city_11 = gather_customer_order[gather_customer_order['create_year_month']=='2019-11']
gather_customer_order_city_11 = gather_customer_order_city_11.groupby('chinese_city').agg({'order_num':'sum'}).reset_index()
gather_customer_order_city_head=gather_customer_order_city_11.sort_values('order_num',ascending=False).head(10)
gather_customer_order_city_head
Filter the October and November bicycle sales for these cities, then compute the MoM change in sales volume
# Filter October and November bicycle sales
gather_customer_order_10_11_head = gather_customer_order[gather_customer_order['create_year_month'].isin(['2019-10','2019-11'])]
# Keep only the top 10 cities
gather_customer_order_10_11_head = gather_customer_order_10_11_head[gather_customer_order_10_11_head['chinese_city'].isin(gather_customer_order_city_head['chinese_city'])]
# Sum order quantity and sales amount by city and month
gather_customer_order_city_10_11 = gather_customer_order_10_11_head.groupby(['chinese_city','create_year_month']).agg({'order_num':'sum','sum_amount':'sum'}).reset_index()
# MoM change within each city; October rows have no previous month and are filled with 0
gather_customer_order_city_10_11['order_diff'] = gather_customer_order_city_10_11.groupby('chinese_city')['order_num'].pct_change()
gather_customer_order_city_10_11['amount_diff'] = gather_customer_order_city_10_11.groupby('chinese_city')['sum_amount'].pct_change()
gather_customer_order_city_10_11=gather_customer_order_city_10_11.fillna(0)
Write to the database
engine=create_engine('mysql+pymysql://root:***@127.0.0.1:3306/adventure')
gather_customer_order_city_10_11.to_sql('pt_bicy_november_october_city_3',con=engine,index=False,if_exists='append')
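Beijing and Shanghai are far ahead of the other cities in bicycle sales volume, and Beijing and Zhengzhou are growing comparatively fast.
3. Bicycle product sales performance, November 2019
Compute total bicycle units sold for each month
# Total bicycle units sold per month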
gather_customer_order_group_month = gather_customer_order.groupby('create_year_month').agg({'order_num':'sum'}).reset_index()
order_num_proportion = pd.merge(gather_customer_order,gather_customer_order_group_month,on='create_year_month')
# Write to the database
engine=create_engine('mysql+pymysql://root:***@127.0.0.1:3306/adventure')
order_num_proportion.to_sql('pt_bicycle_product_sales_month_4',con=engine,index=False,if_exists='append')
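Road bikes consistently hold more than half of the market and sell best, while touring bikes consistently hold the smallest share.
3.2 Segment performance for road, mountain, and touring bikes
First list the bicycle subcategories
# List the bicycle product subcategories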
gather_customer_order['cpzl_zw'].drop_duplicates()
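Road bike segment sales
# Filter road bikes ('公路自行车')
gather_customer_order_road = gather_customer_order[gather_customer_order['cpzl_zw'] == '公路自行车']
# Units sold per month and product for road bikes
gather_customer_order_road_month = gather_customer_order_road.groupby(by = ['create_year_month','product_name']).agg({'order_num':'sum'}).reset_index()
# Monthly road bike total (used below but not defined in the original listing)
gather_customer_order_road_month_sum = gather_customer_order_road.groupby('create_year_month').agg({'order_num':'sum'}).reset_index()
gather_customer_order_road_month = pd.merge(gather_customer_order_road_month,gather_customer_order_road_month_sum,on='create_year_month')
# Add the subcategory to the table
gather_customer_order_road_month['cpzl_zw'] = '公路自行车'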
Mountain bike segment sales
# Filter mountain bikes ('山地自行车')
gather_customer_order_mountain = gather_customer_order[gather_customer_order['cpzl_zw'] == '山地自行车']
# Units sold per month and product for mountain bikes
gather_customer_order_mountain_month = gather_customer_order_mountain.groupby(['product_name','create_year_month']).agg({'order_num':'sum'}).reset_index()
gather_customer_order_mountain_month_sum = gather_customer_order_mountain.groupby('create_year_month').agg({'order_num':'sum'}).reset_index()
gather_customer_order_mountain_month = pd.merge(gather_customer_order_mountain_month,gather_customer_order_mountain_month_sum,on='create_year_month')
gather_customer_order_mountain_month['cpzl_zw'] = '山地自行车'
Touring bike segment sales
# Filter touring bikes ('旅游自行车')
gather_customer_order_tour = gather_customer_order[gather_customer_order['cpzl_zw'] == '旅游自行车']
# Units sold per month and product for touring bikes
gather_customer_order_tour_month = gather_customer_order_tour.groupby(['product_name','create_year_month']).agg({'order_num':'sum'}).reset_index()
gather_customer_order_tour_month_sum = gather_customer_order_tour.groupby('create_year_month').agg({'order_num':'sum'}).reset_index()
gather_customer_order_tour_month=pd.merge(gather_customer_order_tour_month,gather_customer_order_tour_month_sum,on='create_year_month')
gather_customer_order_tour_month['cpzl_zw']='旅游自行车'
Merge the three tables to get sales details for every bike subcategory
# Combine monthly sales for road, mountain, and touring bikes
gather_customer_order_month = pd.concat([gather_customer_order_road_month,gather_customer_order_mountain_month,gather_customer_order_tour_month],axis=0).reset_index(drop=True)
# Each product's share of its subcategory's monthly units sold
gather_customer_order_month['order_num_proportion'] = gather_customer_order_month['order_num_x']/gather_customer_order_month['order_num_y']
# order_month_product: units sold for the product in the month
# sum_order_month: monthly units sold for the product's subcategory
gather_customer_order_month.rename(columns={'order_num_x':'order_month_product','order_num_y':'sum_order_month'},inplace=True)
# Write to the database
engine=create_engine('mysql+pymysql://root:***@127.0.0.1:3306/adventure')
gather_customer_order_month.to_sql('pt_bicycle_product_sales_order_month_4',con=engine,index=False,if_exists='replace')
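The filter, group, merge, and concat steps for the three subcategories can also be expressed in one pass. The sketch below is an alternative, not the post's method: it uses `groupby(...).transform('sum')` to attach each subcategory's monthly total to every product row. The data and the English subcategory labels are made up; the column names follow the post.

```python
import pandas as pd

# Made-up units sold per product for one month
product_month = pd.DataFrame({
    "create_year_month": ["2019-11"] * 4,
    "cpzl_zw": ["road", "road", "mountain", "mountain"],
    "product_name": ["R1", "R2", "M1", "M2"],
    "order_month_product": [30, 70, 20, 60],
})

# transform('sum') returns one total per row, aligned with the original index,
# so no merge is needed
product_month["sum_order_month"] = (
    product_month.groupby(["create_year_month", "cpzl_zw"])["order_month_product"]
    .transform("sum")
)

# Share of each product within its subcategory for the month
product_month["order_num_proportion"] = (
    product_month["order_month_product"] / product_month["sum_order_month"]
)
print(product_month)
```

Because the subcategory is a grouping key rather than a hard-coded filter, a new subcategory in the data would be handled without extra code.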
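Compute cumulative product sales from January to November 2019
# Cumulative units sold per product, January to November 2019
gather_customer_order_month_1_11 = gather_customer_order_month[gather_customer_order_month['create_year_month']<='2019-11']
gather_customer_order_month_1_11_sum = gather_customer_order_month_1_11.groupby(by = 'product_name').order_month_product.sum().reset_index()
# The original listing never defines gather_customer_order_month_10_11 or gather_customer_order_month_11.
# The next three statements are an assumed reconstruction, inferred from how the two tables are used later.
gather_customer_order_month_10_11 = gather_customer_order_month[gather_customer_order_month['create_year_month'].isin(['2019-10','2019-11'])].sort_values(['product_name','create_year_month'])
gather_customer_order_month_10_11['order_num_diff'] = gather_customer_order_month_10_11.groupby('product_name')['order_month_product'].pct_change().fillna(0)
gather_customer_order_month_11 = gather_customer_order_month_10_11[gather_customer_order_month_10_11['create_year_month']=='2019-11']
# November 2019 product units sold, MoM change, and cumulative units sold
# Merge the two tables on product_name
gather_customer_order_month_11 = pd.merge(gather_customer_order_month_11,gather_customer_order_month_1_11_sum,on='product_name')
# Write to the database
engine=create_engine('mysql+pymysql://root:***@127.0.0.1:3306/adventure')
gather_customer_order_month_11.to_sql('pt_bicycle_product_sales_order_month_11',con=engine,index=False,if_exists='replace')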
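Power BI chart analysis
- Among road bikes in November, every model except Road-350-W Yellow rose month over month
- Road-250 Red grew fastest, up 14.19% from October
- Road-150 Red had the highest sales share, about 19.57%
- Among mountain bikes in November, every model except Mountain-200 Black rose month over month
- Mountain-500 Silver grew fastest, at 19.51%
- Mountain-200 Silver had the largest sales share
- Among touring bikes in November, every model except Touring-2000 Blue and Touring-3000 Blue rose month over month
- Touring-1000 Yellow grew fastest versus October, at 27.18%
- Touring-1000 Blue had the largest sales share, at 32.52%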
4. Hot product sales analysis, November 2019
Find the top 10 products by sales volume
# Filter November data
gather_customer_order_11 = gather_customer_order[gather_customer_order['create_year_month'] == '2019-11']
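# Units sold per product (sum of order_num, consistent with the earlier sections)
# Sort by units sold descending and take the top 10 products
customer_order_11_top10 = gather_customer_order_11.groupby(by = 'product_name').order_num.sum().reset_index().\
sort_values(by = 'order_num',ascending = False).head(10)
# Names of the top 10 products by sales volume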
list(customer_order_11_top10['product_name'])
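Compute units sold and MoM change for the top 10
The November MoM change was already computed above, so it can be reused directly
customer_order_month_10_11 = gather_customer_order_month_10_11[['create_year_month','product_name','order_month_product','cpzl_zw','order_num_diff']]
customer_order_month_10_11 = customer_order_month_10_11[customer_order_month_10_11['product_name'].\
isin(list(customer_order_11_top10['product_name']))].copy()
# Add a label to tell the two lists apart
customer_order_month_10_11['category'] = '11月TOP10销量'
4.2 Top 10 products by growth in November: units sold and MoM change
The November MoM change was already computed above, so sorting it in descending order gives the top 10 products by growth
# Top 10 products by growth: units sold and MoM change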
customer_order_month_11 = gather_customer_order_month_10_11.loc[gather_customer_order_month_10_11['create_year_month'] == '2019-11'].\
sort_values(by = 'order_num_diff',ascending = False).head(10)
customer_order_month_11_top10_seep = gather_customer_order_month_10_11.loc[gather_customer_order_month_10_11['product_name'].\
isin(list(customer_order_month_11['product_name']))]
customer_order_month_11_top10_seep = customer_order_month_11_top10_seep[['create_year_month','product_name','order_month_product','cpzl_zw','order_num_diff']]
customer_order_month_11_top10_seep['category'] = '11月TOP10增速'
# Combine the top 10 by sales volume and the top 10 by growth
hot_products_11 = pd.concat([customer_order_month_10_11,customer_order_month_11_top10_seep],axis = 0)
Write to the database
engine=create_engine('mysql+pymysql://root:***@127.0.0.1:3306/adventure')
hot_products_11.to_sql('pt_hot_products_november',con=engine,index=False,if_exists='replace')
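Power BI chart analysis
In November, Mountain-200 Silver sold the most units (395), and Touring-1000 Yellow grew fastest, up 27.18% from October
5. User behavior analysis
Load the data
# Read the customer table from the database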
engine = create_engine('mysql+pymysql://******:******@***:3306/***?charset=gbk')
datafrog=engine
df_CUSTOMER = pd.read_sql_query("select customer_key,birth_date,gender,marital_status from ods_customer where create_date < '2019-12-1'",con = datafrog)
# Read the sales order table from the database
engine = create_engine('mysql+pymysql://******:******@***:3306/***?charset=gbk')
datafrog=engine
df_sales_orders_11 = pd.read_sql_query("select * from ods_sales_orders where create_date>='2019-11-1' and create_date<'2019-12-1'",con = datafrog)
Merge the two tables for easier analysis
# Inner join on customer_key
sales_customer_order_11=pd.merge(df_sales_orders_11,df_CUSTOMER,on='customer_key',how='inner')
sales_customer_order_11 = sales_customer_order_11[sales_customer_order_11['cplb_zw']=='自行车']
5.1 User age analysis
Compute each user's age, bucket customers into age groups, get each group's share, and analyze each group's preferences
# Split birth_date to get the birth year
sales_customer_order_11['birth_year'] = sales_customer_order_11.birth_date.str.split('-',expand=True)[0]
sales_customer_order_11['birth_year'] = sales_customer_order_11.birth_year.fillna(0).astype(int)
# Compute age; note this uses the current year, so the ages shift if the script is rerun in a later year
import datetime
sales_customer_order_11['customer_age'] = datetime.datetime.now().year-sales_customer_order_11['birth_year']
# Bucket ages into 5-year groups: (29,34], (34,39], ..., (59,64]
age_bins = np.arange(29,69,5)
sales_customer_order_11['age_level'] = pd.cut(sales_customer_order_11['customer_age'],bins = age_bins,labels=['30-34', '35-39', '40-44', '45-49', '50-54', '55-59', '60-64'])
# Count rows per age group
age_level_rate = sales_customer_order_11.groupby('age_level',observed=False).size().reset_index(name='age_level_count')
df_customer_order_bycle=pd.merge(sales_customer_order_11,age_level_rate,on='age_level')
# Share of each age group
df_customer_order_bycle['age_level_rate'] = df_customer_order_bycle.age_level_count/df_customer_order_bycle.age_level.count()
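A minimal sketch of how `pd.cut` assigns the age groups, using made-up ages. With edges 29, 34, ..., 64 and the default right-closed intervals, age 34 falls in '30-34' and age 35 in '35-39'; ages outside 30-64 become NaN and drop out of the shares.

```python
import numpy as np
import pandas as pd

# Made-up ages, including boundary values and one out-of-range age
ages = pd.Series([30, 34, 35, 47, 64, 70])

# Edges 29, 34, 39, ..., 64 give seven right-closed bins: (29, 34], (34, 39], ...
age_bins = np.arange(29, 69, 5)
labels = ["30-34", "35-39", "40-44", "45-49", "50-54", "55-59", "60-64"]

age_level = pd.cut(ages, bins=age_bins, labels=labels)

# Share of each age group among the ages that fall inside the bins
age_share = age_level.value_counts(normalize=True)
print(age_level.tolist())
print(age_share)
```

Any customer older than 64 or younger than 30 is silently excluded this way, so it is worth checking how many rows end up as NaN before reading the percentages.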
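5.2 User gender
Tidy the data to get the number and share of male and female customers
gender_count = df_customer_order_bycle.groupby(by = 'gender').cplb_zw.count().reset_index()
df_customer_order_bycle = pd.merge(df_customer_order_bycle,gender_count,on = 'gender').rename(columns = {'cplb_zw_y':'gender_count'})
# Per-row weights: summing gender_rate over all rows of one gender gives 1
df_customer_order_bycle['gender_rate'] = 1/df_customer_order_bycle['gender_count']
df_customer_order_bycle['age_level2_rate2'] = 1/df_customer_order_bycle['age_level_count']
# 'age_level2_count' is never created in this listing, so ignore it if absent instead of raising KeyError
df_customer_order_bycle=df_customer_order_bycle.drop(columns='age_level2_count',errors='ignore')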
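The counts above are taken over order rows, so a customer with several orders is counted several times. The sketch below, on made-up data, contrasts the share by order rows with the share by distinct customers; whether the report should use one or the other is a judgment call, and the column names are the only part taken from the post.

```python
import pandas as pd

# Made-up order rows: customer 1 placed two orders
orders = pd.DataFrame({
    "customer_key": [1, 1, 2, 3],
    "gender": ["M", "M", "F", "M"],
})

# Share of order rows by gender
row_share = orders["gender"].value_counts(normalize=True)

# Share of distinct customers by gender: keep one row per customer first
buyer_share = (
    orders.drop_duplicates("customer_key")["gender"].value_counts(normalize=True)
)
print(row_share)
print(buyer_share)
```

If the statement being made is about the number of buyers, the deduplicated version is the one that matches it.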
Write to the database
engine=create_engine('mysql+pymysql://root:***@127.0.0.1:3306/adventure')
df_customer_order_bycle.to_sql('pt_user_behavior_november',con=engine,index=False,if_exists='replace')
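Power BI chart analysis
- By age group, customers aged 35-39 make up the largest share of buyers, at 29.96%; the share then declines as age increases
- Road bikes are the best sellers in every age group
- Male buyers outnumber female buyers by 10%
- Looking at gender against market segment, both men and women buy road bikes most and touring bikes least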