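Adventure Works: Online Bicycle Business Analysis Report, November 2019

This is a monthly report for Adventure Works. It documents the project from requirements analysis through implementation: the data is queried and computed with Python, then loaded into Power BI for visualization. Some of my own observations are included along the way, and the final results are presented at the end.
Power BI deliverable: November 2019 online bicycle business analysis report (Power BI)
PPT deliverable: November 2019 online bicycle business analysis report (PPT)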

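Background:

Adventure Works Cycles is the fictional company behind the Adventure Works sample database. In December 2019 the business team lead needs to report November 2019 bicycle sales to management, providing data to support fine-grained operations and to pinpoint target customer groups.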

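Analysis goals

1. The company's main concern is how to set its sales strategy and adjust its product mix so as to sustain fast growth, increase revenue, and win more market share.
2. By continuously monitoring and analyzing company-wide bicycle sales, the report tracks the state of sales and how the trend is changing, giving the client a basis for setting, adjusting, and reviewing sales strategy and for improving the product mix.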

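Analysis process

The analysis covers five angles:
1. Overall: bicycle sales performance from January to November 2019
2. Region: November sales volume by region, and the TOP10 cities by November sales volume
3. Product: November sales volume by product category and by individual product
4. Best sellers: the November TOP10 products by sales volume and by sales growth
5. Users: November age distribution and product preferences by age band, and November male/female ratio and product preferences by gender.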

First import the libraries that may be needed:

# Import modules
import pandas as pd
import numpy as np
import pymysql
pymysql.install_as_MySQLdb()
from sqlalchemy import create_engine

1. Overall bicycle sales performance

Connect to the database and load the dataset:

# Connection details are masked in the original
engine = create_engine('mysql+pymysql://****@***/***')
sql_cmd = 'select * from dw_customer_order'
gather_customer_order = pd.read_sql_query(sql=sql_cmd, con=engine)

Then inspect the data:

gather_customer_order.head()

Add a create_year_month column for the month-level analysis.

# Add the create_year_month column, used when analyzing by month
gather_customer_order['create_year_month'] = gather_customer_order.create_date.apply(lambda x: x.strftime('%Y-%m'))
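To make the month key concrete, here is a minimal sketch on made-up dates (not rows from dw_customer_order). It assumes create_date is a datetime column and uses the vectorized .dt.strftime accessor, which produces the same 'YYYY-MM' strings as the row-by-row apply.

```python
import pandas as pd

# Made-up order dates, for illustration only
orders = pd.DataFrame({
    "create_date": pd.to_datetime(["2019-10-31", "2019-11-01", "2019-11-30"]),
})

# Vectorized equivalent of apply(lambda x: x.strftime('%Y-%m'))
orders["create_year_month"] = orders["create_date"].dt.strftime("%Y-%m")
print(orders)
```

Because the key is a zero-padded string, sorting it alphabetically also sorts it chronologically, which the later groupby and sort steps rely on.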

The analysis is about bicycle sales, so keep only the bicycle rows.

# Keep rows whose product category (cplb_zw) is bicycles ('自行车')
gather_customer_order = gather_customer_order[gather_customer_order['cplb_zw'] == '自行车']
gather_customer_order

1.2 Overall bicycle sales volume

Aggregate 2019 sales volume and sales amount by month.

# Monthly sales volume and sales amount
overall_sales_performance = gather_customer_order.groupby('create_year_month').agg({'order_num':'sum','sum_amount':'sum'}).sort_values('create_year_month').reset_index()
overall_sales_performance.head()

Compute the month-over-month (MoM) change in sales volume and in sales amount for each month.

# MoM change = (this month - previous month) / previous month.
# Rows are in ascending month order, so pct_change compares each row with the row before it;
# the first month has no previous month and is set to 0
overall_sales_performance['order_num_diff'] = overall_sales_performance['order_num'].pct_change().fillna(0)
overall_sales_performance['sum_amount_diff'] = overall_sales_performance['sum_amount'].pct_change().fillna(0)
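As a quick check of what month-over-month change means here, the following is a minimal sketch on made-up monthly totals (not the report's figures). It uses pandas' pct_change, which computes (current - previous) / previous and therefore needs the rows in ascending month order.

```python
import pandas as pd

# Made-up monthly totals, for illustration only
monthly = pd.DataFrame({
    "create_year_month": ["2019-09", "2019-10", "2019-11"],
    "order_num": [100, 120, 150],
})

# Ascending month order, so each row is compared with the previous month
monthly = monthly.sort_values("create_year_month").reset_index(drop=True)

# The first month has no predecessor; fill it with 0
monthly["order_num_diff"] = monthly["order_num"].pct_change().fillna(0)
print(monthly)
```

Here October is 20% above September and November is 25% above October. If the table were sorted with the latest month first, the same call would compare each month with the following one, so the sort order is worth checking before trusting the column.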

Sort the result and tidy the index.

overall_sales_performance.sort_values('create_year_month', inplace=True)
overall_sales_performance = overall_sales_performance.reset_index(drop=True)
overall_sales_performance.head()

Next, write the data to the corresponding database table, or export it to Excel, so that Power BI can chart it. This article writes it to the database.

engine = create_engine('mysql+pymysql://root:***@127.0.0.1:3306/adventure')
overall_sales_performance.to_sql('pt_overall_sale_performance', con=engine, index=False, if_exists='append')
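One thing to watch in the to_sql call is if_exists: 'append' adds rows every time the script is rerun, while 'replace' drops and recreates the table. The sketch below shows the difference using an in-memory SQLite connection in place of the MySQL engine; the table name demo_table and the numbers are made up for illustration.

```python
import sqlite3

import pandas as pd

# In-memory SQLite stands in for the MySQL database (assumption for this demo)
conn = sqlite3.connect(":memory:")
df = pd.DataFrame({
    "create_year_month": ["2019-10", "2019-11"],
    "order_num": [120, 150],
})

# Running the export twice with 'append' duplicates the rows
df.to_sql("demo_table", con=conn, index=False, if_exists="append")
df.to_sql("demo_table", con=conn, index=False, if_exists="append")
rows_after_append = int(pd.read_sql_query("select count(*) as n from demo_table", conn)["n"].iloc[0])

# 'replace' drops the table first, so only the latest export remains
df.to_sql("demo_table", con=conn, index=False, if_exists="replace")
rows_after_replace = int(pd.read_sql_query("select count(*) as n from demo_table", conn)["n"].iloc[0])

conn.close()
print(rows_after_append, rows_after_replace)
```

For a monthly report table that is rebuilt from scratch on each run, 'replace' is usually the safer choice; 'append' suits tables that accumulate new periods.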

Then connect Power BI to the database, load the data, and build the visuals.


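[Figure: overall bicycle sales volume]

From the chart, November 2019 had both the highest sales volume and the highest sales amount of the period, and November's sales volume grew 7.11% over the previous month.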

2. Bicycle sales by region

gather_customer_order.head()

Filter the October and November data so the growth rate of each region can be compared.

# Keep October and November bicycle data
gather_customer_order_10_11 = gather_customer_order[(gather_customer_order['create_year_month']=='2019-10')|(gather_customer_order['create_year_month']=='2019-11')]

Sum order quantity and sales amount by region and month.

# Sum order quantity and sales amount by region and month
gather_customer_order_10_11_group = gather_customer_order_10_11.groupby(['chinese_territory','create_year_month']).agg({'order_num':'sum','sum_amount':'sum'}).reset_index()
gather_customer_order_10_11_group.head()

Compute the MoM change for each region.

# October-to-November MoM change per region.
# The groupby above leaves rows ordered by region, then month, so within each region
# pct_change compares November with October; the October rows get NaN and are filled with 0
gather_customer_order_10_11_group['order_diff'] = gather_customer_order_10_11_group.groupby('chinese_territory')['order_num'].pct_change()
gather_customer_order_10_11_group['amount_diff'] = gather_customer_order_10_11_group.groupby('chinese_territory')['sum_amount'].pct_change()
gather_customer_order_10_11_group = gather_customer_order_10_11_group.fillna(0)
gather_customer_order_10_11_group
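The per-region calculation can also be written without an explicit loop. The following is a minimal sketch on made-up numbers, with placeholder region names A and B (hypothetical, not regions from the data); it relies on groupby(...).pct_change(), which computes the change within each group separately.

```python
import pandas as pd

# Made-up regional totals; region names A and B are placeholders
regional = pd.DataFrame({
    "chinese_territory": ["A", "A", "B", "B"],
    "create_year_month": ["2019-10", "2019-11", "2019-10", "2019-11"],
    "order_num": [100, 110, 50, 40],
})

# Order by region, then month, so November follows October inside each region
regional = regional.sort_values(["chinese_territory", "create_year_month"]).reset_index(drop=True)

# pct_change restarts at each region; October rows have no previous month
regional["order_diff"] = (
    regional.groupby("chinese_territory")["order_num"].pct_change().fillna(0)
)
print(regional)
```

Region A grows by 10% and region B shrinks by 20%; the October rows carry 0 only as a filler, not as a real growth rate.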

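Write the data to the database for charting:

engine = create_engine('mysql+pymysql://root:***@127.0.0.1:3306/adventure')
gather_customer_order_10_11_group.to_sql('pt_bicy_november_territory', con=engine, index=False, if_exists='append')

[Figure: bicycle sales by region]

From the chart:
1. East China is far ahead of every other region in sales volume, with 1,854 bicycles sold
2. South China has the fastest order growth, with order volume up 14.51%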

2.2 TOP10 cities by bicycle sales volume in November 2019, with MoM change

Filter the November bicycle data and find the TOP10 cities by sales volume.

# Keep November bicycle transactions
gather_customer_order_city_11 = gather_customer_order[gather_customer_order['create_year_month']=='2019-11']
gather_customer_order_city_11 = gather_customer_order_city_11.groupby('chinese_city').agg({'order_num':'sum'}).reset_index()
gather_customer_order_city_head = gather_customer_order_city_11.sort_values('order_num', ascending=False).head(10)
gather_customer_order_city_head

Filter the October and November bicycle sales for these cities, then compute the MoM change.

# October and November bicycle sales
gather_customer_order_10_11_head = gather_customer_order[gather_customer_order['create_year_month'].isin(['2019-10','2019-11'])]
# Keep only the TOP10 cities
gather_customer_order_10_11_head = gather_customer_order_10_11_head[gather_customer_order_10_11_head['chinese_city'].isin(gather_customer_order_city_head['chinese_city'])]
# Sales volume and sales amount per city and month
gather_customer_order_city_10_11 = gather_customer_order_10_11_head.groupby(['chinese_city','create_year_month']).agg({'order_num':'sum','sum_amount':'sum'}).reset_index()
# MoM change per city (rows are ordered by city, then month; October rows get NaN, filled with 0)
gather_customer_order_city_10_11['order_diff'] = gather_customer_order_city_10_11.groupby('chinese_city')['order_num'].pct_change()
gather_customer_order_city_10_11['amount_diff'] = gather_customer_order_city_10_11.groupby('chinese_city')['sum_amount'].pct_change()
gather_customer_order_city_10_11 = gather_customer_order_city_10_11.fillna(0)
gather_customer_order_city_10_11
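The "pick the top cities in November, then pull both months for just those cities" pattern can be sketched compactly with nlargest and isin. This is an illustration on made-up data with placeholder city names X, Y, Z (hypothetical) and a top 2 instead of a top 10.

```python
import pandas as pd

# Made-up city sales; X, Y, Z are placeholder city names
sales = pd.DataFrame({
    "chinese_city": ["X", "Y", "Z", "X", "Y", "Z"],
    "create_year_month": ["2019-10"] * 3 + ["2019-11"] * 3,
    "order_num": [5, 9, 1, 8, 7, 2],
})

# Rank cities by November volume only
nov = sales[sales["create_year_month"] == "2019-11"]
top_cities = nov.groupby("chinese_city")["order_num"].sum().nlargest(2).index

# Keep both months, but only for the top cities
top_sales = sales[sales["chinese_city"].isin(top_cities)]
print(top_sales)
```

Ranking on November alone matters: city Y leads in October but is second in November, and the report's list is defined by November volume.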
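Write to the database:

engine = create_engine('mysql+pymysql://root:***@127.0.0.1:3306/adventure')
gather_customer_order_city_10_11.to_sql('pt_bicy_november_october_city_3', con=engine, index=False, if_exists='append')

[Figure: November bicycle sales in the TOP10 cities]

Bicycle sales in Beijing and Shanghai are far ahead of the other cities, and Beijing and Zhengzhou are growing comparatively fast.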

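3. Bicycle product sales performance, November 2019

Compute bicycle sales volume for each month.

# Total bicycle sales volume per month
gather_customer_order_group_month = gather_customer_order.groupby('create_year_month').agg({'order_num':'sum'}).reset_index()
# Attach the monthly total to every order row; both tables have an order_num column,
# so the merge yields order_num_x (row value) and order_num_y (monthly total)
order_num_proportion = pd.merge(gather_customer_order, gather_customer_order_group_month, on='create_year_month')
# Write to the database
engine = create_engine('mysql+pymysql://root:***@127.0.0.1:3306/adventure')
order_num_proportion.to_sql('pt_bicycle_product_sales_month_4', con=engine, index=False, if_exists='append')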
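The order_num_x and order_num_y columns used later come from this kind of merge: when both tables have a non-key column with the same name, pandas appends its default suffixes _x (left table) and _y (right table). A minimal sketch on made-up rows, with hypothetical product names P1 and P2:

```python
import pandas as pd

# Made-up product rows; P1 and P2 are placeholder product names
detail = pd.DataFrame({
    "create_year_month": ["2019-11", "2019-11"],
    "product_name": ["P1", "P2"],
    "order_num": [3, 7],
})

# Monthly totals, with the same column name order_num
totals = detail.groupby("create_year_month").agg({"order_num": "sum"}).reset_index()

# order_num collides, so the result has order_num_x and order_num_y
merged = pd.merge(detail, totals, on="create_year_month")
print(merged)
```

Passing suffixes=('_product', '_month') to pd.merge is one way to get self-describing names instead of renaming afterwards.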
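[Figure: segment performance]

Road bikes consistently hold more than half of the market and sell best, while touring bikes consistently hold the smallest share.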

3.2 Segment performance: road, mountain, and touring bikes

First list the bicycle subcategories.

# List the bicycle product subcategories
gather_customer_order['cpzl_zw'].drop_duplicates()

Road bike segment sales
# Keep road bikes ('公路自行车')
gather_customer_order_road = gather_customer_order[gather_customer_order['cpzl_zw'] == '公路自行车']
# Monthly sales volume of each road bike product
gather_customer_order_road_month = gather_customer_order_road.groupby(by=['create_year_month','product_name']).agg({'order_num':'sum'}).reset_index()
# Monthly road bike total (this line is missing in the original; written by analogy with the mountain and touring blocks)
gather_customer_order_road_month_sum = gather_customer_order_road.groupby('create_year_month').agg({'order_num':'sum'}).reset_index()
gather_customer_order_road_month = pd.merge(gather_customer_order_road_month, gather_customer_order_road_month_sum, on='create_year_month')
# Add the subcategory to the table
gather_customer_order_road_month['cpzl_zw'] = '公路自行车'
gather_customer_order_road_month

Mountain bike segment sales
# Keep mountain bikes ('山地自行车')
gather_customer_order_mountain = gather_customer_order[gather_customer_order['cpzl_zw'] == '山地自行车']
# Monthly sales volume of each mountain bike product, and the monthly mountain bike total
gather_customer_order_mountain_month = gather_customer_order_mountain.groupby(['product_name','create_year_month']).agg({'order_num':'sum'}).reset_index()
gather_customer_order_mountain_month_sum = gather_customer_order_mountain.groupby('create_year_month').agg({'order_num':'sum'}).reset_index()
gather_customer_order_mountain_month = pd.merge(gather_customer_order_mountain_month, gather_customer_order_mountain_month_sum, on='create_year_month')
gather_customer_order_mountain_month['cpzl_zw'] = '山地自行车'
gather_customer_order_mountain_month

Touring bike segment sales
# Keep touring bikes ('旅游自行车')
gather_customer_order_tour = gather_customer_order[gather_customer_order['cpzl_zw'] == '旅游自行车']
# Monthly sales volume of each touring bike product, and the monthly touring bike total
gather_customer_order_tour_month = gather_customer_order_tour.groupby(['product_name','create_year_month']).agg({'order_num':'sum'}).reset_index()
gather_customer_order_tour_month_sum = gather_customer_order_tour.groupby('create_year_month').agg({'order_num':'sum'}).reset_index()
gather_customer_order_tour_month = pd.merge(gather_customer_order_tour_month, gather_customer_order_tour_month_sum, on='create_year_month')
gather_customer_order_tour_month['cpzl_zw'] = '旅游自行车'
gather_customer_order_tour_month

Combine the three tables to get sales details for every subcategory.

# Stack the monthly sales of road, mountain, and touring bikes
gather_customer_order_month = pd.concat([gather_customer_order_road_month,gather_customer_order_mountain_month,gather_customer_order_tour_month],axis=0).reset_index(drop=True)
# Each product's share of its subcategory's monthly sales volume
gather_customer_order_month['order_num_proportion'] = gather_customer_order_month['order_num_x']/gather_customer_order_month['order_num_y']
# order_month_product: the product's sales volume in the month
# sum_order_month: the monthly sales volume of the product's subcategory
gather_customer_order_month.rename(columns={'order_num_x':'order_month_product','order_num_y':'sum_order_month'},inplace=True)
# Write to the database
engine = create_engine('mysql+pymysql://root:***@127.0.0.1:3306/adventure')
gather_customer_order_month.to_sql('pt_bicycle_product_sales_order_month_4', con=engine, index=False, if_exists='replace')
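As an alternative to building a totals table per subcategory and merging it back, the same share can be computed in one step with groupby(...).transform('sum'), which returns the group total aligned to every row. This is a minimal sketch on made-up numbers with hypothetical product names P1 to P4, not the method used above.

```python
import pandas as pd

# Made-up monthly product sales; P1..P4 are placeholder product names
product_month = pd.DataFrame({
    "create_year_month": ["2019-11"] * 4,
    "cpzl_zw": ["公路自行车", "公路自行车", "山地自行车", "山地自行车"],
    "product_name": ["P1", "P2", "P3", "P4"],
    "order_num": [30, 70, 20, 20],
})

# Total per month and subcategory, repeated on each row of the group
group_total = product_month.groupby(["create_year_month", "cpzl_zw"])["order_num"].transform("sum")

# Share of each product within its subcategory for that month
product_month["order_num_proportion"] = product_month["order_num"] / group_total
print(product_month)
```

Within each month and subcategory the shares add up to 1, which is a convenient sanity check before charting.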
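Compute cumulative product sales for January to November 2019

# Cumulative sales per product through November 2019
gather_customer_order_month_1_11 = gather_customer_order_month[gather_customer_order_month['create_year_month'] <= '2019-11']
gather_customer_order_month_1_11_sum = gather_customer_order_month_1_11.groupby(by='product_name').order_month_product.sum().reset_index()
# The original omits the step that builds gather_customer_order_month_10_11 and
# gather_customer_order_month_11. The three statements below are a reconstruction, inferred from
# how those tables are used later (October/November rows with an order_num_diff column)
gather_customer_order_month_10_11 = gather_customer_order_month[gather_customer_order_month['create_year_month'].isin(['2019-10','2019-11'])].sort_values(['product_name','create_year_month']).copy()
gather_customer_order_month_10_11['order_num_diff'] = gather_customer_order_month_10_11.groupby('product_name')['order_month_product'].pct_change().fillna(0)
gather_customer_order_month_11 = gather_customer_order_month_10_11[gather_customer_order_month_10_11['create_year_month'] == '2019-11']
# November 2019 product sales, MoM change, and cumulative sales:
# merge the two tables on product_name
gather_customer_order_month_11 = pd.merge(gather_customer_order_month_11, gather_customer_order_month_1_11_sum, on='product_name')
# Write to the database
engine = create_engine('mysql+pymysql://root:***@127.0.0.1:3306/adventure')
gather_customer_order_month_11.to_sql('pt_bicycle_product_sales_order_month_11', con=engine, index=False, if_exists='replace')
gather_customer_order_month_11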

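Power BI chart analysis

[Figure: segment performance, road bikes]

Road bikes
  1. In November, every road bike model except Road-350-W Yellow was up month over month
  2. Road-250 Red grew fastest, up 14.19% from October
  3. Road-150 Red had the highest share of sales, about 19.57%

[Figure: segment performance, mountain bikes]

Mountain bikes
  1. In November, every mountain bike model except Mountain-200 Black was up month over month
  2. Mountain-500 Silver grew fastest, at 19.51%
  3. Mountain-200 Silver had the largest share of sales

[Figure: segment performance, touring bikes]

Touring bikes
  1. In November, every touring bike model except Touring-2000 Blue and Touring-3000 Blue was up month over month
  2. Touring-1000 Yellow grew fastest, up 27.18% from October
  3. Touring-1000 Blue had the largest share of sales, at 32.52%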

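4. Best-selling products, November 2019

Find the TOP10 products by sales volume

# Keep November data
gather_customer_order_11 = gather_customer_order[gather_customer_order['create_year_month'] == '2019-11']
# Count order lines per product (this equals units sold only if every row has order_num == 1),
# sort in descending order, and take the TOP10 products
customer_order_11_top10 = gather_customer_order_11.groupby(by = 'product_name').order_num.count().reset_index().\
                        sort_values(by = 'order_num',ascending = False).head(10)
# Names of the TOP10 products
list(customer_order_11_top10['product_name'])

Compute TOP10 sales volume and MoM change

The November MoM change was already computed earlier, so it can be reused directly.

customer_order_month_10_11 = gather_customer_order_month_10_11[['create_year_month','product_name','order_month_product','cpzl_zw','order_num_diff']]
# .copy() avoids a SettingWithCopyWarning when the label column is added below
customer_order_month_10_11 = customer_order_month_10_11[customer_order_month_10_11['product_name'].\
                                                        isin(list(customer_order_11_top10['product_name']))].copy()
# Add a label ("November TOP10 by sales volume") to tell the two lists apart
customer_order_month_10_11['category'] = '11月TOP10销量'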
4.2 November TOP10 products by growth: sales volume and MoM change

The November MoM change was already computed earlier, so sorting by it in descending order gives the TOP10 products by growth.

# TOP10 products by growth, with sales volume and MoM change
customer_order_month_11 = gather_customer_order_month_10_11.loc[gather_customer_order_month_10_11['create_year_month'] == '2019-11'].\
                            sort_values(by = 'order_num_diff',ascending = False).head(10)
customer_order_month_11_top10_seep = gather_customer_order_month_10_11.loc[gather_customer_order_month_10_11['product_name'].\
                                                        isin(list(customer_order_month_11['product_name']))]
customer_order_month_11_top10_seep = customer_order_month_11_top10_seep[['create_year_month','product_name','order_month_product','cpzl_zw','order_num_diff']].copy()
# Label: "November TOP10 by growth"
customer_order_month_11_top10_seep['category'] = '11月TOP10增速'
# Stack the TOP10-by-volume and TOP10-by-growth tables
hot_products_11 = pd.concat([customer_order_month_10_11,customer_order_month_11_top10_seep],axis = 0)

Write to the database:

engine = create_engine('mysql+pymysql://root:***@127.0.0.1:3306/adventure')
hot_products_11.to_sql('pt_hot_products_november', con=engine, index=False, if_exists='replace')

Power BI chart analysis

[Figure: November best-selling products]

In November, Mountain-200 Silver sold the most, at 395 units, and Touring-1000 Yellow grew fastest, up 27.18% from October.

5. User behavior analysis

Load the data

# Read the customer table (connection details are masked in the original)
engine = create_engine('mysql+pymysql://******:******@***:3306/***?charset=gbk')
datafrog = engine
df_CUSTOMER = pd.read_sql_query("select customer_key,birth_date,gender,marital_status from ods_customer where create_date < '2019-12-1'", con=datafrog)
# Read the November 2019 sales orders over the same connection
df_sales_orders_11 = pd.read_sql_query("select * from ods_sales_orders where create_date>='2019-11-1' and create_date<'2019-12-1'", con=datafrog)

Merge the two tables for analysis

# Inner join on customer_key, then keep bicycle orders only
sales_customer_order_11 = pd.merge(df_sales_orders_11, df_CUSTOMER, on='customer_key', how='inner')
sales_customer_order_11 = sales_customer_order_11[sales_customer_order_11['cplb_zw']=='自行车']
sales_customer_order_11

5.1 User age analysis

Compute each customer's age, group customers into age bands, get the share of each band, and analyze product preferences by band.

# Take the birth year from birth_date (a 'YYYY-MM-DD' string)
sales_customer_order_11['birth_year'] = sales_customer_order_11.birth_date.str.split('-',expand=True)[0]
sales_customer_order_11['birth_year'] = sales_customer_order_11.birth_year.fillna(0).astype(int)
# Compute age. Note: this uses the year in which the script runs, so the bands shift if it is rerun in a later year
import datetime
sales_customer_order_11['customer_age'] = datetime.datetime.now().year - sales_customer_order_11['birth_year']
# Group ages into 5-year bands: (29,34], (34,39], ..., (59,64]
age_bins = np.arange(29,69,5)
sales_customer_order_11['age_level'] = pd.cut(sales_customer_order_11['customer_age'], bins=age_bins, labels=['30-34', '35-39', '40-44', '45-49', '50-54', '55-59', '60-64'])
# Number of order rows in each age band
age_level_rate = sales_customer_order_11.groupby('age_level', observed=False).size().reset_index(name='age_level_count')
df_customer_order_bycle = pd.merge(sales_customer_order_11, age_level_rate, on='age_level')
# Share of each age band
df_customer_order_bycle['age_level_rate'] = df_customer_order_bycle.age_level_count/df_customer_order_bycle.age_level.count()
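The binning step is easy to get off by one, so here is a minimal sketch of pd.cut on made-up ages using the same edges and labels. By default pd.cut uses right-closed intervals, so the edges 29, 34, ..., 64 give the bands (29,34], (34,39], and so on; ages outside 30 to 64 become NaN.

```python
import numpy as np
import pandas as pd

# Made-up ages, including two that fall outside every band
ages = pd.Series([30, 34, 35, 47, 64, 29, 70])

edges = np.arange(29, 69, 5)  # 29, 34, 39, 44, 49, 54, 59, 64
labels = ["30-34", "35-39", "40-44", "45-49", "50-54", "55-59", "60-64"]

# Right-closed bins: 34 falls in '30-34', 35 in '35-39'
age_level = pd.cut(ages, bins=edges, labels=labels)
print(age_level)
```

Ages 29 and 70 get no band here, so customers younger than 30 or older than 64 drop out of any later groupby on the band. That is worth keeping in mind when reading the age shares.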

5.2 User gender

Arrange the data to get the number and share of male and female customers.

# Number of order rows per gender
gender_count = df_customer_order_bycle.groupby(by = 'gender').size().reset_index(name='gender_count')
df_customer_order_bycle = pd.merge(df_customer_order_bycle, gender_count, on = 'gender')
# Per-row weights: summing them within a group in Power BI gives that group's share
df_customer_order_bycle['gender_rate'] = 1/df_customer_order_bycle['gender_count']
df_customer_order_bycle['age_level2_rate2'] = 1/df_customer_order_bycle['age_level_count']
# The original drops an 'age_level2_count' column that is never created above; errors='ignore' keeps this line from raising
df_customer_order_bycle = df_customer_order_bycle.drop(columns='age_level2_count', errors='ignore')

Write to the database:

engine = create_engine('mysql+pymysql://root:***@127.0.0.1:3306/adventure')
df_customer_order_bycle.to_sql('pt_user_behavior_november', con=engine, index=False, if_exists='replace')
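Note that the counts above are per order row, not per person. Whether that matters depends on how many customers ordered more than once, which the source does not say. The sketch below, on made-up rows, shows how the two views can differ: value_counts(normalize=True) gives the share directly, and drop_duplicates on customer_key switches from orders to distinct customers.

```python
import pandas as pd

# Made-up orders; customer 1 placed two orders
orders = pd.DataFrame({
    "customer_key": [1, 1, 2, 3, 4],
    "gender": ["M", "M", "F", "M", "F"],
})

# Share of order rows by gender
order_share = orders["gender"].value_counts(normalize=True)

# Share of distinct customers by gender
customer_share = orders.drop_duplicates("customer_key")["gender"].value_counts(normalize=True)

print(order_share)
print(customer_share)
```

In this toy data men account for 60% of orders but only 50% of customers, so the choice of unit should match how the finding is worded.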
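Power BI chart analysis

User behavior: age
  1. By age band, customers aged 35-39 make up the largest share, at 29.96%; beyond that band the share declines steadily with age.
  2. Road bikes are the best seller in every age band.

User behavior: gender
  1. Male bicycle buyers outnumber female buyers by 10%
  2. Looking at gender against segment, both men and women buy road bikes most and touring bikes least.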

The analysis report follows.

[Report slides: 幻灯片1.PNG through 幻灯片22.PNG, 22 slides in total]
