November 2019 Bicycle Business Sales Analysis (Personal Notes)

November 2019 Bicycle Business Sales Analysis Report

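Analysis outline

  1. Overall bicycle sales performance
  2. Bicycle sales performance by region, November 2019
  3. Bicycle sales performance by product, November 2019
  4. User behavior analysis
  5. Best-selling products, November 2019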

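The computed results are stored in the database under the following table names:

  • Overall bicycle sales performance: pt_overall_performance_1
  • Bicycle sales performance by region, November 2019: pt_bicy_november_territory_2, pt_bicy_novenber_october_city_3
  • Bicycle sales performance by product, November 2019: pt_bicycle_product_sales_month_4, pt_bicycle_sales_order_month_4, pt_bicycle_product_sales_order_month_11
  • User behavior analysis: pt_user_behavior_november
  • Best-selling products, November 2019: pt_hot_products_november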

Import modules

import pandas as pd
import numpy as np
import pymysql
# pymysql.install_as_MySQLdb()  # Only needed for MySQLdb compatibility; with the 'mysql+pymysql://' prefix in create_engine(), the MySQLdb module is not required.
from sqlalchemy import create_engine
from datetime import datetime
import matplotlib.pyplot as plt
%matplotlib inline

1. Overall bicycle sales performance

engine=create_engine('mysql+pymysql://username:password@host:port/database')
sql='select * from dw_customer_order'
df=pd.read_sql_query(sql,engine)

gather_customer_order=df.copy()  # Work on a copy so the original df stays available for other variables later.

gather_customer_order.head()

Check the data types and look for null values

gather_customer_order.info()

The data is clean and contains no null values, so no data cleaning is needed.

Add a create_year_month column for month-level analysis

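gather_customer_order['create_year_month']=gather_customer_order['create_date'].apply(lambda x :x.strftime('%Y-%m'))  # Calls strftime() on each date object in the column. This is the date/datetime method, not the time.strftime() function that most online examples show; worth remembering.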
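As a minimal sketch of the month-key step, assuming create_date holds Python date objects (the sample dates below are made up), the element-wise strftime() call and the vectorized .dt.strftime() on a datetime64 column give the same 'YYYY-MM' strings:

```python
import datetime as dt
import pandas as pd

# Made-up sample; the real column comes from dw_customer_order.
orders = pd.DataFrame({"create_date": [dt.date(2019, 10, 31), dt.date(2019, 11, 1)]})

# Element-wise: each value is a datetime.date, so its own strftime() method is called.
by_apply = orders["create_date"].apply(lambda d: d.strftime("%Y-%m"))

# Vectorized alternative: convert to datetime64 first, then use the .dt accessor.
by_dt = pd.to_datetime(orders["create_date"]).dt.strftime("%Y-%m")

print(by_apply.tolist())
print(by_dt.tolist())
```

Either form works for grouping by month; the .dt version only applies once the column has a datetime64 dtype.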

Keep only orders whose product category is bicycle (自行车)

gather_customer_order=gather_customer_order[gather_customer_order['cplb_zw']=='自行车']

gather_customer_order.head()

1.1 Overall bicycle sales performance (order count and sales amount)

overall_sales_performance=gather_customer_order.groupby('create_year_month')[['order_num','sum_amount']].sum().reset_index().sort_values('create_year_month',ascending=False)

pd.set_option('display.float_format', lambda x: '%.3f' % x)  # Show floats with 3 decimal places instead of scientific notation

overall_sales_performance=overall_sales_performance[overall_sales_performance['create_year_month']<'2019-12']

overall_sales_performance
x=overall_sales_performance['create_year_month']
y1=overall_sales_performance['order_num']
y2=overall_sales_performance['sum_amount']

fig=plt.figure(figsize=(10,10))
ax1=fig.add_subplot(211)
ax1.plot(x,y1)
ax1.set_xlabel('Month')
ax1.set_ylabel('Order count')
ax1.set_title('Orders by month')

ax2=fig.add_subplot(212)
ax2.plot(x,y2)
ax2.set_xlabel('Month')
ax2.set_ylabel('Sales amount')
ax2.set_title('Sales amount by month')

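The sales-amount and order-count line charts show that:

  • Monthly bicycle order counts and sales amounts fluctuated widely in 2019. Sales performance was unstable, which is not a good sign.
  • Outside June-August, the order-count and sales-amount curves follow essentially the same trend, so the average order value was relatively stable in those months.
  • February, April and September 2019 were the weakest months, with sharp declines. It is worth checking whether promotions ran shortly before or after these months, and whether the February dip was caused by the Chinese New Year holiday.
  • November 2019 was the best month of the year. Possible causes to investigate: salespeople pushing to hit year-end targets, promotions around that month, or new products launched in September 2019.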
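The average order value mentioned in the list above is simply sales amount divided by order count. A minimal sketch with made-up monthly totals (the column names match overall_sales_performance; avg_order_value is a name introduced here for illustration):

```python
import pandas as pd

# Made-up monthly totals, not the real figures from dw_customer_order.
monthly = pd.DataFrame({
    "create_year_month": ["2019-09", "2019-10", "2019-11"],
    "order_num": [100, 200, 400],
    "sum_amount": [150000.0, 300000.0, 640000.0],
})

# Average order value = sales amount / order count for each month.
monthly["avg_order_value"] = monthly["sum_amount"] / monthly["order_num"]
print(monthly)
```

When the two curves move together, this ratio stays roughly flat; a month where it jumps is a month where the curves diverge.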

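1.2 Month-over-month change in order count and sales amount

# Month-over-month change in order count (rows are in descending month order, so pct_change(-1) compares each month with the previous one)
overall_sales_performance['order_diff']=overall_sales_performance['order_num'].pct_change(-1)
# Month-over-month change in sales amount
overall_sales_performance['sum_amount_diff']=overall_sales_performance['sum_amount'].pct_change(-1)
# Sort by month, ascending
overall_sales_performance.sort_values('create_year_month',inplace=True)
# Fill missing values (the earliest month has no previous month)
overall_sales_performance.fillna(0,inplace=True)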

overall_sales_performance
x=overall_sales_performance['create_year_month']
y1=overall_sales_performance['order_diff']
y2=overall_sales_performance['sum_amount_diff']

fig=plt.figure(figsize=(10,10))
fig.add_subplot(211)
plt.plot(x,y1)
plt.xlabel('Month')
plt.ylabel('Order count change (MoM)')
plt.title('Monthly order count change, 2019')

fig.add_subplot(212)
plt.plot(x,y2)
plt.xlabel('Month')
plt.ylabel('Sales amount change (MoM)')
plt.title('Monthly sales amount change, 2019')

The month-over-month curves follow the same pattern as the order count and sales amount.
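The calculation in section 1.2 depends on the row order: because the table is sorted newest month first, pct_change(-1) divides each row by the row below it, which is the previous month. A small sketch with made-up order counts:

```python
import pandas as pd

# Made-up figures, sorted newest month first like overall_sales_performance.
sales = pd.DataFrame({
    "create_year_month": ["2019-11", "2019-10", "2019-09"],
    "order_num": [150, 100, 80],
})

# periods=-1 compares each row with the row below it, i.e. the previous month here.
sales["order_diff"] = sales["order_num"].pct_change(-1)
print(sales)
```

November comes out as 150/100 - 1 and October as 100/80 - 1; the oldest month has nothing to compare against and stays NaN until it is filled with 0.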

2. Bicycle sales performance by region, November 2019

Filter the October-November data

gather_customer_order_10_11=gather_customer_order[(gather_customer_order['create_year_month']=='2019-10')|(gather_customer_order['create_year_month']=='2019-11')]
gather_customer_order_10_11.head()

Group by region and month

gather_customer_order_10_11_group=gather_customer_order_10_11.groupby(['chinese_territory','create_year_month'])[['order_num','sum_amount']].sum().reset_index()
gather_customer_order_10_11_group.head()

2.1 Month-over-month change in order count and sales amount by region

# Rows are already ordered by region and then month, so a grouped pct_change()
# compares November with October inside each region.
# (This replaces the original loop over regions built on Series.append, which was removed in pandas 2.0.)
# New columns: month-over-month change in order count and in sales amount
gather_customer_order_10_11_group['order_diff']=gather_customer_order_10_11_group.groupby('chinese_territory')['order_num'].pct_change(1)
gather_customer_order_10_11_group['amount_diff']=gather_customer_order_10_11_group.groupby('chinese_territory')['sum_amount'].pct_change(1)
# Fill missing values (October has no previous month in this subset)
gather_customer_order_10_11_group.fillna(0,inplace=True)

gather_customer_order_10_11_group.head()
x=gather_customer_order_10_11_group[gather_customer_order_10_11_group['create_year_month']=='2019-11']['chinese_territory']
y1=gather_customer_order_10_11_group[gather_customer_order_10_11_group['create_year_month']=='2019-11']['order_diff']
y2=gather_customer_order_10_11_group[gather_customer_order_10_11_group['create_year_month']=='2019-11']['amount_diff']

fig=plt.figure(figsize=(10,10))
fig.add_subplot(211)
plt.plot(x,y1)
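plt.xlabel('Region')
plt.ylabel('Order count change (MoM)')
plt.title('Order count change by region, November 2019')

fig.add_subplot(212)
plt.plot(x,y2,color='green')
plt.xlabel('Region')
plt.ylabel('Sales amount change (MoM)')
plt.title('Sales amount change by region, November 2019')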

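In November, growth in order count and sales amount was very low in the Northwest and Southwest regions. In the Taiwan, Hong Kong and Macau region both month-over-month changes were negative, on a small base.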
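A small worked example of the per-region month-over-month calculation from section 2.1, using made-up numbers and hypothetical region labels. It assumes rows are sorted by region and then month, as the grouped table is:

```python
import pandas as pd

# Made-up data; "East" and "West" are placeholder region names.
region_month = pd.DataFrame({
    "chinese_territory": ["East", "East", "West", "West"],
    "create_year_month": ["2019-10", "2019-11", "2019-10", "2019-11"],
    "order_num": [100, 120, 50, 40],
})

# pct_change() restarts inside each region, so October rows get NaN (filled with 0)
# and November rows hold the change relative to October of the same region.
region_month["order_diff"] = (
    region_month.groupby("chinese_territory")["order_num"].pct_change().fillna(0)
)
print(region_month)
```

Here East grows by about 20% and West shrinks by about 20%, and no value leaks across the region boundary.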

2.2 Top 10 cities by bicycle sales volume, November 2019

# Keep November bicycle transactions
gather_customer_order_11=gather_customer_order_10_11[gather_customer_order_10_11['create_year_month']=='2019-11']

# Total order count per city
gather_customer_order_city_11=gather_customer_order_11.groupby('chinese_city')[['order_num']].sum().sort_values('order_num',ascending=False).reset_index()

# Top 10 cities by bicycle order count in November
gather_customer_order_city_head=gather_customer_order_city_11.head(10)
gather_customer_order_city_head

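The top 10 cities by November bicycle sales are mostly top-tier cities such as Beijing, Shanghai and Guangzhou, or provincial capitals. Changde stands out and deserves a closer look: company policy, local government policy or relatively weak competition could each explain it.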

2.3 October-November change in order count and sales amount for the November top 10 cities

# Keep October and November bicycle sales for the top 10 cities
gather_customer_order_10_11_head=gather_customer_order_10_11[gather_customer_order_10_11['chinese_city'].isin(gather_customer_order_city_head['chinese_city'])]

# Total order count and sales amount per city and month for the top 10 cities
gather_customer_order_city_10_11=gather_customer_order_10_11_head.groupby(['chinese_city','create_year_month'])[['order_num','sum_amount']].sum().reset_index()

# Month-over-month change per city: order_diff for order count, amount_diff for sales amount.
# (The original loop used DataFrame.append, removed in pandas 2.0, and left the two new columns
# with the duplicate names order_num and sum_amount.)
top_x=gather_customer_order_city_10_11.groupby('chinese_city')[['order_num','sum_amount']].pct_change(1).fillna(0)
top_x.columns=['order_diff','amount_diff']
gather_customer_order_city_10_11=pd.concat([gather_customer_order_city_10_11,top_x],axis=1)  # axis=1 joins column-wise

gather_customer_order_city_10_11

Beijing, Nanjing, Changde, Zhengzhou and Changsha show high growth in November sales amount, while Chengdu and Wuhan show negative growth.
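One way to read a two-month comparison like this side by side is to pivot the months into columns. This is a hedged sketch with made-up numbers and placeholder city names, not the layout used in the notes:

```python
import pandas as pd

# Made-up data; "CityA" and "CityB" are placeholders.
city_month = pd.DataFrame({
    "chinese_city": ["CityA", "CityA", "CityB", "CityB"],
    "create_year_month": ["2019-10", "2019-11", "2019-10", "2019-11"],
    "sum_amount": [1000.0, 1500.0, 2000.0, 1500.0],
})

# One row per city, one column per month.
wide = city_month.pivot(index="chinese_city", columns="create_year_month", values="sum_amount")
# November relative to October.
wide["amount_diff"] = wide["2019-11"] / wide["2019-10"] - 1
print(wide)
```

The wide table makes it easy to sort cities by amount_diff and spot the negative ones.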

3. Bicycle sales performance by product, November 2019

# Total bicycle order count per month
gather_customer_order_group_month=gather_customer_order.groupby('create_year_month')['order_num'].sum().reset_index()
# Merge the bicycle order table with the monthly totals
order_num_proportion=pd.merge(left=gather_customer_order,right=gather_customer_order_group_month,on='create_year_month')
# Order count divided by the month's total order count, stored in the new column order_proportion
order_num_proportion['order_proportion']=order_num_proportion['order_num_x']/order_num_proportion['order_num_y']
# Rename the monthly total column to sum_month_order
order_num_proportion.rename(columns={'order_num_y':'sum_month_order'},inplace=True)

order_num_proportion.head()

3.1 Market segment performance: road, mountain and touring bikes

3.1.1 Road bikes

# Keep road bike orders
gather_customer_order_road=gather_customer_order[gather_customer_order['cpzl_zw']=='公路自行车']
# Order count per road bike product and month
gather_customer_order_road_month=gather_customer_order_road.groupby(['create_year_month','product_name'])['order_num'].sum().reset_index()
# Add a column marking the category as road bike
gather_customer_order_road_month['cpzl_zw']='公路自行车'

gather_customer_order_road_month

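Monthly road bike order count

# Total road bike order count per month (order_num is selected explicitly so the text columns are not aggregated)
gather_customer_order_road_month_sum=gather_customer_order_road_month.groupby('create_year_month')['order_num'].sum().reset_index()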

gather_customer_order_road_month_sum.head()
# Merge the per-product table gather_customer_order_road_month with the monthly road bike totals,
# so that each product's share of the month can be computed
gather_customer_order_road_month=pd.merge(gather_customer_order_road_month,gather_customer_order_road_month_sum,on='create_year_month')

gather_customer_order_road_month.head()

3.1.2 Mountain bikes

# Keep mountain bike orders
gather_customer_order_Moutain=gather_customer_order[gather_customer_order['cpzl_zw']=='山地自行车']
# Order count per mountain bike product and month
gather_customer_order_Moutain_month=gather_customer_order_Moutain.groupby(['create_year_month','product_name'])['order_num'].sum().reset_index()
# Add a column marking the category as mountain bike
gather_customer_order_Moutain_month['cpzl_zw']='山地自行车'
# Total mountain bike order count per month (order_num is selected explicitly so the text columns are not aggregated)
gather_customer_order_Moutain_month_sum=gather_customer_order_Moutain_month.groupby('create_year_month')['order_num'].sum().reset_index()
# Merge the per-product table gather_customer_order_Moutain_month with the monthly mountain bike totals
gather_customer_order_Moutain_month=pd.merge(gather_customer_order_Moutain_month,gather_customer_order_Moutain_month_sum,on='create_year_month')

gather_customer_order_Moutain_month.head()

3.1.3 Touring bikes

# Keep touring bike orders
gather_customer_order_tour=gather_customer_order[gather_customer_order['cpzl_zw']=='旅游自行车']
# Order count per touring bike product and month
gather_customer_order_tour_month=gather_customer_order_tour.groupby(['create_year_month','product_name'])['order_num'].sum().reset_index()
# Add a column marking the category as touring bike
gather_customer_order_tour_month['cpzl_zw']='旅游自行车'
# Total touring bike order count per month (order_num is selected explicitly so the text columns are not aggregated)
gather_customer_order_tour_month_sum=gather_customer_order_tour_month.groupby('create_year_month')['order_num'].sum().reset_index()
# Merge the per-product table gather_customer_order_tour_month with the monthly touring bike totals
gather_customer_order_tour_month=pd.merge(gather_customer_order_tour_month,gather_customer_order_tour_month_sum,on='create_year_month')

gather_customer_order_tour_month.head()

3.1.4 Combining the data

# Stack the monthly sales tables for road, mountain and touring bikes
gather_customer_order_month=pd.concat([gather_customer_order_road_month,gather_customer_order_Moutain_month,gather_customer_order_tour_month],axis=0)  # axis=0: stack rows
# New column: each product's monthly order count as a share of its category's monthly total
gather_customer_order_month['order_num_proportio']=gather_customer_order_month['order_num_x']/gather_customer_order_month['order_num_y']
# Rename the columns
gather_customer_order_month.rename(columns={'order_num_x':'order_month_product','order_num_y':'sum_order_month'},inplace=True)

gather_customer_order_month.head()
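The three per-category blocks in section 3.1 repeat the same steps. As a hedged alternative sketch (not the method used in these notes), groupby(...).transform('sum') can attach the category-month total to every product row in one pass. The sample rows, category labels and product names below are made up; the column names follow the notes:

```python
import pandas as pd

# Made-up orders; "road"/"mountain" and "R-1".."M-2" are placeholder values.
orders = pd.DataFrame({
    "create_year_month": ["2019-11"] * 4,
    "cpzl_zw": ["road", "road", "mountain", "mountain"],
    "product_name": ["R-1", "R-2", "M-1", "M-2"],
    "order_num": [30, 10, 5, 15],
})

# Order count per month, category and product.
product_month = (
    orders.groupby(["create_year_month", "cpzl_zw", "product_name"], as_index=False)["order_num"]
    .sum()
    .rename(columns={"order_num": "order_month_product"})
)

# Category total for the month, repeated on every row of that category.
product_month["sum_order_month"] = (
    product_month.groupby(["create_year_month", "cpzl_zw"])["order_month_product"].transform("sum")
)

# Product share of its category in that month.
product_month["order_num_proportio"] = (
    product_month["order_month_product"] / product_month["sum_order_month"]
)
print(product_month)
```

This avoids filtering, merging and concatenating once per category, at the cost of departing from the step-by-step layout above.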

3.2 Month-over-month change for bicycles, November 2019

3.2.1 Month-over-month change in product order count, November 2019

# To compute the November change, first keep the October and November rows (copy() avoids SettingWithCopyWarning on the edits below)
gather_customer_order_month_10_11=gather_customer_order_month[gather_customer_order_month['create_year_month'].isin(['2019-10','2019-11'])].copy()
# Sort by product and month, and reset the index because the concat in 3.1.4 left duplicate labels
gather_customer_order_month_10_11=gather_customer_order_month_10_11.sort_values(['product_name','create_year_month']).reset_index(drop=True)
# Month-over-month change in order count per product, stored in a new column.
# (This replaces the original loop built on Series.append, which was removed in pandas 2.0.)
gather_customer_order_month_10_11['order_num_diff']=gather_customer_order_month_10_11.groupby('product_name')['order_month_product'].pct_change(1).fillna(0)

gather_customer_order_month_10_11.head()
# Keep the November bicycle rows
gather_customer_order_month_11=gather_customer_order_month_10_11[gather_customer_order_month_10_11['create_year_month']=='2019-11']

gather_customer_order_month_11.head()

3.3 Total product sales, January-November 2019

# Keep bicycle data for January-November 2019
gather_customer_order_month_1_11=gather_customer_order_month[gather_customer_order_month['create_year_month']<'2019-12']
# Total order count per product for January-November 2019
gather_customer_order_month_1_11_sum=gather_customer_order_month_1_11.groupby('product_name')['order_month_product'].sum().reset_index()
# Rename to sum_order_1_11: total product order count for January-November
gather_customer_order_month_1_11_sum.rename(columns={'order_month_product':'sum_order_1_11'},inplace=True)

gather_customer_order_month_1_11_sum.head()

3.4 November 2019 product share of category sales, plus year-to-date product sales

# Merge the two tables on the shared column product_name
gather_customer_order_month_1_11_sum=pd.merge(gather_customer_order_month_11,gather_customer_order_month_1_11_sum,on='product_name')

gather_customer_order_month_1_11_sum.head()

4. User behavior analysis

# Read the customer table (customers created before December 2019)
sql1="select customer_key,birth_date,gender,marital_status from ods_customer where create_date<'2019-12-1'"
df2=pd.read_sql_query(sql1,engine)

# Read the sales order table (November 2019 orders)
sql2="select * from ods_sales_orders where create_date>='2019-11-1' and create_date<'2019-12-1'"
df3=pd.read_sql_query(sql2,engine)

df_customer=df2.copy()
df_sales_orders_11=df3.copy()
# Merge the two tables
sales_customer_order_11=pd.merge(df_sales_orders_11,df_customer,on='customer_key',how='left')

sales_customer_order_11.head()
# New column: birth year
birth_year=sales_customer_order_11['birth_date'].str.split('-',expand=True).drop([1,2],axis=1).rename(columns={0:'birth_year'})  # expand=True returns a DataFrame instead of a Series
# Join the new column onto the table
sales_customer_order_11=pd.concat([sales_customer_order_11,birth_year],axis=1)

sales_customer_order_11.head()
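A small sketch of the birth-year extraction, assuming birth_date is stored as 'YYYY-MM-DD' text (which the str.split call implies). The sample values are made up, and parsing with pd.to_datetime is shown only as an alternative that also copes with missing dates:

```python
import pandas as pd

# Made-up customers; the last one has no birth date.
customers = pd.DataFrame({"birth_date": ["1985-03-12", "1990-11-02", None]})

# String approach used in the notes: keep the part before the first '-'.
by_split = customers["birth_date"].str.split("-", expand=True)[0]

# Alternative: parse to datetime and read the year; missing dates become NaN.
by_parse = pd.to_datetime(customers["birth_date"]).dt.year

print(by_split.tolist())
print(by_parse.tolist())
```

The string approach returns text that still needs a cast to int, while the parsed version is numeric from the start.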

4.1 User age analysis

4.1.1 Age groups

# Fill in missing customer data
sales_customer_order_11.fillna('0',inplace=True)
# Convert birth year to an integer type
sales_customer_order_11['birth_year']=sales_customer_order_11['birth_year'].astype('int64')
# Compute customer age (as of 2020)
sales_customer_order_11['customer_age']=2020-sales_customer_order_11['birth_year']
# Age bins. pd.cut uses right-closed intervals, so the first edge is 29 to make the '30-34' bin include age 30.
bins=[29,34,39,44,49,54,59,64]
sales_customer_order_11['age_level']=pd.cut(sales_customer_order_11['customer_age'],bins,labels=['30-34','35-39','40-44','45-49','50-54','55-59','60-64'])

sales_customer_order_11.head()
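pd.cut builds right-closed intervals by default, so the lowest edge has to sit one below the first age that should be included; with a first edge of 30, age 30 itself would fall outside every bin. A minimal sketch with made-up ages:

```python
import pandas as pd

ages = pd.Series([30, 34, 35, 64, 65])  # made-up ages around the bin edges
labels = ["30-34", "35-39", "40-44", "45-49", "50-54", "55-59", "60-64"]

# Intervals are (29, 34], (34, 39], ..., (59, 64]; values outside them become NaN.
levels = pd.cut(ages, bins=[29, 34, 39, 44, 49, 54, 59, 64], labels=labels)
print(levels.tolist())
```

Ages 30 and 34 land in '30-34', 35 in '35-39', 64 in '60-64', and 65 is left unlabeled.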

4.1.2 Count and share per age group

# Keep only bicycle orders
df_customer_order_bycle=sales_customer_order_11[sales_customer_order_11['cplb_zw']=='自行车']
# Number of rows in each age group
age_group_count=df_customer_order_bycle.groupby('age_level')['customer_age'].count().reset_index().rename(columns={'customer_age':'age_group_count'})
# Merge the counts back onto the orders
df_customer_order_bycle=pd.merge(df_customer_order_bycle,age_group_count,on='age_level')
# Each row's share within its age group (1 / group size)
df_customer_order_bycle['age_level_rate']=1/df_customer_order_bycle['age_group_count']

df_customer_order_bycle.head()

4.1.3 Age tiers

# Split ages into 3 tiers
df_customer_order_bycle.loc[df_customer_order_bycle['customer_age']>=40,'age_level2']='>=40'
df_customer_order_bycle.loc[df_customer_order_bycle['customer_age']<=29,'age_level2']='<=29'
df_customer_order_bycle.loc[(df_customer_order_bycle['customer_age']>=30)&(df_customer_order_bycle['customer_age']<=39),'age_level2']='30-39'

df_customer_order_bycle.head()

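4.1.4 Headcount per age tier

# Count rows in each age tier. rename_axis() and reset_index(name=...) name both columns explicitly,
# so the result does not depend on how the installed pandas version labels value_counts() output.
age_level2_count=df_customer_order_bycle['age_level2'].value_counts().rename_axis('age_level2').reset_index(name='sales_order_key')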

age_level2_count

4.2 User gender

# Count rows per gender; both columns are named explicitly, as in 4.1.4
gender_count=df_customer_order_bycle['gender'].value_counts().rename_axis('gender').reset_index(name='cplb_zw')

gender_count
df_customer_order_bycle=pd.merge(df_customer_order_bycle,age_level2_count,on='age_level2').rename(columns={'sales_order_key_y':'age_level2_count'})
df_customer_order_bycle['age_level2_rate']=1/df_customer_order_bycle['age_level2_count']
df_customer_order_bycle=pd.merge(df_customer_order_bycle,gender_count,on='gender').rename(columns={'cplb_zw_y':'gender_count'})
df_customer_order_bycle['gender_rate']=1/df_customer_order_bycle['gender_count']

df_customer_order_bycle.head()

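5. Best-selling products, November 2019

5.1 Order count and month-over-month change for the November top 10 products

The November order count and month-over-month change of every product were already computed in section 3, so they are not recomputed here; gather_customer_order_month_10_11 and gather_customer_order_month_11 are reused directly.

5.1.1 Finding the top 10 products

# Total order count per product
# Sort by order count in descending order and take the top 10 products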
customer_order_11_top10=gather_customer_order_11.groupby('product_name')['order_num'].sum().reset_index().\
                        sort_values('order_num',ascending=False).head(10)

customer_order_11_top10

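5.1.2 Order count and month-over-month change for the top 10 products

Only five columns are needed here: create_year_month (month), product_name (product name), order_month_product (order count for the month), cpzl_zw (product category) and order_num_diff (change in order count versus the previous month).

customer_order_month_10_11=gather_customer_order_month_10_11.drop(['sum_order_month','order_num_proportio'],axis=1)
customer_order_month_10_11=customer_order_month_10_11[customer_order_month_10_11['product_name'].\
                                                     isin(customer_order_11_top10['product_name'].unique())].copy()  # copy() avoids SettingWithCopyWarning on the next line
customer_order_month_10_11['category']='Top 10 by sales this month'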

customer_order_month_10_11

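5.2 Order count and month-over-month change for the November top 10 fastest-growing products

customer_order_month_11=gather_customer_order_month_11.sort_values('order_num_diff',ascending=False).head(10)
customer_order_month_11_top10_seep=gather_customer_order_month_10_11[gather_customer_order_month_10_11['product_name']\
.isin(customer_order_month_11['product_name'])]

Keep the same five columns: create_year_month (month), product_name (product name), order_month_product (order count for the month), cpzl_zw (product category) and order_num_diff (month-over-month change in order count).

# drop() without inplace returns a new DataFrame, which avoids SettingWithCopyWarning on the filtered slice
customer_order_month_11_top10_seep=customer_order_month_11_top10_seep.drop(['sum_order_month','order_num_proportio'],axis=1)
customer_order_month_11_top10_seep['category']='Top 10 by growth this month'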

customer_order_month_11_top10_seep.head()

5.3 Combining the top 10 sales table customer_order_month_10_11 with the top 10 growth table customer_order_month_11_top10_seep

# axis=0 stacks rows, axis=1 joins columns side by side
hot_product_11=pd.concat([customer_order_month_10_11,customer_order_month_11_top10_seep],axis=0)

hot_product_11.tail()
# Alternative: an inner merge keeps only the product-month rows present in both tables (this overwrites the concat result above)
hot_product_11=pd.merge(customer_order_month_10_11,customer_order_month_11_top10_seep,on=['product_name','create_year_month'])

hot_product_11
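The two ways of combining the tables in section 5.3 give different shapes. A minimal sketch with made-up rows and placeholder product names: concat stacks every row from both lists, while an inner merge keeps only the products that appear in both and puts their columns side by side with _x/_y suffixes.

```python
import pandas as pd

# Made-up tables; "A", "B", "C" are placeholder product names.
top_sales = pd.DataFrame({
    "product_name": ["A", "B"],
    "order_month_product": [50, 40],
    "category": "top sales",
})
top_growth = pd.DataFrame({
    "product_name": ["B", "C"],
    "order_month_product": [40, 10],
    "category": "top growth",
})

# Row-wise stack: 4 rows, product B appears once per list.
stacked = pd.concat([top_sales, top_growth], axis=0, ignore_index=True)

# Inner merge: 1 row, only product B, with category_x and category_y columns.
both = pd.merge(top_sales, top_growth, on="product_name")

print(stacked)
print(both)
```

Which one to store depends on the report: the stacked table suits a single chart filtered by category, the merged one answers "which products are in both lists".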

Finally, all of these DataFrames can be written to the database in bulk and charted in Power BI.
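A hedged sketch of the bulk write using DataFrame.to_sql. An in-memory SQLite connection stands in for the MySQL engine so the example runs on its own; the rows are made up, and pairing this DataFrame with the table name pt_hot_products_november is an assumption based on the table list at the top of these notes.

```python
import sqlite3
import pandas as pd

# Made-up rows standing in for hot_product_11.
hot_products = pd.DataFrame({
    "product_name": ["A", "B"],
    "order_month_product": [50, 40],
})

# Assumption: in-memory SQLite replaces the engine built with create_engine().
conn = sqlite3.connect(":memory:")

# Map each target table name to the DataFrame that should be stored in it.
tables = {"pt_hot_products_november": hot_products}
for table_name, frame in tables.items():
    frame.to_sql(table_name, conn, if_exists="replace", index=False)

# Read one table back to confirm the write.
stored = pd.read_sql_query("select * from pt_hot_products_november", conn)
print(stored)
```

With the real database, passing the SQLAlchemy engine in place of conn should be the only change, and the dictionary can list every result table.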
