Shelf Classification

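To standardize shelf management, we classify shelves by their sales (GMV) and loss (LOSE), so that shelves of different quality can be given different levels of service.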

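To keep the number of classes small (roughly 3-6 are expected), we use K-Means, which lets us specify the number of clusters, and then select the best number of clusters.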
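For reference, K-Means (Lloyd's algorithm) repeats two steps. First it assigns every point to its nearest center. Then it moves each center to the mean of the points assigned to it.

The sketch below is a minimal NumPy version built on simplifying assumptions: the initial centers are random rows of the data, and there is a fixed iteration cap. It is not scikit-learn's implementation, which uses k-means++ initialization and several restarts.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Minimal Lloyd's algorithm; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    # Initialize with k distinct rows of X (an assumption; scikit-learn uses k-means++)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assignment step: squared Euclidean distance from every point to every center
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each center to the mean of its assigned points
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Two well-separated toy groups
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
labels, centers = kmeans(X, k=2)
print(labels, centers)
```

The number of clusters `k` is an input, which is why the analysis below tries several values and compares their scores.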

import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn import metrics
import matplotlib.pyplot as plt
# Read the data
datas = pd.read_csv('C:/Users/acer/Desktop/sf-data.CSV')
datas
 
  ID GMV LOSE LR
0 A275000 4481 578 0.128989
1 A132634 4383 399 0.091034
2 A165561 4300 348 0.080930
3 A524005 4285 592 0.138156
4 A450039 4273 564 0.131992
5 A402586 4229 680 0.160795
6 A389480 4207 552 0.131210
7 A346645 4176 523 0.125239
8 A773916 4124 361 0.087536
9 A323355 4101 693 0.168983
10 A506280 4092 612 0.149560
11 A271269 3765 408 0.108367
12 A716208 3712 537 0.144666
13 A520253 3690 680 0.184282
14 A372283 3553 414 0.116521
15 A271421 3550 413 0.116338
16 A397239 3538 359 0.101470
17 A358485 3528 462 0.130952
18 A526219 3498 534 0.152659
19 A508231 3436 344 0.100116
20 A510649 3341 537 0.160730
21 A538729 3302 432 0.130830
22 A668628 3279 527 0.160720
23 A782193 3264 316 0.096814
24 A561473 3259 366 0.112304
25 A424867 3253 341 0.104826
26 A470397 3197 554 0.173287
27 A630001 3171 570 0.179754
28 A433220 3158 558 0.176694
29 A365840 3091 604 0.195406
... ... ... ... ...
584 A341195 104 11 0.105769
585 A799917 100 13 0.130000
586 A538103 96 34 0.354167
587 A217965 94 41 0.436170
588 A493991 93 21 0.225806
589 A228910 88 28 0.318182
590 A726803 71 37 0.521127
591 A738820 67 68 1.014925
592 A643449 66 67 1.015152
593 A508898 62 35 0.564516
594 A286848 58 44 0.758621
595 A407985 54 21 0.388889
596 A464971 51 52 1.019608
597 A201682 47 21 0.446809
598 A516307 47 11 0.234043
599 A725856 39 11 0.282051
600 A612603 34 13 0.382353
601 A663156 27 28 1.037037
602 A167316 23 4 0.173913
603 A161877 22 39 1.772727
604 A706346 0 29 0.000000
605 A444550 0 4 0.000000
606 A559078 0 26 0.000000
607 A574825 0 61 0.000000
608 A203674 0 57 0.000000
609 A351185 0 8 0.000000
610 A614387 0 24 0.000000
611 A707170 0 20 0.000000
612 A681535 0 9 0.000000
613 A727140 0 70 0.000000

614 rows × 4 columns

 

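Per the requirements, shelves with very low sales or very high loss are withdrawn. We therefore drop two groups of shelves:

- shelves with GMV of 100 or less;
- shelves whose loss rate LR (LOSE / GMV) is above 25% and whose GMV is below 500.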

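# Drop shelves with GMV of 100 or less
datas = datas[datas.GMV > 100]
# Drop low-sales (GMV < 500), high-loss (LR > 25%) shelves
datas = datas.drop(datas[(datas.GMV < 500) & (datas.LR > 0.25)].index)
# Keep only the numeric features used for clustering
data = datas[['GMV', 'LOSE', 'LR']]
data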
 
  GMV LOSE LR
0 4481 578 0.128989
1 4383 399 0.091034
2 4300 348 0.080930
3 4285 592 0.138156
4 4273 564 0.131992
5 4229 680 0.160795
6 4207 552 0.131210
7 4176 523 0.125239
8 4124 361 0.087536
9 4101 693 0.168983
10 4092 612 0.149560
11 3765 408 0.108367
12 3712 537 0.144666
13 3690 680 0.184282
14 3553 414 0.116521
15 3550 413 0.116338
16 3538 359 0.101470
17 3528 462 0.130952
18 3498 534 0.152659
19 3436 344 0.100116
20 3341 537 0.160730
21 3302 432 0.130830
22 3279 527 0.160720
23 3264 316 0.096814
24 3259 366 0.112304
25 3253 341 0.104826
26 3197 554 0.173287
27 3171 570 0.179754
28 3158 558 0.176694
29 3091 604 0.195406
... ... ... ...
531 177 13 0.073446
532 176 40 0.227273
533 176 36 0.204545
535 174 8 0.045977
538 164 41 0.250000
540 163 34 0.208589
541 158 21 0.132911
542 158 2 0.012658
543 156 34 0.217949
545 156 7 0.044872
549 151 33 0.218543
550 151 17 0.112583
551 149 31 0.208054
552 149 37 0.248322
553 146 17 0.116438
556 139 21 0.151079
557 135 24 0.177778
559 133 21 0.157895
562 131 20 0.152672
564 130 11 0.084615
565 127 21 0.165354
566 123 25 0.203252
569 118 13 0.110169
571 117 23 0.196581
572 115 1 0.008696
573 111 10 0.090090
576 110 23 0.209091
580 106 7 0.066038
582 105 8 0.076190
584 104 11 0.105769

541 rows × 3 columns
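The two filters can also be written as a single boolean mask. The sketch below applies the same rule to a small made-up DataFrame; the IDs and numbers are hypothetical. It shows which rows survive, and that `GMV > 100` also drops a shelf whose GMV is exactly 100.

```python
import pandas as pd

# Hypothetical toy data in the same layout as sf-data.CSV
toy = pd.DataFrame({
    'ID':   ['S1', 'S2', 'S3', 'S4', 'S5'],
    'GMV':  [4000, 450, 450, 100, 90],
    'LOSE': [500, 50, 150, 10, 5],
})
toy['LR'] = toy.LOSE / toy.GMV

# Keep GMV > 100, then exclude low-sales (GMV < 500) shelves with LR above 25%
mask = (toy.GMV > 100) & ~((toy.GMV < 500) & (toy.LR > 0.25))
kept = toy[mask]
print(kept.ID.tolist())
```

S3 is removed by the loss-rate rule (LR of about 0.33). S4 and S5 are removed by the GMV cutoff.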


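# GMV and LOSE are on very different scales, so standardize the features before training
scaler = StandardScaler().fit(data.astype(float))
# Column order of the scaled array is GMV, LOSE, LR.
# Note: the outputs shown in this post were produced with 'LR': data[:,1], so their LR column repeats LOSE.
data = pd.DataFrame(scaler.transform(data.astype(float)), columns=['GMV', 'LOSE', 'LR'])
data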
 
  GMV LOSE LR
0 4.236391 3.468713 3.468713
1 4.124695 2.175132 2.175132
2 4.030095 1.806570 1.806570
3 4.012999 3.569887 3.569887
4 3.999322 3.367539 3.367539
5 3.949173 4.205837 4.205837
6 3.924098 3.280819 3.280819
7 3.888766 3.071244 3.071244
8 3.829499 1.900517 1.900517
9 3.803284 4.299784 4.299784
10 3.793027 3.714421 3.714421
11 3.420327 2.240173 2.240173
12 3.359920 3.172418 3.172418
13 3.334846 4.205837 4.205837
14 3.178700 2.283533 2.283533
15 3.175280 2.276306 2.276306
16 3.161603 1.886064 1.886064
17 3.150206 2.630415 2.630415
18 3.116013 3.150738 3.150738
19 3.045349 1.777663 1.777663
20 2.937072 3.172418 3.172418
21 2.892622 2.413614 2.413614
22 2.866407 3.100151 3.100151
23 2.849311 1.575316 1.575316
24 2.843612 1.936651 1.936651
25 2.836774 1.755983 1.755983
26 2.772948 3.295272 3.295272
27 2.743314 3.410899 3.410899
28 2.728497 3.324179 3.324179
29 2.652134 3.656607 3.656607
... ... ... ...
511 -0.669107 -0.614377 -0.614377
512 -0.670246 -0.419256 -0.419256
513 -0.670246 -0.448163 -0.448163
514 -0.672526 -0.650511 -0.650511
515 -0.683923 -0.412029 -0.412029
516 -0.685063 -0.462616 -0.462616
517 -0.690762 -0.556563 -0.556563
518 -0.690762 -0.693871 -0.693871
519 -0.693041 -0.462616 -0.462616
520 -0.693041 -0.657737 -0.657737
521 -0.698740 -0.469843 -0.469843
522 -0.698740 -0.585470 -0.585470
523 -0.701020 -0.484296 -0.484296
524 -0.701020 -0.440936 -0.440936
525 -0.704439 -0.585470 -0.585470
526 -0.712417 -0.556563 -0.556563
527 -0.716976 -0.534883 -0.534883
528 -0.719256 -0.556563 -0.556563
529 -0.721535 -0.563790 -0.563790
530 -0.722675 -0.628830 -0.628830
531 -0.726094 -0.556563 -0.556563
532 -0.730653 -0.527657 -0.527657
533 -0.736352 -0.614377 -0.614377
534 -0.737492 -0.542110 -0.542110
535 -0.739771 -0.701098 -0.701098
536 -0.744330 -0.636057 -0.636057
537 -0.745470 -0.542110 -0.542110
538 -0.750029 -0.657737 -0.657737
539 -0.751169 -0.650511 -0.650511
540 -0.752309 -0.628830 -0.628830

541 rows × 3 columns
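`StandardScaler` rescales each column to zero mean and unit variance, z = (x - mean) / std, using the population standard deviation (`ddof=0`). Below is a quick check on a few (GMV, LOSE) pairs taken from the table above. It is a sketch and not part of the original analysis.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# A few (GMV, LOSE) pairs from the data above
X = np.array([[4481, 578], [4383, 399], [3091, 604], [104, 11]], dtype=float)

z_sklearn = StandardScaler().fit_transform(X)
# Manual z-score with the population standard deviation (ddof=0)
z_manual = (X - X.mean(axis=0)) / X.std(axis=0, ddof=0)

print(np.allclose(z_sklearn, z_manual))  # True
print(z_sklearn.mean(axis=0), z_sklearn.std(axis=0))
```

After scaling, one unit of GMV and one unit of LOSE both mean "one standard deviation". The raw GMV column therefore no longer dominates the Euclidean distances that K-Means uses.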


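# Scatter plot of standardized GMV vs. LOSE
plt.scatter(data['GMV'], data['LOSE'])
plt.xlabel('GMV (standardized)')
plt.ylabel('LOSE (standardized)')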
(Figure 1: scatter plot of standardized GMV against LOSE)

Cluster the shelves into 3 to 12 classes

plt.figure(figsize=(12, 24))
n = 0
for k in range(3, 13):
    n += 1
    plt.subplot(5, 2, n)
    models = KMeans(n_clusters=k).fit(data)
    sorts = models.predict(data)
    # calinski_harabaz_score is the old, since-removed spelling; current scikit-learn uses calinski_harabasz_score
    scores = metrics.calinski_harabasz_score(data, sorts)
    print(k, 'clusters, score:', scores)
    plt.scatter(data['GMV'], data['LOSE'], c=sorts)
    plt.text(.99, .01, ('k=%d, scores: %.2f' % (k, scores)),
             transform=plt.gca().transAxes, size=10,
             horizontalalignment='right')
3 clusters, score: 2410.732723770415
4 clusters, score: 2386.4614419681943
5 clusters, score: 2522.589657882834
6 clusters, score: 2667.9083791621647
7 clusters, score: 2525.5496643415527
8 clusters, score: 2631.022208672163
9 clusters, score: 2640.4237671093597
10 clusters, score: 2784.043529129476
11 clusters, score: 2885.534381287436
12 clusters, score: 2887.893470979444
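(Figure 2: K-Means clusterings of standardized GMV vs. LOSE for k = 3 to 12, each panel labeled with its score)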

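For k from 3 to 12, the Calinski-Harabasz score (higher means denser, better-separated clusters) generally rises as the number of classes grows. The rise is not monotonic: k=4 scores below k=3, and k=7 below k=6. The score levels off at k=11-12 (2885.5 vs. 2887.9).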
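The Calinski-Harabasz score is the ratio of between-cluster dispersion to within-cluster dispersion, each divided by its degrees of freedom:

CH = [tr(B) / (k - 1)] / [tr(W) / (n - k)]

The sketch below computes the score from this definition and compares it with scikit-learn's `calinski_harabasz_score`. The random data and seeds are arbitrary.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import calinski_harabasz_score

def ch_score(X, labels):
    """Calinski-Harabasz index computed from its definition."""
    n, k = len(X), len(np.unique(labels))
    overall_mean = X.mean(axis=0)
    between, within = 0.0, 0.0
    for c in np.unique(labels):
        members = X[labels == c]
        centroid = members.mean(axis=0)
        # Between-cluster: cluster size times squared distance from centroid to overall mean
        between += len(members) * ((centroid - overall_mean) ** 2).sum()
        # Within-cluster: squared distances from members to their own centroid
        within += ((members - centroid) ** 2).sum()
    return (between / (k - 1)) / (within / (n - k))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
print(ch_score(X, labels), calinski_harabasz_score(X, labels))
```

The (n - k) / (k - 1) factor penalizes extra clusters somewhat. Even so, the score is not guaranteed to peak at a "true" k, which is one reason the choice below also weighs business complexity.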

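However, the business becomes more complex as the number of shelf classes grows. The goal is therefore the smallest number of classes that still keeps the business simple. Within the expected range of 3-6, the candidates are 3, 5, and 6 (k=4 scores below k=3).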

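We recommend 5 classes. The scores for 3, 5, and 6 classes are close (2410.7, 2522.6, and 2667.9, within about 11% of each other).

- 6 classes would add business complexity for a modest gain.
- 3 classes is too coarse to support finer-grained service.
- 5 classes allows more differentiated service without making the business overly complex.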
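K-Means starts from random centers, so each score above comes from one fit and can shift when the cell is rerun. One way to check whether the gaps between k = 3, 5 and 6 are stable is to refit each k with several seeds and look at the spread of scores.

This is a sketch under stated assumptions:

- `n_init=1` is used so each seed is a single random initialization.
- To keep the block self-contained, it uses random stand-in data of the same shape as the shelf data. In practice you would pass the standardized `data` instead.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import calinski_harabasz_score

def ch_spread(X, ks, seeds):
    """Return {k: (min, max) Calinski-Harabasz score over the given seeds}."""
    spread = {}
    for k in ks:
        scores = [
            calinski_harabasz_score(
                X, KMeans(n_clusters=k, n_init=1, random_state=s).fit_predict(X))
            for s in seeds
        ]
        spread[k] = (min(scores), max(scores))
    return spread

# Stand-in for the standardized shelf data (541 rows x 3 columns)
X = np.random.default_rng(0).normal(size=(541, 3))
for k, (lo, hi) in ch_spread(X, ks=[3, 5, 6], seeds=range(10)).items():
    print(k, 'clusters: CH between %.1f and %.1f' % (lo, hi))
```

If the ranges for two values of k overlap, the single-run difference between them is not strong evidence for either.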

 

Class assignment for each shelf

# Final model with 5 clusters; label numbers are arbitrary and can differ between runs
model = KMeans(n_clusters=5)
K_Means = model.fit(data)
K_Means.labels_
array([3, 4, 4, 3, 3, 3, 3, 3, 4, 3, 3, 4, 3, 3, 4, 4, 4, 4, 3, 4, 3, 4,
       3, 4, 4, 4, 3, 3, 3, 3, 4, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 2, 2, 0, 2, 0, 2, 0, 2, 0, 2, 0, 2, 0, 2, 0, 0, 2,
       2, 0, 2, 2, 0, 0, 0, 2, 0, 2, 0, 2, 0, 0, 0, 2, 2, 0, 0, 2, 0, 2,
       0, 0, 2, 2, 0, 0, 2, 0, 2, 2, 0, 0, 2, 2, 0, 2, 2, 0, 2, 0, 0, 0,
       2, 2, 0, 0, 0, 0, 2, 2, 2, 2, 2, 2, 2, 0, 0, 2, 2, 2, 0, 0, 2, 0,
       2, 0, 0, 2, 2, 2, 2, 0, 2, 0, 2, 0, 2, 0, 0, 0, 0, 2, 0, 0, 2, 0,
       2, 0, 2, 2, 2, 0, 0, 2, 0, 0, 0, 0, 2, 2, 0, 2, 0, 0, 0, 2, 0, 0,
       2, 2, 0, 2, 0, 2, 0, 2, 2, 0, 0, 2, 2, 2, 0, 0, 2, 2, 2, 2, 2, 0,
       2, 0, 0, 2, 2, 2, 2, 0, 0, 2, 0, 2, 2, 2, 2, 2, 0, 0, 2, 0, 2, 0,
       0, 0, 0, 2, 2, 2, 2, 0, 2, 0, 2, 2, 0, 2, 0, 0, 0, 2, 2, 0, 0, 2,
       2, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])
data.insert(3,column = 'grade', value = K_Means.labels_)
data.head()
  GMV LOSE LR grade
0 4.236391 3.468713 3.468713 3
1 4.124695 2.175132 2.175132 4
2 4.030095 1.806570 1.806570 4
3 4.012999 3.569887 3.569887 3
4 3.999322 3.367539 3.367539 3
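Cluster label numbers are arbitrary, so the grade-to-class mapping used below holds only for this particular run. Two things make the classes easier to read and the numbering stable:

- map the centers back to the original units with `scaler.inverse_transform`;
- renumber the clusters by their center GMV.

A sketch on hypothetical toy data follows. The toy values and the `ordered_grades` helper are illustrative and not from the original.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

def ordered_grades(raw, n_clusters, seed=0):
    """Cluster standardized data; return labels renumbered by ascending center GMV,
    plus the centers in original units (hypothetical helper)."""
    scaler = StandardScaler().fit(raw)
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(scaler.transform(raw))
    centers = pd.DataFrame(scaler.inverse_transform(model.cluster_centers_), columns=raw.columns)
    order = centers['GMV'].argsort().to_numpy()   # old label ids, lowest center GMV first
    remap = np.empty_like(order)
    remap[order] = np.arange(n_clusters)           # old label id -> new rank
    return remap[model.labels_], centers.iloc[order].reset_index(drop=True)

raw = pd.DataFrame({'GMV': [4400, 4300, 1500, 1400, 200, 150],
                    'LOSE': [550, 400, 400, 380, 20, 15]})
raw['LR'] = raw.LOSE / raw.GMV
grades, centers = ordered_grades(raw, n_clusters=3)
print(grades)
print(centers.round(1))
```

With this ordering, grade 0 is always the lowest-GMV class. The centers table gives each class an average GMV, LOSE and LR in the original units.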

GMV and loss rate of each shelf, plotted by class

plt.figure(figsize=(14, 7), facecolor='w')
plt.ylim(-1, 5)
# Standardized values on the x-axis, cluster label (grade) on the y-axis
plt.plot(data['LR'], data['grade'], 'bo', markersize=8, zorder=2, label='LR')
plt.plot(data['GMV'], data['grade'], 'go', markersize=16, zorder=1, label='GMV')
plt.legend(loc='upper left')
plt.xlabel('GMV or LR (standardized)', fontsize=18)
plt.ylabel('grade', fontsize=18)
plt.title('classification and indicators', fontsize=20)
(Figure 3: standardized GMV and LR of each shelf, plotted against its class label)
# Revenue share and loss share of each class (grades 0-4 are reported as classes 1-5)
# These are shares of absolute standardized values, not of raw GMV or loss
GMV_sum = sum(abs(data['GMV']))
LR_sum = sum(abs(data['LR']))
for g in range(5):
    print('Class %d revenue share:' % (g + 1), sum(abs(data[data.grade == g].GMV)) / GMV_sum * 100)
    print('Class %d loss share:' % (g + 1), sum(abs(data[data.grade == g].LR)) / LR_sum * 100)
Class 1 revenue share: 6.164401861697537
Class 1 loss share: 6.652831058827551
Class 2 revenue share: 9.089795550942922
Class 2 loss share: 15.101246088993388
Class 3 revenue share: 45.97416334065695
Class 3 loss share: 46.421436906323414
Class 4 revenue share: 19.726729310594852
Class 4 loss share: 19.542840412045173
Class 5 revenue share: 19.04490993610771
Class 5 loss share: 12.281645533810488
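Standardized values can be negative, and their absolute values measure distance from the mean rather than revenue or loss. Summing loss *rates* across shelves is also hard to interpret.

The sketch below instead computes each class's share of total GMV and total LOSE from the raw columns. It assumes the raw DataFrame (`datas` in the code above) and the cluster labels line up row by row. The labels are attached by position, so `datas`' non-contiguous index does not matter. Toy numbers are used so the block runs on its own.

```python
import pandas as pd

def class_shares(raw, labels):
    """Percentage of total GMV and total LOSE contributed by each cluster label."""
    df = raw[['GMV', 'LOSE']].assign(grade=labels)   # labels attached by position
    totals = df[['GMV', 'LOSE']].sum()
    return df.groupby('grade')[['GMV', 'LOSE']].sum() / totals * 100

# Hypothetical toy data standing in for `datas` and `K_Means.labels_`
raw = pd.DataFrame({'GMV': [400, 300, 200, 100], 'LOSE': [10, 30, 40, 20]})
labels = [0, 0, 1, 1]
shares = class_shares(raw, labels)
print(shares)
```

On the real data this would be called as `class_shares(datas, K_Means.labels_)`. The resulting percentages may differ from those printed above.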

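Based on sales and loss, the shelves fall into five classes:

Class 1: low sales, low-to-medium loss. Some shelves in this class lose far more than they earn; these are candidates for withdrawal.

Class 2: medium sales, upper-medium loss. Its loss share (15.1%) exceeds its revenue share (9.1%). Some shelves in this class lose far more than they earn; these are candidates for withdrawal.

Class 3: low sales, low loss; revenue exceeds loss. Withdraw at discretion, depending on maintenance cost.

Class 4: high sales, high loss. These shelves contribute a large share of GMV, but their high loss nearly offsets the revenue. The priority is to reduce loss.

Class 5: high sales, low loss. These are the premium shelves.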

 

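From Class 1, extract shelves that meet both conditions below. These are the shelves to consider withdrawing.

- The shelf's share of total loss exceeds its share of total sales.
- The shelf's own loss is more than 15% of its sales.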
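The share condition can be simplified. Write $g_i$ and $l_i$ for shelf $i$'s GMV and loss, and $G$ and $L$ for the totals over the shelves considered. The loss share exceeds the sales share exactly when the shelf's own loss rate exceeds the overall loss rate:

```latex
\frac{l_i}{L} > \frac{g_i}{G}
\iff \frac{l_i}{g_i} > \frac{L}{G}
\iff \mathrm{LR}_i > \frac{L}{G}
\qquad (g_i,\ G,\ L > 0)
```

Together with the 15% rule, the selection therefore reduces to $\mathrm{LR}_i > \max(L/G,\ 0.15)$ within the chosen class.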

 

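# As written, this keeps shelves from every class whose standardized LR exceeds
# the largest standardized GMV in grade 0; it does not restrict to grade 0
index2 = list(data[data.LR > max(data[data.grade==0].GMV)].index)
# data was rebuilt with a 0..n-1 index, so these labels are row positions in datas
datas2 = datas.iloc[index2, :]
datas2[datas2.LR>0.15].sort_values(by=['LR'],ascending=False)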

 

  ID GMV LOSE LR
67 A131502 1303 420 0.322333
66 A515389 1310 422 0.322137
55 A505492 1731 444 0.256499
63 A549268 1542 391 0.253567
64 A492069 1437 363 0.252610
53 A275823 1766 431 0.244054
65 A726433 1386 338 0.243867
90 A559101 800 193 0.241250
61 A326788 1551 369 0.237911
49 A266437 1945 449 0.230848
54 A570130 1764 402 0.227891
58 A453264 1632 366 0.224265
88 A625828 831 186 0.223827
57 A431484 1691 375 0.221762
82 A228580 911 199 0.218441
62 A526263 1545 335 0.216828
56 A122083 1704 366 0.214789
32 A301564 3045 643 0.211166
51 A407693 1911 403 0.210884
60 A555168 1559 326 0.209108
45 A405486 2102 435 0.206946
48 A720532 1972 403 0.204361
87 A283051 852 174 0.204225
59 A196705 1569 317 0.202040
50 A446950 1934 379 0.195967
29 A365840 3091 604 0.195406
33 A621554 2990 570 0.190635
44 A298717 2370 444 0.187342
13 A520253 3690 680 0.184282
27 A630001 3171 570 0.179754
43 A398851 2385 425 0.178197
52 A378337 1842 326 0.176982
28 A433220 3158 558 0.176694
42 A162793 2512 442 0.175955
26 A470397 3197 554 0.173287
76 A502892 1132 195 0.172261
79 A338628 1073 183 0.170550
9 A323355 4101 693 0.168983
5 A402586 4229 680 0.160795
20 A510649 3341 537 0.160730
22 A668628 3279 527 0.160720
47 A557736 2005 322 0.160599
74 A620952 1159 186 0.160483
69 A608779 1203 193 0.160432
31 A550157 3076 493 0.160273
70 A371447 1186 188 0.158516
36 A398789 2884 446 0.154646
18 A526219 3498 534 0.152659
38 A222345 2689 409 0.152101
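The result above includes shelves from other classes. For example, rows 5, 9 and 13 carry label 3 in `K_Means.labels_`.

The sketch below applies the rule exactly as the text states it, using raw values:

- restrict to one class;
- keep shelves whose loss share exceeds their sales share;
- keep shelves whose LR is above 15%.

The helper name `withdrawal_candidates` and the toy data are hypothetical. Totals are taken over all shelves, which is one reading of "total loss".

```python
import pandas as pd

def withdrawal_candidates(raw, labels, grade, lr_threshold=0.15):
    """Shelves in `grade` whose loss share exceeds their sales share and whose LR > lr_threshold.
    Shares are relative to totals over all shelves in `raw` (an assumption)."""
    df = raw.assign(grade=labels)                # labels attached by position
    loss_share = df.LOSE / df.LOSE.sum()
    sales_share = df.GMV / df.GMV.sum()
    mask = (df.grade == grade) & (loss_share > sales_share) & (df.LR > lr_threshold)
    return df[mask].sort_values(by='LR', ascending=False)

# Hypothetical toy data standing in for `datas` and `K_Means.labels_`
raw = pd.DataFrame({'ID': ['S1', 'S2', 'S3', 'S4', 'S5'],
                    'GMV': [1000, 800, 1200, 900, 3000],
                    'LOSE': [250, 100, 200, 90, 200]})
raw['LR'] = raw.LOSE / raw.GMV
labels = [0, 0, 0, 1, 0]
result = withdrawal_candidates(raw, labels, grade=0)
print(result.ID.tolist())
```

In the toy data, S2 passes the share test (LR 0.125 against an overall 0.122) but fails the 15% threshold, and S4 is in another class. On the real data the calls would be `withdrawal_candidates(datas, K_Means.labels_, grade=0)`, and `grade=1` for the second class.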

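From Class 2, extract shelves that meet both conditions below. These are the shelves to consider withdrawing.

- The shelf's share of total loss exceeds its share of total sales.
- The shelf's own loss is more than 15% of its sales.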

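# As written, this keeps shelves from every class whose standardized LR exceeds
# the largest standardized GMV in grade 1; it does not restrict to grade 1
index4 = list(data[data.LR > max(data[data.grade==1].GMV)].index)
# data was rebuilt with a 0..n-1 index, so these labels are row positions in datas
datas4 = datas.iloc[index4, :]
datas4[datas4.LR>0.15].sort_values(by=['LR'],ascending=False)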
 
  ID GMV LOSE LR
67 A131502 1303 420 0.322333
66 A515389 1310 422 0.322137
55 A505492 1731 444 0.256499
63 A549268 1542 391 0.253567
53 A275823 1766 431 0.244054
49 A266437 1945 449 0.230848
54 A570130 1764 402 0.227891
57 A431484 1691 375 0.221762
32 A301564 3045 643 0.211166
51 A407693 1911 403 0.210884
45 A405486 2102 435 0.206946
48 A720532 1972 403 0.204361
50 A446950 1934 379 0.195967
29 A365840 3091 604 0.195406
33 A621554 2990 570 0.190635
44 A298717 2370 444 0.187342
13 A520253 3690 680 0.184282
27 A630001 3171 570 0.179754
43 A398851 2385 425 0.178197
28 A433220 3158 558 0.176694
42 A162793 2512 442 0.175955
26 A470397 3197 554 0.173287
9 A323355 4101 693 0.168983
5 A402586 4229 680 0.160795
20 A510649 3341 537 0.160730
22 A668628 3279 527 0.160720
31 A550157 3076 493 0.160273
36 A398789 2884 446 0.154646
18 A526219 3498 534 0.152659
38 A222345 2689 409 0.152101

Reposted from: https://www.cnblogs.com/aioverg/p/11157272.html
