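League of Legends (LoL) is a MOBA (multiplayer online battle arena) in which two teams, blue and red, face off. The map has three lanes and a jungle, and each team fills five roles. The goal is to destroy the enemy Nexus to win the game.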
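Player -> kill minions and jungle monsters -> earn gold and buffs -> destroy enemy turrets -> destroy the enemy Nexus (victory)
Along the way: place wards to gain a vision advantage, kill enemy champions, and spend gold on items to grow stronger.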
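As online games have spread among young people, esports has become more and more popular. LoL (League of Legends), one of the flagship esports titles, attracts growing attention from young players.
Feasibility: The outcome of a match depends on many factors. Player mechanics and game sense are undeniably decisive, but a player's skill shows up in the in-game statistics. LoL is also a team game, so the statistics recorded once a match is well under way reflect both individual skill and team coordination fairly well, and can be used to predict the outcome with reasonable accuracy.
Goal: Analyzing LoL data lets us predict the outcome of a match and also identify the factors that influence it most, which can guide play in real matches.
Data source: the internet
**Dataset overview:** The dataset contains statistics from the first 10 minutes of roughly 10k ranked games at high ELO (Diamond I to Master), so the players are of roughly similar skill. For each game, 19 features per team (38 in total) were recorded at the 10-minute mark, including kills, deaths, gold, experience, level, and so on.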
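import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the data; gameId stays a regular column (it is dropped later), matching the output below
data = pd.read_csv('high_diamond_ranked_10min.csv')
print(data.head())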
gameId blueWins blueWardsPlaced blueWardsDestroyed blueFirstBlood \
0 4519157822 0 28 2 1
1 4523371949 0 12 1 0
2 4521474530 0 15 0 0
3 4524384067 0 43 1 0
4 4436033771 0 75 4 0
blueKills blueDeaths blueAssists blueEliteMonsters blueDragons \
0 9 6 11 0 0
1 5 5 5 0 0
2 7 11 4 1 1
3 4 5 5 1 0
4 6 6 6 0 0
blueHeralds blueTowersDestroyed blueTotalGold blueAvgLevel \
0 0 0 17210 6.6
1 0 0 14712 6.6
2 0 0 16113 6.4
3 1 0 15157 7.0
4 0 0 16400 7.0
blueTotalExperience blueTotalMinionsKilled blueTotalJungleMinionsKilled \
0 17039 195 36
1 16265 174 43
2 16221 186 46
3 17954 201 55
4 18543 210 57
blueGoldDiff blueExperienceDiff blueCSPerMin blueGoldPerMin \
0 643 -8 19.5 1721.0
1 -2908 -1173 17.4 1471.2
2 -1172 -1033 18.6 1611.3
3 -1321 -7 20.1 1515.7
4 -1004 230 21.0 1640.0
redWardsPlaced redWardsDestroyed redFirstBlood redKills redDeaths \
0 15 6 0 6 9
1 12 1 1 5 5
2 15 3 1 11 7
3 15 2 1 5 4
4 17 2 1 6 6
redAssists redEliteMonsters redDragons redHeralds redTowersDestroyed \
0 8 0 0 0 0
1 2 2 1 1 1
2 14 0 0 0 0
3 10 0 0 0 0
4 7 1 1 0 0
redTotalGold redAvgLevel redTotalExperience redTotalMinionsKilled \
0 16567 6.8 17047 197
1 17620 6.8 17438 240
2 17285 6.8 17254 203
3 16478 7.0 17961 235
4 17404 7.0 18313 225
redTotalJungleMinionsKilled redGoldDiff redExperienceDiff redCSPerMin \
0 55 -643 8 19.7
1 52 2908 1173 24.0
2 28 1172 1033 20.3
3 47 1321 7 23.5
4 67 1004 -230 22.5
redGoldPerMin
0 1656.7
1 1762.0
2 1728.5
3 1647.8
4 1740.4
print("Data shape:", data.shape)
Data shape: (9879, 40)  # 9,879 records (games), 40 columns each
print("Data overview:", data.describe())
Data overview: blueWins blueWardsPlaced blueWardsDestroyed blueFirstBlood \
count 9879.000000 9879.000000 9879.000000 9879.000000
mean 0.499038 22.288288 2.824881 0.504808
std 0.500024 18.019177 2.174998 0.500002
min 0.000000 5.000000 0.000000 0.000000
25% 0.000000 14.000000 1.000000 0.000000
50% 0.000000 16.000000 3.000000 1.000000
75% 1.000000 20.000000 4.000000 1.000000
max 1.000000 250.000000 27.000000 1.000000
blueKills blueDeaths blueAssists blueEliteMonsters blueDragons \
count 9879.000000 9879.000000 9879.000000 9879.000000 9879.000000
mean 6.183925 6.137666 6.645106 0.549954 0.361980
std 3.011028 2.933818 4.064520 0.625527 0.480597
min 0.000000 0.000000 0.000000 0.000000 0.000000
25% 4.000000 4.000000 4.000000 0.000000 0.000000
50% 6.000000 6.000000 6.000000 0.000000 0.000000
75% 8.000000 8.000000 9.000000 1.000000 1.000000
max 22.000000 22.000000 29.000000 2.000000 1.000000
blueHeralds blueTowersDestroyed blueTotalGold blueAvgLevel \
count 9879.000000 9879.000000 9879.000000 9879.000000
mean 0.187974 0.051422 16503.455512 6.916004
std 0.390712 0.244369 1535.446636 0.305146
min 0.000000 0.000000 10730.000000 4.600000
25% 0.000000 0.000000 15415.500000 6.800000
50% 0.000000 0.000000 16398.000000 7.000000
75% 0.000000 0.000000 17459.000000 7.200000
max 1.000000 4.000000 23701.000000 8.000000
blueTotalExperience blueTotalMinionsKilled \
count 9879.000000 9879.000000
mean 17928.110133 216.699565
std 1200.523764 21.858437
min 10098.000000 90.000000
25% 17168.000000 202.000000
50% 17951.000000 218.000000
75% 18724.000000 232.000000
max 22224.000000 283.000000
blueTotalJungleMinionsKilled blueGoldDiff blueExperienceDiff \
count 9879.000000 9879.000000 9879.000000
mean 50.509667 14.414111 -33.620306
std 9.898282 2453.349179 1920.370438
min 0.000000 -10830.000000 -9333.000000
25% 44.000000 -1585.500000 -1290.500000
50% 50.000000 14.000000 -28.000000
75% 56.000000 1596.000000 1212.000000
max 92.000000 11467.000000 8348.000000
blueCSPerMin blueGoldPerMin redWardsPlaced redWardsDestroyed \
count 9879.000000 9879.000000 9879.000000 9879.000000
mean 21.669956 1650.345551 22.367952 2.723150
std 2.185844 153.544664 18.457427 2.138356
min 9.000000 1073.000000 6.000000 0.000000
25% 20.200000 1541.550000 14.000000 1.000000
50% 21.800000 1639.800000 16.000000 2.000000
75% 23.200000 1745.900000 20.000000 4.000000
max 28.300000 2370.100000 276.000000 24.000000
redFirstBlood redKills redDeaths redAssists redEliteMonsters \
count 9879.000000 9879.000000 9879.000000 9879.000000 9879.000000
mean 0.495192 6.137666 6.183925 6.662112 0.573135
std 0.500002 2.933818 3.011028 4.060612 0.626482
min 0.000000 0.000000 0.000000 0.000000 0.000000
25% 0.000000 4.000000 4.000000 4.000000 0.000000
50% 0.000000 6.000000 6.000000 6.000000 0.000000
75% 1.000000 8.000000 8.000000 9.000000 1.000000
max 1.000000 22.000000 22.000000 28.000000 2.000000
redDragons redHeralds redTowersDestroyed redTotalGold \
count 9879.000000 9879.000000 9879.000000 9879.000000
mean 0.413098 0.160036 0.043021 16489.041401
std 0.492415 0.366658 0.216900 1490.888406
min 0.000000 0.000000 0.000000 11212.000000
25% 0.000000 0.000000 0.000000 15427.500000
50% 0.000000 0.000000 0.000000 16378.000000
75% 1.000000 0.000000 0.000000 17418.500000
max 1.000000 1.000000 2.000000 22732.000000
redAvgLevel redTotalExperience redTotalMinionsKilled \
count 9879.000000 9879.000000 9879.000000
mean 6.925316 17961.730438 217.349226
std 0.305311 1198.583912 21.911668
min 4.800000 10465.000000 107.000000
25% 6.800000 17209.500000 203.000000
50% 7.000000 17974.000000 218.000000
75% 7.200000 18764.500000 233.000000
max 8.200000 22269.000000 289.000000
redTotalJungleMinionsKilled redGoldDiff redExperienceDiff \
count 9879.000000 9879.000000 9879.000000
mean 51.313088 -14.414111 33.620306
std 10.027885 2453.349179 1920.370438
min 4.000000 -11467.000000 -8348.000000
25% 44.000000 -1596.000000 -1212.000000
50% 51.000000 -14.000000 28.000000
75% 57.000000 1585.500000 1290.500000
max 92.000000 10830.000000 9333.000000
redCSPerMin redGoldPerMin
count 9879.000000 9879.000000
mean 21.734923 1648.904140
std 2.191167 149.088841
min 10.700000 1121.200000
25% 20.300000 1542.750000
50% 21.800000 1637.800000
75% 23.300000 1741.850000
max 28.900000 2273.200000
pd.set_option('display.width', 10)  # max characters per console line; output wraps once a line is full
print("Column names:", data.columns)
Index(['gameId',  # unique ID of each game
#--------------------------------------------------------------------
'blueWins',  # whether blue won (1: win, 0: loss)  ***** dependent variable *****
#-------------------------------------------------------------------- 19 features
'blueWardsPlaced',  # wards placed on the map by the blue team
'blueWardsDestroyed',  # enemy wards destroyed by the blue team
'blueFirstBlood',  # whether blue got first blood, the first kill of the game (1: yes, 0: no)
'blueKills',  # enemies killed by the blue team
'blueDeaths',  # deaths on the blue team
'blueAssists',  # kill assists by the blue team
'blueEliteMonsters',  # elite monsters killed by the blue team (dragons and Rift Heralds)
'blueDragons',  # dragons killed by the blue team
'blueHeralds',  # Rift Heralds killed by the blue team
'blueTowersDestroyed',  # turrets destroyed by the blue team
'blueTotalGold',  # total gold earned by the blue team
'blueAvgLevel',  # average champion level of the blue team
'blueTotalExperience',  # total experience of the blue team
'blueTotalMinionsKilled',  # total minions killed by the blue team
'blueTotalJungleMinionsKilled',  # total jungle monsters killed by the blue team
'blueGoldDiff',  # blue gold minus red gold
'blueExperienceDiff',  # blue experience minus red experience
'blueCSPerMin',  # blue creep score (minions killed) per minute
'blueGoldPerMin',  # blue gold earned per minute
# red team: the same 19 features as blue ------------------------------
'redWardsPlaced',
'redWardsDestroyed',
'redFirstBlood',
'redKills',
'redDeaths',
'redAssists',
'redEliteMonsters',
'redDragons',
'redHeralds',
'redTowersDestroyed',
'redTotalGold',
'redAvgLevel',
'redTotalExperience',
'redTotalMinionsKilled',
'redTotalJungleMinionsKilled',
'redGoldDiff',
'redExperienceDiff',
'redCSPerMin',
'redGoldPerMin'],
dtype='object')
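data.dropna(axis=0, how='any', inplace=True)  # drop rows that contain any missing value
data.info()  # info() prints its summary itself and returns None, so it is not wrapped in print()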
<class 'pandas.core.frame.DataFrame'>
Int64Index: 9879 entries, 0 to 9878
Data columns (total 40 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 gameId 9879 non-null int64
1 blueWins 9879 non-null int64
2 blueWardsPlaced 9879 non-null int64
3 blueWardsDestroyed 9879 non-null int64
4 blueFirstBlood 9879 non-null int64
5 blueKills 9879 non-null int64
6 blueDeaths 9879 non-null int64
7 blueAssists 9879 non-null int64
8 blueEliteMonsters 9879 non-null int64
9 blueDragons 9879 non-null int64
10 blueHeralds 9879 non-null int64
11 blueTowersDestroyed 9879 non-null int64
12 blueTotalGold 9879 non-null int64
13 blueAvgLevel 9879 non-null float64
14 blueTotalExperience 9879 non-null int64
15 blueTotalMinionsKilled 9879 non-null int64
16 blueTotalJungleMinionsKilled 9879 non-null int64
17 blueGoldDiff 9879 non-null int64
18 blueExperienceDiff 9879 non-null int64
19 blueCSPerMin 9879 non-null float64
20 blueGoldPerMin 9879 non-null float64
21 redWardsPlaced 9879 non-null int64
22 redWardsDestroyed 9879 non-null int64
23 redFirstBlood 9879 non-null int64
24 redKills 9879 non-null int64
25 redDeaths 9879 non-null int64
26 redAssists 9879 non-null int64
27 redEliteMonsters 9879 non-null int64
28 redDragons 9879 non-null int64
29 redHeralds 9879 non-null int64
30 redTowersDestroyed 9879 non-null int64
31 redTotalGold 9879 non-null int64
32 redAvgLevel 9879 non-null float64
33 redTotalExperience 9879 non-null int64
34 redTotalMinionsKilled 9879 non-null int64
35 redTotalJungleMinionsKilled 9879 non-null int64
36 redGoldDiff 9879 non-null int64
37 redExperienceDiff 9879 non-null int64
38 redCSPerMin 9879 non-null float64
39 redGoldPerMin 9879 non-null float64
dtypes: float64(6), int64(34)
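gameId has no bearing on the outcome, so it is dropped:
data = data.drop(['gameId'], axis=1)
The correlation heatmap reveals highly correlated variables that describe the same thing. A column that carries the same information as another column does not help classification. Take redKills (kills by the red team) and blueDeaths (deaths on the blue team): red's kills are blue's deaths, so one of the two columns should be dropped. Note: in general redKills and blueDeaths need not be equal, because a player can also be killed by jungle monsters or turrets, but our data comes from high-ranked players, where this case is rare enough to ignore.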
plt.figure(figsize=(20,15))
sns.heatmap(round(data.corr(),1), cmap="coolwarm", annot=True, linewidths=.5)
plt.savefig('热力图相关性分析.jpg', bbox_inches='tight')
# data.corr(): computes the pairwise correlation coefficients between columns and returns the correlation matrix
# sns.heatmap(): draws the correlation heatmap with seaborn
Columns that are perfectly correlated with another column are dropped to simplify the analysis.
# Find pairs of perfectly correlated columns in a correlation matrix
# and return one column of each pair for removal
def remove_redundancy(r):
    to_remove = []
    for i in range(len(r.columns)):
        for j in range(i):
            # |r| >= 1 means the two columns carry the same information
            if abs(r.iloc[i, j]) >= 1 and r.columns[j] not in to_remove:
                print("Correlation:", r.iloc[i, j], r.columns[j], r.columns[i])
                to_remove.append(r.columns[i])
    return to_remove
clean_data = data.drop(remove_redundancy(data.corr()), axis=1)  # drop the redundant columns
Each of these pairs is essentially the same measurement, so one column of each pair is dropped:
Correlation: 1.000000000000002 blueTotalMinionsKilled blueCSPerMin
Correlation: 1.0000000000000013 blueTotalGold blueGoldPerMin
Correlation: -1.0 blueFirstBlood redFirstBlood
Correlation: 1.0 blueDeaths redKills
Correlation: 1.0 blueKills redDeaths
Correlation: -1.0 blueGoldDiff redGoldDiff
Correlation: -1.0 blueExperienceDiff redExperienceDiff
Correlation: 1.0000000000000042 redTotalMinionsKilled redCSPerMin
Correlation: 1.0000000000000049 redTotalGold redGoldPerMin
print("Data after initial cleaning:", clean_data.columns)
Data after initial cleaning: Index(['blueWins',
'blueWardsPlaced',
'blueWardsDestroyed',
'blueFirstBlood',
'blueKills',
'blueDeaths',
'blueAssists',
'blueEliteMonsters',
'blueDragons',
'blueHeralds',
'blueTowersDestroyed',
'blueTotalGold',
'blueAvgLevel',
'blueTotalExperience',
'blueTotalMinionsKilled',
'blueTotalJungleMinionsKilled',
'blueGoldDiff',
'blueExperienceDiff',
#----------------------------------------------------------
'redWardsPlaced',
'redWardsDestroyed',
'redAssists',
'redEliteMonsters',
'redDragons',
'redHeralds',
'redTowersDestroyed',
'redTotalGold',
'redAvgLevel',
'redTotalExperience',
'redTotalMinionsKilled',
'redTotalJungleMinionsKilled'],
dtype='object')
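The printed correlations such as 1.000000000000002 show that floating-point rounding is involved, so comparing against exactly 1 only works because the rounding happened to land on or above 1. A tolerance-based check is a slightly more robust variant. The following is a minimal, dependency-free sketch rather than the code used above: `pearson` and `is_redundant` are hypothetical helper names, the tolerance is an assumed value, and the sample values are the first five rows from the `data.head()` output.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def is_redundant(xs, ys, tol=1e-9):
    """True when |r| equals 1 up to a small tolerance (tol is an assumed value)."""
    return math.isclose(abs(pearson(xs, ys)), 1.0, abs_tol=tol)

# First five rows from the data.head() output
blue_total_minions = [195, 174, 186, 201, 210]
blue_cs_per_min = [19.5, 17.4, 18.6, 20.1, 21.0]
blue_gold_diff = [643, -2908, -1172, -1321, -1004]
red_gold_diff = [-643, 2908, 1172, 1321, 1004]
blue_kills = [9, 5, 7, 4, 6]
blue_deaths = [6, 5, 11, 5, 6]

print(is_redundant(blue_total_minions, blue_cs_per_min))  # CS per minute is minions / 10
print(is_redundant(blue_gold_diff, red_gold_diff))        # mirrored columns
print(is_redundant(blue_kills, blue_deaths))              # two different measurements
```

The first two pairs are flagged as redundant and the third is not, which mirrors the pairs reported by `remove_redundancy`.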
clean_data['blueMinionsTotales'] = clean_data['blueTotalMinionsKilled'] + clean_data['blueTotalJungleMinionsKilled']
clean_data['redMinionsTotales'] = clean_data['redTotalMinionsKilled'] + clean_data['redTotalJungleMinionsKilled']
clean_data=clean_data.drop(['blueTotalMinionsKilled'], axis=1)
clean_data=clean_data.drop(['blueTotalJungleMinionsKilled'], axis=1)
clean_data=clean_data.drop(['redTotalMinionsKilled'], axis=1)
clean_data=clean_data.drop(['redTotalJungleMinionsKilled'], axis=1)
The heatmap shows that level and experience are highly correlated, so we look at them more closely:
# Level vs. experience
plt.figure(figsize=(12,12))
plt.subplot(121)
sns.scatterplot(x='blueAvgLevel', y='blueTotalExperience', hue='blueWins', data=clean_data)
plt.title('blue')
plt.xlabel('blueAvgLevel')
plt.ylabel('blueTotalExperience')
plt.grid(True)
plt.subplot(122)
sns.scatterplot(x='redAvgLevel', y='redTotalExperience', hue='blueWins', data=clean_data)
plt.title('red')
plt.xlabel('redAvgLevel')
plt.ylabel('redTotalExperience')
plt.grid(True)
plt.savefig('等级和经验分析.jpg', bbox_inches='tight')
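Level and experience are linearly related and strongly correlated (see the heatmap), and average level differs little between teams, so the level columns are dropped.
# Drop the level columns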
clean_data=clean_data.drop(['blueAvgLevel'], axis=1)
clean_data=clean_data.drop(['redAvgLevel'], axis=1)
Data visualization:
sns.set(font_scale=1.5)
plt.figure(figsize=(20,20))
sns.set_style("whitegrid")
# Scatter plot of kills vs. deaths
plt.subplot(321)
sns.scatterplot(x='blueKills', y='blueDeaths', hue='blueWins', data=clean_data)
plt.title('blueKills&&blueDeaths')
plt.xlabel('blueKills')
plt.ylabel('blueDeaths')
plt.grid(True)
# Scatter plot of assists
plt.subplot(322)
sns.scatterplot(x='blueAssists', y='redAssists', hue='blueWins', data=clean_data)
plt.title('Assists')
plt.xlabel('blueAssists')
plt.ylabel('redAssists')
plt.tight_layout(pad=1.5)
plt.grid(True)
# Scatter plot of both teams' total gold
plt.subplot(323)
sns.scatterplot(x='blueTotalGold', y='redTotalGold', hue='blueWins', data=clean_data)
plt.title('TotalGold')
plt.xlabel('blueTotalGold')
plt.ylabel('redTotalGold')
plt.tight_layout(pad=1.5)
plt.grid(True)
# Scatter plot of both teams' total experience
plt.subplot(324)
sns.scatterplot(x='blueTotalExperience', y='redTotalExperience', hue='blueWins', data=clean_data)
plt.title('Experience')
plt.xlabel('blueTotalExperience')
plt.ylabel('redTotalExperience')
plt.tight_layout(pad=1.5)
plt.grid(True)
# Scatter plot of wards placed by both teams
plt.subplot(325)
sns.scatterplot(x='blueWardsPlaced', y='redWardsPlaced', hue='blueWins', data=clean_data)
plt.title('WardsPlaced')
plt.xlabel('blueWardsPlaced')
plt.ylabel('redWardsPlaced')
plt.tight_layout(pad=1.5)
plt.grid(True)
# Scatter plot of total minions and jungle monsters killed
plt.subplot(326)
sns.scatterplot(x='blueMinionsTotales', y='redMinionsTotales', hue='blueWins', data=clean_data)
plt.title('MinionsTotales')
plt.xlabel('blueMinionsTotales')
plt.ylabel('redMinionsTotales')
plt.tight_layout(pad=1.5)
plt.grid(True)
plt.savefig('数据分析.jpg', bbox_inches='tight')
# Convert some features into blue-minus-red differences:
clean_data['WardsPlacedDiff'] = clean_data['blueWardsPlaced'] - clean_data['redWardsPlaced']
clean_data['WardsDestroyedDiff'] = clean_data['blueWardsDestroyed'] - clean_data['redWardsDestroyed']
clean_data['AssistsDiff'] = clean_data['blueAssists'] - clean_data['redAssists']
clean_data['blueHeraldsDiff'] = clean_data['blueHeralds'] - clean_data['redHeralds']
clean_data['blueDragonsDiff'] = clean_data['blueDragons'] - clean_data['redDragons']
clean_data['blueTowersDestroyedDiff'] = clean_data['blueTowersDestroyed'] - clean_data['redTowersDestroyed']
clean_data['EliteMonstersDiff'] = clean_data['blueEliteMonsters'] - clean_data['redEliteMonsters']
clean_data=clean_data.drop(['blueWardsPlaced'], axis=1)
clean_data=clean_data.drop(['redWardsPlaced'], axis=1)
clean_data=clean_data.drop(['blueWardsDestroyed'], axis=1)
clean_data=clean_data.drop(['redWardsDestroyed'], axis=1)
clean_data=clean_data.drop(['blueAssists'], axis=1)
clean_data=clean_data.drop(['redAssists'], axis=1)
clean_data=clean_data.drop(['blueHeralds'], axis=1)
clean_data=clean_data.drop(['redHeralds'], axis=1)
clean_data=clean_data.drop(['blueTowersDestroyed'], axis=1)
clean_data=clean_data.drop(['redTowersDestroyed'], axis=1)
clean_data=clean_data.drop(['blueDragons'], axis=1)
clean_data=clean_data.drop(['redDragons'], axis=1)
clean_data=clean_data.drop(['blueEliteMonsters'], axis=1)
clean_data=clean_data.drop(['redEliteMonsters'], axis=1)
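clean_data=clean_data.drop(['redTotalGold'], axis=1)  # red's gold follows from blue's gold and the gold difference, so drop it
clean_data=clean_data.drop(['redTotalExperience'], axis=1)  # red's experience follows from blue's experience and the experience difference, so drop it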
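The claim that red's gold can be recovered from blue's gold and the gold difference is easy to verify. The following small sketch checks it on the first five rows of the `data.head()` output, using plain Python lists instead of the DataFrame; the variable names are chosen for illustration only.

```python
# First five rows from the data.head() output
blue_total_gold = [17210, 14712, 16113, 15157, 16400]
blue_gold_diff = [643, -2908, -1172, -1321, -1004]
red_total_gold = [16567, 17620, 17285, 16478, 17404]

# blueGoldDiff = blueTotalGold - redTotalGold, so red's gold can be rebuilt
reconstructed = [blue - diff for blue, diff in zip(blue_total_gold, blue_gold_diff)]
print(reconstructed == red_total_gold)
```

Because the dropped column is an exact function of two columns that are kept, removing it loses no information.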
Analysis of blueFirstBlood, blueDragonsDiff and EliteMonstersDiff:
# First blood, dragons and elite monsters
sns.catplot(x="blueWins", y="blueGoldDiff", hue="blueFirstBlood", data=clean_data)
plt.savefig('一血.jpg', bbox_inches='tight')
sns.catplot(x="blueWins", y="blueGoldDiff", hue="blueDragonsDiff", data=clean_data)
plt.savefig('龙.jpg', bbox_inches='tight')
sns.catplot(x="blueWins", y="blueGoldDiff", hue="EliteMonstersDiff", data=clean_data)
plt.savefig('精英怪物.jpg', bbox_inches='tight')
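Figure: first blood (一血.jpg)
Figure: difference in dragons killed (龙.jpg)
Figure: difference in elite monsters killed (精英怪物.jpg)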
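Final data: Index(['blueWins',  # whether blue won (1: win, 0: loss)  ***** dependent variable *****
'blueFirstBlood',  # whether blue got first blood, the first kill of the game (1: yes, 0: no)
'blueKills',  # enemies killed by the blue team
'blueDeaths',  # deaths on the blue team
'blueTotalGold',  # total gold earned by the blue team
'blueTotalExperience',  # total experience of the blue team
'blueGoldDiff',  # blue gold minus red gold
'blueExperienceDiff',  # blue experience minus red experience
'blueMinionsTotales',  # minions plus jungle monsters killed by the blue team
'redMinionsTotales',  # minions plus jungle monsters killed by the red team
'WardsPlacedDiff',  # difference in wards placed (blue minus red)
'WardsDestroyedDiff',  # difference in wards destroyed (blue minus red)
'AssistsDiff',  # difference in assists (blue minus red)
'blueHeraldsDiff',  # difference in Rift Heralds killed (blue minus red)
'blueDragonsDiff',  # difference in dragons killed (blue minus red)
'blueTowersDestroyedDiff',  # difference in turrets destroyed (blue minus red)
'EliteMonstersDiff'],  # difference in elite monsters killed, dragons and Rift Heralds (blue minus red)
dtype='object')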
blueWins blueFirstBlood blueKills blueDeaths blueTotalGold \
0 0 1 9 6 17210
1 0 0 5 5 14712
2 0 0 7 11 16113
3 0 0 4 5 15157
4 0 0 6 6 16400
... ... ... ... ...
9874 1 1 7 4 17765
9875 1 0 6 4 16238
9876 0 0 6 7 15903
9877 0 1 2 3 14459
9878 1 1 6 6 16266
blueTotalExperience blueGoldDiff blueExperienceDiff \
0 17039 643 -8
1 16265 -2908 -1173
2 16221 -1172 -1033
3 17954 -1321 -7
4 18543 -1004 230
... ... ...
9874 18967 2519 2469
9875 19255 782 888
9876 18032 -2416 -1877
9877 17229 -839 -1085
9878 17321 927 -58
blueMinionsTotales redMinionsTotales WardsPlacedDiff \
0 231 252 13
1 217 292 0
2 232 231 0
3 256 282 28
4 267 292 58
... ... ...
9874 280 263 -29
9875 281 262 42
9876 255 321 9
9877 272 287 -52
9878 251 247 9
WardsDestroyedDiff AssistsDiff blueHeraldsDiff blueDragonsDiff \
0 -4 3 0 0
1 0 3 -1 -1
2 -3 -10 0 1
3 -1 -5 1 0
4 2 -1 0 -1
... ... ... ...
9874 -1 -2 0 1
9875 -21 5 0 1
9876 1 -6 0 -1
9877 0 2 0 1
9878 -2 1 0 -1
blueTowersDestroyedDiff EliteMonstersDiff
0 0 0
1 -1 -2
2 0 1
3 0 1
4 0 -1
... ...
9874 0 1
9875 0 1
9876 0 -1
9877 0 1
9878 0 -1
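Logistic regression (also called logistic regression analysis) is a generalized linear model that is widely used in data mining, automated disease diagnosis, economic forecasting and similar fields, for example to study the risk factors of a disease and to predict the probability of the disease from those factors. Take stomach cancer: we select two groups of people, one with stomach cancer and one without, who will differ in physical signs and lifestyle. The dependent variable is whether a person has stomach cancer ("yes" or "no"), and the independent variables can include age, sex, dietary habits, Helicobacter pylori infection and so on. Independent variables may be continuous or categorical. Logistic regression yields a weight for each independent variable, which gives a rough idea of which factors are risk factors, and the weights can then be used to predict how likely a person is to develop the disease.
In predicting LoL wins there are many independent variables but only one dependent variable, blueWins (whether blue wins the game), which takes the value "yes" or "no". Logistic regression is therefore a suitable model.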
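To make the model concrete: logistic regression passes a weighted sum of the features through the sigmoid function to obtain a probability between 0 and 1. The sketch below shows only this prediction step; the helper names, the two weights and the intercept are hypothetical values chosen for illustration, not the fitted model from this analysis.

```python
import math

def sigmoid(z):
    """Map any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(features, weights, intercept):
    """P(y = 1 | features) under a logistic regression model."""
    z = intercept + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

# Hypothetical weights for two standardized features
weights = [1.2, 0.5]
intercept = 0.0

p_even = predict_proba([0.0, 0.0], weights, intercept)   # both features at their mean
p_ahead = predict_proba([1.0, 1.0], weights, intercept)  # both one unit above the mean
print(p_even, p_ahead)
```

With a zero intercept and all features at zero the predicted probability is exactly 0.5; positive weights push it toward 1 as the features grow.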
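Overview: Feature scaling rescales data so that it falls into a small, specific range. It is often used when comparing or weighting indicators: it removes the units from the data and turns each value into a dimensionless number, so that indicators with different units or magnitudes become comparable. Two common variants are normalization (min-max scaling), which maps the data onto [0, 1], and standardization, which is used here. Standardization is a common requirement for many machine learning estimators: they may behave badly if the individual features do not look more or less like standard normally distributed data, that is, Gaussian with zero mean and unit variance.
Standardization works column by column: subtract the column's mean from each value, then divide by the column's standard deviation (not its variance). Afterwards every column is centered on 0 and has a variance of 1.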
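The standardization step can be sketched without any library beyond the standard one. `standardize` below is a hypothetical helper, not the scaler used in this analysis. The second half cross-checks one value against the full dataset using the summary statistics printed by `data.describe()` for blueKills (count 9879, mean 6.183925, sample standard deviation 3.011028), under the assumption that the scaler divides by the population standard deviation (ddof=0) while `describe()` reports the sample one (ddof=1).

```python
import math
import statistics

def standardize(values):
    """Return z-scores: (x - mean) / population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(x - mean) / std for x in values]

# blueKills for the first five games in the data.head() output
z = standardize([9, 5, 7, 4, 6])
print(statistics.fmean(z), statistics.pstdev(z))

# Cross-check against the full dataset, using the figures from data.describe()
n, mean, sample_std = 9879, 6.183925, 3.011028
population_std = sample_std * math.sqrt((n - 1) / n)  # convert ddof=1 to ddof=0
z_first_game = (9 - mean) / population_std             # the first game has 9 blue kills
print(round(z_first_game, 4))
```

The cross-checked value agrees, to about four decimal places, with the standardized blueKills printed for the first game (0.935301).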
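from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.preprocessing import StandardScaler

# Split features and target. These two definitions are not shown in the original;
# they are inferred from the output shapes (16 feature columns, 1 target column).
unscaled_inputs = clean_data.drop(['blueWins'], axis=1)
target = clean_data[['blueWins']]

# Custom scaler that standardizes only the selected columns
class CustomScaler(BaseEstimator, TransformerMixin):
    def __init__(self, columns, copy=True, with_mean=True, with_std=True):
        # Pass the options by keyword; recent scikit-learn versions reject positional ones
        self.scaler = StandardScaler(copy=copy, with_mean=with_mean, with_std=with_std)
        self.columns = columns
        self.mean_ = None
        self.var_ = None

    # Fit the underlying StandardScaler on the selected columns
    def fit(self, X, y=None):
        self.scaler.fit(X[self.columns], y)
        self.mean_ = X[self.columns].mean()
        self.var_ = X[self.columns].var(ddof=0)
        return self

    # Scale the selected columns and leave the others untouched
    def transform(self, X, y=None, copy=None):
        init_col_order = X.columns  # remember the original column order
        # Keep X's index so the concat below aligns rows correctly
        X_scaled = pd.DataFrame(self.scaler.transform(X[self.columns]),
                                columns=self.columns, index=X.index)
        X_not_scaled = X.loc[:, ~X.columns.isin(self.columns)]
        # Recombine scaled and unscaled columns in the original order
        return pd.concat([X_not_scaled, X_scaled], axis=1)[init_col_order]

# Columns to leave unscaled
columns_to_omit = ['blueFirstBlood']  # first blood is a categorical (0/1) variable
# Every other column is scaled
columns_to_scale = [x for x in unscaled_inputs.columns.values if x not in columns_to_omit]
blue_scaler = CustomScaler(columns_to_scale)
blue_scaler.fit(unscaled_inputs)
scaled_inputs = blue_scaler.transform(unscaled_inputs)
pd.set_option('display.width', 80)  # max characters per console line; output wraps once a line is full
print("Standardized data:", scaled_inputs)
Standardized data:blueFirstBlood blueKills blueDeaths blueTotalGold \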
0 1 0.935301 -0.046926 0.460179
1 0 -0.393216 -0.387796 -1.166792
2 0 0.271042 1.657424 -0.254307
3 0 -0.725346 -0.387796 -0.876959
4 0 -0.061087 -0.046926 -0.067382
... ... ... ...
9874 1 0.271042 -0.728666 0.821656
9875 0 -0.061087 -0.728666 -0.172894
9876 0 -0.061087 0.293944 -0.391082
9877 1 -1.389604 -1.069536 -1.331573
9878 1 -0.061087 -0.046926 -0.154657
blueTotalExperience blueGoldDiff blueExperienceDiff \
0 -0.740639 0.256228 0.013342
1 -1.385391 -1.191254 -0.593342
2 -1.422043 -0.483614 -0.520436
3 0.021567 -0.544350 0.013863
4 0.512211 -0.415133 0.137283
... ... ...
9874 0.865408 1.020936 1.303263
9875 1.105315 0.312888 0.479942
9876 0.086541 -0.990702 -0.959957
9877 -0.582367 -0.347874 -0.547516
9878 -0.505730 0.371994 -0.012696
blueMinionsTotales redMinionsTotales WardsPlacedDiff \
0 -1.419968 -0.651842 0.503853
1 -1.968987 0.912988 0.003069
2 -1.380753 -1.473378 0.003069
3 -0.439577 0.521780 1.081682
4 -0.008205 0.912988 2.237338
... ... ...
9874 0.501598 -0.221514 -1.114066
9875 0.540814 -0.260635 1.620988
9876 -0.478793 2.047489 0.349766
9877 0.187873 0.717384 -2.000069
9878 -0.635655 -0.847446 0.349766
WardsDestroyedDiff AssistsDiff blueHeraldsDiff blueDragonsDiff \
0 -1.436801 0.523196 -0.047412 0.058162
1 -0.035635 0.523196 -1.744448 -1.079624
2 -1.086510 -1.731206 -0.047412 1.195948
3 -0.385927 -0.864129 1.649624 0.058162
4 0.664947 -0.170466 -0.047412 -1.079624
... ... ... ...
9874 -0.385927 -0.343882 -0.047412 1.195948
9875 -7.391756 0.870027 -0.047412 1.195948
9876 0.314656 -1.037544 -0.047412 -1.079624
9877 -0.035635 0.349780 -0.047412 1.195948
9878 -0.736218 0.176365 -0.047412 -1.079624
blueTowersDestroyedDiff EliteMonstersDiff
0 -0.025866 0.021707
1 -3.104510 -1.851163
2 -0.025866 0.958142
3 -0.025866 0.958142
4 -0.025866 -0.914728
... ...
9874 -0.025866 0.958142
9875 -0.025866 0.958142
9876 -0.025866 -0.914728
9877 -0.025866 0.958142
9878 -0.025866 -0.914728
Any FutureWarning printed at this point does not affect the results and can be ignored.
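from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Split into training (80%) and test (20%) sets
x_train, x_test, y_train, y_test = train_test_split(scaled_inputs, target, train_size=0.8, random_state=2)
print("Training data:", x_train.shape, y_train.shape, "Test data:", x_test.shape, y_test.shape)
Training data: (7903, 16) (7903, 1) Test data: (1976, 16) (1976, 1)
# Train the model
reg = LogisticRegression()
reg.fit(x_train, y_train.values.ravel())  # ravel() passes y as the 1-D array scikit-learn expects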
# Build a summary table of the variables with their coefficients and odds ratios
variables = unscaled_inputs.columns.values
summary_table = pd.DataFrame(columns=['Variables'], data=variables)
summary_table['Coef'] = np.transpose(reg.coef_)
# Add the intercept at index 0
summary_table.index = summary_table.index + 1
summary_table.loc[0] = ['Intercept', reg.intercept_[0]]
# Compute the odds ratio, exp(coefficient), and add it to the table
summary_table['Odds Ratio'] = np.exp(summary_table.Coef)
print("Model coefficients:", summary_table.sort_values(by=['Odds Ratio'], ascending=False))
Variable summary:
Model coefficients: Variables Coef Odds Ratio
6 blueGoldDiff 1.211278 3.357772
7 blueExperienceDiff 0.473859 1.606180
14 blueDragonsDiff 0.181283 1.198754
9 redMinionsTotales 0.156352 1.169237
16 EliteMonstersDiff 0.143389 1.154178
3 blueDeaths 0.071080 1.073667
4 blueTotalGold 0.064979 1.067136
1 blueFirstBlood 0.062815 1.064830
11 WardsDestroyedDiff 0.031517 1.032019
10 WardsPlacedDiff 0.002818 1.002822
5 blueTotalExperience -0.002409 0.997594
0 Intercept -0.029890 0.970552
13 blueHeraldsDiff -0.045495 0.955524
8 blueMinionsTotales -0.079767 0.923332
12 AssistsDiff -0.092587 0.911570
15 blueTowersDestroyedDiff -0.110353 0.895518
2 blueKills -0.114022 0.892239
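# Evaluate the model
print("Training accuracy:", reg.score(x_train, y_train))
print("Test accuracy:", reg.score(x_test, y_test))
# Write the predicted win probability back into the original dataset
predicted_prob = reg.predict_proba(x_test)  # class probabilities on the test set
data['predicted'] = reg.predict_proba(scaled_inputs)[:, 1]  # column 1 is P(blueWins = 1)
print("Full dataset with predictions:", data)
# Compare actual outcomes with predicted win probabilities
col_n = ['blueWins', 'predicted']
a = pd.DataFrame(data, columns=col_n)
print("Actual outcome vs. predicted win probability:", a)
Training accuracy: 0.7327597115019613
Test accuracy: 0.7358299595141701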
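The training and test scores are very close, which suggests that the model is not overfitting and fits reasonably well.
In LoL and other MOBAs, the outcome of a game depends on a great many factors. These team games test the players' mechanics, game sense and coordination, and because the participants are human, many unknown and unpredictable factors come into play, such as a player's mood, form, or even network conditions.
Predictions for this kind of game can therefore never be 100% accurate, and the fit of the model above is acceptable.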
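One way to put the accuracy into perspective is to compare it with a trivial baseline that always predicts the more common outcome. The short sketch below uses only figures that appear in the output of this analysis (the mean of blueWins from `data.describe()` and the test accuracy); the variable names are chosen for illustration.

```python
# Figures taken from the output of this analysis
blue_win_rate = 0.499038            # mean of blueWins from data.describe()
test_accuracy = 0.7358299595141701  # test-set score of the logistic regression

# Accuracy of always predicting the more common outcome
baseline_accuracy = max(blue_win_rate, 1 - blue_win_rate)
improvement = test_accuracy - baseline_accuracy
print(f"baseline: {baseline_accuracy:.4f}, model: {test_accuracy:.4f}, gain: {improvement:.4f}")
```

Because wins and losses are almost perfectly balanced, the baseline sits at about 50%, so the model's gain comes entirely from the 10-minute statistics.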
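Conclusions and patterns:
The analysis shows that first blood, dragons, minions and jungle monsters all contribute to a team's gold. At the 10-minute mark, the factor with the largest effect on the outcome is blueGoldDiff (the gold difference): when it increases by one standardized unit (one standard deviation), the odds of a blue win are multiplied by about 3.36, an increase of roughly 236%. The experience difference blueExperienceDiff also matters a lot: an increase of one standardized unit raises the odds of winning by about 60.6%. Note that these figures describe changes in the odds, not in the win probability itself.
An increase of one standardized unit in blueDragonsDiff (roughly one dragon) raises the odds of winning by about 20%, and a lead in EliteMonstersDiff (elite monsters) helps noticeably as well. The results also contain some anomalies. The more minions and jungle monsters the opponent kills (redMinionsTotales) and the fewer we kill (blueMinionsTotales), the higher our predicted chance of winning; my guess is that the stronger team prefers early team fights as its way of building a gold lead. Likewise, blueKills is negatively associated with winning; my guess is that this is a turret-pushing game and teams with many kills may focus on kills while neglecting turrets. A caveat applies to both guesses: each coefficient is estimated with the other features held fixed, and kills, total gold and the gold difference are strongly correlated, so the sign of an individual coefficient should be interpreted with caution.
In practice, then, focus on building a gold lead over the opponent: the larger the lead, the higher the chance of winning.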
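An odds ratio multiplies the odds of winning, which is not the same as adding to the win probability. The sketch below converts the blueGoldDiff coefficient from the summary table into an odds ratio and then into a probability. The helper names are hypothetical, and the starting point of an even game (p = 0.5) is an assumption made for illustration.

```python
import math

def odds_to_prob(odds):
    """Convert odds (p / (1 - p)) back to a probability."""
    return odds / (1.0 + odds)

def prob_after_shift(p_start, coef, delta=1.0):
    """Win probability after a feature moves by `delta` standardized units."""
    odds = p_start / (1.0 - p_start) * math.exp(coef * delta)
    return odds_to_prob(odds)

gold_diff_coef = 1.211278  # blueGoldDiff coefficient from the summary table
odds_ratio = math.exp(gold_diff_coef)
p_new = prob_after_shift(0.5, gold_diff_coef)  # assumed starting point: an even game
print(round(odds_ratio, 3), round(p_new, 3))
```

Starting from an even game, a gold lead of one standard deviation moves the predicted win probability from 50% to roughly 77%, with all other features held fixed.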
Original dataset with predicted win probabilities:
Full dataset with predictions: blueWins blueWardsPlaced blueWardsDestroyed blueFirstBlood \
0 0 28 2 1
1 0 12 1 0
2 0 15 0 0
3 0 43 1 0
4 0 75 4 0
... ... ... ...
9874 1 17 2 1
9875 1 54 0 0
9876 0 23 1 0
9877 0 14 4 1
9878 1 18 0 1
blueKills blueDeaths blueAssists blueEliteMonsters blueDragons \
0 9 6 11 0 0
1 5 5 5 0 0
2 7 11 4 1 1
3 4 5 5 1 0
4 6 6 6 0 0
... ... ... ... ...
9874 7 4 5 1 1
9875 6 4 8 1 1
9876 6 7 5 0 0
9877 2 3 3 1 1
9878 6 6 5 0 0
blueHeralds blueTowersDestroyed blueTotalGold blueAvgLevel \
0 0 0 17210 6.6
1 0 0 14712 6.6
2 0 0 16113 6.4
3 1 0 15157 7.0
4 0 0 16400 7.0
... ... ... ...
9874 0 0 17765 7.2
9875 0 0 16238 7.2
9876 0 0 15903 7.0
9877 0 0 14459 6.6
9878 0 0 16266 7.0
blueTotalExperience blueTotalMinionsKilled \
0 17039 195
1 16265 174
2 16221 186
3 17954 201
4 18543 210
... ...
9874 18967 211
9875 19255 233
9876 18032 210
9877 17229 224
9878 17321 207
blueTotalJungleMinionsKilled blueGoldDiff blueExperienceDiff \
0 36 643 -8
1 43 -2908 -1173
2 46 -1172 -1033
3 55 -1321 -7
4 57 -1004 230
... ... ...
9874 69 2519 2469
9875 48 782 888
9876 45 -2416 -1877
9877 48 -839 -1085
9878 44 927 -58
blueCSPerMin blueGoldPerMin redWardsPlaced redWardsDestroyed \
0 19.5 1721.0 15 6
1 17.4 1471.2 12 1
2 18.6 1611.3 15 3
3 20.1 1515.7 15 2
4 21.0 1640.0 17 2
... ... ... ...
9874 21.1 1776.5 46 3
9875 23.3 1623.8 12 21
9876 21.0 1590.3 14 0
9877 22.4 1445.9 66 4
9878 20.7 1626.6 9 2
redFirstBlood redKills redDeaths redAssists redEliteMonsters \
0 0 6 9 8 0
1 1 5 5 2 2
2 1 11 7 14 0
3 1 5 4 10 0
4 1 6 6 7 1
... ... ... ... ...
9874 0 4 7 7 0
9875 1 4 6 3 0
9876 1 7 6 11 1
9877 0 3 2 1 0
9878 0 6 6 4 1
redDragons redHeralds redTowersDestroyed redTotalGold redAvgLevel \
0 0 0 0 16567 6.8
1 1 1 1 17620 6.8
2 0 0 0 17285 6.8
3 0 0 0 16478 7.0
4 1 0 0 17404 7.0
... ... ... ... ...
9874 0 0 0 15246 6.8
9875 0 0 0 15456 7.0
9876 1 0 0 18319 7.4
9877 0 0 0 15298 7.2
9878 1 0 0 15339 6.8
redTotalExperience redTotalMinionsKilled redTotalJungleMinionsKilled \
0 17047 197 55
1 17438 240 52
2 17254 203 28
3 17961 235 47
4 18313 225 67
... ... ...
9874 16498 229 34
9875 18367 206 56
9876 19909 261 60
9877 18314 247 40
9878 17379 201 46
redGoldDiff redExperienceDiff redCSPerMin redGoldPerMin predicted
0 -643 8 19.7 1656.7 0.549459
1 2908 1173 24.0 1762.0 0.170271
2 1172 1033 20.3 1728.5 0.387228
3 1321 7 23.5 1647.8 0.393685
4 1004 -230 22.5 1740.4 0.356486
... ... ... ... ...
9874 -2519 -2469 22.9 1524.6 0.892957
9875 -782 -888 20.6 1545.6 0.610317
9876 2416 1877 26.1 1831.9 0.178771
9877 839 1085 24.7 1529.8 0.433183
9878 -927 58 20.1 1533.9 0.511128
Actual outcome vs. predicted win probability:
blueWins predicted
0 0 0.549459
1 0 0.170271
2 0 0.387228
3 0 0.393685
4 0 0.356486
... ...
9874 1 0.892957
9875 1 0.610317
9876 0 0.178771
9877 0 0.433183
9878 1 0.511128