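The dataset for this project comes from Kaggle, which does not specify a particular task. The domestic games industry is growing rapidly, so this analysis breaks the sales data down by region, publisher, genre and other factors, and derives recommendations that could help raise video game sales. The questions addressed are:
(1) How the video game industry has developed in recent years;
(2) Market analysis: popular games, genres, platforms, publishers, etc.;
(3) Which genres the top publishers dominate;
(4) [Advanced] Forecast annual video game sales.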
# Import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Read the data
data = pd.read_csv('vgsales.csv')
print(f'{len(data)} records in the dataset')
data.head(5)
16598 records in the dataset
| | Rank | Name | Platform | Year | Genre | Publisher | NA_Sales | EU_Sales | JP_Sales | Other_Sales | Global_Sales |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Wii Sports | Wii | 2006.0 | Sports | Nintendo | 41.49 | 29.02 | 3.77 | 8.46 | 82.74 |
| 1 | 2 | Super Mario Bros. | NES | 1985.0 | Platform | Nintendo | 29.08 | 3.58 | 6.81 | 0.77 | 40.24 |
| 2 | 3 | Mario Kart Wii | Wii | 2008.0 | Racing | Nintendo | 15.85 | 12.88 | 3.79 | 3.31 | 35.82 |
| 3 | 4 | Wii Sports Resort | Wii | 2009.0 | Sports | Nintendo | 15.75 | 11.01 | 3.28 | 2.96 | 33.00 |
| 4 | 5 | Pokemon Red/Pokemon Blue | GB | 1996.0 | Role-Playing | Nintendo | 11.27 | 8.89 | 10.22 | 1.00 | 31.37 |
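From the first five rows we can see that the records are ordered by Rank, i.e. by global sales, and that the five best-selling games are all Nintendo titles.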
# Inspect column types and non-null counts
data.info()
RangeIndex: 16598 entries, 0 to 16597
Data columns (total 11 columns):
Rank 16598 non-null int64
Name 16598 non-null object
Platform 16598 non-null object
Year 16327 non-null float64
Genre 16598 non-null object
Publisher 16540 non-null object
NA_Sales 16598 non-null float64
EU_Sales 16598 non-null float64
JP_Sales 16598 non-null float64
Other_Sales 16598 non-null float64
Global_Sales 16598 non-null float64
dtypes: float64(6), int64(1), object(4)
memory usage: 1.4+ MB
data.isnull().sum()
Rank 0
Name 0
Platform 0
Year 271
Genre 0
Publisher 58
NA_Sales 0
EU_Sales 0
JP_Sales 0
Other_Sales 0
Global_Sales 0
dtype: int64
# Drop rows with any missing value (Year or Publisher)
data.dropna(inplace=True)
# Reset the row index after dropping
data.reset_index(drop=True,inplace=True)
data.head(10)
| | Rank | Name | Platform | Year | Genre | Publisher | NA_Sales | EU_Sales | JP_Sales | Other_Sales | Global_Sales |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Wii Sports | Wii | 2006.0 | Sports | Nintendo | 41.49 | 29.02 | 3.77 | 8.46 | 82.74 |
| 1 | 2 | Super Mario Bros. | NES | 1985.0 | Platform | Nintendo | 29.08 | 3.58 | 6.81 | 0.77 | 40.24 |
| 2 | 3 | Mario Kart Wii | Wii | 2008.0 | Racing | Nintendo | 15.85 | 12.88 | 3.79 | 3.31 | 35.82 |
| 3 | 4 | Wii Sports Resort | Wii | 2009.0 | Sports | Nintendo | 15.75 | 11.01 | 3.28 | 2.96 | 33.00 |
| 4 | 5 | Pokemon Red/Pokemon Blue | GB | 1996.0 | Role-Playing | Nintendo | 11.27 | 8.89 | 10.22 | 1.00 | 31.37 |
| 5 | 6 | Tetris | GB | 1989.0 | Puzzle | Nintendo | 23.20 | 2.26 | 4.22 | 0.58 | 30.26 |
| 6 | 7 | New Super Mario Bros. | DS | 2006.0 | Platform | Nintendo | 11.38 | 9.23 | 6.50 | 2.90 | 30.01 |
| 7 | 8 | Wii Play | Wii | 2006.0 | Misc | Nintendo | 14.03 | 9.20 | 2.93 | 2.85 | 29.02 |
| 8 | 9 | New Super Mario Bros. Wii | Wii | 2009.0 | Platform | Nintendo | 14.59 | 7.06 | 4.70 | 2.26 | 28.62 |
| 9 | 10 | Duck Hunt | NES | 1984.0 | Shooter | Nintendo | 26.93 | 0.63 | 0.28 | 0.47 | 28.31 |
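`dropna()` uses `how='any'` by default, so a row is removed when any of its columns is missing. The missing counts above (271 for Year, 58 for Publisher) add up to 329, but only 16598 − 16291 = 307 rows disappear, which means 22 rows lack both values. The sketch below uses a made-up five-row frame (not data from `vgsales.csv`) to show how to count the dropped rows and the overlap before choosing between dropping and filling.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values) with gaps in Year and Publisher
toy = pd.DataFrame({
    'Name': ['A', 'B', 'C', 'D', 'E'],
    'Year': [2006.0, np.nan, 2008.0, np.nan, 2010.0],
    'Publisher': ['Nintendo', 'Ubisoft', None, None, 'Activision'],
})

missing = toy[['Year', 'Publisher']].isnull()
# Rows missing both columns are counted twice by isnull().sum()
both_missing = int(missing.all(axis=1).sum())

rows_before = len(toy)
cleaned = toy.dropna().reset_index(drop=True)  # how='any' is the default
rows_dropped = rows_before - len(cleaned)

print(missing.sum())              # missing values per column
print(rows_dropped, both_missing)  # rows lost, rows missing both
```

The identity `rows_dropped == missing.sum().sum() - both_missing` is the same arithmetic as the 329 − 22 = 307 check on the real data.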
# Descriptive statistics
data.describe()
| | Rank | Year | NA_Sales | EU_Sales | JP_Sales | Other_Sales | Global_Sales |
|---|---|---|---|---|---|---|---|
| count | 16291.000000 | 16291.000000 | 16291.000000 | 16291.000000 | 16291.000000 | 16291.000000 | 16291.000000 |
| mean | 8290.190228 | 2006.405561 | 0.265647 | 0.147731 | 0.078833 | 0.048426 | 0.540910 |
| std | 4792.654450 | 5.832412 | 0.822432 | 0.509303 | 0.311879 | 0.190083 | 1.567345 |
| min | 1.000000 | 1980.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.010000 |
| 25% | 4132.500000 | 2003.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.060000 |
| 50% | 8292.000000 | 2007.000000 | 0.080000 | 0.020000 | 0.000000 | 0.010000 | 0.170000 |
| 75% | 12439.500000 | 2010.000000 | 0.240000 | 0.110000 | 0.040000 | 0.040000 | 0.480000 |
| max | 16600.000000 | 2020.000000 | 41.490000 | 29.020000 | 10.220000 | 10.570000 | 82.740000 |
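The statistics above are strongly right-skewed: mean Global_Sales (0.54) is about three times the median (0.17), and the maximum is 82.74. One simple way to quantify this is the share of total sales contributed by the top-selling rows. The sketch uses made-up values, and the 10% cut-off is an arbitrary assumption.

```python
import pandas as pd

# Hypothetical Global_Sales values: many small sellers and one big hit
sales = pd.Series([0.05, 0.1, 0.1, 0.2, 0.2, 0.3, 0.4, 0.5, 0.8, 17.35])

# Fraction of rows treated as "top sellers" (assumption: top 10%)
top_frac = 0.10
n_top = max(1, int(len(sales) * top_frac))
top_share = sales.nlargest(n_top).sum() / sales.sum()

print(f'mean={sales.mean():.2f}, median={sales.median():.2f}, '
      f'top {top_frac:.0%} share={top_share:.1%}')
```

Running the same lines on `data['Global_Sales']` would show how concentrated sales are in this dataset.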
# Summarize the object (string) columns
data.describe(include='O')
| | Name | Platform | Genre | Publisher |
|---|---|---|---|---|
| count | 16291 | 16291 | 16291 | 16291 |
| unique | 11325 | 31 | 12 | 576 |
| top | Need for Speed: Most Wanted | DS | Action | Electronic Arts |
| freq | 12 | 2131 | 3251 | 1339 |
So the data contains 11,325 distinct game titles, 31 platforms, 12 genres and 576 publishers.
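There are 11,325 unique names but 16,291 rows, and the most frequent name, Need for Speed: Most Wanted, occurs 12 times. So many titles appear more than once, mostly because each platform version is stored as its own row. A minimal sketch with made-up rows (`Game A`, `Game B` are placeholders) counts multi-platform titles and rolls sales up to one row per title.

```python
import pandas as pd

# Hypothetical rows: 'Game A' was released on two platforms
toy = pd.DataFrame({
    'Name': ['Game A', 'Game A', 'Game B'],
    'Platform': ['PS2', 'X360', 'GB'],
    'Global_Sales': [1.5, 0.5, 3.0],
})

# Number of distinct platforms each title appears on
platforms_per_title = toy.groupby('Name')['Platform'].nunique()
multi_platform = platforms_per_title[platforms_per_title > 1].index.tolist()

# Total sales per title, summed over all of its platform versions
title_sales = toy.groupby('Name')['Global_Sales'].sum().sort_values(ascending=False)

print(toy['Name'].nunique(), len(toy))
print(multi_platform)
print(title_sales)
```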
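# Number of games released per year
# (pass figsize to .plot(); a separate plt.figure() call would only create an empty figure)
data.groupby('Year')['Name'].count().plot(figsize=(10,5), title='Number of games released per year')
# Regional and global sales per year (in millions)
sales_cols = ['NA_Sales','EU_Sales','JP_Sales','Other_Sales','Global_Sales']
data.groupby('Year')[sales_cols].sum().plot(figsize=(10,10), title='Video game sales per year (millions)')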
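Question (1) and the conclusion that sales peaked "around 2009" rest on these per-year aggregates. Rather than reading the peak off the chart, a sketch on made-up rows computes release counts and total sales per year in one `groupby` using named aggregation, then finds both peaks with `idxmax()`. As the toy values show, the two peaks need not fall in the same year.

```python
import pandas as pd

# Hypothetical per-game rows
toy = pd.DataFrame({
    'Year': [2007.0, 2008.0, 2008.0, 2009.0, 2009.0, 2009.0, 2010.0],
    'Name': list('ABCDEFG'),
    'Global_Sales': [3.0, 2.0, 4.5, 1.0, 1.5, 2.0, 2.5],
})

# One row per year: number of releases and total global sales
per_year = toy.groupby('Year').agg(
    releases=('Name', 'count'),
    global_sales=('Global_Sales', 'sum'),
)

# Year with the most releases vs. year with the highest total sales
peak_releases_year = per_year['releases'].idxmax()
peak_sales_year = per_year['global_sales'].idxmax()
print(per_year)
print(peak_releases_year, peak_sales_year)
```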
# Rank categories with pandas pivot tables: pivot_table()
# Genres by total global sales
Genre_df = data.pivot_table(index='Genre',values='Global_Sales',aggfunc='sum').sort_values('Global_Sales',ascending=False)
# Platforms by total global sales
Platform_df = data.pivot_table(index='Platform',values='Global_Sales',aggfunc='sum').sort_values('Global_Sales',ascending=False)
# Publishers by total global sales, top 15
Publisher_df = data.pivot_table(index='Publisher',values='Global_Sales',aggfunc='sum').sort_values('Global_Sales',ascending=False).iloc[0:15]
fig,(ax1,ax2,ax3) = plt.subplots(3,1,figsize=(12,20)) # unpack the three Axes of the 3x1 grid
Genre_df.plot.bar(color='r',ax=ax1)
Platform_df.plot.bar(ax=ax2)
Publisher_df.plot.bar(ax=ax3)
plt.tight_layout()
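With a single index and a single value column, `pivot_table(..., aggfunc='sum')` gives the same totals as `groupby(...).sum()`. The sketch below (toy rows, not real sales) checks this and uses `nlargest` as a shorter equivalent of `.sort_values(...).iloc[0:N]`, which is how the top-15 publishers are taken above.

```python
import pandas as pd

# Hypothetical sales rows
toy = pd.DataFrame({
    'Publisher': ['Nintendo', 'Ubisoft', 'Nintendo', 'Activision', 'Ubisoft'],
    'Global_Sales': [5.0, 1.0, 3.0, 2.5, 0.5],
})

# Same totals two ways: pivot table (DataFrame) and groupby (Series)
via_pivot = (toy.pivot_table(index='Publisher', values='Global_Sales', aggfunc='sum')
             .sort_values('Global_Sales', ascending=False))
via_groupby = toy.groupby('Publisher')['Global_Sales'].sum().sort_values(ascending=False)

# Top 2 publishers: slicing the sorted table vs. nlargest
top2_slice = via_pivot.iloc[0:2]
top2_nlargest = via_groupby.nlargest(2)
print(top2_nlargest)
```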
# Genre share of sales in each region, one pie chart per region
regions = {'NA': 'NA_Sales', 'EU': 'EU_Sales', 'JP': 'JP_Sales', 'Other': 'Other_Sales'}
fig, axes = plt.subplots(2, 2, figsize=(10, 10))
for ax, (region, col) in zip(axes.flat, regions.items()):
    # Total sales per genre in this region, largest first.
    # pie() needs 1-D data, so pass a Series rather than a one-column DataFrame;
    # the genre names are taken from the public .index instead of the private _stat_axis.
    genre_sales = data.groupby('Genre')[col].sum().sort_values(ascending=False)
    ax.pie(genre_sales, labels=genre_sales.index, autopct='%0.2f%%')
    ax.set_title(f'Share of video game sales by genre in {region}')
plt.tight_layout()
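Four pie charts make it hard to compare the same genre across regions. The same numbers can be laid out as one table of percentage shares by dividing each region's per-genre totals by that region's total. The sketch uses toy rows; the column names follow the dataset.

```python
import pandas as pd

# Hypothetical sales rows for two regions
toy = pd.DataFrame({
    'Genre': ['Action', 'Sports', 'Role-Playing', 'Action'],
    'NA_Sales': [2.0, 3.0, 1.0, 2.0],
    'JP_Sales': [0.5, 0.5, 3.0, 0.0],
})

regional = toy.groupby('Genre')[['NA_Sales', 'JP_Sales']].sum()
# Divide each column by its own total, so every column sums to 100
shares = regional.div(regional.sum(axis=0), axis=1) * 100
print(shares.round(2))
```

On the real data, replacing `toy` with `data` and listing all four regional columns would give one genre-by-region share table in place of the four pies.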
# The five largest publishers by total global sales (see Publisher_df)
top_p = ['Nintendo','Electronic Arts','Activision','Sony Computer Entertainment','Ubisoft']
top_p_df = data[data['Publisher'].isin(top_p)]
top_p_df
top5_genre = pd.pivot_table(data=top_p_df,index=['Genre','Publisher'],values=['NA_Sales','EU_Sales','JP_Sales','Other_Sales','Global_Sales'],aggfunc='sum')
order = ['NA_Sales','EU_Sales','JP_Sales','Other_Sales','Global_Sales'] # put the columns in a fixed regional order
new_top5_genre = top5_genre[order]
new_top5_genre
| Genre | Publisher | NA_Sales | EU_Sales | JP_Sales | Other_Sales | Global_Sales |
|---|---|---|---|---|---|---|
| Action | Activision | 86.83 | 39.99 | 1.08 | 13.81 | 141.82 |
| | Electronic Arts | 54.23 | 43.92 | 2.49 | 14.71 | 115.34 |
| | Nintendo | 63.49 | 29.02 | 29.16 | 6.48 | 128.10 |
| | Sony Computer Entertainment | 46.09 | 29.84 | 3.96 | 14.58 | 94.52 |
| | Ubisoft | 69.38 | 54.10 | 2.11 | 17.37 | 142.94 |
| Adventure | Activision | 4.04 | 0.87 | 0.10 | 0.41 | 5.42 |
| | Electronic Arts | 2.57 | 1.65 | 0.08 | 0.44 | 4.75 |
| | Nintendo | 17.72 | 7.66 | 9.01 | 1.28 | 35.71 |
| | Sony Computer Entertainment | 4.57 | 4.39 | 2.73 | 1.85 | 13.55 |
| | Ubisoft | 11.41 | 8.08 | 0.28 | 2.43 | 22.19 |
| Fighting | Activision | 1.94 | 0.76 | 0.00 | 0.15 | 2.86 |
| | Electronic Arts | 19.45 | 8.65 | 0.03 | 2.76 | 30.85 |
| | Nintendo | 27.10 | 8.64 | 14.94 | 2.65 | 53.35 |
| | Sony Computer Entertainment | 10.30 | 8.53 | 7.24 | 2.08 | 28.15 |
| | Ubisoft | 3.89 | 1.95 | 0.41 | 0.66 | 6.90 |
| Misc | Activision | 48.68 | 17.63 | 0.05 | 10.07 | 76.55 |
| | Electronic Arts | 11.29 | 6.49 | 0.01 | 1.92 | 19.74 |
| | Nintendo | 61.98 | 51.62 | 55.25 | 11.78 | 180.67 |
| | Sony Computer Entertainment | 25.47 | 33.81 | 5.58 | 15.92 | 80.80 |
| | Ubisoft | 58.49 | 29.65 | 0.29 | 9.17 | 97.53 |
| Platform | Activision | 23.45 | 7.85 | 0.07 | 2.01 | 33.40 |
| | Electronic Arts | 3.16 | 2.55 | 0.05 | 0.74 | 6.53 |
| | Nintendo | 219.46 | 84.90 | 102.36 | 19.43 | 426.18 |
| | Sony Computer Entertainment | 53.04 | 32.87 | 7.57 | 10.50 | 104.06 |
| | Ubisoft | 10.05 | 9.03 | 0.04 | 1.93 | 21.06 |
| Puzzle | Activision | 0.78 | 0.11 | 0.15 | 0.02 | 1.06 |
| | Electronic Arts | 2.66 | 1.43 | 0.00 | 0.46 | 4.55 |
| | Nintendo | 55.74 | 26.42 | 37.09 | 5.53 | 124.88 |
| | Sony Computer Entertainment | 0.64 | 0.78 | 1.97 | 0.34 | 3.71 |
| | Ubisoft | 2.63 | 3.05 | 0.03 | 0.57 | 6.33 |
| Racing | Activision | 11.88 | 3.64 | 0.04 | 1.35 | 16.99 |
| | Electronic Arts | 75.52 | 51.20 | 1.20 | 17.90 | 145.77 |
| | Nintendo | 73.55 | 39.75 | 29.22 | 8.81 | 151.30 |
| | Sony Computer Entertainment | 42.43 | 35.42 | 13.89 | 18.82 | 110.57 |
| | Ubisoft | 7.73 | 6.27 | 0.10 | 1.71 | 15.83 |
| Role-Playing | Activision | 23.58 | 18.40 | 0.26 | 4.55 | 46.79 |
| | Electronic Arts | 17.82 | 11.39 | 2.66 | 3.42 | 35.30 |
| | Nintendo | 105.63 | 63.92 | 101.95 | 13.03 | 284.57 |
| | Sony Computer Entertainment | 15.90 | 9.21 | 16.08 | 2.84 | 44.00 |
| | Ubisoft | 9.06 | 4.05 | 2.15 | 1.48 | 16.76 |
| Shooter | Activision | 159.15 | 96.86 | 4.64 | 34.66 | 295.40 |
| | Electronic Arts | 81.15 | 56.03 | 2.93 | 18.20 | 158.26 |
| | Nintendo | 51.39 | 9.85 | 6.03 | 2.39 | 69.69 |
| | Sony Computer Entertainment | 31.22 | 16.70 | 2.80 | 6.86 | 57.52 |
| | Ubisoft | 35.01 | 24.04 | 0.96 | 7.57 | 67.65 |
| Simulation | Activision | 5.33 | 2.14 | 0.02 | 0.80 | 8.26 |
| | Electronic Arts | 44.03 | 35.83 | 0.88 | 8.72 | 89.53 |
| | Nintendo | 29.70 | 26.05 | 23.65 | 5.86 | 85.25 |
| | Sony Computer Entertainment | 3.13 | 2.40 | 2.38 | 0.75 | 8.67 |
| | Ubisoft | 27.87 | 11.67 | 0.79 | 4.05 | 44.48 |
| Sports | Activision | 52.19 | 17.53 | 0.13 | 5.39 | 75.16 |
| | Electronic Arts | 263.50 | 144.14 | 3.21 | 57.69 | 468.69 |
| | Nintendo | 98.77 | 66.18 | 35.87 | 17.18 | 218.01 |
| | Sony Computer Entertainment | 32.09 | 12.90 | 8.86 | 5.58 | 59.39 |
| | Ubisoft | 14.06 | 7.11 | 0.08 | 2.10 | 23.42 |
| Strategy | Activision | 8.16 | 7.94 | 0.00 | 1.57 | 17.70 |
| | Electronic Arts | 8.84 | 4.10 | 0.44 | 0.67 | 14.08 |
| | Nintendo | 11.22 | 4.29 | 10.46 | 0.77 | 26.72 |
| | Sony Computer Entertainment | 0.34 | 0.70 | 1.04 | 0.28 | 2.34 |
| | Ubisoft | 3.23 | 4.03 | 0.09 | 1.12 | 8.45 |
top_p = ['Nintendo','Electronic Arts','Activision','Sony Computer Entertainment','Ubisoft']
top_p_df = data[data['Publisher'].isin(top_p)]
top5_genre = top_p_df[['Genre','Publisher','NA_Sales','EU_Sales','JP_Sales','Other_Sales','Global_Sales']]
top5_genre
new_top5_genre = top5_genre.groupby(by=['Genre','Publisher'])[['NA_Sales','EU_Sales','JP_Sales','Other_Sales','Global_Sales']].sum()
new_top5_genre
# Within each genre, rank publishers by global sales
# (group_keys=False stops apply() from prepending a second Genre level to the index)
new_top5_genre.groupby(level='Genre', group_keys=False).apply(lambda x: x.sort_values('Global_Sales', ascending=False))
| Genre | Publisher | NA_Sales | EU_Sales | JP_Sales | Other_Sales | Global_Sales |
|---|---|---|---|---|---|---|
| Action | Ubisoft | 69.38 | 54.10 | 2.11 | 17.37 | 142.94 |
| | Activision | 86.83 | 39.99 | 1.08 | 13.81 | 141.82 |
| | Nintendo | 63.49 | 29.02 | 29.16 | 6.48 | 128.10 |
| | Electronic Arts | 54.23 | 43.92 | 2.49 | 14.71 | 115.34 |
| | Sony Computer Entertainment | 46.09 | 29.84 | 3.96 | 14.58 | 94.52 |
| Adventure | Nintendo | 17.72 | 7.66 | 9.01 | 1.28 | 35.71 |
| | Ubisoft | 11.41 | 8.08 | 0.28 | 2.43 | 22.19 |
| | Sony Computer Entertainment | 4.57 | 4.39 | 2.73 | 1.85 | 13.55 |
| | Activision | 4.04 | 0.87 | 0.10 | 0.41 | 5.42 |
| | Electronic Arts | 2.57 | 1.65 | 0.08 | 0.44 | 4.75 |
| Fighting | Nintendo | 27.10 | 8.64 | 14.94 | 2.65 | 53.35 |
| | Electronic Arts | 19.45 | 8.65 | 0.03 | 2.76 | 30.85 |
| | Sony Computer Entertainment | 10.30 | 8.53 | 7.24 | 2.08 | 28.15 |
| | Ubisoft | 3.89 | 1.95 | 0.41 | 0.66 | 6.90 |
| | Activision | 1.94 | 0.76 | 0.00 | 0.15 | 2.86 |
| Misc | Nintendo | 61.98 | 51.62 | 55.25 | 11.78 | 180.67 |
| | Ubisoft | 58.49 | 29.65 | 0.29 | 9.17 | 97.53 |
| | Sony Computer Entertainment | 25.47 | 33.81 | 5.58 | 15.92 | 80.80 |
| | Activision | 48.68 | 17.63 | 0.05 | 10.07 | 76.55 |
| | Electronic Arts | 11.29 | 6.49 | 0.01 | 1.92 | 19.74 |
| Platform | Nintendo | 219.46 | 84.90 | 102.36 | 19.43 | 426.18 |
| | Sony Computer Entertainment | 53.04 | 32.87 | 7.57 | 10.50 | 104.06 |
| | Activision | 23.45 | 7.85 | 0.07 | 2.01 | 33.40 |
| | Ubisoft | 10.05 | 9.03 | 0.04 | 1.93 | 21.06 |
| | Electronic Arts | 3.16 | 2.55 | 0.05 | 0.74 | 6.53 |
| Puzzle | Nintendo | 55.74 | 26.42 | 37.09 | 5.53 | 124.88 |
| | Ubisoft | 2.63 | 3.05 | 0.03 | 0.57 | 6.33 |
| | Electronic Arts | 2.66 | 1.43 | 0.00 | 0.46 | 4.55 |
| | Sony Computer Entertainment | 0.64 | 0.78 | 1.97 | 0.34 | 3.71 |
| | Activision | 0.78 | 0.11 | 0.15 | 0.02 | 1.06 |
| Racing | Nintendo | 73.55 | 39.75 | 29.22 | 8.81 | 151.30 |
| | Electronic Arts | 75.52 | 51.20 | 1.20 | 17.90 | 145.77 |
| | Sony Computer Entertainment | 42.43 | 35.42 | 13.89 | 18.82 | 110.57 |
| | Activision | 11.88 | 3.64 | 0.04 | 1.35 | 16.99 |
| | Ubisoft | 7.73 | 6.27 | 0.10 | 1.71 | 15.83 |
| Role-Playing | Nintendo | 105.63 | 63.92 | 101.95 | 13.03 | 284.57 |
| | Activision | 23.58 | 18.40 | 0.26 | 4.55 | 46.79 |
| | Sony Computer Entertainment | 15.90 | 9.21 | 16.08 | 2.84 | 44.00 |
| | Electronic Arts | 17.82 | 11.39 | 2.66 | 3.42 | 35.30 |
| | Ubisoft | 9.06 | 4.05 | 2.15 | 1.48 | 16.76 |
| Shooter | Activision | 159.15 | 96.86 | 4.64 | 34.66 | 295.40 |
| | Electronic Arts | 81.15 | 56.03 | 2.93 | 18.20 | 158.26 |
| | Nintendo | 51.39 | 9.85 | 6.03 | 2.39 | 69.69 |
| | Ubisoft | 35.01 | 24.04 | 0.96 | 7.57 | 67.65 |
| | Sony Computer Entertainment | 31.22 | 16.70 | 2.80 | 6.86 | 57.52 |
| Simulation | Electronic Arts | 44.03 | 35.83 | 0.88 | 8.72 | 89.53 |
| | Nintendo | 29.70 | 26.05 | 23.65 | 5.86 | 85.25 |
| | Ubisoft | 27.87 | 11.67 | 0.79 | 4.05 | 44.48 |
| | Sony Computer Entertainment | 3.13 | 2.40 | 2.38 | 0.75 | 8.67 |
| | Activision | 5.33 | 2.14 | 0.02 | 0.80 | 8.26 |
| Sports | Electronic Arts | 263.50 | 144.14 | 3.21 | 57.69 | 468.69 |
| | Nintendo | 98.77 | 66.18 | 35.87 | 17.18 | 218.01 |
| | Activision | 52.19 | 17.53 | 0.13 | 5.39 | 75.16 |
| | Sony Computer Entertainment | 32.09 | 12.90 | 8.86 | 5.58 | 59.39 |
| | Ubisoft | 14.06 | 7.11 | 0.08 | 2.10 | 23.42 |
| Strategy | Nintendo | 11.22 | 4.29 | 10.46 | 0.77 | 26.72 |
| | Activision | 8.16 | 7.94 | 0.00 | 1.57 | 17.70 |
| | Electronic Arts | 8.84 | 4.10 | 0.44 | 0.67 | 14.08 |
| | Ubisoft | 3.23 | 4.03 | 0.09 | 1.12 | 8.45 |
| | Sony Computer Entertainment | 0.34 | 0.70 | 1.04 | 0.28 | 2.34 |
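Question (3) can be answered more directly by pivoting the genre × publisher totals into a matrix and using `idxmax(axis=1)` to pick the leading publisher in each genre, together with that publisher's share of the genre. The rows below are made up for illustration.

```python
import pandas as pd

# Hypothetical sales rows
toy = pd.DataFrame({
    'Genre': ['Shooter', 'Shooter', 'Sports', 'Sports', 'Platform', 'Platform'],
    'Publisher': ['Activision', 'Nintendo', 'Electronic Arts', 'Nintendo', 'Nintendo', 'Ubisoft'],
    'Global_Sales': [3.0, 1.0, 4.0, 2.0, 5.0, 0.5],
})

# Genre x Publisher matrix of total global sales (missing pairs become 0)
matrix = toy.pivot_table(index='Genre', columns='Publisher',
                         values='Global_Sales', aggfunc='sum', fill_value=0)

leader = matrix.idxmax(axis=1)
# Leader's share of the genre's sales among the publishers in the matrix
leader_share = matrix.max(axis=1) / matrix.sum(axis=1)
print(pd.DataFrame({'leader': leader, 'share': leader_share.round(3)}))
```

Run on `top_p_df`, the `leader` column should match the first publisher listed for each genre in the sorted table above.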
Video game sales began to rise steadily from about 1995, peaked around 2009, and have declined in stages since then.
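Before attributing the post-peak drop entirely to the market, it is worth checking how well the final years are covered. The describe() output shows Year reaching 2020, and a year with only a handful of records is more likely under-collected than a genuine collapse (an assumption that this notebook does not verify). The sketch below uses made-up counts and an arbitrary 10% threshold to flag such years so they can be left out of trend statements.

```python
import pandas as pd

# Hypothetical release years, with very few records in the final years
years = pd.Series([2008.0] * 50 + [2009.0] * 60 + [2010.0] * 55 + [2015.0] * 30
                  + [2016.0] * 20 + [2017.0] * 3 + [2020.0] * 1)

counts = years.value_counts().sort_index()
# Assumed threshold: flag years with fewer than 10% of the busiest year's records
sparse_years = counts[counts < 0.1 * counts.max()].index.tolist()
trimmed = counts.drop(sparse_years)
print(sparse_years)
```

On the real data, `data['Year'].value_counts().sort_index()` gives the counts to feed into the same check.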
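By total global sales, Action is the most popular genre, followed by Sports and Shooter.
The PS2 clearly leads all platforms in global sales.
By sales, Nintendo, Electronic Arts (EA) and Activision take the top three publisher spots, with large gaps between them. Among the five largest publishers, Nintendo ranks first in 8 of the 12 genres; EA leads Sports and Simulation, Activision leads Shooter, and Ubisoft narrowly leads Action (142.94 vs. Activision's 141.82).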
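Publishers should investigate the causes of the sales decline further, for example whether mobile and online games are drawing players away, and respond accordingly.
The [Advanced] task of forecasting annual video game sales is left for later, after studying statsmodels and related tools.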
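Until a proper time-series model is in place, a naive baseline can serve as a benchmark: fit a straight line to the annual totals with `numpy.polyfit` and extrapolate one year ahead. This is not the statsmodels approach planned above. The annual figures below are made up, and a linear trend ignores the rise-and-fall shape described in the conclusions, so treat it only as a baseline that a real model should beat.

```python
import numpy as np
import pandas as pd

# Hypothetical annual global sales (millions), already aggregated by year
annual = pd.Series([300.0, 320.0, 340.0, 360.0], index=[2012, 2013, 2014, 2015])

# Least-squares fit of sales = slope * year + intercept
slope, intercept = np.polyfit(annual.index.to_numpy(dtype=float), annual.to_numpy(), deg=1)

# Extrapolate one year past the last observed year
next_year = annual.index.max() + 1
forecast = slope * next_year + intercept
print(next_year, round(forecast, 1))
```

With real data, `annual` would be `data.groupby('Year')['Global_Sales'].sum()`, ideally with sparsely recorded final years removed first.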