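Kaggle Competition: "March Madness" NCAA Basketball Tournament Prediction (2)

The previous post took a first look at the basic data for this competition and did some exploratory visualization.

This post continues by exploring the detailed game data, which leads to some more interesting conclusions.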

Game detail information visualization

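  • WFGM: field goals made.
  • WFGA: field goals attempted.
  • WFGM3: three-pointers made; note that these are also counted in WFGM.
  • WFGA3: three-pointers attempted.
  • WFTM: free throws made.
  • WFTA: free throws attempted.
  • WOR: offensive rebounds.
  • WDR: defensive rebounds.
  • WAst: assists.
  • WTO: turnovers.
  • WStl: steals.
  • WBlk: blocks.
  • WPF: personal fouls.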
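
Because the FGA, FGA3 and FTA columns count attempts rather than misses, shooting percentages are simply made divided by attempted. The sketch below illustrates this on a made-up two-game table; `shooting_rates` and the `*_pct` column names are hypothetical helpers introduced here, not part of the competition data.

```python
import pandas as pd

# Toy box scores: column names follow the list above, the numbers are invented.
games = pd.DataFrame({
    "WFGM": [30, 25], "WFGA": [60, 50],
    "WFGM3": [8, 5], "WFGA3": [20, 15],
    "WFTM": [12, 18], "WFTA": [16, 20],
})


def shooting_rates(df, prefix="W"):
    """Return FG%, 3P% and FT% for the side given by prefix ('W' or 'L')."""
    out = pd.DataFrame(index=df.index)
    out[prefix + "FG_pct"] = df[prefix + "FGM"] / df[prefix + "FGA"]
    out[prefix + "FG3_pct"] = df[prefix + "FGM3"] / df[prefix + "FGA3"]
    out[prefix + "FT_pct"] = df[prefix + "FTM"] / df[prefix + "FTA"]
    return out


rates = shooting_rates(games)
print(rates.round(3))
```

Calling `shooting_rates(df, prefix="L")` on a table that also has the L columns would give the losing side's percentages in the same way.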

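Compute the mean of these statistics separately for the regular season, the NCAA tournament (excluding the final), and the NCAA final, then visualize them.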

cols = ['TeamID','Score','FGM','FGA','FGM3','FGA3','FTM','FTA','OR','DR','Ast','TO','Stl','Blk','PF']
cols = ['W'+col for col in cols] + ['L'+col for col in cols]

regular_data = section1_detail.query('is_regular == 1')[cols].mean().rename('value')
tournament_data = section1_detail.query('is_regular == 0 and DayNum < 154')[cols].mean().rename('value')
final_data = section1_detail.query('DayNum == 154')[cols].mean().rename('value')

# Build one row per match type (DataFrame.append was removed in pandas 2.0)
tmp = pd.DataFrame([regular_data, tournament_data, final_data]).reset_index(drop=True)
tmp.insert(0, 'match_type', ['regular', 'tournament', 'final'])
tmp

plt.figure(figsize=(20, 3*5))  # plt.subplots() would add an extra full-size axes behind the grid
plt.subplots_adjust(wspace=0.2, hspace=0.4)

for i,col in enumerate(['FGM','FGA','FGM3','FGA3','FTM','FTA','OR','DR','Ast','TO','Stl','Blk','PF']):
#     plt.subplot(5,3,i*3+1)
#     sns.barplot(x="match_type", y='W'+col, data=tmp)
#     plt.subplot(5,3,i*3+2)
#     sns.barplot(x="match_type", y='L'+col, data=tmp)
#     plt.subplot(5,3,i*3+3)
#     tmp['W'+col+'_L'+col] = tmp['W'+col] - tmp['L'+col]
#     sns.barplot(x="match_type", y='W'+col+'_L'+col, data=tmp)
    plt.subplot(5,3,i+1)
    tmp['W'+col+'_L'+col] = tmp['W'+col] - tmp['L'+col]
    sns.barplot(x="match_type", y='W'+col+'_L'+col, data=tmp)
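  • FGM & FGA: in field goals made, the gap between the two teams is smallest in the final. In attempts, the winner of the final seems to shoot more selectively, i.e. it takes fewer shots than the loser.
  • FGM3 & FGA3: in the final, the winner actually makes fewer three-pointers than the loser. This points to a common pattern: because of the defensive schemes, the officiating, and the pressure on players in a final, teams usually prefer more conservative ways of scoring and rely less on the volatile three-point shot.
  • FTM & FTA: free throws differ little, though the winner has the higher free-throw percentage.
  • Rebounds: Slam Dunk claims that whoever controls the rebounds wins the game, so let's check. For offensive rebounds, the winning team always has fewer, especially in the final. Fewer offensive rebounds usually indicate a conservative approach: getting back on defense early rather than crashing the offensive glass and giving up fast breaks. On defensive rebounds, the winner is clearly better.
  • Assists and turnovers: both are relatively lower in the final. With the tighter defense and different officiating in a final, teams tend to rely more on one-on-one offense, which is where star players matter, and that style usually produces fewer assists and fewer turnovers.
  • Steals and blocks: these are two different styles of defense. A steal attempt is a gamble, because failing usually leaves the defender out of position, so steals are rarer in the final. More blocks indicate more attacks into the paint, which again follows from the offensive choices made in a final.
  • Personal fouls: the gap between winner and loser is smaller in the final. To some extent this is because nobody wants the outcome to be decided by the referees, so the players are given more room to play, which is less friendly to players who like to draw fouls.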

Event Data

Each MEvents & WEvents file lists the play-by-play event logs for more than 99.5% of games from that season. Each event is assigned to either a team or a single one of the team's players. Thus if a basket is made by one player and an assist is credited to a second player, that would show up as two separate records. The players are listed by PlayerID within the xPlayers.csv file.

Mens Event Files:

  • MEvents2015.csv, MEvents2016.csv, MEvents2017.csv, MEvents2018.csv, MEvents2019.csv

Womens Event Files:

  • WEvents2015.csv, WEvents2016.csv, WEvents2017.csv, WEvents2018.csv, WEvents2019.csv

We can read in all files and combine into one huge dataframe, one for womens and one for mens.

  • EventID - this is a unique ID for each logged event. The EventID's are different within each year and uniquely identify each play-by-play event. They ought to be listed in chronological order for the events within their game.

  • Season, DayNum, WTeamID, LTeamID - these four columns are sufficient to uniquely identify each game. The games are a mix of Regular Season, NCAA® Tourney, and Secondary Tourney games.

  • WFinalScore, LFinalScore

  • WCurrentScore, LCurrentScore

  • ElapsedSeconds - this is the number of seconds that have elapsed from the start of the game until the event occurred. With a 20-minute half, that means that an ElapsedSeconds value from 0 to 1200 represents an event in the first half, a value from 1200 to 2400 represents an event in the second half, and a value above 2400 represents an event in overtime. For example, since overtime periods are five minutes long (that's 300 seconds), a value of 2699 would represent one second left in the first overtime.

  • EventTeamID - this is the ID of the team that the event is logged for, which will either be the WTeamID or the LTeamID.

  • EventPlayerID - this is the ID of the player that the event is logged for, as described in the MPlayers.csv file.

  • EventType, EventSubType - these indicate the type of the event that was logged (see listing below).

  • assist - an assist

  • block - a blocked shot

  • steal - a steal

  • sub - a substitution

  • timeout - a timeout: unk=unknown type of timeout; comm=commercial timeout; full=full timeout; short=short timeout

  • turnover - a turnover: unk=unknown type of turnover; 10sec=10 second violation; 3sec=3 second violation; 5sec=5 second violation; bpass=bad pass turnover; dribb=dribbling turnover; lanev=lane violation; lostb=lost ball; offen=offensive turnover (?); offgt=offensive goaltending; other=other type of turnover; shotc=shot clock violation; trav=travelling

  • foul - a foul: unk=unknown type of foul; admT=administrative technical; benT=bench technical; coaT=coach technical; off=offensive foul; pers=personal foul; tech=technical foul

  • fouled - a player was fouled

  • reb - a rebound: deadb=a deadball rebound; def=a defensive rebound; defdb=a defensive deadball rebound; off=an offensive rebound; offdb=an offensive deadball rebound

  • made1, miss1 - a one-point free throw was made or missed, with one of the following subtypes: 1of1=the only free throw of the trip to the line; 1of2=the first of two free throw attempts; 2of2=the second of two free throw attempts; 1of3=the first of three free throw attempts; 2of3=the second of three free throw attempts; 3of3=the third of three free throw attempts; unk=unknown what the free throw sequence is

  • made2, miss2 - a two-point field goal was made or missed, with one of the following subtypes: unk=unknown type of two-point shot; dunk=dunk; lay=layup; tip=tip-in; jump=jump shot; alley=alley-oop; drive=driving layup; hook=hook shot; stepb=step-back jump shot; pullu=pull-up jump shot; turna=turn-around jump shot; wrong=wrong basket

  • made3, miss3 - a three-point field goal was made or missed, with one of the following subtypes: unk=unknown type of three-point shot; jump=jump shot; stepb=step-back jump shot; pullu=pull-up jump shot; turna=turn-around jump shot; wrong=wrong basket

  • jumpb - a jump ball: start=start period; block=block tie-up; heldb=held ball; lodge=lodged ball; lost=jump ball lost; outof=out of bounds; outrb=out of bounds rebound; won=jump ball won
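
The ElapsedSeconds rule described above (two 20-minute halves, then 5-minute overtimes) can be sketched as a small helper. `locate_event` is a hypothetical name, and the handling of the exact boundary values (1200, 2400, ...) is an assumption, since the description does not say which period they belong to; here they are counted as the start of the next period.

```python
def locate_event(elapsed_seconds, half_len=1200, ot_len=300):
    """Map ElapsedSeconds to (period label, seconds left in that period)."""
    regulation = 2 * half_len
    if elapsed_seconds < half_len:
        return "1st half", half_len - elapsed_seconds
    if elapsed_seconds < regulation:
        return "2nd half", regulation - elapsed_seconds
    # Overtime periods are numbered from 1.
    into_ot = elapsed_seconds - regulation
    ot_number = into_ot // ot_len + 1
    return f"OT{ot_number}", ot_len - into_ot % ot_len


print(locate_event(2699))  # ('OT1', 1): one second left in the first overtime
```

Applied to a whole events table, this could be used as `MEvents['ElapsedSeconds'].map(locate_event)`, though on 13 million rows a vectorized version would be faster.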

Reading the event data

mens_events = []
for year in [2015, 2016, 2017, 2018, 2019]:
    mens_events.append(pd.read_csv(Mfolder_path+f'MEvents{year}.csv'))
MEvents = pd.concat(mens_events)
print(MEvents.shape)
MEvents.head()
(13149684, 17)

womens_events = []
for year in [2015, 2016, 2017, 2018, 2019]:
    womens_events.append(pd.read_csv(Wfolder_path+f'WEvents{year}.csv'))
WEvents = pd.concat(womens_events)
print(WEvents.shape)
WEvents.head()
(12744264, 17)

del mens_events
del womens_events
gc.collect()

Count the EventType values and visualize them

EventType = pd.DataFrame({'MEvents': MEvents['EventType'].value_counts(),
                          'WEvents': WEvents['EventType'].value_counts()})
# Sort by the men's counts (easier to read) and name the index explicitly,
# so the label column is called 'EventType' on every pandas version
EventType = EventType.sort_values('MEvents').rename_axis('EventType').reset_index()

# One figure with two stacked panels: men on top, women below
fig, axes = plt.subplots(2, 1, figsize=(20, 10))
sns.barplot(x='EventType', y='MEvents', data=EventType, ax=axes[0])
sns.barplot(x='EventType', y='WEvents', data=EventType, ax=axes[1])

Substitutions are the most frequent event on the court, followed by rebounds. Fouls even outnumber both made and missed two-point shots.
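
Raw counts depend on how many games each file covers, so relative shares can be easier to compare between the men's and women's data. A minimal sketch on a made-up event list, using `value_counts(normalize=True)`:

```python
import pandas as pd

# Toy event log: the event types are real, the frequencies are invented.
events = pd.DataFrame({
    "EventType": ["sub", "sub", "sub", "reb", "reb", "foul", "made2", "miss2"]
})

# Share of each event type instead of the raw count, sorted from most to least common
share = events["EventType"].value_counts(normalize=True)
print(share)
```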

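  • Worth noting:
    The data also provides X/Y coordinates and the name of the court area each coordinate falls in. The lower-left corner of the court is (0,0), the upper-right corner is (100,100), and the center is (50,50). Rescaling these coordinates to a standard NCAA court and plotting the events on it makes them much more intuitive.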

    • X, Y - for games where it is available, this describes an X/Y position on the court where the lower-left corner of the full court is (0,0), the upper-right corner of the full court is (100,100), the exact middle of the full court (where the initial jump ball happens) is (50,50), and so on. The X/Y position is provided for fouls, turnovers, and field-goal attempts (either 2-point or 3-point).
    • Area - for events where an X/Y position is provided, this position is more generally categorized into one of 13 "areas" of the court, as follows: 1=under basket; 2=in the paint; 3=inside right wing; 4=inside right; 5=inside center; 6=inside left; 7=inside left wing; 8=outside right wing; 9=outside right; 10=outside center; 11=outside left; 12=outside left wing; 13=backcourt

Figure: coordinate diagram

Figure: court area diagram


The areas are given as numbers, so build a mapping and use map to get the area names, then group by area and visualize.

#MEvents['Area'].value_counts()
area_mapping = {0: np.nan,
                1: 'under basket',
                2: 'in the paint',
                3: 'inside right wing',
                4: 'inside right',
                5: 'inside center',
                6: 'inside left',
                7: 'inside left wing',
                8: 'outside right wing',
                9: 'outside right',
                10: 'outside center',
                11: 'outside left',
                12: 'outside left wing',
                13: 'backcourt'}
MEvents['Area_Name'] = MEvents['Area'].map(area_mapping)

fig, ax = plt.subplots(figsize=(15, 8))
MEvents_X_Y = MEvents.loc[~MEvents['Area_Name'].isna()].groupby('Area_Name')
for i, d in MEvents_X_Y:
    sns.scatterplot(x='X', y='Y', data=d, label=i, alpha=0.3)
# Place the legend outside the plot area
plt.legend(loc=[1, 0])
#plt.legend(bbox_to_anchor=(1.04,1), loc="upper left")
ax.set_xticks([])
ax.set_yticks([])
ax.set_xlabel('')
ax.set_ylabel('')
ax.set_xlim(0, 100)
ax.set_ylim(0, 100)
plt.show()

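Visualizing the court
When I first saw this figure I was amazed. After skimming the code, though, it turns out to be simple in principle: unify the coordinates, then stack basic geometric shapes. The coordinate conversion is not entirely trivial, however, and choosing the colors is a skill of its own. The code is pasted below; if time permits, a later post will break it down, which should reveal more of the ins and outs of plotting with matplotlib in Python.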


from matplotlib.patches import Circle, Rectangle, Arc


def create_ncaa_full_court(ax=None, three_line='mens', court_color='#dfbb85',
                           lw=3, lines_color='black', lines_alpha=0.5,
                           paint_fill='blue', paint_alpha=0.4,
                           inner_arc=False):
    """
    Creates NCAA Basketball Court
    Dimensions are in feet (Court is 94x50 ft)
    Created by: Rob Mulla / https://github.com/RobMulla

    * Note that this function uses "feet" as the unit of measure.
    * NCAA Data is provided on a x range: 0, 100 and y-range 0 to 100
    * To plot X/Y positions first convert to feet like this:
    ```
    Events['X_'] = (Events['X'] * (94/100))
    Events['Y_'] = (Events['Y'] * (50/100))
    ```
    
    ax: matplotlib axes if None gets current axes using `plt.gca`


    three_line: 'mens', 'womens' or 'both' defines 3 point line plotted
    court_color : (hex) Color of the court
    lw : line width
    lines_color : Color of the lines
    lines_alpha : transparency of lines
    paint_fill : Color inside the paint
    paint_alpha : transparency of the "paint"
    inner_arc : paint the dotted inner arc
    """
    if ax is None:
        ax = plt.gca()

    # Create patches for the court lines
    center_circle = Circle((94/2, 50/2), 6,
                           linewidth=lw, color=lines_color,
                           fill=False, alpha=lines_alpha)
    hoop_left = Circle((5.25, 50/2), 1.5 / 2,
                       linewidth=lw, color=lines_color,
                       fill=False, alpha=lines_alpha)
    hoop_right = Circle((94-5.25, 50/2), 1.5 / 2,
                        linewidth=lw, color=lines_color,
                        fill=False, alpha=lines_alpha)

    # Paint - 18 feet 10 inches, i.e. 18.833333 feet
    # `fill` is a boolean flag; the paint color has to go in `facecolor`
    left_paint = Rectangle((0, (50/2)-6), 18.833333, 12,
                           fill=True, facecolor=paint_fill, alpha=paint_alpha,
                           lw=lw, edgecolor=None)
    right_paint = Rectangle((94-18.833333, (50/2)-6), 18.833333,
                            12, fill=True, facecolor=paint_fill, alpha=paint_alpha,
                            lw=lw, edgecolor=None)
    
    left_paint_boarder = Rectangle((0, (50/2)-6), 18.833333, 12,
                           fill=False, alpha=lines_alpha,
                           lw=lw, edgecolor=lines_color)
    right_paint_boarder = Rectangle((94-18.83333, (50/2)-6), 18.833333,
                            12, fill=False, alpha=lines_alpha,
                            lw=lw, edgecolor=lines_color)

    left_arc = Arc((18.833333, 50/2), 12, 12, theta1=-
                   90, theta2=90, color=lines_color, lw=lw,
                   alpha=lines_alpha)
    right_arc = Arc((94-18.833333, 50/2), 12, 12, theta1=90,
                    theta2=-90, color=lines_color, lw=lw,
                    alpha=lines_alpha)
    
    leftblock1 = Rectangle((7, (50/2)-6-0.666), 1, 0.666,
                           fill=True, alpha=lines_alpha,
                           lw=0, edgecolor=lines_color,
                           facecolor=lines_color)
    leftblock2 = Rectangle((7, (50/2)+6), 1, 0.666,
                           fill=True, alpha=lines_alpha,
                           lw=0, edgecolor=lines_color,
                           facecolor=lines_color)
    ax.add_patch(leftblock1)
    ax.add_patch(leftblock2)
    
    left_l1 = Rectangle((11, (50/2)-6-0.666), 0.166, 0.666,
                           fill=True, alpha=lines_alpha,
                           lw=0, edgecolor=lines_color,
                           facecolor=lines_color)
    left_l2 = Rectangle((14, (50/2)-6-0.666), 0.166, 0.666,
                           fill=True, alpha=lines_alpha,
                           lw=0, edgecolor=lines_color,
                           facecolor=lines_color)
    left_l3 = Rectangle((17, (50/2)-6-0.666), 0.166, 0.666,
                           fill=True, alpha=lines_alpha,
                           lw=0, edgecolor=lines_color,
                           facecolor=lines_color)
    ax.add_patch(left_l1)
    ax.add_patch(left_l2)
    ax.add_patch(left_l3)
    left_l4 = Rectangle((11, (50/2)+6), 0.166, 0.666,
                           fill=True, alpha=lines_alpha,
                           lw=0, edgecolor=lines_color,
                           facecolor=lines_color)
    left_l5 = Rectangle((14, (50/2)+6), 0.166, 0.666,
                           fill=True, alpha=lines_alpha,
                           lw=0, edgecolor=lines_color,
                           facecolor=lines_color)
    left_l6 = Rectangle((17, (50/2)+6), 0.166, 0.666,
                           fill=True, alpha=lines_alpha,
                           lw=0, edgecolor=lines_color,
                           facecolor=lines_color)
    ax.add_patch(left_l4)
    ax.add_patch(left_l5)
    ax.add_patch(left_l6)
    
    rightblock1 = Rectangle((94-7-1, (50/2)-6-0.666), 1, 0.666,
                           fill=True, alpha=lines_alpha,
                           lw=0, edgecolor=lines_color,
                           facecolor=lines_color)
    rightblock2 = Rectangle((94-7-1, (50/2)+6), 1, 0.666,
                           fill=True, alpha=lines_alpha,
                           lw=0, edgecolor=lines_color,
                           facecolor=lines_color)
    ax.add_patch(rightblock1)
    ax.add_patch(rightblock2)

    right_l1 = Rectangle((94-11, (50/2)-6-0.666), 0.166, 0.666,
                           fill=True, alpha=lines_alpha,
                           lw=0, edgecolor=lines_color,
                           facecolor=lines_color)
    right_l2 = Rectangle((94-14, (50/2)-6-0.666), 0.166, 0.666,
                           fill=True, alpha=lines_alpha,
                           lw=0, edgecolor=lines_color,
                           facecolor=lines_color)
    right_l3 = Rectangle((94-17, (50/2)-6-0.666), 0.166, 0.666,
                           fill=True, alpha=lines_alpha,
                           lw=0, edgecolor=lines_color,
                           facecolor=lines_color)
    ax.add_patch(right_l1)
    ax.add_patch(right_l2)
    ax.add_patch(right_l3)
    right_l4 = Rectangle((94-11, (50/2)+6), 0.166, 0.666,
                           fill=True, alpha=lines_alpha,
                           lw=0, edgecolor=lines_color,
                           facecolor=lines_color)
    right_l5 = Rectangle((94-14, (50/2)+6), 0.166, 0.666,
                           fill=True, alpha=lines_alpha,
                           lw=0, edgecolor=lines_color,
                           facecolor=lines_color)
    right_l6 = Rectangle((94-17, (50/2)+6), 0.166, 0.666,
                           fill=True, alpha=lines_alpha,
                           lw=0, edgecolor=lines_color,
                           facecolor=lines_color)
    ax.add_patch(right_l4)
    ax.add_patch(right_l5)
    ax.add_patch(right_l6)
    
    # 3 Point Line
    if (three_line == 'mens') | (three_line == 'both'):
        # 22' 1.75" distance to center of hoop
        three_pt_left = Arc((6.25, 50/2), 44.291, 44.291, theta1=-78,
                            theta2=78, color=lines_color, lw=lw,
                            alpha=lines_alpha)
        three_pt_right = Arc((94-6.25, 50/2), 44.291, 44.291,
                             theta1=180-78, theta2=180+78,
                             color=lines_color, lw=lw, alpha=lines_alpha)

        # straight corner segments, drawn 3.34 feet from each sideline (mens line)
        ax.plot((0, 11.25), (3.34, 3.34),
                color=lines_color, lw=lw, alpha=lines_alpha)
        ax.plot((0, 11.25), (50-3.34, 50-3.34),
                color=lines_color, lw=lw, alpha=lines_alpha)
        ax.plot((94-11.25, 94), (3.34, 3.34),
                color=lines_color, lw=lw, alpha=lines_alpha)
        ax.plot((94-11.25, 94), (50-3.34, 50-3.34),
                color=lines_color, lw=lw, alpha=lines_alpha)
        ax.add_patch(three_pt_left)
        ax.add_patch(three_pt_right)

    if (three_line == 'womens') | (three_line == 'both'):
        # womens 3
        three_pt_left_w = Arc((6.25, 50/2), 20.75 * 2, 20.75 * 2, theta1=-85,
                              theta2=85, color=lines_color, lw=lw, alpha=lines_alpha)
        three_pt_right_w = Arc((94-6.25, 50/2), 20.75 * 2, 20.75 * 2,
                               theta1=180-85, theta2=180+85,
                               color=lines_color, lw=lw, alpha=lines_alpha)

        # straight corner segments, drawn 4.25 feet from each sideline (womens line)
        ax.plot((0, 8.3), (4.25, 4.25), color=lines_color,
                lw=lw, alpha=lines_alpha)
        ax.plot((0, 8.3), (50-4.25, 50-4.25),
                color=lines_color, lw=lw, alpha=lines_alpha)
        ax.plot((94-8.3, 94), (4.25, 4.25),
                color=lines_color, lw=lw, alpha=lines_alpha)
        ax.plot((94-8.3, 94), (50-4.25, 50-4.25),
                color=lines_color, lw=lw, alpha=lines_alpha)

        ax.add_patch(three_pt_left_w)
        ax.add_patch(three_pt_right_w)

    # Add Patches
    ax.add_patch(left_paint)
    ax.add_patch(left_paint_boarder)
    ax.add_patch(right_paint)
    ax.add_patch(right_paint_boarder)
    ax.add_patch(center_circle)
    ax.add_patch(hoop_left)
    ax.add_patch(hoop_right)
    ax.add_patch(left_arc)
    ax.add_patch(right_arc)
    
    if inner_arc:
        left_inner_arc = Arc((18.833333, 50/2), 12, 12, theta1=90,
                             theta2=-90, color=lines_color, lw=lw,
                       alpha=lines_alpha, ls='--')
        right_inner_arc = Arc((94-18.833333, 50/2), 12, 12, theta1=-90,
                        theta2=90, color=lines_color, lw=lw,
                        alpha=lines_alpha, ls='--')
        ax.add_patch(left_inner_arc)
        ax.add_patch(right_inner_arc)

    # Restricted Area Marker
    restricted_left = Arc((6.25, 50/2), 8, 8, theta1=-90,
                        theta2=90, color=lines_color, lw=lw,
                        alpha=lines_alpha)
    restricted_right = Arc((94-6.25, 50/2), 8, 8,
                         theta1=180-90, theta2=180+90,
                         color=lines_color, lw=lw, alpha=lines_alpha)
    ax.add_patch(restricted_left)
    ax.add_patch(restricted_right)
    
    # Backboards
    ax.plot((4, 4), ((50/2) - 3, (50/2) + 3),
            color=lines_color, lw=lw*1.5, alpha=lines_alpha)
    ax.plot((94-4, 94-4), ((50/2) - 3, (50/2) + 3),
            color=lines_color, lw=lw*1.5, alpha=lines_alpha)
    ax.plot((4, 4.6), (50/2, 50/2), color=lines_color,
            lw=lw, alpha=lines_alpha)
    ax.plot((94-4, 94-4.6), (50/2, 50/2),
            color=lines_color, lw=lw, alpha=lines_alpha)

    # Half Court Line
    ax.axvline(94/2, color=lines_color, lw=lw, alpha=lines_alpha)

    # Border
    boarder = Rectangle((0.3,0.3), 94-0.4, 50-0.4, fill=False, lw=3, color='black', alpha=lines_alpha)
    ax.add_patch(boarder)
    
    # Plot Limit
    ax.set_xlim(0, 94)
    ax.set_ylim(0, 50)
    ax.set_facecolor(court_color)
    ax.set_xticks([])
    ax.set_yticks([])
    ax.set_xlabel('')
    return ax


fig, ax = plt.subplots(figsize=(15, 8.5))
create_ncaa_full_court(ax, three_line='both', paint_alpha=0.4)
plt.show()

With the court above as a base, plotting the X/Y points on top of it is very intuitive.

Plotting X, Y Data

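  • The visualizations that follow are quite intuitive and fun, so read on.

  • X, Y points are not available for all games, so this is not a complete sample. For example, I wanted to visualize the MVPs below, but there is no data for them.

  • X/Y coordinates are only provided for fouls, turnovers, and field-goal attempts (either 2-point or 3-point); other events have none.

A standard NCAA court measures 94 x 50 feet, so rescale the coordinates to match.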

# Normalize X, Y positions for court dimensions
# Court is 50 feet wide and 94 feet end to end.
MEvents['X_'] = (MEvents['X'] * (94/100))
MEvents['Y_'] = (MEvents['Y'] * (50/100))

WEvents['X_'] = (WEvents['X'] * (94/100))
WEvents['Y_'] = (WEvents['Y'] * (50/100))
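
The same rescaling can be wrapped in a helper that also marks events without a location. This is only a sketch: `add_court_feet` is a hypothetical name, and treating rows with X == 0 and Y == 0 as "no location recorded" is an assumption (suggested by the `X_ != 0` filters used further down), not something the data description states.

```python
import pandas as pd

COURT_LENGTH_FT = 94  # end line to end line
COURT_WIDTH_FT = 50   # sideline to sideline


def add_court_feet(events):
    """Return a copy with X_/Y_ in feet; rows at X == 0 and Y == 0 become NaN."""
    out = events.copy()
    # Assumed placeholder for "no location recorded"
    missing = (out["X"] == 0) & (out["Y"] == 0)
    out["X_"] = (out["X"] * COURT_LENGTH_FT / 100).mask(missing)
    out["Y_"] = (out["Y"] * COURT_WIDTH_FT / 100).mask(missing)
    return out


demo = pd.DataFrame({"X": [0, 50, 100], "Y": [0, 50, 100]})
converted = add_court_feet(demo)
print(converted)
```

With NaN instead of 0 for missing locations, plotting calls skip those rows on their own, so the explicit `X_ != 0` filters would no longer be needed.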

Visualize the turnover locations for men and women separately

#fouls, turnovers, and field-goal attempts (either 2-point or 3-point). No X/Y data for other events.
fig, ax = plt.subplots(figsize=(15, 7.8))
ms = 10
ax = create_ncaa_full_court(ax, paint_alpha=0.1)
MEvents.query('EventType == "turnover"') \
    .plot(x='X_', y='Y_', style='X',
          title='Turnover Locations (Mens)',
          c='red',
          alpha=0.3,
          figsize=(15, 9),
          label='Turnovers',
          ms=ms,  # marker size
          ax=ax)
ax.set_xlabel('')
ax.get_legend().remove()
plt.show()

fig, ax = plt.subplots(figsize = (15,7.8))
ms = 10
ax = create_ncaa_full_court(ax, paint_alpha=0.2)
WEvents[WEvents['EventType'] == 'turnover']\
    .plot(x = 'X_', y = 'Y_', style = 'o',
          title='Turnover Locations (Womens)',
          alpha = 0.2,
          figsize=(15, 9),
          label='Turnovers',
          ms=ms,
          ax = ax)
ax.set_xlabel('')
ax.get_legend().remove()
plt.show()

Next, use subplots to show where two- and three-point shots were made and missed (the code below plots the women's games).

COURT_COLOR = '#dfbb85'
fig, ax = plt.subplots(2, 2, figsize=(20, 10))
ax1 = ax[0,0]
ax2 = ax[0,1]
ax3 = ax[1,0]
ax4 = ax[1,1]

# Where are 3 pointers made from? (This is really cool)
WEvents.query('EventType == "made3"') \
    .plot(x='X_', y='Y_', style='.',
          color='blue',
          title='3 Pointers Made (Womens)',
          alpha=0.01, ax=ax1)
ax1 = create_ncaa_full_court(ax1, lw=0.5, three_line='womens', paint_alpha=0.1)
ax1.set_facecolor(COURT_COLOR)

WEvents.query('EventType == "miss3"') \
    .plot(x='X_', y='Y_', style='.',
          title='3 Pointers Missed (Womens)',
          color='red',
          alpha=0.01, ax=ax2)
ax2.set_facecolor(COURT_COLOR)
ax2 = create_ncaa_full_court(ax2, lw=0.5, three_line='womens', paint_alpha=0.1)

WEvents.query('EventType == "made2"') \
    .plot(x='X_', y='Y_', style='.',
          color='blue',
          title='2 Pointers Made (Womens)',
          alpha=0.01, ax=ax3)
ax3.set_facecolor(COURT_COLOR)
ax3 = create_ncaa_full_court(ax3, lw=0.5, three_line='womens', paint_alpha=0.1)

WEvents.query('EventType == "miss2"') \
    .plot(x='X_', y='Y_', style='.',
          title='2 Pointers Missed (Womens)',
          color='red',
          alpha=0.01, ax=ax4)
ax4.set_facecolor(COURT_COLOR)
ax4 = create_ncaa_full_court(ax4, lw=0.5, three_line='womens', paint_alpha=0.1)

# Remove legends, ticks and axis labels from all four panels
for a in (ax1, ax2, ax3, ax4):
    a.get_legend().remove()
    a.set_xticks([])
    a.set_yticks([])
    a.set_xlabel('')
plt.show()

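The plots above cover the X/Y coordinates of the whole dataset. Next, combine them with the Players files to visualize individual players.

Visualizing a single player's data

The Players data contains the player ID, last name, first name, and the ID of the player's team.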

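# A few rows in the men's file are malformed, so skip them while reading
# (on_bad_lines requires pandas >= 1.3; older versions used error_bad_lines=False)
MPlayers = pd.read_csv(Mfolder_path+'MPlayers.csv', on_bad_lines='skip')
WPlayers = pd.read_csv(Wfolder_path+'WPlayers.csv')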
MPlayers.head()

Merge with the event data

# Merge Player name onto events
MEvents = MEvents.merge(MPlayers,
              how='left',
              left_on='EventPlayerID',
              right_on='PlayerID')

WEvents = WEvents.merge(WPlayers,
              how='left',
              left_on='EventPlayerID',
              right_on='PlayerID')

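Look up the IDs of the MVPs of the 2019 and 2018 champions

# 2019: Virginia Cavaliers, MVP Kyle Guy
MPlayers.query('FirstName == "Kyle" and LastName == "Guy"')
# 2018: Villanova Wildcats, MVP Donte DiVincenzo
MPlayers.query('FirstName == "Donte" and LastName == "DiVincenzo"')
# Neither player has X/Y coordinates, so their shots cannot be plotted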

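It turns out that the X/Y data happens to be missing for both MVPs, so I picked another player who does have data: Ty Jerome (ID 12410), born July 8, 1997, from New Rochelle, New York, a professional point guard who was with the NBA's Phoenix Suns at the time of writing.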

MEvents.query('EventPlayerID == 12410')['EventType'].value_counts()

sub         469
assist      376
reb         311
miss3       254
miss2       208
foul        201
made2       193
made3       164
turnover    144
steal       126
made1       117
fouled       63
miss1        32
block         4
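
Whether a player has usable shot locations can be checked before plotting. The sketch below counts a player's shot events and how many of them carry a non-zero position; `xy_coverage` is a hypothetical helper, and it assumes that a missing location shows up as X = Y = 0 or as NaN.

```python
import pandas as pd


def xy_coverage(events, player_id):
    """Count a player's shot events and how many carry a non-zero X/Y location."""
    shot_types = ["made2", "miss2", "made3", "miss3"]
    shots = events[(events["EventPlayerID"] == player_id)
                   & events["EventType"].isin(shot_types)]
    xy = shots[["X", "Y"]].fillna(0)  # treat NaN like the assumed 0 placeholder
    located = ((xy["X"] != 0) | (xy["Y"] != 0)).sum()
    return len(shots), int(located)


# Toy events: player 1 has one located shot, player 2 has none.
toy = pd.DataFrame({
    "EventPlayerID": [1, 1, 1, 2, 2],
    "EventType": ["made2", "miss3", "sub", "made3", "miss2"],
    "X": [12, 0, 0, 0, 0],
    "Y": [40, 0, 0, 0, 0],
})
print(xy_coverage(toy, 1))  # (2, 1)
print(xy_coverage(toy, 2))  # (2, 0)
```

On the real data this would be called as `xy_coverage(MEvents, 12410)`; a second value of 0 means there is nothing to draw for that player.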

Visualize his data

ms = 10 # Marker Size
fig, ax = plt.subplots(figsize=(15, 8))
ax = create_ncaa_full_court(ax)
MEvents.query('EventPlayerID == 12410 and EventType == "made2"') \
    .plot(x='X_', y='Y_', style='o',
          title='Shots (Ty Jerome)',
          alpha=0.5,
         figsize=(15, 8),
         label='Made 2',
         ms=ms,
         ax=ax)
plt.legend()
MEvents.query('EventPlayerID == 12410 and EventType == "miss2"') \
    .plot(x='X_', y='Y_', style='X',
          alpha=0.5, ax=ax,
         label='Missed 2',
         ms=ms)
plt.legend()
MEvents.query('EventPlayerID == 12410 and EventType == "made3"') \
    .plot(x='X_', y='Y_', style='o',
          c='brown',
          alpha=0.5,
         figsize=(15, 8),
         label='Made 3', ax=ax,
         ms=ms)
plt.legend()
MEvents.query('EventPlayerID == 12410 and EventType == "miss3"') \
    .plot(x='X_', y='Y_', style='X',
          c='green',
          alpha=0.5, ax=ax,
         label='Missed 3',
         ms=ms)
ax.set_xlabel('')
plt.legend()
plt.show()

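Now look at the data for Zion Williamson, ID 2825

  • Zion Williamson, born July 6, 2000 in Salisbury, North Carolina and raised in Spartanburg, South Carolina, is a professional power forward with the NBA's New Orleans Pelicans. He entered the NBA in 2019 as the first overall draft pick.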
MPlayers.query('FirstName == "Zion" and  LastName == "Williamson"')
ms = 10 # Marker Size
FirstName = 'Zion'
LastName = 'Williamson'
fig, ax = plt.subplots(figsize=(15, 8))
ax = create_ncaa_full_court(ax)
MEvents.query('EventPlayerID == 2825 and EventType == "made2"') \
    .plot(x='X_', y='Y_', style='o',
          title='Shots (Zion Williamson)',
          alpha=0.5,
         figsize=(15, 8),
         label='Made 2',
         ms=ms,
         ax=ax)
plt.legend()
MEvents.query('EventPlayerID == 2825 and EventType == "miss2"') \
    .plot(x='X_', y='Y_', style='X',
          alpha=0.5, ax=ax,
         label='Missed 2',
         ms=ms)
plt.legend()
MEvents.query('EventPlayerID == 2825 and EventType == "made3"') \
    .plot(x='X_', y='Y_', style='o',
          c='brown',
          alpha=0.5,
         figsize=(15, 8),
         label='Made 3', ax=ax,
         ms=ms)
plt.legend()
MEvents.query('EventPlayerID == 2825 and EventType == "miss3"') \
    .plot(x='X_', y='Y_', style='X',
          c='green',
          alpha=0.5, ax=ax,
         label='Missed 3',
         ms=ms)
ax.set_xlabel('')
plt.legend()
plt.show()
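
The per-player shot charts repeat the same four plotting calls with only the player ID and title changed, so they could be folded into one helper. This is a sketch under assumptions: `plot_player_shots` and `SHOT_STYLES` are hypothetical names, the colors are left to matplotlib's default cycle rather than the brown/green used above, and the toy data stands in for MEvents.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch also runs without a display
import matplotlib.pyplot as plt
import pandas as pd

SHOT_STYLES = {  # EventType -> (marker, legend label)
    "made2": ("o", "Made 2"),
    "miss2": ("X", "Missed 2"),
    "made3": ("o", "Made 3"),
    "miss3": ("X", "Missed 3"),
}


def plot_player_shots(events, player_id, ax, title="", ms=10, alpha=0.5):
    """Draw one marker series per shot type for a single player on ax."""
    player = events[events["EventPlayerID"] == player_id]
    for event_type, (marker, label) in SHOT_STYLES.items():
        shots = player[player["EventType"] == event_type]
        ax.plot(shots["X_"], shots["Y_"], linestyle="none", marker=marker,
                ms=ms, alpha=alpha, label=label)
    ax.set_title(title)
    ax.legend()
    return ax


# Toy events in court feet; player 7 has three shots, player 8 one.
toy = pd.DataFrame({
    "EventPlayerID": [7, 7, 7, 8],
    "EventType": ["made2", "miss3", "made3", "made2"],
    "X_": [10.0, 25.0, 70.0, 5.0],
    "Y_": [25.0, 40.0, 10.0, 5.0],
})
fig, ax = plt.subplots(figsize=(15, 8))
plot_player_shots(toy, 7, ax, title="Shots (toy player 7)")
```

In the notebook, the court would be drawn first with `create_ncaa_full_court(ax)` and the helper then called with `MEvents` (or `WEvents`) and the player's ID.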

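Next, look at Katie Lou Samuelson from the women's game. She is known as a three-point shooter and, like Stephen Curry, is good at very deep threes from well beyond the arc. Let's check with a plot whether that is really the case. (The original text gives her ID as 3163, while the plotting code below filters on EventPlayerID 1821; the lookup below shows which one applies.)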

WPlayers.query('FirstName == "Katie Lou" and  LastName == "Samuelson"')

fig, ax = plt.subplots(figsize=(15, 8))
ax = create_ncaa_full_court(ax, three_line='womens')
WEvents.query('EventPlayerID == 1821 and EventType == "made2"') \
    .plot(x='X_', y='Y_', style='o',
          title='Shots (Katie Lou Samuelson)',
          alpha=0.5,
         figsize=(15, 8),
         label='Made 2',
         ms=ms,
         ax=ax)
plt.legend()
WEvents.query('EventPlayerID == 1821 and EventType == "miss2"') \
    .plot(x='X_', y='Y_', style='X',
          alpha=0.5, ax=ax,
         label='Missed 2',
         ms=ms)
plt.legend()
WEvents.query('EventPlayerID == 1821 and EventType == "made3"') \
    .plot(x='X_', y='Y_', style='o',
          c='brown',
          alpha=0.5,
         figsize=(15, 8),
         label='Made 3', ax=ax,
         ms=ms)
plt.legend()
WEvents.query('EventPlayerID == 1821 and EventType == "miss3"') \
    .plot(x='X_', y='Y_', style='X',
          c='green',
          alpha=0.5, ax=ax,
         label='Missed 3',
         ms=ms)
ax.set_xlabel('')
plt.legend()
plt.show()

The plot makes it clear that this is indeed the case.

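Shot heatmaps

Count the locations of all two- and three-point attempts, made or missed, and draw them as a heatmap, then compare the shooting preferences of players in the men's and women's games.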

import matplotlib as mpl  # needed for mpl.colors.LogNorm below

N_bins = 100
# Keep only shot events whose coordinates are non-zero
shot_events = MEvents.loc[MEvents['EventType'].isin(['miss3','made3','miss2','made2']) & (MEvents['X_'] != 0)]
fig, ax = plt.subplots(figsize=(15, 7))
ax = create_ncaa_full_court(ax,
                            paint_alpha=0.0,
                            three_line='mens',
                            court_color='black',
                            lines_color='white')
plt.hist2d(shot_events['X_'].values + np.random.normal(0, 0.1, shot_events['X_'].shape), # Add Jitter to values for plotting
           shot_events['Y_'].values + np.random.normal(0, 0.1, shot_events['Y_'].shape),
           bins=N_bins, norm=mpl.colors.LogNorm(),
               cmap='plasma')

# Plot a colorbar with label.
cb = plt.colorbar()
cb.set_label('Number of shots')

ax.set_title('Shot Heatmap (Mens)')
plt.show()

N_bins = 100
shot_events = WEvents.loc[WEvents['EventType'].isin(['miss3','made3','miss2','made2']) & (WEvents['X_'] != 0)]
fig, ax = plt.subplots(figsize=(15, 7))
ax = create_ncaa_full_court(ax, three_line='womens', paint_alpha=0.0,
                            court_color='black',
                            lines_color='white')
plt.hist2d(shot_events['X_'].values + np.random.normal(0, 0.2, shot_events['X_'].shape),
           shot_events['Y_'].values + np.random.normal(0, 0.2, shot_events['Y_'].shape),
           bins=N_bins, norm=mpl.colors.LogNorm(),
               cmap='plasma')

# Plot a colorbar with label.
cb = plt.colorbar()
cb.set_label('Number of shots')

ax.set_title('Shot Heatmap (Womens)')
plt.show()

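Comparing the men's and women's games, an interesting observation is that many of the men's shots come from directly under the basket, while the women's hot spots appear more to the left and right of the basket.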

Next, look at the average points scored from each coordinate

# Points per shot event: 2 or 3 for a make, 0 otherwise
# (the miss event types are 'miss2' and 'miss3', which keep the default 0)
MEvents['PointsScored'] = 0
MEvents.loc[MEvents['EventType'] == 'made2', 'PointsScored'] = 2
MEvents.loc[MEvents['EventType'] == 'made3', 'PointsScored'] = 3
avg_pnt_xy = MEvents.loc[MEvents['EventType'].isin(['miss3','made3','miss2','made2']) & (MEvents['X_'] != 0)] \
     .groupby(['X_','Y_'])['PointsScored'].mean().reset_index()
#avg_pnt_xy.plot(x='X_',y='Y_', style='.')

bins = [0,0.5,1,1.33,1.67,2,2.5,3]
avg_pnt_xy['PointsScored'] = pd.cut(avg_pnt_xy['PointsScored'],bins)
fig, ax = plt.subplots(figsize=(15, 8))
ax = sns.scatterplot(data=avg_pnt_xy, x='X_', y='Y_', hue='PointsScored')
ax = create_ncaa_full_court(ax)
plt.legend(loc=[1,0])
plt.show()
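
A per-coordinate mean is noisy where only a handful of shots were taken, so it can help to keep the attempt count next to the average and drop thinly sampled spots. The sketch below shows one way to do this on made-up data; `points_per_shot`, `POINTS` and the `min_attempts` threshold are hypothetical and not part of the original analysis.

```python
import pandas as pd

POINTS = {"made2": 2, "made3": 3, "miss2": 0, "miss3": 0}


def points_per_shot(events, min_attempts=2):
    """Average points per attempt at each (X_, Y_) with enough attempts."""
    shots = events[events["EventType"].isin(list(POINTS))].copy()
    shots["PointsScored"] = shots["EventType"].map(POINTS)
    summary = (shots.groupby(["X_", "Y_"])["PointsScored"]
                    .agg(attempts="count", avg_points="mean")
                    .reset_index())
    return summary[summary["attempts"] >= min_attempts]


# Toy events: three attempts at (5, 25), a single attempt at (20, 40),
# and a non-shot event that must be ignored.
toy = pd.DataFrame({
    "EventType": ["made2", "miss2", "made2", "made3", "sub"],
    "X_": [5.0, 5.0, 5.0, 20.0, 5.0],
    "Y_": [25.0, 25.0, 25.0, 40.0, 25.0],
})
summary = points_per_shot(toy, min_attempts=2)
print(summary)
```

The single made three at (20, 40) would show up as a perfect 3.0 points per shot in an unfiltered mean, which is exactly the kind of artifact the threshold removes.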

Next, count the shots made (and missed) at each coordinate

# Flag made and missed shots (the miss event types are 'miss2' and 'miss3')
MEvents['Made'] = MEvents['EventType'].isin(['made2', 'made3'])
MEvents['Missed'] = MEvents['EventType'].isin(['miss2', 'miss3'])

# Select both columns with a list; tuple-style selection on a groupby is no longer supported
avg_made_xy = MEvents.loc[MEvents['EventType'].isin(['miss3','made3','miss2','made2']) & (MEvents['X_'] != 0)] \
     .groupby(['X_','Y_'])[['Made','Missed']].sum().reset_index()

bins = [0,25,50,100,200,1000,2000]
avg_made_xy['Made'] = pd.cut(avg_made_xy['Made'],bins)

fig, ax = plt.subplots(figsize=(15, 8))
# With a hue variable, seaborn takes its colors from `palette`
ax = sns.scatterplot(data=avg_made_xy, x='X_', y='Y_', palette='plasma', hue='Made')
ax = create_ncaa_full_court(ax, paint_alpha=0)
ax.set_title('Number of Shots Made')
plt.legend(loc=[1, 0])
plt.show()

End of this part

Next part: how to judge team strength and produce a ranking using only the matchup information
