
# Homework 2 - Soccer analytics

Soccer analytics is attracting increasing interest from academia and industry, thanks to the availability of sensing technologies that provide high-fidelity data streams extracted from every match.

The goal of this assignment is to analyze the largest open collection of soccer logs ever released, collected by [Wyscout](https://wyscout.com/). It contains all the spatiotemporal events (passes, shots, fouls, etc.) that occur during all matches of the 2017-2018 season of seven competitions (La Liga, Serie A, Bundesliga, Premier League, Ligue 1, FIFA World Cup 2018, UEFA Euro Cup 2016). A match event contains information about its position, time, outcome, player and characteristics.

In particular, we are curious to answer some specific research questions (RQs) that may help us discover and interpret meaningful patterns in the data.

## Before starting

Among the many things and good practices a data scientist needs to do before running any analysis, there is one that is of utmost importance: get the data and understand it!

Here is the list of tasks you need to perform before digging into the rich world of soccer.

1. **Get your data!** Go to this website and download the files related to Coaches, Players, Events, Teams and Matches. Throughout the analysis we focus only on club team information, so there is no need to download or use the files related to the European Cup and the World Cup.
2. **Understand your data.** Read the legend of each column to understand what it refers to. Additional information about the labels can be found here: Coaches, Players, Events, Teams, Matches. Please make sure that you have understood the data before you start coding.
3. **Handling data.** The data are provided in multiple .json files, with some of the columns present in more than one file.
   For this reason, in order to answer the RQs, we suggest importing the .json files as pandas DataFrame objects and then, depending on what you want to analyze, performing joins among the DataFrames. Here you can find a quick, useful guide. Remember, Google is your best friend!

**VERY VERY IMPORTANT**

1. Read the entire homework before coding anything!
2. My solution is not better than yours, and yours is not better than mine. In any data analysis task there is no unique way to answer the RQs. For this reason it is crucial (necessary and mandatory) that you describe every single decision you take and every step you perform.
3. Once you have completed an exercise, comments about the obtained results are mandatory. We are not always explicit about where to focus your comments, but we always want a few brief sentences about your findings.

## Research questions

### Exploratory Data Analysis

**General setup:** all the analyses requested in RQ1 to RQ5 must be performed only on the Premier League dataset.

1. **[RQ1] Who wants to be a Champion?** During a season a team may go through bad periods, for example more than three consecutive games lost, or it may have a positive trend where it seems unbeatable. Let's visualize these trends!

   Create a plot where each point represents the number of points a team has obtained at a given game week. To show the trends, points related to the same team must be connected to each other. Remember: in soccer each team gets 3 points for a win, 1 point for a draw, and 0 points for a loss. Highlight the two teams with the longest winning streak (number of consecutive wins) and the two teams with the longest losing streak (number of consecutive losses).

   Below you can see a similar example of what we would like you to show us. Keep in mind that you must create this plot for the entire season (38 game weeks).
2. **[RQ2] Is there a home-field advantage?** It is generally believed that there is an underlying home-field advantage in sport, i.e.
   a higher probability of winning for the home team. Let's check this, and see whether the outcome of the game (win, draw, loss) is correlated with the playing side (home or away). For 5 different Premier League teams, show the contingency table (outcome x side). Then perform an overall Chi-squared test as follows: build a single contingency table containing all the matches in which exactly one of the 5 previously selected teams is involved, to see whether there is a home-field advantage. State clearly the tested hypothesis and whether it is accepted or rejected.
3. **[RQ3] Which teams have the youngest coaches?** Rank all the teams by the age of their coach and show the 10 teams with the youngest coaches. Remember that during a season a team could have more than one coach; in that case pick the youngest of them. Additionally, show the distribution of the ages of all coaches in the Premier League using a boxplot. (Hint: there is an attribute `birthDate`.)
4. **[RQ4] Find the top 10 players with the highest ratio between completed passes and attempted passes.** For this task, consider all the different types of passes; as specified on the website, a completed pass has tag 1801 (accurate event). To avoid meaningless results (e.g. players who played a few minutes and completed 2 passes out of 2, achieving a 100% ratio), select an arbitrary threshold of minimum attempted passes, so as to consider only the subset of players who played enough. Justify the choices you make.
5. **[RQ5] Does being a tall player mean winning more air duels?** Soccer is a physical game, and during a match players are often involved in air duels (i.e. when two players contend for the ball while it is not on the ground). Make a plot that shows the dependency between the height of the player and the ratio of air duels won to air duels attempted.
   The visualization should be a scatterplot, where each point (x, y) represents a player whose height is x and whose ratio of air duels won is y. Furthermore, color each point according to an arbitrary selection of height categories (e.g. yellow: 160-165 cm, orange: 165-170 cm, etc.). Remember that the Air Duel is a subevent of the event Duel, and that an air duel is considered won if it has tag 1801. As in RQ4, choose a threshold of minimum air duels attempted in order to filter your data and get reliable results, and justify your choice.
6. **[RQ6] Free your mind!** Go further with the EDA (Exploratory Data Analysis) by showing a new, interesting result you found about the dataset.

### Core Research Questions

**[CRQ1] What are the time slots of the match with the most goals?** Let's analyze and visualize the distribution of goals in 9-minute slots for all the matches, i.e. let's transform the minute of a goal from a continuous variable into a discrete one (e.g. a goal scored in the 5th minute ends up in the interval [0-9)). Remember that every match usually goes from minute 0 to minute 90, but in football an arbitrary amount of extra time is always added to each half of the match, so also consider the intervals 45+ and 90+.

1. Make a barplot with the absolute frequency of goals in all the time slots.
2. Find the top 10 teams that score the most in the interval 81-90.
3. Show whether there are players who were able to score at least one goal in 8 different intervals.

**[CRQ2] Visualize movements and passes on the pitch!** Here we focus our attention on the zones that a player covers during a match. For each event we have a pair of coordinates, which are respectively the starting and ending point of that event. It can be helpful to follow this link. Knowing all the different positions where events happen lets us create different types of visualizations:

1. Considering only the match Barcelona - Real Madrid played on 6 May 2018:
   - visualize with a heatmap the zones where Cristiano Ronaldo was most active. The events to be considered are: passes, shots, duels, free kicks;
   - compare his map with that of Lionel Messi. Comment on the results and point out the main differences (we are not looking for a deep technical analysis; just show us whether there are some clear differences between the 2 plots).

   Here is an example of a heatmap showing all the starting positions of Arsenal's goals during the entire season.
2. Considering only the match Juventus - Napoli played on 22 April 2018:
   - visualize with arrows the starting and ending point of each pass made during the match by Jorginho and Miralem Pjanic. Is there a huge difference between the map with all the passes made and the one with only accurate passes? Comment on the results and point out the main differences.

   Here is an example of a map with arrows.

## Theoretical question

You are given the recursive function `splitSwap`, which accepts an array `a`, an index `l`, and a length `n`.

```
function splitSwap(a, l, n):
    if n <= 1:
        return
    splitSwap(a, l, n/2)
    splitSwap(a, l + n/2, n/2)
    swapList(a, l, n)
```

The subroutine `swapList` is described here:

```
function swapList(a, l, n):
    for i = 1 to n/2:
        tmp = a[l + i]
        a[l + i] = a[l + n/2 + i]
        a[l + n/2 + i] = tmp
```

1. How much running time does it take to execute `splitSwap(a, 0, n)`? (We want a Big O analysis.)
2. What does this algorithm do? Is it optimal? Describe the mechanism of the algorithm in detail; we do not want to know only its final result.

## Bonus

1. Repeat the entire analysis for the other leagues (La Liga, Serie A, Bundesliga and Ligue 1), aggregating the results and highlighting the differences you find among the leagues.
2. Make nice visualizations using libraries like Bokeh and Seaborn.
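
For RQ4, a minimal pandas sketch of the completed/attempted pass ratio could look like the following. It assumes the events have been loaded into a DataFrame with `playerId`, `eventName` and `tags` columns, where `tags` is a list of `{"id": ...}` dicts as in the Wyscout JSON files; these column names, the helper `pass_accuracy` and the toy data are illustrative assumptions, not part of the assignment. `min_attempts` is the arbitrary threshold the question asks you to choose and justify.

```python
import pandas as pd

ACCURATE_TAG = 1801  # tag marking an accurate (completed) event


def pass_accuracy(events: pd.DataFrame, min_attempts: int) -> pd.DataFrame:
    """Completed/attempted pass ratio per player (hypothetical helper).

    Keeps only players with at least `min_attempts` attempted passes and
    sorts them from the highest ratio to the lowest.
    """
    # Assumed layout: every pass type shares the event name "Pass".
    passes = events[events["eventName"] == "Pass"].copy()
    # A pass counts as completed when tag 1801 appears among its tag ids.
    passes["completed"] = passes["tags"].apply(
        lambda tags: any(tag["id"] == ACCURATE_TAG for tag in tags)
    )
    stats = passes.groupby("playerId")["completed"].agg(
        attempted="count", completed="sum"
    )
    # Drop players below the minimum-attempts threshold before ranking.
    stats = stats[stats["attempted"] >= min_attempts]
    stats = stats.assign(ratio=stats["completed"] / stats["attempted"])
    return stats.sort_values("ratio", ascending=False)


# Toy data: player 3 attempts a single pass and is filtered out.
events = pd.DataFrame({
    "playerId": [1, 1, 1, 2, 2, 3],
    "eventName": ["Pass", "Pass", "Shot", "Pass", "Pass", "Pass"],
    "tags": [[{"id": 1801}], [{"id": 1802}], [{"id": 101}],
             [{"id": 1801}], [{"id": 1801}], [{"id": 1801}]],
})
top = pass_accuracy(events, min_attempts=2).head(10)
print(top)
```

Filtering before sorting means a player needs both accuracy and volume to reach the top 10, so the value of `min_attempts` is exactly where your justification belongs; joining `top` with the players table on the player id would then give readable names.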

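For CRQ1, one possible way to discretize goal times into 9-minute slots is sketched below. It assumes each event carries a `matchPeriod` value (`"1H"` or `"2H"`) and an `eventSec` value counted from the start of that half; check both assumptions against the Events legend before relying on them. The function name `time_slot` and the label strings are hypothetical, and goals would still have to be selected beforehand (for example via the goal tag in the Wyscout tag list, assumed here to be 101).

```python
from collections import Counter


def time_slot(match_period: str, event_sec: float) -> str:
    """Map an event time to a 9-minute slot label (hypothetical helper).

    Assumes `event_sec` restarts from 0 at the beginning of each half.
    Regular time is split into [0-9), ..., [36-45) and [45-54), ..., [81-90);
    anything beyond 45 minutes within a half goes to "45+" or "90+".
    """
    minute = event_sec / 60  # minutes elapsed within the current half
    if match_period == "1H":
        offset, stoppage = 0, "45+"
    elif match_period == "2H":
        offset, stoppage = 45, "90+"
    else:
        raise ValueError(f"unexpected match period: {match_period!r}")
    if minute >= 45:
        return stoppage
    start = offset + int(minute // 9) * 9
    return f"[{start}-{start + 9})"


# Toy goal times as (period, seconds into the half).
goals = [("1H", 300), ("1H", 2760), ("2H", 2220), ("2H", 2750), ("2H", 2400)]
counts = Counter(time_slot(period, sec) for period, sec in goals)
print(counts)
```

When drawing the barplot, order the labels explicitly so that 45+ sits between [36-45) and [45-54) and 90+ comes last; sorting them as plain strings would put 45+ and 90+ first.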
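
To experiment with the theoretical question on small inputs, the `splitSwap` pseudocode can be transcribed into Python. This is a sketch under stated assumptions: it switches to 0-based indexing (so the loop in `swapList` runs over `i = 0` to `n/2 - 1`), uses integer division for `n/2`, and is intended for `n` a power of two; the names `swap_list` and `split_swap` are simply Python-style renderings of the pseudocode names.

```python
def swap_list(a: list, l: int, n: int) -> None:
    """Exchange a[l : l + n/2] with a[l + n/2 : l + n], element by element."""
    half = n // 2
    for i in range(half):  # 0-based version of "for i = 1 to n/2"
        a[l + i], a[l + half + i] = a[l + half + i], a[l + i]


def split_swap(a: list, l: int, n: int) -> None:
    """Recursive procedure from the pseudocode; modifies `a` in place."""
    if n <= 1:  # base case: a block of length 0 or 1 is left untouched
        return
    split_swap(a, l, n // 2)           # recurse on the first half
    split_swap(a, l + n // 2, n // 2)  # recurse on the second half
    swap_list(a, l, n)                 # then exchange the two halves


data = list(range(8))
split_swap(data, 0, len(data))
print(data)
```

Printing `data` for a few sizes, and counting how many element swaps `swap_list` performs at each level of the recursion, is a reasonable way to check both of your answers.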