
Postgraduate coursework, Information School
INF6027 Introduction to Data Science (2018-19)
Analysis of the UK Police Dataset

1. Introduction

This part of the assessment for INF6027 Introduction to Data Science comprises a piece of individual coursework to assess your ability to analyse data using R/RStudio and to then communicate your findings. Given a specific topic and dataset (see Section 2), you should identify a specific problem or topic you would like to investigate (e.g., where or when particular types of crime occur, or co-occur). You will then need to pre-process and analyse the dataset to identify patterns and relationships that address your selected problem/topic. This should involve using techniques learned throughout the practical sessions that will help you to demonstrate your R skills, such as summarising datasets, statistical modelling or data visualisation, to highlight and illustrate particular aspects of the data you want to communicate (e.g., particular patterns or trends).

This coursework aims to follow the stages involved in a ‘typical’ data science process: (i) define the question(s) to address (note, sometimes this does not come at the start of the process, but after initial exploration of the data); (ii) gather data; (iii) transform, clean and structure the data; (iv) explore and analyse the data; and (v) communicate the findings of the data analysis. This often occurs in an iterative manner and is centred on one or more questions you are seeking to address. For example, Figure 1 presents the stages involved in data discovery as an iterative process [1], and you can find more details in Section 3. This is also similar to the data science process we have been using in class from the “Doing Data Science” book (O’Neil & Schutt, 2013).

Fig. 1 Example data discovery process (Jones, 2014: p.2)

You should write a 3,000 word structured report (see Section 4) that describes the approach you have taken to explore and analyse the data for the selected problem/topic. Your report should clearly communicate the results of your data analysis and be written in a way that helps the reader interpret your findings. Note: charts, tables, and appendices are not included in the word count.

This assessment is worth 100% of the overall module mark for INF6027. A pass mark of 50 is required to pass the module as a whole. Submission deadline: 10am Monday 21st January 2019 (Week 14) via Turnitin. See Section 5 for more general information about Coursework Submission Requirements within the Information School.

2. The UK Police Dataset

The dataset to be used in this assessment is the UK Police Dataset, which has made crime data publicly available since 2011 (this is an example of Open Data). There has been a lot of recent interest in analysing publicly available datasets to identify patterns of crime and gain insights into criminal activity; see, for example, the crime activity browser by IBM [2]. If you are interested in the topic, you can also find further crime-related datasets produced by the UK Data Service (https://www.ukdataservice.ac.uk/get-data/themes/crime).
There is also increasing use of crime Open Data in the media to highlight aspects of policing and criminal activities (see, e.g., https://www.bbc.co.uk/news/uk-44044537).

[1] You can find out more about this process in (Jones, 2014: p.2): https://tanthiamhuat.files.wordpress.com/2015/07/communicating-data-with-tableau.pdf
[2] Open Crime Data, Free for All: https://developer.ibm.com/clouddataservices/2016/11/03/open-crime-data/

A description of the data is available here: http://data.police.uk/about/ also including an explanation on how to download the data [3]. The data are provided as CSV files (note that there is also an API available if you prefer) and provide street-level crime, outcome and stop and search information broken down by police force [4] (in the UK there are 45 territorial police forces and 3 Special Forces) and 2011 Lower Layer Super Output Areas (LSOA).

The dataset describes crimes reported to UK police during each month in different areas of the UK. Information in the dataset includes the following: geographical location (longitude and latitude), date (month, year), LSOA code (i.e., the census area), and type of crime (e.g., vehicle crime, burglary, robbery, etc.). You can select any data from the UK Police Dataset. (This may require multiple downloads.) You can also aggregate the dataset with other data sources if you want (e.g., census data), which would demonstrate your ability to join datasets (although you don’t have to do this to pass the coursework as the emphasis of the coursework is on how you carry out your analysis in R/RStudio and communicate your findings on the UK Police Dataset).

3. What you need to do

The following sections describe what you need to do in order to carry out the coursework. This roughly follows the steps shown in Fig. 1, but you don’t have to be constrained by this or follow them in this particular order; it is just a suggestion.
Also, all the R we have done in the practical sessions (and the final sessions) should be enough to conduct the coursework, although you may need to investigate certain areas further that relate specifically to the problem you tackle in your investigation.

3.1. Review the literature and identify research question(s)

As mentioned previously, you should select a specific problem/topic related to the data (the ‘question’ stage in Fig. 1). To decide what area to focus on you could start by undertaking a brief review of the relevant literature around areas such as analysis of crime data, geographical analysis of crime, predictive policing, crime sensing, analysis of crime statistics, etc. For example, these articles may be a useful starting point:

- Vandeviver, C., and Bernasco, W. (2017) The geography of crime and crime control, Applied Geography, Volume 86, pp. 220-225. (Available online: http://www.sciencedirect.com/science/article/pii/S014362281730838X)
- Field, S. (1992) The Effect of Temperature on Crime, The British Journal of Criminology, Volume 32, Issue 3, pp. 340–351. (Available online: https://doi.org/10.1093/oxfordjournals.bjc.a048222)

Reviewing past literature will help you understand what kinds of analyses are typically undertaken using crime data and provide a possible source of ideas for what you could do with the UK Police Dataset. Examples of possible topics include, but are not restricted to, the following:

- Evolution of crimes in an area over time;
- Trends and predictions of crimes and crime rates;
- Analysis of certain types of crime (e.g., vehicle crimes);
- Comparisons of crime types in a region;
- Clustering and classification of data, e.g., by type of crime;
- Normalisation and integration with other datasets (e.g., LSOA census statistics);
- Focus on a certain census dimension (e.g., age of residents in the area);
- Visualisation of the data (e.g., on maps).

3.2. Download, pre-process and explore the data

As well as reviewing relevant academic literature you should also download some data from http://data.police.uk/ and perform an exploratory analysis (i.e. ‘play’ with the data), to better understand the dataset and also help you to identify a particular problem or topic you might want to focus on.

This part of your investigation will include steps to pre-process and transform the data, such as cleaning up the data, dealing with missing values, standardising numeric values, etc. This may also include combining or joining the data with further datasets, e.g. census or deprivation data. This reflects the ‘gather’ and ‘structure’ stages in Fig. 1. (Note: this part of the analysis could take a lot of time so don’t underestimate how much time you will need to spend on this part of the coursework.)

[3] You can also find an article describing the accuracy of the data here: https://www.tandfonline.com/doi/full/10.1080/15230406.2014.972456
[4] https://en.wikipedia.org/wiki/List_of_police_forces_of_the_United_Kingdom

3.3. Analyse and explore the data

Once you have identified a topic of interest for your analysis, you should identify the most appropriate techniques (using R and associated packages) for carrying out your analysis and exploring the data, e.g. you might want to predict crime rates using regression or compare levels of crime types using statistical tests. This might also be an iterative process whereby you perform some analysis and then gather (or remove) more data. Where possible relate your analysis to the relevant literature. This relates to the ‘exploring data’ stage in Fig. 1.

Note that this is often an iterative process: as you explore the data you may end up re-designing your research questions, having to gather more data or having to perform further cleaning as more data quality issues arise. Again, this is all a part of the data discovery process.

3.4. Write up your findings

Once you have performed analysis on the data and have some results then you need to write up your investigation into a report (this is the ‘communicate’ stage of Fig. 1). The report should be structured as outlined in Section 4. You will be evaluated on your ability to plan and undertake data analysis and exploration of crime based on the UK Police Dataset, your ability to engage with the relevant literature, your use of R (and appropriate packages) and RStudio to process and analyse the data, and the way in which you communicate your findings within the report for your given problem/topic.

You should also provide your R code as an appendix, and marks will be awarded for the clarity, consistency and way in which you comment your R code (see, e.g. http://stat405.had.co.nz/r-style.html). The specific style you use is not as important as commenting your code well enough that someone else can follow what you have done, and being consistent in whichever style you adopt.

The minimum requirement to pass is to perform at least one type of data analysis (e.g., clustering, prediction, time-series analysis, etc.) and include at least two visualisations (e.g., charts, maps, etc.) in the report. To obtain a higher mark and more effectively communicate your findings, you may decide to use more than one dataset or present more than one type of data analysis and/or use multiple visualisations. Again, you should also engage as much as possible with the appropriate literature.

4. Report structure

You are required to produce a structured report that includes the sections detailed in Table 1. Overall, 90 marks will be awarded based on the content of the report, and the remaining marks will be awarded based on the presentation of the report and how well you communicate your findings. You must state the word count somewhere in the report (or the coversheet).
As there is a word count limit (3,000 words) you should aim to make your writing as concise and informative as possible. Also note that your work will be assessed taking into account the word limit; therefore, we are not expecting detailed multiple analyses in the report; rather the emphasis should be on the clarity, accuracy and quality in communicating your findings. Note that words within tables and appendices are not included in the word count.

Table 1: Required content of the structured report.

Section: Structured abstract
Description: This should provide a summary of your report in a structured manner, e.g. objective, methods, results, conclusions. This is not included in the word count.
Examples of what we will be looking for and mark allocation:
- Brief but informative abstract that is clearly structured.
Maximum allocated marks: Required, but 0 marks

Section: Table of contents
Description: This should include section titles and page numbers. This is not included in the word count.
Examples of what we will be looking for and mark allocation:
- Clearly structured Table of Contents with use of numbering for sections.
Maximum allocated marks: Required, but 0 marks

Section: Introduction and aim(s)
Description: This section should describe your selected problem or topic addressed in the report and that forms the focus for your data analysis. This should include a (brief) summary of the literature around analysis of crime data relevant to your selected topic that helps to provide the background to your chosen topic. You should also state why you chose this problem/topic and why you think it is an important topic to consider in this dataset (ideally supported by the relevant literature).
Examples of what we will be looking for and mark allocation:
- Clear statement regarding the overall goal of your investigation.
- Brief literature review of data and crime analysis.
- More marks for engagement with the relevant literature.
Maximum allocated marks: 10 marks

Section: Methodology
Description: This section should describe the process you have used to gather the data, pre-process and clean the data, conduct your analyses and visualise the data (note, you could follow the stages in Fig. 1). This will include ways in which you gathered, pre-processed, transformed, and sampled/filtered the data. You should try to justify your choices and include references to relevant literature where appropriate. This should also include details of the experimental setup, e.g. which R packages you have used etc. Think of it like this: if someone else had to replicate your methodology, have you provided enough details (and clearly enough) for them to reproduce your results? As well as describing the methodology used to generate your results, you should list all the UK Police datasets used (e.g., data covering different regions or time periods). You should also list any additional external datasets used (e.g., shape files or census statistics for LSOA areas). Describe all datasets used, any preprocessing and how they were joined together (e.g., over LSOA area identifiers).
Examples of what we will be looking for and mark allocation:
- Expect to see a clear description of methodology used in your analyses.
- Clear list of the datasets used (and links to sources) and variables in the dataset(s).
- Clear discussion of methods for preprocessing data (and appropriate use of R packages).
- More marks for examples of the data.
- More marks for multiple data sources used.
- More marks for the range of techniques used, appropriateness, links to supporting literature etc. (e.g., methods for trend prediction, spatial data analysis etc.). Techniques can include types of visualisation and references to which R libraries have been used.
- More marks for the detail of the description provided, e.g., could include use of group_by(), aggregate() etc.
- More marks for use of methods to deal with data quality issues, such as missing values.
- More marks for discussing use of appropriate techniques for different types of data, e.g. categorical data.
Maximum allocated marks: 20 marks

Section: Results and discussion
Description: In this section you should present the results of your data analysis and exploration (e.g., statistics, maps, trends, predictions). You should use the results to address the selected problem by presenting and discussing tables and charts as appropriate. You should present your findings in a way that helps the reader interpret the results. You should focus on effectively communicating the results of the analysis to the reader by highlighting the trends or patterns you have observed during your data analysis.
Examples of what we will be looking for and mark allocation:
- More marks for correct use of statistics and visualisations.
- More marks for packaging results etc. into tables rather than simply using R output or command line code.
- More marks for a clear narrative and structure (e.g., adding sections and sub-sections and guiding the reader through the analysis).
- More marks for clearly explaining the results and graphics used (e.g., use of legends etc.).
- More marks for using graphics that convey information (e.g., combine results) and help identify insights (e.g., use of log scales to dampen effects of high values etc.).
- More marks for bringing out insights rather than leaving the reader to interpret the findings.
- More marks for not over-interpreting the results and recognising biases.
- More marks for re-labelling the variable names in graphs and tables (rather than using default names).
- More marks for how well the data is summarised and made accessible for comparison.
Maximum allocated marks: 50 marks

Section: Conclusion
Description: In this section you should summarise the main findings of your analysis and lessons learned. You should state the main message the reader should come away with from your analysis. You should also highlight any weaknesses of your analysis and state what you would do to improve your analysis if you had more time.
Examples of what we will be looking for and mark allocation:
- Summary of the main findings of the analysis with respect to the original aim(s) of the investigation.
- More marks for highlighting limitations/weaknesses of your methodology and analysis.
- More marks for a clear set of takeaway messages.
Maximum allocated marks: 10 marks

Section: R code
Description: You should include the full R code as an appendix.
Examples of what we will be looking for and mark allocation:
- More marks for well-commented code.
- More marks for clarity of presentation.
- More marks for consistent style.
Maximum allocated marks: 5 marks

Section: Presentation
Description: The overall presentation of the report will be given a separate mark, including how well you have presented your results, clarity of writing and use of literature.
Examples of what we will be looking for and mark allocation:
- More marks for use of appropriate references.
- More marks for clarity of writing.
- More marks for use of appropriate charts and tables and their presentation quality.
Maximum allocated marks: 5 marks

5. Information School Coursework Submission Requirements

It is the student’s responsibility to ensure no aspect of their work is plagiarised or the result of other unfair means. The University’s and Information School’s Advice on unfair means can be found in your Student Handbook, available via http://www.sheffield.ac.uk/is/current.

Your assignment has a word count limit. A deduction of 3 marks will be applied for coursework that is 5% or more above or below the word count as specified above or that does not state the word count.

It is your responsibility to ensure your coursework is correctly submitted before the deadline. It is highly recommended that you submit well before the deadline. Coursework submitted after 10am on the stated submission date will result in a deduction of 5% of the mark awarded for each working day after the submission date/time up to a maximum of 5 working days, where ‘working day’ includes Monday to Friday (excluding public holidays) and runs from 10am to 10am. Coursework submitted after the maximum period will receive zero marks.

Work submitted electronically, including through Turnitin, should be reviewed to ensure it appears as you intended.

Before the submission deadline, you can submit coursework to Turnitin numerous times. Each submission will overwrite the previous submission. Only your most recent submission will be assessed.
However, after the submission deadline, the coursework can only be submitted once.

During your first Semester at the School, when submitting a piece of work through Turnitin, you will only be able to view a ‘similarity report’ when submitting your Test Essay. You can then edit and resubmit your Test Essay. For other coursework you will not be able to view a Turnitin ‘similarity report’. Details about the submission of work via Turnitin can be found at: http://youtu.be/C_wO9vHHheo

If you encounter any problems during the electronic submission of your coursework, you should immediately contact the module coordinator and one of the Information School Exams Secretaries (Julie Priestley, [email protected], 0114 2222839 or Corrie Houton, [email protected], 0114 2222640). This does not negate your responsibilities to submit your coursework on time and correctly.
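
As an illustration of the pre-processing, summarising and joining steps mentioned in Section 3.2 and in the Methodology row of Table 1 (e.g., aggregate() and joining over LSOA identifiers), the following is a minimal sketch in base R. It is not part of the brief and not a model answer. The data frames are small toy stand-ins, the column names (Month, LSOA.code, Crime.type) are assumptions about how a street-level CSV might look after read.csv(), the census table and its population column are hypothetical, and dropping rows with a missing LSOA code is only one of several ways to handle missing values.

```r
# Toy stand-in for a street-level crime CSV.
# Column names are assumptions for illustration, not taken from the brief.
crimes <- data.frame(
  Month      = c("2018-01", "2018-01", "2018-02", "2018-02", "2018-02"),
  LSOA.code  = c("E01000001", "E01000002", "E01000001", NA, "E01000002"),
  Crime.type = c("Burglary", "Vehicle crime", "Burglary", "Robbery", "Burglary"),
  stringsAsFactors = FALSE
)

# Hypothetical census extract keyed on the same LSOA identifier.
census <- data.frame(
  LSOA.code  = c("E01000001", "E01000002"),
  population = c(1500, 2000),
  stringsAsFactors = FALSE
)

# Pre-processing: drop records with no LSOA code
# (one possible way of dealing with missing values).
crimes_clean <- crimes[!is.na(crimes$LSOA.code), ]

# Summarise: count crimes per LSOA using aggregate().
crimes_clean$n <- 1
counts <- aggregate(n ~ LSOA.code, data = crimes_clean, FUN = sum)

# Join over LSOA identifiers and normalise to a rate per 1,000 residents.
joined <- merge(counts, census, by = "LSOA.code")
joined$rate_per_1000 <- 1000 * joined$n / joined$population

print(joined)
```

The same summary could be written with group_by() and summarise() from the dplyr package if you prefer that style. Whichever approach you take, the report should say which columns were used as join keys and how many records were removed during cleaning.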
