Guideline of R Project

Overview

In this project, you will use R to formulate and answer a series of specific questions about a dataset of your choice. You are expected to:
- Form a group of 5 teammates.
- Only 1 group may have 4 members, with special permission from the instructor.
- Identify a dataset of interest.
- Perform exploratory analysis with R to understand the data.
- Investigate hypotheses (i.e., potential questions you want to answer by analyzing this dataset), and develop preliminary insights.
- Prepare a report in Word: include a set of at least 6 visualizations that illustrate your findings, and interpret these visualizations.
- Prepare a presentation in PPT to share your findings with the class.

Final Deliverables and Important Dates

1. Proposal
- A 1-page proposal consisting of:
  o Title
  o Team formation
  o Dataset of your choice
  o Background information: What is it about? What attributes/fields are available? How many records are there?
  o The source of the dataset (e.g., web link)
  o Sample records (e.g., the first 10)
  o An initial set of at least 3 questions you'd like to investigate
- You also need to submit the downloaded raw dataset with your proposal.
- Due on Nov. 15; submit to Moodle.

2. Presentation
- Nov. 29 (the last class)
- See details below.

3. R project of solution
- A self-contained project file with source code and raw data
- Due on Dec. 3
- Submitted to Moodle by the team leader

4. Final report
- See details below.
- Due on Dec. 3
- Submitted to Moodle by the team leader

5. Peer evaluation
- See details below.
- Due on Dec. 3
- Submitted individually to Moodle

Details

Data Selection and Preparation
- First, choose a topic of interest to your team and find a dataset that can provide insights into that topic.
  See recommended sources at the end of this guideline.
- Please check with the instructor to ensure the dataset is appropriate for this assignment, and write a 1-page proposal.
- Be advised that data collection and preparation (also known as data wrangling) can be a very time-consuming process. Be sure you have sufficient time to conduct exploratory analysis after preparing the data.

Exploratory and Visual Analysis

You are expected to perform an exploratory analysis of your dataset using R. You should consider two different phases of exploration.
- In the first phase, you should seek to gain an overview of the shape and structure of your dataset. What variables does the dataset contain? How are they distributed? Are there any notable data quality issues? Are there any surprising relationships among the variables?
- In the second phase, you should investigate your initial questions, as well as any new questions that arise during your exploration. For each question, start by creating a visualization that might provide a useful answer. Then refine the visualization (for example, by adding additional variables, changing sorting or axis scales, filtering or subsetting data, etc.) to develop better perspectives and explore unexpected observations. You should repeat this process for each of your questions, but feel free to revise your questions or branch off to explore new questions if the data allow.

Group Presentation
- Design your presentation slides.
- Presentation: 5 minutes of storytelling about your work, plus 2 minutes for Q&A.
  - Introduce your data and background information, hypotheses/questions, and results, and discuss limitations/future directions.
  - Try to make it interesting and rich in information (if time allows).
  - Do NOT highlight the technical details of your work (such as code, functions, special tricks, etc.) during the presentation. Focus on storytelling.
  - Due to the short time available, choose 1 or 2 representatives to present.
    However, all members must attend and prepare for Q&A.

Coding
- This is an R project: you are expected to use R to process data and present results throughout the entire project (rather than Excel, Power BI, etc.).
- Create a self-contained R project folder (refer to the structure requirement in the first R assignment).
- Provide appropriate comments in your code.
- Working code: your code should run without any errors (tip: try it on different computers).
- Results should be consistent with those in your report and presentation.
- Zip the whole project directory into a compressed package and submit it to Moodle, including:
  - Your raw data
  - Your code
  - Anything else you use

Final Report

Your final submission will be a written report. Focus on the answers to your initial questions. If applicable, describe surprises as well as challenges encountered along the way, e.g., data quality issues. Each visualization image should be accompanied by a title and short caption (sentences). Provide sufficient detail in each caption so that anyone could read through your report and understand your findings. Feel free to annotate your images to draw attention to specific features of the data.
- Recommended report outline (revise or enhance if needed):
  o Title page (report title and team members)
  o Abstract (no more than 150 words)
  o Data descriptions: introduce the dataset and related background information. You should indicate the source of the data.
  o Research Questions: introduce the questions you want to answer, and the motivation.
  o Results: analytical results and visualizations
  o Summary: briefly summarize and discuss your findings
  o Future Work: a description of how your solution could be extended or improved
  o References: literature you have used
- Do NOT put code into this report.
  The code should be submitted separately.

General Grading Criteria
- Poses clear questions applicable to the chosen dataset.
- Appropriate data wrangling (preprocessing) and exploratory data analysis (EDA).
- Breadth and depth of analysis.
- Expressive and effective visualizations appropriate to the analysis questions.
- Clearly written, understandable captions that communicate primary insights.
- Originality. Submissions will be checked with Turnitin to produce an originality report. Remember to cite any references properly.

Detailed Grading Components (100 points in total)

o Part 1: proposal (10 points)

o Part 2: report (30 points)
  - In general, the report will be graded on its content (correctness and accuracy), breadth and depth of discussion, report structure, originality, and writing quality.

o Part 3: presentation content (20 points, delivered by 1 or 2 representatives)
  - Slide design
  - Correct and accurate information, logical arguments
  - Content richness (relevant and rich information, well-defined terms)
  - Presentation delivery (preparation, clarity of expression)
  - Ability to answer questions
  - Time management

o Part 4: coding (30 points)
  - Working code
  - Code readability, necessary comments
  - Output consistent with report
  - Originality
  - A well-structured, self-contained project

o Part 5: peer evaluation (10 points, individual-based evaluation)
  - The evaluation in this part is based on the average contribution percentage (CP) obtained through intra-group peer evaluation. Each student is expected to submit his/her evaluation separately to Moodle.
  - Your CP = average(intra-group evaluation of your contribution)
  - For a group of 5, for example, the equal-contribution percentage (ECP) is 100% ÷ 5 = 20%.
  - You may gain all 10 points if your CP = ECP.
    You may gain as many as 15 points in this part if your CP is significantly higher than the ECP, and as few as 0 points if your CP is significantly lower than the ECP.

Data Sources
- Open databases
  o Kaggle datasets
  o Awesome Public Datasets: topic-centric list of high-quality open datasets in the public domain
  o Macau government open database: Macau regional statistics
  o Chinese government open databases: provided by the Chinese National Statistical Bureau
  o Databases in business-related subjects: commercial databases available in the UM library, only accessible in UM
- Non-public datasets
  o You may also choose datasets that are not open to the public. In such a case, please indicate the source of the data.
- Notes and hints
  o You are recommended to choose a business-related dataset; interesting datasets in other domains are also good choices.
  o You are not recommended to choose datasets in a highly specialized domain (e.g., biology, physics, etc.), unless you are very familiar with this domain.
  o Choose a dataset that comes with sufficient descriptions and/or background information. It is not wise to choose a dataset with little additional description, because you will then have to guess the meaning of its attributes and values.
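
The following is a minimal R sketch, not a required template, of the first exploration phase described under Exploratory and Visual Analysis, together with the record count, field list, and sample records requested for the proposal. It assumes R's built-in mtcars data as a stand-in for your own dataset; the variable names (df, n_records, and so on) and the commented-out file path are hypothetical.

```r
# Stand-in data: R's built-in mtcars (an assumption for illustration only).
# With your own data you might instead use something like:
#   df <- read.csv("data/raw_data.csv")   # hypothetical path
df <- mtcars

# Proposal items: number of records, available fields, sample records
n_records      <- nrow(df)
field_names    <- names(df)
sample_records <- head(df, 10)   # the first 10 records
print(sample_records)

# Phase 1: shape and structure of the dataset
str(df)       # variable names and types
summary(df)   # basic distribution of each variable

# Data quality check: count missing values in each variable
missing_per_column <- colSums(is.na(df))
print(missing_per_column)

# Distribution of a single variable
hist(df$mpg,
     main = "Distribution of fuel economy",
     xlab = "Miles per gallon")

# Relationship between two variables
plot(df$wt, df$mpg,
     main = "Fuel economy vs. weight",
     xlab = "Weight (1000 lbs)",
     ylab = "Miles per gallon")
wt_mpg_cor <- cor(df$wt, df$mpg)
print(wt_mpg_cor)
```

With your own data, replace the first assignment with a call that reads your raw file, then choose the variables to summarize and plot based on your research questions.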