Explained: AD699, Summary Statistics, Python, C/C++ Statistics

Semester Project: AD699

In many ways, this assignment is intentionally open-ended -- there are several parts that ask you to decide what to do, and there is not necessarily a single right or wrong answer to the question. As a group, you will decide things like which variables to use for your models, and which to ignore. The outcome matters, but you should focus mainly on the process.

Between now and the due date, I will post helpful video tutorials in the AD699 Video Library and/or pointers in the form of bullet points beneath the assignment.

If any steps are unclear, or you’re not sure about how to proceed, please reach out to me or to Solomon with your questions.

Step I: Data Preparation & Exploration (15 points)

Start by downloading two files from Blackboard: walmart and walmart_marketbasket.

I. Summary Statistics
    A. Choose any five of the summary statistics functions shown in the textbook (or anywhere else) to learn a little bit about your data set. Show screenshots of the results. Describe your findings in 1-2 paragraphs.
II. Visualization
    A. Using ggplot, create any 5 plots that help to describe your data (this is intentionally open-ended). Show the plots that you made. Write a two-paragraph description that explains the choices that you made, and what the resulting plots show.
III. Data Preparation
    A. Are there any missing values in your dataset? If so, how did your team decide to handle this issue?

Step II: Prediction (25 points)

I. Create a multiple regression model with the outcome variable weekly_sales that aims to predict weekly sales for any Wal-Mart stores of Type A.
    A. Describe your process. How did you wind up including the independent variables that you kept, and discarding the ones that you didn’t keep? In a narrative of at least two paragraphs, discuss your process and your reasoning.
    B. Show a screenshot of your regression summary, and explain the regression equation that it generated.
    C. In a few sentences, describe/compare your model’s performance against training data and against validation data.
II. Create a multiple regression model with the outcome variable weekly_sales that aims to predict weekly sales for any Wal-Mart stores of Type B.
    A. Describe your process. How did you wind up including the independent variables that you kept, and discarding the ones that you didn’t keep? In a narrative of at least two paragraphs, discuss your process and your reasoning.
    B. Show a screenshot of your regression summary, and explain the regression equation that it generated.
    C. In a few sentences, describe/compare your model’s performance against training data and against validation data.
III. Create a multiple regression model with the outcome variable weekly_sales that aims to predict weekly sales for any Wal-Mart stores of Type C.
    A. Describe your process. How did you wind up including the independent variables that you kept, and discarding the ones that you didn’t keep? In a narrative of at least two paragraphs, discuss your process and your reasoning.
    B. Show a screenshot of your regression summary, and explain the regression equation that it generated.
    C. In a few sentences, describe/compare your model’s performance against training data and against validation data.
IV. Did you notice any significant differences in terms of the predictors that mattered more for the different types of stores? If so, speculate about some of the possible reasons why (one paragraph).

Step III: Classification (30 points)

I. For just the Type A stores, create categories for your weekly sales total data by breaking each week into one of four equally-sized groups: Great Week, Good Week, Mediocre Week, and Lousy Week.
    A. Select any four features from your dataset to use as predictors. Build and run a k-nearest neighbors model that takes a hypothetical “type” of week that you’ve created, and classifies it into one of the four bins that you built in the previous step.
    B. Write a two-paragraph narrative that describes how you did this. In your narrative, be sure to mention how you arrived at the particular k value that you used.
II. Naive Bayes/Classification Trees.
    A. For just the Type B stores, using any four predictors, build another model, using either a naive bayes or classification tree algorithm, that attempts to predict which bin your hypothetical week (the one you created for your k-nn model) would fall into.
    B. Show a screenshot of the code you used to build your model, the code you used to run the algorithm, and the code you used to assess the algorithm.
    C. Write a two-paragraph narrative that describes how you did this. In your narrative, be sure to talk about things like factor selection and testing against your training data.

Step IV: Clustering (10 points)

I. Perform either a k-means analysis or a hierarchical clustering analysis to group transactions from the walmart_marketbasket dataset. You may wish to use variables such as day of week, number of transactions, number of returns, and departments involved.
II. Show your code and results, and write two paragraphs describing what you included in your model, and what you found about the transactions. You do not need to bring in any outside sources here to assess your findings.

Please hold any clustering-related questions until after we have gone over clustering in class.

Step V: Conclusions (20 points)

I. Write a 3-5 paragraph summary that describes your overall process and experience with this assignment. You already summarized your specific steps in some other parts of the write-up, so focus on the big picture here. Use this section to focus on some of the big-picture takeaways, conclusions, etc. Excellent conclusions will include some original thoughts and analysis, rather than merely recounting the steps taken in the project.

Submit your final report as a PDF to Blackboard before the deadline listed on the assignment.

Step VI: Presentation

I. Summarize your findings and the overall process of analyzing this data set. Upload your slides to Blackboard before the deadline listed in the assignment folder.

Reposted from: http://ass.3daixie.com/2018113010001298.html
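
Step III asks for weekly sales to be broken into four equally sized groups. The following is a minimal sketch of one way to do that, assuming Python with only the standard library (the assignment mentions ggplot, so a team may well be working in R instead). The function name `label_weeks` and the sample figures are hypothetical and are not taken from the walmart file; the choice of the "inclusive" quartile method is also an assumption.

```python
from bisect import bisect_right
from statistics import quantiles

# Labels ordered from the lowest-sales quarter to the highest.
LABELS = ["Lousy Week", "Mediocre Week", "Good Week", "Great Week"]


def label_weeks(weekly_sales):
    """Assign each weekly sales total to one of four quartile-based bins."""
    # Three cut points (Q1, median, Q3) split the data into four groups.
    cuts = quantiles(weekly_sales, n=4, method="inclusive")
    # bisect_right counts how many cut points are <= the value (0 to 3),
    # which is used directly as an index into LABELS.
    return [LABELS[bisect_right(cuts, value)] for value in weekly_sales]


# Made-up figures for illustration only.
sample_sales = [12000.0, 18500.0, 9400.0, 22100.0,
                15750.0, 30200.0, 11050.0, 26800.0]

for sales, label in zip(sample_sales, label_weeks(sample_sales)):
    print(f"{sales:>10.2f}  {label}")
```

With real data, tied values that sit exactly on a cut point all land in the same bin, so the four groups may come out only approximately equal in size.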
