Walkthrough: ACCT648, Data Analysis, R, RR|Web

Term 1, 2019/2020
ACCT648 Applied Statistics for Data Analysis
Assignment 3

Deadline of Submission: Upload your answer file in Word format on 6 November 2019 before 5pm in e-Learn, and submit the hard copy during class on that day.

1. The owner of a moving company typically has his most experienced manager predict the total number of labor hours (Hours) that will be required to complete an upcoming move. This approach has proved useful in the past, but the owner has the business objective of developing a more accurate method of predicting labor hours. In a preliminary effort to provide a more accurate method, the owner has decided to use the number of cubic feet moved (Feet), the number of pieces of large furniture (Large) and whether there is an elevator in the apartment building (Elevator) as the independent variables and has collected data for moves in which the origin and destination were within the borough of Manhattan in New York City and the travel time was an insignificant portion of the hours worked.
The data are organized and stored in Moving2019.csv.

   (a) Find the multiple regression equation L1 with all three main independent variables.
   (b) Find the multiple regression equation L2 with all three main independent variables and the interaction effect of Feet and Elevator.
   (c) Find the multiple regression equation L3 with all three main independent variables and the interaction effect of Large and Elevator.
   (d) Find the multiple regression equation L4 with all three main independent variables and the interaction effect of Feet and Large.
   (e) When comparing all four regression models L1, L2, L3 and L4, explain why model L3 is the best model.
   (f) Perform a residual analysis on model L3 and determine whether the regression assumptions are valid.
   (g) Construct a 95% prediction interval estimate for the labor hours for moving 420 cubic feet with 2 pieces of large furniture in an apartment building that does not have an elevator, under model L3.
   (h) Construct a 95% confidence interval estimate for the average labor hours for moving 400 cubic feet with 3 pieces of large furniture in an apartment building that has an elevator, under model L3.
   (i) True or False: For a fixed value of cubic feet and at least one piece of large furniture, the total number of labor hours to move in a building with an elevator is on average less than the number of labor hours to move in a building without an elevator under model L3. Justify your answer.

2. Based on the data set given in Question 1,

   (a) Fit the multiple regression equation to predict the total number of labor hours with all independent variables by using Forward Selection and the BIC criterion on the training set. Plot the graph to show the number of variables versus BIC in each selection step.
   (b) Fit the multiple regression equation to predict the total number of labor hours with all independent variables by using Best Subset Selection with the adjusted R² criterion on the training set.
       Plot the graph to show the number of variables versus adjusted R² in each selection step.
   (c) Use the 5-fold cross-validation approach to fit the models L1, L2, L3 and L4 and determine which model is the best under the criterion of their associated cross-validation errors. (Note: use set.seed(1208))
   (d) Use the Leave-One-Out cross-validation approach to fit the models L1, L2, L3 and L4 and determine which model is the best under the criterion of their associated cross-validation errors. (Note: use set.seed(5623))

3. Suppose we collect data for a group of 130 students in a statistics class with two independent variables, X1 = average studying hours per week and X2 = GPA, and one dependent variable Y = Pass (or Fail). We fit a logistic regression model,

       log(odds) = β0 + β1 X1 + β2 X2,

   to predict whether a student will pass the course. The R output gives the estimated coefficients β̂0 = −9.5447, β̂1 = 0.5709 and β̂2 = 1.0682. The observations of the first five students are given as follows:

       Student   Y      X1     X2
       1         Pass    9.4   3.03
       2         Pass   14.5   3.52
       3         Pass   12.2   3.14
       4         Fail    8.4   2.76
       5         Fail   11.3   3.20

   (a) Based on the estimated logistic regression model, predict the probability that a student who studies 11 hours per week on average and has a GPA of 3.40 will pass the course.
   (b) At least how many hours would the student in part (a) need to study to have a predicted chance of more than 70% of passing the course?
   (c) Find the deviance residuals of the first five observed students.
   (d) Using the estimated logistic regression model with a threshold value of 0.55 for classifying a student as passing the course, determine whether the model makes an error in predicting each of the above five observed students. If there is an error, determine what type of error it is.

4. The stock prices of Singapore Telecommunications Limited (SingTel) with code (Z74.SI) and Singapore Airlines Limited (SIA) with code (C6L.SI) from 27 August 2018 to 29 July 2019 are stored in SingTelSIA2019.csv.
Suppose a portfolio investment has 8,000 shares of SingTel at a price of $3.34 per share and 5,000 shares of SIA at a price of $9.42 per share on 29 July 2019. Therefore, the portfolio investment has a value of $73,820 (8,000 × 3.34 + 5,000 × 9.42) on 29 July 2019.

   (a) Based on the historical approach without any assumption of distribution, calculate the one-day 99% VaR for this portfolio on 29 July 2019.
   (b) Without any assumption of distribution, estimate the one-day 99% VaR for this portfolio on 29 July 2019 based on the Bootstrap approach with 100,000 repetitions. (Note: use set.seed(5483))
   (c) Obtain a 95% Bootstrap percentile confidence interval for the one-day 99% VaR for this portfolio on 29 July 2019.

5. The director of undergraduate studies at a college of business wants to predict whether students in a BBA program can graduate with an honor degree using the independent variables high school grade point average (GPA), SAT score, gender, and local citizen. Data from a random sample of 90 students, organized and stored in BBA2019.csv, show that 46 successfully completed the program with honor degrees (coded as Yes) and 44 without honor degrees (coded as No) under the variable column Graduate.

   (a) Develop a logistic regression model, L1, to predict the probability of successfully completing the BBA program with an honor degree, based on all independent variables.
   (b) Develop another logistic regression model, L2, to predict the probability of successfully completing the BBA program with an honor degree, based on the SAT, Gender and Local independent variables.
   (c) Develop another logistic regression model, L3, to predict the probability of successfully completing the BBA program, based on the SAT and Local independent variables.
   (d) Develop another logistic regression model, L4, to predict the probability of successfully completing the BBA program, based on the SAT independent variable.
   (e) Explain why model L4 is the best model among the four models considered.
       At the 0.05 level of significance, is there evidence that the logistic regression model L4 is a good-fitting model?
   (f) Predict the probability of successfully completing the BBA program with an honor degree for a male local citizen with a GPA of 3.45 and an SAT score of 1330 under model L4.
   (g) Find the confusion matrix of model L4 with the threshold value 0.6 for classifying students as successfully completing the BBA program with honor degrees.
   (h) Find the sensitivity, specificity and total error rate of model L4 with the threshold value 0.6.

-END-

Reposted from: http://www.3daixie.com/contents/11/3444.html
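The arithmetic behind Question 3 can be sketched directly from the coefficients quoted there. The assignment itself expects R; the block below is only an illustrative Python sketch, and the helper names (`pass_probability`, `min_hours_for`, `deviance_residual`) are hypothetical, not part of the assignment. It assumes Pass is coded as 1 and Fail as 0, and that a student is classified as Pass when the predicted probability exceeds the threshold.

```python
import math

# Estimated coefficients quoted in Question 3.
B0, B1, B2 = -9.5447, 0.5709, 1.0682

def pass_probability(hours, gpa):
    """Predicted probability of passing under the fitted logistic model."""
    log_odds = B0 + B1 * hours + B2 * gpa
    return 1.0 / (1.0 + math.exp(-log_odds))

def min_hours_for(prob, gpa):
    """Study hours at which the predicted probability equals `prob`."""
    target_log_odds = math.log(prob / (1.0 - prob))
    return (target_log_odds - B0 - B2 * gpa) / B1

def deviance_residual(y, p):
    """Signed deviance residual for a binary outcome y (1 = Pass, 0 = Fail)."""
    log_lik = y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return math.copysign(math.sqrt(-2.0 * log_lik), y - p)

# The five observations listed in Question 3: (outcome, X1, X2).
students = [
    ("Pass", 9.4, 3.03),
    ("Pass", 14.5, 3.52),
    ("Pass", 12.2, 3.14),
    ("Fail", 8.4, 2.76),
    ("Fail", 11.3, 3.20),
]

# Part (a): 11 study hours per week, GPA 3.40.
p_a = pass_probability(11, 3.40)

# Part (b): hours needed to reach a 70% predicted chance at GPA 3.40.
hours_b = min_hours_for(0.70, 3.40)

# Part (c): deviance residuals of the five students.
residuals = [
    deviance_residual(1 if y == "Pass" else 0, pass_probability(x1, x2))
    for y, x1, x2 in students
]

# Part (d): classify with the 0.55 threshold (assumed rule: Pass if p > 0.55).
THRESHOLD = 0.55
predictions = [
    "Pass" if pass_probability(x1, x2) > THRESHOLD else "Fail"
    for _, x1, x2 in students
]

for (y, _, _), pred, res in zip(students, predictions, residuals):
    print(y, pred, round(res, 4))
```

Under these assumptions `p_a` comes out at roughly 0.59 and `hours_b` at roughly 11.84 hours. Comparing `predictions` with the observed outcomes flags student 1 (observed Pass, predicted Fail) and student 5 (observed Fail, predicted Pass) as the misclassified cases.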

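For Question 4(a), one common reading of the historical approach is to apply each past daily return to the current holdings and take the 99th percentile of the resulting losses. The sketch below follows that reading in Python rather than R. The function name `historical_var`, the quantile convention (smallest loss with at least 99% of scenarios at or below it) and the two short price series are all assumptions for illustration; the real prices live in SingTelSIA2019.csv, which is not reproduced here.

```python
import math

def historical_var(prices_a, prices_b, shares_a, shares_b, level=0.99):
    """Return (portfolio value today, one-day historical VaR).

    Each historical one-day return is applied to today's position,
    and the VaR is read off the empirical loss distribution.
    """
    value_a = shares_a * prices_a[-1]
    value_b = shares_b * prices_b[-1]
    losses = []
    for t in range(1, len(prices_a)):
        ret_a = prices_a[t] / prices_a[t - 1] - 1.0
        ret_b = prices_b[t] / prices_b[t - 1] - 1.0
        pnl = value_a * ret_a + value_b * ret_b
        losses.append(-pnl)  # a loss is a negative profit
    losses.sort()
    # Assumed convention: smallest loss with at least `level` of the
    # scenarios at or below it. round() guards against float noise
    # such as 0.99 * 100 evaluating to slightly more than 99.
    k = math.ceil(round(level * len(losses), 9)) - 1
    return value_a + value_b, losses[k]

# Hypothetical toy prices, chosen only so the last values match the
# 29 July 2019 prices quoted in Question 4 (3.34 and 9.42).
singtel = [3.30, 3.33, 3.28, 3.31, 3.34]
sia = [9.50, 9.40, 9.45, 9.38, 9.42]

portfolio_value, var_99 = historical_var(singtel, sia, 8000, 5000)
print(round(portfolio_value, 2), round(var_99, 2))
```

With only four toy scenarios the 99% quantile is simply the largest loss, so the number printed here says nothing about the real portfolio. The bootstrap parts (b) and (c) would resample the scenario losses and recompute the same quantile on each resample.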