Walkthrough: IIMT2641 R, R, RSQL|Prolog

IIMT2641 Introduction to Business Analytics
Fall 2019
Assignment 4
Due November 7

In this problem, we will practice building CART models with a continuous outcome, using the dataset StateData.csv, which has data from the 1970s on all fifty US states. A description of the variables in the dataset is given in Table 1.

Variable          Description
Population        Population estimate of the state in 1975.
Income            Per capita income in the state in 1974.
Illiteracy        Illiteracy rates in 1970, as a percentage of the state's population.
LifeExp           The life expectancy in years of residents of the state in 1970.
Murder            The murder and non-negligent manslaughter rate per 100,000 population in 1976.
HighSchoolGrad    The high-school graduation rate in the state in 1970.
Frost             The mean number of days with minimum temperature below freezing from 1931 to 1960 in the capital or a large city of the state.
Area              The land area (in square miles) of the state.
Longitude         The longitude of the center of the state.
Latitude          The latitude of the center of the state.
Region            The region (Northeast, South, North Central, or West) that the state belongs to.

Table 1: Variables in the dataset StateData.csv.

(a) Let us start by building a linear regression model. Randomly split the dataset into a training set (70%) and a test set (30%).

    (i) First, build a linear regression model to predict LifeExp using the following seven variables as the independent variables: Population, Murder, Frost, Income, Illiteracy, Area, and HighSchoolGrad. Use the training dataset to build the model. What is the R² of the model on the test set?

    (ii) Now, build a linear regression model to predict LifeExp using the following four variables as the independent variables: Population, Murder, Frost, and HighSchoolGrad. Again, use the training dataset to build the model. What is the R² of the model on the test set?

    (iii) Compare these two models. What are we achieving by removing independent variables? What is the equivalent procedure in a CART model?

(b) Now, build a CART model to predict LifeExp using the following seven variables as the independent variables: Population, Murder, Frost, Income, Illiteracy, Area, and HighSchoolGrad. Set the parameter minbucket to 5. Make sure that you are building a regression tree, and not a classification tree, by setting the argument method to "anova" instead of "class".

    (i) Plot the tree. Which of the independent variables appear in the tree? Do you find the linear regression model or the CART model easier to interpret?

    (ii) Compute the predicted life expectancies for the test dataset using the CART model, and calculate the R² of the predictions.

(c) Now, build a random forest model to predict LifeExp using the same seven variables as the independent variables. Set the parameter nodesize to 5. Compute the predicted life expectancies for the test dataset using the random forest model, and calculate the R² of the predictions.

(d) Which of the four models you built do you think is the best model, if out-of-sample accuracy is the most important? How about if interpretability is the most important?

Reposted from: http://www.3daixie.com/contents/11/3444.html
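The workflow in parts (a) and (b) can be sketched in R as follows. This is a minimal illustration under stated assumptions, not a model answer. It uses R's built-in state.x77 matrix as a stand-in for StateData.csv, which is not reproduced here. The seed, the helper name oos_r2, and the choice of the training-set mean as the R² baseline are all assumptions (some courses use the test-set mean instead).

```r
library(rpart)  # ships with standard R distributions

# ASSUMPTION: built-in state.x77 stands in for StateData.csv.
states <- as.data.frame(state.x77)
names(states) <- c("Population", "Income", "Illiteracy", "LifeExp",
                   "Murder", "HighSchoolGrad", "Frost", "Area")

# 70/30 random split; the seed is arbitrary and only makes the run repeatable.
set.seed(1)
train_idx <- sample(nrow(states), size = round(0.7 * nrow(states)))
train <- states[train_idx, ]
test  <- states[-train_idx, ]

# Hypothetical helper: out-of-sample R^2 = 1 - SSE / SST,
# where SST is measured against a baseline prediction.
oos_r2 <- function(actual, predicted, baseline) {
  sse <- sum((actual - predicted)^2)
  sst <- sum((actual - baseline)^2)
  1 - sse / sst
}

# The seven independent variables named in parts (a)(i) and (b).
f <- LifeExp ~ Population + Murder + Frost + Income +
  Illiteracy + Area + HighSchoolGrad

# Part (a)(i): linear regression fitted on the training set only.
lm_full <- lm(f, data = train)
r2_lm <- oos_r2(test$LifeExp, predict(lm_full, newdata = test),
                mean(train$LifeExp))

# Part (b): regression tree (method = "anova") with minbucket = 5.
tree <- rpart(f, data = train, method = "anova",
              control = rpart.control(minbucket = 5))
r2_tree <- oos_r2(test$LifeExp, predict(tree, newdata = test),
                  mean(train$LifeExp))
```

The same oos_r2 helper would apply to part (c): assuming the randomForest package is installed, a call such as randomForest(f, data = train, nodesize = 5) produces a model whose predict() output can be passed in exactly the same way. Because the test set here has only 15 states, the R² values can vary considerably from one random split to another.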
