Math 185 Final Project (Due December 8)

Problem 1

The baseball dataset consists of the statistics of 263 Major League Baseball players from the 1986 season. The dataset (hitters.csv) consists of 20 variables:

Variable    Description
AtBat       Number of times at bat in 1986
Hits        Number of hits in 1986
HmRun       Number of home runs in 1986
Runs        Number of runs in 1986
RBI         Number of runs batted in in 1986
Walks       Number of walks in 1986
Years       Number of years in the major leagues
CAtBat      Number of times at bat during his career
CHits       Number of hits during his career
CHmRun      Number of home runs during his career
CRuns       Number of runs during his career
CRBI        Number of runs batted in during his career
CWalks      Number of walks during his career
League      A factor with levels A (coded as 1) and N (coded as 2) indicating the player's league at the end of 1986
Division    A factor with levels E (coded as 1) and W (coded as 2) indicating the player's division at the end of 1986
PutOuts     Number of put outs in 1986
Assists     Number of assists in 1986
Errors      Number of errors in 1986
Salary      1987 annual salary on opening day, in thousands of dollars
NewLeague   A factor with levels A (coded as 1) and N (coded as 2) indicating the player's league at the beginning of 1987

In this problem, we use Salary as the response variable and the remaining 19 variables as predictors/covariates, which measure each player's performance in the 1986 season and over his whole career. Write R functions to perform variable selection using best subset selection partnered with BIC (Bayesian Information Criterion):

1) Starting from the null model, apply the forward stepwise selection algorithm to produce a sequence of sub-models iteratively, and select a single best model using the BIC. Plot the "BIC vs Number of Variables" curve.
Present the selected model with the corresponding BIC.

2) Starting from the full model (that is, the one obtained by minimizing the MSE/RSS using all the predictors), apply the backward stepwise selection algorithm to produce a sequence of sub-models iteratively, and select a single best model using the BIC. Plot the "BIC vs Number of Variables" curve. Present the selected model with the corresponding BIC.

3) Are the selected models from 1) and 2) the same?

Problem 2

In this problem, we fit ridge regression on the same dataset as in Problem 1. First, standardize the variables so that they are on the same scale. Next, choose a grid of λ values ranging from λ = 10^10 to λ = 10^(-2), essentially covering the full range of scenarios from the null model containing only the intercept to the least squares fit. For example:

> grid = 10^seq(10, -2, length=100)

1) Write an R function to do the following: for each value of λ, compute a vector of ridge regression coefficients (including the intercept), stored in a 20 × 100 matrix, with 20 rows (one for each predictor, plus an intercept) and 100 columns (one for each value of λ).

2) To find the "best" λ, use ten-fold cross-validation to choose the tuning parameter from the previous grid of values. Set a random seed first (set.seed(1)) so your results will be reproducible, since the choice of the cross-validation folds is random. Plot the "Cross-Validation Error versus λ" curve, and report the selected λ.

3) Finally, refit the ridge regression model on the full dataset, using the value of λ chosen by cross-validation, and report the coefficient estimates.

Remark: You should expect that none of the coefficients are zero; ridge regression does not perform variable selection.
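The forward stepwise loop of Problem 1 can be sketched in base R as below. This is a minimal sketch, not the required solution: the function and variable names (`forward_stepwise_bic`, `X`, `y`) are illustrative, and simulated data stands in for hitters.csv, which would be loaded with read.csv and have its factors coded numerically.

```r
# Forward stepwise selection with BIC (sketch; names are illustrative).
# X: data frame of predictors, y: numeric response.
forward_stepwise_bic <- function(X, y) {
  n <- length(y); p <- ncol(X)
  selected <- integer(0)
  bic_path <- numeric(p)
  models <- vector("list", p)
  for (k in 1:p) {
    remaining <- setdiff(1:p, selected)
    # At each step, add the variable that most reduces the RSS.
    rss <- sapply(remaining, function(j) {
      fit <- lm(y ~ ., data = X[, c(selected, j), drop = FALSE])
      sum(resid(fit)^2)
    })
    selected <- c(selected, remaining[which.min(rss)])
    fit <- lm(y ~ ., data = X[, selected, drop = FALSE])
    # BIC for Gaussian linear regression, up to an additive constant:
    # n*log(RSS/n) + log(n)*(number of coefficients incl. intercept).
    bic_path[k] <- n * log(sum(resid(fit)^2) / n) + log(n) * (k + 1)
    models[[k]] <- selected
  }
  best <- which.min(bic_path)
  list(bic = bic_path, best_size = best,
       best_vars = names(X)[models[[best]]])
}

# Illustration on simulated data (two truly active predictors):
set.seed(1)
X <- as.data.frame(matrix(rnorm(200 * 5), 200, 5))
names(X) <- paste0("V", 1:5)
y <- 2 * X$V1 - 3 * X$V3 + rnorm(200)
res <- forward_stepwise_bic(X, y)
res$best_vars  # the two active predictors V1 and V3
# plot(1:5, res$bic, type = "b")  # the "BIC vs Number of Variables" curve
```

The backward version for part 2) is symmetric: start from the full `lm` fit and drop, at each step, the variable whose removal increases the RSS the least, recording the BIC along the way.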
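The ridge computations in Problem 2 can be sketched directly from the closed-form solution; in practice glmnet(x, y, alpha = 0, lambda = grid) and cv.glmnet are the standard tools. The sketch below is illustrative: `ridge_path` and `cv_ridge` are assumed helper names, the penalty is scaled as in the objective RSS/(2n) + λ‖β‖², and simulated data replaces hitters.csv.

```r
# Ridge coefficients over a grid of lambda values (sketch).
# X: n x p numeric matrix, y: response, grid: decreasing lambda values.
# Returns a (p + 1) x length(grid) matrix; row 1 is the intercept.
ridge_path <- function(X, y, grid) {
  n <- nrow(X); p <- ncol(X)
  Xs <- scale(X)                # standardize predictors to a common scale
  yc <- y - mean(y)
  coefs <- matrix(0, p + 1, length(grid))
  for (k in seq_along(grid)) {
    # Closed form on centered data: (X'X + n*lambda*I)^{-1} X'y
    coefs[-1, k] <- solve(crossprod(Xs) + n * grid[k] * diag(p),
                          crossprod(Xs, yc))
    coefs[1, k] <- mean(y)      # intercept after centering
  }
  coefs
}

# Ten-fold cross-validation over the same grid (sketch).
cv_ridge <- function(X, y, grid, K = 10) {
  set.seed(1)                   # reproducible fold assignment
  n <- nrow(X)
  folds <- sample(rep(1:K, length.out = n))
  cv_err <- numeric(length(grid))
  for (k in seq_along(grid)) {
    err <- 0
    for (f in 1:K) {
      tr <- folds != f
      Xtr <- scale(X[tr, , drop = FALSE]); ytr <- y[tr]
      b <- solve(crossprod(Xtr) + sum(tr) * grid[k] * diag(ncol(X)),
                 crossprod(Xtr, ytr - mean(ytr)))
      # Standardize the held-out fold with the training statistics.
      Xte <- scale(X[!tr, , drop = FALSE],
                   center = attr(Xtr, "scaled:center"),
                   scale  = attr(Xtr, "scaled:scale"))
      err <- err + sum((y[!tr] - (mean(ytr) + Xte %*% b))^2)
    }
    cv_err[k] <- err / n
  }
  list(cv = cv_err, best_lambda = grid[which.min(cv_err)])
}

# Illustration:
set.seed(1)
X <- matrix(rnorm(100 * 3), 100, 3)
y <- X[, 1] + rnorm(100)
grid <- 10^seq(10, -2, length = 100)
co <- ridge_path(X, y, grid)   # 4 x 100 coefficient matrix
res <- cv_ridge(X, y, grid)
# plot(log10(grid), res$cv, type = "l")  # "CV error versus lambda" curve
```

Note that at the largest λ the slope coefficients are shrunk essentially to zero (the null model), while at the smallest λ they approach the least squares fit.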
Problem 3

In this problem, we revisit the best subset selection problem. We are given a response vector y = (y_1, …, y_n)^T and an n × p design matrix X = (x_1, …, x_p) with x_j = (x_{1j}, …, x_{nj})^T. For 1 ≤ s ≤ p, let β̂_0, β̂ be the solution to the following sparsity-constrained least squares problem:

    (β̂_0, β̂) = argmin_{β_0, β} (1/(2n)) ‖y − β_0·1_n − Xβ‖_2^2   subject to ‖β‖_0 ≤ s.

Based on the property β̂_0 = ȳ − x̄^T β̂, we can center y and X first to get rid of the intercept; let ỹ and X̃ denote the centered y and X, respectively. To solve this problem, we introduce the Gradient Hard Thresholding Pursuit (GraHTP) algorithm. Let f(β) = ‖ỹ − X̃β‖_2^2 / (2n) be the objective function.

GraHTP Algorithm.
Input: X̃, ỹ, sparsity s, stepsize η > 0. (Hint: normalize the columns of X̃ to have variance 1.)
Initialization: β^0 = 0, t = 1.
repeat
  1) Compute β̃^t = β^(t−1) − η ∇f(β^(t−1));
  2) Let S^t = supp(β̃^t, s) be the indices of the s entries of β̃^t with the largest absolute values;
  3) Compute β^t = argmin { f(β) : supp(β) ⊆ S^t }; t = t + 1;
until convergence, i.e. ‖β^t − β^(t−1)‖_2 is below a small tolerance.
Output: β^t.

1) Write an R function to implement the above GraHTP algorithm.
2) Consider again the baseball dataset in Problem 1, with n = 263 and p = 19. For s = 1, …, p, use the above function to find the best s-sparse model, denoted by ℳ_s. Then use BIC to select a single best model among ℳ_1, …, ℳ_p.
3) Compare your results with those obtained in Problem 1.
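The three GraHTP steps map almost line-for-line into R. The sketch below is one possible implementation, assuming centered ỹ and a centered, column-normalized X̃; the defaults for the stepsize and tolerance (`eta`, `tol`, `max_iter`) are illustrative choices, not part of the assignment.

```r
# GraHTP for f(beta) = ||y - X beta||^2 / (2n)  (sketch).
# X: centered, column-normalized n x p matrix; y: centered response;
# s: sparsity level; eta: stepsize.
grahtp <- function(X, y, s, eta = 1, tol = 1e-7, max_iter = 500) {
  n <- nrow(X); p <- ncol(X)
  beta <- rep(0, p)
  for (t in 1:max_iter) {
    grad <- -crossprod(X, y - X %*% beta) / n            # gradient of f
    beta_tilde <- beta - eta * grad                      # 1) gradient step
    S <- order(abs(beta_tilde), decreasing = TRUE)[1:s]  # 2) top-s support
    beta_new <- rep(0, p)
    # 3) least squares refit restricted to the support S
    beta_new[S] <- qr.coef(qr(X[, S, drop = FALSE]), y)
    if (sqrt(sum((beta_new - beta)^2)) < tol) {          # convergence check
      beta <- beta_new
      break
    }
    beta <- beta_new
  }
  beta
}

# Illustration on simulated data with two active predictors:
set.seed(1)
n <- 100; p <- 10
X <- scale(matrix(rnorm(n * p), n, p))   # columns have variance 1
y <- 3 * X[, 2] - 2 * X[, 5] + rnorm(n)
y <- y - mean(y)
b <- grahtp(X, y, s = 2)
which(b != 0)  # support of the fitted 2-sparse model
```

For part 2), one would run `grahtp` for s = 1, …, 19, compute each model's RSS on the centered data, and select among ℳ_1, …, ℳ_19 with the same BIC formula as in Problem 1.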