Topics covered this week
STA 137
Winter Quarter, 2019
Monday, January 7 Time series examples (Handout 1).
Wednesday, January 9 Review of regression (Handout 2).
Friday, January 11 Review of regression (Handouts 2 and 3).
Homework 1: Due on Friday, January 18
You may form a group of 3 students registered in this course and submit
one completed homework for the group. The front page should display only
the names of the students in the group. The actual work should start on the
second page.
You will find a data set (JobProficiency) on the job proficiency of 25 applicants
for entry-level clerical positions in a government agency. The scores
on four tests (X1, X2, X3, X4) and the job proficiency scores (Y) for the 25 applicants
are given in the data set. A multiple regression is to be fitted to this
data:
$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3} + \beta_4 X_{i4} + \varepsilon_i, \quad i = 1, \ldots, n = 25,$$
where $\{\varepsilon_i\}$ are independent $N(0, \sigma^2)$ variables.
1. (a) Obtain a histogram for each of the variables. Are there noteworthy
features in the plots? Comment.
(b) Obtain a matrix plot of the data (i.e., plot all the variables against each other
(R command: pairs)). Also obtain the correlation matrix. What do the plots
suggest about the nature of relationship between Y and each of the predictor
variables? Discuss. Does it seem that there is a problem of multicollinearity?
Explain.
(c) Fit a multiple regression model to the data. Obtain the parameter estimates,
their standard errors, analysis of variance table, $R^2$ and $R^2_{\text{adj}}$.
(d) Does it seem that all the independent variables need to be retained in the
regression model? If you consider deleting only one independent variable, which
is the best candidate for deletion? Explain your answers.
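The steps in Question 1 can be sketched in R roughly as follows. This is only an illustrative outline, not a model solution: the data frame `job` is simulated placeholder data with arbitrary coefficients (the actual JobProficiency file and its layout are not reproduced here), and the object names `job`, `fit_full`, `fit_summary` and `cor_mat` are hypothetical. Replace the simulated block with your own import of the course data set.

```r
# Simulated placeholder data (NOT the real JobProficiency values).
# Replace this block with your own import of the course data set.
set.seed(137)
n <- 25
job <- data.frame(
  X1 = rnorm(n, mean = 100, sd = 15),
  X2 = rnorm(n, mean = 100, sd = 15),
  X3 = rnorm(n, mean = 100, sd = 10),
  X4 = rnorm(n, mean = 100, sd = 10)
)
# Arbitrary coefficients, chosen only so that the example runs.
job$Y <- -120 + 0.3 * job$X1 + 0.05 * job$X2 + 1.3 * job$X3 +
  0.5 * job$X4 + rnorm(n, sd = 4)

# 1(a): one histogram per variable.
op <- par(mfrow = c(2, 3))
for (v in names(job)) hist(job[[v]], main = v, xlab = v)
par(op)

# 1(b): scatterplot matrix and correlation matrix.
pairs(job)
cor_mat <- cor(job)
print(round(cor_mat, 3))

# 1(c): full model with all four predictors.
fit_full <- lm(Y ~ X1 + X2 + X3 + X4, data = job)
fit_summary <- summary(fit_full)
print(coef(fit_summary))   # estimates, standard errors, t values, p-values
print(anova(fit_full))     # sequential analysis of variance table
print(c(R2 = fit_summary$r.squared, R2_adj = fit_summary$adj.r.squared))

# 1(d): effect of deleting each predictor one at a time.
print(drop1(fit_full, test = "F"))
```

For a single-variable deletion, the F statistic reported by `drop1` is the square of the corresponding t statistic in the coefficient table, so either table can be used to identify the weakest predictor.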
2. The questions here are on the fitted model in 1(c).
(a) Obtain a plot of the observed against the fitted Y values. Also plot the
residuals against the fitted values. Does it seem that the fitted model is reasonable?
Do you suspect any nonlinearity? Is the assumption of equal variance of
the errors (i.e., the $\varepsilon_i$'s) reasonable here? Explain your answers.
(b) Obtain a histogram of the residuals. Also obtain a normal probability plot of
the residuals, and the correlation between the residuals and the normal scores.
Is the assumption of normality of the errors reasonable? Explain.
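The diagnostics in Question 2 might be produced along the following lines. Again this is a hedged sketch on simulated placeholder data, with hypothetical object names (`job`, `fit_full`, `res`, `r_qq`). One assumption to note: the normal scores here are the theoretical quantiles returned by `qqnorm`, whose plotting positions may differ slightly from the convention used in the course handouts.

```r
# Simulated placeholder data (NOT the real JobProficiency values).
set.seed(137)
n <- 25
job <- data.frame(
  X1 = rnorm(n, mean = 100, sd = 15),
  X2 = rnorm(n, mean = 100, sd = 15),
  X3 = rnorm(n, mean = 100, sd = 10),
  X4 = rnorm(n, mean = 100, sd = 10)
)
# Arbitrary coefficients, chosen only so that the example runs.
job$Y <- -120 + 0.3 * job$X1 + 0.05 * job$X2 + 1.3 * job$X3 +
  0.5 * job$X4 + rnorm(n, sd = 4)

fit_full <- lm(Y ~ X1 + X2 + X3 + X4, data = job)
res <- resid(fit_full)
fitted_y <- fitted(fit_full)

# 2(a): observed vs fitted, with the 45-degree reference line.
plot(fitted_y, job$Y, xlab = "Fitted Y", ylab = "Observed Y")
abline(0, 1, lty = 2)

# 2(a): residuals vs fitted; look for curvature or a funnel shape.
plot(fitted_y, res, xlab = "Fitted Y", ylab = "Residual")
abline(h = 0, lty = 2)

# 2(b): histogram and normal probability plot of the residuals.
hist(res, main = "Residuals", xlab = "Residual")
qq <- qqnorm(res)   # returns the normal scores (x) and the residuals (y)
qqline(res)

# 2(b): correlation between the residuals and their normal scores.
r_qq <- cor(qq$x, qq$y)
print(r_qq)
```

A correlation close to 1 is consistent with normal errors. How close is "close enough" depends on the sample size, so compare `r_qq` against whatever critical value or rule of thumb the course provides.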
3. (a) This question is on model selection by backward elimination. Starting
with the full model, delete one variable at a time. At each step, drop the
variable that is the best candidate for deletion. In this way, you will have 5
models: the largest one has all 4 independent variables, and the smallest one
has none. For each model, find the AIC and BIC values. Find the best model(s)
selected by the AIC and BIC criteria. Fit these final selected model(s), obtain
the parameter estimates, their standard errors, $R^2$ and $R^2_{\text{adj}}$.
(b) Use the AIC and the BIC criteria to select the best among all possible regression
models. Fit these final selected model(s), obtain the parameter estimates,
their standard errors, $R^2$ and $R^2_{\text{adj}}$.
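One possible way to organize Question 3 in R is sketched below, on simulated placeholder data and with hypothetical helper names (`fit_subset`, `label_of`, `backward`, `all_subsets`). Two assumptions are made. First, "best candidate for deletion" is taken to mean the predictor with the smallest absolute t statistic in the current model. Second, AIC and BIC are computed with R's `AIC()` and `BIC()`, which are based on the full Gaussian log-likelihood; if the course defines AIC as $n \log(SSE/n) + 2p$, the values differ from R's only by a constant that does not depend on the model, so the ranking of models is unchanged.

```r
# Simulated placeholder data (NOT the real JobProficiency values).
set.seed(137)
n <- 25
job <- data.frame(
  X1 = rnorm(n, mean = 100, sd = 15),
  X2 = rnorm(n, mean = 100, sd = 15),
  X3 = rnorm(n, mean = 100, sd = 10),
  X4 = rnorm(n, mean = 100, sd = 10)
)
# Arbitrary coefficients, chosen only so that the example runs.
job$Y <- -120 + 0.3 * job$X1 + 0.05 * job$X2 + 1.3 * job$X3 +
  0.5 * job$X4 + rnorm(n, sd = 4)

preds <- c("X1", "X2", "X3", "X4")

# Hypothetical helpers: fit Y on a subset of predictors; label a subset.
fit_subset <- function(vars) {
  rhs <- if (length(vars) > 0) vars else "1"   # "1" = intercept only
  lm(reformulate(rhs, response = "Y"), data = job)
}
label_of <- function(vars) {
  if (length(vars) > 0) paste(vars, collapse = " + ") else "(intercept only)"
}

# 3(a): backward elimination, recording AIC and BIC at every step.
current <- preds
steps <- list()
repeat {
  fit <- fit_subset(current)
  steps[[length(steps) + 1]] <- data.frame(
    model = label_of(current), AIC = AIC(fit), BIC = BIC(fit)
  )
  if (length(current) == 0) break
  # Assumed rule: drop the predictor with the smallest |t|.
  t_abs <- abs(coef(summary(fit))[current, "t value"])
  current <- current[-which.min(t_abs)]
}
backward <- do.call(rbind, steps)
print(backward)

# 3(b): all 2^4 = 16 possible regressions.
grid <- expand.grid(rep(list(c(FALSE, TRUE)), length(preds)))
rows <- lapply(seq_len(nrow(grid)), function(i) {
  vars <- preds[unlist(grid[i, ])]
  fit <- fit_subset(vars)
  data.frame(model = label_of(vars), AIC = AIC(fit), BIC = BIC(fit))
})
all_subsets <- do.call(rbind, rows)
print(all_subsets[order(all_subsets$AIC), ])

# Models selected by each criterion; refit them to report the estimates.
best_aic <- all_subsets$model[which.min(all_subsets$AIC)]
best_bic <- all_subsets$model[which.min(all_subsets$BIC)]
print(c(AIC_choice = best_aic, BIC_choice = best_bic))
```

The selected model can then be refitted with `lm` and passed to `summary` to obtain the parameter estimates, standard errors, $R^2$ and $R^2_{\text{adj}}$. Base R's `step` function automates a similar search, but it drops variables by AIC rather than by t statistic, so its path need not match the one requested in part (a).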