STAT 361 (Fall 2021)
Assignment 3
The assignment is due on Nov. 04 (Thursday) at 23:00 (Kingston, Ontario time). Please submit to
Crowdmark.
Guidelines for Preparing Solutions
For questions that need R coding, please include only the important R output and the necessary results in
the main text of your solutions. Present them in a clear and concise fashion (for example, tabulate models
and output).
Describe and discuss your important explorations and findings.
Put long code and output in an Appendix, at the end of EACH problem.
These Appendix sections will NOT be marked, but will be checked as evidence of your independent work.
Prepare your assignment solutions so that they are easy for readers (in this case, the TAs) to follow, without
having to search through lengthy code and output for your answers.

  1. Consider the multiple regression model $Y = X\beta + \varepsilon$, where $\varepsilon \sim \mathrm{MVN}_n(0, \sigma^2 I)$. See descriptions of model
    forms (1) and (2) in Chapter 4.
    (a) Show that the residual vector $r = (I - P)Y$, where $P = X(X^T X)^{-1}X^T$, and show that $I - P$ is also a
    projection matrix.
    (b) Let $U = (\hat\beta, r)^T$. Find the joint distribution of the random vector $U$. It may be helpful to notice that.
    (c) Show that $\hat\beta$ and $r$ are independent.
    Hint: For (b) and (c), properties of the multivariate normal distribution may be useful.
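The identities in Problem 1 can be checked numerically before attempting a proof. The following is a minimal R sketch, not a proof and not part of the assignment: the design matrix `X` and response `Y` are made-up random data (an assumption), and all variable names are illustrative.

```r
# Hypothetical example data: a small random design matrix with an intercept
# column. These values are NOT from the assignment.
set.seed(1)
n <- 10
X <- cbind(1, rnorm(n), rnorm(n))   # n x 3, assumed to have full column rank
Y <- rnorm(n)

P <- X %*% solve(t(X) %*% X) %*% t(X)   # P = X (X^T X)^{-1} X^T
M <- diag(n) - P                        # I - P

# Residuals from lm() should agree with (I - P) Y up to rounding error.
fit <- lm(Y ~ X - 1)                    # X already contains the intercept column
max_resid_diff <- max(abs(residuals(fit) - as.vector(M %*% Y)))

# A projection matrix is symmetric and idempotent.
max_sym_diff  <- max(abs(M - t(M)))
max_idem_diff <- max(abs(M %*% M - M))

# Cross term between beta-hat and r: (X^T X)^{-1} X^T (I - P).
max_cross <- max(abs(solve(t(X) %*% X) %*% t(X) %*% M))

print(c(max_resid_diff, max_sym_diff, max_idem_diff, max_cross))
```

All four printed values should be zero up to floating-point error. A numerical check like this only suggests what to prove; the written solution still needs the algebraic argument.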
  2. Consider the posted “Savings.txt” data. It is an economic dataset collected in 48 different countries. The
    variable “sr” is the savings ratio (aggregate personal saving divided by disposable income). The variables
    “pop15” and “pop75” are the percentages of the population under 15 and over 75, respectively. The variable “dpi”
    is disposable income (per capita, in dollars), while the variable “ddpi” is the rate (percent) of change in
    disposable income (per capita).
    (a) Draw a scatter plot matrix for all the variables involved. Comment on the possible relationships between
    variables, focusing on those that appear interesting to you.
    (b) Fit a simple linear regression model with disposable income (“dpi”) as the response and the percentage of the population
    under 15 as the only covariate. Describe the model clearly. Report and interpret the fitted model:
    is there a significant association between the variables, and is this what you expect?
    (c) Fit a regression model with the savings ratio ($Y$, “sr”) as the response and all other variables as the
    covariates. Describe the model clearly, and report and discuss the fit of the model. Interpret the estimated
    coefficient for the rate of change in disposable income.
    (d) Is it reasonable to drop the covariate disposable income (“dpi”) from the model in (c)? Support your
    answer with a test, and describe the test procedure and results clearly. Also calculate a confidence interval for
    the regression coefficient for this covariate.
    Added Note: Test at level 0.05, and construct a 95% confidence interval.
    (e) Based on the model for (c), obtain a 95% prediction interval for the ratio of savings of a country with
    $x = (20, 3.2, 2200, 2.1)^T$
    for “pop15”, “pop75”, “dpi”, “ddpi” respectively.
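The R calls that Problem 2 relies on can be sketched as follows. This is an illustrative outline under stated assumptions, not a solution: R's built-in `LifeCycleSavings` data frame is used as a stand-in because it has the same variable names, but it is not the posted “Savings.txt” file, so its rows and results may differ. The `read.table()` call in the comment is also an assumption about the file layout.

```r
# ASSUMPTION: the built-in LifeCycleSavings data stands in for "Savings.txt".
# It has the same variable names (sr, pop15, pop75, dpi, ddpi), but it is not
# the posted file and its rows may differ.
savings <- LifeCycleSavings
# For the posted file, a call such as
#   savings <- read.table("Savings.txt", header = TRUE)
# may be needed; the file layout is an assumption.

# (a) Scatter plot matrix (uncomment in an interactive session).
# pairs(savings)

# (b) Simple linear regression of dpi on pop15.
fit_b <- lm(dpi ~ pop15, data = savings)

# (c) Full model for the savings ratio.
fit_c <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = savings)
print(summary(fit_c)$coefficients)

# (d) t-test row and 95% confidence interval for the dpi coefficient.
dpi_row <- summary(fit_c)$coefficients["dpi", ]
dpi_ci  <- confint(fit_c, "dpi", level = 0.95)

# (e) 95% prediction interval at the covariate values given in the problem.
new_x <- data.frame(pop15 = 20, pop75 = 3.2, dpi = 2200, ddpi = 2.1)
pred  <- predict(fit_c, newdata = new_x, interval = "prediction", level = 0.95)
print(pred)
```

Note that `interval = "prediction"` gives an interval for a new observation, which is wider than the `interval = "confidence"` interval for the mean response at the same covariate values.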
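  3. Four objects are weighed 2 at a time on a spring balance. Denote the 4 unknown weights by $\beta_1, \ldots, \beta_4$.
    Six observations are made and are expressed in these forms:
    $Y_1 = \beta_1 + \beta_2 + \varepsilon_1$,
    $Y_2 = \beta_1 + \beta_3 + \varepsilon_2$,
    $Y_3 = \beta_1 + \beta_4 + \varepsilon_3$,
    $Y_4 = \beta_2 + \beta_3 + \varepsilon_4$,
    $Y_5 = \beta_2 + \beta_4 + \varepsilon_5$,
    $Y_6 = \beta_3 + \beta_4 + \varepsilon_6$.
    Assume that $\varepsilon_i \overset{\text{iid}}{\sim} N(0, \sigma^2)$, $i = 1, \ldots, 6$.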
    (a) Find expressions for the least squares estimators $\hat\beta_1, \ldots, \hat\beta_4$ (specify the expressions in terms of $Y_1, \ldots, Y_6$).
    (b) Find an expression for $\mathrm{Cov}(\hat\beta)$ (specify the matrix entries, which may involve $\sigma^2$).
    (c) Find expressions for the residuals (specify the expressions in terms of $Y_1, \ldots, Y_6$).
    (d) Create a small data set for this study with $(Y_1, \ldots, Y_6) = (5, 8, 6, 7, 10, 9)$. Use the lm() function in R to fit
    the data. Check the results for (a), (b) and (c). Does the output from the lm() fit agree with the corresponding
    calculation results for the data set based on the expressions you derived above?
    (e) Explain how you will construct a 95% confidence interval for $\beta_1 + \beta_2$. We can still use the $t_{n-k}$ distribution.
    Find the confidence interval for the given data.
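For Problem 3, the weighing design can be written as a 6 x 4 design matrix with no intercept and fitted with lm(). The sketch below is one possible setup, not a prescribed method: the column names `b1` to `b4` and all variable names are illustrative assumptions. The data vector is the one given in part (d).

```r
# Design matrix for the six pairwise weighings: row i has a 1 for each of the
# two objects weighed together in observation Y_i.
X <- rbind(
  c(1, 1, 0, 0),  # Y1: objects 1 and 2
  c(1, 0, 1, 0),  # Y2: objects 1 and 3
  c(1, 0, 0, 1),  # Y3: objects 1 and 4
  c(0, 1, 1, 0),  # Y4: objects 2 and 3
  c(0, 1, 0, 1),  # Y5: objects 2 and 4
  c(0, 0, 1, 1)   # Y6: objects 3 and 4
)
colnames(X) <- paste0("b", 1:4)         # illustrative names
Y <- c(5, 8, 6, 7, 10, 9)               # data from part (d)

fit <- lm(Y ~ X - 1)                    # "- 1" removes the intercept
beta_hat <- coef(fit)

# The same estimates from the normal equations (X^T X) beta = X^T Y.
beta_direct <- solve(t(X) %*% X, t(X) %*% Y)

# (X^T X)^{-1}; Cov(beta-hat) is sigma^2 times this matrix.
XtX_inv <- solve(t(X) %*% X)

# 95% confidence interval for the linear combination a^T beta = beta_1 + beta_2.
a   <- c(1, 1, 0, 0)
est <- sum(a * beta_hat)
se  <- sqrt(as.numeric(t(a) %*% vcov(fit) %*% a))
df  <- df.residual(fit)                 # n - k = 6 - 4
ci  <- est + c(-1, 1) * qt(0.975, df) * se

print(beta_hat)
print(residuals(fit))
print(ci)
```

Comparing `beta_hat`, `vcov(fit)` and `residuals(fit)` with the hand-derived expressions from (a), (b) and (c) is the check that part (d) asks for. `vcov(fit)` uses the estimated error variance in place of $\sigma^2$.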
