Walkthrough: MA308, Software, R

MA308: Statistical Calculation and Software
Assignment 2 (Oct 9 – Nov 6, 2019)

2.1 For the “galton” dataset from the UsingR package:
(a) What is the conclusion of testing the height of the child at the $\alpha = 0.05$ level of significance, $H_0: \mu = 68$ vs. $H_1: \mu \neq 68$, given that the variance is known to be 1.7873?
(b) If the variance in (a) is unknown, carry out the likelihood-ratio test and draw the conclusion at the $\alpha = 0.05$ level of significance. Compare the result with that of the t-test.
(c) Test whether the heights of children and parents have the same mean at the $\alpha = 0.05$ level of significance. What if there is a “pairing” between the height of the child and the parent?
(d) To understand how a parent’s height affects a child’s height, first obtain a scatter plot of child against parent, then obtain the Nadaraya-Watson kernel estimator with two different kernels by implementing Nadaraya-Watson kernel regression analysis.
(e) Test whether the spread of heights is the same for the “parent” group and the “child” group.

2.2 This question should be answered using the “Carseats” data set.
(a) Test whether Sales follows a normal distribution.
(b) Fit a multiple regression model to predict Sales using Price, Urban, and US.
(c) Provide an interpretation of each coefficient in the model.
Be careful: some of the variables in the model are qualitative!
(d) Write out the model in equation form, being careful to handle the qualitative variables properly.
(e) For which of the predictors can you reject the null hypothesis $H_0: \beta_j = 0$?
(f) On the basis of your response to the previous question, fit a smaller model that only uses the predictors for which there is evidence of association with the outcome.
(g) How well do the models in (b) and (f) fit the data?
(h) Using the model from (f), obtain 95% confidence intervals for the coefficient(s).
(i) Is there evidence of outliers or high-leverage observations in the model from (f)?
(j) The “Carseats” data set contains an indicator “Urban”. Compare the mean Sales of the “Urban” area with that of the “Rural” area, and show the results of the likelihood-ratio test and the Mann-Whitney test for the equality of these two mean values. Can we use Wilcoxon’s signed-rank test? Why?

2.3 This question should be answered using the weekly.csv data set.
(a) Produce some numerical and graphical summaries of the Weekly data. Do there appear to be any patterns?
(b) Use the full data set to perform a logistic regression with Direction as the response and the five lag variables plus Volume as predictors. Use the summary function to print the results. Do any of the predictors appear to be statistically significant? If so, which ones?
(c) Compute the confusion matrix and overall fraction of correct predictions. Explain what the confusion matrix is telling you about the types of mistakes made by logistic regression.
(d) Now fit the logistic regression model using a training data period from 1990 to 2008, with Lag2 as the only predictor.
Compute the confusion matrix and the overall fraction of correct predictions for the held-out data (that is, the data from 2009 and 2010).

2.4 The “galaxies” data set from the MASS package contains the velocities of 82 galaxies from six well-separated conic sections of space (Postman et al., 1986; Roeder, 1990). The data are intended to shed light on whether or not the observable universe contains superclusters of galaxies surrounded by large voids. The evidence for the existence of superclusters would be the multimodality of the distribution of velocities. Construct a histogram of the data and add a variety of kernel estimates of the density function. Estimate the density function of the “galaxies” data using histogram smoothing, and uniform, Epanechnikov, biweight, and Gaussian kernels. What do you conclude about the possible existence of superclusters of galaxies?

Reposted from: http://www.3daixie.com/contents/11/3444.html
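
For question 2.1(a), a test of the mean with known variance is usually carried out as a two-sided z-test. The assignment expects R and the real galton data; what follows is only a minimal Python sketch of the arithmetic, assuming a normal sampling distribution for the mean. The function name `z_test_known_variance` and the `heights` values are made up for illustration and are not taken from the data set.

```python
import math
from statistics import NormalDist, fmean


def z_test_known_variance(sample, mu0, variance, alpha=0.05):
    """Two-sided z-test of H0: mu = mu0 when the population variance is known."""
    n = len(sample)
    # Standardize the sample mean using the known variance of a single observation.
    z = (fmean(sample) - mu0) / math.sqrt(variance / n)
    # Two-sided p-value from the standard normal distribution.
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value, p_value < alpha


# Made-up heights in inches (NOT the galton data), for illustration only.
heights = [66.5, 67.2, 68.9, 69.4, 67.8, 68.1, 70.2, 66.9, 68.4, 67.6]

z, p_value, reject = z_test_known_variance(heights, mu0=68.0, variance=1.7873)
print(f"z = {z:.3f}, p-value = {p_value:.3f}, reject H0: {reject}")
```

With the real data, the same three quantities (statistic, p-value, decision at the 0.05 level) are what part (a) asks you to report.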
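
For question 2.1(d), the Nadaraya-Watson estimator at a point x0 is a kernel-weighted average of the responses, with weights K((x0 - x_i) / h). The sketch below is one possible Python illustration with a Gaussian and an Epanechnikov kernel. The bandwidth of 1.5, the evaluation grid, and the `parent` and `child` values are all assumptions chosen for the example, not values from the galton data.

```python
import math


def gaussian_kernel(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def epanechnikov_kernel(u):
    """Epanechnikov kernel, supported on [-1, 1]."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def nadaraya_watson(x0, xs, ys, bandwidth, kernel):
    """Kernel-weighted average of ys evaluated at the point x0."""
    weights = [kernel((x0 - x) / bandwidth) for x in xs]
    total = sum(weights)
    if total == 0.0:
        # No observation receives positive weight at x0.
        return float("nan")
    return sum(w * y for w, y in zip(weights, ys)) / total


# Made-up (parent, child) heights in inches, NOT the galton data.
parent = [64.0, 65.5, 66.5, 67.5, 68.5, 69.5, 70.5, 71.5, 72.5]
child = [65.8, 66.7, 67.1, 67.6, 68.0, 68.7, 69.3, 69.9, 70.4]

grid = [65.0, 67.0, 69.0, 71.0]  # points at which to evaluate the fit
for name, kernel in [("gaussian", gaussian_kernel),
                     ("epanechnikov", epanechnikov_kernel)]:
    fitted = [nadaraya_watson(x0, parent, child, bandwidth=1.5, kernel=kernel)
              for x0 in grid]
    print(name, [round(v, 2) for v in fitted])
```

Evaluating the estimator on a fine grid and overlaying the two curves on the scatter plot is one way to compare the kernels; the bandwidth typically matters more than the kernel shape.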
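
For questions 2.3(c) and 2.3(d), the confusion matrix cross-tabulates true and predicted directions, and the overall fraction of correct predictions is the sum of its diagonal divided by the total count. A minimal Python sketch follows, assuming a 0.5 cut-off on the fitted probability of “Up”. The `prob_up` and `actual` values are hypothetical and do not come from the Weekly data.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels=("Down", "Up")):
    """Rows are true labels, columns are predicted labels, both in `labels` order."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


def accuracy(matrix):
    """Overall fraction of correct predictions: diagonal sum over total count."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Hypothetical fitted probabilities of "Up" and true directions (NOT the Weekly data).
prob_up = [0.62, 0.48, 0.55, 0.51, 0.43, 0.58, 0.53, 0.47]
actual = ["Up", "Down", "Down", "Up", "Down", "Up", "Down", "Up"]

# Classify as "Up" when the fitted probability exceeds the assumed 0.5 cut-off.
predicted = ["Up" if p > 0.5 else "Down" for p in prob_up]

matrix = confusion_matrix(actual, predicted)
print("rows = actual, columns = predicted (Down, Up):", matrix)
print("fraction correct:", accuracy(matrix))
```

The off-diagonal cells separate the two kinds of mistakes: true “Down” weeks predicted as “Up”, and true “Up” weeks predicted as “Down”.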
