Explanation: MAT 4378, categorical data, R

MAT 4378 – MAT 5317, Analysis of categorical data
Assignment 3

Due date: in class on Monday, November 18, 2019

Remark: You can use R for your computations for Questions 2 to 4. If you use R, please provide the output. However, the R output is not an answer to a question. Please provide one or two sentences to properly answer the question.

1. Consider a ratio estimator $h(\hat\theta_1, \hat\theta_2) = \hat\theta_1/\hat\theta_2$, where the estimated variance-covariance

2. A carefully controlled experiment was conducted to study the effect of the size of the deposit level on the likelihood that a returnable one-liter soft drink bottle will be returned. The data to follow show the number of bottles that were returned ($W_i$) out of 500 sold ($n_i$) at each of six deposit levels ($X_i$, in cents):

Deposit level x_i        2    5   10   20   25   30
Number sold n_i        500  500  500  500  500  500
Number returned w_i     72  103  170  296  406  449

An analyst believes that a logistic regression model is appropriate for studying the relation between the size of the deposit and the probability a bottle will be returned.

(a) Find the maximum likelihood estimates for $\beta_0$ and $\beta_1$. Give the estimated regression model.

(b) Obtain a scatter plot of the sample proportions against the level of the deposit, and superimpose the estimated logistic response onto the plot. Does the fitted logistic response function appear to fit well?

(c) Obtain $\exp(\hat\beta_1)$ and interpret this number.

(d) What is the estimated probability that a bottle will be returned when the deposit is 15 cents?

(e) Estimate the amount of deposit for which 75% of the bottles are expected to be returned.

(f) In part (e), we have an estimate $\hat{x} = g(\hat\beta_0, \hat\beta_1)$ for the level of the deposit at which $\pi = 75\%$ of the bottles are returned. This estimator is a non-linear function of $\hat\beta_0$ and $\hat\beta_1$. Use the delta method to find an asymptotic estimated standard error for this estimate.
Hint: It will be helpful to use the function vcov on your glm object. Furthermore, to multiply the matrices A and B in R, use A %*% B.

3. A marketing research firm was engaged by an automobile manufacturer to conduct a pilot study to examine the feasibility of using logistic regression for ascertaining the likelihood that a family will purchase a new car during the next year. A random sample of 33 suburban families was selected. Data on annual family income ($x_1$, in thousands of dollars) and the current age of the oldest family automobile ($x_2$, in years) were obtained. A follow-up interview conducted 12 months later was used to determine whether the family actually purchased a new car ($y = 1$) or did not purchase a new car ($y = 0$) during the year. The data are found in the file CarPurchase.csv.

(a) Find the maximum likelihood estimates of $\beta_0$, $\beta_1$, and $\beta_2$. State the estimated logistic regression model.

(b) Obtain $\exp(\hat\beta_1)$ and $\exp(\hat\beta_2)$ and interpret these numbers.

(c) What is the estimated probability that a family with an annual income of $50 thousand and an oldest car of 3 years will purchase a new car next year?

4. Rather than finding the probability of success at an explanatory variable value, it is often of interest to find the value of an explanatory variable given a desired probability of success. This is referred to as inverse prediction. One application of inverse prediction involves finding the amount of pesticide or herbicide needed to have a desired kill rate when applied to pests or plants. The lethal dose level $x_\pi$ (commonly called "LDz", where $z = 100\pi$) is defined as

$$x_\pi = \frac{\mathrm{cloglog}(\pi) - \beta_0}{\beta_1}$$

for the complementary log-log regression model

$$\mathrm{cloglog}(\pi) = \beta_0 + \beta_1 x.$$

(a) Show how $x_\pi$ is derived by solving for $x$ in the complementary log-log regression model.

(b) We can obtain a 95% confidence interval for $x_\pi$ as the set of all $x$ such that

$$\frac{\left|\hat\beta_0 + \hat\beta_1 x - \log(-\log(1 - \pi))\right|}{\sqrt{\widehat{\mathrm{Var}}(\hat\beta_0) + x^2\,\widehat{\mathrm{Var}}(\hat\beta_1) + 2x\,\widehat{\mathrm{Cov}}(\hat\beta_0, \hat\beta_1)}} < Z_{1-\alpha/2}, \qquad \alpha = 0.05.$$

Describe how this confidence interval for $x_\pi$ is derived.
(Note that there is generally no closed-form solution for the confidence interval limits, which leads to the use of iterative numerical procedures.)

(c) Turner et al. (1992) use logistic regression to estimate the rate at which picloram, a herbicide, kills tall larkspur, a weed. Their data were collected by applying four different levels of picloram to separate plots, and the number of weeds killed out of the number of weeds within the plot was recorded. The data are in the file picloram.csv. Complete the following:

(i) We will use a cloglog model instead of a logistic regression model. Give the estimated complementary log-log model.

(ii) Compute $e^{\hat\beta_1}$ and interpret this number within the context of the problem.

(iii) Plot the observed proportion of killed weeds and the estimated model. Describe how well the model fits the data.

Note: Here are some commands that you might find helpful. We are assuming that the data frame is called picloram.data and that the fitted model is called mod.

    # Plot the observed proportions versus picloram level
    with(picloram.data, plot(x = picloram, y = kill/total,
        xlab = "Picloram", ylab = "Proportion of weeds killed",
        panel.first = grid(col = "gray", lty = "dotted")))
    # Put the estimated response on the plot
    curve(expr = predict(object = mod,
        newdata = data.frame(picloram = x), type = "response"),
        col = "red", add = TRUE)

(iv) Estimate the 0.9 kill rate level "LD90" for picloram. Add lines to the plot in (iii) to illustrate how it is found (the segments() function can be useful for this purpose).

(v) We are assuming that your fitted model is the glm object mod. Use the following commands to compute a 95% confidence interval for the 0.9 kill rate.
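Note: The function uniroot solves for the root of a function over an interval.

    b0 <- summary(mod)$coefficients[1,1]
    b1 <- summary(mod)$coefficients[2,1]
    # Point estimate of LD90 from the cloglog model
    LD.x <- (log(-log(1 - 0.9)) - b0)/b1
    # Distance between the test statistic at x and the normal critical value
    root.func <- function(x, mod.obj, pi0, alpha) {
        beta.hat <- mod.obj$coefficients
        cov.mat <- vcov(mod.obj)
        # Estimated variance of beta.hat[1] + beta.hat[2]*x
        var.den <- cov.mat[1,1] + x^2*cov.mat[2,2] + 2*x*cov.mat[1,2]
        abs(beta.hat[1] + beta.hat[2]*x - log(-log(1-pi0)))/sqrt(var.den) - qnorm(1-alpha/2)
    }
    # Search below the point estimate for the lower limit
    lower <- uniroot(f = root.func,
        interval = c(min(picloram.data$picloram), LD.x),
        mod.obj = mod, pi0 = 0.9, alpha = 0.05)
    # Search above the point estimate for the upper limit
    upper <- uniroot(f = root.func,
        interval = c(LD.x, max(picloram.data$picloram)),
        mod.obj = mod, pi0 = 0.9, alpha = 0.05)
    lower$root
    upper$root

(vi) In part (v), we found a 95% CI for $x_{0.9}$. Explain in a few sentences how these commands give us the lower and the upper bound of the confidence interval.

Source: http://www.3daixie.com/contents/11/3444.html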
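The delta-method step in Question 2(f), together with the hint about vcov and %*%, can be sketched in R as follows. This is only a minimal illustration, assuming the logit link and the bottle-return counts listed in Question 2; the object names (fit, x.hat, grad, se.x.hat) are hypothetical and not part of the assignment. It uses the inverse-prediction estimate $\hat{x} = (\mathrm{logit}(\pi) - \hat\beta_0)/\hat\beta_1$ and approximates its variance by $\nabla g^\top \hat\Sigma\, \nabla g$, where $\hat\Sigma$ is the estimated covariance matrix of $(\hat\beta_0, \hat\beta_1)$.

```r
# Bottle-return data from Question 2
deposit  <- c(2, 5, 10, 20, 25, 30)
sold     <- rep(500, 6)
returned <- c(72, 103, 170, 296, 406, 449)

# Logistic regression on grouped binomial counts (successes, failures)
fit <- glm(cbind(returned, sold - returned) ~ deposit,
           family = binomial(link = "logit"))

beta.hat <- unname(coef(fit))   # (beta0.hat, beta1.hat)
cov.mat  <- vcov(fit)           # estimated covariance matrix of the estimates

# Inverse prediction: deposit at which the return probability is pi0
pi0   <- 0.75
x.hat <- (qlogis(pi0) - beta.hat[1]) / beta.hat[2]

# Gradient of g(b0, b1) = (logit(pi0) - b0)/b1, evaluated at the estimates
grad <- c(-1 / beta.hat[2],
          -(qlogis(pi0) - beta.hat[1]) / beta.hat[2]^2)

# Delta method: Var(x.hat) is approximately grad' %*% cov.mat %*% grad
se.x.hat <- sqrt(as.numeric(t(grad) %*% cov.mat %*% grad))

x.hat
se.x.hat
```

The same pattern would apply to the complementary log-log model in Question 4 by replacing qlogis(pi0) with log(-log(1 - pi0)) and fitting with link = "cloglog".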
