讲解:MATH6152、R、R、Statistical ComputingR|Python

MATH6152: Statistical Computing (R) CourseworkAssignmentThis coursework assignment is worth 50% of the overall mark for the module.The deadline is 1200 on Friday 25th October 2019 and your completed work must be submittedelectronically via Blackboard.Your submission must consist of exactly one script containing R code to answer the two questions below.Your script must have the filename .R. For example, if your student ID is 12345678, thenyour script must have the filename 12345678.R.You must informatively comment your code including indicating where each question and task begins.Your entire script should be reprodicible and run without any errors.Some tasks in the questions require you to create R objects. These should be named exactly as requested inthe question.You must not load any add-on packages, e.g. using library or require.If you are required to import data from a file, assume that this file is in your working directory. This meansyou should not call the function setwd.At the end of each question, there are some hints that you may find useful.The standard Faculty rules on late submissions apply: for every weekday, or part of a weekday, that thecoursework is late, the mark will be reduced by a factor of 10%. No credit will be given for work which ismore than one week late.All coursework must be carried out independently. You are reminded of the University’s Academic IntegrityPolicy.In the interests of fairness, queries which relate directly to the coursework must be made via the MATH6152:Statistical Computing (R) Coursework Assignment Discussion Forum on Blackboard. This ensuresthat everybody has access to the same information.Total marks available: 1001Question 1: Random walk Metropolis-Hastings algorithm (50marks)In this question, you will write a function to simulate a sample from a given distribution using a randomwalk Metropolis-Hastings algorithm.Often in a statistical analysis, it is essential to be able to simulate (or generate) a sample from a givencontinuous probability distribution. For example, R functions rnorm, runif and rgamma can simulate samplesfrom the normal, uniform and gamma distributions, respectively. However, sometimes we encounter acontinuous distribution which does not have a known form and for which an in-built R function does notexist. In this case, we can use, for example, the following random walk Metropolis-Hastings algorithm tosimulate a sample.Suppose we wish to simulate a sample of size M from the probability distribution of the continuous randomvariable X œ (≠Œ,Œ). We are given the function g(x) Ø 0 such that the probability density function of X isf(x) = g(x)s Œ≠Œ g(x)dx.The steps of the random walk Metropolis-Hastings algorithm are as follows.1. Let x0 be the starting value.2. For i = 1,...,M complete the following steps.(a) Simulate a proposalyi ≥ N (xi≠1, exp(„)),i.e. a normal distribution with mean xi≠1 and variance exp(„).(b) Calculate acceptance probability–i = min ;(c) Simulate ui ≥ uniform(0, 1). If –i Ø ui, then accept the proposal and set xi = yi; otherwise rejectthe proposal and set xi = xi≠1.The chain of values x1,...,xM represents an approximate (and not independent) sample from the probabilitydistribution of X (irrespective of the value of „). The proportion of proposals which are accepted is knownas the acceptance rate.Complete the following tasks.(A) Write a function called rwmet that has exactly two arguments:• M: the sample size, M;• phi: the quantity „ controlling the variance of the proposal distribution in step 2(a) of thealgorithm;that will simulate a sample of size M from a probability distribution withg(x) = exp (◊x) exp (≠ exp (◊x)),where ◊ is the last digit of your student ID using the random-walk Metropolis-Hastings algorithm.(If the last digit of your student ID is 0, then ◊ = 0.5). Set the starting value in the random walkMetropolis-Hastings algorithm to be x0 = 1. [20 marks]2(B) We can use a sample simulated from the distribution of X to approximate the expectation of X.This is simply done by approximating E(X) by the sample mean. It is a random (or Monte Carlo)approximation in that if another sample is generated a dierentapproximation of E(X) will be produced.We want to ensure that it is a good approximation, i.e. it has low variance. Use your function togenerate N = 100 samples of size M = 10000 with „ = ≠4. Calculate the variance of the N means ofyour samples. [5 marks](C) In general, if „ isMATH6152代做、代写R程序语言、代做R编程设计、代做S increased then the average acceptance probability will decrease. Conversely, if „is decreased then the average acceptance probability will increase. Theoretical results suggest thatunder certain conditions the optimal acceptance rate for a random walk Metropolis-Hastings algorithmis 0.234. Create a plot of acceptance rate against an equally-spaced sequence of C = 100 dierentvalues of „ ranging from -5 to 10. For each „, the accptance rate should be based on a sample of sizeM = 10000. Based on your plot, approximately what value of „ is optimal? [20 marks](D) Use your function to generate N = 100 samples of size M = 10000 with „ equal to the optimal valuechosen in Task (C). Calculate the variance of the N means of your samples. Comment on the dierencebetween the variance calculated here and in Task (B). [5 marks]Hints:Question 2: Iterated weighted least squares (50 marks)In this question, you will write a function to find the maximum likelihood estimates for a logistic regressionmodel using iterated weighted least squares (IWLS).Under a logistic regression model, the ith response hasyi ≥ Bernoulli(µi), (1)for i = 1,...,n. In (1), the mean µi is linked to the linear predictor, ÷i, viag(µi) = log 3 µiwhere — = (—1,..., —p) is a p ◊ 1 vector of unknown parameters and xi = (xi1,...,xip)T is a p ◊ 1 vector ofexplanatory variables for the i response.IWLS involves iterating the following weighted least squares estimate for j = 1, 2,... until convergencewhere z(j≠1) is an n ◊ 1 vector with ith value given byand W(j≠1) is an n ◊ n diagonal matrix with ith diagonal value given by. (3)Note that, in (2) and (3).Convergence occurs whenmaxk=1,...,p |—ˆ(j≠1)k ≠ —ˆ(j)k | where ‘ > 0 is a pre-specified tolerance and —ˆ(j)k is the kth element of —ˆ(j). At the point of convergence, —ˆ(j)is returned as the maximum likelihood estimates of —. Note that —(j) does not mean — to the power of j, itmeans that —(j) is one vector in a sequence —(0), —(1),..., —(j),.... The same applies for µ(j)i and W(j)ii .Complete the following tasks.(A) Write a function called iwls that given a vector of binary responses (0s and 1s), and a model matrixwill return the maximum likelihood estimates of — for a logistic regression model. The function shouldhave exactly four arguments:4• X: the n ◊ p model matrix X;• y: the n ◊ 1 vector of responses y = (y1,...,yn);• itmax: the maximum number of iterations of the IWLS algorithm, i.e. the highest value that jcan take;• eps: the tolerance, ‘, used to assess convergence.Set itmax and eps to have default values of 25 and 10≠9, respectively. The starting value —(0) shouldbe the corresponding least squares estimate for —. [35 marks](B) The data stored in the file diab.txt is originally from the US National Institute of Diabetes andDigestive and Kidney Diseases and contains measurements from n = 768 female individuals. Theobjective of the formation of the data is to predict whether or not a patient has diabetes based on aseries of eight measurements. The variables in the dataset are described in Table 1.Table 1: Variables in the diab.txt file.R name Symbol Descriptionpreg z1 Number of times pregnantgluc z2 Plasma glucose concentration a 2 hours in an oral glucose tolerance testbp z3 Diastolic blood pressureskin z4 Triceps skin fold thicknessinsulin z5 2-Hour serum insulinbmi z6 Body mass indexped z7 Diabetes pedigree functionage z8 Ageout y Binary response variable (0 = no diabetes, 1 = diabetes)Read in the data in the file diab.txt. Consider a logistic regression model with yi as the response (seeTable 1) and the following linear predictor:÷i = ÿ9k=1—kxik,where, if k = 1, then xik = 1 and if k > 1 then xik is given by the value of zk≠1 for the ith person (seeTable 1). In other words, the model has a linear predictor with an intercept, and a term for each of theexplanatory variables (the z’s) in Table 1.Using your iwls function, find the maximum likelihood estimates of — = (—1,..., —9)T for the abovemodel. Use the default values for itmax and eps. Fit the same model with the in-built glm functionand comment on any dierencesbetween the estimates found by your iwls function and by glm.[15 marks]Hints:• A diagonal matrix is a matrix with non-zero elements on the leading diagonal and zeros elsewhere.• The R function as.vector can convert an R matrix into an R vector.• The R function as.matrix can convert an R data frame into an R matrix.5转自:http://www.3daixie.com/contents/11/3444.html

你可能感兴趣的:(讲解:MATH6152、R、R、Statistical ComputingR|Python)