STA 471 Due: 5/15/2019
Final Exam
When compiling your answers to the following questions, follow all guidelines for homework
assignments listed in the syllabus. A hard copy of your work is to be turned in to my office
(Kimball 810) by 5:00 PM on the due date. You are not permitted to collaborate on these
questions with another student.

  1. One method used to identify whether a patient requires a hearing aid is to play a recording of
    a set of words being pronounced quietly, then request that the patient repeat those words.
    The number of words correctly identified (“Hearing”) by a set of patients is recorded in
    hearing.txt (UBLearns). Four different recordings (“ListID”) were used, containing different
    sets of words. The purpose of this study is to determine whether the lists of words are
    equally difficult to hear. (15 pts)
    a) Produce side-by-side boxplots of the hearing scores by list ID.
    b) State the hypotheses to be tested in this study.
    c) Fit an appropriate model to address the study question, and present output that displays
    the test statistic and p-value. Also state the conclusion in context.
    d) Reproduce the p-value from part (c) using the pf(.) function.
    e) What percentage of total variability in hearing scores is explained by list ID?
    f) Identify which pairs of lists have mean hearing scores that differ significantly.
    g) Test whether the model residuals are normally distributed.
    h) Test the assumption of constant variance using the Levene test, providing the hypotheses,
    p-value, and conclusion.
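    As a reminder of how pf(.) relates to an ANOVA table, here is a minimal sketch on simulated
    data. The variables `score` and `group` are made up for illustration and are not the
    contents of hearing.txt, whose layout is not shown here.

```r
# Illustrative data only (not hearing.txt): three hypothetical groups of 10.
set.seed(1)
group <- factor(rep(c("A", "B", "C"), each = 10))
score <- rnorm(30, mean = c(20, 22, 25)[as.integer(group)], sd = 3)

fit <- aov(score ~ group)
tab <- summary(fit)[[1]]      # ANOVA table as a data frame

f_stat <- tab[1, "F value"]   # F statistic for the group effect
df1    <- tab[1, "Df"]        # numerator df: number of groups - 1
df2    <- tab[2, "Df"]        # denominator df: residual df

# The ANOVA p-value is the upper-tail area of the F(df1, df2) distribution.
p_manual <- pf(f_stat, df1, df2, lower.tail = FALSE)
p_table  <- tab[1, "Pr(>F)"]
```

    The two values `p_manual` and `p_table` should agree up to rounding.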
  2. The following data are from a study carried out decades ago regarding attitudes toward sex
    education being instituted in public schools. (10 pts)
    Disposition     Sex Education
                    Favor    Oppose
    Conservative    645      142
    Moderate        812      129
    Liberal         766      65
    a) Produce a single barplot that displays all the data.
    b) Create a labeled matrix object to store the data.
    c) Carry out the chi-square test for association. Give the hypotheses, test statistic, p-value,
    and conclusion in context.
    d) Reproduce the p-value from part (c) using the pchisq(.) function.
    e) Examine the standardized Pearson residuals from the chi-square test, and describe how
    the observed data depart from independence.
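    The relationship between chisq.test(.) and pchisq(.) can be sketched on a small made-up
    table. The counts and labels below are hypothetical and are not the data from this question.

```r
# Hypothetical 2 x 2 counts for illustration (not the exam data).
counts <- matrix(c(30, 20,
                   15, 35),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(Group = c("G1", "G2"),
                                 Response = c("Yes", "No")))

# Pearson chi-square test without the continuity correction.
test <- chisq.test(counts, correct = FALSE)

# df = (rows - 1) * (cols - 1); the p-value is the upper-tail chi-square area.
df <- (nrow(counts) - 1) * (ncol(counts) - 1)
p_manual <- pchisq(unname(test$statistic), df = df, lower.tail = FALSE)

# Standardized Pearson residuals, one per cell of the table.
std_res <- test$stdres
```

    Here `p_manual` should match `test$p.value`, and `std_res` has the same dimensions as `counts`.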
  3. This question will involve a comparison of the two-sample procedures we have considered in
    this course. (15 pts)
    a) Set the randomization seed to 4444. Generate one sample of size n_1 = 35 from
    and another sample of size n_2 = 35 from . We
    wish to test. Obtain the p-value for this test using each of the following
    procedures:
    i. The two-sample t-test (equal variances).
    ii. The two-sample t-test (unequal variances).
    iii. The paired t-test.
    b) Obtain a fourth p-value, this time using the Wilcoxon rank sum test of whether the two
    population medians are the same.
    c) Generate data and carry out the four tests a large number of times, say 10,000 (take
    care that you are no longer using a randomization seed). At the end of the simulation,
    you should have 10,000 p-values for each of the four procedures. Report the
    simulation-based type I error rate for each procedure.
    d) Based on your simulation, how do these four procedures perform when the two
    populations have the same location parameters?
    e) Now change the location parameter of the second population to 22, and re-run the
    simulation, again using 10,000 iterations.
    Report the simulation-based estimates of power for all four procedures in this scenario.
    f) Describe your power results. Are they different from what you expected?
  4. When MRI brain scans first became available, an interesting research question involved the
    relationship between measurable brain size and IQ. Forty psychology students volunteered
    for MRI scans of their brains, and brain size was recorded in terms of the number of pixels
    mapped by the scan (brain_size.txt). (10 pts)
    a) Fit the simple linear regression model:
        IQ = β₀ + β₁ (pixel count) + ε
    and provide the estimated regression coefficients.
    b) Test for a linear relationship between IQ and pixel count. Give the hypotheses, test
    statistic, p-value, and conclusion in context.
    c) There are additional variables present in the data set that may be related to IQ. Use the
    simple linear regression model fit previously as your base model. Use the forward
    selection technique to determine whether any of Height, Weight, or Gender can be
    included as significant predictors of IQ. Do not include excessive output.
    d) Obtain the
    statistic for your final model.
  5. Nerds frequently impersonate fantasy characters and roll dice to determine what happens in
    their silly adventure game. Usually this involves rolling a single 20-sided die. Other times,
    the player may need to roll, for example, eight 6-sided dice. It would be nice to have a way
    to quickly simulate the rolling of multiple dice. (10 pts)
    a) Write a program to simulate rolling a single 20-sided die. The possible outcomes are all
    integers between 1 and 20, and each outcome should be equally likely. While not
    required, you may wish to use existing functions like runif(.) and floor(.).
    b) To give evidence that your program works properly, execute it 10,000 times, store the
    value of each roll, and use them to build a barplot. Use your barplot to make an argument
    that the code works as intended.
    c) Write a function called “roll1” that simulates rolling a single die with a user-supplied
    number of sides (i.e., the single argument passed to the function is the number of sides on
    the die.) For example, the code “roll1(sides=20)” should simulate rolling a single 20-
    sided die.
    d) Write a more general function called “roll” which takes two arguments: the number of
    identical dice to be rolled, and the number of sides on one of the dice. For example, the
    code “roll(number=8, sides=6)” should roll 8 standard six-sided dice.
    e) In nerdy fantasy games, it’s a really big deal when your 20-sided die lands on 20. It is
    extremely rare for a player to roll 20’s on consecutive throws. It is unheard of to roll
    three straight 20’s. Use your “roll” function and a while loop to count how many
    attempts (an attempt is the rolling of three 20-sided dice) it takes to roll three straight 20’s
    in simulation. Call this value “number_of_attempts”.
    f) Store 1,000 values of “number_of_attempts” and create a histogram. (This will take
    quite a while to run. Go get some lunch; no joke.)
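    For reference, the runif(.) and floor(.) hint in part (a) rests on the idea sketched below,
    shown here for a six-sided die. This is only an illustration of the idea under that
    assumption, not a required approach; the names `sides` and `rolls` are arbitrary.

```r
# runif() draws from the open interval (0, sides), so floor() gives an
# integer in 0, ..., sides - 1; adding 1 shifts the range to 1, ..., sides.
sides <- 6
rolls <- floor(runif(1000, min = 0, max = sides)) + 1

# Tabulate the outcomes; each face should appear roughly equally often.
roll_counts <- table(rolls)
```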
