Tutoring case: MATH 1309 Assignment 2


Assignment 2 PG MATH 1309 MULTIVARIATE ANALYSIS
45 points
Due date: October 25, 2019, 11:59 pm.
Show your SAS code, output and answers in the single assignment PDF or DOCX file that you
submit in Canvas.

Question 1 (23 marks)

The file THC.csv contains data on the concentrations of 13 different chemical compounds in marijuana
plants grown in the same region in Colombia that are derived from three different species varieties.

  1. Compute the mean and standard deviation for the 13 chemical concentrations in the
    sample THC data via SAS. (1.5 marks)

  2. Produce the correlation matrix and a scatterplot in SAS. Is the correlation matrix suitable for
    a principal component analysis? (1.5 marks)

  3. Perform a principal component analysis using SAS on the raw data and assess how many PCs
    need to be retained. Answer the following from the resultant output. (10 marks; each part below
    is worth 2 marks)

a) What percentage of the total sample variation is accounted for by the first, second and
third PCs?
b) Interpret the first 3 PCs.
c) Write out the first, second and third PCs as linear functions of the original variables.
d) Can the data be effectively summarised in fewer than 13 dimensions? Justify your
answer and comment on it.
e) Obtain via SAS, or sketch, the scree plot to confirm your choice of the number of PCs.

  4. Perform a principal component analysis using SAS on the correlation matrix. Answer the
    following from the resultant output. (10 marks; each part below is worth 2 marks)

a) What percentage of the total sample variation is accounted for by the first, second and
third PCs?
b) Interpret the first 3 PCs.
c) Write out the first, second and third PCs as linear functions of the standardised
variables.
d) Can the data be effectively summarised in fewer than 13 dimensions? Justify your
answer and comment on it.
e) Obtain via SAS, or sketch, the scree plot to confirm your choice of the number of PCs.
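
The steps in Question 1 could be sketched in SAS roughly as follows. This is only a minimal sketch under stated assumptions: THC.csv is taken to sit in the current working directory with a header row, every numeric column is taken to be one of the 13 concentrations (a numeric species code, if present, would have to be left out of the VAR lists), and the data set name thc is an illustrative choice. None of these details come from the assignment itself.

```sas
ods graphics on;

/* Assumption: THC.csv is in the working directory and has a header row. */
proc import datafile="THC.csv" out=thc dbms=csv replace;
  getnames=yes;
run;

/* Part 1: means and standard deviations of the numeric variables. */
proc means data=thc mean std;
  var _numeric_;
run;

/* Part 2: correlation matrix plus a scatterplot matrix of all variables. */
proc corr data=thc plots=matrix(nvar=all);
  var _numeric_;
run;

/* Part 3: PCA of the raw data, i.e. based on the covariance matrix. */
proc princomp data=thc cov plots=scree;
  var _numeric_;
run;

/* Part 4: PCA based on the correlation matrix (the PROC PRINCOMP default). */
proc princomp data=thc plots=scree;
  var _numeric_;
run;
```

The COV option is what switches PROC PRINCOMP from the correlation matrix to the covariance matrix, so the last two steps differ only in that option.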

Question 2 (14 marks)
Consider the raw data set with 12 observations, on 5 socio-economic variables, called Population,
School, Employment, Services and HouseValue.

data SocioEconomics;
input Population School Employment Services HouseValue;
datalines;
5700 12.8 2500 270 25000
1000 10.9 600 10 10000
3400 8.8 1000 10 9000
3800 13.6 1700 140 25000
4000 12.8 1600 140 25000
8200 8.3 2600 60 12000
1200 11.4 400 10 16000
9100 11.5 3300 60 14000
9900 12.5 3400 180 18000
9600 13.7 3600 390 25000
9600 9.6 3300 80 12000
9400 11.4 4000 100 13000
;
proc factor data=SocioEconomics simple corr;
run;

Conduct a factor analysis using the SAS statements above.
Show your SAS code (it may differ from the one I suggest), output and answers in the ONE
assignment pdf or docx that you submit in Canvas.

  1. Prepare the dataset for a Factor analysis via SAS. (1 mark)

  2. Generate the means and standard deviations of the data. (1 mark)

  3. Perform a Factor analysis on the raw data and the correlation matrix using the code above,
    and answer the following questions. (2 marks)

  4. From the eigenvalues of the correlation matrix, the factor loading matrix and the
    communalities in the output, answer the following questions.

a) Do the first two principal components (factors) provide an adequate summary of the
data? (1 mark)
b) How much of the variation is accounted for by 2 factors? (1 mark)
c) How much of the variation is accounted for by 3 factors? (1 mark)

  5. To display the scoring coefficients as eigenvectors, use PROC PRINCOMP as shown below,
    and answer the following questions.

proc princomp data=SocioEconomics;
run;

a) What are the eigenvalues and the respective eigenvectors? (1 mark)
b) What proportion of the variance is accounted for by the first and second components,
respectively? (1 mark)
c) How much of the standardised variance do the first and second factors account for
together? (1 mark)
d) According to the final communality estimates, how many components or factors are needed
for all the variables to be well accounted for? Justify your answer. (1 mark)
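
For the parts that ask about proportions of variation, the standard relations for an analysis of the correlation matrix of p variables (here p = 5) may be a useful reminder. The symbols below are notation introduced for this note, not taken from the assignment: λ_j is the j-th eigenvalue, a_ij the loading of variable i on factor j, and h_i² the communality of variable i when m factors are retained.

```latex
\[
\text{proportion for component } j
  = \frac{\lambda_j}{\sum_{k=1}^{p}\lambda_k}
  = \frac{\lambda_j}{p},
\qquad
\text{cumulative proportion for } m \text{ components}
  = \frac{1}{p}\sum_{j=1}^{m}\lambda_j,
\qquad
h_i^2 = \sum_{j=1}^{m} a_{ij}^2 .
\]
```

The denominator equals p because the eigenvalues of a correlation matrix sum to its trace, which is the number of variables.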

  6. To obtain the component scores as linear combinations of the observed variables, request
    the standardized scoring coefficients by adding the SCORE option to the PROC FACTOR statement
    and run it. The SCORE option in the code below requests the display of the
    standardized scoring coefficients.

proc factor data=SocioEconomics n=5 score;
run;

As each factor/component can be expressed as a linear combination of the standardised
observed variables using the code above, answer the following questions:

a) Write down the first principal component or Factor1 in terms of the standardised
variables. (1 mark)
b) Write down the second principal component or Factor2 in terms of the standardised
variables. (1 mark)
c) Write the first and second PCs in terms of eigenvectors. (1 mark)
NOTES/HINTS:

• The SIMPLE option specified in the PROC FACTOR statement generates the means and
standard deviations of all observed variables in the analysis.
• The CORR option specified in the PROC FACTOR statement generates the output of the
observed correlations.
• To express the observed variables as functions of the components (or factors), inspect
the factor loading matrix.
• To obtain the component scores as linear combinations of the observed variables, request
the standardized scoring coefficients by adding the SCORE option to the PROC FACTOR statement.
The SCORE option in the code below requests the display of the standardized scoring
coefficients:
proc factor data=SocioEconomics n=5 score;
run;
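
The link between eigenvectors, factor loadings and standardized scoring coefficients can be summarised as below. This is a sketch that assumes PROC FACTOR is run with its default principal component method, prior communalities of one and no rotation, as in the code above. The notation is introduced here for illustration: z_i is the i-th standardised variable, λ_j the j-th eigenvalue of the correlation matrix, e_ij the i-th element of its j-th eigenvector, and a_ij the loading of variable i on factor j.

```latex
\[
\mathrm{PC}_j = \sum_{i=1}^{p} e_{ij}\, z_i,
\qquad
a_{ij} = \sqrt{\lambda_j}\; e_{ij},
\qquad
\mathrm{Factor}_j = \sum_{i=1}^{p} \frac{e_{ij}}{\sqrt{\lambda_j}}\, z_i
                  = \frac{\mathrm{PC}_j}{\sqrt{\lambda_j}} .
\]
```

Under these assumptions each factor is the corresponding principal component rescaled to unit variance, which is why the PROC PRINCOMP eigenvectors and the PROC FACTOR scoring coefficients differ only by the factor 1/√λ_j.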

Question 3 (8 marks)

Six variables measured on 100 genuine and 100 forged (counterfeit/fake) old Swiss 1000-franc bank
notes are given in Appendix A of the assignment (also available in an R library):

data(banknote)

Length  Height (left)  Height (right)  Inner Frame (lower)  Inner Frame (upper)  Diagonal

214.8 131.0 131.1 9.0 9.7 141.0
214.6 129.7 129.7 8.1 9.5 141.7
214.8 129.7 129.7 8.7 9.6 142.2
214.8 129.7 129.6 7.5 10.4 142.0
215.0 129.6 129.7 10.4 7.7 141.8
215.7 130.8 130.5 9.0 10.1 141.4
215.5 129.5 129.7 7.9 9.6 141.6
214.5 129.6 129.2 7.2 10.7 141.7
214.9 129.4 129.7 8.2 11.0 141.9
215.2 130.4 130.3 9.2 10.0 140.7
215.3 130.4 130.3 7.9 11.7 141.8
215.1 129.5 129.6 7.7 10.5 142.2
215.2 130.8 129.6 7.9 10.8 141.4
214.7 129.7 129.7 7.7 10.9 141.7
215.1 129.9 129.7 7.7 10.8 141.8
214.5 129.8 129.8 9.3 8.5 141.6
214.6 129.9 130.1 8.2 9.8 141.7
215.0 129.9 129.7 9.0 9.0 141.9
215.2 129.6 129.6 7.4 11.5 141.5
214.7 130.2 129.9 8.6 10.0 141.9
215.0 129.9 129.3 8.4 10.0 141.4
215.6 130.5 130.0 8.1 10.3 141.6
215.3 130.6 130.0 8.4 10.8 141.5
215.7 130.2 130.0 8.7 10.0 141.6
215.1 129.7 129.9 7.4 10.8 141.1
215.3 130.4 130.4 8.0 11.0 142.3
215.5 130.2 130.1 8.9 9.8 142.4
215.1 130.3 130.3 9.8 9.5 141.9
215.1 130.0 130.0 7.4 10.5 141.8
214.8 129.7 129.3 8.3 9.0 142.0
215.2 130.1 129.8 7.9 10.7 141.8
214.8 129.7 129.7 8.6 9.1 142.3
215.0 130.0 129.6 7.7 10.5 140.7
215.6 130.4 130.1 8.4 10.3 141.0
215.9 130.4 130.0 8.9 10.6 141.4
214.6 130.2 130.2 9.4 9.7 141.8
215.5 130.3 130.0 8.4 9.7 141.8
215.3 129.9 129.4 7.9 10.0 142.0
215.3 130.3 130.1 8.5 9.3 142.1
213.9 130.3 129.0 8.1 9.7 141.3
214.4 129.8 129.2 8.9 9.4 142.3
214.8 130.1 129.6 8.8 9.9 140.9
214.9 129.6 129.4 9.3 9.0 141.7
214.9 130.4 129.7 9.0 9.8 140.9
214.8 129.4 129.1 8.2 10.2 141.0
214.3 129.5 129.4 8.3 10.2 141.8
214.8 129.9 129.7 8.3 10.2 141.5
214.8 129.9 129.7 7.3 10.9 142.0
214.6 129.7 129.8 7.9 10.3 141.1
214.5 129.0 129.6 7.8 9.8 142.0
214.6 129.8 129.4 7.2 10.0 141.3
215.3 130.6 130.0 9.5 9.7 141.1
214.5 130.1 130.0 7.8 10.9 140.9
215.4 130.2 130.2 7.6 10.9 141.6
214.5 129.4 129.5 7.9 10.0 141.4
215.2 129.7 129.4 9.2 9.4 142.0
215.7 130.0 129.4 9.2 10.4 141.2
215.0 129.6 129.4 8.8 9.0 141.1
215.1 130.1 129.9 7.9 11.0 141.3
215.1 130.0 129.8 8.2 10.3 141.4
215.1 129.6 129.3 8.3 9.9 141.6
215.3 129.7 129.4 7.5 10.5 141.5
215.4 129.8 129.4 8.0 10.6 141.5
214.5 130.0 129.5 8.0 10.8 141.4
215.0 130.0 129.8 8.6 10.6 141.5
215.2 130.6 130.0 8.8 10.6 140.8
214.6 129.5 129.2 7.7 10.3 141.3
214.8 129.7 129.3 9.1 9.5 141.5
215.1 129.6 129.8 8.6 9.8 141.8
214.9 130.2 130.2 8.0 11.2 139.6
213.8 129.8 129.5 8.4 11.1 140.9
215.2 129.9 129.5 8.2 10.3 141.4
215.0 129.6 130.2 8.7 10.0 141.2
214.4 129.9 129.6 7.5 10.5 141.8
215.2 129.9 129.7 7.2 10.6 142.1
214.1 129.6 129.3 7.6 10.7 141.7
214.9 129.9 130.1 8.8 10.0 141.2
214.6 129.8 129.4 7.4 10.6 141.0
215.2 130.5 129.8 7.9 10.9 140.9
214.6 129.9 129.4 7.9 10.0 141.8
215.1 129.7 129.7 8.6 10.3 140.6
214.9 129.8 129.6 7.5 10.3 141.0
215.2 129.7 129.1 9.0 9.7 141.9
215.2 130.1 129.9 7.9 10.8 141.3
215.4 130.7 130.2 9.0 11.1 141.2
215.1 129.9 129.6 8.9 10.2 141.5
215.2 129.9 129.7 8.7 9.5 141.6
215.0 129.6 129.2 8.4 10.2 142.1
214.9 130.3 129.9 7.4 11.2 141.5
215.0 129.9 129.7 8.0 10.5 142.0
214.7 129.7 129.3 8.6 9.6 141.6
215.4 130.0 129.9 8.5 9.7 141.4
214.9 129.4 129.5 8.2 9.9 141.5
214.5 129.5 129.3 7.4 10.7 141.5
214.7 129.6 129.5 8.3 10.0 142.0
215.6 129.9 129.9 9.0 9.5 141.7
215.0 130.4 130.3 9.1 10.2 141.1
214.4 129.7 129.5 8.0 10.3 141.2
215.1 130.0 129.8 9.1 10.2 141.5
214.7 130.0 129.4 7.8 10.0 141.2
214.4 130.1 130.3 9.7 11.7 139.8
214.9 130.5 130.2 11.0 11.5 139.5
214.9 130.3 130.1 8.7 11.7 140.2
215.0 130.4 130.6 9.9 10.9 140.3
214.7 130.2 130.3 11.8 10.9 139.7
215.0 130.2 130.2 10.6 10.7 139.9
215.3 130.3 130.1 9.3 12.1 140.2
214.8 130.1 130.4 9.8 11.5 139.9
215.0 130.2 129.9 10.0 11.9 139.4
215.2 130.6 130.8 10.4 11.2 140.3
215.2 130.4 130.3 8.0 11.5 139.2
215.1 130.5 130.3 10.6 11.5 140.1
215.4 130.7 131.1 9.7 11.8 140.6
214.9 130.4 129.9 11.4 11.0 139.9
215.1 130.3 130.0 10.6 10.8 139.7
215.5 130.4 130.0 8.2 11.2 139.2
214.7 130.6 130.1 11.8 10.5 139.8
214.7 130.4 130.1 12.1 10.4 139.9
214.8 130.5 130.2 11.0 11.0 140.0
214.4 130.2 129.9 10.1 12.0 139.2
214.8 130.3 130.4 10.1 12.1 139.6
215.1 130.6 130.3 12.3 10.2 139.6
215.3 130.8 131.1 11.6 10.6 140.2
215.1 130.7 130.4 10.5 11.2 139.7
214.7 130.5 130.5 9.9 10.3 140.1
214.9 130.0 130.3 10.2 11.4 139.6
215.0 130.4 130.4 9.4 11.6 140.2
215.5 130.7 130.3 10.2 11.8 140.0
215.1 130.2 130.2 10.1 11.3 140.3
214.5 130.2 130.6 9.8 12.1 139.9
214.3 130.2 130.0 10.7 10.5 139.8
214.5 130.2 129.8 12.3 11.2 139.2
214.9 130.5 130.2 10.6 11.5 139.9
214.6 130.2 130.4 10.5 11.8 139.7
214.2 130.0 130.2 11.0 11.2 139.5
214.8 130.1 130.1 11.9 11.1 139.5
214.6 129.8 130.2 10.7 11.1 139.4
214.9 130.7 130.3 9.3 11.2 138.3
214.6 130.4 130.4 11.3 10.8 139.8
214.5 130.5 130.2 11.8 10.2 139.6
214.8 130.2 130.3 10.0 11.9 139.3
214.7 130.0 129.4 10.2 11.0 139.2
214.6 130.2 130.4 11.2 10.7 139.9
215.0 130.5 130.4 10.6 11.1 139.9
214.5 129.8 129.8 11.4 10.0 139.3
214.9 130.6 130.4 11.9 10.5 139.8
215.0 130.5 130.4 11.4 10.7 139.9
215.3 130.6 130.3 9.3 11.3 138.1
214.7 130.2 130.1 10.7 11.0 139.4
214.9 129.9 130.0 9.9 12.3 139.4
214.9 130.3 129.9 11.9 10.6 139.8
214.6 129.9 129.7 11.9 10.1 139.0
214.6 129.7 129.3 10.4 11.0 139.3
214.5 130.1 130.1 12.1 10.3 139.4
214.5 130.3 130.0 11.0 11.5 139.5
215.1 130.0 130.3 11.6 10.5 139.7
214.2 129.7 129.6 10.3 11.4 139.5
214.4 130.1 130.0 11.3 10.7 139.2
214.8 130.4 130.6 12.5 10.0 139.3
214.6 130.6 130.1 8.1 12.1 137.9
215.6 130.1 129.7 7.4 12.2 138.4
214.9 130.5 130.1 9.9 10.2 138.1
214.6 130.1 130.0 11.5 10.6 139.5
214.7 130.1 130.2 11.6 10.9 139.1
214.3 130.3 130.0 11.4 10.5 139.8
215.1 130.3 130.6 10.3 12.0 139.7
216.3 130.7 130.4 10.0 10.1 138.8
215.6 130.4 130.1 9.6 11.2 138.6
214.8 129.9 129.8 9.6 12.0 139.6
214.9 130.0 129.9 11.4 10.9 139.7
213.9 130.7 130.5 8.7 11.5 137.8
214.2 130.6 130.4 12.0 10.2 139.6
214.8 130.5 130.3 11.8 10.5 139.4
214.8 129.6 130.0 10.4 11.6 139.2
214.8 130.1 130.0 11.4 10.5 139.6
214.9 130.4 130.2 11.9 10.7 139.0
214.3 130.1 130.1 11.6 10.5 139.7
214.5 130.4 130.0 9.9 12.0 139.6
214.8 130.5 130.3 10.2 12.1 139.1
214.5 130.2 130.4 8.2 11.8 137.8
215.0 130.4 130.1 11.4 10.7 139.1
214.8 130.6 130.6 8.0 11.4 138.7
215.0 130.5 130.1 11.0 11.4 139.3
214.6 130.5 130.4 10.1 11.4 139.3
214.7 130.2 130.1 10.7 11.1 139.5
214.7 130.4 130.0 11.5 10.7 139.4
214.5 130.4 130.0 8.0 12.2 138.5
214.8 130.0 129.7 11.4 10.6 139.2
214.8 129.9 130.2 9.6 11.9 139.4
214.6 130.3 130.2 12.7 9.1 139.2
215.1 130.2 129.8 10.2 12.0 139.4
215.4 130.5 130.6 8.8 11.0 138.6
214.7 130.3 130.2 10.8 11.1 139.2
215.0 130.5 130.3 9.6 11.0 138.5
214.9 130.3 130.5 11.6 10.6 139.8
215.0 130.4 130.3 9.9 12.1 139.6
215.1 130.3 129.9 10.3 11.5 139.7
214.8 130.3 130.4 10.6 11.1 140.0
214.7 130.7 130.8 11.2 11.2 139.4
214.3 129.9 129.9 10.2 11.5 139.6
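
To work with this table in SAS, the six columns could be read with a DATA step along the following lines. This is only a sketch: the data set name banknote and the variable names are illustrative assumptions, and only the first three rows of the table are shown, with the remaining rows to be placed before the closing semicolon in the same way.

```sas
/* Data set and variable names are illustrative assumptions. */
data banknote;
  input Length HeightLeft HeightRight InnerLower InnerUpper Diagonal;
  datalines;
214.8 131.0 131.1 9.0 9.7 141.0
214.6 129.7 129.7 8.1 9.5 141.7
214.8 129.7 129.7 8.7 9.6 142.2
;
run;

/* Quick check that the rows were read as six numeric variables. */
proc means data=banknote n mean std;
run;
```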
