Task #3 Principal Component Analysis
[Subject: Applied Econometrics]
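Principal Component Analysis, or PCA for short, is a widely used method in applied
econometrics. Let the random vector $\mathbf{X}$ have the multivariate normal distribution
$N_n(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, where the covariance matrix $\boldsymbol{\Sigma}$
is positive definite. The multivariate normal distribution is a natural extension of the
univariate normal distribution. Its probability density function (p.d.f.) is, for any
$\mathbf{x} \in \mathbb{R}^n$,
$$f(\mathbf{x}) = (2\pi)^{-n/2} \, |\boldsymbol{\Sigma}|^{-1/2} \exp\left\{ -\tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu})' \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right\}.$$
Note that when $n = 1$, the p.d.f. reduces to
$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{ -\frac{(x - \mu)^2}{2\sigma^2} \right\}.$$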
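As a quick numerical illustration, the sketch below evaluates the multivariate normal density directly from its formula (mean vector `mu`, covariance matrix `sigma`) and checks that the one-dimensional case agrees with the familiar univariate normal density. The function names `mvn_pdf` and `normal_pdf` and all numbers are hypothetical choices made for this example; NumPy is assumed to be available.

```python
import numpy as np

def mvn_pdf(x, mu, sigma):
    """Density of N_n(mu, sigma) at x, computed straight from the formula."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    mu = np.atleast_1d(np.asarray(mu, dtype=float))
    sigma = np.atleast_2d(np.asarray(sigma, dtype=float))
    n = mu.size
    dev = x - mu
    # Quadratic form (x - mu)' Sigma^{-1} (x - mu), using a linear solve
    # instead of forming the inverse explicitly.
    quad = dev @ np.linalg.solve(sigma, dev)
    norm_const = (2.0 * np.pi) ** (-n / 2) * np.linalg.det(sigma) ** (-0.5)
    return float(norm_const * np.exp(-0.5 * quad))

def normal_pdf(x, mu, sd):
    """Univariate normal density with mean mu and standard deviation sd."""
    return float(np.exp(-((x - mu) ** 2) / (2.0 * sd ** 2)) / (np.sqrt(2.0 * np.pi) * sd))

# n = 1: the covariance "matrix" is just [[sd^2]] (illustrative values).
p_multi = mvn_pdf(0.5, 1.0, [[4.0]])
p_uni = normal_pdf(0.5, 1.0, 2.0)

# n = 2 with identity covariance, evaluated at the mean.
p_std2 = mvn_pdf([0.0, 0.0], [0.0, 0.0], np.eye(2))
```

Here `p_multi` and `p_uni` should agree up to floating-point error, and `p_std2` should equal $1/(2\pi)$, the peak of the standard bivariate normal density.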
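The spectral decomposition of the covariance matrix $\boldsymbol{\Sigma}$ is written as
$\boldsymbol{\Sigma} = \boldsymbol{\Gamma}' \boldsymbol{\Lambda} \boldsymbol{\Gamma}$. Here, the
columns $\mathbf{v}_1, \ldots, \mathbf{v}_n$ of $\boldsymbol{\Gamma}'$ are the eigenvectors
corresponding to the eigenvalues $\lambda_1, \ldots, \lambda_n$, which form the main diagonal
of the matrix $\boldsymbol{\Lambda}$. Assume without loss of generality that the
eigenvalues are in decreasing order; i.e., $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n > 0$.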
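The spectral decomposition can be checked numerically. The following is a minimal sketch, assuming NumPy and a hypothetical 3-by-3 positive definite covariance matrix `Sigma` whose entries are made up for illustration. It orders the eigenvalues from largest to smallest and stores the eigenvectors as the rows of `Gamma`, so that the covariance matrix equals `Gamma.T @ Lambda @ Gamma`.

```python
import numpy as np

# Hypothetical positive definite covariance matrix (illustrative values only).
Sigma = np.array([[4.0, 2.0, 0.6],
                  [2.0, 3.0, 0.4],
                  [0.6, 0.4, 1.0]])

# eigh is intended for symmetric matrices. It returns the eigenvalues in
# ascending order and the matching eigenvectors as the columns of V.
eigvals, V = np.linalg.eigh(Sigma)

# Reorder so that the eigenvalues are decreasing.
order = np.argsort(eigvals)[::-1]
lam = eigvals[order]
V = V[:, order]            # columns are the eigenvectors v_1, ..., v_n

Gamma = V.T                # rows of Gamma are the eigenvectors
Lambda = np.diag(lam)      # eigenvalues on the main diagonal

# Rebuild the covariance matrix from its spectral decomposition.
reconstructed = Gamma.T @ Lambda @ Gamma
```

Comparing `reconstructed` with `Sigma` (for example with `np.allclose`) and `Gamma @ Gamma.T` with the identity matrix confirms the decomposition and the orthogonality of the eigenvector matrix for this example.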
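Define a new random vector as $\mathbf{Y} = \boldsymbol{\Gamma}(\mathbf{X} - \boldsymbol{\mu})$.
By an important theorem in statistics [1], we know that $\mathbf{Y}$ has a
$N_n(\mathbf{0}, \boldsymbol{\Lambda})$ distribution. Hence the components
$Y_1, \ldots, Y_n$ are independent random variables and, for $i = 1, \ldots, n$, $Y_i$ has a
$N(0, \lambda_i)$ distribution. The random vector $\mathbf{Y}$ is called the vector of
principal components.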
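The distributional claim about the principal components can be illustrated by simulation. The sketch below, assuming NumPy and hypothetical values for the mean vector `mu` and covariance matrix `Sigma`, draws a large normal sample, applies the transformation $\boldsymbol{\Gamma}(\mathbf{x} - \boldsymbol{\mu})$ to each draw, and computes the sample mean and covariance of the result. It is a numerical sanity check under these assumptions, not a proof.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical mean vector and covariance matrix (illustrative values only).
mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[4.0, 2.0, 0.6],
                  [2.0, 3.0, 0.4],
                  [0.6, 0.4, 1.0]])

# Spectral decomposition with eigenvalues sorted in decreasing order.
eigvals, V = np.linalg.eigh(Sigma)
order = np.argsort(eigvals)[::-1]
lam = eigvals[order]
Gamma = V[:, order].T      # rows of Gamma are the eigenvectors

# Draw a sample from the multivariate normal; each row of X is one draw.
X = rng.multivariate_normal(mu, Sigma, size=200_000)

# Apply Gamma (x - mu) to every draw; with draws stored as rows this is
# a right-multiplication by Gamma'.
Y = (X - mu) @ Gamma.T

sample_mean_Y = Y.mean(axis=0)
sample_cov_Y = np.cov(Y, rowvar=False)
```

With a sample this large, `sample_mean_Y` should be close to the zero vector and `sample_cov_Y` should be close to the diagonal matrix of eigenvalues `np.diag(lam)`, with off-diagonal entries near zero.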
You are required to complete the following questions.
- The total variation ($TV$) of a random vector is the sum of the variances of its
components. For the random vector $\mathbf{X}$, prove that
$TV(\mathbf{X}) = \sum_{i=1}^{n} \sigma_i^2 = \sum_{i=1}^{n} \lambda_i = TV(\mathbf{Y})$, where
$\sigma_i^2 = \mathrm{Var}(X_i)$.
- The first component of $\mathbf{Y}$ is given by $Y_1 = \mathbf{v}_1'(\mathbf{X} - \boldsymbol{\mu})$.
This is a linear combination of the components of $\mathbf{X} - \boldsymbol{\mu}$ with the property
$\|\mathbf{v}_1\|^2 = \sum_{j=1}^{n} v_{1j}^2 = 1$, because $\boldsymbol{\Gamma}'$ is orthogonal.
Consider any other linear combination of $(\mathbf{X} - \boldsymbol{\mu})$, say
$\mathbf{a}'(\mathbf{X} - \boldsymbol{\mu})$, such that $\|\mathbf{a}\|^2 = 1$. Show that there are
scalars $a_1, \ldots, a_n$ such that $\mathbf{a} = \sum_{j=1}^{n} a_j \mathbf{v}_j$
and $\sum_{j=1}^{n} a_j^2 = 1$.
- Write a MATLAB script that finds the principal components of any given
random vector $\mathbf{X}$. Compare your results with the official MATLAB function pca.
- One important feature of PCA is dimensionality reduction. For example, a CEO in
a given firm may have multiple characteristics, such as education, confidence
and social connections. Scholars often prefer a single indicator of a CEO's overall
ability. PCA can accomplish this by using the first principal component, which
has the largest variance.
Construct a Management Quality Factor measure with PCA for Chinese
public companies. Use the CEO data from CSMAR to construct a measure of
CEO quality.
[1] The exact statement of the theorem is Theorem B.10 in Greene's well-known econometrics textbook,
Econometric Analysis.