Study Notes: The Elements of Statistical Learning (Chapter 3)

3.1 Introduction

3.2 Linear Regression Models and Least Squares

In this chapter, the linear models are of the form

$$f(X) = \beta_0 + \sum_{j=1}^{p} X_j \beta_j,$$

where $\beta = (\beta_0, \beta_1, \ldots, \beta_p)^T$ is the unknown parameter vector, and the inputs $X_j$ can come from:

quantitative inputs or their transformations;

basis expansions, such as $X_2 = X_1^2$, $X_3 = X_1^3$;

numeric or dummy coding of the levels of qualitative inputs;

interactions between variables, such as $X_3 = X_1 \cdot X_2$.

The basic assumption is that the regression function $E(Y \mid X)$ is linear, or that the linear model is a reasonable approximation.

Minimizing $\mathrm{RSS}(\beta) = (\mathbf{y} - \mathbf{X}\beta)^T(\mathbf{y} - \mathbf{X}\beta)$, we get $\hat\beta = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$ and $\hat{\mathbf{y}} = \mathbf{X}\hat\beta = \mathbf{H}\mathbf{y}$, where $\mathbf{H} = \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T$ is the hat matrix.

Geometrical view of least squares: $\hat{\mathbf{y}}$ is the orthogonal projection of $\mathbf{y}$ onto the column space of $\mathbf{X}$, so the residual $\mathbf{y} - \hat{\mathbf{y}}$ is orthogonal to that space.
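To make these formulas concrete, here is a minimal numpy sketch (the synthetic data set is my own, not from the book) that computes $\hat\beta$ via the normal equations, forms the hat matrix $\mathbf{H}$, and checks the orthogonality of the residual to the column space of $\mathbf{X}$:

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 50, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, p))])  # design matrix with intercept
beta_true = np.array([1.0, 2.0, -1.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.5, size=N)

# beta_hat = (X^T X)^{-1} X^T y, solved without forming an explicit inverse
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Hat matrix H = X (X^T X)^{-1} X^T, which "puts the hat" on y
H = X @ np.linalg.solve(X.T @ X, X.T)
y_hat = H @ y

# Geometrical view: the residual y - y_hat is orthogonal to col(X)
print(np.allclose(X.T @ (y - y_hat), 0.0))  # True up to floating point
```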

What if $\mathbf{X}^T\mathbf{X}$ is singular ($\mathbf{X}$ not of full column rank)?

Then $\hat\beta$ is not uniquely defined. However, $\hat{\mathbf{y}}$ is still the projection of $\mathbf{y}$ onto the column space of $\mathbf{X}$. Why does this happen?

One or more qualitative variables are coded in a redundant fashion.

The input dimension $p$ exceeds the number of training cases $N$.

Basically, we can use filtering methods or regularization to resolve this problem. Note that even when $\hat\beta$ is not unique, the fitted values $\hat{\mathbf{y}}$ are, as the sketch below illustrates.
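A small sketch (again with made-up data) of the rank-deficient case: two different coefficient vectors solve the least squares problem, yet they give identical fitted values, because the projection onto the column space is unique:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 30
x1 = rng.normal(size=N)
x2 = 2.0 * x1                                   # redundant column: X is rank-deficient
X = np.column_stack([np.ones(N), x1, x2])
y = 1.0 + 3.0 * x1 + rng.normal(scale=0.3, size=N)

beta_min_norm = np.linalg.pinv(X) @ y           # minimum-norm least squares solution
null_vec = np.array([0.0, 2.0, -1.0])           # X @ null_vec = 2*x1 - x2 = 0
beta_other = beta_min_norm + 5.0 * null_vec     # a different, equally valid solution

# Different coefficient vectors, identical fitted values: the projection is unique
print(np.allclose(X @ beta_min_norm, X @ beta_other))  # True
```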

 

For further work, we need more assumptions about the data:

the $y_i$ are uncorrelated and have constant variance $\sigma^2$, and the $x_i$ are fixed (non-random).

Then we know $\mathrm{Var}(\hat\beta) = (\mathbf{X}^T\mathbf{X})^{-1}\sigma^2$, and an unbiased estimate of $\sigma^2$ is $\hat\sigma^2 = \frac{1}{N-p-1}\sum_{i=1}^{N}(y_i - \hat{y}_i)^2$.

To draw inferences about the parameters, we need additional assumptions:
$Y = \beta_0 + \sum_{j=1}^{p} X_j\beta_j + \varepsilon$, where $\varepsilon \sim N(0, \sigma^2)$.

Then $\hat\beta \sim N\big(\beta, (\mathbf{X}^T\mathbf{X})^{-1}\sigma^2\big)$, and $(N-p-1)\hat\sigma^2 \sim \sigma^2\chi^2_{N-p-1}$.

Simple t test

$z_j = \frac{\hat\beta_j}{\hat\sigma\sqrt{v_j}}$, where $v_j$ is the $j$th diagonal element of $(\mathbf{X}^T\mathbf{X})^{-1}$. Under the null hypothesis $\beta_j = 0$, $z_j$ is distributed as $t_{N-p-1}$.

F test

$F = \dfrac{(\mathrm{RSS}_0 - \mathrm{RSS}_1)/(p_1 - p_0)}{\mathrm{RSS}_1/(N - p_1 - 1)}$, which tests whether a group of coefficients can be dropped; under the null hypothesis that the smaller model (with $p_0 + 1$ parameters) is correct, $F \sim F_{p_1 - p_0,\, N - p_1 - 1}$.

Then we can derive confidence intervals as well: the $1 - 2\alpha$ interval for $\beta_j$ is $\big(\hat\beta_j - z^{(1-\alpha)}\hat\sigma\sqrt{v_j},\ \hat\beta_j + z^{(1-\alpha)}\hat\sigma\sqrt{v_j}\big)$.
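The following sketch (synthetic data; assumes scipy is available) computes $\hat\sigma^2$, the $z_j$ scores with their p-values, a 95% confidence interval for each coefficient, and the F statistic for dropping two predictors:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
N, p = 100, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, p))])
y = X @ np.array([1.0, 2.0, 0.0, -0.5]) + rng.normal(size=N)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
df = N - p - 1
sigma2_hat = resid @ resid / df                      # unbiased estimate of sigma^2
v = np.diag(np.linalg.inv(X.T @ X))                  # v_j: diagonal of (X^T X)^{-1}

# t test: z_j = beta_hat_j / (sigma_hat * sqrt(v_j)) ~ t_{N-p-1} under beta_j = 0
z = beta_hat / np.sqrt(sigma2_hat * v)
p_values = 2 * stats.t.sf(np.abs(z), df)

# 95% confidence interval for each beta_j
half_width = stats.t.ppf(0.975, df) * np.sqrt(sigma2_hat * v)
ci = np.column_stack([beta_hat - half_width, beta_hat + half_width])

# F test for dropping the last two predictors (nested model comparison)
X0 = X[:, :2]
beta0 = np.linalg.solve(X0.T @ X0, X0.T @ y)
rss0 = np.sum((y - X0 @ beta0) ** 2)
rss1 = resid @ resid
F = ((rss0 - rss1) / 2) / (rss1 / df)                # ~ F_{2, N-p-1} under the null
print(p_values, ci, F, stats.f.sf(F, 2, df))
```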

3.2.1 Example: Prostate Cancer

3.2.2 The Gauss-Markov Theorem

In this subsection, we focus on estimation of any linear combination of the parameters, $\theta = a^T\beta$.

The least squares estimate of $\theta = a^T\beta$ is $\hat\theta = a^T\hat\beta = a^T(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$.

If we assume that the linear model is correct, then this estimate is unbiased.

Gauss-Markov Theorem: For any other linear estimator $\tilde\theta = \mathbf{c}^T\mathbf{y}$ that is unbiased for $a^T\beta$, we have $\mathrm{Var}(a^T\hat\beta) \le \mathrm{Var}(\mathbf{c}^T\mathbf{y})$.
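A small numerical illustration (synthetic $\mathbf{X}$ and an arbitrary $a$ of my choosing): the least squares weights $\mathbf{c}_0 = \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}a$ give $a^T\hat\beta = \mathbf{c}_0^T\mathbf{y}$, and any other unbiased choice $\mathbf{c} = \mathbf{c}_0 + \mathbf{d}$ with $\mathbf{X}^T\mathbf{d} = 0$ can only increase the variance $\sigma^2\|\mathbf{c}\|^2$:

```python
import numpy as np

rng = np.random.default_rng(6)
N, p = 25, 3
X = rng.normal(size=(N, p))
a = np.array([1.0, -1.0, 2.0])                  # the linear combination a^T beta

# Least squares weights: a^T beta_hat = c0^T y with c0 = X (X^T X)^{-1} a
c0 = X @ np.linalg.solve(X.T @ X, a)

# Another unbiased linear estimator: add any d with X^T d = 0
# (unbiasedness for all beta requires X^T c = a)
H = X @ np.linalg.solve(X.T @ X, X.T)
u = rng.normal(size=N)
d = u - H @ u                                   # residual of u off col(X), so X^T d = 0
c = c0 + d

# With Var(y) = sigma^2 I, Var(c^T y) = sigma^2 ||c||^2; least squares minimizes it
print(c0 @ c0, c @ c)                           # ||c0||^2 <= ||c||^2
```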

Note: unbiased estimators are not necessarily better than biased estimators, since an unbiased estimator may have larger variance and hence larger mean squared error.

MSE vs. EPE

For a new response $Y_0 = f(x_0) + \varepsilon_0$ at input $x_0$, the EPE of an estimate $\tilde{f}(x_0) = x_0^T\tilde\beta$ is
$E(Y_0 - \tilde{f}(x_0))^2 = \sigma^2 + E(x_0^T\tilde\beta - f(x_0))^2 = \sigma^2 + \mathrm{MSE}(\tilde{f}(x_0)),$
so expected prediction error differs from mean squared error only by the irreducible noise $\sigma^2$.
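A quick Monte Carlo sketch (a hypothetical one-dimensional, no-intercept setup of my own) checking this decomposition numerically:

```python
import numpy as np

rng = np.random.default_rng(3)
sigma, beta, x0, N, trials = 1.0, 2.0, 1.5, 20, 20000

preds = np.empty(trials)
for t in range(trials):
    x = rng.normal(size=N)                 # a fresh training set each trial
    y = beta * x + rng.normal(scale=sigma, size=N)
    beta_hat = (x @ y) / (x @ x)           # univariate least squares, no intercept
    preds[t] = x0 * beta_hat               # prediction f_tilde(x0)

f_x0 = beta * x0
mse = np.mean((preds - f_x0) ** 2)                 # MSE of the prediction at x0
y0 = f_x0 + rng.normal(scale=sigma, size=trials)   # new responses at x0
epe = np.mean((y0 - preds) ** 2)                   # expected prediction error
print(epe, sigma**2 + mse)                         # these two should be close
```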

3.2.3 Multiple Regression from Simple Univariate Regression

Univariate model (without intercept)

$Y = X\beta + \varepsilon$, and the least squares estimate is $\hat\beta = \frac{\langle \mathbf{x}, \mathbf{y} \rangle}{\langle \mathbf{x}, \mathbf{x} \rangle}$, where $\langle \cdot, \cdot \rangle$ denotes the inner product.

Fact: when inputs are orthogonal, they have no effect on each other's parameter estimates in the model.
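A short sketch (orthonormal columns built with a QR decomposition on random data) of both points: the univariate inner-product estimate, and the fact that with orthogonal inputs the univariate estimates coincide with the multiple regression coefficients:

```python
import numpy as np

rng = np.random.default_rng(4)
N = 40
Q, _ = np.linalg.qr(rng.normal(size=(N, 2)))   # Q has orthonormal columns
x1, x2 = Q[:, 0], Q[:, 1]
y = 1.5 * x1 - 2.0 * x2 + rng.normal(scale=0.1, size=N)

# Univariate estimates, one input at a time: <x, y> / <x, x>
b1 = (x1 @ y) / (x1 @ x1)
b2 = (x2 @ y) / (x2 @ x2)

# Joint multiple regression on both inputs
X = np.column_stack([x1, x2])
b_joint = np.linalg.solve(X.T @ X, X.T @ y)

print(np.allclose([b1, b2], b_joint))  # True: orthogonal inputs do not interfere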

The idea of the following algorithm (regression by successive orthogonalization) is similar to the Gram-Schmidt process, but without normalizing.
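Since the algorithm itself is not reproduced above, here is a sketch in the spirit of ESL's Algorithm 3.1 (Regression by Successive Orthogonalization); the function name and test data are my own:

```python
import numpy as np

def successive_orthogonalization(X, y):
    """Coefficient of the last column of X, via Gram-Schmidt-style
    orthogonalization of the columns (without normalizing)."""
    N, p = X.shape
    Z = np.empty((N, p))
    Z[:, 0] = X[:, 0]                              # usually the intercept column
    for j in range(1, p):
        z = X[:, j].copy()
        for k in range(j):                         # regress x_j on z_0, ..., z_{j-1}
            gamma = (Z[:, k] @ X[:, j]) / (Z[:, k] @ Z[:, k])
            z -= gamma * Z[:, k]
        Z[:, j] = z                                # residual, orthogonal to earlier z's
    # The multiple regression coefficient of the last input:
    return (Z[:, -1] @ y) / (Z[:, -1] @ Z[:, -1])

# Check against the full least squares fit
rng = np.random.default_rng(5)
X = np.column_stack([np.ones(60), rng.normal(size=(60, 3))])
y = X @ np.array([1.0, 0.5, -2.0, 3.0]) + rng.normal(size=60)
print(successive_orthogonalization(X, y),
      np.linalg.solve(X.T @ X, X.T @ y)[-1])      # the two values agree
```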
