2018-07-12 Course Notes (2): Dimension Reduction

[Keywords: dimension reduction, PCA]

PCA (Principal Component Analysis): locating and analyzing the most important parts of the data:

A blog post that explains PCA in detail

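Purpose: keep the parts of the data that best capture its overall trend.

Method:

Find the components that capture most of the variance in the data and ignore the rest.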

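  1. Draw the two fitted lines, L1 and L2, for the two dimensions.
  2. Measure the variance of the data along L1 and along L2 (this is simple to compute).
  3. If one of the fitted lines accounts for very little of the variance, that dimension can be dropped.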
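
The three steps above can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the notes: the data points are made up, L1 is taken to be the direction of largest variance (computed with the closed-form angle for 2-D data), L2 is the line perpendicular to it, and the helper names `variance_along` and `principal_angle` are hypothetical.

```python
import math
from statistics import mean, pvariance

def variance_along(points, angle):
    """Variance of the points after projecting them onto a line at `angle` (radians)."""
    cx = mean(x for x, _ in points)
    cy = mean(y for _, y in points)
    ux, uy = math.cos(angle), math.sin(angle)  # unit vector along the line
    projections = [(x - cx) * ux + (y - cy) * uy for x, y in points]
    return pvariance(projections)

def principal_angle(points):
    """Angle of the direction of largest variance (closed form, 2-D data only)."""
    n = len(points)
    cx = mean(x for x, _ in points)
    cy = mean(y for _, y in points)
    sxx = sum((x - cx) ** 2 for x, _ in points) / n
    syy = sum((y - cy) ** 2 for _, y in points) / n
    sxy = sum((x - cx) * (y - cy) for x, y in points) / n
    return 0.5 * math.atan2(2 * sxy, sxx - syy)

# Made-up 2-D data that lies close to a straight line.
points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.1)]

theta = principal_angle(points)                        # step 1: direction of L1
var_l1 = variance_along(points, theta)                 # step 2: variance along L1
var_l2 = variance_along(points, theta + math.pi / 2)   # step 2: variance along L2
share_l1 = var_l1 / (var_l1 + var_l2)

# Step 3: if L2 carries almost none of the variance, it can be dropped.
print(f"L1 variance: {var_l1:.4f}, L2 variance: {var_l2:.4f}, L1 share: {share_l1:.3f}")
```

With data this close to a line, almost all of the variance falls along L1, so the L2 coordinate can be discarded with little loss.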

Generalization (PCA reduction in general): a method that ranks the new variables by relevance and uses that ranking to reduce dimension.

X1, X2, X3, … Xp: the original p variables
Z1, Z2, Z3, … Zp: weighted averages (linear combinations) of the original variables
All pairs of Z variables have 0 correlation
Order the Z's by variance (Z1 largest, Zp smallest)
Usually the first few Z variables contain most of the information, and so the rest can be dropped.
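
The properties listed above can be checked with a small standard-library sketch. It is not the course's implementation: it assumes the Z variables are obtained from the eigenvectors of the sample covariance matrix, uses a simple cyclic Jacobi rotation routine for the eigen-decomposition, and runs on made-up data in which X3 is almost X1 + X2. The function names `covariance`, `jacobi_eigen` and `pca` are hypothetical.

```python
import math

def covariance(rows):
    """Center the columns; return (centered rows, sample covariance matrix)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    return centered, cov

def jacobi_eigen(sym, sweeps=50):
    """Eigenvalues and eigenvectors (columns of v) of a symmetric matrix."""
    p = len(sym)
    a = [row[:] for row in sym]
    v = [[float(i == j) for j in range(p)] for i in range(p)]
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(p) for j in range(i + 1, p)) < 1e-24:
            break  # off-diagonal entries are numerically zero
        for i in range(p - 1):
            for j in range(i + 1, p):
                if a[i][j] == 0.0:
                    continue
                d = a[j][j] - a[i][i]
                sign = 1.0 if d >= 0 else -1.0
                # Rotation angle in [-pi/4, pi/4] that zeroes a[i][j].
                theta = 0.5 * math.atan2(sign * 2 * a[i][j], sign * d)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(p):  # update columns i and j of a and v
                    a[k][i], a[k][j] = c * a[k][i] - s * a[k][j], s * a[k][i] + c * a[k][j]
                    v[k][i], v[k][j] = c * v[k][i] - s * v[k][j], s * v[k][i] + c * v[k][j]
                for k in range(p):  # update rows i and j of a
                    a[i][k], a[j][k] = c * a[i][k] - s * a[j][k], s * a[i][k] + c * a[j][k]
    return [a[i][i] for i in range(p)], v

def pca(rows):
    """Return (variances of Z1..Zp, weight vectors, Z scores), largest variance first."""
    centered, cov = covariance(rows)
    eigvals, eigvecs = jacobi_eigen(cov)
    p = len(eigvals)
    order = sorted(range(p), key=lambda k: eigvals[k], reverse=True)
    variances = [eigvals[k] for k in order]
    weights = [[eigvecs[r][k] for r in range(p)] for k in order]
    scores = [[sum(w[j] * row[j] for j in range(p)) for w in weights]
              for row in centered]
    return variances, weights, scores

# Made-up data: the third column is roughly the sum of the first two.
rows = [
    [2.0, 1.0, 3.1],
    [3.0, 5.0, 7.9],
    [4.0, 2.0, 6.2],
    [5.0, 6.0, 10.8],
    [6.0, 3.0, 9.1],
    [7.0, 7.0, 14.2],
]
variances, weights, scores = pca(rows)
total = sum(variances)
ratios = [v / total for v in variances]
print("share of variance per Z:", [round(r, 4) for r in ratios])
```

Because X3 is nearly redundant here, the last Z variable carries only a tiny share of the total variance, which is the situation in which the notes say the trailing Z variables can be dropped.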

Normalizing data
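
The notes do not say which normalization was used in class. A common choice before PCA, assumed here, is z-score standardization, so that variables measured in large units do not dominate the variance. The sketch below uses made-up data, and `standardize` is a hypothetical helper name.

```python
from statistics import mean, stdev

def standardize(rows):
    """Rescale each column to mean 0 and sample standard deviation 1 (z-scores)."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stds = [stdev(col) for col in columns]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]

# Made-up data: height in cm and income in dollars, on very different scales.
rows = [[160.0, 30000.0], [170.0, 52000.0], [180.0, 41000.0], [175.0, 75000.0]]
scaled = standardize(rows)
for row in scaled:
    print([round(x, 3) for x in row])
```

After this step both columns contribute on the same scale, so income no longer swamps height simply because its numbers are larger.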

Regression-Based Dimension Reduction

• Multiple Linear Regression or Logistic Regression
• Use subset selection
• Algorithm chooses a subset of variables
• This procedure is integrated directly into the predictive task
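
As a rough illustration of subset selection inside a regression task, the sketch below runs greedy forward selection with ordinary least squares. The notes do not name a specific algorithm, so forward selection is an assumption, as are the made-up data (y depends on variables 0 and 2 only) and the hypothetical helper names `solve`, `rss` and `forward_selection`. Logistic regression is not covered by this sketch.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x

def rss(rows, y, subset):
    """Residual sum of squares of a least-squares fit of y on an intercept plus `subset`."""
    design = [[1.0] + [row[j] for j in subset] for row in rows]
    k = len(design[0])
    xtx = [[sum(d[p] * d[q] for d in design) for q in range(k)] for p in range(k)]
    xty = [sum(d[p] * yi for d, yi in zip(design, y)) for p in range(k)]
    beta = solve(xtx, xty)  # normal equations: (X^T X) beta = X^T y
    return sum((yi - sum(bj * dj for bj, dj in zip(beta, d))) ** 2
               for d, yi in zip(design, y))

def forward_selection(rows, y, max_vars):
    """Greedily add the variable that lowers the RSS most, up to max_vars variables."""
    chosen, remaining = [], list(range(len(rows[0])))
    while remaining and len(chosen) < max_vars:
        best = min(remaining, key=lambda j: rss(rows, y, chosen + [j]))
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Made-up data: y is about 1 + 2*x0 - 3*x2 plus small noise; x1 is irrelevant.
rows = [
    [1.0, 4.0, 2.0],
    [2.0, 4.0, 7.0],
    [3.0, 6.0, 1.0],
    [4.0, 6.0, 8.0],
    [5.0, 3.0, 3.0],
    [6.0, 3.0, 6.0],
    [7.0, 5.0, 4.0],
    [8.0, 5.0, 5.0],
]
y = [-2.9, -16.1, 4.05, -15.05, 2.1, -5.1, 3.05, 1.95]

selected = forward_selection(rows, y, max_vars=2)
print("selected variable indices:", selected)
```

Unlike PCA, this approach keeps a subset of the original variables rather than building new ones, and the choice is driven directly by how well each subset predicts y.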
