Reading The Elements of Statistical Learning



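Chapter 2 Overview of Supervised Learning
2.1 Several common terms that mean the same thing:
Inputs are called predictors in the statistical literature; the classical name is independent variables, and in pattern recognition they are called features.
Outputs are called responses; the classical name is dependent variables.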

2.2 gives the basic definitions of regression and classification problems.

2.3 introduces two simple prediction methods, least squares and KNN:
Least squares produces a linear decision boundary: low variance but potentially high bias;
KNN is wiggly and unstable, that is, high variance and low bias.
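
To make the contrast concrete, here is a minimal NumPy sketch of both classifiers on simulated two-class data. The Gaussian class centers, sample sizes, and helper names are my own assumptions for illustration, not the book's simulation setup.

```python
import numpy as np

def fit_least_squares(X, y):
    """Fit y ~ [1, X] @ beta by ordinary least squares; y is coded 0/1."""
    Xb = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return beta

def predict_least_squares(beta, X):
    """Classify as 1 when the fitted value exceeds 0.5 (a linear boundary)."""
    Xb = np.column_stack([np.ones(len(X)), X])
    return (Xb @ beta > 0.5).astype(int)

def predict_knn(X_train, y_train, X, k):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    d2 = ((X[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    return (y_train[nearest].mean(axis=1) > 0.5).astype(int)

# Assumed toy data: two Gaussian blobs, 100 points each.
rng = np.random.default_rng(0)
X_train = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=1.0, size=(100, 2)),
    rng.normal(loc=[2.0, 2.0], scale=1.0, size=(100, 2)),
])
y_train = np.r_[np.zeros(100), np.ones(100)]

beta = fit_least_squares(X_train, y_train)
err_ls = np.mean(predict_least_squares(beta, X_train) != y_train)
err_1nn = np.mean(predict_knn(X_train, y_train, X_train, k=1) != y_train)
print("least squares training error:", err_ls)
print("1-NN training error:", err_1nn)
```

The 1-NN training error is zero because every training point is its own nearest neighbor. That is exactly why training error says little about how such a flexible fit generalizes.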

This summary passage is a classic:

 A large subset of the most popular techniques in use today are variants of these two simple procedures. In fact 1-nearest-neighbor, the simplest of all, captures a large percentage of the market for low-dimensional problems. The following list describes some ways in which these simple procedures have been enhanced:

~Kernel methods use weights that decrease smoothly to zero with distance from the target point, rather than the effective 0/1 weights used by k-nearest neighbors.

~In high-dimensional spaces the distance kernels are modified to emphasize some variable more than others.

~Local regression fits linear models by locally weighted least squares rather than fitting constants locally.

~Linear models fit to a basis expansion of the original inputs allow arbitrarily complex models.

~Projection pursuit and neural network models consist of sums of non-linearly transformed linear models.
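
The first item in the list (smooth kernel weights versus 0/1 weights) can be sketched in one dimension as below. The Gaussian kernel, the bandwidth value, and the toy data are assumptions chosen for illustration.

```python
import numpy as np

def knn_weights(x_train, x0, k):
    """0/1 weights: 1 for the k training points nearest to x0, 0 otherwise."""
    d = np.abs(x_train - x0)
    w = np.zeros_like(d)
    w[np.argsort(d)[:k]] = 1.0
    return w

def gaussian_weights(x_train, x0, bandwidth):
    """Weights that decay smoothly with distance from the target point x0."""
    d = x_train - x0
    return np.exp(-0.5 * (d / bandwidth) ** 2)

def weighted_average(w, y):
    """Locally weighted constant fit at the target point."""
    return np.sum(w * y) / np.sum(w)

x_train = np.linspace(0.0, 1.0, 11)   # 0.0, 0.1, ..., 1.0
y_train = x_train ** 2
x0 = 0.5

w_knn = knn_weights(x_train, x0, k=3)
w_ker = gaussian_weights(x_train, x0, bandwidth=0.1)
fit_knn = weighted_average(w_knn, y_train)
fit_ker = weighted_average(w_ker, y_train)
print("kNN weights:   ", w_knn)
print("kernel weights:", np.round(w_ker, 3))
print("fits at x0:", fit_knn, fit_ker)
```

Both fits are local averages. The only difference is whether the weights switch off abruptly (kNN) or fade out with distance (kernel).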

 

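2.4 Theoretical analysis of statistical decisions

Couldn't get into it and didn't really understand it; I'll reread it tomorrow before starting new material. Today's reading: p35-p43.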

 

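Section 2.5 discusses the problems of local methods such as KNN with high-dimensional features. As the dimension p grows, capturing a fraction r of the samples needs a sub-cube whose edge length r^(1/p) approaches 1, so the neighborhood is no longer local; shrinking r instead leaves too few points to average, which makes the variance very high.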
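
A quick numeric check of that edge length, assuming inputs uniformly distributed in the unit hypercube (the function name is mine):

```python
def edge_length(r, p):
    """Edge length of a sub-cube capturing a fraction r of data that is
    uniform in the p-dimensional unit hypercube: r ** (1 / p)."""
    return r ** (1.0 / p)

for p in (1, 2, 10, 100):
    print(p, round(edge_length(0.01, p), 3), round(edge_length(0.1, p), 3))
```

With p = 10, covering just 1% of the data already needs about 63% of the range of each input, and covering 10% needs about 80%.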

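Section 2.6 is organized into statistical models, supervised learning, and function approximation. The statistical-model part gives the general probabilistic model of the problem; the supervised-learning part explains fitting a function from training examples; the function-approximation part introduces common parameter estimation, taking as the estimate the parameters that maximize the objective function (for example, the likelihood).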
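
As a worked example of that last point, assuming the additive model with Gaussian noise, maximizing the log-likelihood over θ is the same as minimizing the residual sum of squares:

```latex
Y = f_\theta(X) + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2)

L(\theta) = -\frac{N}{2}\log(2\pi) - N\log\sigma
            - \frac{1}{2\sigma^2}\sum_{i=1}^{N}\bigl(y_i - f_\theta(x_i)\bigr)^2

\arg\max_\theta L(\theta)
  = \arg\min_\theta \sum_{i=1}^{N}\bigl(y_i - f_\theta(x_i)\bigr)^2
  = \arg\min_\theta \mathrm{RSS}(\theta)
```

Only the last term of L(θ) depends on θ, and it enters with a negative sign, which is why the two criteria pick the same parameters.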

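2.7 introduces structured regression methods, which restrict the class of candidate functions so that problems that are otherwise ill-posed (such as minimizing RSS over arbitrary functions) become solvable.
2.8 An introduction to several classes of estimators:
2.8.1 Roughness penalty: in essence regularized methods, where a penalty term restricts the complexity of the function space.
2.8.2 Kernel methods and local regression: the kernel function is in fact similar to local-neighbor methods; the kernel reflects the distance between samples.
2.8.3 Basis functions and dictionary methods: select a number of basis functions from a dictionary and sum them to obtain the function. Single-layer feed-forward neural networks, boosting, MARS, and MART all belong to this class.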
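
For reference, the three classes can be written roughly as follows. The notation follows the usual conventions (J is a roughness functional, K_λ a kernel with width λ, h_m the basis functions); treat it as a summary sketch rather than a quotation from the book.

```latex
% 2.8.1 Roughness penalty: penalized residual sum of squares
\mathrm{PRSS}(f;\lambda) = \sum_{i=1}^{N}\bigl(y_i - f(x_i)\bigr)^2 + \lambda J(f)

% 2.8.2 Kernel methods / local regression: locally weighted RSS at x_0
\mathrm{RSS}(f_\theta, x_0) = \sum_{i=1}^{N} K_\lambda(x_0, x_i)\,\bigl(y_i - f_\theta(x_i)\bigr)^2

% 2.8.3 Basis functions: linear expansion in M basis functions
f_\theta(x) = \sum_{m=1}^{M} \theta_m h_m(x)
```

In each case a single quantity controls complexity: the penalty weight λ, the kernel width λ, or the number of basis functions M.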
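2.9 Model selection and the bias-variance tradeoff
Typically, the higher the model complexity (for example, the smaller the regularization term), the lower the bias but the higher the variance, which yields a very low training error but a very high test error; and vice versa. See Figure 2.11.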
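
A small experiment in the spirit of that figure, using polynomial degree as the complexity knob. The sine target, noise level, sample sizes, and degrees are my own assumptions.

```python
import numpy as np

def poly_design(x, degree):
    """Design matrix with columns 1, x, x^2, ..., x^degree."""
    return np.vander(x, degree + 1, increasing=True)

def fit_and_score(x_tr, y_tr, x_te, y_te, degree):
    """Least-squares polynomial fit; returns (training MSE, test MSE)."""
    beta, *_ = np.linalg.lstsq(poly_design(x_tr, degree), y_tr, rcond=None)
    mse_tr = np.mean((poly_design(x_tr, degree) @ beta - y_tr) ** 2)
    mse_te = np.mean((poly_design(x_te, degree) @ beta - y_te) ** 2)
    return mse_tr, mse_te

def target(x):
    # Assumed "true" function for the simulation.
    return np.sin(2 * np.pi * x)

rng = np.random.default_rng(0)
x_tr = rng.uniform(0.0, 1.0, 20)
y_tr = target(x_tr) + rng.normal(scale=0.3, size=20)
x_te = rng.uniform(0.0, 1.0, 200)
y_te = target(x_te) + rng.normal(scale=0.3, size=200)

scores = {d: fit_and_score(x_tr, y_tr, x_te, y_te, d) for d in (1, 3, 9)}
for d, (mse_tr, mse_te) in scores.items():
    print(f"degree {d}: train MSE {mse_tr:.3f}, test MSE {mse_te:.3f}")
```

Training error can only go down as the degree grows, because the models are nested. Test error has no such guarantee, and typically rises again once the model starts fitting noise.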
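Reached page 61. This part covers several linear methods for regression problems: first the basic regression problem, then multiple regression and multiple outputs, followed by subset selection and forward stepwise/stagewise selection (the difference between the two is that the latter does not adjust the coefficients of the other variables when it updates). 3.4 Shrinkage methods add a regularizer to smooth things out, because subset selection is a discrete process (a variable is either kept or dropped). A squared penalty gives ridge regression and an absolute-value penalty gives the lasso; there is also a variant, least angle regression, which is closely related to the lasso. I'll look at it again tomorrow. That is the content of pages 61 to 97.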
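
A minimal sketch of the two kinds of shrinkage, assuming centered data with no intercept. The ridge solution is the standard closed form; the soft-thresholding rule is the lasso solution only in the special case of orthonormal inputs, so it is shown here on its own rather than as a general lasso solver. The data and function names are made up for illustration.

```python
import numpy as np

def ridge(X, y, lam):
    """Closed-form ridge estimate (X^T X + lam * I)^{-1} X^T y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def soft_threshold(beta, lam):
    """Lasso-style shrinkage for orthonormal inputs: sign(b) * max(|b| - lam, 0)."""
    return np.sign(beta) * np.maximum(np.abs(beta) - lam, 0.0)

# Assumed toy regression problem.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
X -= X.mean(axis=0)
beta_true = np.array([3.0, 0.0, -2.0, 0.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.5, size=50)
y -= y.mean()

beta_ols = ridge(X, y, lam=0.0)      # lam = 0 recovers least squares
beta_ridge = ridge(X, y, lam=10.0)   # squared penalty shrinks every coefficient
shrunk = soft_threshold(np.array([3.0, -0.2, 0.5]), lam=0.5)

print("OLS:  ", np.round(beta_ols, 3))
print("ridge:", np.round(beta_ridge, 3))
print("soft-thresholded:", shrunk)
```

Ridge shrinks coefficients toward zero without setting them exactly to zero, while soft thresholding zeroes out the small ones. That is the sense in which the absolute-value penalty also performs a continuous form of subset selection.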
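Addendum: Section 3.3 analyzes the p-norm that corresponds to the constraint in the linear regression problem. With p = 1.2 (the book writes q for this p) the contour looks very similar to that of the elastic net penalty, but in fact the former is smooth while the latter is sharp (non-differentiable) at the corners. (Differentiable here does not mean infinitely differentiable: |β|^1.2 is only once differentiable at 0.)
Section 3.4, Least Angle Regression (LAR): almost identical to the lasso, except that in the lasso version, when a nonzero coefficient hits zero, the corresponding variable is removed from the active set and the direction is recomputed.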
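
To see where the difference comes from, compare the derivatives of the two penalties for a single coefficient β near zero (a short derivation, with α denoting the elastic net mixing weight):

```latex
% L_q penalty with q > 1, e.g. q = 1.2
\frac{d}{d\beta}\,|\beta|^{q} = q\,\operatorname{sign}(\beta)\,|\beta|^{q-1}
\;\longrightarrow\; 0 \quad \text{as } \beta \to 0

% Elastic net penalty \alpha\beta^2 + (1-\alpha)|\beta|
\frac{d}{d\beta}\bigl(\alpha\beta^{2} + (1-\alpha)|\beta|\bigr)
  = 2\alpha\beta + (1-\alpha)\operatorname{sign}(\beta),
\qquad \beta \neq 0
```

The first derivative is continuous through 0, so the L_q contour has no corner. The second jumps from -(1-α) to +(1-α) at 0, which gives the elastic net its sharp corners and lets it set coefficients exactly to zero.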
