Lowess

1. Background

Academically, the technique is now usually called locally weighted regression.

For historical reasons it is also known as Loess or Lowess.

LOWESS (locally weighted scatterplot smoothing) refers to methods that combine multiple regression models in a k-nearest-neighbor-based meta-model. LOESS is a later generalization of LOWESS; although it is not a true acronym, it may be understood as standing for "LOcal regrESSion".

It is one form of nonparametric regression.

2. Prediction and smoothing

Preface:

In most cases an ordinary linear fit cannot predict all the values well, because it tends to underfit, for example when the data follow a bell-shaped curve. One alternative is a polynomial, or other basis functions such as sines and cosines. A polynomial can even pass through every data point, but it then performs badly on new samples, because the model overfits and no longer reflects the "logic" behind the data.
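As a quick illustration, the sketch below compares a degree-1 fit with a high-degree polynomial fit on synthetic bell-curve data (the data, the degrees, and all names are assumptions made for this post, not something prescribed here): the low degree tends to underfit, while the high degree tracks the training noise and generalizes worse.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(-3, 3, 40))
y = np.exp(-x ** 2) + rng.normal(scale=0.05, size=x.size)   # noisy bell-shaped curve
x_new = np.linspace(-3, 3, 200)                             # "new samples"
y_new = np.exp(-x_new ** 2)

for degree in (1, 15):
    coeffs = np.polyfit(x, y, deg=degree)                        # one global polynomial fit
    fit_err = np.mean((np.polyval(coeffs, x) - y) ** 2)          # error on the training data
    new_err = np.mean((np.polyval(coeffs, x_new) - y_new) ** 2)  # error on new samples
    print(f"degree {degree:2d}: train MSE {fit_err:.4f}, new-sample MSE {new_err:.4f}")
```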

2.1 The prediction problem

For prediction, linear regression fits the trend of the data with a straight line. For data that is, say, periodic, where one look at the plot shows it is not linear, a simple linear fit is likely to underfit.

If you insist on using a parametric learning algorithm, you may have to keep tuning: looking at the plotted data, half observing and half guessing whether to add the square of an independent variable, or a trigonometric function of it, as a new regressor, and so on, until some evaluation metric says the fit is good enough, at which point you obtain your features and parameter vector.

However, this dice-rolling process relies partly on experience, and once the parameters have been tuned to "perfection" a new problem appears: overfitting!

2.2 Loess/Lowess/Locally weighted linear regression

Locally weighted regression (LWR) can also suffer from underfitting and overfitting.

Parametric learning methods:

After training on all the data, a fixed set of parameters is obtained, and new samples are then predicted from those parameters alone; the training data is no longer needed, and the parameter values are fixed.

Non-parametric learning methods:

Every time a new sample is predicted, the training data is used again to obtain new parameter values; that is, every prediction depends on the training set, so the parameter values are not fixed.
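A small sketch of the two prediction flows (the 1-D data and the nearest-neighbour average used as a stand-in for a non-parametric method are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 50)
y = 2.0 * x + rng.normal(scale=0.1, size=x.size)
X = np.column_stack([np.ones_like(x), x])

# Parametric: fit once, keep only theta, and predict without the training data.
theta, *_ = np.linalg.lstsq(X, y, rcond=None)
def predict_parametric(x_query):
    return theta[0] + theta[1] * x_query

# Non-parametric (k-nearest-neighbour average as a stand-in): every prediction
# goes back to the stored training set, so no fixed parameters are learned up front.
def predict_nonparametric(x_query, k=5):
    nearest = np.argsort(np.abs(x - x_query))[:k]
    return y[nearest].mean()

print(predict_parametric(0.3), predict_nonparametric(0.3))
```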


LOESS and LOWESS thus build on "classical" methods, such as linear and nonlinear least squares regression. They address situations in which the classical procedures do not perform well or cannot be effectively applied without undue labor. LOESS combines much of the simplicity of linear least squares regression with the flexibility of nonlinear regression. It does this by fitting simple models to localized subsets of the data to build up a function that describes the deterministic part of the variation in the data, point by point. In fact, one of the chief attractions of this method is that the data analyst is not required to specify a global function of any form to fit a model to the data, only to fit segments of the data.

The trade-off for these features is increased computation. Because it is so computationally intensive, LOESS would have been practically impossible to use in the era when least squares regression was being developed. Most other modern methods for process modeling are similar to LOESS in this respect. These methods have been consciously designed to use our current computational ability to the fullest possible advantage to achieve goals not easily achieved by traditional approaches.

A smooth curve through a set of data points obtained with this statistical technique is called a Loess Curve, particularly when each smoothed value is given by a weighted quadratic least squares regression over the span of values of the y-axis scattergram criterion variable. When each smoothed value is given by a weighted linear least squares regression over the span, this is known as a Lowess curve; however, some authorities treat Lowess and Loess as synonyms.

2.3 The smoothing problem

At the same time, locally weighted regression (lowess) also handles smoothing well. When smoothing data we often run into series with trend or seasonality; for such data we cannot simply discard points outside the mean ± 3 standard deviations as outliers, because the trend has to be taken into account. With locally weighted regression we can fit a trend line, use it as a baseline, and treat points that deviate far from that baseline as the real outliers.
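The sketch below shows this baseline idea using the lowess smoother from statsmodels (the synthetic series, the frac value, and the 3-standard-deviation residual threshold are assumptions for illustration, not prescriptions from the post):

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

# A made-up series with trend + seasonality and two injected anomalies.
rng = np.random.default_rng(0)
t = np.arange(200, dtype=float)
y = 0.05 * t + np.sin(t / 10.0) + rng.normal(scale=0.2, size=t.size)
y[[30, 120]] += 3.0

# Fit the lowess trend line; frac controls how wide each local fit is.
baseline = lowess(y, t, frac=0.2, return_sorted=False)

# Points far from the baseline (here: > 3 residual standard deviations) are
# flagged as outliers, instead of comparing against a single global mean.
resid = y - baseline
outliers = np.where(np.abs(resid) > 3.0 * resid.std())[0]
print(outliers)
```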

In practice, locally weighted regression (Lowess) is used mostly for smoothing, because for prediction there are many models that can be made more accurate. For smoothing, though, Lowess is intuitive and convincing.

3. Locally weighted linear regression

One problem with linear regression is that it may underfit, because it seeks the unbiased estimate with minimum mean squared error. If the model underfits, it will not achieve the best predictive performance. Some methods allow a little bias to be introduced into the estimate in exchange for a lower mean squared prediction error.

One such method is locally weighted linear regression (LWLR). In this algorithm we assign a weight to each data point according to how close it is to the point being predicted, and a weighted least squares fit is then performed on the re-weighted data; the nearer a training point is to the query point, the more it influences the local fit.
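A minimal NumPy sketch of one common form of LWLR, using a Gaussian kernel for the weights (the kernel choice, the bandwidth tau, and all variable names are assumptions for illustration; the post itself does not fix them):

```python
import numpy as np

def lwlr(x_query, X, y, tau=0.3):
    """Predict y at x_query with locally weighted linear regression.

    X : (m, n) feature matrix, y : (m,) targets, tau : Gaussian kernel bandwidth.
    """
    m = X.shape[0]
    Xb = np.hstack([np.ones((m, 1)), X])              # add an intercept column
    xq = np.hstack([[1.0], np.atleast_1d(x_query)])

    # Gaussian kernel: points near x_query get weight ~1, distant points ~0.
    diff = X - np.atleast_1d(x_query)
    w = np.exp(-np.sum(diff ** 2, axis=1) / (2.0 * tau ** 2))
    W = np.diag(w)

    # Weighted normal equations: theta = (Xb^T W Xb)^+ Xb^T W y,
    # re-solved for every query point (hence "non-parametric").
    theta = np.linalg.pinv(Xb.T @ W @ Xb) @ (Xb.T @ W @ y)
    return float(xq @ theta)

# Usage on noisy periodic data: one local fit per prediction point.
rng = np.random.default_rng(0)
X = np.linspace(0, 6, 100).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=100)
y_smooth = np.array([lwlr(xi, X, y) for xi in X.ravel()])
```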


http://www.dsplog.com/2012/02/05/weighted-least-squares-and-locally-weighted-linear-regression/
