LOESS: locally weighted regression
conda install -c conda-forge statsmodels -y
pip install statsmodels
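With the default return_sorted=True, the function returns a two-column numpy array: the first column holds the x values sorted in ascending order, and the second column holds the corresponding smoothed y values.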
import statsmodels.api as sm
lowess = sm.nonparametric.lowess
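lowess(
    endog,               # y values (1-D numpy array)
    exog,                # x values (1-D numpy array)
    frac=2./3.,          # fraction of the data used for each local fit (0 to 1); larger values give a smoother curve
    it=3,                # number of residual-based (robustifying) reweightings
    is_sorted=False,     # whether the data are already sorted by x; if False (default), they are sorted first
    missing='drop',      # 'none': no NaN check; 'drop' (default): drop observations containing NaN; 'raise': raise an error on NaN
    return_sorted=True,  # True (default): return [sorted x, fitted y]; False: return only the fitted y, in input order
)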
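import math
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
lowess = sm.nonparametric.lowess
# Noisy samples of sin(x) on [0, 2π]
n = 100
x = np.linspace(0, 2 * math.pi, n)
y = np.sin(x) + 0.3 * np.random.randn(n)
# lowess(y, x) would use the default span frac=2/3; here each local fit uses 1/3 of the points.
# Column 1 holds the fitted values; x is already sorted, so they line up with x.
yest = lowess(y, x, frac=1./3.)[:, 1]
plt.clf()
plt.plot(x, y, label='y noisy')
plt.plot(x, yest, label='y pred')
plt.legend()
plt.show()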
from math import ceil
import math
import numpy as np
from scipy import linalg
def lowess(x, y, f=2./3., iter=3):
    """lowess(x, y, f=2./3., iter=3) -> yest
    Lowess smoother: Robust locally weighted regression.
    The lowess function fits a nonparametric regression curve to a scatterplot.
    The arrays x and y contain an equal number of elements; each pair
    (x[i], y[i]) defines a data point in the scatterplot. The function returns
    the estimated (smooth) values of y.
    The smoothing span is given by f. A larger value for f will result in a
    smoother curve. The number of robustifying iterations is given by iter. The
    function will run faster with a smaller number of iterations.
    """
    n = len(x)
    # Neighbours per local window; capped at n - 1 so the index below stays in range (e.g. f=1)
    r = min(int(ceil(f * n)), n - 1)
    # h[i]: distance from x[i] to its r-th nearest neighbour (the local window half-width)
    h = np.array([np.sort(np.abs(x - x[i]))[r] for i in range(n)])
    # Tricube weights: w[j, i] is the weight of point j in the local fit at x[i]
    w = np.clip(np.abs((x[:, None] - x[None, :]) / h), 0.0, 1.0)
    w = (1 - w ** 3) ** 3
    yest = np.zeros(n)
    delta = np.ones(n)  # robustness weights, all 1 on the first pass
    for _ in range(iter):
        for i in range(n):
            # Weighted least squares for the local line y ≈ beta[0] + beta[1] * x
            weights = delta * w[:, i]
            b = np.array([np.sum(weights * y), np.sum(weights * y * x)])
            A = np.array([[np.sum(weights), np.sum(weights * x)],
                          [np.sum(weights * x), np.sum(weights * x * x)]])
            beta = linalg.solve(A, b)
            yest[i] = beta[0] + beta[1] * x[i]
        # Bisquare robustness weights: points with large residuals are down-weighted
        residuals = y - yest
        s = np.median(np.abs(residuals))
        delta = np.clip(residuals / (6.0 * s), -1, 1)
        delta = (1 - delta ** 2) ** 2
    return yest
n = 100
x = np.linspace(0, 2 * math.pi, n)
y = np.sin(x) + 0.3 * np.random.randn(n)
f = 0.25
yest = lowess(x, y, f=f, iter=3)
import matplotlib.pyplot as plt
plt.clf()
plt.plot(x, y, label='y noisy')
plt.plot(x, yest, label='y pred')
plt.legend()
plt.show()
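The line `w = (1 - w ** 3) ** 3` in the function above is the tricube weight function, applied to the scaled distance u = |x_j - x_i| / h_i clipped to [0, 1]; the robustness step then uses the bisquare function (1 - u²)² with u = residual / (6 · median |residual|), clipped to [-1, 1]. The small sketch below evaluates both on hand-picked inputs; the helper names `tricube` and `bisquare` are hypothetical and simply restate those two lines.

```python
import numpy as np

def tricube(u):
    """Hypothetical helper: tricube kernel (1 - |u|^3)^3, zero once |u| >= 1."""
    u = np.clip(np.abs(u), 0.0, 1.0)
    return (1 - u ** 3) ** 3

def bisquare(residuals, s):
    """Hypothetical helper: bisquare robustness weights; s = median absolute residual."""
    u = np.clip(residuals / (6.0 * s), -1, 1)
    return (1 - u ** 2) ** 2

# Tricube at u = 0, 0.5, 1, 2 gives 1, 0.669921875, 0, 0
t = tricube(np.array([0.0, 0.5, 1.0, 2.0]))
# Bisquare with s = 1 at residuals 0, 3, -3, 6 gives 1, 0.5625, 0.5625, 0
b = bisquare(np.array([0.0, 3.0, -3.0, 6.0]), 1.0)
```

Both functions fall smoothly from 1 to 0: nearby points (or small residuals) dominate the local fit, and points beyond the window (or residuals beyond 6 median absolute residuals) get no weight at all.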