【LOESS: Locally Weighted Nonparametric Regression】

1. LOESS locally weighted regression with statsmodels

Install statsmodels with either conda or pip:

conda install -c conda-forge statsmodels -y

pip install statsmodels

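By default the function returns a two-column numpy array. The first column holds the x values sorted in ascending order, and the second holds the corresponding smoothed y values. With return_sorted=False it returns a 1-D array of fitted values in the original input order instead.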

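import statsmodels.api as sm
lowess = sm.nonparametric.lowess
# Call signature with default values (endog / exog are placeholders)
lowess(
    endog,               # y values (1-D numpy array)
    exog,                # x values (1-D numpy array)
    frac=2./3.,          # fraction of the data used for each local fit (0 to 1); larger values smooth more
    it=3,                # number of robustifying iterations
    is_sorted=False,     # True if the data are already sorted by x; with False (default) they are sorted first
    missing='drop',      # 'none': no NaN check; 'drop' (default): drop rows containing NaN; 'raise': raise an error on NaN
    return_sorted=True,  # True (default): rows sorted by x; False: fitted values in the input order
)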

Example

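import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

lowess = sm.nonparametric.lowess

# Noisy samples of sin(x) on [0, 2*pi]
n = 100
x = np.linspace(0, 2 * np.pi, n)
y = np.sin(x) + 0.3 * np.random.randn(n)

# lowess(y, x) alone uses the default frac=2/3; a smaller frac follows the data more closely.
# Column 0 of the result is the sorted x, column 1 the smoothed y.
result = lowess(y, x, frac=1./3.)
x_fit, yest = result[:, 0], result[:, 1]

plt.clf()
plt.plot(x, y, label='y noisy')
plt.plot(x_fit, yest, label='y pred')
plt.legend()
plt.show()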

2. Implementing LOESS from scratch

from math import ceil
import numpy as np
from scipy import linalg
import math

def lowess(x, y, f=2./3., iter=3):
    """lowess(x, y, f=2./3., iter=3) -> yest
    Lowess smoother: Robust locally weighted regression.
    The lowess function fits a nonparametric regression curve to a scatterplot.
    The arrays x and y contain an equal number of elements; each pair
    (x[i], y[i]) defines a data point in the scatterplot. The function returns
    the estimated (smooth) values of y.
    The smoothing span is given by f. A larger value for f will result in a
    smoother curve. iter is the total number of fitting passes: the first pass
    is a plain locally weighted fit and every later pass is a robustifying
    iteration. The function will run faster with a smaller number of iterations.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    # r: neighbour rank that sets each window's half-width (capped so the index below is valid)
    r = min(int(ceil(f * n)), n - 1)
    # h[i]: distance from x[i] to its r-th nearest neighbour; points at or beyond it get weight 0
    h = np.array([np.sort(np.abs(x - x[i]))[r] for i in range(n)])
    # w[j, i]: tricube weight of point j in the local fit centred at x[i]
    w = np.clip(np.abs((x[:, None] - x[None, :]) / h), 0.0, 1.0)
    w = (1 - w ** 3) ** 3
    yest = np.zeros(n)
    delta = np.ones(n)  # robustness weights; all 1 on the first pass
    for iteration in range(iter):
        for i in range(n):
            # Weighted least squares for the local line beta[0] + beta[1] * x
            weights = delta * w[:, i]
            b = np.array([np.sum(weights * y), np.sum(weights * y * x)])
            A = np.array([[np.sum(weights), np.sum(weights * x)],
                          [np.sum(weights * x), np.sum(weights * x * x)]])
            beta = linalg.solve(A, b)
            yest[i] = beta[0] + beta[1] * x[i]

        # Bisquare robustness weights: residuals beyond 6 * median |residual| get weight 0
        residuals = y - yest
        s = np.median(np.abs(residuals))
        delta = np.clip(residuals / (6.0 * s), -1, 1)
        delta = (1 - delta ** 2) ** 2

    return yest

n = 100
x = np.linspace(0, 2 * math.pi, n)
y = np.sin(x) + 0.3 * np.random.randn(n)

f = 0.25
yest = lowess(x, y, f=f, iter=3)

import matplotlib.pyplot as plt
plt.clf()
plt.plot(x, y, label='y noisy')
plt.plot(x, yest, label='y pred')
plt.legend()
plt.show()
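In the from-scratch code, the line w = (1 - w ** 3) ** 3 is the tricube weight function, and the update of delta applies Tukey's bisquare function to the residuals. Written out, one pass computes the quantities below. The symbols $h_i$, $w_{ji}$, $\delta_j$ and $v_j$ are my own notation. They mirror the arrays h, w, delta and weights in the code.

$$
h_i = \text{distance from } x_i \text{ to its } r\text{-th nearest neighbour}, \qquad r = \lceil f n \rceil
$$

$$
w_{ji} = W\!\left(\frac{|x_j - x_i|}{h_i}\right), \qquad
W(u) = \begin{cases} (1 - u^3)^3, & 0 \le u < 1 \\ 0, & u \ge 1 \end{cases}
$$

$$
(\hat\beta_0, \hat\beta_1) = \arg\min_{\beta_0, \beta_1} \sum_{j=1}^{n} \delta_j \, w_{ji} \, (y_j - \beta_0 - \beta_1 x_j)^2, \qquad
\hat y_i = \hat\beta_0 + \hat\beta_1 x_i
$$

Setting the gradient to zero gives the 2×2 system A beta = b that the loop solves, with $v_j = \delta_j w_{ji}$:

$$
\begin{pmatrix} \sum_j v_j & \sum_j v_j x_j \\ \sum_j v_j x_j & \sum_j v_j x_j^2 \end{pmatrix}
\begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}
=
\begin{pmatrix} \sum_j v_j y_j \\ \sum_j v_j x_j y_j \end{pmatrix}
$$

After each pass the robustness weights are recomputed from the residuals:

$$
e_j = y_j - \hat y_j, \qquad s = \operatorname{median}_j |e_j|, \qquad
\delta_j = B\!\left(\frac{e_j}{6 s}\right), \qquad
B(u) = \begin{cases} (1 - u^2)^2, & |u| < 1 \\ 0, & |u| \ge 1 \end{cases}
$$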

References
statsmodels.nonparametric.smoothers_lowess.lowess

Is w = (1 - w ** 3) ** 3 the weighting function?
LOESS: Locally Weighted Nonparametric Regression
