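  As a CV novice, when faced with complex data (especially time-series data), I naively assumed the data would be clean. After looking at real datasets and data captured with a Kinect, I dropped that idea and settled for honest smoothing and filtering. Processing inevitably discards some information (information theory), but who can guarantee that the captured data are the true values anyway? So, accepting a tolerable error, I started on my smoothing preprocessing. I am putting together a roundup of curve-smoothing methods; this post is one part of it and will be updated over time. If you spot any mistakes, please point them out, and I will be sincerely grateful. If anything here infringes on your rights, please forgive me. Thank you all!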


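  I have recently been reading papers on generalized curvature analysis (GCA) [1], which feels a bit like black magic. Because I am doing a close reading plus a reproduction, I sometimes need to study the other topics involved, so progress is slow: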

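  1. Singular value decomposition (SVD), to reduce the dimensionality of the motion data;
  2. savgol_filter, which applies the Savitzky-Golay convolution smoothing algorithm (see the 编程字典 website, blog 1, blog 2, blog 3);
  3. Segmentation window: the left singular vectors $U$ from the SVD are used to build a basis for the data vectors, $E(t_i)=[e_1(t_i)\mid\dots\mid e_M(t_i)]$. The window size is chosen so that, within a threshold, the linear space spanned by the curve deviates only slightly.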
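To make item 3 a little more concrete, here is a minimal sketch of building a basis from the left singular vectors of one data window. It is not the GCA paper's procedure: the helper name `window_basis`, the mean-centering step, and the random stand-in data are all assumptions made for illustration.

```python
import numpy as np

def window_basis(window, n_components):
    """Return the first n_components left singular vectors of a data window.

    window: array of shape (n_features, n_frames), one column per time step.
    (Hypothetical helper, not taken from the GCA paper.)
    """
    # Assumption: center each feature so the basis describes variation
    # around the mean rather than the mean itself.
    centered = window - window.mean(axis=1, keepdims=True)
    U, s, Vt = np.linalg.svd(centered, full_matrices=False)
    return U[:, :n_components], s

rng = np.random.default_rng(0)
motion = rng.normal(size=(12, 40))   # stand-in data: 12 coordinates, 40 frames
E, singular_values = window_basis(motion, n_components=3)
print(E.shape)                        # (12, 3): M = 3 basis vectors e_1..e_3
```

The columns of `E` are orthonormal, so they can serve as the basis $E(t_i)$ for that window; how the threshold on the subspace deviation is defined is left to the paper.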




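Savitzky-Golay smoothing formula:

$$x_{k,smooth}=\overline{x}_k=\frac{1}{H}\sum_{i=-w}^{+w}x_{k+i}h_i$$

It uses least squares to regress a small window of the data onto a polynomial, then uses that polynomial to estimate the point at the center of the window. Here $h_i$ are the smoothing coefficients and $H$ is the normalizing constant. To keep the impact of smoothing on the useful information as small as possible, the weights $\frac{h_i}{H}$ are obtained from a least-squares polynomial fit.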
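As a quick check of the formula, the weights $\frac{h_i}{H}$ can be read off from `scipy.signal.savgol_coeffs`. The sketch below uses a 5-point window ($w=2$) and a quadratic fit, for which the classic coefficients are $h=(-3,12,17,12,-3)$ with $H=35$; the five sample values are made up for illustration.

```python
import numpy as np
from scipy.signal import savgol_coeffs

# Weights h_i / H for a 5-point window (w = 2) and polynomial order 2.
coeffs = savgol_coeffs(5, 2)
print(coeffs * 35)            # approximately [-3, 12, 17, 12, -3], so H = 35

# Apply the formula by hand to one window of made-up samples.
x = np.array([2.0, 4.0, 3.0, 7.0, 5.0])
smoothed_center = np.dot(coeffs, x)   # (1/H) * sum_i x_{k+i} * h_i at the center k
print(smoothed_center)
```

Because these weights are symmetric, the dot product and the convolution give the same value at the window center.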




Suppose the filter window width is $n = 2m + 1$, so the sample positions inside the window are $x=(-m,-m+1,\dots,0,\dots,m-1,m)$ (see Figure 1). The data points in the window are then fitted with a polynomial of degree $k-1$:

$$y = a_0+a_1x+a_2x^2+\dots+a_{k-1}x^{k-1}$$

This gives $n$ linear equations in the $k$ unknowns $a_0,\dots,a_{k-1}$. From linear algebra, the least-squares solution is unique only if $n \ge k$ (the matrix $X$ below must have full column rank). Finally, the coefficient vector is fitted by least squares.

$$\begin{pmatrix} y_{-m} \\ y_{-m+1} \\ \vdots \\ y_m \end{pmatrix}= \begin{pmatrix} 1 & -m & \dots & (-m)^{k-1} \\ 1 & -m+1 & \dots & (-m+1)^{k-1} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & m & \dots & m^{k-1} \end{pmatrix} \begin{pmatrix} a_{0} \\ a_{1} \\ \vdots \\ a_{k-1} \end{pmatrix}+\begin{pmatrix} e_{-m} \\ e_{-m+1} \\ \vdots \\ e_m \end{pmatrix}$$

$$Y_{(2m+1)\times 1}=X_{(2m+1)\times k} \cdot A_{k \times 1}+E_{(2m+1) \times 1}$$

The least-squares solution $\dot{A}$ of $A$ [2] is

$$\dot{A}=(X^T \cdot X)^{-1} \cdot X^T \cdot Y$$

The model prediction (filtered value) $\dot{Y}$ of $Y$ is

$$\dot{Y}=X \cdot \dot{A} = X \cdot (X^T \cdot X)^{-1} \cdot X^T \cdot Y=B \cdot Y \quad\Rightarrow\quad B=X \cdot (X^T \cdot X)^{-1} \cdot X^T$$
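The matrix $B$ can be built directly from these definitions. The following is a minimal sketch, assuming $m=2$ and $k=3$ (a 5-point window with a quadratic fit); the variable names are my own. The center row of $B$ holds the weights applied to the window to estimate its center point, so it should match what `scipy.signal.savgol_coeffs` returns for the same window and polynomial order.

```python
import numpy as np
from scipy.signal import savgol_coeffs

m, k = 2, 3                               # n = 2m + 1 = 5 samples, degree k - 1 = 2
x = np.arange(-m, m + 1)                  # sample positions -m..m
X = np.vander(x, k, increasing=True)      # columns: 1, x, x^2

# B = X (X^T X)^{-1} X^T maps the observations Y to the fitted values.
B = X @ np.linalg.inv(X.T @ X) @ X.T

center_row = B[m]                         # weights for the window center (x = 0)
print(center_row)
print(savgol_coeffs(2 * m + 1, k - 1))    # same weights from SciPy
```

$B$ is a projection matrix, so applying it twice changes nothing ($B \cdot B = B$): data that already lie on a polynomial of degree $k-1$ pass through the filter unchanged.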

Let $b_i$ be the model prediction and $y_i$ the observed value, which is independent of $x_i$ and does not depend on $A$. Define the residual $v_i=y_i-b_i$, with

$$Y=(y_1,y_2,\dots,y_n)^T,\qquad V=(v_1,v_2,\dots,v_n)^T$$

Since the vector of predictions $(b_1,\dots,b_n)^T$ equals $XA$,

$$V=Y-XA$$

$$Q=\sum_{i=1}^{n}v_i^2=V^TV=(Y-XA)^T(Y-XA)=\min$$

Setting the derivative with respect to $A$ to zero:

$$\begin{aligned} \frac{\partial}{\partial A}[(Y-XA)^T(Y-XA)] &=2\frac{\partial(Y-XA)^T}{\partial A}(Y-XA)\\ &= 2\frac{\partial(Y^T-A^TX^T)}{\partial A}(Y-XA)\\ &= 2\left[\frac{\partial Y^T}{\partial A}-\frac{\partial (A^TX^T)}{\partial A}\right](Y-XA)\\ &=-2X^T(Y-XA)\\ &=-2X^TY+2X^TXA=0 \end{aligned}$$

This gives the normal equations $X^TXA=X^TY$. $X$ is generally not square, so it cannot be inverted on its own, but $X^TX$ is invertible when $X$ has full column rank, hence

$$A=(X^TX)^{-1}X^TY$$
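The result of this derivation can be checked numerically. Here is a minimal sketch, assuming a made-up noisy quadratic as the observations: it solves the normal equations $X^TXA=X^TY$ directly and compares the answer with `np.linalg.lstsq`.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.arange(-3, 4)                          # n = 7 sample positions
X = np.vander(x, 3, increasing=True)          # k = 3 columns: 1, x, x^2
# Made-up observations: a quadratic plus small Gaussian noise.
Y = 1.0 + 0.5 * x - 0.2 * x**2 + rng.normal(scale=0.05, size=x.size)

A_normal = np.linalg.solve(X.T @ X, X.T @ Y)      # solve X^T X A = X^T Y
A_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)   # library least-squares solver

V = Y - X @ A_normal                          # residual vector V = Y - XA
print(A_normal, A_lstsq)
print(X.T @ V)                                # close to zero at the minimum
```

The last line checks the condition $-2X^T(Y-XA)=0$ from the derivation: at the minimum, the residual is orthogonal to every column of $X$.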


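According to the official documentation for savgol_filter, usage is straightforward: the main arguments are (data, window_length, polynomial order). The full parameter list is: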

x : array_like

The data to be filtered. If x is not a single or double precision floating point array, it will be converted to type numpy.float64 before filtering.

window_length : int

The length of the filter window (i.e. the number of coefficients). window_length must be a positive odd integer.

polyorder : int

The order of the polynomial used to fit the samples. polyorder must be less than window_length.

deriv : int, optional

The order of the derivative to compute. This must be a nonnegative integer. The default is 0, which means to filter the data without differentiating.

delta : float, optional

The spacing of the samples to which the filter will be applied. This is only used if deriv > 0. Default is 1.0.

axis : int, optional

The axis of the array x along which the filter is to be applied. Default is -1.

mode : str, optional

Must be ‘mirror’, ‘constant’, ‘nearest’, ‘wrap’ or ‘interp’. This determines the type of extension to use for the padded signal to which the filter is applied. When mode is ‘constant’, the padding value is given by cval. See the Notes for more details on ‘mirror’, ‘constant’, ‘wrap’, and ‘nearest’. When the ‘interp’ mode is selected (the default), no extension is used. Instead, a degree polyorder polynomial is fit to the last window_length values of the edges, and this polynomial is used to evaluate the last window_length // 2 output values.

cval : scalar, optional

Value to fill past the edges of the input if mode is ‘constant’. Default is 0.0.

y : ndarray, same shape as x

The filtered data.
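The optional parameters are easier to see in use. The sketch below is my own example, not one from the SciPy documentation: it uses a noise-free sine so the output can be checked, `deriv=1` with `delta` to estimate the derivative, and `axis=0` to filter a 2-D array along time, which is the layout I assume for motion data (one row per frame).

```python
import numpy as np
from scipy.signal import savgol_filter

t = np.linspace(0, 2 * np.pi, 200)
dt = t[1] - t[0]
signal = np.sin(t)                        # noise-free so the result is easy to check

# deriv=0 (default): smoothing only.
smoothed = savgol_filter(signal, window_length=11, polyorder=3)

# deriv=1: first derivative; delta must be the sample spacing.
velocity = savgol_filter(signal, window_length=11, polyorder=3, deriv=1, delta=dt)
max_err = np.max(np.abs(velocity - np.cos(t)))
print(max_err)                            # small, since d/dt sin(t) = cos(t)

# axis=0: filter each column of a (frames, channels) array along time.
frames = np.stack([signal, np.cos(t)], axis=1)    # shape (200, 2)
filtered = savgol_filter(frames, 11, 3, axis=0)
print(filtered.shape)
```

Without `delta=dt`, the derivative would be expressed per sample rather than per unit of `t`.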



import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import savgol_filter

# Savitzky-Golay (SG) filter
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x) + np.random.random(100) * 0.2  # sine plus uniform noise in [0, 0.2)
yhat = savgol_filter(y, 51, 3)  # window size 51, polynomial order 3

plt.plot(x, yhat, color='red')
plt.show()
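# Moving average (plain box convolution) compared with SG
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x) + np.random.random(100) * 0.8  # sine plus uniform noise in [0, 0.8)

def smooth(y, box_pts):
    """Moving average: convolve y with a box of box_pts equal weights."""
    box = np.ones(box_pts) / box_pts
    y_smooth = np.convolve(y, box, mode='same')
    return y_smooth

plt.plot(x, y, 'o')                                   # noisy samples
plt.plot(x, smooth(y, 3), 'r-', lw=2)                 # 3-point moving average
plt.plot(x, smooth(y, 19), 'g-', lw=2)                # 19-point moving average
plt.plot(x, savgol_filter(y, 51, 3), 'b-', lw=2)      # SG: window size 51, polynomial order 3
plt.show()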
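One difference between the two methods that the plot can hide is how they behave at the ends of the signal. This is a small sketch on a constant input (my own test case, chosen because the correct output is obviously 1 everywhere): `np.convolve(..., mode='same')` implicitly pads with zeros, so the moving average sags near the edges, while savgol_filter with its default mode='interp' fits a polynomial to the edge samples.

```python
import numpy as np
from scipy.signal import savgol_filter

def smooth(y, box_pts):
    """Moving average via box convolution (same function as in the post)."""
    box = np.ones(box_pts) / box_pts
    return np.convolve(y, box, mode='same')

flat = np.ones(100)                       # constant signal: ideal output is all ones
box_out = smooth(flat, 19)
sg_out = savgol_filter(flat, 19, 3)       # default mode='interp'

print(box_out[0], box_out[50])            # edge value is pulled toward zero
print(sg_out[0], sg_out[50])              # stays at 1 (up to rounding)
```

With a 19-point box, only 10 real samples overlap the window at the first output position, so that output is 10/19 instead of 1. If the moving average is kept, trimming or explicitly padding the edges avoids this bias.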






  1. Arn R T, Narayana P, Emerson T, et al. Motion segmentation via generalized curvatures[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 41(12): 2919-2932.

  2. Least-squares solution in matrix form, minimizing the residual sum of squares.
