Data transformation
To ensure model quality and correct results from the system analysis, the raw data must be transformed and preprocessed so that units (dimensions) are removed and the sequences become comparable.
Given a sequence
$$x = (x(1),x(2),x(3),\cdots,x(n))$$
the mapping $f: x \rightarrow y$,
$$f(x(k)) = y(k),\quad k=1,2,\cdots,n$$
is called a data transformation from sequence $x$ to sequence $y$.
Several commonly used transformations:
1) Initial-value transformation
$$f(x(k)) = \frac{x(k)}{x(1)} = y(k)$$
2) Mean-value transformation
$$f(x(k)) = \frac{x(k)}{\bar x} = y(k),\quad \bar x = \frac{1}{n}\sum_{k=1}^n x(k)$$
3) Normalization transformation
$$f(x(k)) = \frac{x(k)}{x_0} = y(k)$$
where $x_0$ is some value greater than 0.
4) Range-maximum transformation
$$f(x(k))=\frac{x(k) - \min\limits_{k}x(k)}{\max\limits_{k}x(k)} = y(k)$$
5) Interval-value transformation
$$f(x(k))=\frac{x(k) - \min\limits_{k}x(k)}{\max\limits_{k}x(k)-\min\limits_{k}x(k)} = y(k)$$
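The five transformations above are element-wise maps over a sequence. The following is a minimal Python sketch of them. The document's own code is MATLAB, and the function names here are illustrative choices rather than part of the original.

```python
def initial_value(x):
    """Initial-value transform: y(k) = x(k) / x(1)."""
    return [v / x[0] for v in x]

def mean_value(x):
    """Mean-value transform: y(k) = x(k) / mean(x)."""
    m = sum(x) / len(x)
    return [v / m for v in x]

def normalize(x, x0):
    """Normalization: y(k) = x(k) / x0 for a chosen constant x0 > 0."""
    if x0 <= 0:
        raise ValueError("x0 must be positive")
    return [v / x0 for v in x]

def range_max(x):
    """Range-maximum transform: y(k) = (x(k) - min x) / max x."""
    lo, hi = min(x), max(x)
    return [(v - lo) / hi for v in x]

def interval_value(x):
    """Interval transform: y(k) = (x(k) - min x) / (max x - min x)."""
    lo, hi = min(x), max(x)
    return [(v - lo) / (hi - lo) for v in x]

x = [2.0, 4.0, 5.0, 8.0]
print(initial_value(x))  # [1.0, 2.0, 2.5, 4.0]
print(range_max(x))      # [0.0, 0.25, 0.375, 0.75]
```

The interval transform maps the smallest value to 0 and the largest to 1. The range-maximum transform only divides by the maximum, so its largest output is below 1 whenever the minimum is positive.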
Grey relational analysis
Choose a reference sequence
$$x_0=(x_0(1),x_0(2),\cdots,x_0(n))$$
and suppose there are $m$ comparison sequences
$$x_i = (x_i(1),x_i(2),\cdots,x_i(n)),\quad i=1,2,\cdots,m$$
Then
$$\xi_i(k) = \frac{\min\limits_{s}\min\limits_{t}|x_0(t)-x_s(t)|+\rho\max\limits_{s}\max\limits_{t}|x_0(t)-x_s(t)|}{|x_0(k)-x_i(k)|+\rho\max\limits_{s}\max\limits_{t}|x_0(t)-x_s(t)|}$$
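is called the relational coefficient of comparison sequence $x_i$ with respect to reference sequence $x_0$ at time $k$. Here $\rho\in(0,1]$ is the distinguishing coefficient, and 0.5 is the usual choice. A smaller $\rho$ spreads the relational coefficients further apart and so distinguishes the factors more sharply. As $\rho$ grows, every $\xi_i(k)$ is pushed toward 1.
The grey relational grade of $x_i$ is then the mean of its coefficients:
$$r_i = \frac{1}{n}\sum_{k=1}^n \xi_i(k)$$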
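The following small Python sketch implements the relational coefficient and grade formulas. It assumes the sequences have already been made dimensionless (for example with the initial-value transform), and `grey_relational_grades` is a hypothetical helper name. The two-level minimum and maximum are taken over all comparison sequences, as in the formula.

```python
def grey_relational_grades(ref, comps, rho=0.5):
    """Grey relational grade r_i of each comparison sequence against ref.

    ref   : preprocessed reference sequence x_0
    comps : list of preprocessed comparison sequences x_1..x_m (same length)
    rho   : distinguishing coefficient (0.5 by common convention)
    """
    # Absolute differences Delta_i(k) = |x_0(k) - x_i(k)|
    deltas = [[abs(a - b) for a, b in zip(ref, xi)] for xi in comps]
    d_min = min(min(row) for row in deltas)  # two-level minimum
    d_max = max(max(row) for row in deltas)  # two-level maximum
    if d_max == 0:
        raise ValueError("all sequences coincide; coefficients undefined")
    grades = []
    for row in deltas:
        # Relational coefficients xi_i(k), averaged into the grade r_i
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        grades.append(sum(coeffs) / len(coeffs))
    return grades

print(grey_relational_grades([1, 2, 4], [[1, 3, 5], [1, 4, 7]]))
# approximately [0.733, 0.587]
```

The first comparison sequence stays closer to the reference, so it receives the larger grade.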
MATLAB code:
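clear
clc
x = [];   % data: one factor per row; row 1 is the reference sequence
[num, ~] = size(x);
% Preprocessing: initial-value transformation
for i = 1:num
    % For a negatively correlated factor (row 3 here, as an example),
    % the reciprocal initial-value transform could be used instead:
    % if i == 3
    %     x(i,:) = x(i,1)./x(i,:);
    % else
    x(i,:) = x(i,:)./x(i,1);
    % end
end
x_new = x;
H = size(x_new,1);   % number of rows of x_new
L = size(x_new,2);   % number of columns of x_new
% Absolute differences |x_0(k) - x_i(k)|
cha = zeros(H-1, L);
for i = 1:H-1
    for k = 1:L
        cha(i,k) = abs(x_new(1,k) - x_new(i+1,k));
    end
end
min_cha = min(min(cha));   % two-level minimum
max_cha = max(max(cha));   % two-level maximum
% Distinguishing coefficient rho = 0.5
p = 0.5;
% Relational coefficients xi_i(k)
r = zeros(H-1, L);
for i = 1:H-1
    for k = 1:L
        r(i,k) = (min_cha + p*max_cha)./(cha(i,k) + p*max_cha);
    end
end
r1 = sum(r,2)/L;   % relational grades: mean of each row of r
% [rs,rind] = sort(r1,'descend');   % rank the factors by grade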
Accumulated generating operation (AGO)
Given the original sequence
$$x^{(0)} = (x^{(0)}(1),x^{(0)}(2),\cdots,x^{(0)}(n))$$
apply
$$x^{(1)}(k) = \sum_{i=\alpha}^{k} x^{(0)}(i),\quad k=\alpha,\cdots,n$$
where $\alpha \leqslant n$ is a positive integer. When $\alpha > 1$ the leading terms are dropped, and the process is called truncated accumulated generation. When $\alpha = 1$ it is ordinary accumulated generation. Only the case $\alpha = 1$ is discussed here.
This gives the once-accumulated sequence, 1-AGO:
$$x^{(1)}=(x^{(1)}(1),x^{(1)}(2),\cdots,x^{(1)}(n))$$
Accumulating the 1-AGO sequence again gives the twice-accumulated sequence, 2-AGO, and higher orders follow in the same way.
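With $\alpha = 1$, the $r$-AGO is simply $r$ successive running sums. The following is a minimal Python sketch with a hypothetical helper `ago`, built on the standard library's `itertools.accumulate`.

```python
from itertools import accumulate

def ago(x, r=1):
    """r-AGO with alpha = 1: apply the running sum r times."""
    y = list(x)
    for _ in range(r):
        # y(k) = x(1) + x(2) + ... + x(k)
        y = list(accumulate(y))
    return y

print(ago([1, 2, 3, 4]))       # [1, 3, 6, 10]
print(ago([1, 2, 3, 4], r=2))  # [1, 4, 10, 20]
```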
Inverse accumulated generating operation (IAGO)
Denote the $r$-th accumulation by $r$-AGO. Then
$$x^{(r-1)}(k) = x^{(r)}(k)-x^{(r)}(k-1),\quad k=2,3,\cdots,n$$
together with $x^{(r-1)}(1) = x^{(r)}(1)$, is called the inverse accumulated generating sequence of the $r$-AGO sequence. It restores the $(r-1)$-AGO sequence.
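One IAGO step undoes one AGO step. The hypothetical helper `iago` below is a minimal Python sketch of the difference formula, keeping the first element unchanged.

```python
def iago(x):
    """One inverse accumulation: y(1) = x(1), y(k) = x(k) - x(k-1) for k >= 2."""
    return [x[0]] + [x[k] - x[k - 1] for k in range(1, len(x))]

print(iago([1, 3, 6, 10]))          # [1, 2, 3, 4]
print(iago(iago([1, 4, 10, 20])))   # [1, 2, 3, 4]
```

Applying it twice to a 2-AGO sequence recovers the original data, which is how the forecasts below are mapped back to the original scale.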
Mean generation
Given the original sequence
$$x^{(0)} = (x^{(0)}(1),x^{(0)}(2),\cdots,x^{(0)}(n))$$
and a constant $\alpha\in[0,1]$,
$$z^{(0)}(k) = \alpha x^{(0)}(k)+(1-\alpha) x^{(0)}(k-1),\quad k=2,3,\cdots,n$$
is called an adjacent-value generated number.
When $\alpha=0.5$,
$$z^{(0)}(k) = 0.5 x^{(0)}(k)+0.5 x^{(0)}(k-1)$$
is called the (immediate) adjacent mean generated number, i.e. the equal-weight adjacent-value generated number.
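The adjacent-value generation is a weighted average of each pair of neighbours. It yields $n-1$ values, for $k = 2,\cdots,n$. The following is a minimal Python sketch; the function name is hypothetical.

```python
def adjacent_generation(x, alpha=0.5):
    """z(k) = alpha*x(k) + (1 - alpha)*x(k-1) for k = 2..n (n-1 values)."""
    if not 0 <= alpha <= 1:
        raise ValueError("alpha must lie in [0, 1]")
    return [alpha * x[k] + (1 - alpha) * x[k - 1] for k in range(1, len(x))]

print(adjacent_generation([1, 3, 6, 10]))  # [2.0, 4.5, 8.0]
```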
Data check
First, to make sure the modeling method is applicable, the known data sequence must be checked. Let the reference data be $x^{(0)} = (x^{(0)}(1),x^{(0)}(2),\cdots,x^{(0)}(n))$ and compute its level ratios
$$\lambda(k)=\frac{x^{(0)}(k-1)}{x^{(0)}(k)},\quad k=2,3,\cdots,n$$
If every level ratio $\lambda(k)$ falls inside the admissible coverage $\left(e^{-\frac{2}{n+1}},e^{\frac{2}{n+2}}\right)$, the sequence $x^{(0)}$ can be used as model data for grey forecasting. Otherwise, $x^{(0)}$ must be transformed until its ratios fall inside the coverage. To do this, choose a suitable constant $c$ and apply the translation
$$y^{(0)}(k)=x^{(0)}(k)+c,\quad k=1,2,\cdots,n$$
so that all level ratios of $y^{(0)} = (y^{(0)}(1),y^{(0)}(2),\cdots,y^{(0)}(n))$ lie in $\left(e^{-\frac{2}{n+1}},e^{\frac{2}{n+2}}\right)$.
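The level-ratio check and the translation can be sketched in Python as follows. The bounds are taken exactly as written in the text. The search over $c = 0, 1, 2, \dots$ is an illustrative assumption rather than a prescribed procedure, and all function names are hypothetical.

```python
import math

def admissible_bounds(n):
    """Admissible coverage for the level ratios, as stated in the text."""
    return math.exp(-2 / (n + 1)), math.exp(2 / (n + 2))

def level_ratios(x):
    """lambda(k) = x(k-1) / x(k), k = 2..n."""
    return [x[k - 1] / x[k] for k in range(1, len(x))]

def passes_check(x):
    lo, hi = admissible_bounds(len(x))
    return all(lo < lam < hi for lam in level_ratios(x))

def shift_until_admissible(x, step=1.0, max_iter=100000):
    """Try c = 0, step, 2*step, ... and return (x + c, c) for the first c
    whose shifted sequence passes the check (assumes positive data)."""
    c = 0.0
    for _ in range(max_iter):
        y = [v + c for v in x]
        if passes_check(y):
            return y, c
        c += step
    raise RuntimeError("no admissible shift found")

y, c = shift_until_admissible([1, 2, 10, 20])
print(c)  # 15.0
```

For positive data, adding a larger $c$ pulls every ratio toward 1, so the search terminates. Any forecast made on $y^{(0)}$ must have $c$ subtracted to return to the original scale.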
GM(1,1) model
Let $x^{(0)} = (x^{(0)}(1),x^{(0)}(2),\cdots,x^{(0)}(n))$ be a sequence of $n$ elements and $x^{(1)}=(x^{(1)}(1),x^{(1)}(2),\cdots,x^{(1)}(n))$ its 1-AGO sequence. The grey derivative of $x^{(0)}$ is defined as
$$d(k) = x^{(0)}(k) = x^{(1)}(k)-x^{(1)}(k-1)$$
Let $z^{(1)}$ be the immediate adjacent mean sequence of $x^{(1)}$, i.e.
$$z^{(1)}(k) = 0.5 x^{(1)}(k)+0.5 x^{(1)}(k-1),\quad k=2,3,\cdots,n$$
so that $z^{(1)} = (z^{(1)}(2),z^{(1)}(3),\cdots,z^{(1)}(n))$. The grey differential equation of GM(1,1) is then defined as
$$d(k)+az^{(1)}(k)=b$$
that is,
$$x^{(0)}(k)+az^{(1)}(k)=b$$
In this equation:
- $x^{(0)}(k)$ is the grey derivative.
- $a$ is the development coefficient.
- $z^{(1)}(k)$ is the whitened background value.
- $b$ is the grey action quantity.
Substituting $k=2,3,\cdots,n$ gives
$$\begin{cases} x^{(0)}(2)+az^{(1)}(2)=b\\ x^{(0)}(3)+az^{(1)}(3)=b \\ \qquad\cdots\cdots\\ x^{(0)}(n)+az^{(1)}(n)=b \end{cases}$$
Let
$$Y=(x^{(0)}(2),x^{(0)}(3),\cdots,x^{(0)}(n))^T,\quad u=(a,b)^T,\quad B=\begin{bmatrix} -z^{(1)}(2) &1\\ -z^{(1)}(3) &1 \\ \vdots&\vdots\\ -z^{(1)}(n) &1 \end{bmatrix}$$
where $Y$ is the data vector, $B$ the data matrix, and $u$ the parameter vector. The GM(1,1) model can then be written as the matrix equation $Y = Bu$.
By least squares,
$$u=(a,b)^T=(B^TB)^{-1}B^TY$$
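Because $B$ has only two columns, the least-squares formula reduces to a 2×2 system that can be solved in closed form. The following is a Python sketch under that observation; `gm11_params` is a hypothetical name.

```python
from itertools import accumulate

def gm11_params(x0):
    """Least-squares (a, b) for GM(1,1): x0(k) + a*z1(k) = b, k = 2..n."""
    x1 = list(accumulate(x0))                                     # 1-AGO
    z1 = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, len(x1))]  # background values
    y = x0[1:]                                                    # data vector Y
    n = len(z1)
    # Rows of B are [-z1(k), 1], so
    # B^T B = [[s_zz, -s_z], [-s_z, n]] and B^T Y = [-s_zy, s_y]
    s_zz = sum(z * z for z in z1)
    s_z = sum(z1)
    s_zy = sum(z * v for z, v in zip(z1, y))
    s_y = sum(y)
    det = s_zz * n - s_z * s_z
    # u = (B^T B)^{-1} B^T Y via the explicit 2x2 inverse
    a = (n * (-s_zy) + s_z * s_y) / det
    b = (s_z * (-s_zy) + s_zz * s_y) / det
    return a, b
```

In MATLAB the same estimate is usually written `(B'*B)\(B'*Y)`, or simply `B\Y`, which avoids forming an explicit inverse. If the data satisfy the grey differential equation exactly, this sketch recovers $a$ and $b$ up to rounding.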
The whitening differential equation of GM(1,1) is
$$\frac{dx^{(1)}}{dt}+ax^{(1)}(t)=b$$
Substituting the estimated grey parameters and solving gives the time response
$$\hat{x}^{(1)}(k+1) = \left(x^{(0)}(1)-\frac{b}{a}\right)e^{-ak}+\frac{b}{a},\quad k=0,1,\cdots,n-1$$
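The time response follows from solving the whitening equation as a linear first-order ODE. This short derivation sketch assumes the initial condition $x^{(1)}(1)=x^{(0)}(1)$ at $t=1$:

$$
\begin{aligned}
\frac{dx^{(1)}}{dt}+ax^{(1)}=b
&\;\Rightarrow\; x^{(1)}(t)=Ce^{-at}+\frac{b}{a},\\
x^{(1)}(1)=x^{(0)}(1)
&\;\Rightarrow\; C=\left(x^{(0)}(1)-\frac{b}{a}\right)e^{a},\\
\text{so}\quad x^{(1)}(t)
&=\left(x^{(0)}(1)-\frac{b}{a}\right)e^{-a(t-1)}+\frac{b}{a}.
\end{aligned}
$$

Setting $t=k+1$ gives the expression for $\hat{x}^{(1)}(k+1)$, and $k=0$ reproduces $\hat{x}^{(1)}(1)=x^{(0)}(1)$.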
Applying IAGO gives the predicted values, with $\hat{x}^{(0)}(1)=x^{(0)}(1)$:
$$\hat{x}^{(0)}(k+1)=\hat{x}^{(1)}(k+1)-\hat{x}^{(1)}(k),\quad k=1,2,\cdots,n-1$$
Residual: $\varepsilon(k)=x^{(0)}(k)-\hat{x}^{(0)}(k)$
Relative error: $\dfrac{x^{(0)}(k)-\hat{x}^{(0)}(k)}{x^{(0)}(k)}$
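Given estimated $a$ and $b$, the following Python sketch produces fitted values, forecasts, residuals and relative errors. It applies the time response and then one IAGO step. The function names and the `horizon` parameter are hypothetical; $a$ and $b$ would come from the least-squares step.

```python
import math

def gm11_forecast(x0, a, b, horizon=0):
    """Fitted values for k = 1..n plus `horizon` forecast steps."""
    n = len(x0)
    # Time response: x1_hat(k+1) = (x0(1) - b/a) * exp(-a*k) + b/a, k = 0, 1, ...
    x1_hat = [(x0[0] - b / a) * math.exp(-a * k) + b / a for k in range(n + horizon)]
    # IAGO: x0_hat(1) = x1_hat(1), x0_hat(k+1) = x1_hat(k+1) - x1_hat(k)
    return [x1_hat[0]] + [x1_hat[k] - x1_hat[k - 1] for k in range(1, len(x1_hat))]

def residuals(x0, x0_hat):
    """Residuals eps(k) and relative errors eps(k)/x0(k) over the observed range."""
    eps = [o - p for o, p in zip(x0, x0_hat)]
    rel = [e / o for e, o in zip(eps, x0)]
    return eps, rel
```

For $k \ge 1$ consecutive predictions differ by the constant factor $e^{-a}$. This is why GM(1,1) suits sequences with roughly exponential behaviour.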
MATLAB code:
Level-ratio preprocessing:
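function [x0] = huise_change(x0)
% Level-ratio preprocessing of the first row of x0.
% Upper bound of the admissible coverage as given in the text: exp(2/(m+2)).
% Instead of the uniform shift y = x + c, this helper raises individual
% points in steps of 0.0001. Only ratios above the upper bound are
% corrected; ratios below exp(-2/(m+1)) are left unchanged.
[~,m] = size(x0);
ub = exp(2/(m+2));
for k = 2:m
    % Recompute lambda(k) on every pass: raising x0(1,k-1) in the
    % previous iteration also increases lambda(k)
    while x0(1,k-1)/x0(1,k) >= ub
        x0(1,k) = x0(1,k) + 0.0001;
    end
end
end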
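GM(1,1) model:
x0 = [];   % data, as a row vector
% Level-ratio check (save huise_change as huise_change.m on the path)
x0 = huise_change(x0);
[~,m] = size(x0);   % number of data points
% 1-AGO
AGO = cumsum(x0);
% Immediate adjacent mean sequence z(k), k = 2..m (alpha = 0.5)
Z = zeros(1,m);
for k = 2:m
    Z(k) = (AGO(k-1) + AGO(k))/2;
end
Z(1) = [];                % drop the unused first entry
B = [-Z; ones(1,m-1)]';   % data matrix, (m-1) x 2
Y = x0(2:end)';           % data vector, (m-1) x 1
% Least squares for a (development coefficient) and b (grey action quantity)
c = (B'*B)\(B'*Y);
a = c(1);
b = c(2);
% Forecast: F is the fitted 1-AGO sequence (time response)
steps = 10;   % number of points forecast beyond the data; 1 for a single step
F = zeros(1,m+steps); F(1) = x0(1);
for i = 2:m+steps
    F(i) = (x0(1)-b/a)/exp(a*(i-1)) + b/a;
end
% IAGO on F restores the fitted and forecast values
G = zeros(1,m+steps); G(1) = x0(1);
for i = 2:m+steps
    G(i) = F(i) - F(i-1);
end
% disp('Forecast values:'); disp(G(m+1:end));
t = 1:m;
plot(t,x0)
hold on;
t = 1:m+steps;
plot(t,G,'r')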
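GM(1,N) follows the same principle as GM(1,1). GM(1,1) denotes a first-order grey model with a single variable, while GM(1,N) denotes a first-order grey model with N variables. In GM(1,N), $x_1^{(0)}$ is the system (output) sequence and $x_2^{(0)},\cdots,x_N^{(0)}$ are the driving-factor sequences. Its grey differential equation and whitening equation are
$$x_1^{(0)}(k)+az_1^{(1)}(k)=\sum_{i=2}^N b_ix_i^{(1)}(k)$$
$$\frac{dx_1^{(1)}}{dt}+ax_1^{(1)}(t)=\sum_{i=2}^N b_ix_i^{(1)}(t)$$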
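Parameter estimation for GM(1,N) has the same least-squares form as GM(1,1). The difference is that the data matrix gains one column $x_i^{(1)}(k)$ per driving factor. The following is a Python sketch under that reading of the grey differential equation. `gm1n_params` and the small Gaussian-elimination helper `solve` are hypothetical names, and the sketch stops at estimating $a, b_2, \cdots, b_N$.

```python
from itertools import accumulate

def solve(A, y):
    """Solve A u = y by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [v] for row, v in zip(A, y)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    u = [0.0] * n
    for r in range(n - 1, -1, -1):
        u[r] = (M[r][n] - sum(M[r][c] * u[c] for c in range(r + 1, n))) / M[r][r]
    return u

def gm1n_params(series):
    """Estimate (a, [b_2..b_N]) for GM(1,N).

    series[0] is the system sequence x_1^(0), series[1:] the driving factors.
    """
    agos = [list(accumulate(s)) for s in series]          # 1-AGO of every row
    n = len(series[0])
    z1 = [0.5 * (agos[0][k] + agos[0][k - 1]) for k in range(1, n)]
    # Rows of B: [-z1(k), x_2^(1)(k), ..., x_N^(1)(k)], k = 2..n
    B = [[-z1[k - 1]] + [g[k] for g in agos[1:]] for k in range(1, n)]
    Y = series[0][1:]
    p = len(B[0])
    # Normal equations (B^T B) u = B^T Y
    BtB = [[sum(row[i] * row[j] for row in B) for j in range(p)] for i in range(p)]
    BtY = [sum(row[i] * v for row, v in zip(B, Y)) for i in range(p)]
    u = solve(BtB, BtY)
    return u[0], u[1:]
```

Forecasting with GM(1,N) also needs future values of the driving factors. The MATLAB routine below extrapolates them from their recent average increment.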
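MATLAB code:
function [cancha,G,u] = huise_GM_N(x0,day)
% GM(1,N)
% x0:  n-by-m data; row 1 is the system sequence, rows 2..n the driving factors
% day: forecast length control; day+1 points are produced after column m
% Level-ratio preprocessing of the first row
x0 = huise_change(x0);
[n,m] = size(x0);   % n rows, m columns
% 1-AGO of every row (dim 2 accumulates along each row)
AGO = cumsum(x0,2);
% Immediate adjacent mean sequence of the system row (alpha = 0.5)
Z = zeros(1,m);
for k = 2:m
    Z(k) = (AGO(1,k-1) + AGO(1,k))/2;
end
Z = Z(2:end)';
AGO_ = AGO(2:end,2:end)';
B = [-Z, AGO_];
Y = x0(1,2:end);
% Least-squares estimate of the parameter row [a, b_2, ..., b_N]
C = ((B'*B)\(B'*Y'))';
a = C(1);
b = C(2:end);
% u(k) = sum_i b_i * x_i^(1)(k)
u = zeros(1,m);
for i = 1:m
    for j = 1:n-1
        u(i) = u(i) + b(j)*AGO(j+1,i);
    end
end
% Average increment of u, used to extrapolate the driving term
x = zeros(1,m-1);
for i = 1:m-1
    x(i) = u(i+1) - u(i);
end
ave = mean(x);
% Fitted 1-AGO values over the data range
F = zeros(1,m+1+day);
for k = 1:m
    F(k) = (x0(1,1)-u(k)/a)/exp(a*(k-1)) + u(k)/a;
end
% Forecast: extend the driving term by its average increment plus an
% ad-hoc random perturbation (rand(1)*100) that should be adapted to the
% data or removed
b_ = u(m);
for k = m+1:m+1+day
    b_ = b_ + ave + rand(1)*100;
    F(k) = (x0(1,1)-b_/a)/exp(a*(k-1)) + b_/a;
end
% IAGO: differences of F restore the fitted and forecast values
G = zeros(1,m+1+day);
G(1) = x0(1,1);
for k = 2:m+1+day
    G(k) = F(k) - F(k-1);
end
% Relative errors over the data range
cancha = zeros(1,m-1);
for k = 2:m
    cancha(k-1) = abs(x0(1,k) - G(k))/x0(1,k);
end
t1 = 1:m;
t2 = 1:m+1+day;
plot(t1,x0(1,:),'bo--');
hold on;
plot(t2,G,'r*-');
title('Forecast results');
legend('Actual','Forecast');
end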
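This post covers only the basic principles; corrected and optimized variants of these models are not discussed.
These are study notes; corrections are welcome.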