求Hurst指数的意义:
作为判断时间序列数据遵从随机游走还是有偏的随机游走过程的指标。简单来说就是为了通过计算的这个指数判断一下未来的趋势,也有用来判断时间序列平稳性的。
Hurst指数估计的方法有很多,按照时域和频域可以分为如下几个【1】:
时域:方差-时间法、聚合序列绝对值法、R/S法(用得最多)和趋势波动分析(DFA)
频域:周期图法、Whittle估计法和小波分析法
Hurst指数在不同范围下的意义:【2】
DFA的计算原理和计算步骤(关于DFA的计算版本有很多,这里只是其中的一种)
已知:一个时间序列,长度为N,记作X,时间序列表示为{xi},i = 1,2,…,N;
①取序列,选择合适的区间长度,也就是时间窗s,将序列划分为长度为s的不重叠等长度子区间,长度为N的子序列就会被分成Ns = N/s个段,但是想一下极端的事件,时间序列的个数是个质数怎么办,不要后面除不尽的又好浪费数据,于是为保证序列信息不丢失,可以取两次数据,第一次丢掉最后几个取不到的数据,第二次就倒着看,正着算,丢掉前面几个取不到的数据。
N s = f l o o r ( N / s ) , 其 中 f l o o r ( ) 表 示 向 下 取 整 Ns = floor(N/s),其中floor()表示向下取整 Ns=floor(N/s),其中floor()表示向下取整
m 1 = N s ∗ s m1 = Ns*s m1=Ns∗s
X 1 = x 1 , x 2 , . . . x m 1 X1 = {x1,x2,...xm1} X1=x1,x2,...xm1
m 2 = N − m 1 m2 = N-m1 m2=N−m1
X 2 = x m 2 , . . . . x N X2 = {xm2,....xN} X2=xm2,....xN
②分别对新序列X1和X2计算累积离差,生成新的序列Y1和Y2
③对每个子区间进行多项式拟合,得到局部趋势韩式,并计算消除趋势的序列:
Python代码
# author: Dominik Krzeminski (dokato)
import numpy as np
import matplotlib.pyplot as plt
import scipy.signal as ss
import xlrd
# detrended fluctuation analysis
def calc_rms(x, scale):
"""
windowed Root Mean Square (RMS) with linear detrending.
Args:
-----
*x* : numpy.array
one dimensional data vector
*scale* : int
length of the window in which RMS will be calculaed
Returns:
--------
*rms* : numpy.array
RMS data in each window with length len(x)//scale
"""
# making an array with data divided in windows
shape = (x.shape[0]//scale, scale)
X = np.lib.stride_tricks.as_strided(x,shape=shape)
# vector of x-axis points to regression
scale_ax = np.arange(scale)
rms = np.zeros(X.shape[0])
for e, xcut in enumerate(X):
coeff = np.polyfit(scale_ax, xcut, 1)
xfit = np.polyval(coeff, scale_ax)
# detrending and computing RMS of each window
rms[e] = np.sqrt(np.mean((xcut-xfit)**2))
return rms
def dfa(x, scale_lim=[4,10], scale_dens=0.25, show=False):
"""
Detrended Fluctuation Analysis - measures power law scaling coefficient
of the given signal *x*.
More details about the algorithm you can find e.g. here:
Hardstone, R. et al. Detrended fluctuation analysis: A scale-free
view on neuronal oscillations, (2012).
Args:
-----
*x* : numpy.array
one dimensional data vector
*scale_lim* = [5,9] : list of length 2
boundaries of the scale, where scale means windows among which RMS
is calculated. Numbers from list are exponents of 2 to the power
of X, eg. [5,9] is in fact [2**5, 2**9].
You can think of it that if your signal is sampled with F_s = 128 Hz,
then the lowest considered scale would be 2**5/128 = 32/128 = 0.25,
so 250 ms.
*scale_dens* = 0.25 : float
density of scale divisions, eg. for 0.25 we get 2**[5, 5.25, 5.5, ... ]
*show* = False
if True it shows matplotlib log-log plot.
Returns:
--------
*scales* : numpy.array
vector of scales (x axis)
*fluct* : numpy.array
fluctuation function values (y axis)
*alpha* : float
estimation of DFA exponent
"""
# cumulative sum of data with substracted offset
y = np.cumsum(x - np.mean(x))
scales = (2**np.arange(scale_lim[0], scale_lim[1], scale_dens)).astype(np.int)
# scales = list(range(4,17))
fluct = np.zeros(len(scales))
# computing RMS for each window
for e, sc in enumerate(scales):
fluct[e] = np.sqrt(np.mean(calc_rms(y, sc)**2))
# fitting a line to rms data
coeff = np.polyfit(np.log2(scales), np.log2(fluct), 1)
if show:
fluctfit = 2**np.polyval(coeff,np.log2(scales))
plt.loglog(scales, fluct, 'bo')
plt.loglog(scales, fluctfit, 'r', label=r'$\alpha$ = %0.2f'%coeff[0])
plt.title('DFA')
plt.xlabel(r'$\log_{10}$(time window)')
plt.ylabel(r'$\log_{10}$' )
plt.legend()
plt.show()
return scales, fluct, coeff[0]
if __name__=='__main__':
# n = 1000
# x = np.random.randn(n)
# # computing DFA of signal envelope
# x = np.abs(ss.hilbert(x))
# 径流输入
YC_flow = 'SX-data/YC.xlsx'
input_filePath = YC_flow
input_data = []
wb = xlrd.open_workbook(filename=input_filePath)
print(wb.sheet_names())
for year in wb.sheet_names():
# year_data = []
sheet = wb.sheet_by_name(year)
cols = sheet.col_values(1)
for col in cols:
# year_data.append(col)
input_data.append(col)
scales, fluct, alpha = dfa(input_data, show=1)
print(scales)
print(fluct)
print("DFA exponent: {}".format(alpha))
matlab代码
function[A,F] = DFA_fun(data,pts,order)
% -----------------------------------------------------
% DESCRIPTION:
% Function for the DFA analysis.
% INPUTS:
% data: a one-dimensional data vector.
% pts: sizes of the windows/bins at which to evaluate the fluctuation
% order: (optional) order of the polynomial for the local trend correction.
% if not specified, order == 1;
% OUTPUTS:
% A: a 2x1 vector. A(1) is the scaling coefficient "alpha",
% A(2) the intercept of the log-log regression, useful for plotting (see examples).
% F: A vector of size Nx1 containing the fluctuations corresponding to the
% windows specified in entries in pts.
% -----------------------------------------------------
% Checking the inputs
if nargin == 2
order = 1;
end
sz = size(data);
if sz(1)< sz(2)
data = data';
end
exit = 0;
if min(pts) == order+1
disp(['WARNING: The smallest window size is ' num2str(min(pts)) '. DFA order is ' num2str(order) '.'])
disp('This severly affects the estimate of the scaling coefficient')
disp('(If order == [] (so 1), the corresponding fluctuation is zero.)')
elseif min(pts) < (order+1)
disp(['ERROR: The smallest window size is ' num2str(min(pts)) '. DFA order is ' num2str(order) ':'])
disp(['Aborting. The smallest window size should be of ' num2str(order+1) ' points at least.'])
exit = 1;
end
if exit == 1
return
end
% DFA
npts = numel(pts); %返回数组pts中元素个数
F = zeros(npts,1);
N = length(data);
for h = 1:npts
w = pts(h);
n = floor(N/w);
Nfloor = n*pts(h);
D = data(1:Nfloor);
y = cumsum(D-mean(D));
bin = 0:w:(Nfloor-1);
vec = 1:w;
coeff = arrayfun(@(j) polyfit(vec',y(bin(j) + vec),order),1:n,'uni',0);
y_hat = cell2mat(cellfun(@(y) polyval(y,vec),coeff,'uni',0));
F(h) = mean((y - y_hat').^2)^0.5;
end
A = polyfit(log(pts),log(F)',1);
end
[1]刘付斌, 高相铭. 基于EEMD与DFA的Hurst指数估计[J]. 测控技术, 2013(10):101-104.
[2]袁全勇,杨阳,李春,阚威,叶柯华.基于Hurst指数的风速时间序列研究[J].应用数学和力学,2018,39(07):798-810.