The basic principle of change-point detection with the Buishand U test is as follows:
Let X denote a normal random variate. The following model with a single shift (change point) can then be proposed:
$$
x_i = \begin{cases} \mu + \varepsilon_i, & i = 1, \ldots, m \\ \mu + \delta + \varepsilon_i, & i = m + 1, \ldots, n \end{cases}
$$
with $\varepsilon \sim N(0, \sigma)$. The null hypothesis $\delta = 0$ is tested against the alternative $\delta \neq 0$.
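To make the model concrete, here is a minimal sketch that simulates a series with a single mean shift. The values chosen for `n`, `m`, `mu`, `delta` and `sigma` are illustrative assumptions and do not come from any dataset used in this post.

```r
# Simulate x[i] = mu + eps[i] for i <= m and x[i] = mu + delta + eps[i] for i > m.
# All parameter values below are illustrative assumptions.
set.seed(42)
n     <- 100   # series length
m     <- 40    # last index before the shift
mu    <- 10    # baseline mean
delta <- 3     # size of the shift; delta = 0 corresponds to the null hypothesis
sigma <- 1     # standard deviation of the noise

eps   <- rnorm(n, mean = 0, sd = sigma)
x_sim <- mu + c(rep(0, m), rep(delta, n - m)) + eps

# The two segment means should differ by roughly delta
seg_means <- c(before = mean(x_sim[1:m]), after = mean(x_sim[(m + 1):n]))
print(seg_means)
```

Setting `delta <- 0` in this sketch produces a series that satisfies the null hypothesis.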
In the Buishand U test, the rescaled adjusted partial sums are calculated as
$$
S_k = \sum_{i=1}^{k} (x_i - \bar{x}), \quad 1 \le k \le n
$$
where $\bar{x}$ is the sample mean. The sample standard deviation is
$$
D_x = \sqrt{n^{-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}
$$
(the code excerpt further down uses R's sd(), which divides by n − 1 rather than n). The test statistic is calculated as:
$$
U = \frac{1}{n (n + 1)} \sum_{k=1}^{n-1} \left( \frac{S_k}{D_x} \right)^2
$$
The change-point location K is computed as:
$$
K = \arg\max_{k} |S_k|
$$
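The quantities above can be checked by hand on a tiny series. The sketch below uses a made-up vector of eight values (an assumption chosen so that the arithmetic is easy to follow) and computes the partial sums, the standard deviation with divisor n, U, and K in base R.

```r
# Hand-checkable example on a made-up series with an obvious jump after index 4
x <- c(1, 2, 1, 2, 5, 6, 5, 6)
n <- length(x)

Sk <- cumsum(x - mean(x))                 # adjusted partial sums S_k
Dx <- sqrt(mean((x - mean(x))^2))         # standard deviation with divisor n
U  <- sum((Sk[1:(n - 1)] / Dx)^2) / (n * (n + 1))
K  <- which.max(abs(Sk))                  # index where |S_k| is largest

print(Sk)  # -2.5 -4.0 -6.5 -8.0 -6.5 -4.0 -2.5  0.0
print(K)   # 4
print(U)
```

The partial sums drift away from zero while the series stays below its overall mean and return to zero at k = n, so the largest |S_k| marks the last index before the jump.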
The key code is as follows:
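xmean <- mean(x)                      # sample mean
n <- length(x)
k <- 1:n
# adjusted partial sums S[k] = sum over i <= k of (x[i] - mean(x))
Sk <- sapply(k, function(i) sum(x[1:i] - xmean))
sigma <- sd(x)                        # sd() divides by n - 1
# test statistic U
U <- 1 / (n * (n + 1)) * sum((Sk[1:(n - 1)] / sigma)^2)
Ska <- abs(Sk)
S <- max(Ska)
K <- k[Ska == S]                      # index of the probable change point
## standardised value
Skk <- (Sk / sigma)
if (is.ts(x)) {
    # keep the time-series attributes of the input
    # (note: this wraps the raw Sk, not the standardised Sk / sigma)
    fr <- frequency(x)
    st <- start(x)
    ed <- end(x)
    Skk <- ts(Sk, start = st, end = ed, frequency = fr)
}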
...
attr(Skk, 'nm') <- "Sk**"
In R, the bu.test() function is provided by the trend package and is used as follows:
bu.test(x, m = 20000)
Arguments:
x
a vector of class “numeric” or a time series object of class “ts”
m
numeric, number of Monte-Carlo replicates, defaults to 20000
Return value:
data.name
character string that denotes the input data
p.value
the p-value
statistic
the test statistic
null.value
the null hypothesis
estimates
the time of the probable change point
alternative
the alternative hypothesis
method
character string that denotes the test
data
numeric vector of Sk for plotting
The p.value is estimated with a Monte Carlo simulation using m replicates.
Critical values based on m = 19999 Monte Carlo simulations are tabulated for U by Buishand (1982, 1984).
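The idea of a Monte Carlo p-value can be sketched in base R. This is only an illustration under stated assumptions: the helper names `bu_stat` and `mc_pvalue` are hypothetical, the null distribution is simulated from i.i.d. standard normal samples, and the exact procedure inside bu.test() may differ in its details. Because U is unchanged by shifting or rescaling the data, standard normal samples are enough to represent the null.

```r
# Hypothetical helper: Buishand U statistic, using sd() as in the excerpt above
bu_stat <- function(x) {
  n  <- length(x)
  Sk <- cumsum(x - mean(x))
  sum((Sk[1:(n - 1)] / sd(x))^2) / (n * (n + 1))
}

# Hypothetical helper: Monte Carlo p-value.
# Simulate the null (i.i.d. normal, no shift) m times and report the
# fraction of simulated statistics that reach the observed one.
mc_pvalue <- function(x, m = 2000) {
  u_obs <- bu_stat(x)
  u_sim <- replicate(m, bu_stat(rnorm(length(x))))
  mean(u_sim >= u_obs)
}

set.seed(1)
x_null  <- rnorm(100)                          # no shift
x_shift <- x_null + c(rep(0, 50), rep(2, 50))  # shift of 2 after index 50
p_null  <- mc_pvalue(x_null)
p_shift <- mc_pvalue(x_shift)
print(c(null = p_null, shift = p_shift))
```

A smaller `m` is used here only to keep the sketch fast; more replicates give a finer resolution for small p-values.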
Let's try the Buishand U test on the Nile dataset (the complete code is available on my GitHub).
library(trend)          # provides bu.test()
data(Nile)
out <- bu.test(Nile)
print(out)
Buishand U test
data: Nile
U = 2.4764, n = 100, p-value < 2.2e-16
alternative hypothesis: true delta is not equal to 0
sample estimates:
probable change point at time K
28
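par(mfrow = c(2, 1))             # two stacked panels
start_year <- 1871               # first year of the Nile series
cp <- unname(out$estimate)       # index of the probable change point
x <- start_year + cp - 1         # convert the index to a calendar year
plot(Nile)                       # original series
abline(v = x, col = 'red')
plot(out)                        # Sk series returned by bu.test()
abline(v = x, col = 'red')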
As can be seen, this method in effect detects an abrupt shift (jump point) in the mean.
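A quick way to see the mean shift is to compare the average flow on either side of the detected change point. The sketch below uses only base R and the built-in Nile dataset; it recomputes K from the partial sums instead of calling bu.test(), so the variable names are illustrative.

```r
data(Nile)

# Recompute the change-point index from the adjusted partial sums
Sk <- cumsum(Nile - mean(Nile))
K  <- which.max(abs(Sk))

# Mean annual flow before and after the change point
mean_before <- mean(Nile[1:K])
mean_after  <- mean(Nile[(K + 1):length(Nile)])

print(K)
print(time(Nile)[K])   # calendar year of the change point
print(c(before = mean_before, after = mean_after))
```

If the two means differ clearly, the change point reflects a step in the level of the series and not a gradual trend.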
Buishand U Test for Change-Point Detection