One challenge that data scientists come across is forecasting an intermittent time series.
That is to say, a time series in which many of the recorded values are 0.
An example of this is daily rainfall. On days with no rainfall, a value of 0 is recorded. This makes for quite a volatile time series with no clearly defined trend, one that is much more difficult for a conventional time series model such as ARIMA to forecast.
The data below is sourced from Met Éireann, the Irish national meteorological service:
As we can see, forecasting trend and seasonal patterns would prove quite tricky given that there are many 0 values present in the data at irregular intervals.
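To make "many 0 values at irregular intervals" concrete, here is a minimal sketch that measures the share of zeros and the gaps between wet days. The `rain` vector is hypothetical and only illustrative; it is not the data used in this article.

```r
# Hypothetical daily rainfall values in mm (illustrative only)
rain <- c(0, 0, 3.2, 0, 0, 0, 1.1, 0.4, 0, 0, 0, 0, 7.5, 0, 2.0)

# Share of days with no rainfall
zero_share <- mean(rain == 0)

# Positions of the wet days and the gaps (in days) between consecutive wet days
wet_days <- which(rain > 0)
gaps <- diff(wet_days)

zero_share   # proportion of zero observations
gaps         # irregular spacing between non-zero observations
mean(gaps)   # average interval between wet days
```

A high share of zeros combined with uneven gaps is what makes a series intermittent rather than merely noisy.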
The conventional solution in this case might be to aggregate the time series, e.g. sum the rainfall in mm over each 30-day period in order to forecast a monthly time series. However, this would result in significant data loss, and with less than three years of data, any forecast may well prove to be quite superficial.
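The cost of aggregation can be sketched in a few lines of base R. The `daily` series below is simulated purely for illustration (it is an assumption, not the Met Éireann data); the point is only how few observations remain after summing into 30-day blocks.

```r
# Hypothetical daily series: 1,000 days, simulated only to show the effect on sample size
set.seed(1)
daily <- ifelse(runif(1000) < 0.6, 0, rexp(1000, rate = 0.2))

# Label each day with its 30-day block; the final block is shorter than 30 days
block <- ceiling(seq_along(daily) / 30)

# Total rainfall per block
monthly <- tapply(daily, block, sum)

length(daily)    # number of daily observations
length(monthly)  # number of block totals left to model
```

The total rainfall is preserved, but a model fitted to the aggregated series has only a few dozen points to learn from.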
When working with a time series such as this, the tsintermittent package in R can come in quite handy.
In particular, Croston’s method is fitted to a training set made up of the first 1,000 days of data shown above. The model is then used to forecast the next 50 time steps, and the total forecast rainfall is compared with the actual total.
Note that Croston’s method was originally designed for intermittent demand forecasting, i.e. forecasting demand over a certain period in order to support effective inventory management.
With that being said, Croston’s method can be quite effective when it comes to forecasting many time series that have intermittent values. For instance, another potential application of Croston’s method is the forecasting of water utility demand through the forecasting of intermittent rainfall patterns.
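For intuition, the core idea of Croston’s method can be sketched in base R: the non-zero values and the intervals between them are smoothed separately, and the forecast is their ratio. This is a simplified illustration, not the tsintermittent implementation; the function name `croston_sketch`, the fixed `alpha`, and the `demand` vector are all hypothetical, and the package optimises its parameters rather than fixing them.

```r
# Minimal Croston sketch. alpha is a hypothetical fixed smoothing parameter;
# the tsintermittent package optimises its parameters instead.
croston_sketch <- function(x, h = 10, alpha = 0.1) {
  nz <- which(x != 0)                 # time indices of non-zero observations
  if (length(nz) < 2) stop("need at least two non-zero observations")
  sizes <- x[nz]                      # non-zero sizes
  gaps  <- c(nz[1], diff(nz))         # intervals between non-zero observations

  z <- sizes[1]                       # initial size estimate
  p <- gaps[1]                        # initial interval estimate
  for (i in seq_along(sizes)[-1]) {
    # Simple exponential smoothing, updated only when a non-zero value occurs
    z <- alpha * sizes[i] + (1 - alpha) * z
    p <- alpha * gaps[i] + (1 - alpha) * p
  }
  rep(z / p, h)                       # flat forecast: expected amount per period
}

# Hypothetical intermittent series
demand <- c(0, 0, 4, 0, 0, 0, 2, 0, 6, 0, 0, 3)
croston_sketch(demand, h = 5)
```

Because the forecast is the smoothed size divided by the smoothed interval, it is a single per-period rate repeated over the horizon rather than a prediction of which individual days will be wet.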
The tsintermittent library is installed and the relevant data is loaded:
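install.packages("tsintermittent")  # one-time install
library(tsintermittent)

# Daily rainfall records; the rain column holds the rainfall values
mydata <- read.csv("dly532.csv")
# Training set: the first 1,000 daily observations
train <- mydata$rain[1:1000]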
The Croston method is then used to forecast 50 time steps forward:
crostonanalysis<-crost(train,h=50)
crostonanalysis
The output contains the forecast daily rainfall ($frc.out), which is the same value at every step, along with the smoothing weights ($weights) and the initial values ($initial) used by the model:
$frc.out
[1] 2.132122 2.132122 2.132122 2.132122 2.132122 2.132122 2.132122 2.132122
[9] 2.132122 2.132122 2.132122 2.132122 2.132122 2.132122 2.132122 2.132122
[17] 2.132122 2.132122 2.132122 2.132122 2.132122 2.132122 2.132122 2.132122
[25] 2.132122 2.132122 2.132122 2.132122 2.132122 2.132122 2.132122 2.132122
[33] 2.132122 2.132122 2.132122 2.132122 2.132122 2.132122 2.132122 2.132122
[41] 2.132122 2.132122 2.132122 2.132122 2.132122 2.132122 2.132122 2.132122
[49] 2.132122 2.132122

$weights
[1] 0.011699202 0.005304516

$initial
[1] 0.3995137 1.0000000
The Croston model defined above is quite simplistic: we have not specified any additional settings such as the cost function, smoothing parameters, or the number of model parameters.
That said, let’s see how Croston’s method did in forecasting the next 50 days of rainfall.
Here is the test set (the actual rainfall over the subsequent 50 days):
test<-mydata$rain[1001:1050]
test
Here is the sum of the rainfall values in the test set compared with the sum forecast by Croston’s method:
> sum(test)
[1] 102.2
> sum(crostonanalysis$frc.out)
[1] 106.6061
We can see that the forecast total of 106.6 mm is quite close to the actual total of 102.2 mm (an overestimate of about 4%), indicating that Croston’s method did quite a good job of forecasting total rainfall over 50 days.
Suppose we extend the forecast horizon to 200 days, with the test set extended to match. How would the model perform then?
> sum(test)
[1] 377.4
> sum(crostonanalysis$frc.out)
[1] 426.4243
We can see that while Croston’s method overestimates total rainfall to a greater extent over the longer period (by about 13%), the estimate is still reasonably close to the actual value.
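Because the forecast is flat, the forecast total is simply the per-day rate multiplied by the horizon. The quick check below uses the per-day value printed in $frc.out above; it assumes the 200-day run used the same training set and default settings, which is consistent with the totals reported.

```r
# Per-day forecast reported in $frc.out by the default model
rate <- 2.132122

# A flat forecast summed over h days is just rate * h
total_50  <- rate * 50
total_200 <- rate * 200

round(total_50, 4)   # matches the reported 50-day total of 106.6061
round(total_200, 4)  # agrees with the reported 426.4243 up to rounding of the printed rate
```

This also explains why the absolute gap widens with the horizon: any bias in the per-day rate is multiplied by the number of days forecast.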
That said, let’s see if we can improve this model.
crostonanalysis <- crost(train, h = 200, outplot = TRUE, init = "mean", cost = "mse")
Specifically, we are updating our model in order to:
- Plot the forecasts made by Croston’s method
- Set MSE (the mean squared error) as the cost function
- Set the init parameter (the initial values for the series and the intervals) to “mean”, which bases the initial interval on the mean of all in-sample intervals
Here are the forecasts plotted by the model:
Source: RStudio
The sum of rainfall in mm for the test set, and as predicted by Croston’s method, is as follows:
> sum(test)
[1] 377.4
> sum(crostonanalysis$frc.out)
[1] 396.6583
The predicted total of about 397 mm is now considerably closer to the actual total of 377 mm. When the initial model was run, the forecast was further out at 426 mm.
Moreover, the tsintermittent package supports other intermittent time series methods in addition to Croston’s method.
For instance, let’s define a model using the Syntetos-Boylan approximation instead:
crostonanalysis <- crost(train, h = 200, outplot = TRUE, type = "sba", init = "mean", cost = "mse")
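For context, the Syntetos-Boylan approximation rescales the Croston forecast by a factor of (1 - alpha/2), where alpha is the smoothing parameter applied to the intervals. The sketch below is only an illustration of how small that adjustment is when alpha is small. It borrows the second value of $weights from the default fit shown earlier, assuming that value is the interval parameter; the SBA model re-optimises its own parameters, so its actual weights and forecasts will differ.

```r
# Interval smoothing parameter: borrowed from the $weights output of the default fit
# (assumed to be the second weight; for illustration only)
alpha_interval <- 0.005304516

# SBA correction factor applied to the Croston rate
sba_factor <- 1 - alpha_interval / 2

# Per-day Croston rate from the default fit, and the rate after the correction
croston_rate <- 2.132122
sba_rate <- sba_factor * croston_rate

sba_factor
sba_rate
```

With a smoothing parameter this small, the correction changes the per-day rate by well under 1%.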
The results when using this method were virtually identical to those of the Croston method:
> sum(test)
[1] 377.4
> sum(crostonanalysis$frc.out)
[1] 397.6018
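To put the four runs on a common footing, the reported totals can be converted into percentage errors. The numbers below are copied from the output shown in this article; the vector names are just labels chosen for this sketch.

```r
# Actual and forecast totals (mm) as reported above
actual   <- c(h50_default = 102.2, h200_default = 377.4,
              h200_mse = 377.4, h200_sba = 377.4)
forecast <- c(h50_default = 106.6061, h200_default = 426.4243,
              h200_mse = 396.6583, h200_sba = 397.6018)

# Percentage by which each forecast overshoots the actual total
pct_error <- 100 * (forecast - actual) / actual
round(pct_error, 1)
```

On this measure, the tuned Croston and SBA models overshoot by roughly 5% over 200 days, compared with roughly 13% for the default model over the same horizon.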
Conclusion
In this example, we have seen:
- Why an intermittent time series can be tricky to forecast
- How Croston’s method can be used to forecast such a time series
- How to evaluate model performance against a defined test set
In addition, we discussed the limitations of converting a time series to a lower frequency, e.g. daily to monthly, in the sense that such an approach can result in significant data loss. Moreover, such an approach would mean fewer observations for a forecasting model to work with, which potentially means poor forecast accuracy.
As we have seen, Croston’s method helps to solve this issue.
Many thanks for reading, and you can find more of my data science content at michael-grogan.com.
Disclaimer: This article is written on an “as is” basis and without warranty. It was written with the intention of providing an overview of data science concepts, and should not be interpreted as professional advice. The findings and interpretations in this article are those of the author and are not endorsed by or affiliated with Met Éireann in any way.
Translated from: https://medium.com/swlh/forecasting-an-intermittent-time-series-1461de7616fe