Spatiotemporal data mining plays an important role in air quality monitoring, crowd flow modeling, and climate forecasting. However, spatiotemporal data collected in real-world scenarios is usually incomplete due to sensor failures or transmission loss. Spatiotemporal imputation aims to fill in missing values using the observed values and the spatiotemporal dependencies underlying them. Previously dominant models impute missing values autoregressively and therefore suffer from error accumulation. As emerging powerful generative models, diffusion probabilistic models can impute missing values conditioned on the observations, which avoids inferring missing values from inaccurate earlier imputations. However, constructing and exploiting conditional information are unavoidable challenges when applying diffusion models to spatiotemporal imputation. To address these issues, we propose PriSTI, a conditional diffusion framework for spatiotemporal imputation with enhanced prior modeling. The framework first applies a conditional feature extraction module to extract coarse yet effective spatiotemporal dependencies from the conditional information as a global context prior. A noise estimation module then transforms random noise into realistic values, using spatiotemporal attention weights computed from the conditional feature while also taking geographic relationships into account. PriSTI outperforms existing imputation methods under various missing patterns on different real-world spatiotemporal datasets, and effectively handles scenarios such as high missing rates and sensor failure. The implementation code is available at https://github.com/LMZZML/PriSTI.
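The conditional diffusion idea above can be illustrated with a minimal, stdlib-only Python sketch of the training-time forward process. It is not PriSTI's implementation. It assumes the standard DDPM forward step x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, an assumed linear beta schedule, and a common training setup for diffusion-based imputation in which observed entries are randomly split into a conditional part (kept as the condition) and a target part (noised and later denoised). The helpers `linear_alpha_bars`, `split_mask`, and `noise_targets` are hypothetical names, and the noise estimation network itself is omitted.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]  # rows = nodes (sensors), columns = time steps
Mask = List[List[int]]      # 1 = entry selected / observed, 0 = not


def linear_alpha_bars(num_steps: int, beta_start: float = 1e-4,
                      beta_end: float = 0.5) -> List[float]:
    """alpha_bar_t = prod_{s<=t} (1 - beta_s) for an assumed linear beta schedule."""
    alpha_bars, prod = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def split_mask(observed: Mask, target_ratio: float,
               rng: random.Random) -> Tuple[Mask, Mask]:
    """Hypothetical helper: split observed entries into conditional and target masks."""
    cond = [[0] * len(row) for row in observed]
    target = [[0] * len(row) for row in observed]
    for i, row in enumerate(observed):
        for j, m in enumerate(row):
            if m:  # only entries with known values can serve as condition or target
                if rng.random() < target_ratio:
                    target[i][j] = 1
                else:
                    cond[i][j] = 1
    return cond, target


def noise_targets(x0: Matrix, target: Mask, alpha_bar: float,
                  rng: random.Random) -> Tuple[Matrix, Matrix]:
    """Apply q(x_t | x_0) to target entries only; non-target entries are set to 0.0."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    xt, eps = [], []
    for row_x, row_m in zip(x0, target):
        e_row = [rng.gauss(0.0, 1.0) if m else 0.0 for m in row_m]
        xt.append([a * x + b * e if m else 0.0
                   for x, e, m in zip(row_x, e_row, row_m)])
        eps.append(e_row)
    return xt, eps


# Toy usage: 2 sensors x 4 time steps; 0 in `observed` marks an originally missing value.
rng = random.Random(0)
x0 = [[1.0, 2.0, 3.0, 4.0], [0.5, 0.0, -0.5, -1.0]]
observed = [[1, 1, 0, 1], [1, 1, 1, 0]]
cond, target = split_mask(observed, target_ratio=0.5, rng=rng)
alpha_bars = linear_alpha_bars(50)
xt, eps = noise_targets(x0, target, alpha_bars[25], rng)
# The conditional input a noise estimator would see alongside xt.
cond_values = [[x * c for x, c in zip(rx, rc)] for rx, rc in zip(x0, cond)]
```

Because noise is only added where the true values are known, a training loss can compare predicted and true `eps` on those entries; at inference time the target set would instead be the originally missing entries, so no missing value is inferred from earlier imputed outputs.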
Index Terms—Spatiotemporal Imputation, Diffusion Model, Spatiotemporal Dependency Learning
Spatiotemporal data is data with intrinsic spatial and temporal patterns, and it is widely used in real-world tasks such as air quality monitoring [1], [2], traffic status forecasting [3], [4], and weather prediction [5]. However, owing to sensor failures and transmission loss [2], incompleteness is a common problem in spatiotemporal data, characterized by the random positions of missing values and the diversity of missing patterns. This incompleteness leads to incorrect analysis of spatiotemporal patterns and further interferes with downstream tasks. In recent years, extensive research [1], [6], [7] has delved into spatiotemporal imputation, with the goal of exploiting spatiotemporal dependencies in the available observed data to impute missing values.
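The two properties named above, random positions of missing values and diverse missing patterns, can be made concrete with binary observation masks. The following Python sketch is illustrative only: the helper names and the two generators (independent per-entry drops for random point missing, and one contiguous outage window per failed sensor) are assumptions, not the missing-pattern protocol used in this paper's experiments.

```python
import random
from typing import List

Mask = List[List[int]]  # mask[node][t] == 1 if observed, 0 if missing


def point_missing_mask(num_nodes: int, num_steps: int, rate: float,
                       rng: random.Random) -> Mask:
    """Hypothetical: each entry is dropped independently with probability `rate`."""
    return [[0 if rng.random() < rate else 1 for _ in range(num_steps)]
            for _ in range(num_nodes)]


def sensor_failure_mask(num_nodes: int, num_steps: int, failed_nodes: List[int],
                        start: int, length: int) -> Mask:
    """Hypothetical: each failed node loses the contiguous window [start, start + length)."""
    mask = [[1] * num_steps for _ in range(num_nodes)]
    for node in failed_nodes:
        for t in range(start, min(start + length, num_steps)):
            mask[node][t] = 0
    return mask


def missing_rate(mask: Mask) -> float:
    """Fraction of entries marked missing."""
    total = sum(len(row) for row in mask)
    return 1.0 - sum(sum(row) for row in mask) / total


rng = random.Random(42)
point = point_missing_mask(num_nodes=4, num_steps=10, rate=0.25, rng=rng)
failure = sensor_failure_mask(num_nodes=4, num_steps=10, failed_nodes=[1, 3],
                              start=2, length=5)
print(missing_rate(failure))  # prints 0.25 (10 of 40 entries missing)
```

Even at a similar overall missing rate, the sensor-failure mask removes whole stretches of a single sensor's history, so an imputer that relies only on temporal smoothness within that series has little local context to draw on.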
Early studies of spatiotemporal imputation usually impute along the temporal or spatial dimension with statistical and classical machine learning methods, including but not limited to autoregressive moving average (ARMA) [8], [9], the expectation-maximization (EM) algorithm [10], [11], and k-nearest neighbors (KNN) [12], [13]. However, these methods impute missing values based on strong assumptions, such as temporal smoothness and similarity between time series, and ignore the complexity of spatiote