【Paper】A review of data-driven building energy consumption prediction studies

Original paper: https://www.sciencedirect.com/science/article/pii/S1364032117306093
Citations: 351 (08/06/220)
Year of publication: 2018


Abstract

Energy is the lifeblood of modern societies. In the past decades, the world’s energy consumption and associated CO2 emissions increased rapidly due to the increases in population and comfort demands of people. Building energy consumption prediction is essential for energy planning, management, and conservation. Data-driven models provide a practical approach to energy consumption prediction. This paper offers a review of the studies that developed data-driven building energy consumption prediction models, with a particular focus on reviewing the scopes of prediction, the data properties and the data preprocessing methods used, the machine learning algorithms utilized for prediction, and the performance measures used for evaluation. Based on this review, existing research gaps are identified and future research directions in the area of data-driven building energy consumption prediction are highlighted.


1. Introduction

Buildings represent a large portion of the world’s energy consumption and associated CO2 emissions. For example, the building sector represents 39% and 40% of the energy consumption and 38% and 36% of the CO2 emissions in the U.S. [1] and Europe [2], respectively. The use of energy that is generated from fossil fuels contributes to CO2 emissions and causes air pollution and global warming. Prediction of building energy consumption is crucial for improved decision making towards reducing energy consumption and CO2 emissions, because it can assist in evaluating different building design alternatives and building operation strategies (in terms of their energy efficiency) and improving demand and supply management. However, building energy consumption prediction remains a challenging task due to the variety of factors that affect the consumption, such as the physical properties of the building, the installed equipment, the outdoor weather conditions, and the energy-use behavior of the building occupants [3].

Two main approaches have been taken for building energy consumption prediction: the physical modelling approach and the data-driven approach. Physical models (also known as engineering methods or white-box models) rely on thermodynamic rules for detailed energy modelling and analysis. Examples of building energy simulation software that utilizes physical models include EnergyPlus, eQuest, and Ecotect. These types of software calculate building energy consumption based on detailed building and environmental parameters such as building construction details; operation schedules; HVAC design information; and climate, sky, and solar/shading information [4]. However, some of these detailed data may not be available to users at the time of simulation, and failure to provide accurate input can result in poor prediction performance.

Data-driven building energy consumption prediction modelling, on the other hand, does not perform such energy analysis or require such detailed data about the simulated building, and instead learns from historical/available data for prediction. Data-driven energy consumption prediction has gained a lot of research attention in recent years [5], despite its possible limitations (as discussed in Section 8). In response, a number of review studies on the analysis of existing data-driven approaches have been published. These reviews mostly focused on the machine learning methods/algorithms used in previous research efforts. Despite the importance of these efforts, there is still a lack of review studies that analyze existing data-driven approaches from a more multivariate perspective, including data aspects such as what data types and sizes were used and what features were selected for learning. Such a review would help reveal existing research gaps in the field of data-driven building energy consumption prediction and point towards future research directions.

To address this gap, this paper offers a review of data-driven building energy consumption prediction studies that utilized machine learning algorithms, including support vector machines (SVM), artificial neural networks (ANN), decision trees, and other statistical algorithms. The paper focuses on reviewing the types of buildings, temporal granularities, types of energy consumption predicted, types of data, types of features, and data sizes in the existing studies; and provides a discussion of the review results and future research directions. The paper is organized as follows. Section 2 provides a concise overview of existing review studies on data-driven building energy consumption prediction and identifies the gaps in this area. Section 3 gives a brief introduction on the background of data-driven approaches. Section 4 defines the methodology used in this review study. Section 5 reviews previous studies in terms of the scopes of prediction, the data properties and the data preprocessing methods used, the machine learning algorithms utilized for prediction, and the performance measures used for evaluation. Section 6 discusses the previous studies in terms of the temporal granularities of prediction, the types of buildings, and the types of energy consumption predicted. Finally, Section 7 discusses future research directions, Section 8 discusses the limitations of data-driven energy consumption prediction, and Section 9 summarizes the conclusions.


2. Existing review studies on data-driven building energy consumption prediction

Data-driven building energy consumption prediction has gained a lot of attention in recent years. In response, a number of review studies have focused on the analysis of existing data-driven efforts. For example, Zhao and Magoulès [4] classified building energy consumption prediction methods as elaborate engineering methods, simplified engineering methods, statistical methods, ANN-based methods, SVM-based methods, and grey models; and conducted some comparative analysis in terms of model complexity, ease of use, running speed, inputs needed, and accuracy. Ahmad et al. [2] focused on the review of ANN-based, SVM-based, and hybrid methods and discussed the principles, advantages, and disadvantages of these methods. Fumo [6] summarized the classification of building energy consumption prediction methods proposed by various studies and placed a special emphasis on the review of model calibration and verification and weather data used for modelling. Li and Wen [7] conducted an inclusive review; they reviewed state-of-the-art studies not only on building energy modelling and prediction but also on building critical component modelling (e.g., photovoltaic power generation modelling), building energy modelling for demand response (e.g., weather condition forecasting), agent-based building energy modelling, and system identification for building energy modelling. Li et al. [8] reviewed the methods for building energy benchmarking and proposed a flowchart that intends to assist users in choosing the proper prediction method. Chalal et al. [9] focused on both building scale and urban scale energy consumption prediction and further classified and discussed the available methods within each scale. Wang and Srinivasan [10] reviewed and compared the principles, applications, advantages, and disadvantages of single AI-based methods (e.g., ANN and SVM) and ensemble methods.

The majority of these studies provided a comprehensive review on energy consumption prediction research efforts with a particular focus on the machine learning methods/algorithms used in these research studies. Despite the importance of these review efforts, there is still a lack of review studies that cover building energy consumption prediction research in terms of the scopes of prediction (e.g., heating energy consumption), the types of data used (e.g., real data, simulated data), the types of features used for prediction (e.g., outdoor weather conditions, indoor environmental conditions), the sizes of the data (e.g., duration of data collection, number of data instances), and the data preprocessing methods utilized (e.g., data reduction). Such a review is essential for identifying the research gaps and highlighting the future research directions in the field of data-driven building energy consumption prediction.


3. Background

Developing a data-driven model typically consists of four primary steps: data collection, data preprocessing, model training, and model testing. In the field of building energy consumption prediction, data collection involves collecting historical/available data for model training, such as outdoor weather condition and electricity consumption data. Data preprocessing may include data cleaning, data integration, data transformation, and/or data reduction. Model training is the fitting of the model to a training dataset. Model testing aims to evaluate the trained model using standard evaluation measures.
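
The four steps above can be sketched end to end in a few lines. This is only an illustrative toy, not a method from the reviewed studies: the hourly records, the helper names (`clean`, `fit_line`, `rmse`), and the choice of a one-feature least-squares line as the "model" are all assumptions made for the example.

```python
import math

def clean(rows):
    """Data preprocessing (cleaning): drop records with a missing value."""
    return [r for r in rows if r[0] is not None and r[1] is not None]

def fit_line(rows):
    """Model training: least-squares fit of load = a + b * temperature."""
    n = len(rows)
    mean_t = sum(t for t, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in rows)
    sxx = sum((t - mean_t) ** 2 for t, _ in rows)
    b = sxy / sxx
    return mean_y - b * mean_t, b

def rmse(model, rows):
    """Model testing: root mean square error on held-out records."""
    a, b = model
    return math.sqrt(sum((a + b * t - y) ** 2 for t, y in rows) / len(rows))

# 1. Data collection: hypothetical (outdoor temperature in C, load in kWh) pairs.
raw = [(20.0, 50.0), (22.0, 54.0), (None, 60.0), (24.0, 58.0),
       (26.0, 62.0), (28.0, 66.0), (30.0, None), (32.0, 74.0)]
# 2. Data preprocessing.
data = clean(raw)
# 3. Model training on the first four records.
train, test = data[:4], data[4:]
model = fit_line(train)
# 4. Model testing on the remaining records.
test_rmse = rmse(model, test)
print(model, test_rmse)
```

The toy data lie exactly on a straight line, so the test error is essentially zero; real building data would be noisy and would call for the richer algorithms discussed in this section.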

SVM, ANN, decision trees, and other statistical algorithms are the most commonly-used supervised machine learning algorithms for model training. SVM is a kernel-based machine learning algorithm, which can be used for both regression and classification [11]. The algorithm is good at solving non-linear problems even with a relatively small amount of training data [4]. SVM solves a non-linear problem by transforming the non-linearity between features x_i (e.g., dry-bulb temperature and global solar radiation) and target y_i (e.g., cooling energy consumption) using linear mapping in two steps. First, it projects the non-linear problem into a high-dimensional space and determines the function f(x) that fits best in the high-dimensional space. Second, it applies a kernel function to make the complex non-linear map a linear problem. For further details on the prediction principle using SVM, the readers are referred to [9]. SVM is one of the most robust and accurate algorithms and has been listed among the top ten most influential data mining algorithms in the research community by the IEEE International Conference on Data Mining [11]. It was found to outperform other machine learning algorithms in numerous applications. In order to increase the computational efficiency of SVM, least squares SVM (LS-SVM) (e.g., [12]) and parallel SVM (e.g., [13]) were also implemented in the field of building energy consumption prediction.
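
To make the role of the kernel function more concrete, the sketch below checks numerically that a degree-2 polynomial kernel computed in the original 2-D feature space equals an ordinary dot product in an explicit higher-dimensional space. The specific kernels, the feature map `phi`, and the sample values are assumptions chosen for illustration; the reviewed studies may use different kernels and parameters.

```python
import math

def poly_kernel(x, z):
    """Degree-2 homogeneous polynomial kernel: k(x, z) = (x . z)^2."""
    return sum(a * b for a, b in zip(x, z)) ** 2

def phi(x):
    """Explicit 3-D feature map for 2-D inputs that matches poly_kernel."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

def rbf_kernel(x, z, gamma=0.5):
    """Gaussian (RBF) kernel, a common choice for SVM regression."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

# Hypothetical scaled features: (dry-bulb temperature, global solar radiation).
x = (0.6, 0.3)
z = (0.2, 0.9)

implicit = poly_kernel(x, z)                           # computed in 2-D
explicit = sum(a * b for a, b in zip(phi(x), phi(z)))  # computed in 3-D
print(implicit, explicit, rbf_kernel(x, z))
```

Both routes give the same value, which is why an SVM can work with the kernel alone and never needs to build the high-dimensional representation explicitly.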

ANN is a non-linear computational model, inspired by the human brain. A typical ANN includes three sequential layers: the input layer, the hidden layer, and the output layer. Each layer has a number of interconnected neurons, and each neuron has an activation function. Three types of parameters are typically used to define ANNs: the interconnection pattern between the neurons of the different layers, the learning process of updating the weights of the interconnections, and the activation function that converts a neuron’s weighted input to its output activation [14]. In ANN, each feature (e.g., dry-bulb temperature) is multiplied by its corresponding neuron weight and summed up with the bias. The activation function is then applied to determine the output (e.g., cooling energy consumption). For further details on the prediction principle using ANN, the readers are referred to [9]. ANN is one of the most popular algorithms used in building energy consumption prediction [2]. Examples of ANNs include the back propagation neural networks (BPNN), radial basis function neural networks (RBFNN), general regression neural networks (GRNN), feed forward neural network (FFNN), and adaptive network-based fuzzy inference system (ANFIS). Other methods that can be used in conjunction with ANN include the hierarchical mixture of experts (HME), fuzzy c-means (FCM), and multilayer perceptron (MLP).
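
The weighted-sum-plus-bias computation described above can be written out as a small forward pass. This is a minimal sketch assuming one hidden layer with sigmoid activations and a linear output neuron; the function names and all weight values are hypothetical, and a real model would learn the weights from data.

```python
import math

def sigmoid(v):
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-v))

def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(w . x + b) for every neuron."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def forward(features, hidden_w, hidden_b, out_w, out_b):
    """Input layer -> hidden layer (sigmoid) -> single linear output."""
    hidden = dense(features, hidden_w, hidden_b, sigmoid)
    return dense(hidden, out_w, out_b, lambda v: v)[0]

# Hypothetical scaled inputs: (dry-bulb temperature, relative humidity).
features = [0.3, 0.7]
hidden_w = [[0.5, -0.2], [0.1, 0.4]]   # two hidden neurons, two inputs each
hidden_b = [0.0, 0.1]
out_w = [[2.0, 4.0]]                   # one output neuron
out_b = [1.0]
print(forward(features, hidden_w, hidden_b, out_w, out_b))
```

Training (for example, back propagation in a BPNN) amounts to adjusting `hidden_w`, `hidden_b`, `out_w`, and `out_b` so that the output matches measured energy consumption.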

Decision tree algorithms use a tree to map instances into predictions. In a decision tree model, each non-leaf node represents one feature, each branch of the tree represents a different value for a feature, and each leaf node represents a class of prediction. The decision tree is a flexible algorithm that can grow with an increased amount of training data [15]. The classification and regression trees (CART), chi-squared automatic interaction detector (CHAID), random forest (RF), and boosting trees (BT) are the most widely-used decision tree methods in the area of building energy consumption prediction.
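
The node, branch, and leaf structure described above can be illustrated with a hand-built tree. This is a sketch under stated assumptions: the tree is written by hand rather than learned, the feature names and thresholds are hypothetical, and the leaves hold numeric values (as in a regression tree) instead of class labels.

```python
def predict(node, sample):
    """Walk from the root to a leaf and return the leaf's prediction."""
    while "value" not in node:
        go_left = sample[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["value"]

# Hypothetical tree for hourly cooling energy (kWh).
tree = {
    "feature": "dry_bulb_c", "threshold": 24.0,
    "left": {"value": 40.0},                      # mild weather
    "right": {
        "feature": "is_weekend", "threshold": 0.5,
        "left": {"value": 85.0},                  # hot weekday
        "right": {"value": 55.0},                 # hot weekend
    },
}

print(predict(tree, {"dry_bulb_c": 30.0, "is_weekend": 0}))
```

Algorithms such as CART choose the features and thresholds automatically from training data; ensemble methods such as RF and BT combine many trees of this kind.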

Other statistical algorithms include multiple linear regression (MLR), general linear regression (GLR), ordinary least squares regression (OLS), autoregressive (AR), autoregressive integrated moving average (ARIMA), Bayesian regression, polynomial regression (poly), exponential regression, multivariate adaptive regression splines (MARS), case-based reasoning (CBR), and k-nearest neighbors (kNN).

Algorithms used for developing energy consumption prediction models have advantages and disadvantages. For example, ANN and SVM require many parameters and might become computationally expensive, but their prediction accuracy is, in many cases, better than decision trees and statistical algorithms. Decision trees and other statistical algorithms, on the other hand, are generally easy to use and computationally inexpensive, but their performance is usually fair [4].


4. Methodology

The research methodology was composed of five primary steps:

  • Conducting a keyword-based search: A keyword-based search of research articles and abstracts was conducted using Google Scholar. Examples of the keywords that were used are: building energy estimation, building energy use prediction, building energy consumption forecasting, building energy modelling. Google Scholar was selected, because it can rank articles based on some factors such as number of citations, authors, and publisher.

  • Screening the retrieved articles: The articles were screened for relevance using the following criteria: (1) the approach must be data-driven; and (2) the purpose must be to predict building energy consumption.

  • Identifying and screening additional articles: The articles that cited or were cited by an article that passed the screening test were further identified as additional candidate articles. These articles were further screened using the same two relevance criteria defined above.

  • Reviewing all relevant articles: All articles identified in steps 2 and 3 were analytically reviewed to define their purpose of prediction, scope of prediction, data properties and data preprocessing methods, machine learning algorithm(s), and performance.

  • Analyzing the review results to identify gaps and future directions: The review results were analyzed to identify the research gaps in the field of data-driven building energy consumption and highlight future research directions.


5. Review of existing data-driven energy consumption prediction models

5.1. Scope of prediction

The scope of the studies was classified in terms of type of building, temporal granularity, and type of energy consumption predicted. Two types of buildings (residential and non-residential), five types of temporal granularities (sub-hourly, hourly, daily, monthly, and yearly), and four types of energy consumption (heating, cooling, lighting, and overall energy consumption) were defined.

Existing models covered residential and/or non-residential buildings, with different temporal granularities and for different types of energy consumption. Fig. 1 shows the distribution of the reviewed models according to type of building, temporal granularity, and type of energy consumption. Only 19% of these models focused on residential buildings, with the remaining models focusing on non-residential buildings including commercial and educational buildings. The majority of these models, 57%, were developed for predicting hourly energy consumption, while 12%, 15%, 4%, and 12% of the models focused on sub-hourly, daily, monthly, and yearly consumption, respectively. Overall, 47% of the models focused on predicting overall energy consumption, with 31% and 20% focusing on cooling and heating energy consumption, respectively, and only 2% focusing on lighting energy consumption prediction. The scope of each reviewed model is summarized in Table 1, in terms of building type, temporal granularity, type of energy consumption, and purpose of prediction.

5.2. Data properties and data preprocessing

5.2.1. Types of data: real, simulated, or benchmark

Data were classified into three types: (1) real data, (2) simulated data, and (3) public benchmark data (e.g., datasets provided for energy consumption prediction competitions). Fig. 1 shows the distribution of the reviewed studies by type of data used for training and testing. The majority (67%) of these studies used real data to train and test their models, while 19% and 14% of the studies used simulated and public benchmark data, respectively. Table 1 shows the types of data used in the reviewed studies.

Real data cover data collected through smart energy meters, sensors, building management systems, and weather stations; in addition to utility bills, energy consumption surveys, and energy consumption statistics and reports [16]. Sensor-based approaches have several advantages and disadvantages. On one hand, sensor-based approaches provide actual indoor environmental condition data and energy consumption levels. On the other hand, installing sensors brings an additional cost and effort not only to install the required sensors, but also to test and ensure the quality of the data collected [12]. Otherwise, sensor data may include noise, missing values, and/or outliers, which would affect the performance of the prediction models adversely.

Simulation-based studies, on the other hand, model an existing or non-existent building in a building energy simulation software tool (such as EnergyPlus, DeST, DOE2, or Ecotect) and obtain the needed data by running the simulations. By the nature of modelling, a model cannot fully represent its prototype or behave exactly as the prototype does. For example, Li et al. [17] showed that current building energy software tools are, in some cases, limited in evaluating the performance of energy conservation measures. Simulation data are, however, useful in cases where real data are limited (e.g., when instrumenting a building is difficult due to technical difficulties and/or economic reasons).

Other studies (e.g., [12,18,19]) utilized publicly-available benchmark datasets such as the ASHRAE’s Great Building Energy Predictor Shootout and EUNITE datasets. This type of dataset provides benchmark data that can be used to compare the performance of different models.

5.2.2. Types of features

A machine learning model predicts energy consumption based on a set of features. These features can be related to outdoor weather conditions, indoor environmental conditions, building characteristics, time, occupancy and occupant energy use behavior, and/or historical energy consumption. Outdoor weather condition features include dry-bulb temperature, dew point temperature, relative humidity, global solar radiation, wind speed, wind direction, degree of cloudiness, pressure, rainfall amount, and evaporation. Indoor environmental condition features include room temperature, room relative humidity, and indoor lighting level. Building characteristic features include relative compactness, surface area, wall area, roof area, overall height, orientation, glazing area, glazing area distribution, mean heat transfer coefficient of building walls, mean thermal inert index of building walls, roof heat transfer coefficient, building size coefficient, absorption coefficient for solar radiation of exterior walls, eastern window-wall ratio (WWR), western WWR, southern WWR, northern WWR, mean WWR, shading coefficient (SC) of eastern window, SC of western window, SC of southern window, SC of northern window, and integrated SC. Time features include the type of day (e.g., weekday, weekend, holiday) and the type of hour (e.g., daytime, nighttime). Occupant energy use behavior and occupancy features include building use schedule, heat gain through lights and people, water temperature, and number of occupants.

For all these types of features, some studies used data from various past time steps (e.g., the past hour). For example, Li et al. [20] used current outdoor dry-bulb temperature, outdoor dry-bulb temperature of an hour ago, outdoor dry-bulb temperature of two hours ago, current relative humidity, current solar radiation, and solar radiation of an hour ago to predict building cooling load. Jain et al. [16] used electricity consumption of the previous two time steps, current temperature, current solar flux, an indicator for weekend/holiday versus weekday, sine of current hour, and cosine of current hour to predict the electricity consumption of a multi-family residential building. Table 1 summarizes the features used in the reviewed models.
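
A feature vector in the spirit of the one attributed to Jain et al. above might be assembled as follows. This is only a sketch: the function name, argument names, and ordering are assumptions, and the 24-hour period used for the sine and cosine terms is one plausible reading of "sine of current hour" rather than a detail confirmed by the source.

```python
import math

def build_features(hour, temp_c, solar_flux, is_weekend, prev_loads):
    """Combine lagged consumption, weather, day type, and cyclic hour terms.

    prev_loads holds past consumption values, oldest first; the last two
    entries are used as the lag-1 and lag-2 features.
    """
    angle = 2.0 * math.pi * hour / 24.0
    return [
        prev_loads[-1],                  # consumption one step ago
        prev_loads[-2],                  # consumption two steps ago
        temp_c,                          # current temperature
        solar_flux,                      # current solar flux
        1.0 if is_weekend else 0.0,      # weekend/holiday indicator
        math.sin(angle),                 # cyclic encoding of the hour
        math.cos(angle),
    ]

row = build_features(hour=6, temp_c=18.0, solar_flux=250.0,
                     is_weekend=False, prev_loads=[10.0, 12.0])
print(row)
```

Encoding the hour with a sine/cosine pair keeps 23:00 and 00:00 close together in feature space, which a raw hour value from 0 to 23 would not.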

5.2.3. Data sizes

The sizes of datasets varied from 2-week (e.g., [21]) to 4-year energy consumption data (e.g., [22,23]). A small dataset may not be able to capture a representative sample of data, whereas a large dataset requires a lot of computational effort to process. The majority (56%) of the reviewed studies utilized datasets one month to one year long; 9% utilized datasets shorter than one month; and 31% utilized datasets longer than one year. Table 1 shows the dataset sizes used in the reviewed studies.

5.2.4. Data preprocessing

Data preprocessing is essential for any data-driven approach, because any incorrect or inconsistent data can cause errors in the analysis [24]. Data preprocessing may include data cleaning, data integration, data transformation, and/or data reduction. Data cleaning is the process of detecting and correcting (completing, modifying, replacing, and/or removing) the incomplete, incorrect, inaccurate, irrelevant, and/or noisy parts of the data. For example, data collected through sensors are usually noisy and often incomplete [25]. Data integration is the process of combining data from multiple sources. For example, outdoor weather condition data and hourly electricity consumption data come from different sources, but are combined in a single dataset for training and testing. Data transformation is the process of transforming the data into the format that is required by the learning algorithm. Data transformation may include normalization, smoothing, aggregation/disaggregation, and/or generalization of the data. Data reduction is the process of reducing the dimensionality of the dataset, which is not only computationally more efficient but may also enhance the performance of the machine learning algorithm by removing non-discriminative features. There are different techniques for data reduction, including principal component analysis (PCA) and kernel PCA (KPCA). For example, Xuemei et al. [26] applied PCA and KPCA for reducing the dimensionality of the data and compared the performances of SVM with PCA, SVM with KPCA, and SVM without any data reduction techniques. They also applied C-means clustering to ensure that the training samples were chosen based on the similarity degree of the input samples and compared the performances of fuzzy C-means (FCM) fuzzy SVM, FCM-SVM, and SVM without any clustering [27].
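
Three of the preprocessing steps named above (cleaning, integration, and transformation) can be sketched with small helper functions. The helpers and sample values are hypothetical, and forward filling and min-max normalization are just two common choices; the reviewed studies may have used other techniques.

```python
def fill_missing(series):
    """Data cleaning: replace None with the last observed value.

    Assumes the first value in the series is present.
    """
    filled, last = [], None
    for v in series:
        if v is None:
            v = last
        filled.append(v)
        last = v
    return filled

def integrate(weather, load):
    """Data integration: join two sources on their shared timestamps."""
    return [(t, weather[t], load[t]) for t in sorted(weather) if t in load]

def min_max(series):
    """Data transformation: rescale values to the range [0, 1].

    Assumes the series is not constant (max > min).
    """
    lo, hi = min(series), max(series)
    return [(v - lo) / (hi - lo) for v in series]

# Hypothetical hourly readings keyed by hour of day.
weather = {1: 20.0, 2: 21.0, 3: 23.0}     # outdoor temperature, C
load = {2: 5.0, 3: 6.0, 4: 7.0}           # electricity use, kWh

print(fill_missing([10.0, None, 30.0]))
print(integrate(weather, load))
print(min_max([10.0, 20.0, 30.0]))
```

Data reduction techniques such as PCA need matrix decompositions and are usually taken from a numerical library rather than written by hand, so they are left out of this sketch.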

5.3. Machine learning algorithms

A machine learning algorithm is needed to train an energy consumption prediction model. Previous studies in data-driven building energy consumption prediction have utilized SVM, ANN, decision trees, and/or other statistical algorithms. Fig. 1 shows the distribution of the studies by type of machine learning algorithm. Overall, 47% and 25% of the studies utilized ANN and SVM, respectively, to train their models. Only 4% of the studies utilized decision trees. On the other hand, 24% of the studies utilized other statistical algorithms such as MLR, OLS, and ARIMA.

Some studies also compared the effectiveness of different algorithms in energy consumption prediction. For example, Li et al. [20] compared SVM and BPNN; Borges et al. [28] compared SVM and AR; Xuemei et al. [29] compared LS-SVM and BPNN; Liu and Chen [21] compared SVM and ANN; Penya et al. [30] compared poly, exponential, mixed, AR, ANN, SVM, and Bayesian Network; Platon et al. [31] compared ANN and CBR; Jain et al. [32] compared SVM and MLR; Hou et al. [33] compared ARIMA and ANN; Penya et al. [34] compared AR, ARIMA, ANN, and Bayesian Network; Fan et al. [35] compared MLR, ARIMA, SVM, RF, MLP, BT, MARS, and kNN; Chou and Bui [36] compared ANN, SVM, CART, CHAID, and GLR; Edwards et al. [12] compared MLR, FFNN, SVM, LS-SVM, HME-FFNN, and FCM-FFNN; Li et al. [37] and Li et al. [38] compared SVM, BPNN, RBFNN, and GRNN; Dagnely et al. [22] compared OLS and SVM; Massana et al. [39] compared MLR, MLP, and SVM; and Fernandez et al. [40] compared AR, poly, ANN, and SVM.

5.4. Performance evaluation

Model testing is the evaluation of the prediction model using standard evaluation measures. The most commonly used evaluation measures of energy consumption prediction models are the coefficient of variation (CV), mean absolute percentage error (MAPE), and root mean square error (RMSE). These measures can be calculated using Eqs. (1)–(3). Overall, 41%, 29%, and 16% of the reviewed studies utilized CV, MAPE, and RMSE, respectively, to evaluate their models. Other measures used for evaluating energy consumption prediction include the mean absolute error (MAE), mean bias error (MBE), mean squared error (MSE), R-squared (R²), and error rate (δ). These measures can be calculated using Eqs. (4)–(8). CV is the most commonly used evaluation measure, probably for two reasons. First, it is one of the performance evaluation measures recommended by ASHRAE for evaluating energy consumption prediction models. Second, it normalizes the prediction error by the average energy consumption and provides a unitless measure that is more convenient for comparison purposes.
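
The measures named above can be sketched in a few lines of Python. This is a minimal illustration using common textbook definitions, not a reproduction of the paper's numbered equations: the function name `evaluate`, the sign convention for MBE (predicted minus actual), and the use of n rather than n − 1 or n − p in the denominators are all assumptions, and some guidelines define CV with a degrees-of-freedom correction. The error rate (δ) is left out because its definition is not given in the text.

```python
import math


def evaluate(actual, predicted):
    """Return common error measures for two equal-length sequences.

    Assumed definitions (n = number of samples, e_i = predicted_i - actual_i):
      MSE  = sum(e_i^2) / n
      RMSE = sqrt(MSE)
      MAE  = sum(|e_i|) / n
      MBE  = sum(e_i) / n
      MAPE = 100 * sum(|e_i / actual_i|) / n   (undefined if any actual_i == 0)
      CV   = 100 * RMSE / mean(actual)
      R2   = 1 - sum(e_i^2) / sum((actual_i - mean(actual))^2)
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(actual)
    mean_actual = sum(actual) / n
    errors = [p - a for a, p in zip(actual, predicted)]

    ss_res = sum(e * e for e in errors)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)

    mse = ss_res / n
    rmse = math.sqrt(mse)
    return {
        "MSE": mse,
        "RMSE": rmse,
        "MAE": sum(abs(e) for e in errors) / n,
        "MBE": sum(errors) / n,
        "MAPE": 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n,
        "CV": 100.0 * rmse / mean_actual,
        "R2": 1.0 - ss_res / ss_tot,
    }


# Hypothetical hourly consumption values in kWh, for illustration only.
metrics = evaluate([100, 120, 80, 100], [110, 115, 85, 90])
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because CV and MAPE are expressed as percentages, they can be compared across buildings of different sizes, whereas RMSE, MAE, and MBE keep the unit of the measured consumption.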


6. Discussion

6.1. Temporal granularities

Both short-term (e.g., sub-hourly, hourly, or daily) and long-term (e.g., yearly) energy consumption prediction are essential for building and grid design and operation. For example, “HVAC operations including adjusting the starting time of cooling to meet start-up loads, minimizing or limiting the electric on-peak demand, optimizing the costs and energy utilization in cool storage systems, and related energy and cost needs in other HVAC systems” all benefit from short-term energy consumption prediction [26]. Short-term energy consumption prediction models are also utilized for maintaining economic and secure operation of power grids and for providing energy consumption data to building occupants to better negotiate energy prices with energy retailers [40]. Among the reviewed literature, 84% of the studies focused on short-term energy consumption prediction because of its direct relation to the day-to-day operations of buildings [35].
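
In practice, the sub-hourly, hourly, and daily granularities mentioned above are often derived from the same metered data by aggregation. The following sketch shows one way to do that with the standard library only. The function name `aggregate`, the 15-minute interval, and the constant 0.5 kWh readings are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def aggregate(readings, granularity):
    """Sum (timestamp, kWh) readings into 'hourly' or 'daily' buckets."""
    totals = defaultdict(float)
    for timestamp, kwh in readings:
        if granularity == "hourly":
            key = timestamp.replace(minute=0, second=0, microsecond=0)
        elif granularity == "daily":
            key = timestamp.replace(hour=0, minute=0, second=0, microsecond=0)
        else:
            raise ValueError("granularity must be 'hourly' or 'daily'")
        totals[key] += kwh
    # Return the buckets in chronological order.
    return dict(sorted(totals.items()))


# Two days of hypothetical 15-minute meter readings (96 readings per day).
start = datetime(2018, 1, 1)
readings = [(start + timedelta(minutes=15 * i), 0.5) for i in range(2 * 96)]

hourly = aggregate(readings, "hourly")
daily = aggregate(readings, "daily")
print(len(hourly), "hourly buckets;", len(daily), "daily buckets")
```

A model trained on the hourly series sees 24 times as many samples per day as one trained on the daily series, which is one reason the amount of data needed depends on the chosen granularity.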

Only 12% of the studies focused on long-term (yearly) energy consumption prediction. There are several possible reasons. First, to achieve good performance, long-term energy consumption prediction requires a relatively large amount of data that covers a long time span [79]. For example, prediction errors of annual energy consumption prediction models, which were developed based on 1-day, 1-week, and 3-month measurements, were 100%, 30%, and 6%, respectively [80]. Second, nonlinearity in long-term data is usually more prominent compared to short-term data [81]. Third, uncertainties in long-term energy consumption prediction are usually higher because of the many changes that may occur in the supply and demand over a long time span. Long-term energy consumption prediction, thus, requires specific long-term prediction models due to the non-homogeneity and significant changes that may occur in the long run [82]. Despite their challenges, long-term energy consumption prediction models are essential; they are required when studying decisions with long-term implications such as capacity expansion, energy supply strategy, and capital investment [83].

6.2. Building types

About 81% of the reviewed research efforts focused on developing energy consumption prediction models for commercial and/or educational buildings, with only 19% focusing on residential buildings. The relative lack of studies on residential buildings could be due to a number of reasons. First, the lack of data – specifically sensor-based data – could be a main reason. The majority, 73%, of non-residential building energy consumption prediction models rely on sensor data for algorithm training. Such data are much harder to obtain for residential buildings because the majority of buildings are not sufficiently metered in a way that allows for sensing at high granularity [10]. Another reason could be the complexity of predicting energy consumption in residential contexts because of the relatively higher variability of occupant behavior compared to the commercial context [16]. Occupant behavior is the greatest uncertainty in building energy consumption prediction [84]; ignoring, misunderstanding, and/or underestimating the role of occupant behavior in affecting energy consumption is one of the main causes for the deviations between the predicted and the actual consumption levels [85].

Despite their challenges, residential building energy consumption predictions are needed because of the high energy consumption share of this sector and the potential high gain that can be achieved if successful energy-reducing strategies are implemented. Residential buildings represent 21% of the total energy consumption in the US, which is greater than the share of commercial buildings [86]. Further studies are, thus, needed on the residential sector. For example, experimental studies could be conducted to see if/how existing data-driven commercial building energy consumption prediction models could be extended to the residential context.

6.3. Energy consumption types

As discussed in Section 5, 46%, 31%, 20%, and 2% of the reviewed research efforts focused on predicting overall, cooling, heating, and lighting energy consumption, respectively. This shows a relative lack of studies on predicting lighting loads. This might be caused by the predominant impact of occupant behavior on lighting energy consumption. Lighting use is directly impacted by building occupancy and occupant behavior patterns [87]. For example, 500 lx is the recommended illuminance level for office buildings [88]. Theoretically, people who have access to natural lighting are expected to use artificial lighting less when the outdoor illumination is sufficient [89]. However, Yun et al. [87] showed that there are no statistically significant relationships between outdoor illuminance and artificial lighting use patterns.

Despite these reasons, lighting energy consumption prediction is essential for building energy efficiency and for efficient supply-side management. Lighting represents almost 20% of the global electricity consumption [90]. Since it is a major heat source, lighting is not only a significant piece of building energy consumption by itself, but it also impacts the cooling energy demand [77]. In general, one-third of the cooling energy consumption can be saved if a good balance between natural light and solar heat can be achieved [57]. In addition, different building design features – in terms of building envelope, architectural features, and building materials – may have different impacts on lighting energy consumption [91]. Lighting energy consumption prediction models, thus, require more attention to better understand lighting energy consumption trends and conservation opportunities, the interaction between cooling load and lighting, and the impacts of various design features on consumption levels.


7. Future research directions

Many of the research challenges discussed above can be attributed to insufficiency of data (in terms of representativeness, size, etc.) and/or complexity of occupant energy use behavior. Two future research directions are discussed in this regard.

One growing research direction is big energy data analytics. With the advent of smart meters and advanced metering infrastructure (AMI), larger volumes of monitoring data will become available. Making these data accessible to the research community may open unprecedented opportunities for researchers to better understand building energy efficiency. Establishing a roadmap – including which buildings to monitor and in which locations to ensure data representativeness – could also help consolidate the many research efforts in the area of building energy efficiency, in order to eliminate duplication of efforts, provide more coverage of research questions and methods, and create a stronger research impact in the area of building energy consumption prediction. Future research directions in the area of big energy data analytics include building energy efficiency retrofitting, occupant behavior analysis, and smart energy management. For example, Mathew et al. [92] presented a vision for the potential use of big data analytics in energy efficiency retrofits. Zhou and Yang [93] proposed a vision for interdisciplinary research to analyze and understand individuals' energy consumption behavior using big energy data analytics. Zhou et al. [94] presented a comprehensive vision for big-data-driven smart energy management, including smart power generation, power transmission, power distribution and transformation, and demand side management.

Another important research direction is behavioral energy efficiency. More efforts to capture and study occupant energy use behavior are needed to better understand how energy use behavior affects energy consumption, what the energy wasting and saving behaviors are, and how much energy improved behaviors can save. For example, Turner and Hong [95] recently proposed a framework to capture occupant energy use behavior but did not test their framework in a real-world setting. Empirical studies for capturing occupant energy use behavior and studying its impact on energy consumption are thus needed. Three sub-challenges are, however, associated with energy use behavior studies. One is the cost and time associated with real data collection, as noted above. Another is the difficulty in conducting such studies on a representative sample of occupants; behavior is highly personal and variable across different types of people and more difficult to generalize than other types of energy data. The last is the potential privacy concerns associated with tracking the behavior of occupants.

In addition to these two primary directions, future research efforts could also explore the use of other types of machine learning algorithms in energy consumption prediction. For example, deep learning algorithms have been proven to outperform other machine learning algorithms in many other fields (e.g., image classification and multi-modal data analysis [96]) but have not been sufficiently studied in the field of building energy consumption prediction yet.

As new data-driven models are developed, sharing more information about the development process and purpose, validation, and reusability of these models will be essential to avoid unnecessary duplication of research efforts. Some important model information (e.g., purpose of prediction) is sometimes not reported or not sufficiently described. Insufficient information offers limited guidance on whether certain models are applicable in a new context or not, which could inhibit the reusability of the models.


8. Limitations of data-driven energy consumption prediction and applicability considerations

Despite the importance of data-driven approaches, data-driven energy consumption prediction has two main limitations. First, data-driven prediction models may not perform well outside of their training range. Assumptions made by the learning algorithm have implications on the model’s ability to cope with new data outside of the training data and whether it would generalize well beyond the training range or not [7]. For example, a model that was trained by learning from a limited dataset (e.g., data collected from a small set of buildings) may not perform well outside of the training data (e.g., different types of buildings in terms of physical properties, operation strategies, weather conditions, occupant behavior, etc.). The dataset used for training must, thus, be representative of the range of application and contain sufficient variety. Collecting such sufficiently representative and wide-ranging data may be difficult, costly, and/or time consuming [9]. It is, therefore, crucial to consider the training range when determining the suitability of using a data-driven model in a specific application. For example, using a data-driven approach for exploratory analysis of what-if scenarios outside of the training range may be unsuitable, or should at least be done with caution.
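
The training-range limitation can be illustrated with a small, fully synthetic sketch. Everything here is an assumption made for illustration: the "true" cooling load is a made-up quadratic function of outdoor temperature, the model is an ordinary least-squares line, and the names `true_cooling_load`, `fit_line`, and `relative_error` are hypothetical. The point is only that a model that looks acceptable inside its training range (18 to 26 °C) can be badly wrong outside it.

```python
def true_cooling_load(temp_c):
    """Hypothetical ground truth: load grows quadratically above 18 °C."""
    return 20.0 + 0.5 * max(0.0, temp_c - 18.0) ** 2


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Training data only covers outdoor temperatures from 18 to 26 °C.
train_temps = [float(t) for t in range(18, 27)]
train_loads = [true_cooling_load(t) for t in train_temps]
slope, intercept = fit_line(train_temps, train_loads)


def relative_error(temp_c):
    """Absolute prediction error as a fraction of the true load."""
    predicted = slope * temp_c + intercept
    actual = true_cooling_load(temp_c)
    return abs(predicted - actual) / actual


def in_training_range(temp_c):
    """Simple guard: is the query inside the range seen during training?"""
    return min(train_temps) <= temp_c <= max(train_temps)


for temp in (22.0, 36.0):
    label = "inside" if in_training_range(temp) else "outside"
    print(f"{temp} °C ({label} training range): "
          f"relative error {relative_error(temp):.1%}")
```

A guard such as `in_training_range` is one simple way to flag what-if queries that fall outside the data the model was trained on, so that their results can be treated with the caution the text recommends.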

Second, data-driven prediction models are black-box models – their internals are not known. A black-box model may provide sufficient prediction accuracy, but it offers limited insight into how the different parameters affect energy consumption [97].

Hybrid or grey-box modelling approaches, on the other hand, offer a combination of physical and data-driven prediction models, thereby leveraging the advantages and minimizing the disadvantages of both approaches. In grey-box models, some internal parameters and equations are physically interpretable. Grey-box models may also show better performance compared to black-box and white-box models. For example, Dong et al. [98] developed a hybrid model, which couples a data-driven model and a thermal network model, for predicting the total and non-AC energy consumptions of residential buildings and compared its prediction performance to ANN-, SVM-, LS-SVM-, Gaussian mixture model (GMM)-, and Gaussian process regression (GPR)-based models. Similarly, Li et al. [99] developed a hybrid improved particle swarm optimization (iPSO)-ANN model for predicting building electricity consumption. The results of both studies showed that these hybrid models offered some performance improvement.
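
To make the grey-box idea concrete, the sketch below combines a very simple physical term with a data-driven correction. It is not the model of Dong et al. [98] or Li et al. [99]; it is a toy example under stated assumptions. The physical part is a steady-state heat-loss formula with an assumed heat-loss coefficient `UA_KW_PER_K` and setpoint `T_SET_C`, and the data-driven part is simply the mean residual per hour of day, learned from hypothetical measurements.

```python
from collections import defaultdict

UA_KW_PER_K = 0.25  # assumed overall heat-loss coefficient (kW per kelvin)
T_SET_C = 21.0      # assumed indoor heating setpoint (°C)


def physical_load(t_out_c):
    """Physically interpretable part: steady-state heating load in kW."""
    return UA_KW_PER_K * max(0.0, T_SET_C - t_out_c)


def fit_residuals(samples):
    """Data-driven part: mean (measured - physical) residual per hour of day.

    samples is a list of (hour, outdoor_temp_c, measured_kw) tuples.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for hour, t_out_c, measured_kw in samples:
        sums[hour] += measured_kw - physical_load(t_out_c)
        counts[hour] += 1
    return {hour: sums[hour] / counts[hour] for hour in sums}


def hybrid_predict(hour, t_out_c, residuals):
    """Physical estimate plus the learned correction (0 for unseen hours)."""
    return physical_load(t_out_c) + residuals.get(hour, 0.0)


# Hypothetical measurements: (hour of day, outdoor temperature, measured kW).
samples = [
    (8, 5.0, 5.0),
    (8, 1.0, 6.0),
    (20, 5.0, 3.5),
    (20, 1.0, 4.5),
]
residuals = fit_residuals(samples)
print(hybrid_predict(8, 9.0, residuals))
```

The physical term stays interpretable (its coefficient has a unit and a meaning), while the residual table absorbs effects the physics leaves out, such as occupancy patterns. For hours never seen in the data, the sketch falls back to the physical estimate alone.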


9. Conclusions

This paper presented an overview of recent research efforts in the area of data-driven building energy consumption prediction. The scope of a set of models was reviewed in terms of building types (i.e., residential and non-residential), temporal granularities of prediction (i.e., sub-hourly, hourly, daily, monthly, and yearly), and types of energy consumption predicted (i.e., heating, cooling, lighting, and overall). The properties of the data used for training and testing these models were reviewed, including the types of data (i.e., real, simulation, and public benchmark data), the types of features (i.e., features related to outdoor weather conditions, indoor environmental conditions, building characteristics, time, occupant energy use behavior and occupancy, and historical energy consumption data), and the sizes of the data. The machine learning algorithms and the performance levels of these prediction models were also reviewed. The paper concluded with a discussion of the results, research gaps, and future research directions.

As seen from the review, data-driven building energy consumption prediction has been attracting significant research attention. Different models serve different purposes, have different scopes, were trained on different datasets, and use different features for prediction. All of the models have their own strengths and weaknesses and perform differently under different circumstances. There is no one-size-fits-all model that can be utilized under all conditions. Application-specific model development is, therefore, essential and requires case-by-case consideration of all the aspects analyzed in this paper, including data properties and machine learning algorithms.

The results of this review indicate some research areas that may require more attention: long-term building energy consumption prediction, residential building energy consumption prediction, and building lighting energy consumption prediction. The relative lack of research efforts in these areas could be attributed to insufficiency of data and/or complexity of occupant energy use behavior in these contexts. Sufficient data – in terms of types, sizes, temporal coverage, and representativeness – are essential. Capturing occupant behavior, and taking it into account, is also critical for improved energy consumption prediction. Future research directions that may lead to major improvements in these areas and beyond include big energy data analytics and behavioral energy efficiency.


Acknowledgements

This publication was made possible by NPRP Grant #6-1370-2-552 from the Qatar National Research Fund (a member of Qatar Foundation). The findings achieved herein are solely the responsibility of the authors.
