datamonday

【Paper】A review of data-driven building energy consumption prediction studies

Paper: https://www.sciencedirect.com/science/article/pii/S1364032117306093
Citations: 351 (as of 08/06/2020)
Year: 2018

Abstract

Energy is the lifeblood of modern societies. In the past decades, the world’s energy consumption and associated CO2 emissions increased rapidly due to the increases in population and comfort demands of people. Building energy consumption prediction is essential for energy planning, management, and conservation. Data-driven models provide a practical approach to energy consumption prediction. This paper offers a review of the studies that developed data-driven building energy consumption prediction models, with a particular focus on reviewing the scopes of prediction, the data properties and the data preprocessing methods used, the machine learning algorithms utilized for prediction, and the performance measures used for evaluation. Based on this review, existing research gaps are identified and future research directions in the area of data-driven building energy consumption prediction are highlighted.

1. Introduction

Buildings represent a large portion of the world’s energy consumption and associated CO2 emissions. For example, the building sector represents 39% and 40% of the energy consumption and 38% and 36% of the CO2 emissions in the U.S. [1] and Europe [2], respectively. The use of energy that is generated from fossil fuels contributes to CO2 emissions and causes air pollution and global warming. Prediction of building energy consumption is crucial for improved decision making towards reducing energy consumption and CO2 emissions, because it can assist in evaluating different building design alternatives and building operation strategies (in terms of their energy efficiency) and improving demand and supply management. However, building energy consumption prediction remains a challenging task due to the variety of factors that affect consumption, such as the physical properties of the building, the installed equipment, the outdoor weather conditions, and the energy-use behavior of the building occupants [3].

Two main approaches have been taken for building energy consumption prediction: physical modelling approach and data-driven approach. Physical models (also known as engineering methods or white-box models) rely on thermodynamic rules for detailed energy modelling and analysis. Examples of building energy simulation software that utilize physical models include EnergyPlus, eQuest, and Ecotect. These types of software calculate building energy consumption based on detailed building and environmental parameters such as building construction details; operation schedules; HVAC design information; and climate, sky, and solar/shading information [4]. However, some of such detailed data may not be available to the users at the time of simulation. Failure to provide accurate input can result in poor prediction performance.

Data-driven building energy consumption prediction modelling, on the other hand, does not perform such energy analysis or require such detailed data about the simulated building, and instead learns from historical/available data for prediction. Data-driven energy consumption prediction has gained a lot of research attention in recent years [5], despite its possible limitations (as discussed in Section 8). In response, a number of review studies on the analysis of existing data-driven approaches have been published. The reviews mostly focused on the machine learning methods/algorithms used in previous research efforts. Despite the importance of these efforts, there is still a lack of review studies that analyze existing data-driven approaches from a more multivariate perspective, including data aspects such as what data types and sizes were used and what features were selected for learning. Such a review would help reveal existing research gaps in the field of data-driven building energy consumption prediction and point towards future research directions.

To address this gap, this paper offers a review of data-driven building energy consumption prediction studies that utilized machine learning algorithms, including support vector machines (SVM), artificial neural networks (ANN), decision trees, and other statistical algorithms. The paper focuses on reviewing the types of buildings, temporal granularities, types of energy consumption predicted, types of data, types of features, and data sizes in the existing studies; and provides a discussion of the review results and future research directions. The paper is organized as follows. Section 2 provides a concise overview of existing review studies on data-driven building energy consumption prediction and identifies the gaps in this area. Section 3 gives a brief introduction on the background of data-driven approaches. Section 4 defines the methodology used in this review study. Section 5 reviews previous studies in terms of the scopes of prediction, the data properties and the data preprocessing methods used, the machine learning algorithms utilized for prediction, and the performance measures used for evaluation. Section 6 discusses the previous studies in terms of the temporal granularities of prediction, the types of buildings, and the types of energy consumption predicted. Finally, Section 7 discusses future research directions, Section 8 discusses the limitations of data-driven energy consumption prediction, and Section 9 summarizes the conclusions.

2. Existing review studies on data-driven building energy consumption prediction

Data-driven building energy consumption prediction has gained a lot of attention in recent years. In response, a number of review studies have focused on the analysis of existing data-driven efforts. For example, Zhao and Magoulès [4] classified building energy consumption prediction methods as elaborate engineering methods, simplified engineering methods, statistical methods, ANN-based methods, SVM-based methods, and grey models; and conducted some comparative analysis in terms of model complexity, ease of use, running speed, inputs needed, and accuracy. Ahmad et al. [2] focused on the review of ANN-based, SVM-based, and hybrid methods and discussed the principles, advantages, and disadvantages of these methods. Fumo [6] summarized the classification of building energy consumption prediction methods proposed by various studies and placed a special emphasis on the review of model calibration and verification and weather data used for modelling. Li and Wen [7] conducted an inclusive review; they reviewed state-of-the-art studies not only on building energy modelling and prediction but also on building critical component modelling (e.g., photovoltaic power generation modelling), building energy modelling for demand response (e.g., weather condition forecasting), agent-based building energy modelling, and system identification for building energy modelling. Li et al. [8] reviewed the methods for building energy benchmarking and proposed a flowchart that intends to assist users in choosing the proper prediction method. Chalal et al. [9] focused on both building scale and urban scale energy consumption prediction and further classified and discussed the available methods within each scale. Wang and Srinivasan [10] reviewed and compared the principles, applications, advantages, and disadvantages of single AI-based methods (e.g., ANN and SVM) and ensemble methods.

The majority of these studies provided a comprehensive review on energy consumption prediction research efforts with a particular focus on the machine learning methods/algorithms used in these research studies. Despite the importance of these review efforts, there is still a lack of review studies that cover building energy consumption prediction research in terms of the scopes of prediction (e.g., heating energy consumption), the types of data used (e.g., real data, simulated data), the types of features used for prediction (e.g., outdoor weather conditions, indoor environmental conditions), the sizes of the data (e.g., duration of data collection, number of data instances), and the data preprocessing methods utilized (e.g., data reduction). Such a review is essential for identifying the research gaps and highlighting the future research directions in the field of data-driven building energy consumption prediction.

3. Background

Developing a data-driven model typically consists of four primary steps: data collection, data preprocessing, model training, and model testing. In the field of building energy consumption prediction, data collection involves collecting historical/available data for model training, such as outdoor weather condition and electricity consumption data. Data preprocessing may include data cleaning, data integration, data transformation, and/or data reduction. Model training is the training of the model using a training dataset. Model testing aims to evaluate the model using standard evaluation measures.

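
The four steps above can be sketched end to end in a few lines. This is only a minimal illustration, not a method from the reviewed studies: the temperature/consumption records are made-up values, `fit_line` is a hypothetical helper that fits a one-feature least-squares line, and the test measure (mean absolute error) is chosen for brevity.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# 1. Data collection: (outdoor temperature in degC, electricity use in kWh).
#    These records are invented for illustration; None marks a missing reading.
records = [(20.0, 50.0), (22.0, 54.0), (None, 60.0),
           (26.0, 62.0), (28.0, 66.0), (30.0, 70.0)]

# 2. Data preprocessing: drop records with a missing value (data cleaning).
clean = [(t, e) for t, e in records if t is not None and e is not None]

# 3. Model training on the earlier part of the data.
train, test = clean[:3], clean[3:]
slope, intercept = fit_line([t for t, _ in train], [e for _, e in train])

# 4. Model testing on the held-out, later part of the data.
errors = [abs(e - (slope * t + intercept)) for t, e in test]
mae = sum(errors) / len(errors)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, MAE={mae:.4f}")
```

The split is chronological (earlier records for training, later ones for testing), which is a common choice for time-ordered consumption data but is an assumption here.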

SVM, ANN, decision trees, and other statistical algorithms are the most commonly used supervised machine learning algorithms for model training. SVM is a kernel-based machine learning algorithm, which can be used for both regression and classification [11]. The algorithm is good at solving non-linear problems even with a relatively small amount of training data [4]. SVM solves a non-linear problem through transforming the non-linearity between features x_i (e.g., dry-bulb temperature and global solar radiation) and target y_i (e.g., cooling energy consumption) using linear mapping in two steps. First, it projects the non-linear problem into a high-dimensional space and determines the function f(x) that fits best in the high-dimensional space. Second, it applies a kernel function to turn the complex non-linear mapping into a linear problem. For further details on the prediction principle using SVM, the readers are referred to [9]. SVM is one of the most robust and accurate algorithms and has been listed in the top-ten most influential data mining algorithms in the research community by the IEEE International Conference on Data Mining [11]. It was found to outperform other machine learning algorithms in numerous applications. In order to increase the computational efficiency of SVM, least squares SVM (LS-SVM) (e.g., [12]) and parallel SVM (e.g., [13]) were also implemented in the field of building energy consumption prediction.

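
To make the role of the kernel function more concrete, the sketch below shows the general idea on a toy case: a degree-2 polynomial kernel evaluated in the original 2-D feature space gives the same number as an ordinary dot product in an explicit 3-D feature space. The feature values are hypothetical (imagine a scaled dry-bulb temperature and a scaled solar radiation), and this is not the SVM training procedure itself, only the kernel evaluation it relies on. An RBF kernel, a common choice for SVM, is included for comparison.

```python
import math

def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))

def poly2_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel: k(x, z) = (x . z)^2."""
    return dot(x, z) ** 2

def phi(x):
    """Explicit feature map matching poly2_kernel for 2-D inputs."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2]

def rbf_kernel(x, z, gamma=0.5):
    """Gaussian (RBF) kernel: k(x, z) = exp(-gamma * ||x - z||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)

# Hypothetical scaled feature vectors (temperature, solar radiation).
x = [0.8, 0.3]
z = [0.5, 0.9]

implicit = poly2_kernel(x, z)    # computed in the original 2-D space
explicit = dot(phi(x), phi(z))   # computed in the mapped 3-D space
print(implicit, explicit, rbf_kernel(x, z))
```

The two polynomial results agree up to floating-point rounding, which is why the high-dimensional mapping never has to be computed explicitly.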

ANN is a non-linear computational model, inspired by the human brain. A typical ANN includes three sequential layers: the input layer, the hidden layer, and the output layer. Each layer has a number of interconnected neurons, and each neuron has an activation function. Three types of parameters are typically used to define ANNs: the interconnection pattern between the neurons of the different layers, the learning process of updating the weights of the interconnections, and the activation function that converts a neuron’s weighted input to its output activation [14]. In ANN, each feature (e.g., dry-bulb temperature) is multiplied by its corresponding neuron weight and summed up with the bias. The activation function is then applied to determine the output (e.g., cooling energy consumption). For further details on the prediction principle using ANN, the readers are referred to [9]. ANN is one of the most popular algorithms used in building energy consumption prediction [2]. Examples of ANNs include the back propagation neural networks (BPNN), radial basis function neural networks (RBFNN), general regression neural networks (GRNN), feed forward neural network (FFNN), and adaptive network-based fuzzy inference system (ANFIS). Other methods that can be used in conjunction with ANN include the hierarchical mixture of experts (HME), fuzzy c-means (FCM), and multilayer perceptron (MLP).

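
The weighted-sum-plus-bias computation described above can be written out directly. The following is a minimal forward pass for a network with one hidden layer, assuming a sigmoid activation in the hidden layer and a linear output; the weights, biases, and input features are arbitrary illustrative numbers, not trained values, and no learning step is shown.

```python
import math

def sigmoid(v):
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-v))

def neuron(inputs, weights, bias, activation):
    """Multiply each input by its weight, add the bias, apply the activation."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(weighted_sum)

def forward(features, hidden_layer, output_neuron):
    """Input layer -> one hidden layer (sigmoid) -> single linear output."""
    hidden = [neuron(features, w, b, sigmoid) for w, b in hidden_layer]
    out_weights, out_bias = output_neuron
    return neuron(hidden, out_weights, out_bias, lambda v: v)

# Hypothetical scaled inputs: dry-bulb temperature and relative humidity.
features = [0.3, 0.7]
# Two hidden neurons, each given as (weights, bias); values are arbitrary.
hidden_layer = [([0.5, -0.2], 0.1), ([-0.4, 0.9], 0.0)]
# Output neuron as (weights, bias).
output_neuron = ([1.2, 0.8], 0.05)

print(forward(features, hidden_layer, output_neuron))
```

Training (for example by back propagation) would adjust the weights and biases so that the output matches measured consumption; only the prediction step is sketched here.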

Decision tree algorithms use a tree to map instances into predictions. In a decision tree model, each non-leaf node represents one feature, each branch of the tree represents a different value for a feature, and each leaf node represents a class of prediction. The decision tree is a flexible algorithm that can grow with an increased amount of training data [15]. The classification and regression trees (CART), chi-squared automatic interaction detector (CHAID), random forest (RF), and boosting trees (BT) are the most widely used decision tree methods in the area of building energy consumption prediction.

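
The node/branch/leaf structure described above can be illustrated with a small hand-built tree. The features, thresholds, and consumption classes below are hypothetical and chosen only to show how an instance is routed from the root to a leaf; learning the tree from data (as CART or CHAID would) is not shown.

```python
def predict(node, sample):
    """Walk from the root to a leaf, following one branch per feature test."""
    while "leaf" not in node:
        value = sample[node["feature"]]
        node = node["left"] if value <= node["threshold"] else node["right"]
    return node["leaf"]

# Each non-leaf node tests one feature; each leaf holds a predicted class.
tree = {
    "feature": "dry_bulb_temp_c",
    "threshold": 24.0,
    "left": {"leaf": "low cooling load"},
    "right": {
        "feature": "is_weekday",
        "threshold": 0.5,
        "left": {"leaf": "medium cooling load"},
        "right": {"leaf": "high cooling load"},
    },
}

print(predict(tree, {"dry_bulb_temp_c": 30.0, "is_weekday": 1}))
```

For regression, the leaves would hold numeric consumption values instead of class labels; the traversal stays the same.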

Other statistical algorithms include multiple linear regression (MLR), general linear regression (GLR), ordinary least squares regression (OLS), autoregressive (AR), autoregressive integrated moving average (ARIMA), Bayesian regression, polynomial regression (poly), exponential regression, multivariate adaptive regression splines (MARS), case-based reasoning (CBR), and k-nearest neighbors (kNN).

Algorithms used for developing energy consumption prediction models have advantages and disadvantages. For example, ANN and SVM require many parameters and might become computationally expensive, but their prediction accuracy is, in many cases, better than decision trees and statistical algorithms. Decision trees and other statistical algorithms, on the other hand, are generally easy to use and computationally inexpensive, but their performance is usually fair [4].

4. Methodology

The research methodology was composed of five primary steps:

1. Conducting a keyword-based search: A keyword-based search of research articles and abstracts was conducted using Google Scholar. Examples of the keywords that were used are: building energy estimation, building energy use prediction, building energy consumption forecasting, building energy modelling. Google Scholar was selected because it can rank articles based on factors such as number of citations, authors, and publisher.
2. Screening the retrieved articles: The articles were screened for relevance using the following criteria: (1) the approach must be data-driven; and (2) the purpose must be to predict building energy consumption.
3. Identifying and screening additional articles: The articles that cited or were cited by an article that passed the screening test were further identified as additional candidate articles. These articles were further screened using the same two relevance criteria defined above.
4. Reviewing all relevant articles: All articles identified in steps 2 and 3 were analytically reviewed to define their purpose of prediction, scope of prediction, data properties and data preprocessing methods, machine learning algorithm(s), and performance.
5. Analyzing the review results to identify gaps and future directions: The review results were analyzed to identify the research gaps in the field of data-driven building energy consumption prediction and highlight future research directions.

5. Review of existing data-driven energy consumption prediction models

5.1. Scope of prediction

The scope of the studies was classified in terms of type of building, temporal granularity, and type of energy consumption predicted. Two types of buildings (residential and non-residential), five types of temporal granularities (sub-hourly, hourly, daily, monthly, and yearly), and four types of energy consumption (heating, cooling, lighting, and overall energy consumption) were defined.

Existing models covered residential and/or non-residential buildings, with different temporal granularities and for different types of energy consumption. Fig. 1 shows the distribution of the reviewed models according to type of building, temporal granularity, and type of energy consumption. Only 19% of these models focused on residential buildings, with the remaining models focusing on non-residential buildings including commercial and educational buildings. The majority of these models, 57%, were developed for predicting hourly energy consumption, while 12%, 15%, 4%, and 12% of the models focused on sub-hourly, daily, monthly, and yearly consumption, respectively. Overall, 47% of the models focused on predicting overall energy consumption, with 31% and 20% focusing on cooling and heating energy consumption, respectively, and only 2% focusing on lighting energy consumption prediction. The scope of each reviewed model is summarized in Table 1, in terms of building type, temporal granularity, type of energy consumption, and purpose of prediction.

5.2. Data properties and data preprocessing

5.2.1. Types of data: real, simulated, or benchmark

Data were classified into three types: (1) real data, (2) simulated data, and (3) public benchmark data (e.g., datasets provided for energy consumption prediction competitions). Fig. 1 shows the distribution of the reviewed studies by type of data used for training and testing. The majority (67%) of these studies used real data to train and test their models, while 19% and 14% of the studies used simulated and public benchmark data, respectively. Table 1 shows the types of data used in the reviewed studies.

Real data cover data collected through smart energy meters, sensors, building management systems, and weather stations; in addition to utility bills, energy consumption surveys, and energy consumption statistics and reports [16]. Sensor-based approaches have several advantages and disadvantages. On one hand, sensor-based approaches provide actual indoor environmental condition data and energy consumption levels. On the other hand, installing sensors brings an additional cost and effort not only to install the required sensors, but also to test and ensure the quality of the data collected [12]. Otherwise, sensor data may include noise, missing values, and/or outliers, which would affect the performance of the prediction models adversely.

Simulation-based studies, on the other hand, model an existing or non-existent building in a building energy simulation software tool (such as EnergyPlus, DeST, DOE2, or Ecotect) and obtain the needed data through running the simulations. By the nature of modelling, a model cannot fully represent its prototype or behave exactly as it does. For example, Li et al. [17] showed that current building energy software tools are, in some cases, limited in evaluating the performance of energy conservation measures. Simulation data are, however, useful in cases where real data are limited (e.g., when instrumenting a building is difficult due to technical difficulties and/or economic reasons).

Other studies (e.g., [12,18,19]) utilized publicly available benchmark datasets such as the ASHRAE’s Great Building Energy Predictor Shootout and EUNITE datasets. This type of dataset provides benchmark data that can be used to compare the performance of different models.

5.2.2. Types of features

A machine learning model predicts energy consumption based on a set of features. These features can be related to outdoor weather conditions, indoor environmental conditions, building characteristics, time, occupancy and occupant energy use behavior, and/or historical energy consumption. Outdoor weather condition features include dry-bulb temperature, dew point temperature, relative humidity, global solar radiation, wind speed, wind direction, degree of cloudiness, pressure, rainfall amount, and evaporation. Indoor environmental condition features include room temperature, room relative humidity, and indoor lighting level. Building characteristic features include relative compactness, surface area, wall area, roof area, overall height, orientation, glazing area, glazing area distribution, mean heat transfer coefficient of building walls, mean thermal inert index of building walls, roof heat transfer coefficient, building size coefficient, absorption coefficient for solar radiation of exterior walls, eastern window-wall ratio (WWR), western WWR, southern WWR, northern WWR, mean WWR, shading coefficient (SC) of eastern window, SC of western window, SC of southern window, SC of northern window, and integrated SC. Time features include the type of day (e.g., weekday, weekend, holiday) and the type of hour (e.g., daytime, nighttime). Occupant energy use behavior and occupancy features include building use schedule, heat gain through lights and people, water temperature, and number of occupants.

For all these types of features, some studies used data from various past time steps (e.g., the past hour). For example, Li et al. [20] used the current outdoor dry-bulb temperature, the outdoor dry-bulb temperature of an hour ago, the outdoor dry-bulb temperature of two hours ago, the current relative humidity, the current solar radiation, and the solar radiation of an hour ago to predict building cooling load. Jain et al. [16] used the electricity consumption of the previous two time steps, the current temperature, the current solar flux, an indicator for weekend/holiday or weekday, the sine of the current hour, and the cosine of the current hour to predict the electricity consumption of a multi-family residential building. Table 1 summarizes the features used in the reviewed models.

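
As a rough illustration of how such lagged and cyclical features can be assembled, the sketch below builds rows containing the consumption of the previous two time steps, the current temperature, and the sine and cosine of the current hour. It covers only a subset of the features listed above (solar flux and the weekend/holiday indicator are omitted), the function name `build_features` is hypothetical, and the sample values are invented.

```python
import math

def build_features(hours, temps, loads, n_lags=2):
    """Return (feature_row, target) pairs using lagged loads and hour encoding."""
    rows = []
    for t in range(n_lags, len(loads)):
        # Consumption at t-1, t-2, ... (most recent first).
        row = [loads[t - k] for k in range(1, n_lags + 1)]
        # Encode the hour on a circle so that 23:00 and 00:00 end up close.
        angle = 2.0 * math.pi * hours[t] / 24.0
        row += [temps[t], math.sin(angle), math.cos(angle)]
        rows.append((row, loads[t]))
    return rows

# Invented sample series: hour of day, temperature (degC), consumption (kWh).
hours = [0, 6, 12, 18]
temps = [10.0, 12.0, 18.0, 15.0]
loads = [5.0, 7.0, 9.0, 8.0]

for features, target in build_features(hours, temps, loads):
    print(features, "->", target)
```

The first `n_lags` time steps produce no row because their lagged values are not available.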

5.2.3. Data sizes

The sizes of datasets varied from 2-week (e.g., [21]) to 4-year energy consumption data (e.g. [22,23]). A small dataset may not be able to capture a representative sample of data, whereas a large dataset requires a lot of computational effort to process. The majority (56%) of the reviewed studies utilized one-month to one-year long datasets; 9% utilized datasets shorter than one-month; and 31% utilized datasets longer than one-year. Table 1 shows the dataset sizes used in the reviewed studies.

5.2.4. Data preprocessing

Data preprocessing is essential for any data-driven approach, because any incorrect or inconsistent data can cause errors in the analysis [24]. Data preprocessing may include data cleaning, data integration, data transformation, and/or data reduction. Data cleaning is the process of detecting and correcting (completing, modifying, replacing, and/or removing) the incomplete, incorrect, inaccurate, irrelevant, and/or noisy parts of the data. For example, data collected through sensors are usually noisy and often incomplete [25]. Data integration is the process of combining multiple data from different sources. For example, outdoor weather condition data and hourly electricity consumption data come from different sources, but are combined in a single dataset for training and testing. Data transformation is the process of transforming the data into the format that is required by the learning algorithm. Data transformation may include normalization, smoothing, aggregation/disaggregation, and/or generalization of the data. Data reduction is the process of reducing the dimensionality of the dataset, which is not only computationally more efficient but may also enhance the performance of the machine learning algorithm by removing non-discriminative features. There are different techniques for data reduction including principal component analysis (PCA) and kernel PCA (KPCA). For example, Xuemei et al. [26] applied PCA and KPCA for reducing the dimensionality of the data and compared the performances of SVM with PCA, SVM with KPCA, and SVM without any data reduction techniques. They also applied C-means clustering to ensure that the training samples were chosen based on the similarity degree of the input samples and compared the performances of fuzzy C-means (FCM) fuzzy SVM, FCM-SVM, and SVM without any clustering [27].

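
Two of the preprocessing steps named above, data cleaning and data transformation, can be sketched as follows. This is a simplified example under stated assumptions: missing sensor readings are represented as `None` and filled with the previous valid reading (one of several possible cleaning strategies), and normalization is plain min-max scaling to the range [0, 1]. The helper names and readings are hypothetical.

```python
def fill_missing(values):
    """Data cleaning: replace None with the previous valid reading.

    A missing value at the very start has no predecessor and stays None.
    """
    filled, last = [], None
    for v in values:
        if v is None:
            v = last
        filled.append(v)
        last = v
    return filled

def min_max_normalize(values):
    """Data transformation: rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant series carries no spread; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Invented hourly readings with one missing value.
raw = [1.0, None, 3.0]
cleaned = fill_missing(raw)
scaled = min_max_normalize(cleaned)
print(cleaned, scaled)
```

Data reduction techniques such as PCA need linear algebra routines and are left out of this sketch.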

5.3. Machine learning algorithms

A machine learning algorithm is needed to train an energy consumption prediction model. Previous studies in data-driven building energy consumption prediction have utilized SVM, ANN, decision trees, and/or other statistical algorithms. Fig. 1 shows the distribution of the studies by type of machine learning algorithm. Overall, 47% and 25% of the studies utilized ANN and SVM, respectively, to train their models. Only 4% of the studies utilized decision trees. On the other hand, 24% of the studies utilized other statistical algorithms such as MLR, OLS, and ARIMA.

Some studies also compared the effectiveness of different algorithms in energy consumption prediction. For example, Li et al. [20] compared SVM and BPNN; Borges et al. [28] compared SVM and AR; Xuemei et al. [29] compared LS-SVM and BPNN; Liu and Chen [21] compared SVM and ANN; Penya et al. [30] compared poly, exponential, mixed, AR, ANN, SVM, and Bayesian Network; Platon et al. [31] compared ANN and CBR; Jain et al. [32] compared SVM and MLR; Hou et al. [33] compared ARIMA and ANN; Penya et al. [34] compared AR, ARIMA, ANN, and Bayesian Network; Fan et al. [35] compared MLR, ARIMA, SVM, RF, MLP, BT, MARS, and kNN; Chou and Bui [36] compared ANN, SVM, CART, CHAID, and GLR; Edwards et al. [12] compared MLR, FFNN, SVM, LS-SVM, HME-FFNN, and FCM-FFNN; Li et al. [37] and Li et al. [38] compared SVM, BPNN, RBFNN, and GRNN; Dagnely et al. [22] compared OLS and SVM; Massana et al. [39] compared MLR, MLP, and SVM; and Fernandez et al. [40] compared AR, poly, ANN, and SVM.

5.4. Performance evaluation

Model testing is the evaluation of the prediction model using some standard evaluation measures. The most commonly used evaluation measures of energy consumption prediction models are the coefficient of variation (CV), mean absolute percentage error (MAPE), and root mean square error (RMSE). These measures can be calculated using Eqs. (1)–(3). Overall, 41%, 29%, and 16% of the reviewed studies utilized CV, MAPE, and RMSE, respectively, to evaluate their models. Other measures used for evaluating energy consumption prediction include the mean absolute error (MAE), mean bias error (MBE), mean squared error (MSE), R-squared (R2), and error rate (δ). These measures can be calculated using Eqs. (4)–(8). CV is the most commonly used evaluation measure, probably for two reasons. First, it is one of the performance evaluation measures recommended by ASHRAE for evaluating energy consumption prediction models. Second, it normalizes the prediction error by the average energy consumption and provides a unitless measure that is more convenient for comparison purposes.

模型测试是使用一些标准的评估方法对预测模型进行评估。能耗预测模型最常用的评价指标是变异系数（CV）、平均绝对百分比误差（MAPE）和均方根误差（RMSE）。这些度量可以用等式来计算。（1到3）。总的来说，41%、29%和16%的被审查研究分别使用了CV、MAPE和RMSE来评估他们的模型。用于评估能耗预测的其他指标包括平均绝对误差（MAE）、平均偏差误差（MBE）、均方误差（MSE）、R平方（R2）和误差率（δ）。这些度量可以用等式来计算。（4到8）。CV是最常用的评价指标，可能有两个原因。首先，它是ASHRAE推荐的用于评价能耗预测模型的性能评价指标之一。其次，它用平均能耗对预测误差进行归一化处理，并提供了一种更便于比较的无量纲度量。
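The equations referenced above are not reproduced in this post. As a minimal sketch, the measures can be computed as follows, assuming the commonly used textbook definitions: CV is taken as RMSE divided by the mean actual consumption, and MBE uses the sign convention predicted minus actual. The function name `evaluation_measures` is illustrative, and the error rate (δ) is omitted because its exact definition is not given here. Note that some guidelines divide by n − p rather than n when computing CV; this sketch uses n.

```python
import math


def evaluation_measures(actual, predicted):
    """Return common error measures for paired actual/predicted values.

    Assumed definitions (not taken from the paper's equations):
    CV = RMSE / mean(actual), MBE = mean(predicted - actual).
    """
    n = len(actual)
    if n == 0 or n != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")

    errors = [p - a for a, p in zip(actual, predicted)]
    mean_actual = sum(actual) / n

    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mae = sum(abs(e) for e in errors) / n
    mbe = sum(errors) / n
    # MAPE is undefined if any actual value is zero.
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    # CV normalizes RMSE by the mean consumption, giving a unitless percentage.
    cv = 100.0 * rmse / mean_actual
    # R-squared: 1 - residual sum of squares / total sum of squares.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - sum(e * e for e in errors) / ss_tot

    return {"CV": cv, "MAPE": mape, "RMSE": rmse, "MAE": mae,
            "MBE": mbe, "MSE": mse, "R2": r2}


# Hypothetical hourly readings (kWh) and model predictions.
measures = evaluation_measures([100, 200, 300, 400], [110, 190, 310, 390])
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

Because CV and MAPE are percentages, they can be compared across buildings of different sizes, whereas RMSE, MAE, and MSE carry the units of the consumption data.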

6. Discussion

6.1. Temporal granularities

Both short-term (e.g., sub-hourly, hourly, or daily) and long-term (e.g., yearly) energy consumption prediction are essential for building and grid design and operation. For example, “HVAC operations including adjusting the starting time of cooling to meet start-up loads, minimizing or limiting the electric on-peak demand, optimizing the costs and energy utilization in cool storage systems, and related energy and cost needs in other HVAC systems” all benefit from short-term energy consumption prediction [26]. Short-term energy consumption prediction models are also utilized for maintaining economic and secure operation of power grids and for providing energy consumption data to building occupants to better negotiate energy prices with energy retailers [40]. Among the reviewed literature, 84% of the studies focused on short-term energy consumption prediction because of its direct relation to the day-to-day operations of buildings [35].

短期（例如，亚小时、每小时或每天）和长期（如每年）的能耗预测对于建筑和电网的设计和运行都是至关重要的。例如，“暖通空调运行，包括调整制冷开始时间以满足启动负荷，最小化或限制用电高峰需求，优化蓄冷系统的成本和能源利用率，以及其他暖通空调系统的相关能源和成本需求”都得益于短期能耗预测[26]。短期能源消耗预测模型还可用于维持电网的经济和安全运行，并向建筑住户提供能耗数据，以便更好地与能源零售商协商能源价格[40]。在回顾的文献中，84%的研究集中在短期能源消耗预测上，因为它直接关系到建筑物的日常运行[35]。

Only 12% of the studies focused on long-term (yearly) energy consumption prediction. Several reasons may explain this. First, to achieve good performance, long-term energy consumption prediction requires a relatively larger amount of data that covers a long time span [79]. For example, prediction errors of annual energy consumption prediction models, which were developed based on 1-day, 1-week, and 3-month measurements, were 100%, 30%, and 6%, respectively [80]. Second, nonlinearity in long-term data is usually more prominent compared to short-term data [81]. Third, uncertainties in long-term energy consumption prediction are usually higher because of the many changes that may occur in the supply and demand over a long time span. Long-term energy consumption prediction, thus, requires specific long-term prediction models due to the non-homogeneity and significant changes that may occur in the long run [82]. Despite their challenges, long-term energy consumption prediction models are essential; they are required when studying decisions with long-term implications such as capacity expansion, energy supply strategy, and capital investment [83].

只有12%的研究集中在长期（每年）的能源消耗预测上。这可能是由几个原因造成的。首先，为了获得良好的性能，长期的能源消耗预测需要覆盖较长时间跨度的相对较高的数据量[79]。例如，根据1天、1周和3个月的测量数据建立的年度能源消耗预测模型的预测误差分别为100%、30%和6%[80]。其次，与短期数据相比，长期数据的非线性通常更为突出[81]。第三，长期能源消费预测中的不确定性通常较高，因为在很长一段时间内，供需可能会发生许多变化。因此，长期能源消耗预测需要特定的长期预测模型，因为从长期来看可能会发生不均匀性和重大变化[82]。尽管存在挑战，但长期能源消耗预测模型是必不可少的；在研究产能扩张、能源供应战略和资本投资等长期影响的决策时，需要这些模型[83]。
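Why a short measurement window hurts annual prediction can be illustrated with a toy calculation. The sketch below assumes a purely hypothetical seasonal profile (a cosine with a winter peak) and a naive estimator that scales the mean of the first N measured days to a full year. It does not reproduce the models or the error figures reported in [80]; it only shows the direction of the effect.

```python
import math


def daily_consumption(day):
    """Hypothetical daily consumption (kWh) with a winter peak at day 0."""
    return 100.0 + 50.0 * math.cos(2.0 * math.pi * day / 365.0)


def annualization_error(days_measured):
    """Percent error when the mean of the first N days is scaled to a year."""
    year = [daily_consumption(d) for d in range(365)]
    true_annual = sum(year)
    estimate = 365.0 * sum(year[:days_measured]) / days_measured
    return 100.0 * abs(estimate - true_annual) / true_annual


# Measurement spans: one day, one week, roughly three months, one year.
errors = {n: annualization_error(n) for n in (1, 7, 90, 365)}
for n, err in errors.items():
    print(f"{n:3d} days measured -> {err:.1f}% annual error")
```

In this toy setting the error shrinks as the measured span covers more of the seasonal cycle and vanishes only when the full year is observed.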

6.2. Building types

About 81% of the reviewed research efforts focused on developing energy consumption prediction models for commercial and/or educational buildings, with only 19% focusing on residential buildings. The relative lack of studies on residential buildings could be due to a number of reasons. First, the lack of data – specifically sensor-based data – could be a main reason. The majority, 73%, of non-residential building energy consumption prediction models rely on sensor data for algorithm training. Such data are much harder to obtain for residential buildings because the majority of buildings are not sufficiently metered in a way that allows for sensing at high granularity [10]. Another reason could be the complexity of predicting energy consumption in residential contexts because of the relatively higher variability of occupant behavior compared to the commercial context [16]. Occupant behavior is the greatest uncertainty in building energy consumption prediction [84]; ignoring, misunderstanding, and/or underestimating the role of occupant behavior in affecting energy consumption is one of the main causes for the deviations between the predicted and the actual consumption levels [85].

大约81%的研究工作集中在开发商业和/或教育建筑的能源消耗预测模型上，只有19%的研究集中在住宅建筑上。对住宅建筑的研究相对缺乏可能是由于许多原因造成的。首先，缺乏数据——特别是基于传感器的数据——可能是一个主要原因。非住宅建筑能耗预测模型大多（73%）依赖传感器数据进行算法训练。对于住宅建筑来说，这类数据更难获得，因为大多数建筑的计量方式不足以实现高粒度的传感[10]。另一个原因可能是住宅环境中预测能源消耗的复杂性，因为与商业环境相比，居住者行为的可变性相对较高[16]。居住者行为是建筑能耗预测中最大的不确定性[84]；忽视、误解和/或低估居住者行为在影响能耗中的作用是导致预测和实际能耗水平偏差的主要原因之一[85]。

Despite their challenges, residential building energy consumption predictions are needed because of the high energy consumption share of this sector and the potentially high gain that can be achieved if successful energy-reducing strategies are implemented. Residential buildings represent 21% of the total energy consumption in the US, which is greater than the share of commercial buildings [86]. Further studies are, thus, needed on the residential sector. For example, experimental studies could be conducted to see if/how existing data-driven commercial building energy consumption prediction models could be extended to the residential context.

尽管存在挑战，但住宅建筑能耗预测仍然是必要的，因为这一部门的能源消耗份额很高，而且如果实施成功的节能战略，就可能实现高收益。在美国，住宅建筑占总能耗的21%，高于商业建筑的比重[86]。因此，需要对住宅部门进行进一步的研究。例如，可以进行实验研究，看看现有的数据驱动的商业建筑能耗预测模型是否/如何能够扩展到住宅环境中。

6.3. Energy consumption types

As discussed in Section 5, 46%, 31%, 20%, and 2% of the reviewed research efforts focused on predicting overall, cooling, heating, and lighting energy consumption, respectively. This shows a relative lack of studies on predicting lighting loads. This might be caused by the predominant impact of occupant behavior on lighting energy consumption. Lighting use is directly impacted by building occupancy and occupant behavior patterns [87]. For example, 500 lx is the recommended illuminance level for office buildings [88]. Theoretically, people who have access to natural lighting are expected to use artificial lighting less when the outdoor illumination is sufficient [89]. However, Yun et al. [87] showed that there are no statistically significant relationships between outdoor illuminance and artificial lighting use patterns.

正如第5节所讨论的，46%、31%、20%和2%的研究工作集中在预测总体、冷却、加热和照明能耗上。这表明在预测照明负荷方面的研究相对缺乏。这可能是由居住者行为对照明能耗的主要影响造成的。照明使用直接受到建筑物占用率和居住者行为模式的影响[87]。例如，500 lx是办公楼的建议照度水平[88]。从理论上讲，当室外照明充足时，可以使用自然光的人应该少用人工照明[89]。但是，Yun等人[87]表明，室外照度和人工照明使用模式之间没有统计学上的显著关系。

Despite these reasons, lighting energy consumption prediction is essential for building energy efficiency and for efficient supply-side management. Lighting represents almost 20% of the global electricity consumption [90]. Since it is a major heat source, lighting is not only a significant piece of building energy consumption by itself, but it also impacts the cooling energy demand [77]. In general, one-third of the cooling energy consumption can be saved if a good balance between natural light and solar heat can be achieved [57]. In addition, different building design features – in terms of building envelope, architectural features, and building materials – may have different impacts on lighting energy consumption [91]. Lighting energy consumption prediction models, thus, require more attention to better understand lighting energy consumption trends and conservation opportunities, the interaction between cooling load and lighting, and the impacts of various design features on consumption levels.

尽管有这些原因，照明能耗预测对于建筑节能和有效的供应侧管理至关重要。照明几乎占全球用电量的20%[90]。由于照明是一种主要的热源，照明本身不仅是建筑能耗的一个重要组成部分，而且还会影响到制冷能源的需求[77]。一般来说，如果能在自然光和太阳能热之间实现良好的平衡，可以节省三分之一的制冷能耗[57]。此外，不同的建筑设计特征——就建筑围护结构、建筑特征和建筑材料而言——可能会对照明能耗产生不同的影响[91]。因此，照明能耗预测模型需要更多的关注，以便更好地了解照明能耗趋势和节能机会、冷负荷与照明的相互作用以及各种设计特征对能耗水平的影响。

7. Future research directions

Many of the research challenges discussed above can be attributed to insufficiency of data (in terms of representativeness, size, etc.) and/or complexity of occupant energy use behavior. Two future research directions are discussed in this regard.

上面讨论的许多研究挑战可归因于数据不足（在代表性、规模等方面）和/或居住者能源使用行为的复杂性。在此基础上，讨论了今后的两个研究方向。

One growing research direction is big energy data analytics. With the advent of smart meters and advanced metering infrastructure (AMI), larger volumes of monitoring data will become available. Making these data accessible to the research community may open unprecedented opportunities for researchers to better understand building energy efficiency. Establishing a roadmap – including which buildings to monitor and in which locations to ensure data representativeness – could also help consolidate the many research efforts in the area of building energy efficiency, in order to eliminate duplication of efforts, provide more coverage of research questions and methods, and create a stronger research impact in the area of building energy consumption prediction. Future research directions in the area of big energy data analytics include building energy efficiency retrofitting, occupant behavior analysis, and smart energy management. For example, Mathew et al. [92] presented a vision for the potential use of big data analytics in energy efficiency retrofits. Zhou and Yang [93] proposed a vision for interdisciplinary research to analyze and understand individuals' energy consumption behavior using big energy data analytics. Zhou et al. [94] presented a comprehensive vision for big-data-driven smart energy management, including smart power generation, power transmission, power distribution and transformation, and demand side management.

一个日益增长的研究方向是大能源数据分析。随着智能电表和先进计量基础设施（AMI）的出现，更大规模的监测数据将变得可用。将这些数据提供给研究团体可能会为研究人员提供前所未有的机会，以便更好地了解建筑节能。制定一个路线图——包括监测哪些建筑以及在哪些位置确保数据的代表性——也有助于整合建筑节能领域的众多研究工作，以消除重复工作，提供更多研究问题和方法的覆盖面，并在建筑能耗预测领域产生更强的研究影响力。未来大能源数据分析领域的研究方向包括建筑节能改造、居住者行为分析和智能能源管理。例如，Mathew等人[92]提出了在节能改造中使用大数据分析的设想。Zhou和Yang[93]提出了一个跨学科研究的愿景，即利用大能源数据分析来分析和理解个人的能源消费行为。Zhou等人[94]提出了大数据驱动的智能能源管理的全面构想，包括智能发电、输电、配电变电和需求侧管理。

Another important research direction is behavioral energy efficiency. More efforts to capture and study occupant energy use behavior are needed to better understand how energy use behavior affects energy consumption, what the energy wasting and saving behaviors are, and how much improved behaviors can save energy. For example, Turner and Hong [95] recently proposed a framework to capture occupant energy use behavior but did not test their framework in a real-world setting. Empirical studies for capturing occupant energy use behavior and studying their impact on energy consumption are thus needed. Three sub-challenges are, however, associated with energy use behavior studies. One is the cost and time associated with real data collection, as noted above. Another is the difficulty in conducting such studies on a representative sample of occupants; behavior is highly personal and variable across different types of people and more difficult to generalize than other types of energy data. The last is the potential privacy concerns associated with tracking the behavior of occupants.

另一个重要的研究方向是行为能量效率（behavioral energy efficiency）。为了更好地理解能源使用行为是如何影响能源消耗的，什么是能源浪费和节约行为，以及改进后的行为能在多大程度上节约能源，还需要更多的努力来捕捉和研究居住者的能源使用行为。例如，Turner和Hong[95]最近提出了一个框架来捕捉居住者的能源使用行为，但没有在真实环境中测试他们的框架。因此，需要进行实证研究，以捕捉居住者的能源使用行为，并研究其对能源消耗的影响。然而，三个次级挑战与能源使用行为研究有关。一是与实际数据收集相关的成本和时间，如上所述。另一个问题是在有代表性的居住者样本上进行这类研究的难度；行为具有高度的个人性，在不同类型的人群中变化很大，比其他类型的能源数据更难概括。最后一个是与跟踪居住者行为相关的潜在隐私问题。

In addition to these two primary directions, future research efforts could also explore the use of other types of machine learning algorithms in energy consumption prediction. For example, deep learning algorithms have been proven to outperform other machine learning algorithms in many other fields (e.g., image classification and multi-modal data analysis [96]) but have not been sufficiently studied in the field of building energy consumption prediction yet.

除了这两个主要方向，未来的研究工作还可以探索其他类型的机器学习算法在能源消耗预测中的应用。例如，深度学习算法在许多其他领域（如图像分类和多模态数据分析[96]）优于其他机器学习算法，但在建筑能耗预测领域尚未得到充分研究。

As new data-driven models are developed, sharing more information about the development process and purpose, validation, and reusability of these models will be essential to avoid unnecessary duplication of research efforts. Some important model information (e.g., purpose of prediction) is sometimes not reported or not sufficiently described. Insufficient information offers limited guidance on whether certain models are applicable in a new context or not, which could inhibit the reusability of the models.

随着新的数据驱动模型的开发，共享更多关于这些模型的开发过程和目的、验证和可重用性的信息对于避免不必要的重复研究工作至关重要。一些重要的模型信息（例如预测的目的）有时没有被报告或没有得到充分的描述。信息不足对某些模型是否适用于新的环境提供了有限的指导，这可能会抑制模型的可重用性。

8. Limitations of data-driven energy consumption prediction and applicability considerations

Despite the importance of data-driven approaches, data-driven energy consumption prediction has two main limitations. First, data-driven prediction models may not perform well outside of their training range. Assumptions made by the learning algorithm have implications on the model’s ability to cope with new data outside of the training data and whether it would generalize well beyond the training range or not [7]. For example, a model that was trained by learning from a limited dataset (e.g., data collected from a small set of buildings) may not perform well outside of the training data (e.g., different types of buildings in terms of physical properties, operation strategies, weather conditions, occupant behavior, etc.). The dataset used for training must, thus, be representative of the range of application and contain sufficient variety. Collecting such sufficiently representative and wide-ranging data may be difficult, costly, and/or time consuming [9]. It is, therefore, crucial to consider the training range when determining the suitability of using a data-driven model in a specific application. For example, using a data-driven approach for exploratory analysis of what-if scenarios outside of the training range may be unsuitable, or should at least be done with caution.

尽管数据驱动方法很重要，但是数据驱动的能源消耗预测有两个主要的局限性。首先，数据驱动的预测模型在训练范围之外可能表现不佳。学习算法所做的假设会影响模型处理训练数据之外的新数据的能力，以及它是否能够很好地推广到训练范围之外[7]。例如，通过从有限的数据集（例如，从一小组建筑物收集的数据）中学习得到的模型，在训练数据之外（例如，不同类型的建筑物在物理特性、操作策略、天气条件、居住者行为等方面）可能表现不佳。因此，用于训练的数据集必须代表应用范围并包含足够的多样性。收集此类具有充分代表性且范围广泛的数据可能很困难、成本高昂和/或耗时[9]。因此，在确定在特定应用中使用数据驱动模型的适用性时，考虑训练范围是至关重要的。例如，使用数据驱动的方法对训练范围之外的假设情景进行探索性分析可能是不合适的，或者至少应当谨慎使用。
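The training-range limitation can be made concrete with a small sketch. It assumes a hypothetical U-shaped relation between outdoor temperature and daily load (heating below 18 °C, cooling above) and fits a straight line using only warm-weather data from 20 °C to 30 °C. The function names and all numbers are illustrative, not taken from any reviewed study.

```python
def true_load(temp_c):
    """Hypothetical U-shaped daily load (kWh) around an 18 °C balance point."""
    return 20.0 + 0.5 * (temp_c - 18.0) ** 2


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x


# Training data covers only the cooling season (20 °C to 30 °C).
train_temps = list(range(20, 31))
slope, intercept = fit_line(train_temps, [true_load(t) for t in train_temps])


def predict(temp_c):
    """Prediction of the fitted line at a given outdoor temperature."""
    return intercept + slope * temp_c


# Worst error inside the training range versus error at 0 °C (never seen).
in_range_error = max(abs(predict(t) - true_load(t)) for t in train_temps)
out_of_range_error = abs(predict(0) - true_load(0))
print(f"max error inside training range: {in_range_error:.1f} kWh")
print(f"error at 0 °C (outside range):   {out_of_range_error:.1f} kWh")
```

Inside the training range the line is a tolerable approximation, but at 0 °C it predicts a negative load because it never observed heating behavior. This is the kind of what-if extrapolation the paragraph above warns about.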

Second, data-driven prediction models are black-box models – their internals are not known. A black-box model may provide sufficient prediction accuracy, but may be limited in providing a detailed understanding of the different parameters and its behavior in terms of energy consumption [97].

第二，数据驱动的预测模型是黑盒模型，其内部结构未知。黑盒模型可以提供足够的预测精度，但在提供不同参数及其在能耗方面的行为的详细理解方面可能受到限制[97]。

Hybrid or grey-box modelling approaches, on the other hand, offer a combination of physical and data-driven prediction models, thereby leveraging the advantages and minimizing the disadvantages of both approaches. In grey-box models, some internal parameters and equations are physically interpretable. Grey-box models may also show better performance compared to black-box and white-box models. For example, Dong et al. [98] developed a hybrid model, which couples a data-driven model and a thermal network model, for predicting the total and non-AC energy consumption of residential buildings and compared its prediction performance to ANN-, SVM-, LSSVM-, Gaussian mixture model (GMM)-, and Gaussian process regression (GPR)-based models. Similarly, Li et al. [99] developed a hybrid improved particle swarm optimization (iPSO)-ANN model for predicting building electricity consumption. The results of both studies showed that these hybrid models offered some performance improvement.

另一方面，混合或灰箱建模方法提供了物理预测模型和数据驱动预测模型的组合，从而充分利用了这两种方法的优点并将其缺点降至最低。在灰箱模型中，一些内部参数和方程是可以物理解释的。与黑箱和白箱模型相比，灰箱模型也可能显示出更好的性能。例如，Dong等人[98]开发了一个混合模型，该模型将数据驱动模型和热网络模型相结合，用于预测住宅建筑的总能耗和非空调能耗，并将其预测性能与基于ANN、SVM、LSSVM、高斯混合模型（GMM）和高斯过程回归（GPR）的模型进行了比较。同样，Li等人[99]开发了一种混合改进粒子群优化（iPSO）-ANN模型，用于预测建筑用电量。两项研究的结果显示，这些混合模型提供了一些性能改善。
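The grey-box idea can be sketched in a few lines. The example below is not the model of Dong et al. [98] or Li et al. [99]; it is a hypothetical illustration in which a physical heat-loss term, UA × (T_indoor − T_outdoor), is combined with a data-driven correction learned from the residuals for each occupancy state. The constant `T_INDOOR`, the synthetic observations, and all function names are assumptions made for the example.

```python
T_INDOOR = 21.0  # assumed indoor setpoint in °C


def make_observations():
    """Synthetic (outdoor temp, occupied, energy) records, noise-free."""
    obs = []
    for t_out in (-5.0, 0.0, 5.0, 10.0, 15.0):
        for occupied in (False, True):
            energy = 0.8 * (T_INDOOR - t_out) + (5.0 if occupied else 0.0)
            obs.append((t_out, occupied, energy))
    return obs


def fit_ua(obs):
    """Physical part: least-squares heat-loss coefficient through the origin."""
    num = sum((T_INDOOR - t) * e for t, _, e in obs)
    den = sum((T_INDOOR - t) ** 2 for t, _, _ in obs)
    return num / den


def fit_residual_by_occupancy(obs, ua):
    """Data-driven part: mean residual of the physical model per occupancy state."""
    sums = {False: [0.0, 0], True: [0.0, 0]}
    for t, occupied, e in obs:
        sums[occupied][0] += e - ua * (T_INDOOR - t)
        sums[occupied][1] += 1
    return {occ: total / count for occ, (total, count) in sums.items()}


def mean_abs_error(obs, predict):
    """Mean absolute error of a prediction function over the observations."""
    return sum(abs(predict(t, occ) - e) for t, occ, e in obs) / len(obs)


observations = make_observations()
ua = fit_ua(observations)
correction = fit_residual_by_occupancy(observations, ua)

physical_mae = mean_abs_error(
    observations, lambda t, occ: ua * (T_INDOOR - t))
hybrid_mae = mean_abs_error(
    observations, lambda t, occ: ua * (T_INDOOR - t) + correction[occ])
print(f"UA = {ua:.3f}, physical MAE = {physical_mae:.2f}, hybrid MAE = {hybrid_mae:.2f}")
```

The fitted `ua` remains interpretable as a heat-loss coefficient, while the residual term absorbs an effect (occupancy) that the physical term does not describe. The errors here are measured on the same data used for fitting, so the comparison is illustrative only.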

9. Conclusions

This paper presented an overview of recent research efforts in the area of data-driven building energy consumption prediction. The scope of a set of models was reviewed in terms of building types (i.e., residential and non-residential), temporal granularities of prediction (i.e., sub-hourly, hourly, daily, monthly, and yearly), and types of energy consumption predicted (i.e., heating, cooling, lighting, and overall). The properties of the data used for training and testing these models were reviewed, including the types of data (i.e., real, simulation, and public benchmark data), the types of features (i.e., features related to outdoor weather conditions, indoor environmental conditions, building characteristics, time, occupant energy use behavior and occupancy, and historical energy consumption data), and the sizes of the data. The machine learning algorithms and the performance levels of these prediction models were also reviewed. The paper concluded with a discussion of the results, research gaps, and future research directions.

本文综述了数据驱动建筑能耗预测领域的最新研究成果。从建筑类型（即住宅和非住宅）、预测的时间粒度（即亚小时、每小时、每天、每月和每年）和预测的能源消耗类型（即供暖、制冷、照明和总体）审查了一组模型的范围。回顾了用于训练和测试这些模型的数据的特性，包括数据类型（即真实、模拟和公共基准数据）、特征类型（即与室外天气条件、室内环境条件、建筑特征、时间、居住者能源使用行为和占用率以及历史能源消耗数据相关的特征），以及数据的大小。文中还回顾了机器学习算法和这些预测模型的性能水平。文章最后对研究结果、研究空白和未来的研究方向进行了讨论。

As seen from the review, data-driven building energy consumption prediction has been attracting significant research attention. Different models serve different purposes, have different scopes, were trained on different datasets, and use different features for prediction. All of the models have their own strengths and weaknesses and perform differently under different circumstances. There is no one-size-fits-all model that can be utilized under all conditions. Application-specific model development is, therefore, essential and requires case-by-case consideration of all the aspects analyzed in this paper, including data properties and machine learning algorithms.

从回顾中可以看出，数据驱动的建筑能耗预测一直是备受关注的研究热点。不同的模型有不同的用途，有不同的范围，在不同的数据集上训练，并使用不同的特征进行预测。所有的模型都有自己的优缺点，在不同的环境下表现不同。没有一刀切的模型可以在所有条件下使用。因此，特定于应用程序的模型开发是必不可少的，并且需要对本文分析的所有方面（包括数据属性和机器学习算法）逐个进行考虑。

The results of this review indicate some research areas that may require more attention: long-term building energy consumption prediction, residential building energy consumption prediction, and lighting building energy consumption prediction. The relative lack of research efforts in these areas could be attributed to insufficiency of data and/or complexity of occupant energy use behavior in these contexts. Sufficient data – in terms of types, sizes, temporal coverage, and representativeness – are essential. Capturing occupant behavior, and taking it into account, is also critical for improved energy consumption prediction. Future research directions that may lead to major improvements in these areas and beyond include big energy data analytics and behavioral energy efficiency.

本文的研究结果指出了一些值得关注的研究领域：长期建筑能耗预测、住宅建筑能耗预测和建筑照明能耗预测。这些领域的研究工作相对缺乏，可归因于数据不足和/或这些环境下居住者能源使用行为的复杂性。足够的数据——在类型、规模、时间覆盖率和代表性方面——是必不可少的。捕捉居住者行为并将其考虑在内，对于改进能耗预测也是至关重要的。未来可能在这些领域及其他领域带来重大改进的研究方向包括大能源数据分析和行为能源效率。

Acknowledgements

This publication was made possible by NPRP Grant #6-1370-2-552 from the Qatar National Research Fund (a member of Qatar Foundation). The findings achieved herein are solely the responsibility of the authors.

Java定时任务总结一.从技术上分类大概分为以下三种方式： 1.Java自带的java.util.Timer类，这个类允许你调度一个java.util.TimerTask任务; 说明： java.util.Timer定时器，实际上是个线程，定时执行TimerTask类 &
一种防止用户生成内容站点出现商业广告以及非法有害等垃圾信息的方法 yangshangchuan rank 相似度计算文本相似度词袋模型余弦相似度
本文描述了一种在ITEYE博客频道上面出现的新型的商业广告形式及其应对方法，对于其他的用户生成内容站点类型也具有同样的适用性。最近在ITEYE博客频道上面出现了一种新型的商业广告形式，方法如下： 1、注册多个账号（一般10个以上）。 2、从多个账号中选择一个账号，发表1-2篇博文