Deep Learning for Anomaly Detection: A Survey
https://www.researchgate.net/publication/330357393_Deep_Learning_for_Anomaly_Detection_A_Survey
10 Deep Anomaly Detection (DAD) Models
10.1 Supervised deep anomaly detection
10.2 Semi-supervised deep anomaly detection
10.3 Hybrid deep anomaly detection
10.4 One-class neural networks (OC-NN) for anomaly detection
10.5 Unsupervised Deep Anomaly Detection
10.6 Miscellaneous Techniques
10.6.1 Transfer Learning based anomaly detection
10.6.2 Zero Shot learning based anomaly detection
10.6.3 Ensemble based anomaly detection
10.6.4 Clustering based anomaly detection
10.6.5 Deep Reinforcement Learning (DRL) based anomaly detection
10.6.6 Statistical techniques deep anomaly detection
In this section, we discuss various DAD models classified by the availability of labels and the training objective. For each model type, we discuss the following four aspects:
- assumptions;
- type of model architectures;
- computational complexity;
- advantages and disadvantages.
Supervised anomaly detection techniques outperform unsupervised anomaly detection techniques because they use labeled samples (Gornitz et al. [2013]). Supervised anomaly detection learns the separating boundary from a set of annotated data instances (training) and then classifies a test instance as either normal or anomalous with the learned model (testing).
Assumptions: Deep supervised learning methods depend on separating data classes, whereas unsupervised techniques focus on explaining and understanding the characteristics of data. Multi-class classification based anomaly detection techniques assume that the training data contains labeled instances of multiple normal classes (Shilton et al. [2013], Jumutc and Suykens [2014], Kim et al. [2015], Erfani et al. [2017]). Multi-class anomaly detection techniques learn a classifier to distinguish the anomalous class from the rest of the classes. In general, supervised deep learning-based classification schemes for anomaly detection have two sub-networks: a feature extraction network followed by a classifier network. Deep models require a substantial number of training samples (on the order of thousands or millions) to learn feature representations that discriminate the various class instances effectively. Because clean data labels are rarely available, supervised deep anomaly detection techniques are less popular than semi-supervised and unsupervised methods.
Computational Complexity: The computational complexity of deep supervised anomaly detection techniques depends on the input data dimension and the number of hidden layers trained using the back-propagation algorithm. High-dimensional data tend to require more hidden layers to ensure meaningful hierarchical learning of input features. The computational complexity also increases linearly with the number of hidden layers, which in turn requires greater model training and update time.
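The linear growth with the number of hidden layers can be made concrete with a small operation count. The sketch below is only an illustration under stated assumptions: a fully connected network whose hidden layers all share one width, and a cost measured in multiply-add operations for a single forward pass. The function name `mlp_multiply_adds` and the layer sizes are hypothetical, not taken from the survey.

```python
def mlp_multiply_adds(input_dim, hidden_width, num_hidden_layers, output_dim=2):
    """Count multiply-add operations in one forward pass of a dense network."""
    if num_hidden_layers < 1:
        raise ValueError("need at least one hidden layer")
    first = input_dim * hidden_width                        # input -> first hidden layer
    between = (num_hidden_layers - 1) * hidden_width ** 2   # hidden -> hidden layers
    last = hidden_width * output_dim                        # last hidden -> output layer
    return first + between + last


# Cost for 1 to 4 hidden layers with a 100-dimensional input and width 64.
costs = [mlp_multiply_adds(100, 64, k) for k in (1, 2, 3, 4)]
increments = [b - a for a, b in zip(costs, costs[1:])]
print(costs)
print(increments)  # every extra hidden layer adds the same amount of work
```

Under these assumptions each additional hidden layer adds a constant `hidden_width ** 2` operations, and the first term shows the dependence on the input dimension.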
Advantages and Disadvantages: The advantages of supervised DAD techniques are as follows:
The disadvantages of supervised DAD techniques are as follows:
Semi-supervised (or one-class classification) DAD techniques assume that all training instances have only one class label. Reviews of deep learning based semi-supervised techniques for anomaly detection are presented by Kiran et al. [2018] and Min et al. [2018]. These DAD techniques learn a discriminative boundary around the normal instances. A test instance that does not belong to the majority class is flagged as anomalous (Perera and Patel [2018], Blanchard et al. [2010]). Various semi-supervised DAD model architectures are illustrated in Table 20.
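The idea of learning a boundary from normal instances only can be sketched in a few lines. This is a deliberately simple stand-in, not a method from the survey: it assumes the points are already feature vectors (in a deep model they would come from the hidden layers), and it uses a centre plus a quantile-based radius as the boundary. The names `fit_boundary` and `is_anomalous` are hypothetical.

```python
import math


def fit_boundary(normal_points, quantile=0.95):
    """Fit a centre and radius that enclose most of the normal training data."""
    dim = len(normal_points[0])
    centre = [sum(p[i] for p in normal_points) / len(normal_points) for i in range(dim)]
    dists = sorted(math.dist(p, centre) for p in normal_points)
    # Radius = distance below which roughly `quantile` of the training points fall.
    index = min(len(dists) - 1, int(quantile * len(dists)))
    return centre, dists[index]


def is_anomalous(point, centre, radius):
    """Flag a test instance that falls outside the learned boundary."""
    return math.dist(point, centre) > radius


# Training data contains a single class: normal instances only.
normal = [(0.0, 0.1), (0.2, -0.1), (-0.1, 0.0), (0.1, 0.2), (-0.2, -0.1)]
centre, radius = fit_boundary(normal, quantile=1.0)

print(is_anomalous((3.0, 3.0), centre, radius))
print(is_anomalous((0.0, 0.0), centre, radius))
```

Lowering `quantile` tightens the boundary at the price of flagging some training points, which is one way to tolerate a small amount of contamination in the training set.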
Assumptions: The semi-supervised DAD methods proposed rely on one of the following assumptions to score a data instance as an anomaly:
Computational Complexity: The computational complexity of semi-supervised DAD techniques is similar to that of supervised DAD techniques. It depends primarily on the dimensionality of the input data and the number of hidden layers used for representative feature learning.
Advantages and Disadvantages: The advantages of semi-supervised deep anomaly detection techniques are as follows:
The fundamental disadvantages of semi-supervised techniques presented by Lu [2009] apply even in a deep learning context. Furthermore, the hierarchical features extracted within hidden layers may not be representative of the fewer anomalous instances and are hence prone to over-fitting.
Deep learning models are widely used as feature extractors to learn robust features (Andrews et al. [2016a]). In deep hybrid models, the representative features learned within deep models are input to traditional algorithms such as one-class Radial Basis Function (RBF) and Support Vector Machine (SVM) classifiers. The hybrid models employ two-step learning and are shown to produce state-of-the-art results (Erfani et al. [2016a,b], Wu et al. [2015b]). Deep hybrid architectures used in anomaly detection are presented in Table 21.
Assumptions: The deep hybrid models proposed for anomaly detection rely on one of the following assumptions to detect outliers:
Computational Complexity:
The computational complexity of a hybrid model includes the complexity of both the deep architecture and the traditional algorithm used within it. Additionally, the non-trivial choice of deep network architecture and parameters, which involves searching for optimized parameters in a considerably larger space, adds to the computational complexity of using deep layers within hybrid models. Furthermore, consider classical algorithms such as the linear SVM, which has a prediction complexity of O(d), where d is the number of input dimensions. For most kernels, including polynomial and RBF, the complexity is O(nd), where n is the number of support vectors, although an approximation of O(d^2) is considered for SVMs with an RBF kernel.
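The difference between the O(d) and O(nd) prediction costs can be seen directly in the two decision functions. The following is a minimal sketch with hand-picked, hypothetical weights and support vectors; it shows the standard form of each decision value and is not tied to any particular SVM library.

```python
import math


def linear_svm_score(w, b, x):
    """Linear SVM decision value: a single dot product, O(d)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def rbf_svm_score(support_vectors, dual_coefs, b, gamma, x):
    """RBF-kernel SVM decision value: one kernel evaluation per support vector, O(n * d)."""
    total = b
    for sv, alpha in zip(support_vectors, dual_coefs):
        sq_dist = sum((si - xi) ** 2 for si, xi in zip(sv, x))  # O(d) per support vector
        total += alpha * math.exp(-gamma * sq_dist)
    return total


x = [1.0, 2.0]
linear_value = linear_svm_score([0.5, -0.25], 0.1, x)
rbf_value = rbf_svm_score([[1.0, 2.0], [0.0, 0.0]], [1.0, -1.0], 0.0, 0.5, x)
print(linear_value, rbf_value)
```

In a hybrid model, `x` would be the feature vector produced by the deep network, so d is the width of the extracted representation rather than the raw input dimension.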
Advantages and Disadvantages
The advantages of hybrid DAD techniques are as follows:
The significant disadvantages of hybrid DAD techniques are:
One-class neural networks (OC-NN) combine the ability of deep networks to extract a progressively rich representation of data with a one-class objective, such as a hyperplane (Chalapathy et al. [2018a]) or hypersphere (Ruff et al. [2018a]), to separate all the normal data points from the outliers. The OC-NN approach is novel for the following crucial reason: the data representation in the hidden layer is learned by optimizing an objective function customized for anomaly detection. The experimental results in Chalapathy et al. [2018a] and Ruff et al. [2018a] demonstrate that OC-NN can achieve comparable or better performance than existing state-of-the-art methods on complex datasets, while having reasonable training and testing time compared to the existing methods.
Assumptions: The OC-NN models proposed for anomaly detection rely on the following assumptions to detect outliers:
Computational Complexity: The computational complexity of an OC-NN model, in contrast to that of a hybrid model, includes only the complexity of the deep network of choice (Saxe et al. [2011]). OC-NN models do not require data to be stored for prediction and thus have very low memory complexity. However, the OC-NN training time is proportional to the input dimension.
Advantages and Disadvantages: The advantages of OC-NN are as follows:
OC-NN models jointly train a deep neural network while optimizing a data-enclosing hypersphere or hyperplane in output space.
OC-NN proposes an alternating minimization algorithm for learning the parameters of the OC-NN model. The subproblem of the OC-NN objective is equivalent to solving a quantile selection problem, which is well defined.
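The quantile selection step can be checked numerically. The sketch below assumes that, with the network weights held fixed, the part of the objective that depends on the bias r has the form (1 / (nu * N)) * sum(max(0, r - s_i)) - r, where s_i are the network scores; this form and the names `r_objective` and `r_by_quantile` are assumptions made for illustration and should be checked against Chalapathy et al. [2018a]. Under that assumption, the minimiser is the nu-quantile of the scores, which the brute-force search confirms.

```python
import math


def r_objective(r, scores, nu):
    """Assumed r-subproblem: (1 / (nu * N)) * sum(max(0, r - s_i)) - r."""
    n = len(scores)
    return sum(max(0.0, r - s) for s in scores) / (nu * n) - r


def r_by_quantile(scores, nu):
    """Closed-form candidate: the ceil(nu * N)-th smallest score (the nu-quantile)."""
    k = math.ceil(nu * len(scores))
    return sorted(scores)[k - 1]


scores = [0.9, 0.1, 0.5, 0.7, 0.3, 1.1, 0.2, 0.8, 0.6, 0.4]
nu = 0.25

# The objective is piecewise linear and convex in r, so a minimiser lies at one of the scores.
r_brute = min(scores, key=lambda r: r_objective(r, scores, nu))
r_closed = r_by_quantile(scores, nu)
print(r_brute, r_closed)
```

In an alternating scheme, this closed-form update for r would be interleaved with gradient steps on the network weights.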
The significant disadvantages of OC-NN for anomaly detection are:
Unsupervised DAD is an essential area of research in both fundamental machine learning research and industrial applications. Several deep learning frameworks that address challenges in unsupervised anomaly detection have been proposed and shown to produce state-of-the-art performance, as illustrated in Table 22. Autoencoders are the fundamental unsupervised deep architectures used in anomaly detection (Baldi [2012]).
Assumptions: The deep unsupervised models proposed for anomaly detection rely on one of the following assumptions to detect outliers:
Computational Complexity: Autoencoders are the most common architecture employed in outlier detection. With a quadratic cost, the optimization problem is non-convex, as for any other neural network architecture. The computational complexity of the model depends on the number of operations, network parameters, and hidden layers. However, the computational complexity of training an autoencoder is much higher than that of traditional methods such as Principal Component Analysis (PCA), since PCA is based on matrix decomposition (Meng et al. [2018], Parchami et al. [2017]).
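How an autoencoder with a quadratic cost yields an anomaly score can be illustrated with the smallest possible case. The sketch below assumes a tied-weight linear autoencoder with a one-dimensional bottleneck trained by plain stochastic gradient descent; real DAD models are deeper and non-linear, and the toy data, learning rate, and function names are hypothetical. The repeated passes over the data also hint at why iterative training costs more than a single matrix decomposition.

```python
def reconstruct(w, x):
    """Tied-weight linear autoencoder: encode z = w.x, decode x_hat = z * w."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    return [z * wi for wi in w], z


def score(w, x):
    """Anomaly score = squared reconstruction error (the quadratic cost)."""
    x_hat, _ = reconstruct(w, x)
    return sum((xi - hi) ** 2 for xi, hi in zip(x, x_hat))


def train(data, epochs=200, lr=0.01):
    """Minimise the reconstruction error with per-sample gradient descent."""
    w = [1.0, 0.0]  # arbitrary starting weights
    for _ in range(epochs):
        for x in data:
            x_hat, z = reconstruct(w, x)
            e = [xi - hi for xi, hi in zip(x, x_hat)]
            ew = sum(ei * wi for ei, wi in zip(e, w))
            # Gradient of ||x - (w.x) w||^2 with respect to w.
            grad = [-2.0 * (z * ei + ew * xi) for ei, xi in zip(e, x)]
            w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


# Normal data lies along the diagonal; the autoencoder learns that direction.
normal = [(-1.0, -1.0), (-0.5, -0.5), (0.5, 0.5), (1.0, 1.0)]
w = train(normal)

print(score(w, (0.8, 0.8)))   # close to the learned direction: small error
print(score(w, (1.0, -1.0)))  # off the learned direction: large error
```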
Advantages and Disadvantages: The advantages of unsupervised deep anomaly detection techniques are as follows:
The significant disadvantages of unsupervised deep anomaly detection techniques are:
This section explores various DAD techniques that have been shown to be effective and promising. We discuss the key idea behind each technique and its area of applicability.
Deep learning has long been criticized for needing large amounts of data to produce good results. Both Litjens et al. [2017] and Pan et al. [2010] review deep transfer learning approaches and illustrate their significance for learning good feature representations. Transfer learning is an essential tool in machine learning for addressing the fundamental problem of insufficient training data. It aims to transfer knowledge from the source domain to the target domain by relaxing the assumption that training and future data must be in the same feature space and have the same distribution. Deep transfer representation learning has been explored by Andrews et al. [2016b], Vercruyssen et al. [2017], Li et al. [2012], Almajai et al. [2012], Kumar and Vaidehi [2017], and Liang et al. [2018], and has been shown to produce very promising results. An open research question in using transfer learning for anomaly detection is the degree of transferability, that is, how well features transfer knowledge and improve classification performance from one task to another.
Zero shot learning (ZSL) aims to recognize objects never seen before within the training set (Romera-Paredes and Torr [2015]). ZSL achieves this in two phases: first, knowledge about the objects is captured from natural language descriptions or attributes (commonly known as meta-data); second, this knowledge is used to classify instances among a new set of classes. This setting is important in the real world, since one may not be able to obtain images of all the possible classes at training time. The primary challenge associated with this approach is obtaining the meta-data about the data instances. Nevertheless, several approaches using ZSL in anomaly and novelty detection have been shown to produce state-of-the-art results (Mishra et al. [2017], Socher et al. [2013], Xian et al. [2017], Liu et al. [2017], Rivero et al. [2017]).
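The two phases can be sketched with attribute vectors. This is an illustrative toy, assuming an attribute-based variant of ZSL in which an unseen class is chosen by nearest attribute vector; the class names, the three attributes, and the predicted values are all hypothetical, and the attribute predictor itself (trained on seen classes) is replaced by a hard-coded output.

```python
import math

# Phase 1: class-level knowledge captured as attribute vectors (meta-data).
# Hypothetical attributes: (has_stripes, has_hooves, lives_in_water).
unseen_class_attributes = {
    "zebra": (1.0, 1.0, 0.0),
    "dolphin": (0.0, 0.0, 1.0),
}


def classify_zero_shot(predicted_attributes, class_attributes):
    """Phase 2: pick the unseen class whose attribute vector is closest."""
    return min(
        class_attributes,
        key=lambda name: math.dist(predicted_attributes, class_attributes[name]),
    )


# Stand-in for the output of an attribute predictor trained only on seen classes.
predicted = (0.9, 0.8, 0.1)
label = classify_zero_shot(predicted, unseen_class_attributes)
print(label)
```

For novelty detection, one possible extension is to treat an instance whose distance to every known attribute vector is large as anomalous.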
A notable issue with deep neural networks is that they are sensitive to noise within input data and often require extensive training data to perform robustly (Kim et al. [2016]). To achieve robustness even on noisy data, randomly varying the connectivity architecture of the autoencoder has been shown to yield significantly better performance. Autoencoder ensembles consisting of various randomly connected autoencoders were evaluated by Chen et al. [2017], achieving promising results on several benchmark datasets. Ensemble approaches remain an active area of research and have been shown to produce improved diversity, thereby avoiding the overfitting problem while reducing training time.
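The two ingredients of such an ensemble, random connectivity and score aggregation, can be sketched as follows. This is an assumption-laden illustration rather than the procedure of Chen et al. [2017]: the masks are drawn independently per connection, the combination rule is the median, and the per-member reconstruction errors are hypothetical numbers rather than outputs of trained autoencoders.

```python
import random
import statistics


def random_connection_mask(n_in, n_out, keep_prob, rng):
    """Binary mask between two layers: 1 keeps a connection, 0 drops it."""
    return [[1 if rng.random() < keep_prob else 0 for _ in range(n_out)]
            for _ in range(n_in)]


def ensemble_score(member_scores):
    """Combine per-member reconstruction errors with the median."""
    return statistics.median(member_scores)


# Each ensemble member gets its own randomly drawn connectivity pattern.
rng = random.Random(0)
masks = [random_connection_mask(4, 3, keep_prob=0.7, rng=rng) for _ in range(5)]

# Hypothetical reconstruction errors for one input from five ensemble members.
# One member is badly off, but the median is not dominated by it.
errors = [0.11, 0.09, 0.10, 0.95, 0.12]
combined = ensemble_score(errors)
print(combined)
```

A robust statistic such as the median is one reasonable choice here because a single poorly fitted member cannot dominate the combined score.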
Several anomaly detection algorithms based on clustering have been proposed in the literature (Ester et al. [1996]). Clustering groups together similar patterns based on extracted features in order to detect new anomalies. The time and space complexity grows linearly with the number of classes to be clustered (Sreekanth et al. [2010]), which renders clustering-based anomaly detection prohibitive for real-time practical applications. The dimensionality of the input data is reduced by extracting features within the hidden layers of a deep neural network, which ensures scalability to complex and high-dimensional datasets. Deep-learning-enabled clustering approaches to anomaly detection use, for example, word2vec (Mikolov et al. [2013]) models to obtain semantic representations of normal data and anomalies, which are then used to form clusters and detect outliers (Yuan et al. [2017]). Several works rely on variants of hybrid models along with autoencoders to obtain representative features for clustering to find anomalies.
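Once low-dimensional features are available, the clustering step itself is simple. The sketch below assumes the embeddings have already been produced by some deep model (here they are hypothetical 2-D points), uses plain k-means with fixed starting centroids, and scores a point by its distance to the nearest cluster centre. The cited works may use different clustering algorithms and scoring rules.

```python
import math


def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points, centroids, iterations=10):
    """Plain Lloyd iterations starting from the given centroids."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Move each centroid to the mean of its group (keep it if the group is empty).
        centroids = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else c
            for g, c in zip(groups, centroids)
        ]
    return centroids


def outlier_score(point, centroids):
    """Distance to the closest cluster centre; large means far from every cluster."""
    return min(math.dist(point, c) for c in centroids)


# Hypothetical embeddings forming two tight groups.
embeddings = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.0), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
centroids = kmeans(embeddings, centroids=[embeddings[0], embeddings[3]])

print(outlier_score((0.1, 0.1), centroids))
print(outlier_score((2.5, 9.0), centroids))
```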
Deep reinforcement learning (DRL) methods have attracted significant interest due to their ability to learn complex behaviors in high-dimensional data spaces. Efforts to detect anomalies using deep reinforcement learning have been proposed by de La Bourdonnaye et al. [2017] and Chengqiang Huang [2016]. A DRL-based anomaly detector makes no assumption about the concept of an anomaly. Instead, the detector identifies new anomalies by consistently enhancing its knowledge through accumulated reward signals. DRL-based anomaly detection is a very novel concept that requires further investigation to identify the research gaps and its applications.
The Hilbert transform is a statistical signal processing technique that derives the analytic representation of a real-valued signal. This property is leveraged by Kanarachos et al. [2015] for real-time detection of anomalies in health-related time series datasets and is shown to be a very promising technique. The algorithm combines wavelet analysis, neural networks, and the Hilbert transform in a sequential manner to detect real-time anomalies. Statistical DAD techniques require further investigation to fully understand their potential and applicability for anomaly detection.
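The analytic representation mentioned above can be computed from the discrete Fourier transform by removing the negative frequencies. The sketch below covers only that step, using a naive transform and an even-length signal as simplifying assumptions; it does not reproduce the wavelet and neural network stages of Kanarachos et al. [2015]. For a pure cosine, the analytic signal is a complex exponential, so its magnitude (the envelope) should be constant.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (adequate for short signals)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def analytic_signal(x):
    """Zero the negative frequencies and double the positive ones (even N assumed)."""
    n = len(x)
    if n % 2:
        raise ValueError("this sketch assumes an even number of samples")
    spectrum = dft(x)
    # Keep DC and Nyquist as they are, double positive bins, drop negative bins.
    gain = [1.0] + [2.0] * (n // 2 - 1) + [1.0] + [0.0] * (n // 2 - 1)
    return dft([s * g for s, g in zip(spectrum, gain)], inverse=True)


n = 16
signal = [math.cos(2 * math.pi * 2 * t / n) for t in range(n)]
envelope = [abs(z) for z in analytic_signal(signal)]
print(max(envelope), min(envelope))
```

A sudden change in the envelope or in the instantaneous phase of the analytic signal is the kind of feature a downstream detector could monitor.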