A Comparative Analysis of Deep Learning Approaches for Network Intrusion Detection Systems (N-IDSs)



Data: KDDCup-99 / NSL-KDD / UNSW-NB15 NIDS datasets




Over the past few years, intrusion detection (ID) has been one of the key areas studied by industry and organizations. It is an important approach to network security. Many concepts and methods from machine learning have been carried over to ID with the aim of improving performance in distinguishing anomalous system behavior from normal network behavior.

An early contribution to work in ID was the paper “Computer Security Threat Monitoring and Surveillance” (Anderson, 1980). Fundamentally, IDSs are categorized into two types based on the network type and its behaviors: (1) network-based IDS (N-IDS), which relies on the packet data in network traffic to identify malicious activities; and (2) host-based IDS, which relies on the contents of log files such as software logs, system logs, sensors, file systems, and disk resources of a particular host or system. An organization uses a combination of network-based and host-based systems to effectively counter malicious activities in a real-time environment. This has become an indispensable part of ICT systems and networks. However, the performance of the existing traditional N-IDS approaches in detecting unforeseen attacks is not acceptable.



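1. Detecting TCP SYN flooding attacks
Approach: when a client sends a SYN packet to establish a TCP connection, track the state of that connection until it completes successfully or times out. At the same time, count the SYN packets received within a specified time window; if the count exceeds a specified threshold, a TCP SYN flooding attack is in progress.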
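The rule above can be sketched as a small sliding-window counter. This is a minimal illustration, not a production detector: the class name `SynFloodDetector`, its parameters (`window_seconds`, `threshold`, `handshake_timeout`) and their default values are assumptions made for this example, and packet capture and parsing are left out (the caller passes in the source address and a timestamp).

```python
from collections import deque


class SynFloodDetector:
    """Sliding-window SYN counter with handshake tracking (illustrative only)."""

    def __init__(self, window_seconds=10.0, threshold=100, handshake_timeout=30.0):
        self.window_seconds = window_seconds        # counting window for SYN packets
        self.threshold = threshold                  # SYN count above which we alert
        self.handshake_timeout = handshake_timeout  # how long a pending handshake is tracked
        self.pending = {}         # (src_ip, src_port) -> time the SYN was seen
        self.syn_times = deque()  # timestamps of SYN packets inside the window

    def _expire(self, now):
        # Forget SYN timestamps that have fallen out of the counting window.
        while self.syn_times and now - self.syn_times[0] > self.window_seconds:
            self.syn_times.popleft()
        # Stop tracking handshakes that timed out without completing.
        stale = [key for key, seen in self.pending.items()
                 if now - seen > self.handshake_timeout]
        for key in stale:
            del self.pending[key]

    def on_syn(self, src_ip, src_port, now):
        """Record a SYN; return True if the window count exceeds the threshold."""
        self._expire(now)
        self.pending[(src_ip, src_port)] = now
        self.syn_times.append(now)
        return len(self.syn_times) > self.threshold

    def on_ack(self, src_ip, src_port, now):
        """Record the final ACK of a handshake: the connection completed."""
        self._expire(now)
        self.pending.pop((src_ip, src_port), None)


# Hypothetical traffic: eight SYNs within 0.7 s against a threshold of five.
detector = SynFloodDetector(window_seconds=1.0, threshold=5)
alerts = [detector.on_syn("10.0.0.1", 1000 + i, now=0.1 * i) for i in range(8)]
print(alerts)
```

The counter deliberately counts every SYN in the window, as the rule states; a variant that counts only handshakes still in `pending` would focus on half-open connections instead.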





Machine learning (ML) methods are currently the prominent methods used for IDS. However, ML-based solutions are not effective for real-time IDS, mainly because the models produce a high false positive rate and are ineffective at identifying novel intrusions (Lee, Fan, Miller, Stolfo, & Zadok, 2002). The main reason is that these models learn attack patterns from simple features of TCP/IP packets locally. Recent developments in machine learning, however, have produced a robust and advanced learning technique named ‘deep learning’. Deep learning models have achieved significant results in various artificial intelligence (AI) tasks, including natural language processing (NLP), image processing (IP), and speech recognition (SR) (LeCun, Bengio, & Hinton, 2015). Deep learning approaches have two essential characteristics: (1) the ability to learn complex hierarchical feature representations of TCP/IP packets globally, and (2) the ability to memorize past information in long sequences of TCP/IP packets. The performance of deep learning methods has been transferred to ID (Staudemeyer, & Omlin, 2014; Staudemeyer, 2015; Kim, & Kim, 2017). Moreover, (Hodo, Bellekens, Hamilton, Tachtatzis, & Atkinson, 2017) recently outlined the taxonomies and the precursory works of trivial deep learning algorithms applied to ID. Following this, the paper compares the effectiveness of IRNN and other approaches introduced to handle long-range temporal dependencies for N-IDS. Both LSTM and IRNN networks are complex and remain black boxes. This makes it all but impossible for a malicious adversary to reverse engineer the system to the exact same specifications unless he/she possesses the exact training samples used to build the system.



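Note on the black-box property: LSTM and IRNN networks are complex black boxes that an adversary cannot reverse engineer to the exact same specifications without the exact training samples used to build the system. (This can also be counted as an advantage of using deep learning for network intrusion detection.)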




RNNs are mainly used for sequential data modeling, where they learn the hidden sequential relationships in variable-length input sequences.

The RNN mechanism has performed significantly well in the fields of NLP and SR (LeCun, Bengio, & Hinton, 2015). Initially, applying the ReLU activation function in RNNs was not successful because it leads to large outputs. As research evolved, authors showed that RNNs suffer from the vanishing and exploding gradient problem when learning long-range temporal dependencies in large-scale sequence data modeling. To overcome this issue, research on RNNs progressed in three significant directions. The first was improving the optimization methods in the algorithms; Hessian-free optimization methods belong to this category (Martens, 2010). The second was introducing complex components into the recurrent hidden layer of the network structure: (Hochreiter, & Schmidhuber, 1997) introduced long short-term memory (LSTM), followed by a variant of the LSTM network with a reduced parameter set, the gated recurrent unit (GRU) (Cho, Van Merriënboer, Gulcehre, Bahdanau, Bougares, Schwenk, & Bengio, 2014), and the clockwork RNN (CWRNN) (Koutnik, Greff, Gomez, & Schmidhuber, 2014). The third was appropriate weight initialization; recently, (Le, Jaitly, & Hinton, 2015) showed that an RNN with ReLU whose recurrent weight matrix is initialized to the identity matrix can perform close to LSTM. This was substantiated by four experiments: two toy problems, language modeling, and SR. They named the new RNN architecture the identity-recurrent neural network (IRNN). The basic idea behind IRNN is that, in the absence of inputs, an RNN composed of ReLUs and initialized with the identity matrix stays in the same state indefinitely.
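The IRNN idea can be illustrated with a minimal pure-Python sketch. The class `IRNNCell`, the helper functions, and the example input weights are hypothetical and chosen only for this example; the ReLU activation and the identity recurrent matrix come from the description above, and the biases are set to zero here. No training is performed, so this shows only the initial behavior of the cell.

```python
def relu(vector):
    """Element-wise rectified linear unit."""
    return [max(0.0, value) for value in vector]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def identity(size):
    """Identity matrix as a list of rows."""
    return [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]


class IRNNCell:
    """Vanilla RNN cell with ReLU and an identity-initialized recurrent matrix."""

    def __init__(self, input_weights):
        hidden_size = len(input_weights)
        self.input_weights = input_weights               # hidden_size x input_size
        self.recurrent_weights = identity(hidden_size)   # the IRNN initialization
        self.bias = [0.0] * hidden_size                  # zero bias (assumed here)

    def step(self, state, inputs):
        recurrent = matvec(self.recurrent_weights, state)
        driven = matvec(self.input_weights, inputs)
        return relu([r + d + b for r, d, b in zip(recurrent, driven, self.bias)])


# With zero input, a non-negative state is carried forward unchanged.
cell = IRNNCell(input_weights=[[1.0, 0.0], [0.0, -1.0]])
state = [0.5, 2.0]
for _ in range(50):
    state = cell.step(state, [0.0, 0.0])
print(state)
```

After 50 steps without input the state is still `[0.5, 2.0]`, which is the "stays in the same state" behavior at initialization; once training updates `recurrent_weights`, this exact property no longer has to hold.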


As research on RNNs continued to address the vanishing and exploding gradient issue, (Hochreiter, & Schmidhuber, 1997) introduced long short-term memory (LSTM), an entirely new kind of architecture that enhances the capacity to store values over long time-steps.
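To see why long time-steps are a problem in the first place, consider a scalar linear recurrence h_t = w * h_(t-1) + x_t. This is a deliberately simplified stand-in for an RNN, not a model from the paper, and the function name `gradient_scale` is made up for this sketch. The gradient of the last state with respect to the first is w multiplied by itself once per step, so it shrinks towards zero when |w| < 1 and blows up when |w| > 1.

```python
def gradient_scale(recurrent_weight, steps):
    """Factor applied to a gradient flowing back through `steps` time steps
    of the scalar recurrence h_t = w * h_(t-1) + x_t."""
    scale = 1.0
    for _ in range(steps):
        scale *= recurrent_weight
    return scale


for weight in (0.9, 1.0, 1.1):
    print(weight, gradient_scale(weight, steps=100))
```

Only a weight of exactly 1.0 passes the gradient through unchanged over 100 steps, which is the intuition behind both gated memory cells and identity initialization.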



(Le, Jaitly, & Hinton, 2015) proposed a new RNN, named the identity-recurrent neural network (IRNN), which makes minor changes to the RNN and has performed significantly well in capturing long-range temporal dependencies.






NSL-KDD is a filtered version of KDDCup-99 (Tavallaee, Bagheri, Lu, & Ghorbani, 2009). The applied filters are: (1) duplicate connection records were removed, which protects the classifier from being biased towards frequent connection records; (2) the test connection records at index numbers 136,489 and 136,497 were removed entirely; (3) the connection records for NSL-KDD were chosen randomly while keeping the degree of difficulty inversely proportional to KDDCup-99; (4) the number of connection vectors in the train and test data sets is a reasonable complement to the difficulty level of each class; (5) the records of NSL-KDD are balanced in both the train and test data. NSL-KDD performance is acceptable for misuse or anomaly detection. Even so, NSL-KDD falls short in representing the characteristics of real-world network traffic. The other issues are: (1) attack packets have a time-to-live value of 126 or 253, instead of 127 or 254 (McHugh, J. 2000); (2) the probability distribution of attack vectors is not the same in the train and test data (Mahoney, M. V., & Chan, P. K. 2003, September). As a result, machine learning classifiers are biased or skewed towards the more frequent connection records. NSL-KDD is not a complete representation of present-day connection vectors for normal and attack traffic.
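Filter (1), duplicate removal, can be sketched as follows. The four-field record layout and the sample rows are made up for illustration and do not reproduce the actual KDDCup-99 schema or the exact procedure used to build NSL-KDD; the function name `drop_duplicate_records` is likewise hypothetical.

```python
from collections import Counter


def drop_duplicate_records(records):
    """Remove exact duplicate records, keeping the first occurrence of each."""
    seen = set()
    unique = []
    for record in records:
        if record not in seen:
            seen.add(record)
            unique.append(record)
    return unique


# Hypothetical records: (protocol, service, flag, label).
records = [
    ("tcp", "http", "SF", "normal"),
    ("tcp", "private", "S0", "neptune"),
    ("tcp", "private", "S0", "neptune"),
    ("tcp", "private", "S0", "neptune"),
    ("udp", "domain_u", "SF", "normal"),
]
unique_records = drop_duplicate_records(records)

# Label counts before and after deduplication.
print(Counter(record[-1] for record in records))
print(Counter(record[-1] for record in unique_records))
```

In this toy sample the repeated attack record dominates the label counts before filtering and no longer does afterwards, which is the bias the filter is meant to remove.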


To overcome the reported issues of KDDCup-99 and NSL-KDD, the Australian Centre for Cyber Security group introduced UNSW-NB15 (Moustafa, & Slay, 2016).











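The rationale behind the network structure and its parameters was studied in detail through various experiments with IRNN and RNN variant architectures. The experiments were evaluated on both the full dataset and a minimal feature set to understand the importance of each feature. IRNN and the RNN variants performed effectively on "DoS" and "Probe" attacks, because these attacks form distinctive time series of network events. Compared with the winning entries of the KDDCup-99 challenge, IRNN performed well in classifying low-frequency attacks. This could be improved by further training, by stacking a few more layers on the existing architecture, or by adding new features to the existing data. **In most cases, a low-frequency attack category produces a single connection record. Extracting these low-frequency attacks is difficult when their information is hidden among other connection records.** Overall, RNN and its variants performed better in detection rate than the KDDCup-99 challenge winning entries and other previously published results.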
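A per-class detection rate can be computed along the following lines. This sketch assumes the detection rate of a class is the fraction of its records that are classified correctly; the summary above does not spell out the definition, and the labels and predictions below are invented, not results from the paper.

```python
from collections import Counter


def detection_rate_per_class(true_labels, predicted_labels):
    """Fraction of records of each true class that were predicted correctly."""
    totals = Counter(true_labels)
    hits = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


# Hypothetical labels: two frequent classes and one low-frequency class.
true_labels = ["dos", "dos", "dos", "dos", "probe", "probe", "u2r"]
predicted_labels = ["dos", "dos", "dos", "normal", "probe", "probe", "normal"]
print(detection_rate_per_class(true_labels, predicted_labels))
```

With a single record in the low-frequency class, its rate can only be 0 or 1, which shows how sensitive the measured performance on such classes is to individual connection records.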

