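The NSL-KDD dataset is a revised version of the well-known KDD'99 dataset. It consists of four sub-datasets: KDDTest+, KDDTest-21, KDDTrain+ and KDDTrain+_20Percent, where KDDTrain+_20Percent and KDDTest-21 are subsets of KDDTrain+ and KDDTest+, respectively. Each record contains 43 features: 41 describe the traffic input itself, and the last two are the label (normal or attack) and the score (the difficulty level of the traffic input).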
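A minimal sketch of reading one of the raw files, assuming (this is not stated above) that each file, e.g. `KDDTrain+.txt`, is comma-separated with no header row and holds 43 fields per line in the order of the feature table: 41 features, then the label, then the score. The names `Record`, `parse_row` and `load_records` are hypothetical helpers for illustration.

```python
import csv
from typing import List, NamedTuple

# Assumption: 41 feature columns, then the class label, then the score.
NUM_FEATURES = 41


class Record(NamedTuple):
    features: List[str]  # the 41 raw feature values, still as strings
    label: str           # "normal" or an attack name
    difficulty: int      # the trailing score (difficulty level) column


def parse_row(row: List[str]) -> Record:
    """Split one 43-field row into features, label and difficulty score."""
    if len(row) != NUM_FEATURES + 2:
        raise ValueError(f"expected {NUM_FEATURES + 2} fields, got {len(row)}")
    return Record(row[:NUM_FEATURES], row[NUM_FEATURES], int(row[NUM_FEATURES + 1]))


def load_records(path: str) -> List[Record]:
    """Read every non-empty row of a raw NSL-KDD text file."""
    with open(path, newline="") as f:
        return [parse_row(row) for row in csv.reader(f) if row]
```

Usage would look like `records = load_records("KDDTrain+.txt")`, after which `records[0].label` and `records[0].difficulty` give the last two columns. Keeping the features as strings defers type conversion until the categorical and numeric columns are handled separately.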
The dataset contains four types of attacks: Denial of Service (DoS), Probe, User to Root (U2R) and Remote to Local (R2L). Each attack is briefly described below:
The following table breaks down the subclasses of each attack type:
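Many analyses collapse the individual attack names into the four categories. The dictionary below is a sketch that covers only the 22 attack names of the KDD'99 training attack list, which is also the set of attacks in KDDTrain+. The NSL-KDD test files contain further names (for example `mscan`) whose assignment is not given here, so the hypothetical helper `to_category` returns `unknown` for them.

```python
# Assumption: covers only the 22 KDD'99 training attack names; other names
# (e.g. test-only attacks) fall through to "unknown".
ATTACK_CATEGORY = {
    # DoS
    "back": "DoS", "land": "DoS", "neptune": "DoS",
    "pod": "DoS", "smurf": "DoS", "teardrop": "DoS",
    # Probe
    "ipsweep": "Probe", "nmap": "Probe", "portsweep": "Probe", "satan": "Probe",
    # R2L
    "ftp_write": "R2L", "guess_passwd": "R2L", "imap": "R2L", "multihop": "R2L",
    "phf": "R2L", "spy": "R2L", "warezclient": "R2L", "warezmaster": "R2L",
    # U2R
    "buffer_overflow": "U2R", "loadmodule": "U2R", "perl": "U2R", "rootkit": "U2R",
}


def to_category(label: str) -> str:
    """Collapse a raw label into normal / DoS / Probe / R2L / U2R / unknown."""
    # KDD'99 labels carry a trailing '.', NSL-KDD labels do not; accept both.
    label = label.strip().rstrip(".")
    if label == "normal":
        return "normal"
    return ATTACK_CATEGORY.get(label, "unknown")
```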
The data distribution for each attack type is as follows:
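A distribution like this can be computed from a list of per-record categories. The sketch below, with the hypothetical function name `distribution`, returns each category's share of the records as a fraction, ordered from most to least frequent. It expects input such as `["normal", "DoS", "Probe", ...]`.

```python
from collections import Counter
from typing import Dict, Iterable


def distribution(categories: Iterable[str]) -> Dict[str, float]:
    """Share of records per category, as fractions that sum to 1."""
    counts = Counter(categories)
    total = sum(counts.values())
    if total == 0:
        return {}
    # most_common() orders categories from most to least frequent.
    return {cat: n / total for cat, n in counts.most_common()}
```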
The features in the dataset fall into four categories: intrinsic, content, host-based and time-based. Each category is described below:
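To select one category of columns programmatically, the sketch below maps feature numbers (the `#` column of the feature table) to categories. The ranges are an assumption based on the grouping commonly used for NSL-KDD: intrinsic 1 to 9, content 10 to 22, time-based 23 to 31 and host-based 32 to 41. The latter two agree with the table, where features 23 to 31 aggregate over the past two seconds and 32 to 41 are the `Dst Host` statistics. `FEATURE_GROUPS`, `group_of` and `columns_in` are hypothetical names.

```python
from typing import Dict, List

# Assumed grouping, keyed by the 1-based "#" of the feature table.
FEATURE_GROUPS: Dict[str, range] = {
    "intrinsic": range(1, 10),    # features 1-9, e.g. Duration, Protocol Type
    "content": range(10, 23),     # features 10-22, e.g. Hot, Num Failed Logins
    "time_based": range(23, 32),  # features 23-31, two-second window statistics
    "host_based": range(32, 42),  # features 32-41, the Dst Host * statistics
}


def group_of(feature_number: int) -> str:
    """Return the group name for a 1-based feature number."""
    for name, numbers in FEATURE_GROUPS.items():
        if feature_number in numbers:
            return name
    raise ValueError(f"feature number must be 1-41, got {feature_number}")


def columns_in(group: str) -> List[int]:
    """0-based column indices of one group, for slicing a raw record."""
    return [n - 1 for n in FEATURE_GROUPS[group]]
```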
The possible values of the categorical features are broken down in the table below. Protocol Type has 3 possible values, Service has 60 and Flag has 11.
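Most models need these three string columns turned into numbers. One common option is one-hot encoding, sketched below. It assumes Protocol Type, Service and Flag sit at 0-based columns 1, 2 and 3 of a raw row, matching their order in the feature table. Building the vocabulary over both train and test rows keeps the vector length fixed; with 3 + 60 + 11 values the encoded block would be 74 columns wide. `build_vocab` and `one_hot` are hypothetical names.

```python
from typing import Dict, List, Sequence

# Assumed 0-based positions of the categorical features in a raw row.
CATEGORICAL_COLUMNS = {"protocol_type": 1, "service": 2, "flag": 3}


def build_vocab(rows: Sequence[Sequence[str]]) -> Dict[str, List[str]]:
    """Collect the sorted set of values seen for each categorical column."""
    return {
        name: sorted({row[col] for row in rows})
        for name, col in CATEGORICAL_COLUMNS.items()
    }


def one_hot(row: Sequence[str], vocab: Dict[str, List[str]]) -> List[int]:
    """Concatenate one-hot vectors for protocol_type, service and flag.

    A value missing from the vocabulary becomes an all-zero block.
    """
    encoded: List[int] = []
    for name, col in CATEGORICAL_COLUMNS.items():
        encoded.extend(1 if row[col] == v else 0 for v in vocab[name])
    return encoded
```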
Each Flag value represents the status of a connection; the values are explained below:
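The lookup table below lists the 11 flag values with short meanings. The meanings are paraphrased from the connection-state (`conn_state`) definitions of the Bro/Zeek network monitor, which the KDD'99 connection records are generally understood to follow; treat them as an assumption rather than the dataset's official definition. The two flag sets match the ones referenced by the serror and rerror rate features in the feature table. All names are hypothetical.

```python
# Assumption: meanings paraphrased from Bro/Zeek conn_state descriptions.
FLAG_MEANING = {
    "SF": "normal establishment and termination",
    "S0": "connection attempt seen, no reply",
    "S1": "connection established, not terminated",
    "S2": "established; originator close attempt seen, no reply from responder",
    "S3": "established; responder close attempt seen, no reply from originator",
    "REJ": "connection attempt rejected",
    "RSTO": "established; originator aborted with a RST",
    "RSTR": "responder sent a RST",
    "RSTOS0": "originator sent SYN then RST; no SYN-ACK from responder seen",
    "SH": "originator sent SYN then FIN; no SYN-ACK from responder seen",
    "OTH": "no SYN seen; midstream traffic only",
}

# Flags counted by the serror-rate and rerror-rate features.
SERROR_FLAGS = frozenset({"S0", "S1", "S2", "S3"})
RERROR_FLAGS = frozenset({"REJ"})


def describe_flag(flag: str) -> str:
    """Return the assumed meaning of a flag value."""
    return FLAG_MEANING.get(flag, "unknown flag")
```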
The following table describes each feature and its value range in the dataset:
# | Feature Name | Description | Type | Value Type | Range (across train and test sets) |
---|---|---|---|---|---|
1 | Duration | Length of time duration of the connection | Continuous | Integers | 0 - 54451 |
2 | Protocol Type | Protocol used in the connection | Categorical | Strings | |
3 | Service | Destination network service used | Categorical | Strings | |
4 | Flag | Status of the connection – Normal or Error | Categorical | Strings | |
5 | Src Bytes | Number of data bytes transferred from source to destination in single connection | Continuous | Integers | 0 - 1379963888 |
6 | Dst Bytes | Number of data bytes transferred from destination to source in single connection | Continuous | Integers | 0 - 309937401 |
7 | Land | 1 if the source and destination IP addresses and port numbers are equal; 0 otherwise | Binary | Integers | { 0 , 1 } |
8 | Wrong Fragment | Total number of wrong fragments in this connection | Discrete | Integers | { 0,1,3 } |
9 | Urgent | Number of urgent packets in this connection. Urgent packets are packets with the urgent bit activated | Discrete | Integers | 0 - 3 |
10 | Hot | Number of "hot" indicators in the content, such as entering a system directory, creating programs and executing programs | Continuous | Integers | 0 - 101 |
11 | Num Failed Logins | Count of failed login attempts | Continuous | Integers | 0 - 4 |
12 | Logged In | Login Status : 1 if successfully logged in; 0 otherwise | Binary | Integers | { 0 , 1 } |
13 | Num Compromised | Number of "compromised" conditions | Continuous | Integers | 0 - 7479 |
14 | Root Shell | 1 if root shell is obtained; 0 otherwise | Binary | Integers | { 0 , 1 } |
15 | Su Attempted | 1 if the "su root" command was attempted or used; 0 otherwise | Discrete (the dataset also contains the value 2) | Integers | 0 - 2 |
16 | Num Root | Number of "root" accesses, i.e. the number of operations performed as root in the connection | Continuous | Integers | 0 - 7468 |
17 | Num File Creations | Number of file creation operations in the connection | Continuous | Integers | 0 - 100 |
18 | Num Shells | Number of shell prompts | Continuous | Integers | 0 - 2 |
19 | Num Access Files | Number of operations on access control files | Continuous | Integers | 0 - 9 |
20 | Num Outbound Cmds | Number of outbound commands in an ftp session | Continuous | Integers | { 0 } |
21 | Is Host Login | 1 if the login belongs to the "hot" list, i.e. root or admin; 0 otherwise | Binary | Integers | { 0 , 1 } |
22 | Is Guest Login | 1 if the login is a "guest" login; 0 otherwise | Binary | Integers | { 0 , 1 } |
23 | Count | Number of connections to the same destination host as the current connection in the past two seconds | Discrete | Integers | 0 - 511 |
24 | Srv Count | Number of connections to the same service (port number) as the current connection in the past two seconds | Discrete | Integers | 0 - 511 |
25 | Serror Rate | The percentage of connections whose flag (4) is S0, S1, S2 or S3, among the connections aggregated in count (23) | Discrete | Floats (two decimal places) | 0 - 1 |
26 | Srv Serror Rate | The percentage of connections whose flag (4) is S0, S1, S2 or S3, among the connections aggregated in srv_count (24) | Discrete | Floats (two decimal places) | 0 - 1 |
27 | Rerror Rate | The percentage of connections whose flag (4) is REJ, among the connections aggregated in count (23) | Discrete | Floats (two decimal places) | 0 - 1 |
28 | Srv Rerror Rate | The percentage of connections whose flag (4) is REJ, among the connections aggregated in srv_count (24) | Discrete | Floats (two decimal places) | 0 - 1 |
29 | Same Srv Rate | The percentage of connections that were to the same service, among the connections aggregated in count (23) | Discrete | Floats (two decimal places) | 0 - 1 |
30 | Diff Srv Rate | The percentage of connections that were to different services, among the connections aggregated in count (23) | Discrete | Floats (two decimal places) | 0 - 1 |
31 | Srv Diff Host Rate | The percentage of connections that were to different destination machines, among the connections aggregated in srv_count (24) | Discrete | Floats (two decimal places) | 0 - 1 |
32 | Dst Host Count | Number of connections having the same destination host IP address | Discrete | Integers | 0 - 255 |
33 | Dst Host Srv Count | Number of connections having the same port number | Discrete | Integers | 0 - 255 |
34 | Dst Host Same Srv Rate | The percentage of connections that were to the same service, among the connections aggregated in dst_host_count (32) | Discrete | Floats (two decimal places) | 0 - 1 |
35 | Dst Host Diff Srv Rate | The percentage of connections that were to different services, among the connections aggregated in dst_host_count (32) | Discrete | Floats (two decimal places) | 0 - 1 |
36 | Dst Host Same Src Port Rate | The percentage of connections that were to the same source port, among the connections aggregated in dst_host_srv_count (33) | Discrete | Floats (two decimal places) | 0 - 1 |
37 | Dst Host Srv Diff Host Rate | The percentage of connections that were to different destination machines, among the connections aggregated in dst_host_srv_count (33) | Discrete | Floats (two decimal places) | 0 - 1 |
38 | Dst Host Serror Rate | The percentage of connections whose flag (4) is S0, S1, S2 or S3, among the connections aggregated in dst_host_count (32) | Discrete | Floats (two decimal places) | 0 - 1 |
39 | Dst Host Srv Serror Rate | The percentage of connections whose flag (4) is S0, S1, S2 or S3, among the connections aggregated in dst_host_srv_count (33) | Discrete | Floats (two decimal places) | 0 - 1 |
40 | Dst Host Rerror Rate | The percentage of connections whose flag (4) is REJ, among the connections aggregated in dst_host_count (32) | Discrete | Floats (two decimal places) | 0 - 1 |
41 | Dst Host Srv Rerror Rate | The percentage of connections whose flag (4) is REJ, among the connections aggregated in dst_host_srv_count (33) | Discrete | Floats (two decimal places) | 0 - 1 |
42 | Class | Classification of the traffic input | Categorical | Strings | |
43 | Difficulty Level | Difficulty level | Discrete | Integers | 0 - 21 |
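To make the time-based definitions (features 23 to 30) concrete, the sketch below computes them for one connection from a list of earlier connections. This is not the original KDD'99 feature-extraction code. The `Conn` record, the function names and two details are assumptions: connections exactly 2 seconds old still count, and the current connection is included in both aggregates. Rates are rounded to two decimals to mirror the table.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

WINDOW_SECONDS = 2.0
SERROR_FLAGS = {"S0", "S1", "S2", "S3"}


@dataclass
class Conn:
    """Hypothetical minimal connection record used only for this sketch."""
    timestamp: float  # seconds
    dst_host: str
    service: str
    flag: str


def _rate(conns: List[Conn], pred: Callable[[Conn], bool]) -> float:
    """Fraction of conns satisfying pred, rounded to two decimals."""
    if not conns:
        return 0.0
    return round(sum(1 for c in conns if pred(c)) / len(conns), 2)


def time_features(history: List[Conn], current: Conn) -> Dict[str, float]:
    """Compute features 23-30 for `current` over the last two seconds."""
    recent = [c for c in history
              if 0 <= current.timestamp - c.timestamp <= WINDOW_SECONDS]
    recent.append(current)  # assumption: current connection is counted
    same_host = [c for c in recent if c.dst_host == current.dst_host]  # count
    same_srv = [c for c in recent if c.service == current.service]     # srv_count
    return {
        "count": len(same_host),
        "srv_count": len(same_srv),
        "serror_rate": _rate(same_host, lambda c: c.flag in SERROR_FLAGS),
        "srv_serror_rate": _rate(same_srv, lambda c: c.flag in SERROR_FLAGS),
        "rerror_rate": _rate(same_host, lambda c: c.flag == "REJ"),
        "srv_rerror_rate": _rate(same_srv, lambda c: c.flag == "REJ"),
        "same_srv_rate": _rate(same_host, lambda c: c.service == current.service),
        "diff_srv_rate": _rate(same_host, lambda c: c.service != current.service),
    }
```

The host-based features 32 to 41 follow the same pattern, but they aggregate over a window of previous connections to the same destination host instead of a time window. That window is not modeled here.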
Dataset download: https://www.unb.ca/cic/datasets/nsl.html
For a more detailed introduction to the dataset, see: https://towardsdatascience.com/a-deeper-dive-into-the-nsl-kdd-data-set-15c753364657