Sequence Feature Extraction: Semantic Features

Paper studied: G. Creech and J. Hu, "A Semantic Approach to Host-Based Intrusion Detection Systems Using Contiguous and Discontiguous System Call Patterns," in IEEE Transactions on Computers, vol. 63, no. 4, pp. 807-819, April 2014, doi: 10.1109/TC.2013.13.

 

The paper proposes a novel semantic feature for anomaly detection on system call sequences.

The extraction proceeds in three steps.

First, the training data must be processed to extract a dictionary containing every contiguous system call trace present in the training samples. This step is equivalent to using multiple window lengths under Forrest’s methodology [20], [25], [26], [46] and [47], where the maximum window length allowed is in fact the length of each trace. Each dictionary entry extracted at this stage forms a conceptual ‘word’, or a ‘phrase’ of length 1.

1. Extract words

      A contiguous subsequence of n API calls (n >= 2) is one word.

      All words obtained from the training sequences together form the word dictionary.

Example: for the sequence 1 2 3 4 5

    the words obtained are [1,2],[2,3],[3,4],[4,5]

                    [1,2,3],[2,3,4],[3,4,5]

                    [1,2,3,4],[2,3,4,5]

                    [1,2,3,4,5]
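A minimal sketch of step 1 in Python, assuming each trace is a list of integer system call IDs. The function name `extract_words` and the `min_len` parameter are hypothetical; the sketch simply slides every window length from `min_len` up to the full trace length over each training trace and stores each window as a word.

```python
from typing import Iterable, Sequence

Word = tuple[int, ...]  # a word is an immutable run of system call IDs


def extract_words(sequences: Iterable[Sequence[int]], min_len: int = 2) -> set[Word]:
    """Collect every contiguous subsequence of length >= min_len from the training traces."""
    dictionary: set[Word] = set()
    for seq in sequences:
        n = len(seq)
        for length in range(min_len, n + 1):      # every window length up to the trace length
            for start in range(n - length + 1):   # every start position for that window length
                dictionary.add(tuple(seq[start:start + length]))
    return dictionary


words = extract_words([[1, 2, 3, 4, 5]])
for w in sorted(words, key=lambda w: (len(w), w)):
    print(list(w))
```

For the sequence 1 2 3 4 5 this yields the same 10 words as the example above. A set is used so that a word seen in several training traces is stored only once.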

Second, these words are then used to construct further dictionaries consisting of every possible combination of the words up to a specified phrase length. 

2. Combine words into phrases

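  Combining any n words in every possible way yields the length-n phrase dictionary (the length-1 phrase dictionary is simply the word dictionary).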

  For example, from the word dictionary {[1,2],[2,3],[3,4]}:

      the length-2 phrase dictionary contains [1,2,2,3],[2,3,1,2],…

      the length-3 phrase dictionary contains [1,2,2,3,3,4],[2,3,1,2,3,4],…
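A sketch of step 2 follows. The text does not say whether a word may be reused within one phrase, so the hypothetical `allow_repeats` flag covers both readings: `itertools.product` allows repeats, `itertools.permutations` does not. Phrases are kept as tuples of words so word boundaries survive; the hypothetical helper `flatten` produces the concatenated form used in the example above. Note the dictionary grows as |words|^n, so in practice the maximum phrase length has to stay small.

```python
from itertools import permutations, product

Word = tuple[int, ...]
Phrase = tuple[Word, ...]  # a phrase is an ordered tuple of words


def build_phrases(words: set[Word], n: int, allow_repeats: bool = True) -> set[Phrase]:
    """Return every ordered combination of n words (the length-n phrase dictionary)."""
    ordered = sorted(words)                      # deterministic iteration order
    if allow_repeats:
        combos = product(ordered, repeat=n)      # the same word may appear more than once
    else:
        combos = permutations(ordered, n)        # each word used at most once per phrase
    return set(combos)


def flatten(phrase: Phrase) -> list[int]:
    """Concatenate the words of a phrase, e.g. ((1, 2), (2, 3)) -> [1, 2, 2, 3]."""
    return [call for word in phrase for call in word]


words = {(1, 2), (2, 3), (3, 4)}
phrases2 = build_phrases(words, 2, allow_repeats=False)
print(sorted(flatten(p) for p in phrases2))
```

With `allow_repeats=False` this prints the six length-2 phrases built from the three example words, including [1, 2, 2, 3] and [2, 3, 1, 2]. With `n=1` the result holds one single-word phrase per word, matching the statement that the length-1 phrase dictionary is the word dictionary.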

Extract occurrence counts of these different length phrases.

3. Obtain the semantic feature vector

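      The output is the semantic feature vector [x1,x2,x3,x4,…,xn], where each xn is the number of distinct phrases from the length-n phrase dictionary that appear in the sequence being classified.

       For example, if 10 distinct phrases from the length-1 phrase dictionary appear in the sequence, then x1 = 10 for that sequence's feature vector.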
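A sketch of step 3, under one stated assumption: a phrase counts as present when its concatenated words appear as a single contiguous run in the sequence. This matching rule is an assumption (the hypothetical helper `occurs` implements it). The paper's title mentions discontiguous patterns, and the quoted step speaks of occurrence counts, so the original method may allow gaps between words or count total occurrences rather than distinct phrases. The function name `semantic_vector` is also hypothetical.

```python
from itertools import product

Word = tuple[int, ...]
Phrase = tuple[Word, ...]


def occurs(phrase: Phrase, seq: list[int]) -> bool:
    """Assumed rule: the phrase's words, concatenated, appear as one contiguous run in seq."""
    pattern = [call for word in phrase for call in word]
    k = len(pattern)
    return any(seq[i:i + k] == pattern for i in range(len(seq) - k + 1))


def semantic_vector(seq: list[int], phrase_dicts: list[set[Phrase]]) -> list[int]:
    """x_n = number of distinct phrases from the length-n dictionary that occur in seq."""
    return [sum(occurs(p, seq) for p in dictionary) for dictionary in phrase_dicts]


words = {(1, 2), (2, 3), (3, 4)}
# Phrase dictionaries of lengths 1..3, repeats allowed (one of the two readings of step 2).
phrase_dicts = [set(product(sorted(words), repeat=n)) for n in (1, 2, 3)]
print(semantic_vector([1, 2, 3, 4, 1, 2, 2, 3], phrase_dicts))  # → [3, 3, 2]
```

For the test sequence 1 2 3 4 1 2 2 3, all three words occur (x1 = 3); the length-2 phrases [1,2,3,4], [3,4,1,2] and [1,2,2,3] occur (x2 = 3); and the length-3 phrases [1,2,3,4,1,2] and [3,4,1,2,2,3] occur (x3 = 2). The vector then serves as the input to whatever anomaly detector is trained on it.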
