12. Automated Vulnerability Detection in Source Code Using Minimum Intermediate Representation Learning


 

1. Introduction

Problems with existing intelligent vulnerability detection methods:

(1) Long-term dependency between code elements.

(2) The out-of-vocabulary (OoV) issue.

(3) Coarse detection granularity.

(4) Lack of vulnerability datasets.

 

2. Related Work

2.1. Intelligent Vulnerability Detection

 

2.2. Program Understanding Model

 
1) Sequence

The sequential understanding model converts the source code into a sequence in a certain order, including characters [28], tokens [29], and APIs [30].

Pros and cons: it retains the native information of the code, but it is affected by the long-term dependency problem.
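
For intuition, a token-level sequence can be produced from raw source text with a lightweight tokenizer. The sketch below is only an illustration and an assumption on my part, not the preprocessing used in the paper:

```python
import re

# A rough regex tokenizer for C-like code (illustrative; the paper's preprocessing may differ).
TOKEN_PATTERN = re.compile(r"[A-Za-z_]\w*|\d+|==|!=|<=|>=|->|&&|\|\||[^\sA-Za-z_0-9]")

def tokenize(code: str) -> list:
    """Convert a code snippet into a flat token sequence."""
    return TOKEN_PATTERN.findall(code)

print(tokenize("if (len > MAX) strcpy(buf, src);"))
# ['if', '(', 'len', '>', 'MAX', ')', 'strcpy', '(', 'buf', ',', 'src', ')', ';']
```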
 
2) Structure

The structural program understanding model includes the Abstract Syntax Tree (AST) [24,31], Control Flow Graph (CFG) [32], Program Dependence Graph (PDG) [33], and Code Property Graph (CPG) [34].

Pros and cons: although the CPG provides accurate and detailed information, it compromises detection efficiency because of the increased amount of data to be analyzed. Furthermore, structural representations are more complicated than sequential ones.
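
To make the idea of a structural representation concrete, the sketch below dumps the AST of a tiny function using Python's built-in ast module. This is only an analogy for illustration; the paper targets C/C++ code and does not use this module.

```python
import ast

# Illustration of a tree-structured program representation (AST) using Python's ast module.
source = """
def copy(dst, src):
    dst[:] = src
"""

tree = ast.parse(source)
print(ast.dump(tree, indent=2))  # nested nodes: Module -> FunctionDef -> Assign -> ...
```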
 
 
 
Natural code sequence ----> structured model ----> sequence model (VulDeePecker, SySeVR)
 
 
 
 

4. Proposed Approach

4.1. Overview

 
 
 
 

4.3. Pre-Training

Compared with traditional encoding methods such as one-hot, term frequency–inverse document frequency (TF-IDF), and n-gram, a distributed representation is denser.

The Continuous Bag-of-Words (CBOW) model is leveraged to obtain the distributed vector representations of tokens.
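
A minimal sketch of this pre-training step, assuming the gensim library and a toy token corpus (the corpus contents and hyperparameter values below are placeholders, not the paper's settings):

```python
from gensim.models import Word2Vec

# Placeholder corpus: each "sentence" is the token sequence of one code sample.
corpus = [
    ["if", "(", "len", ">", "MAX", ")", "strcpy", "(", "buf", ",", "src", ")", ";"],
    ["memcpy", "(", "dst", ",", "src", ",", "n", ")", ";"],
]

# sg=0 selects the CBOW architecture; vector_size, window, etc. are illustrative values.
model = Word2Vec(sentences=corpus, vector_size=100, window=5, min_count=1, sg=0, epochs=50)

vec = model.wv["strcpy"]            # dense 100-dimensional distributed vector for the token
print(vec.shape, model.wv.most_similar("strcpy", topn=3))
```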
 
 

4.4. High-Level Feature Learning

A representation learning method is used to learn high-level features from the pre-trained vectors.

Six neural networks were trained and compared; the CNN performed best, so it was chosen as the feature-learning model.
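
A minimal sketch of such a 1D convolutional classifier over the pre-trained token embeddings, assuming Keras; the sequence length, embedding size, and layer widths are assumed values, not the paper's configuration:

```python
from tensorflow.keras import layers, models

SEQ_LEN, EMB_DIM = 100, 100  # assumed padded sequence length and embedding dimension

# Binary classifier: does the embedded token sequence correspond to vulnerable code?
model = models.Sequential([
    layers.Input(shape=(SEQ_LEN, EMB_DIM)),           # pre-trained embeddings as input
    layers.Conv1D(filters=128, kernel_size=3, activation="relu"),
    layers.GlobalMaxPooling1D(),
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```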

5. Experiments and Results

5.1. Evaluation Metrics

False Positive Rate (FPR)

False Negative Rate (FNR)

Precision (P)

Recall (R)

F1-measure (F1)
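
Writing TP, FP, TN, FN for the entries of the confusion matrix, these metrics take their standard forms:

```latex
\mathrm{FPR} = \frac{FP}{FP + TN}, \qquad
\mathrm{FNR} = \frac{FN}{TP + FN}, \qquad
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}, \qquad
F1 = \frac{2 \cdot P \cdot R}{P + R}
```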

 

5.2. Experimental Setup


We split the data into training and test sets at a ratio of 7:3, and used 10-fold cross-validation to choose the hyperparameters.
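
A minimal sketch of this setup with scikit-learn; the synthetic data, estimator, and parameter grid below are placeholders used only to illustrate the 7:3 split and 10-fold cross-validation:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import LogisticRegression

# Placeholder data standing in for the extracted feature vectors and vulnerability labels.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# 7:3 train/test split as described above.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# 10-fold cross-validation on the training split to choose hyperparameters (illustrative grid).
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
    cv=10,
    scoring="f1",
)
search.fit(X_train, y_train)
print(search.best_params_, search.score(X_test, y_test))
```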
 
 
 
 

 

5.3. Comparison of Different Neural Networks

 
 

5.4. Effectiveness of Pre-Training

 
 
 
 
 

5.5. Ability to Detect Different Vulnerabilities


 

 

5.6. Comparative Analysis

VulDeePecker [25] has no representation-learning stage, and it uses a single dataset throughout the entire method.


 

 

Summary

The dataset is still VulDeePecker's dataset. Function and variable names are not mapped to symbolic names; instead, the embeddings are pre-trained on the extended corpus.

 

 

 

 
