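Reading notes on [ICSE 2019 Demo] Defexts: A Curated Dataset of Reproducible Real-World Bugs for Modern JVM Languages.
When I downloaded this paper it was labeled ICSE 2019 Demo, which is presumably correct. It also comes from a leading researcher in software testing, so it is worth a read.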
Benton S, Ghanbari A, Zhang L. Defexts: A Curated Dataset of Reproducible Real-World Bugs for Modern JVM Languages[J].
Third author: Lingming Zhang, a very strong researcher.
His homepage: http://www.utdallas.edu/~lxz144130/
The abstract first sets up the topic:
Software engineering studies, such as bug detection, localization, repair, and prediction, often require benchmark bug datasets for their experiments.
It then points out the problem and why the authors' work is needed:
Few publicly available reproducible bug datasets exist for research consumption. Such datasets which publicly exist tend to be applicable exclusively towards the most popular traditional programming languages (e.g., Defects4J for Java and CoreBench for C). Thus, the creation and widespread usage of bug datasets for other popular modern JVM (Java Virtual Machine) programming languages serve to provide vital resources for software engineering research.
Then it introduces the work itself:
This paper introduces Defexts, a family of bug datasets currently containing child datasets for Kotlin (DefextsKotlin) and Groovy (DefextsGroovy). Each dataset contains reproducible real-world bugs and their corresponding patches scraped from real-world projects. Our introductory versions of DefextsKotlin and DefextsGroovy include 225 Kotlin and 302 Groovy bugs and patches. As development of Defexts continues, we aim to include other JVM languages, notably Scala. A video demonstration of Defexts is located at following link: https://youtu.be/lenYcVzRGGQ
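In my own words:
Software engineering research on bug detection, localization, repair, and prediction often needs benchmark bug datasets for experiments.
However, very few reproducible bug datasets are publicly available, far from enough for research. Those that exist are usually built only for the most popular traditional languages (for example, Defects4J for Java and CoreBench for C). So creating and widely using bug datasets for other popular modern JVM (Java Virtual Machine) languages would give software engineering research a vital resource.
The paper introduces Defexts, a family of bug datasets that currently has child datasets for Kotlin and Groovy. Each dataset contains bugs and their corresponding patches, taken from real-world projects. The introductory version includes 225 Kotlin bugs and 302 Groovy bugs. As Defexts keeps developing, the authors plan to cover other JVM languages, Scala in particular. There is also a YouTube video demonstration.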
1) I wanted to know how the concepts of bug detection, localization, and prediction differ.
Fields such as automated bug detection [1], localization [2], repair [3], and prediction [4] benefit from benchmark buggy programs.
[1] S. Wang, D. Chollak, D. Movshovitz-Attias, and L. Tan, “Bugram: bug detection with n-gram language models,” in ICSE, 2016.
[2] S. Pearson, J. Campos, R. Just, G. Fraser, R. Abreu, M. D. Ernst, D. Pang, and B. Keller, “Evaluating and improving fault localization,” in ICSE, 2017.
[3] C. L. Goues, T. Nguyen, S. Forrest, and W. Weimer, “Genprog: A generic method for automatic software repair,” IEEE TSE, January 2012.
[4] T. J. Ostrand, E. J. Weyuker, and R. M. Bell, “Predicting the location and number of faults in large software systems,” IEEE TSE, April 2005
As shown above, the paper gives a reference for each of them.
2) Which buggy-program datasets exist in total? I was curious.
Traditional programming languages, such as Java, already have multiple bug datasets, such as Defect4J [6], Bug.Jar [7], and BugSwarm [8].
In particular, Defect4J, a dataset currently with 395 reproducible Java bugs, has been directly cited in 200+ software engineering papers since its release in 2014, and has had a significant impact on software engineering research.
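Indeed, as the quote says, Defects4J is a major contribution. My impression is that many more Java repair tools have appeared since 2014, and part of the credit arguably goes to Defects4J.
The paper also lists these: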
QuixBugs [18] is yet another dataset for Java that has recently attracted the attention of APR research community [19]. The recent BugSwarm dataset [8] includes both Java and Python bugs. Other widely used datasets mainly focus on C/C++, including ManyBugs [20], IntroClass [20], CodeFlaws [21], CoreBench [22], DbgBench [23]. None of the aforementioned datasets focus on modern JVM languages.
3) How is this benchmark organized and built? I was curious.
First, projects are found through GitHub's Search API:
A. Scraping Public GitHub Projects
Due to our familiarity with GitHub’s Search API, we chose to exclusively search for projects within GitHub for Defexts’ initial version
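To make this step concrete for myself, here is a minimal sketch of what a per-language repository query against GitHub's repository search endpoint could look like. The paper excerpt does not quote the actual query, so the qualifiers, sort order, and the helper name `build_repo_search_url` are my own assumptions, and the function only builds the URL without sending a request.

```python
from urllib.parse import urlencode

# Public GitHub REST endpoint for repository search.
GITHUB_SEARCH_REPOS = "https://api.github.com/search/repositories"


def build_repo_search_url(language: str, page: int = 1, per_page: int = 100) -> str:
    """Build a repository-search URL for one language (no request is sent).

    The query below is an assumption for illustration, not the paper's query.
    """
    params = {
        "q": f"language:{language}",  # restrict results to one language
        "sort": "stars",              # assumed ordering
        "order": "desc",
        "per_page": per_page,         # GitHub caps page size at 100
        "page": page,
    }
    return f"{GITHUB_SEARCH_REPOS}?{urlencode(params)}"


if __name__ == "__main__":
    for lang in ("kotlin", "groovy"):
        print(build_repo_search_url(lang))
```

A real crawl would also need authentication and rate-limit handling, which this sketch leaves out.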
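Next, the authors search for commits that potentially fix bugs. (This part I more or less understand: commit messages often say something like "bug fix", so such commits should be fairly easy to pick out, right?)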
B. Searching for Potentially Bug-fixing Project Commits
In total, we extracted 49,982 buggy commits from 93,321 Kotlin projects and 26,438 buggy commits from 43,576 Groovy projects based on this search criteria (see Table I).
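My guess at how such a commit filter could work is a keyword match on commit messages. The sketch below is only that guess: the keyword list and the function name `looks_like_bug_fix` are hypothetical, since the search criteria themselves are not quoted in these notes.

```python
import re

# Hypothetical keyword list; the paper's real search criteria are not quoted here.
BUG_FIX_PATTERN = re.compile(
    r"\b(fix(es|ed|ing)?|bugs?|defects?|patch(es|ed)?)\b",
    re.IGNORECASE,
)


def looks_like_bug_fix(message: str) -> bool:
    """Return True if the commit message contains a bug-fix keyword as a whole word."""
    return BUG_FIX_PATTERN.search(message) is not None


if __name__ == "__main__":
    for msg in ("Fix NPE in parser", "Add README", "Refactor prefix handling"):
        print(msg, "->", looks_like_bug_fix(msg))
```

A filter like this only yields candidates. A message such as "Update patch version" would also match, which is one reason a later verification step is needed.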
Finally, the patches are verified:
C. Verifying Patches
Thus, commits incompatible with JDK 1.8, Gradle 4.8, Maven 3.3.9, or utilizing other build systems were automatically excluded from Defexts.
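The build-system part of this exclusion rule could be approximated by looking at which build files sit in a project's root directory. This is a sketch under my own assumptions: the file names are the conventional ones for Gradle and Maven, the function name `detect_build_system` is hypothetical, and the JDK and tool version checks from the quote are not modeled.

```python
from typing import Iterable, Optional

# Conventional build file names; checking versions (Gradle 4.8, Maven 3.3.9) is out of scope.
GRADLE_FILES = {"build.gradle", "build.gradle.kts"}
MAVEN_FILES = {"pom.xml"}


def detect_build_system(root_files: Iterable[str]) -> Optional[str]:
    """Return "gradle" or "maven" based on root-level file names, else None."""
    names = set(root_files)
    if names & GRADLE_FILES:
        return "gradle"
    if names & MAVEN_FILES:
        return "maven"
    return None  # any other build system would be excluded


if __name__ == "__main__":
    print(detect_build_system(["pom.xml", "README.md"]))
    print(detect_build_system(["build.xml", "src"]))
```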
- Automated Patch Verification
- Manually Removing False Positive Patches
With our best efforts, all entries within our final dataset contain a failing test suite at PC−1 and exhibit a fully passing test suite at PC.
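This is quite impressive. When collecting patches, the authors keep only a subset: going from PC−1 to PC, the test suite must go from failing to passing, and every commit that is not of this fail-to-pass type is discarded. I think I get it now: PC−1 is the buggy program and PC is the patched program.
After reading this, I feel that building a benchmark follows a method and is not the impossible task I had imagined.
That said, the process involves many trade-offs; see the paper for details.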
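The fail-to-pass rule can be written down as a small check. This is my own minimal sketch, assuming each commit's test run is summarized only by passed and failed counts; `SuiteResult` and `is_fail_to_pass` are hypothetical names, and requiring at least one passing test at PC is my assumption rather than something the paper states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuiteResult:
    """Summary of one test-suite run (hypothetical structure)."""
    passed: int
    failed: int


def is_fail_to_pass(before: SuiteResult, after: SuiteResult) -> bool:
    """before: results at PC-1 (buggy); after: results at PC (patched)."""
    buggy_fails = before.failed > 0
    # Assumption: a fully passing suite has no failures and at least one test.
    fixed_passes = after.failed == 0 and after.passed > 0
    return buggy_fails and fixed_passes


if __name__ == "__main__":
    print(is_fail_to_pass(SuiteResult(10, 2), SuiteResult(12, 0)))
    print(is_fail_to_pass(SuiteResult(12, 0), SuiteResult(12, 0)))
```

Commits for which this returns False would be dropped as candidates, matching the idea that only fail-to-pass pairs enter the dataset.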
1) I looked at the research interests of the third author, Lingming Zhang:
Software Engineering, in particular: Test Generation, Regression Testing, Mutation Testing, Automated Debugging, Program Transformation and Analysis.
Formal Methods and Programming Languages, in particular: Symbolic Execution, Model Checking, Dynamic Invariant Inference, First-Order Logic, and Points-to Analysis.
So, roughly two blocks: software testing and debugging on one side, formal methods and program analysis on the other.
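It feels like everything I need to learn is on this list, and it also makes me realize how weak my foundations are.
Also, I should focus on CVE-related papers from now on. The break is almost over and I am running out of time, so no more software testing papers for the moment.
2) The paper is really well written: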
Thus, the creation and widespread usage of bug datasets for other popular modern JVM (Java Virtual Machine) programming languages serve to provide vital resources for software engineering research.
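With this one sentence, the importance of the work stands out immediately.
3) The paper does have novelty, and the plan to cover Scala in the future is interesting.
4) After reading the abstract, I get the sense that competition in the benchmark area is not easy either.
In short, nothing is easy, but everything can be done. Nothing is truly impossible.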