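I'm extremely short on time right now, so realistically I need to get through all the tools this morning, which means at most 20 minutes per paper and no more.
No way around it, so I'll push a bit harder. I don't want to leave the work to the last minute; all the papers have to be done today and tomorrow. (At minimum I need a rough understanding of the technique, where the idea came from, and some of the reasoning.)
This post covers Angelix, an existing automated program repair tool.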
Angelix: Scalable Multiline Program Patch Synthesis via Symbolic Analysis
Authors: Sergey Mechtaev; Jooyong Yi; Abhik Roychoudhury
1) Automated program repair has garnered growing interest.
2) GenProg and SPR, two search-based repair tools, share a common limitation: many of their "plausible" repairs delete functionality.
3) Semantics-based repair is promising but has a limitation: scalability.
4) Our repair method, called Angelix, can scale up to programs of similar size to those handled by search-based repair methods such as GenProg and SPR.
5) Angelix handles large-scale real-world software and generates repairs, including multi-location repairs. Angelix also automatically repaired the well-known Heartbleed vulnerability.
Point 5 is the highlight, very cool.
While such semantics-based repair methods show promise in terms of quality of generated repairs, their scalability has been a concern so far.
Various automated repair tools, such as GenProg [14], PAR [21], relifix [39], SemFix [26], Nopol [8], DirectFix [24] and SPR [23], to name only a few, have been introduced recently.
So many tools; worth a look when I have time.
These automated repair methods can be classified into the following two broad methodologies, i.e., search-based methodology (e.g., GenProg, PAR, and SPR) and semantics-based methodology (e.g., SemFix, Nopol, and DirectFix).
These automated repair techniques fall into two categories, which is neat.
Meanwhile, the semantics-based repair methodology synthesizes a repair using semantic information (via symbolic execution and constraint solving).
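Symbolic execution and constraint solving: I don't know these yet and must study them when I have time! (My fundamentals are not solid.)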
Classifying repair methods into search based repair and semantics based repair is somewhat analogous to classification of software testing into search-based testing and symbolic-execution-based testing [28]
Key point: reference [28] is worth reading.
Currently, research in automated program repair considers all the three attributes – scalability (should scale to large real-world programs), repairability (should repair a large number of defects possibly by covering many defect classes), and the quality of repairs (should produce repairs which make less changes to the program, delete less functionality, and are more likely to be accepted by developers).
Semantics based repair methods often work by extracting a repair constraint typically via symbolic execution. This repair constraint acts as a specification to guide program synthesis - so a patch satisfying the repair constraint can be synthesized.
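I think I get it: symbolic execution first extracts a repair constraint, that constraint serves as a specification, and a patch satisfying it is then synthesized.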
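To check my understanding, here is a minimal sketch of "a patch satisfying the repair constraint can be synthesized". Everything in it is my own assumption for illustration: the constraint format (program state paired with the value the expression must take), the tiny `a OP b` grammar, and the names `REPAIR_CONSTRAINT` and `synthesize`. Brute-force enumeration is swapped in for the solver-based synthesis the paper relies on.

```python
import itertools
import operator

# Hypothetical repair constraint for ONE suspicious condition: each entry
# pairs the visible program state with the value the expression must take
# for the corresponding test to pass.
REPAIR_CONSTRAINT = [
    ({"x": 0, "y": 5}, True),
    ({"x": 3, "y": 3}, True),
    ({"x": 7, "y": 2}, False),
]

# Candidate comparison operators (an assumed, very small expression grammar).
OPERATORS = {
    "<": operator.lt, "<=": operator.le, ">": operator.gt,
    ">=": operator.ge, "==": operator.eq, "!=": operator.ne,
}

def synthesize(constraint, variables):
    """Return the first 'a OP b' expression that meets every entry, else None."""
    candidates = itertools.product(
        itertools.permutations(variables, 2), OPERATORS.items())
    for (a, b), (symbol, fn) in candidates:
        if all(fn(state[a], state[b]) == want for state, want in constraint):
            return f"{a} {symbol} {b}"
    return None

patch = synthesize(REPAIR_CONSTRAINT, ["x", "y"])
print(patch)  # x <= y
```

The constraint acts purely as a specification here: the search never looks at the original buggy expression, only at the values the expression is required to produce.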
The key enabler for scalable multi-line bug fix in this paper, is our novel lightweight repair constraint that we call an angelic forest. This angelic forest is automatically extracted via symbolic execution. As compared to the repair constraints used in the previous work [24, 26], the angelic forest is simpler, and its size is independent of the size of the program under repair, thereby making our repair method scale. Our angelic forest, despite its simplicity, contains enough semantic information to enable multi-location bug fix. Among existing search-based repair tools, SPR does not support multi-line fixes. While GenProg [14] can change multiple locations of the program, a recent study on GenProg repairs [33] shows that seemingly complex repairs generated from GenProg are in the overwhelming majority of cases in fact functionally equivalent to single line modification.
Impressive. Their command of symbolic execution is really strong, whereas I still know nothing about it...
www.comp.nus.edu.sg/~abhik/tools/angelix/
The absence of an angelic forest for a chosen n suspicious locations implies that it is not possible to repair the bug by changing these n locations. Symbolic execution finds an angelic forest (or proves the absence of an angelic forest) efficiently by exploring only feasible execution paths.
In our custom symbolic execution, symbols are installed during symbolic execution by replacing the value of each instance of a suspicious expression with a fresh symbol (line 7).
It seems values are assigned to several expressions at once, so this is neither SFA nor RFA.
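A rough sketch of how I picture the angelic forest, under heavy assumptions: the toy `program`, the test suite, the finite `DOMAINS`, and the representation of an angelic path as a tuple of (expression, value) pairs are all mine, and exhaustive enumeration over small domains replaces real symbolic execution. It only illustrates two points from the text: several suspicious expressions are replaced at once, and an empty result for some test means these locations cannot yield a repair.

```python
import itertools

def program(inp, hole):
    """Toy program under repair; hole(name) replaces a suspicious expression."""
    if hole("cond"):          # suspicious branch condition
        return hole("rhs")    # suspicious right-hand side of an assignment
    return inp

# Hypothetical test suite: (input, expected output).
TESTS = [(2, 3), (5, 5)]

# Small finite domains stand in for the solver's search over symbol values.
DOMAINS = {"cond": [False, True], "rhs": range(0, 5)}

def angelic_forest(tests, domains):
    """Map each test index to the set of value assignments that make it pass.

    Only holes actually evaluated on a run are recorded, so assignments that
    differ in an unexecuted hole collapse into a single path.
    """
    names = list(domains)
    forest = {}
    for i, (inp, expected) in enumerate(tests):
        paths = set()
        for values in itertools.product(*(domains[n] for n in names)):
            assignment = dict(zip(names, values))
            used = {}

            def hole(name):
                used[name] = assignment[name]
                return assignment[name]

            if program(inp, hole) == expected:
                paths.add(tuple(sorted(used.items())))
        forest[i] = paths
    return forest

forest = angelic_forest(TESTS, DOMAINS)
# If any test has no angelic path, these locations cannot produce a repair.
repairable = all(forest.values())
print(forest, repairable)
```

For the first test the only passing path takes the branch and needs `rhs` to be 3; for the second the branch must be skipped, so `rhs` is never evaluated and does not appear in the path.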
Our repair tool allows to control the following parameters of our repair algorithm: the maximum number of suspicious locations that can be repaired at the same time, the kinds of suspicious expressions, and the kinds of (semantics-preserving) program transformation.
First, for the maximum number of suspicious locations, we used the value between 1 and 10 (inclusive).
Even the number of suspicious locations can be controlled; impressive, it has simply become a parameter setting.
Afterwards, our repair algorithm replaces user-configured n most suspicious expressions (chosen based on the result of statistical fault localization) with symbolic variables, as shown in Figure 1c where conditional expressions and the right-hand side of an assignment are replaced with symbolic variables.
So n locations can be modified at once.
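The passage only says the n expressions are "chosen based on the result of statistical fault localization", without naming a formula in the part I quoted. As a sketch, I assume the Ochiai score, which is one common choice; the coverage data, expression ids, and the function names `ochiai` and `top_n` are all hypothetical.

```python
import math

def ochiai(failed_cov, passed_cov, total_failed):
    """Ochiai suspiciousness: failed_cov / sqrt(total_failed * (failed_cov + passed_cov))."""
    denom = math.sqrt(total_failed * (failed_cov + passed_cov))
    return failed_cov / denom if denom else 0.0

def top_n(coverage, outcomes, n):
    """Rank expressions by suspiciousness and keep the n highest.

    coverage: expression id -> set of test names that execute it
    outcomes: test name -> True if the test passed
    """
    failed = {t for t, ok in outcomes.items() if not ok}
    scores = {
        expr: ochiai(len(tests & failed), len(tests - failed), len(failed))
        for expr, tests in coverage.items()
    }
    # Sort by descending score, breaking ties by expression id for determinism.
    return sorted(scores, key=lambda e: (-scores[e], e))[:n]

# Hypothetical coverage data for three candidate expressions.
outcomes = {"t1": True, "t2": True, "t3": False}
coverage = {
    "line3:cond": {"t1", "t2", "t3"},
    "line5:rhs": {"t3"},
    "line9:cond": {"t1"},
}

suspicious = top_n(coverage, outcomes, n=2)
print(suspicious)  # ['line5:rhs', 'line3:cond']
```

The expression executed only by the failing test ranks first, and n caps how many expressions would then be replaced with symbolic variables.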
Second, our repair algorithm performs controlled symbolic execution with a few selected suspicious expressions, instead of usual symbolic input.
Does this mean you set the value of n yourself?
The granularity is the expression level, not the statement level.
We note that each of these afore-listed techniques is the improvement or extension of earlier work by us and others. As already mentioned, our novel lightweight program-size-independent semantic signature is the improvement of the heavyweight semantic signature used in our prior work DirectFix [24]. We also mention that the controlled symbolic execution was first introduced in our prior work, SemFix [26], although there a symbol is installed only at one location, and as a result, multi-location repair was not possible. Lastly, our repair strategy to ignore repair-wise infeasible suspicious locations has a similarity with Nopol [8] and SPR [23]. While detailed comparison will be provided in Section 8, Nopol and SPR currently cannot fix multi-location bugs. Furthermore, multi-location fix seems fundamentally difficult in Nopol and SPR, due to their weaker semantic signatures that do not capture the dependence between multiple program locations. The unique combination of our novel semantic signature with the existing techniques enables scalable multi-location bug fixing.
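Impressive indeed. This is where it beats Nopol and SPR.
Also, the technique is improvement plus combination; every piece builds on earlier work.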
All our experiments were performed on Intel Xeon E5-2660 2.20GHz CPU with Ubuntu 14.04 64-bit operating system. We used 12 hours as the timeout of each repair session.
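Does it repair one location at a time, or does each round repair from the angelic forest?
Are the statements chosen at random, or in some other way (in order, for instance)?
This needs further investigation.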