Background:
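I recently set out to reproduce the tree-based clone detection paper: L. Jiang, G. Misherghi, Z. Su, and S. Glondu. Deckard: Scalable and accurate tree-based detection of code clones. In Proceedings of ICSE, 2007.
To do that, I needed to build an AST for each method in a Java class, convert the AST to dot format, and finally convert that into a vector (the authors have already implemented this last step on Github (https://github.com/skyhover/Deckard): just run vdbgen).
The similarity between two pieces of code is then judged by the similarity between their vectors.
This pipeline is the core idea of tree-based clone detection.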
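To make the "compare vectors" idea concrete, here is a minimal sketch. It assumes that a vector simply counts how often each AST node type occurs and that similarity is measured by Euclidean distance. The class CharacteristicVectors and its methods are hypothetical names of mine, not part of Deckard; the real vdbgen does more than this (for example, it works on subtrees), so treat this only as an illustration of the principle.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: turns a list of AST node type codes into a count vector. */
class CharacteristicVectors {

    /** Counts how often each node type code (0..dimensions-1) occurs. */
    static int[] toVector(List<Integer> nodeTypes, int dimensions) {
        int[] vector = new int[dimensions];
        for (int type : nodeTypes) {
            if (type < 0 || type >= dimensions) {
                throw new IllegalArgumentException("type code out of range: " + type);
            }
            vector[type]++;
        }
        return vector;
    }

    /** Euclidean distance between two vectors of equal length. */
    static double distance(int[] a, int[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        // Node type codes of two tiny, made-up method bodies (assumed data).
        List<Integer> first = Arrays.asList(31, 8, 21, 32, 42, 42);
        List<Integer> second = Arrays.asList(31, 8, 21, 32, 42, 45);
        int[] v1 = toVector(first, 100);
        int[] v2 = toVector(second, 100);
        // A small distance suggests structurally similar code.
        System.out.println(distance(v1, v2));
    }
}
```

The smaller the distance, the more alike the two trees are in terms of node types; where to put the threshold is a tuning decision that this sketch does not address.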
The complete source code is on my Github: https://github.com/XBWer/JDT_AST_DOT
First, take a Java source file as an example:
Input : test.java
    public class test {
        int i = 1;

        public void testNonEscaped() {
            startServer(NonEscapedURIResource.class);

            WebResource r = Client.create().resource(getUri().userInfo("x.y").path("x%20y").build());
            assertEquals("CONTENT", r.get(String.class));
        }
    }
Output: test.java_testNonEscaped.dot
    digraph "DirectedGraph" {
    graph [label = "testNonEscaped", labelloc=t, concentrate = true];
    "13329486" [ type=31 line=4 ]
    "327177752" [ type=83 line=4 ]
    "1458540918" [ type=39 line=4 ]
    "1164371389" [ type=42 line=4 ]
    "517210187" [ type=8 line=4 ]
    "267760927" [ type=21 line=5 ]
    "633070006" [ type=32 line=5 ]
    "1459794865" [ type=42 line=5 ]
    "1776957250" [ type=57 line=5 ]
    "1268066861" [ type=43 line=5 ]
    "827966648" [ type=42 line=5 ]
    "1938056729" [ type=60 line=7 ]
    "1273765644" [ type=43 line=7 ]
    "701141022" [ type=42 line=7 ]
    "1447689627" [ type=59 line=7 ]
    "112061925" [ type=42 line=7 ]
    "764577347" [ type=32 line=7 ]
    "1344645519" [ type=32 line=7 ]
    "1234776885" [ type=42 line=7 ]
    "540159270" [ type=42 line=7 ]
    "422250493" [ type=42 line=7 ]
    "1690287238" [ type=32 line=7 ]
    "1690254271" [ type=32 line=7 ]
    "1440047379" [ type=32 line=7 ]
    "343965883" [ type=32 line=7 ]
    "230835489" [ type=42 line=7 ]
    "280884709" [ type=42 line=7 ]
    "1847509784" [ type=45 line=7 ]
    "2114650936" [ type=42 line=7 ]
    "1635756693" [ type=45 line=7 ]
    "504527234" [ type=42 line=7 ]
    "101478235" [ type=21 line=8 ]
    "540585569" [ type=32 line=8 ]
    "1007653873" [ type=42 line=8 ]
    "836514715" [ type=45 line=8 ]
    "1414521932" [ type=32 line=8 ]
    "828441346" [ type=42 line=8 ]
    "1899073220" [ type=42 line=8 ]
    "555826066" [ type=57 line=8 ]
    "174573182" [ type=43 line=8 ]
    "858242339" [ type=42 line=8 ]
    "13329486" -> "327177752"
    "13329486" -> "1458540918"
    "13329486" -> "1164371389"
    "13329486" -> "517210187"
    "517210187" -> "267760927"
    "267760927" -> "633070006"
    "633070006" -> "1459794865"
    "633070006" -> "1776957250"
    "1776957250" -> "1268066861"
    "1268066861" -> "827966648"
    "517210187" -> "1938056729"
    "1938056729" -> "1273765644"
    "1273765644" -> "701141022"
    "1938056729" -> "1447689627"
    "1447689627" -> "112061925"
    "1447689627" -> "764577347"
    "764577347" -> "1344645519"
    "1344645519" -> "1234776885"
    "1344645519" -> "540159270"
    "764577347" -> "422250493"
    "764577347" -> "1690287238"
    "1690287238" -> "1690254271"
    "1690254271" -> "1440047379"
    "1440047379" -> "343965883"
    "343965883" -> "230835489"
    "1440047379" -> "280884709"
    "1440047379" -> "1847509784"
    "1690254271" -> "2114650936"
    "1690254271" -> "1635756693"
    "1690287238" -> "504527234"
    "517210187" -> "101478235"
    "101478235" -> "540585569"
    "540585569" -> "1007653873"
    "540585569" -> "836514715"
    "540585569" -> "1414521932"
    "1414521932" -> "828441346"
    "1414521932" -> "1899073220"
    "1414521932" -> "555826066"
    "555826066" -> "174573182"
    "174573182" -> "858242339"
    }
In the dot file, type is the node type (for the definitions, see: https://help.eclipse.org/luna/index.jsp?topic=%2Forg.eclipse.jdt.doc.isv%2Freference%2Fapi%2Forg%2Feclipse%2Fjdt%2Fcore%2Fdom%2FASTNode.html), and line is the node's position in the source file (its line number).
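To read the numeric type codes more easily, the node declarations can be parsed and mapped back to names. The sketch below is my own illustration: DotNodeReader and its methods are hypothetical, and the lookup table covers only the codes that occur in the sample output, copied from the ASTNode constants as I understand them. With JDT on the classpath, ASTNode.nodeClassForType(int) would be the more reliable way to resolve a code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical reader for node declarations in the .dot output. */
class DotNodeReader {

    // Subset of the ASTNode node type constants that occur in the sample output.
    static final Map<Integer, String> TYPE_NAMES = new HashMap<>();
    static {
        TYPE_NAMES.put(8, "BLOCK");
        TYPE_NAMES.put(21, "EXPRESSION_STATEMENT");
        TYPE_NAMES.put(31, "METHOD_DECLARATION");
        TYPE_NAMES.put(32, "METHOD_INVOCATION");
        TYPE_NAMES.put(39, "PRIMITIVE_TYPE");
        TYPE_NAMES.put(42, "SIMPLE_NAME");
        TYPE_NAMES.put(43, "SIMPLE_TYPE");
        TYPE_NAMES.put(45, "STRING_LITERAL");
        TYPE_NAMES.put(57, "TYPE_LITERAL");
        TYPE_NAMES.put(59, "VARIABLE_DECLARATION_FRAGMENT");
        TYPE_NAMES.put(60, "VARIABLE_DECLARATION_STATEMENT");
        TYPE_NAMES.put(83, "MODIFIER");
    }

    // Matches declarations such as: "13329486" [ type=31 line=4 ]
    private static final Pattern NODE =
        Pattern.compile("\"(\\d+)\"\\s*\\[\\s*type=(\\d+)\\s+line=(\\d+)\\s*\\]");

    /** Returns one "id: TYPE_NAME @ line N" string per node declaration found. */
    static List<String> describe(String dot) {
        List<String> result = new ArrayList<>();
        Matcher m = NODE.matcher(dot);
        while (m.find()) {
            int type = Integer.parseInt(m.group(2));
            // Codes outside the table are reported by number instead of name.
            String name = TYPE_NAMES.getOrDefault(type, "type " + type);
            result.add(m.group(1) + ": " + name + " @ line " + m.group(3));
        }
        return result;
    }

    public static void main(String[] args) {
        String dot = "\"13329486\" [ type=31 line=4 ] \"517210187\" [ type=8 line=4 ] "
                   + "\"13329486\" -> \"517210187\"";
        describe(dot).forEach(System.out::println);
    }
}
```

Edge lines such as "a" -> "b" are skipped on purpose, because the pattern only matches an id followed by an attribute list.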
The graph can be visualized by pasting the .dot file into http://www.webgraphviz.com/
Main steps:
1. Convert the Java code to an AST;
2. Override the visit methods of ASTVisitor to traverse the AST as needed;
3. Convert the AST to .dot format.
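The traversal and the .dot output of the steps above can be sketched without JDT on the classpath. Everything in this block is an assumption made for illustration: SketchNode is a hypothetical stand-in for ASTNode, DotWriter is not the class used in my repository, and node ids are simple counters rather than the large numbers seen in the sample output. In the real setup, the tree would come from JDT's ASTParser, and the per-node work would live in an ASTVisitor override such as preVisit(ASTNode), using getNodeType(), getParent() and CompilationUnit.getLineNumber(int).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a JDT ASTNode: a type code, a source line, children. */
class SketchNode {
    final int type;
    final int line;
    final List<SketchNode> children = new ArrayList<>();

    SketchNode(int type, int line) {
        this.type = type;
        this.line = line;
    }

    /** Appends a child and returns this node so calls can be chained. */
    SketchNode add(SketchNode child) {
        children.add(child);
        return this;
    }
}

/** Hypothetical writer that emits nodes first and edges second, like the sample. */
class DotWriter {
    private final StringBuilder nodes = new StringBuilder();
    private final StringBuilder edges = new StringBuilder();
    private int nextId = 0;

    /** Pre-order walk: declare the node, link it to its parent, then descend. */
    private void visit(SketchNode node, int parentId) {
        int id = nextId++;
        nodes.append("\"").append(id).append("\" [ type=").append(node.type)
             .append(" line=").append(node.line).append(" ]\n");
        if (parentId >= 0) {
            edges.append("\"").append(parentId).append("\" -> \"")
                 .append(id).append("\"\n");
        }
        for (SketchNode child : node.children) {
            visit(child, id);
        }
    }

    /** Builds the complete digraph text for one method. */
    static String toDot(String methodName, SketchNode root) {
        DotWriter w = new DotWriter();
        w.visit(root, -1);
        return "digraph \"DirectedGraph\" {\n"
             + "graph [label = \"" + methodName + "\", labelloc=t, concentrate = true];\n"
             + w.nodes + w.edges + "}\n";
    }

    public static void main(String[] args) {
        // METHOD_DECLARATION(31) with a SIMPLE_NAME(42) and a BLOCK(8) child.
        SketchNode method = new SketchNode(31, 4)
            .add(new SketchNode(42, 4))
            .add(new SketchNode(8, 4));
        System.out.print(toDot("testNonEscaped", method));
    }
}
```

Collecting nodes and edges in two separate buffers keeps the output in the same order as the sample file: all node declarations first, then all edges.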
Main classes:
ASTNode: https://help.eclipse.org/luna/index.jsp?topic=%2Forg.eclipse.jdt.doc.isv%2Freference%2Fapi%2Forg%2Feclipse%2Fjdt%2Fcore%2Fdom%2FASTNode.html
ASTVisitor: https://help.eclipse.org/neon/index.jsp?topic=%2Forg.eclipse.jdt.doc.isv%2Freference%2Fapi%2Forg%2Feclipse%2Fjdt%2Fcore%2Fdom%2FASTVisitor.html
AST: https://help.eclipse.org/mars/index.jsp?topic=%2Forg.eclipse.jdt.doc.isv%2Freference%2Fapi%2Forg%2Feclipse%2Fjdt%2Fcore%2Fdom%2FAST.html