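I studied decompilation during my master's degree. Below I introduce decompilation tools and libraries, Java decompilation, and the main research institutions working on decompilation.
Historically, decompilation research was grounded in traditional compiler theory; it is now gradually shifting toward search-based matching of code blocks.
"Decompilation as Search" studies decompilation from this search-based angle, and the results are fairly good.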
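To make the search-based idea above concrete, here is a minimal sketch of code block matching. It is a toy illustration and not the method of the cited work: the `CodeIndex` class, the choice of mnemonic 3-grams as features, the Jaccard score, and the two indexed "functions" are all assumptions made for this example.

```python
from collections import defaultdict


def ngrams(mnemonics, n=3):
    """Return the set of n-grams (as tuples) over a mnemonic sequence."""
    return {tuple(mnemonics[i:i + n]) for i in range(len(mnemonics) - n + 1)}


class CodeIndex:
    """Hypothetical inverted index from mnemonic n-grams to known functions."""

    def __init__(self, n=3):
        self.n = n
        self.postings = defaultdict(set)  # n-gram -> names of functions containing it
        self.grams = {}                   # function name -> its n-gram set

    def add(self, name, mnemonics):
        grams = ngrams(mnemonics, self.n)
        self.grams[name] = grams
        for gram in grams:
            self.postings[gram].add(name)

    def search(self, mnemonics):
        """Rank indexed functions by Jaccard similarity to the query."""
        query = ngrams(mnemonics, self.n)
        # Only functions sharing at least one n-gram with the query are candidates.
        candidates = set()
        for gram in query:
            candidates |= self.postings.get(gram, set())
        scored = []
        for name in candidates:
            known = self.grams[name]
            scored.append((len(query & known) / len(query | known), name))
        # Highest score first; ties broken by name for a stable order.
        return sorted(scored, key=lambda item: (-item[0], item[1]))


# Made-up mnemonic sequences standing in for functions with known source code.
index = CodeIndex()
index.add("strlen_like", ["mov", "cmp", "je", "inc", "jmp", "ret"])
index.add("memcpy_like", ["mov", "mov", "mov", "inc", "inc", "dec", "jnz", "ret"])

# A slightly different instruction sequence recovered from an unknown binary.
results = index.search(["mov", "cmp", "je", "inc", "jmp", "sub", "ret"])
print(results)
```

A real system would take the mnemonics from a disassembler and would likely combine several kinds of features; the point here is only that recovering source becomes a lookup problem once known code has been indexed.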
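As mobile devices become more widespread, mobile security is increasingly important; see the "Android Hacker's Handbook", which can be downloaded from CSDN.
Main decompilers and reverse-analysis techniques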
[1] dcc
https://github.com/nemerle/dcc
[2] libbeauty
Given an input .o file, it can create a .c file that compiles and has the same functionality as the original .o file.
https://github.com/jcdutton/libbeauty/wiki
https://github.com/jcdutton/libbeauty [partial source code]
[3] Dagger
Dagger enables easy retargetability of several planned tools, like rewriters, static or dynamic binary translators, and even simple instruction set emulators.
http://dagger.repzret.org/
[4] SecondWrite
Commercial software.
http://www.isr.umd.edu/research/posters/secondwrite
[5] IDC
An interactive decompiler; the topic of a PhD thesis.
http://idc.sourceforge.net/
http://idc.sourceforge.net/wiki/
[6] Fracture
Fracture can speed up a variety of applications and also enable generic implementations of a number of static and dynamic analysis tools.
https://github.com/draperlaboratory/fracture
[7] RevGen
Automatically converting existing binary programs to the standard LLVM IR, making an increasingly large number of static and dynamic analysis frameworks, as well as run-time instrumentation tools, applicable to legacy software.
http://dslab.epfl.ch/
[8] Emscripten
A compiler from LLVM assembly to JavaScript. However, there is also a lot of room for additional optimizations in Emscripten itself, in particular in how it nativizes variables and structures, which can potentially lead to very significant speedups.
http://www.emscripten.org
[9] Retargetable Decompiler
Create a retargetable decompiler that can be utilized for source code recovery, static malware analysis, etc.
http://decompiler.fit.vutbr.cz/home/
[10] BAP
Make it easy to develop binary analysis techniques and tools.
http://bap.ece.cmu.edu
[11] Jakstab
Jakstab is an Abstract Interpretation-based, integrated disassembly and static analysis framework for designing analyses on executables and recovering reliable control flow graphs. It is designed to be adaptable to multiple hardware platforms using customized instruction decoding and processor specifications similar to the Boomerang decompiler.
http://www.jakstab.org/home
[12] Boomerang
Develop a real decompiler for machine code programs through the open source community
http://boomerang.sourceforge.net/
[13] Hex-Rays
A decompiler plugin for the IDA disassembler.
https://www.hex-rays.com/products/decompiler/
[14] Phoenix
See "Native x86 Decompilation using Semantics-Preserving Structural Analysis and Iterative Control-Flow Structuring".
[15] C-Decompiler
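Chen Gengbiao (陈耿标), Shanghai Jiao Tong University: 《反编译器C-Decompiler关键技术的研究与实现》 ("Research and Implementation of Key Techniques in the Decompiler C-Decompiler").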
[16] Capstone
A disassembler that exposes an API and supports multiple architectures.
http://www.capstone-engine.org/index.html
[17] SmartDec
A decompiler based on mathematical reasoning; in my own hands-on testing, the tool was not very robust.
http://decompilation.info/
[18] Obfuscator-LLVM
Used for security analysis.
https://github.com/obfuscator-llvm/obfuscator/wiki
[19] mcsema
It is a library to translate the semantics of native code to LLVM IR.
https://github.com/trailofbits/mcsema
[20] PIN
Pin is a dynamic binary instrumentation framework for the IA-32 and x86-64 instruction-set architectures that enables the creation of dynamic program analysis tools.
https://software.intel.com/en-us/articles/pin-a-dynamic-binary-instrumentation-tool
[21] valgrind
Valgrind is an instrumentation framework for building dynamic analysis tools. There are Valgrind tools that can automatically detect many memory management and threading bugs, and profile your programs in detail. You can also use Valgrind to build new tools.
http://valgrind.org/
[22] BitBlaze
The BitBlaze project aims to design and develop a powerful binary analysis platform and employ the platform in order to (1) analyze and develop novel COTS protection and diagnostic mechanisms and (2) analyze, understand, and develop defenses against malicious code. The BitBlaze project also strives to open new application areas of binary analysis, which provides sound and effective solutions to applications beyond software security and malicious code defense, such as protocol reverse engineering and fingerprint generation.
http://bitblaze.cs.berkeley.edu/
[23] CodeSurfer
CodeSurfer is a code-understanding tool for C and C++ source code and for Intel x86 machine code. CodeSurfer performs a deep semantic analysis of a program and provides sophisticated queries for understanding your code. It enables you to effortlessly identify and navigate the deep structure of your program: the semantic threads that reveal exactly how your program works. CodeSurfer can be used either interactively or programmatically.
http://www.grammatech.com/research/technologies/codesurfer
[24] Decompilation as search
Treats decompilation as a search problem.
http://www.rendezvousalpha.com
[25] snowman
With a new decompiler for C/C++, developers can gain insight into the workings of a program without looking at source code. That's the plan for Snowman, which the project's lead developer hopes to make akin to an LLVM for decompilation.
http://derevenets.com/index.html
[26] libcpu
"libcpu" is an open source library that emulates several CPU architectures, allowing itself to be used as the CPU core for different kinds of emulator projects. It uses its own frontends for the different CPU types, and uses LLVM for the backend.
https://github.com/libcpu/libcpu
[27] BARF Project
BARF is an open source binary analysis framework that aims to support a wide range of binary code analysis tasks that are common in the information security discipline. It is a scriptable platform that supports instruction lifting from multiple architectures, binary translation to an intermediate representation, an extensible framework for code analysis plugins and interoperation with external tools such as debuggers, SMT solvers and instrumentation tools. The framework is designed primarily for human-assisted analysis but it can be fully automated.
https://github.com/programa-stic/barf-project
[28] miasm
Miasm is a free and open source (GPLv2) reverse engineering framework. Miasm aims to analyze / modify / generate binary programs. Here is a non exhaustive list of features:
Opening / modifying / generating PE / ELF 32 / 64 LE / BE using Elfesteem
Assembling / Disassembling X86 / ARM / MIPS / SH4 / MSP430
Representing assembly semantic using intermediate language
Emulating using JIT (dynamic code analysis, unpacking, ...)
Expression simplification for automatic de-obfuscation
https://github.com/cea-sec/miasm
[29] obfuscator-llvm
The aim of this project is to provide an open-source fork of the LLVM compilation suite able to provide increased software security through code obfuscation and tamper-proofing. As we currently mostly work at the Intermediate Representation (IR) level, our tool is compatible with all programming languages (C, C++, Objective-C, Ada and Fortran) and target platforms (x86, x86-64, PowerPC, PowerPC-64, ARM, Thumb, SPARC, Alpha, CellSPU, MIPS, MSP430, SystemZ, and XCore) currently supported by LLVM.
https://github.com/obfuscator-llvm/obfuscator/wiki
[30] DAVA
Dava is a decompiler for arbitrary Java bytecode. It can be used to decompile bytecode produced by Java compilers, compilers for other languages (AspectJ, SML, C) that generate Java bytecode and tools like Java bytecode obfuscators, instrumentors and optimizers.
http://www.sable.mcgill.ca/dava/
[31] ded
ded is a project which aims at decompiling Android applications. The ded tool retargets Android applications in .dex format to traditional .class files. These .class files can then be processed by existing Java tools, including decompilers. Thus, Android applications can be analyzed using a vast range of techniques developed for traditional Java applications.
http://siis.cse.psu.edu/ded/
[32] Dare
http://siis.cse.psu.edu/dare/index.html
[33] Procyon
Procyon is a suite of Java metaprogramming tools focused on code generation and analysis
https://bitbucket.org/mstrobel/procyon
Main intermediate representations used in decompilation
[1] BIL http://bap.ece.cmu.edu/
[2] REIL http://www.zynamics.com/binnavi/manual/html/reil_language.htm
[3] LLVM IR
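As a rough illustration of what such an intermediate representation buys an analysis, the sketch below lifts a few toy two-operand instructions into explicit assignment statements, so that side effects such as flag updates become visible. The `lift` function, the temporary `t0`, and the statement syntax are all invented for this example; they are not the syntax of BIL, REIL, or LLVM IR, and only the zero flag is modeled.

```python
def lift(instr):
    """Lift one toy instruction such as 'add eax, ebx' into IR-like statements."""
    op, rest = instr.split(None, 1)
    dst, src = [operand.strip() for operand in rest.split(",")]
    if op == "mov":
        # A plain register move has no flag side effects in this toy model.
        return [f"{dst} = {src}"]
    if op in ("add", "sub"):
        sym = "+" if op == "add" else "-"
        return [
            f"t0 = {dst} {sym} {src}",  # compute the result into a temporary
            "ZF = (t0 == 0)",           # make the zero-flag update explicit
            f"{dst} = t0",              # write the result back
        ]
    raise ValueError(f"unsupported instruction: {instr}")


ir = [stmt for ins in ["mov eax, 1", "add eax, ebx"] for stmt in lift(ins)]
for stmt in ir:
    print(stmt)
```

Real lifters also model operand widths, wraparound, memory accesses, and the remaining flags, which is the work the frameworks listed above take on.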
Main research institutions
[1] Carnegie Mellon University (CMU)
http://bap.ece.cmu.edu
http://security.ece.cmu.edu/
[2] UC Berkeley
http://bitblaze.cs.berkeley.edu/
[3] University of Maryland
https://www.isr.umd.edu/research/posters/secondwrite
[4] Saarland University Compiler Group
http://compilers.cs.uni-saarland.de/
[5] Hex-Rays (IDA)
https://www.hex-rays.com/index.shtml