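Introduction to angr
- angr grew out of the CGC (Cyber Grand Challenge) project, where it was first used for automated attack and defense.
- A platform-agnostic binary analysis framework
- Developed by the Computer Security Lab at UCSB and the Shellphish team
What can angr do?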
- Disassembly and intermediate-representation lifting
- Program instrumentation
- Symbolic execution
- Control-flow analysis
- Data-dependency analysis
- Value-set analysis (VSA)
Installing angr
# dependency
sudo apt-get install python-dev libffi-dev build-essential virtualenvwrapper
# install
# we'd better use it in virtual environment
mkvirtualenv angr && pip install angr
# more see https://docs.angr.io/INSTALL.html
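Installing on Ubuntu 16.04
virtualenvwrapper is a tool for managing Python virtual environments. The main reason to use a virtual environment is that angr depends on modified versions of libz3 and libVEX, which could otherwise interfere with other programs that use those libraries.
Create a new Python virtual environment: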
$ export WORKON_HOME=~/Envs
$ mkdir -p $WORKON_HOME
$ source /usr/share/virtualenvwrapper/virtualenvwrapper.sh
$ mkvirtualenv angr
Basic usage
Project
Loads a binary; this is the starting point for everything you do with angr.
>>> import angr
>>> proj = angr.Project('/bin/true')
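Basic properties
Inspect the basic properties of the loaded binary:
- arch: the architecture the binary was compiled for
- entry: the address of the entry point
- filename: the absolute filename of the binary
- loader: the loader (CLE) object, which exposes the attributes below
- min_addr: the lowest address mapped by the loader
- max_addr: the highest address mapped by the loader
- main_object: the binary the Project was asked to load, i.e. the main binary
- pic: whether the main object is position-independent
- execstack: whether the main object has an executable stack
- shared_objects: the loaded shared objects, as a mapping from name to object (each object reports where it is mapped)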
>>> proj.arch
>>> proj.entry
0x401670
>>> proj.filename
'/bin/true'
>>> proj.loader
>>> proj.loader.shared_objects # may look a little different for you!
{'ld-linux-x86-64.so.2': <ELF Object ...>,
 'libc.so.6': <ELF Object ...>}
>>> proj.loader.min_addr
0x400000
>>> proj.loader.max_addr
0x5004000
>>> proj.loader.main_object # we've loaded several binaries into this project. Here's the main one!
>>> proj.loader.main_object.execstack # sample query: does this binary have an executable stack?
False
>>> proj.loader.main_object.pic # sample query: is this binary position-independent?
True
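Loading options
Basic options
- auto_load_libs: whether to automatically load the program's shared-library dependencies.
- except_missing_libs: raise an exception when a dependency cannot be loaded.
- force_load_libs: libraries that are always loaded.
- skip_libs: libraries that must not be loaded.
- custom_ld_path: directories to search first when looking for shared libraries.
Advanced options
- main_opts: a mapping from option names to option values, applied to the main binary.
- lib_opts: a mapping from library names to dictionaries, each of which maps option names to their values.
angr.Project('/bin/true', main_opts={'backend': 'ida', 'custom_arch': 'i386'}, lib_opts={'libc.so.6': {'backend': 'elf'}})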
Loader
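The loader, CLE ("CLE Loads Everything"), maps a binary into a virtual address space. Each binary format has a corresponding loader backend (cle.Backend); for example, cle.ELF loads ELF files. Every binary that angr loads occupies its own region of the address space, but not every object in that address space corresponds to a binary file.
The main object
We can query basic information about the main object loaded by the loader.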
# This is the "main" object, the one that you directly specified when loading the project
>>> proj.loader.main_object
>>> obj = proj.loader.main_object
# The entry point of the object
>>> obj.entry
0x400580
>>> obj.min_addr, obj.max_addr
(0x400000, 0x60105f)
# Retrieve this ELF's segments and sections
>>> obj.segments
<Regions: [<ELFSegment ...>,
<ELFSegment ...>]>
>>> obj.sections
<Regions: [<Unnamed | ...>,
<.interp | offset 0x238, vaddr 0x400238, size 0x1c>,
<.note.ABI-tag | offset 0x254, vaddr 0x400254, size 0x20>,
...etc
# You can get an individual segment or section by an address it contains:
>>> obj.find_segment_containing(obj.entry)
>>> obj.find_section_containing(obj.entry)
<.text | offset 0x580, vaddr 0x400580, size 0x338>
# Get the address of the PLT stub for a symbol
>>> addr = obj.plt['__libc_start_main']
>>> addr
0x400540
>>> obj.reverse_plt[addr]
'__libc_start_main'
# Show the prelinked base of the object and the location it was actually mapped into memory by CLE
>>> obj.linked_base
0x400000
>>> obj.mapped_base
0x400000
Other loaded objects
# All loaded objects
>>> proj.loader.all_objects
[<ELF Object ...>,
 <ELF Object ...>,
 <ELF Object ...>,
 ...]
# This is a dictionary mapping from shared object name to object
>>> proj.loader.shared_objects
{'libc.so.6': <ELF Object ...>,
 'ld-linux-x86-64.so.2': <ELF Object ...>}
# Here's all the objects that were loaded from ELF files
# If this were a windows program we'd use all_pe_objects!
>>> proj.loader.all_elf_objects
[<ELF Object ...>,
 <ELF Object ...>,
 <ELF Object ...>]
# Here's the "externs object", which we use to provide addresses for unresolved imports and angr internals
>>> proj.loader.extern_object
# This object is used to provide addresses for emulated syscalls
>>> proj.loader.kernel_object
# Finally, you can get a reference to an object given an address in it
>>> proj.loader.find_object_containing(0x400000)
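Symbols and relocations
We can also use CLE to work with the symbols in a binary.
- Look up a symbol by passing its name or its address.
>>> malloc = proj.loader.find_symbol('malloc')
>>> malloc
- Basic symbol information: its name, its owner object, and its address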
>>> malloc.name
'malloc'
>>> malloc.owner_obj
# .rebased_addr is its address in the global address space. This is what is shown in the print output.
>>> malloc.rebased_addr
0x1054400
# .linked_addr is its address relative to the prelinked base of the binary. This is the address reported in, for example, readelf(1)
>>> malloc.linked_addr
0x54400
# .relative_addr is its address relative to the object base. This is known in the literature (particularly the Windows literature) as an RVA (relative virtual address).
>>> malloc.relative_addr
0x54400
- Import and export information for a symbol
>>> malloc.is_export
True
>>> malloc.is_import
False
# On Loader, the method is find_symbol because it performs a search operation to find the symbol.
# On an individual object, the method is get_symbol because there can only be one symbol with a given name.
>>> main_malloc = proj.loader.main_object.get_symbol("malloc")
>>> main_malloc
>>> main_malloc.is_export
False
>>> main_malloc.is_import
True
>>> main_malloc.resolvedby
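The three address fields shown for malloc above are related by simple arithmetic. The following is a minimal pure-Python sketch of that relationship, not CLE code: the helper name symbol_addresses is hypothetical, and the libc base values (linked at 0x0, mapped at 0x1000000) are assumptions chosen to be consistent with the output above.

```python
def symbol_addresses(relative_addr, linked_base, mapped_base):
    """Derive a symbol's linked and rebased addresses from its RVA."""
    return {
        "relative_addr": relative_addr,               # offset from the object base
        "linked_addr": linked_base + relative_addr,   # address at the prelinked base
        "rebased_addr": mapped_base + relative_addr,  # address in the global address space
    }

# Assumed bases for libc: linked at 0x0, mapped by the loader at 0x1000000.
malloc = symbol_addresses(0x54400, linked_base=0x0, mapped_base=0x1000000)
for name, value in malloc.items():
    print(name, hex(value))
```

With a linked base of 0 the linked address and the relative address coincide, which matches the two identical 0x54400 values above.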
Backends
backend name | description | requires custom_arch ? |
---|---|---|
elf | Static loader for ELF files based on PyELFTools | no |
pe | Static loader for PE files based on PEFile | no |
mach-o | Static loader for Mach-O files. Does not support dynamic linking or rebasing. | no |
cgc | Static loader for Cyber Grand Challenge binaries | no |
backedcgc | Static loader for CGC binaries that allows specifying memory and register backers | no |
elfcore | Static loader for ELF core dumps | no |
ida | Launches an instance of IDA to parse the file | yes |
blob | Loads the file into memory as a flat image | yes |
Symbolic Function
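By default, angr tries to replace the library functions a program calls with its own emulated versions, which are called SimProcedures. All of them can be found in angr.SIM_PROCEDURES, which is keyed first by package name (libc, posix, win32, etc.) and then by function name.
Note that:
- When auto_load_libs is True and angr has no SimProcedure for a function, the real library function is executed.
- ...
hook
Hooking an address makes angr run the function you supply instead of the code at that address.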
>>> stub_func = angr.SIM_PROCEDURES['stubs']['ReturnUnconstrained'] # this is a CLASS
>>> proj.hook(0x10000, stub_func()) # hook with an instance of the class
>>> proj.is_hooked(0x10000) # these functions should be pretty self-explanatory
True
>>> proj.hooked_by(0x10000)
>>> proj.unhook(0x10000)
# Use the length keyword argument to make execution jump some number of bytes forward after your hook finishes.
>>> @proj.hook(0x20000, length=5)
... def my_hook(state):
... state.regs.rax = 1
>>> proj.is_hooked(0x20000)
True
factory
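Why it exists
- Many classes in angr need a project in order to be instantiated; the factory saves you from passing the project object around.
- The factory also provides some convenient constructors.
Methods
- block(addr)
- Extracts the basic block at the given address and returns a block object.
- Note that the basic block is the unit angr works with when analyzing a program.
- bitvector
- Registers are represented as bitvectors.
Basic blocks
Attributes
- instructions
- The number of instructions in the basic block.
- instruction_addrs
- The address of each instruction in the basic block.
- capstone
- The capstone block object.
- vex
- The VEX IR representation of the block.
Methods
- pp()
- Pretty-prints the disassembly of the basic block.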
state
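A project only describes the program's initial memory image, whereas a state describes the simulated process at the moment execution reaches a particular instruction. In angr this is represented by a SimState.
- All data in a state is represented as bitvectors.
- You can store Python integers directly into registers and memory; angr converts them to bitvectors.
State presets
The factory provides several constructors that create a state ready to execute from a given point.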
- .blank_state() constructs a "blank slate" state, with most of its data left uninitialized. When accessing uninitialized data, an unconstrained symbolic value will be returned.
- .entry_state() constructs a state ready to execute at the main binary's entry point.
- .full_init_state() constructs a state that is ready to execute through any initializers that need to be run before the main binary's entry point, for example, shared library constructors or preinitializers. When it is finished with these it will jump to the entry point.
- .call_state() constructs a state ready to execute a given function.
- You should call it with .call_state(addr, arg1, arg2, ...), where addr is the address of the function you want to call and argN is the Nth argument to that function, either as a Python integer, string, or array, or a bitvector.
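Basic state information
Registers
- state.regs.rip
Memory
Access pattern: state.mem[addr].type.xxx
- addr is the memory address to access.
- type specifies the type the data at that address should be interpreted as.
- xxx
- Omitted: assign to the expression to store data directly.
- Use .resolved to get the data as a bitvector.
- Use .concrete to get the data as a Python int.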
>>> import angr
>>> proj = angr.Project('/bin/true')
>>> state = proj.factory.entry_state()
# copy rsp to rbp
>>> state.regs.rbp = state.regs.rsp
# store rdx to memory at 0x1000
>>> state.mem[0x1000].uint64_t = state.regs.rdx
# dereference rbp
>>> state.regs.rbp = state.mem[state.regs.rbp].uint64_t.resolved
# add rax, qword ptr [rsp + 8]
>>> state.regs.rax += state.mem[state.regs.rsp + 8].uint64_t.resolved
File system
Execution
Basic execution
>>> proj = angr.Project('examples/fauxware/fauxware')
>>> state = proj.factory.entry_state()
>>> while True:
... succ = state.step()
... if len(succ.successors) == 2:
... break
... state = succ.successors[0]
>>> state1, state2 = succ.successors
>>> state1
>>> state2
Low-level memory access
- Data is stored big-endian by default.
>>> s = proj.factory.blank_state()
>>> s.memory.store(0x4000, s.solver.BVV(0x0123456789abcdef0123456789abcdef, 128))
>>> s.memory.load(0x4004, 6) # load-size is in bytes
>>> import archinfo
>>> s.memory.load(0x4000, 4, endness=archinfo.Endness.LE)
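To see what the two loads above should return, the byte layout can be worked out by hand. The following is a toy model in plain Python, not angr's memory implementation: the memory dict and the store/load helpers are hypothetical stand-ins for s.memory, and the byte order is passed as a string rather than an archinfo.Endness value.

```python
# Toy flat memory: a dict mapping an address to a single byte value.
memory = {}

def store(addr, value, size):
    """Store an integer big-endian, the default described above."""
    for i, byte in enumerate(value.to_bytes(size, "big")):
        memory[addr + i] = byte

def load(addr, size, endness="big"):
    """Read size bytes and interpret them with the given byte order."""
    raw = bytes(memory[addr + i] for i in range(size))
    return int.from_bytes(raw, endness)

store(0x4000, 0x0123456789abcdef0123456789abcdef, 16)
print(hex(load(0x4004, 6)))            # 0x89abcdef0123
print(hex(load(0x4000, 4, "little")))  # 0x67452301
```

The little-endian load reads the same four bytes 01 23 45 67 but treats the last one as most significant.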
State Option
# Example: enable lazy solves, an option that causes state satisfiability to be checked as infrequently as possible.
# This change to the settings will be propagated to all successor states created from this state after this line.
>>> s.options.add(angr.options.LAZY_SOLVES)
# Create a new state with lazy solves enabled
>>> s = proj.factory.entry_state(add_options={angr.options.LAZY_SOLVES})
# Create a new state without simplification options enabled
>>> s = proj.factory.entry_state(remove_options=angr.options.simplification)
solver
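The solver is essentially a constraint-solving engine.
Working with bitvectors
Converting between bitvectors and Python integers.
- Create a bitvector of a given width from a concrete value.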
# 64-bit bitvectors with concrete values 1 and 100
>>> one = state.solver.BVV(1, 64)
>>> one
>>> one_hundred = state.solver.BVV(100, 64)
>>> one_hundred
# create a 27-bit bitvector with concrete value 9
>>> weird_nine = state.solver.BVV(9, 27)
>>> weird_nine
- Bitvector arithmetic: both operands must have the same width.
>>> one + one_hundred
# You can provide normal python integers and they will be coerced to the appropriate type:
>>> one_hundred + 0x100
# The semantics of normal wrapping arithmetic apply
>>> one_hundred - one*200
# Use zero_extend to widen a bitvector by the given number of zero bits
# sign_extend is also available
>>> weird_nine.zero_extend(64 - 27)
>>> one + weird_nine.zero_extend(64 - 27)
- Symbolic bitvectors
# Create a bitvector symbol named "x" of length 64 bits
>>> x = state.solver.BVS("x", 64)
>>> x
>>> y = state.solver.BVS("y", 64)
>>> y
- Arithmetic that mixes symbolic and concrete bitvectors
>>> x + one
>>> (x + one) / 2
>>> x - y
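The concrete results of the arithmetic above can be checked with a small model of fixed-width values. This is a sketch under stated assumptions, not claripy: the BV class below is hypothetical and only models concrete values, integer coercion, wrapping, and zero extension.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BV:
    """A concrete fixed-width unsigned value (toy stand-in for a bitvector)."""
    value: int
    bits: int

    def __post_init__(self):
        # Wrap the value into the range [0, 2**bits).
        object.__setattr__(self, "value", self.value % (1 << self.bits))

    def _coerce(self, other):
        # Plain integers take the width of the other operand.
        if isinstance(other, int):
            return BV(other, self.bits)
        if other.bits != self.bits:
            raise ValueError("bitvector widths differ")
        return other

    def __add__(self, other):
        return BV(self.value + self._coerce(other).value, self.bits)

    def __sub__(self, other):
        return BV(self.value - self._coerce(other).value, self.bits)

    def __mul__(self, other):
        return BV(self.value * self._coerce(other).value, self.bits)

    def zero_extend(self, extra_bits):
        return BV(self.value, self.bits + extra_bits)

one = BV(1, 64)
one_hundred = BV(100, 64)
weird_nine = BV(9, 27)

print(hex((one_hundred + 0x100).value))        # 0x164
print(hex((one_hundred - one * 200).value))    # 0xffffffffffffff9c
print(one + weird_nine.zero_extend(64 - 27))   # BV(value=10, bits=64)
```

Adding one and weird_nine without the extension raises ValueError in this sketch, mirroring the same-width rule stated above.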
Inspecting the AST
>>> tree = (x + 1) / (y + 2)
>>> tree
>>> tree.op
'__div__'
>>> tree.args
(<BV64 ...>, <BV64 ...>)
>>> tree.args[0].op
'__add__'
>>> tree.args[0].args
(<BV64 ...>, <BV64 0x1>)
>>> tree.args[0].args[1].op
'BVV'
>>> tree.args[0].args[1].args
(1, 64)
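The op/args structure shown above is an ordinary tree, so it can be walked recursively. The following is a minimal sketch with a hypothetical Node class rather than claripy's AST type; in particular, the leaf argument tuples are simplified to (name, bits) and (value, bits), and only the two operators used above are handled.

```python
class Node:
    """Toy AST node with the same two fields inspected above: op and args."""
    def __init__(self, op, args):
        self.op = op
        self.args = tuple(args)

def bvs(name, bits):
    return Node("BVS", (name, bits))

def bvv(value, bits):
    return Node("BVV", (value, bits))

def render(node):
    """Recursively turn a tree back into an infix expression string."""
    if node.op == "BVV":
        return hex(node.args[0])
    if node.op == "BVS":
        return node.args[0]
    symbol = {"__add__": "+", "__div__": "/"}[node.op]
    left, right = (render(arg) for arg in node.args)
    return f"({left} {symbol} {right})"

# Same shape as tree = (x + 1) / (y + 2)
tree = Node("__div__", (
    Node("__add__", (bvs("x", 64), bvv(1, 64))),
    Node("__add__", (bvs("y", 64), bvv(2, 64))),
))
print(render(tree))  # ((x + 0x1) / (y + 0x2))
```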
Symbolic constraints
- Comparisons are unsigned by default.
>>> x == 1
>>> x == one
>>> x > 2
<Bool ... > 0x2>
>>> x + y == one_hundred + 5
>>> one_hundred > 5
>>> one_hundred > -5
- Checking whether a constraint is definitely true or definitely false
>>> yes = one == 1
>>> no = one == 2
>>> maybe = x == y
>>> state.solver.is_true(yes)
True
>>> state.solver.is_false(yes)
False
>>> state.solver.is_true(no)
False
>>> state.solver.is_false(no)
True
>>> state.solver.is_true(maybe)
False
>>> state.solver.is_false(maybe)
False
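Because comparisons are unsigned by default, one_hundred > -5 above does not behave like the Python integer comparison: -5 is first reduced to a very large 64-bit value. The sketch below shows the difference in plain Python. The helper names are hypothetical; in angr itself a signed comparison is requested explicitly (for example with SGT), which this sketch only imitates.

```python
BITS = 64

def to_unsigned(value, bits=BITS):
    """Interpret a Python int as an unsigned value of the given width."""
    return value % (1 << bits)

def to_signed(value, bits=BITS):
    """Interpret the same bit pattern as a two's-complement signed value."""
    value = to_unsigned(value, bits)
    return value - (1 << bits) if value >= 1 << (bits - 1) else value

def unsigned_gt(a, b):
    return to_unsigned(a) > to_unsigned(b)

def signed_gt(a, b):
    return to_signed(a) > to_signed(b)

print(hex(to_unsigned(-5)))   # 0xfffffffffffffffb
print(unsigned_gt(100, 5))    # True
print(unsigned_gt(100, -5))   # False
print(signed_gt(100, -5))     # True
```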
Constraint solving
Basic steps
- Add constraints
- Solve
>>> state.solver.add(x > y)
>>> state.solver.add(y > 2)
>>> state.solver.add(10 > x)
>>> state.solver.eval(x)
4
# get a fresh state without constraints
>>> state = proj.factory.entry_state()
>>> input = state.solver.BVS('input', 64)
>>> operation = (((input + 4) * 3) >> 1) + input
>>> output = 200
>>> state.solver.add(operation == output)
>>> state.solver.eval(input)
0x3333333333333381
# If we add conflicting or contradictory constraints
>>> state.solver.add(input < 2**32)
>>> state.satisfiable()
False
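The solution reported above can be checked by hand with 64-bit wrapping arithmetic. This is a plain-Python sketch, assuming that >> on a claripy bitvector is an arithmetic (sign-preserving) shift; the helper names are hypothetical.

```python
BITS = 64
MASK = (1 << BITS) - 1

def to_signed(value):
    """Interpret a 64-bit pattern as a two's-complement signed integer."""
    value &= MASK
    return value - (1 << BITS) if value >> (BITS - 1) else value

def ashr(value, amount):
    """Arithmetic right shift on a 64-bit pattern."""
    return (to_signed(value) >> amount) & MASK

def operation(value):
    """(((input + 4) * 3) >> 1) + input, with every step wrapped to 64 bits."""
    product = ((value + 4) * 3) & MASK
    return (ashr(product, 1) + value) & MASK

print(operation(0x3333333333333381))  # 200
print(operation(77), operation(78))   # 198 201
```

For inputs below 2**32 nothing wraps, so the expression grows monotonically and jumps from 198 (input 77) to 201 (input 78) without ever hitting 200. That is why adding input < 2**32 makes the state unsatisfiable.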
Simulation Managers
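We use a state to describe the program at the moment execution reaches a particular address, and a simulation manager to control how execution moves from one state to the next. It is the main interface in angr for driving simulated execution.
Creating a simulation manager
>>> simgr = proj.factory.simgr(state)
Inspecting states
A manager can hold many states, and you can inspect each of them. The active stash is initialized with the state that was passed in.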
>>> simgr.active
[<SimState @ ...>]
>>> simgr.active[0].regs.rip # new and exciting!
>>> state.regs.rip # still the same!
Stepping
Step forward by one basic block. This does not modify the state that was originally passed in.
>>> simgr.step()
# Step until the first symbolic branch
>>> while len(simgr.active) == 1:
... simgr.step()
>>> simgr
>>> simgr.active
[<SimState @ ...>, <SimState @ ...>]
# Step until everything terminates
>>> simgr.run()
>>> simgr
Stash Management
- Moving states between stashes
>>> simgr.move(from_stash='deadended', to_stash='authenticated', filter_func=lambda s: b'Welcome' in s.posix.dumps(1))
>>> simgr
- Iterating over stashes
>>> for s in simgr.deadended + simgr.authenticated:
... print(hex(s.addr))
0x1000030
0x1000078
0x1000078
# If you prepend the name of a stash with one_, you will be given the first state in the stash.
>>> simgr.one_deadended
# If you prepend the name of a stash with mp_, you will be given a mulpyplexed version of the stash.
>>> simgr.mp_authenticated
MP([<SimState @ ...>, <SimState @ ...>])
>>> simgr.mp_authenticated.posix.dumps(0)
MP(['\x00\x00\x00\x00\x00\x00\x00\x00\x00SOSNEAKY\x00',
'\x00\x00\x00\x00\x00\x00\x00\x00\x00S\x80\x80\x80\x80@\x80@\x00'])
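The move operation above amounts to partitioning one list of states by a predicate. The following toy model shows the idea with plain dictionaries; it is not the SimulationManager implementation, and the state dictionaries and their stdout strings are invented for illustration.

```python
def move(stashes, from_stash, to_stash, filter_func=lambda state: True):
    """Move every state matching filter_func from one stash to another."""
    keep, moved = [], []
    for state in stashes.get(from_stash, []):
        (moved if filter_func(state) else keep).append(state)
    stashes[from_stash] = keep
    stashes.setdefault(to_stash, []).extend(moved)
    return stashes

# Hypothetical states: an address plus whatever was written to stdout.
stashes = {
    "deadended": [
        {"addr": 0x1000030, "stdout": b"Go away!"},
        {"addr": 0x1000078, "stdout": b"Welcome to the admin console"},
        {"addr": 0x1000078, "stdout": b"Welcome to the admin console"},
    ]
}

move(stashes, "deadended", "authenticated",
     filter_func=lambda state: b"Welcome" in state["stdout"])

for state in stashes["deadended"] + stashes["authenticated"]:
    print(hex(state["addr"]))
```

States that do not match the filter stay where they were, so the source stash shrinks only by the states that were moved.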
explore!!!!
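Searches for a state that reaches a given target. It usually takes a find argument, which can be:
- the address of an instruction to stop at
- a collection of instruction addresses to stop at
- a function that checks whether a state meets some requirement
States that match are placed in the found stash.
You can also pass an avoid condition to explore, so that angr does not explore the corresponding addresses.
>>> proj = angr.Project('examples/CSCI-4968-MBE/challenges/crackme0x00a/crackme0x00a')
>>> simgr = proj.factory.simgr()
>>> simgr.explore(find=lambda s: b"Congrats" in s.posix.dumps(1))
>>> s = simgr.found[0]
>>> print(s.posix.dumps(1))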
Enter password: Congrats!
>>> flag = s.posix.dumps(0)
>>> print(flag)
g00dJ0B!
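The three accepted forms of the find argument can all be reduced to a single predicate on a state. The sketch below illustrates that idea only: it classifies a fixed list of toy states instead of stepping a program, the function names and addresses are hypothetical, and checking find before avoid is a choice made for this sketch rather than a statement about angr's behavior.

```python
def make_predicate(condition):
    """Turn an address, a collection of addresses, or a callable into a callable."""
    if callable(condition):
        return condition
    if isinstance(condition, int):
        return lambda state: state["addr"] == condition
    addresses = set(condition)
    return lambda state: state["addr"] in addresses

def classify(states, find, avoid=None):
    """Sort toy states into found, avoid, and active stashes."""
    is_found = make_predicate(find)
    is_avoided = make_predicate(avoid) if avoid is not None else (lambda state: False)
    stashes = {"found": [], "avoid": [], "active": []}
    for state in states:
        if is_found(state):
            stashes["found"].append(state)
        elif is_avoided(state):
            stashes["avoid"].append(state)
        else:
            stashes["active"].append(state)
    return stashes

# Hypothetical states with made-up addresses and output.
states = [
    {"addr": 0x1000, "stdout": b"Enter password: "},
    {"addr": 0x2000, "stdout": b"Enter password: Congrats!"},
    {"addr": 0x3000, "stdout": b"Enter password: Wrong!"},
]

by_address = classify(states, find=0x2000, avoid=[0x3000])
by_function = classify(states, find=lambda state: b"Congrats" in state["stdout"])
print(len(by_address["found"]), len(by_address["avoid"]), len(by_address["active"]))
```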
extra
- Stash types
Stash | Description |
---|---|
active | This stash contains the states that will be stepped by default, unless an alternate stash is specified. |
deadended | A state goes to the deadended stash when it cannot continue the execution for some reason, including no more valid instructions, unsat state of all of its successors, or an invalid instruction pointer. |
pruned | When using LAZY_SOLVES, states are not checked for satisfiability unless absolutely necessary. When a state is found to be unsat in the presence of LAZY_SOLVES, the state hierarchy is traversed to identify when, in its history, it initially became unsat. All states that are descendants of that point (which will also be unsat, since a state cannot become un-unsat) are pruned and put in this stash. |
unconstrained | If the save_unconstrained option is provided to the SimulationManager constructor, states that are determined to be unconstrained (i.e., with the instruction pointer controlled by user data or some other source of symbolic data) are placed here. |
unsat | If the save_unsat option is provided to the SimulationManager constructor, states that are determined to be unsatisfiable (i.e., they have constraints that are contradictory, like the input having to be both "AAAA" and "BBBB" at the same time) are placed here. |
analysis
Analyses extract various kinds of information from a program,
for example its control-flow graph:
# Originally, when we loaded this binary it also loaded all its dependencies into the same virtual address space
# This is undesirable for most analysis.
>>> proj = angr.Project('/bin/true', auto_load_libs=False)
>>> cfg = proj.analyses.CFGFast()
# cfg.graph is a networkx DiGraph full of CFGNode instances
# You should go look up the networkx APIs to learn how to use this!
>>> cfg.graph
>>> len(cfg.graph.nodes())
951
# To get the CFGNode for a given address, use cfg.get_any_node
>>> entry_node = cfg.get_any_node(proj.entry)
>>> len(list(cfg.graph.successors(entry_node)))
2
class angr.block.CapstoneBlock(addr, insns, thumb, arch)
Deep copy of the capstone blocks, which have serious issues with having extended lifespans outside of capstone itself
///////////////////////////////////////////////////////////////////////////////////////////////////////////////////
angr-0ctf_momo
1. You need to reverse the binary to find the three conditions that the constraint solving depends on:
dx, dword ptr [edx*4 + 0x81fe260]
al, byte ptr [0x81fe6e0]
dl, byte ptr [0x81fe6e4]
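2. You need to be able to reverse programs compiled with movfuscator:
1. analyze manually with qira + IDA,
2. or use a movfuscator deobfuscator,
3. or use a Makefile plus binary instrumentation.
4. How far angr gets you depends on how well you understand the program from reversing it.
3. Parts of angr's constraint-solving process are still not entirely clear to me.
References:
1. Learning angr (part 4):
http://www.cnblogs.com/fancystar/p/7893248.html
2. Makefile plus binary instrumentation:
https://blog.xy14qg.top/2016/0ctf-2016-writeup/#momo-reverse
3. angr example walkthrough, 0ctf_momo_3:
http://blog.csdn.net/doudoudouzoule/article/details/79537019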
///////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
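Related material:
I found an open-source project online, https://github.com/kirschju/demovfuscator, a deobfuscator built specifically to deal with movfuscator, and installed it right away.
Solving momo: using qira to deal with movfuscator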
http://blog.csdn.net/charlie_heng/article/details/79206863