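This post walks through oyente's full_sym_exec and what it calls internally.
Understanding constraint solving with z3
Here I draw on the article "Blockchain Security: Static Analysis of Ethereum Smart Contracts" from the 360 Core Security Technology Blog.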
00000: PUSH1 0x80
00002: PUSH1 0x40
00004: MSTORE
00005: PUSH1 0x04
00007: CALLDATASIZE
00008: LT
00009: PUSH1 0x3e
0000b: JUMPI
0000c: PUSH4 0xffffffff
00011: PUSH29 0x0100000000000000000000000000000000000000000000000000000000
0002f: PUSH1 0x00
00031: CALLDATALOAD
00032: DIV
00033: AND
00034: PUSH4 0x1003e2d2
00039: DUP2
0003a: EQ
0003b: PUSH1 0x43
0003d: JUMPI
0003e: JUMPDEST
0003f: PUSH1 0x00
00041: DUP1
00042: REVERT
00043: JUMPDEST
00044: CALLVALUE
00045: DUP1
00046: ISZERO
00047: PUSH1 0x4e
00049: JUMPI
0004a: PUSH1 0x00
0004c: DUP1
0004d: REVERT
0004e: JUMPDEST
0004f: POP
00050: PUSH1 0x58
00052: PUSH1 0x04
00054: CALLDATALOAD
00055: PUSH1 0x73
00057: JUMP
00058: JUMPDEST
00059: PUSH1 0x40
0005b: DUP1
0005c: MLOAD
0005d: SWAP3
0005e: ISZERO
0005f: ISZERO
00060: DUP4
00061: MSTORE
00062: PUSH1 0x20
00064: DUP4
00065: ADD
00066: SWAP2
00067: SWAP1
00068: SWAP2
00069: MSTORE
0006a: DUP1
0006b: MLOAD
0006c: SWAP2
0006d: DUP3
0006e: SWAP1
0006f: SUB
00070: ADD
00071: SWAP1
00072: RETURN
00073: JUMPDEST
00074: PUSH1 0x00
00076: DUP1
00077: SLOAD
00078: DUP3
00079: ADD
0007a: DUP1
0007b: DUP3
0007c: SSTORE
0007d: DUP2
0007e: SWAP1
0007f: DUP4
00080: GT
00081: ISZERO
00082: PUSH1 0x86
00084: JUMPI
00085: Missing opcode 0xfe
00086: JUMPDEST
00087: SWAP2
00088: POP
00089: SWAP2
0008a: JUMP
0008b: STOP
.........
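The instruction at address 00000 is the first instruction of the program; by the first rule for building basic blocks, it becomes the first instruction of a new basic block. The instruction at address 0000b is a jump; by the second rule, it becomes the last instruction of that basic block. The code from 00000 to 0000b therefore forms one basic block, which we will call basic block 1 to make the later description easier.
The instruction at address 0000c becomes the first instruction of a new basic block. The instruction at address 0003d is a jump, so by the second rule it becomes the last instruction of that block. The code from 0000c to 0003d thus forms a new basic block, which we call basic block 2.
Basic block 1 ends with the conditional jump JUMPI, so it has two edges: basic block 1 -> basic block 2 and basic block 1 -> basic block 3. The last instruction of basic block 6 is a jump that goes directly to basic block 8, so basic block 6 has the single edge basic block 6 -> basic block 8.
All the edges can be recorded in a dictionary that maps each basic block to the list of blocks it can reach:
{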
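'BasicBlock1': ['BasicBlock2','BasicBlock3'],
'BasicBlock2': ['BasicBlock3','BasicBlock4'],
'BasicBlock3': ['BasicBlock11'],
'BasicBlock4': ['BasicBlock5','BasicBlock6'],
'BasicBlock5': ['BasicBlock11'],
'BasicBlock6': ['BasicBlock8'],
'BasicBlock7': ['BasicBlock11'],
'BasicBlock8': ['BasicBlock9','BasicBlock10'],
'BasicBlock9': ['BasicBlock11'],
'BasicBlock10': ['BasicBlock7']
}
Basic block 1 has the two edges basic block 1 -> basic block 2 and basic block 1 -> basic block 3, so the list stored for BasicBlock1 is ['BasicBlock2', 'BasicBlock3'].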
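The block-splitting and edge-building steps can be sketched in a few lines. This is a simplified illustration rather than oyente's implementation: the helper names split_blocks and build_edges are made up, the two rules are assumed to be "a JUMPDEST (or the first instruction) starts a block" and "a jump or halting instruction ends a block", and only jump targets pushed immediately before the jump are resolved. Blocks are keyed by their start address instead of a running number.

```python
HALT = {"STOP", "RETURN", "REVERT", "INVALID"}

def split_blocks(instructions):
    """Split a list of (address, text) pairs into basic blocks."""
    blocks, current = [], []
    for addr, text in instructions:
        op = text.split()[0]
        if op == "JUMPDEST" and current:      # assumed rule 1: a jump target starts a block
            blocks.append(current)
            current = []
        current.append((addr, text))
        if op in HALT or op in ("JUMP", "JUMPI"):  # assumed rule 2: a jump or halt ends it
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks

def build_edges(blocks):
    """Map each block's start address to the start addresses it can reach."""
    starts = [block[0][0] for block in blocks]
    edges = {}
    for i, block in enumerate(blocks):
        last_op = block[-1][1].split()[0]
        successors = []
        if last_op in ("JUMP", "JUMPI") and len(block) >= 2:
            prev = block[-2][1].split()
            if prev[0].startswith("PUSH"):    # only statically known targets
                successors.append(int(prev[1], 16))
        if last_op not in HALT | {"JUMP"} and i + 1 < len(blocks):
            successors.append(starts[i + 1])  # fall-through edge
        edges[block[0][0]] = sorted(successors)
    return edges

# Shortened excerpt of the disassembly above (the middle of block 2 is omitted).
program = [
    (0x00, "PUSH1 0x80"), (0x02, "PUSH1 0x40"), (0x04, "MSTORE"),
    (0x05, "PUSH1 0x04"), (0x07, "CALLDATASIZE"), (0x08, "LT"),
    (0x09, "PUSH1 0x3e"), (0x0b, "JUMPI"),
    (0x0c, "PUSH4 0xffffffff"), (0x3a, "EQ"), (0x3b, "PUSH1 0x43"), (0x3d, "JUMPI"),
    (0x3e, "JUMPDEST"), (0x3f, "PUSH1 0x00"), (0x41, "DUP1"), (0x42, "REVERT"),
]
blocks = split_blocks(program)
block_ranges = [(b[0][0], b[-1][0]) for b in blocks]
edges = build_edges(blocks)
print(block_ranges)
print(edges)
```

The first block reaches the blocks starting at 0x0c and 0x3e, which matches the two edges described for basic block 1.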
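Where is the control flow graph in oyente
The two variables vertices and edges already hold all the information needed.
In the oyente framework, z3 is used to solve constraints over the control flow graph, that is, to find a value for every variable such that all constraints are satisfied.
About z3: z3 is an excellent constraint solver developed by Microsoft. It finds values of variables that satisfy a set of constraints, much like linear programming solvers I have used before such as Gurobi or CPLEX.
By solving the constraints with z3, we can tell from the result which way a basic block will jump, and in this way simulate the execution of the whole program.
Using z3
The examples below use z3's Python bindings to show how z3 is used.
from z3 import *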
x = Int('x')
y = Int('y')
solve(x>2, y< 10, x+2*y == 7)
Int('x') creates a variable named x in z3. solve is then called to find a solution under three constraints: x>2, y<10 and x+2*y==7. Running the code above prints [y = 0, x = 7].
from z3 import *
p = Bool('p')
q = Bool('q')
r = Bool('r')
solve(Implies(p,q), r==Not(q), Or(Not(p),r))
The result is:
[q = False, p = False, r = True]
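z3 reports one satisfying assignment, but it need not be the only one. As a sanity check that needs no solver, the sketch below enumerates all eight assignments in plain Python. The helper implies is defined here for illustration and is not part of z3.

```python
from itertools import product

def implies(a, b):
    """Material implication: a -> b."""
    return (not a) or b

# Keep every (p, q, r) that satisfies the three constraints passed to solve().
solutions = [
    (p, q, r)
    for p, q, r in product([False, True], repeat=3)
    if implies(p, q) and r == (not q) and ((not p) or r)
]
print(solutions)  # [(False, False, True), (False, True, False)]
```

The assignment printed above, p = False, q = False, r = True, is the first of the two.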
In z3 we can create fixed-width bit-vectors. For example, in the code below BitVec('x',16) creates a 16-bit variable named x.
from z3 import *
x = BitVec('x', 16)
y = BitVec('y', 16)
solve(x + y > 5)
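Besides bit-vector variables, z3 can also create bit-vector constants. In the code below, BitVecVal(-1,16) creates a 16-bit bit-vector constant with value -1. It has the same bit pattern as 65535, so the comparison simplifies to True.
from z3 import *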
a = BitVecVal(-1,16)
b = BitVecVal(65535, 16)
print(simplify(a==b))
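Why a and b compare equal can be checked without z3: a 16-bit bit-vector keeps only the low 16 bits of an integer, so -1 and 65535 share one bit pattern. The sketch below mimics that truncation with plain Python integers; to_bitvec is a hypothetical helper, not a z3 function.

```python
def to_bitvec(value, width):
    """Reduce an integer to its unsigned `width`-bit representation."""
    return value % (1 << width)

a = to_bitvec(-1, 16)
b = to_bitvec(65535, 16)
print(hex(a), hex(b), a == b)   # 0xffff 0xffff True

# Addition wraps around in the same way, which is why x + y > 5 on
# 16-bit vectors is not the same question as on unbounded integers.
wrapped = to_bitvec(65535 + 7, 16)
print(wrapped)                  # 6
```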
from z3 import *
x = Int('x')
y = Int('y')
s = Solver()
s.add(x > 10, y == x+2)
print(s)
print(s.check())
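Solver() creates a general-purpose solver. add() then adds constraints, and check() determines whether any solution satisfies them: it returns sat if there is one and unsat if there is not.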
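Constraint solving with z3
When execution reaches instructions such as CALLDATASIZE and CALLDATALOAD, the program is reading external input, so we use z3's BitVec function to create a bit-vector that stands in for the input data. When execution reaches instructions such as LT and EQ, we use z3 to build an expression of the form If(ULE(xx,xx),0,1).
We use the variable stack=[] to represent the Ethereum virtual machine's stack, the variable memory={} to represent its memory, and the variable storage={} to represent storage.
00000: PUSH1 0x80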
00002: PUSH1 0x40
00004: MSTORE
00005: PUSH1 0x04
00007: CALLDATASIZE
00008: LT
00009: PUSH1 0x3e
0000b: JUMPI
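PUSH pushes a value onto the stack; after the two pushes, stack is [0x80,0x40].
After MSTORE executes, stack is empty and memory is {0x40: 0x80}.
CALLDATASIZE reads the length of the input data. We use BitVec("Id_size", 256) from z3 to create a 256-bit variable named Id_size that represents that length.
LT compares the variable Id_size with 0x04: the result is 1 if Id_size is less than 0x04, and 0 otherwise, that is, 0 whenever 0x04 is less than or equal to Id_size. As a z3 expression this is If(ULE(4, Id_size), 0, 1).
JUMPI is a conditional jump. Whether it jumps to address 0x3e depends on the result of the previous LT instruction, i.e. on the value of If(ULE(4, Id_size), 0, 1): it jumps if the value is non-zero and falls through otherwise. As a z3 expression this is If(ULE(4,Id_size),0,1)!=0.
from z3 import *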
Id_size = BitVec("Id_size",256)
exp = If(ULE(4, Id_size), 0, 1) != 0
solver = Solver()
solver.add(exp)
if solver.check() == sat:
    print("jump to BasicBlock3")
else:
    print("error")
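In the code above, the solver's check() method tells us whether the expression has a solution. A return value of sat means it does, that is, the result of LT can be non-zero, so execution can jump to basic block 3.
When check() does not return sat, we do not go on to basic block 2 but print an error directly, because when the condition expression has no solution, continuing down that path is pointless.
To reach basic block 2, the negated condition has to be checked as well, as in the following code:
from z3 import *
Id_size = BitVec("Id_size",256)
exp = If(ULE(4, Id_size), 0, 1) != 0
negated_exp = Not(If(ULE(4, Id_size), 0, 1) != 0)
solver = Solver()
# check the jump branch inside its own solver scope
solver.push()
solver.add(exp)
if solver.check() == sat:
    print("jump to BasicBlock3")
else:
    print("error")
solver.pop()
# check the fall-through branch with the negated condition
solver.push()
solver.add(negated_exp)
if solver.check() == sat:
    print("falls to BasicBlock2")
else:
    print("error")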
full_sym_exec()
def full_sym_exec():
    # executing, starting from beginning
    path_conditions_and_vars = {"path_condition" : []}
    global_state = get_init_global_state(path_conditions_and_vars)
    analysis = init_analysis()
    params = Parameter(path_conditions_and_vars=path_conditions_and_vars, global_state=global_state, analysis=analysis)
    if g_src_map:
        start_block_to_func_sig = get_start_block_to_func_sig()
    return sym_exec_block(params, 0, 0, 0, -1, 'fallback')
def get_init_global_state(path_conditions_and_vars):
    global_state = {"balance" : {}, "pc": 0}
    init_is = init_ia = deposited_value = sender_address = receiver_address = gas_price = origin = currentCoinbase = currentNumber = currentDifficulty = currentGasLimit = callData = None
    # INPUT_STATE says whether an assumed chain state is supplied; it defaults to False here
    if global_params.INPUT_STATE:
        with open('state.json') as f:
            state = json.loads(f.read())
            if state["Is"]["balance"]:
                init_is = int(state["Is"]["balance"], 16)
            if state["Ia"]["balance"]:
                init_ia = int(state["Ia"]["balance"], 16)
            if state["exec"]["value"]:
                deposited_value = 0
            if state["Is"]["address"]:
                sender_address = int(state["Is"]["address"], 16)
            if state["Ia"]["address"]:
                receiver_address = int(state["Ia"]["address"], 16)
            if state["exec"]["gasPrice"]:
                gas_price = int(state["exec"]["gasPrice"], 16)
            if state["exec"]["origin"]:
                origin = int(state["exec"]["origin"], 16)
            if state["env"]["currentCoinbase"]:
                currentCoinbase = int(state["env"]["currentCoinbase"], 16)
            if state["env"]["currentNumber"]:
                currentNumber = int(state["env"]["currentNumber"], 16)
            if state["env"]["currentDifficulty"]:
                currentDifficulty = int(state["env"]["currentDifficulty"], 16)
            if state["env"]["currentGasLimit"]:
                currentGasLimit = int(state["env"]["currentGasLimit"], 16)
    # for some weird reason these 3 vars are stored in path_conditions instead of global_state
    else:
        # define BitVec variables as symbolic stand-ins for values that may be passed in (e.g. via CALLDATASIZE)
        sender_address = BitVec("Is", 256)
        receiver_address = BitVec("Ia", 256)
        deposited_value = BitVec("Iv", 256)
        init_is = BitVec("init_Is", 256)
        init_ia = BitVec("init_Ia", 256)
    # store them in path_conditions_and_vars
    path_conditions_and_vars["Is"] = sender_address
    path_conditions_and_vars["Ia"] = receiver_address
    path_conditions_and_vars["Iv"] = deposited_value
    # first constraint: deposited_value must be greater than or equal to 0
    constraint = (deposited_value >= BitVecVal(0, 256))
    path_conditions_and_vars["path_condition"].append(constraint)
    # the sender's balance must be at least the deposited value, otherwise it cannot be sent (???)
    constraint = (init_is >= deposited_value)
    path_conditions_and_vars["path_condition"].append(constraint)
    # the receiver's initial balance must be greater than or equal to 0
    constraint = (init_ia >= BitVecVal(0, 256))
    path_conditions_and_vars["path_condition"].append(constraint)
    # update the balances of the "caller" and "callee"
    global_state["balance"]["Is"] = (init_is - deposited_value)
    global_state["balance"]["Ia"] = (init_ia + deposited_value)
    # the values below were all None initially; gen_xxx assigns each one a variable name,
    # e.g. gas_price is given the variable name Ip
    if not gas_price:
        new_var_name = gen.gen_gas_price_var()
        gas_price = BitVec(new_var_name, 256)
        path_conditions_and_vars[new_var_name] = gas_price
    if not origin:
        new_var_name = gen.gen_origin_var()
        origin = BitVec(new_var_name, 256)
        path_conditions_and_vars[new_var_name] = origin
    if not currentCoinbase:
        new_var_name = "IH_c"
        currentCoinbase = BitVec(new_var_name, 256)
        path_conditions_and_vars[new_var_name] = currentCoinbase
    if not currentNumber:
        new_var_name = "IH_i"
        currentNumber = BitVec(new_var_name, 256)
        path_conditions_and_vars[new_var_name] = currentNumber
    if not currentDifficulty:
        new_var_name = "IH_d"
        currentDifficulty = BitVec(new_var_name, 256)
        path_conditions_and_vars[new_var_name] = currentDifficulty
    if not currentGasLimit:
        new_var_name = "IH_l"
        currentGasLimit = BitVec(new_var_name, 256)
        path_conditions_and_vars[new_var_name] = currentGasLimit
    new_var_name = "IH_s"
    currentTimestamp = BitVec(new_var_name, 256)
    path_conditions_and_vars[new_var_name] = currentTimestamp
    # the state of the current contract
    if "Ia" not in global_state:
        global_state["Ia"] = {}
    global_state["miu_i"] = 0
    global_state["value"] = deposited_value
    global_state["sender_address"] = sender_address
    global_state["receiver_address"] = receiver_address
    global_state["gas_price"] = gas_price
    global_state["origin"] = origin
    global_state["currentCoinbase"] = currentCoinbase
    global_state["currentTimestamp"] = currentTimestamp
    global_state["currentNumber"] = currentNumber
    global_state["currentDifficulty"] = currentDifficulty
    global_state["currentGasLimit"] = currentGasLimit
    return global_state
path_conditions_and_vars holds the variables that may be used, and the constraints that have to be handled, when jumping between blocks. global_state holds the variables that may be used during the normal execution of a block.
def init_analysis():
    analysis = {
        "gas": 0,
        "gas_mem": 0,
        "money_flow": [("Is", "Ia", "Iv")], # (source, destination, amount)
        "reentrancy_bug":[],
        "money_concurrency_bug": [],
        "time_dependency_bug": {}
    }
    return analysis
Is is the source, Ia is the destination, and Iv is the amount.
class Parameter:
    def __init__(self, **kwargs):
        attr_defaults = {
            "stack": [],
            "calls": [],
            "memory": [],
            "visited": [],
            "overflow_pcs": [],
            "mem": {},
            "analysis": {},
            "sha3_list": {},
            "global_state": {},
            "path_conditions_and_vars": {}
        }
        for (attr, default) in six.iteritems(attr_defaults):
            setattr(self, attr, kwargs.get(attr, default))
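six is a Python 2/3 compatibility library; here you can read six.iteritems(d) as simply iterating over the dictionary's items. For setattr, see the small example below.
# a minimal class so that the example can run
class C(object):
    pass
c = C()
# set the value of the named attribute
setattr(c, 'detail', 'Python API test automation')
setattr(c, 'view_times', 32)
# print the attribute values after setting them
print(c.detail) # Python API test automation
print(c.view_times) # 32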
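The same setattr pattern can be tried on a cut-down class. The sketch below is a hypothetical stand-in for Parameter with only three attributes. The copy method is an assumption about how params.copy() could behave (a deep copy, so that two branches of the search never share a stack); it is not taken from oyente's source.

```python
import copy

class Params:
    """Hypothetical, reduced stand-in for oyente's Parameter class."""

    def __init__(self, **kwargs):
        attr_defaults = {"stack": [], "visited": [], "mem": {}}
        for attr, default in attr_defaults.items():
            # use the caller's value when given, else the default
            setattr(self, attr, kwargs.get(attr, default))

    def copy(self):
        # Deep-copy every attribute so that branches do not share state.
        return Params(**copy.deepcopy(self.__dict__))

params = Params(stack=[1, 2])
branch = params.copy()
branch.stack.append(3)              # only the copy changes
print(params.stack, branch.stack, params.mem)   # [1, 2] [1, 2, 3] {}
```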
Iterating over the defaults with kwargs.get(attr, default) gives the following values:
[]
[]
[]
[]
[]
{}
{'gas': 0, 'gas_mem': 0, 'money_flow': [('Is', 'Ia', 'Iv')], 'reentrancy_bug': [], 'money_concurrency_bug': [], 'time_dependency_bug': {}}
{}
{'balance': {'Is': init_Is - Iv, 'Ia': init_Ia + Iv}, 'pc': 0, 'Ia': {}, 'miu_i': 0, 'value': Iv, 'sender_address': Is, 'receiver_address': Ia, 'gas_price': Ip, 'origin': Io, 'currentCoinbase': IH_c, 'currentTimestamp': IH_s, 'currentNumber': IH_i, 'currentDifficulty': IH_d, 'currentGasLimit': IH_l}
{'path_condition': [0 <= Iv, init_Is >= Iv, 0 <= init_Ia], 'Is': Is, 'Ia': Ia, 'Iv': Iv, 'Ip': Ip, 'Io': Io, 'IH_c': IH_c, 'IH_i': IH_i, 'IH_d': IH_d, 'IH_l': IH_l, 'IH_s': IH_s}
These are the values for stack, calls, memory, and so on…
def get_start_block_to_func_sig():
    state = 0
    func_sig = None
    for pc, instr in six.iteritems(instructions):
        if state == 0 and instr.startswith('PUSH4'):
            state += 1
            func_sig = instr.split(' ')[1][2:]
        elif state == 1 and instr.startswith('EQ'):
            state += 1
        elif state == 2 and instr.startswith('PUSH'):
            state = 0
            pc = instr.split(' ')[1]
            pc = int(pc, 16)
            start_block_to_func_sig[pc] = func_sig
        else:
            state = 0
    return start_block_to_func_sig
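The function looks for a PUSH4 that is followed by an EQ and then by a PUSH. When it finds one, start_block_to_func_sig records func_sig under the address pushed by that PUSH.
start_block_to_func_sig[60]=78(0x4e)
Listing 2 mentions:
PUSH4 0x06fdde03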
EQ
PUSH2 0x00e2
JUMPI
P2 searches the contract bytecode for the instruction sequence: PUSH4 x; EQ; PUSH2 y; JUMPI. If an instruction sequence is found, we obtain an encoded function id, x.
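The scan can be run on the four-instruction pattern quoted above. The sketch below is a self-contained variant of get_start_block_to_func_sig: it takes the instructions as an argument and returns a fresh dictionary instead of using globals. The name find_function_entries and the instruction addresses are made up for the example.

```python
from collections import OrderedDict

def find_function_entries(instructions):
    """Map jump target -> function id for each PUSH4; EQ; PUSHn run."""
    entries = {}
    state, func_sig = 0, None
    for _, instr in instructions.items():
        if state == 0 and instr.startswith("PUSH4"):
            func_sig = instr.split(" ")[1][2:]    # drop the "0x" prefix
            state = 1
        elif state == 1 and instr.startswith("EQ"):
            state = 2
        elif state == 2 and instr.startswith("PUSH"):
            entries[int(instr.split(" ")[1], 16)] = func_sig
            state = 0
        else:
            state = 0                             # pattern broken, start over
    return entries

listing = OrderedDict([
    (0, "PUSH4 0x06fdde03"), (5, "EQ"), (6, "PUSH2 0x00e2"), (9, "JUMPI"),
])
entries = find_function_entries(listing)
print(entries)             # {226: '06fdde03'}

# An instruction between PUSH4 and EQ resets the state machine.
with_dup = OrderedDict([
    (0, "PUSH4 0x1003e2d2"), (5, "DUP2"), (6, "EQ"), (7, "PUSH1 0x43"), (9, "JUMPI"),
])
entries_with_dup = find_function_entries(with_dup)
print(entries_with_dup)    # {}
```

The second call shows a limit of matching strictly consecutive instructions: a dispatcher that places DUP2 between PUSH4 and EQ, as in the disassembly at the top of this post, is not recognized by this scan.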
sym_exec_block(params, 0, 0, 0, -1, 'fallback')
# Symbolically executing a block from the start address
# At this point we already have the blocks and the edges; to follow the logic that runs from block to block we need a depth-first traversal,
# which is why sym_exec_block is a recursive function
def sym_exec_block(params, block, pre_block, depth, func_call, current_func_name):
    global solver
    global visited_edges
    global money_flow_all_paths
    global path_conditions
    global global_problematic_pcs
    global all_gs
    global results
    global g_src_map
    # blocks already visited on this path
    visited = params.visited
    # the simulated stack used for symbolic execution
    stack = params.stack
    # ???
    mem = params.mem
    # the simulated memory used for symbolic execution
    memory = params.memory
    # the chain-level values defined above (mostly z3 variables)
    global_state = params.global_state
    # ???
    sha3_list = params.sha3_list
    # the conditions and variables carried from block to block
    path_conditions_and_vars = params.path_conditions_and_vars
    # the analysis results
    analysis = params.analysis
    # ???
    calls = params.calls
    # ???
    overflow_pcs = params.overflow_pcs
    Edge = namedtuple("Edge", ["v1", "v2"]) # Factory Function for tuples is used as dictionary key
    if block < 0:
        log.debug("UNKNOWN JUMP ADDRESS. TERMINATING THIS PATH")
        return ["ERROR"]
    log.debug("Reach block address %d \n", block)
    # if a source map exists
    if g_src_map:
        # if this block is the start block of a function (it is listed in start_block_to_func_sig)
        if block in start_block_to_func_sig:
            func_sig = start_block_to_func_sig[block]
            current_func_name = g_src_map.sig_to_func[func_sig]
            pattern = r'(\w[\w\d_]*)\((.*)\)$'
            match = re.match(pattern, current_func_name)
            if match:
                current_func_name = list(match.groups())[0]
    current_edge = Edge(pre_block, block)
    if current_edge in visited_edges:
        updated_count_number = visited_edges[current_edge] + 1
        visited_edges.update({current_edge: updated_count_number})
    else:
        # the current edge has not been visited yet, so record it
        visited_edges.update({current_edge: 1})
    # if this edge has been taken more often than the loop limit allows
    if visited_edges[current_edge] > global_params.LOOP_LIMIT:
        log.debug("Overcome a number of loop limit. Terminating this path ...")
        return stack
    # check the gas used so far; if it exceeds the limit, return stack
    current_gas_used = analysis["gas"]
    if current_gas_used > global_params.GAS_LIMIT:
        log.debug("Run out of gas. Terminating this path ... ")
        return stack
    # Execute every instruction, one at a time
    try:
        # get all instructions of the current block
        block_ins = vertices[block].get_instructions()
    except KeyError:
        log.debug("This path results in an exception, possibly an invalid jump address")
        return ["ERROR"]
    # execute the block's instructions in order; all of the symbolic execution happens in sym_exec_ins
    for instr in block_ins:
        sym_exec_ins(params, block, instr, func_call, current_func_name)
    # Mark that this basic block in the visited blocks
    # add this block to visited
    visited.append(block)
    depth += 1
    # collect the bug results recorded so far
    reentrancy_all_paths.append(analysis["reentrancy_bug"])
    if analysis["money_flow"] not in money_flow_all_paths:
        global_problematic_pcs["money_concurrency_bug"].append(analysis["money_concurrency_bug"])
        money_flow_all_paths.append(analysis["money_flow"])
        path_conditions.append(path_conditions_and_vars["path_condition"])
        global_problematic_pcs["time_dependency_bug"].append(analysis["time_dependency_bug"])
        all_gs.append(copy_global_values(global_state))
    # Go to next Basic Block(s)
    # then move on to the next block
    # if this block's type is terminal, or the recursion depth exceeds the depth limit
    if jump_type[block] == "terminal" or depth > global_params.DEPTH_LIMIT:
        global total_no_of_paths
        global no_of_test_cases
        total_no_of_paths += 1
        # if test case generation was requested
        if global_params.GENERATE_TEST_CASES:
            try:
                model = solver.model()
                no_of_test_cases += 1
                filename = "test%s.otest" % no_of_test_cases
                with open(filename, 'w') as f:
                    for variable in model.decls():
                        f.write(str(variable) + " = " + str(model[variable]) + "\n")
                if os.stat(filename).st_size == 0:
                    os.remove(filename)
                    no_of_test_cases -= 1
            except Exception as e:
                pass
        log.debug("TERMINATING A PATH ...")
        # display the results
        display_analysis(analysis)
        if is_testing_evm():
            compare_storage_and_gas_unit_test(global_state, analysis)
    # if this is a jump without a condition
    elif jump_type[block] == "unconditional": # executing "JUMP"
        # successor = jump target of the current block
        successor = vertices[block].get_jump_target()
        # the new parameters
        new_params = params.copy()
        # set the new program counter
        new_params.global_state["pc"] = successor
        if g_src_map:
            # look up the source code through the program counter and the source map
            source_code = g_src_map.get_source_code(global_state['pc'])
            # (I do not fully understand this part)
            if source_code in g_src_map.func_call_names:
                func_call = global_state['pc']
        sym_exec_block(new_params, successor, block, depth, func_call, current_func_name)
    # if the jump type is falls_to, nothing special happens
    elif jump_type[block] == "falls_to": # just follow to the next basic block
        successor = vertices[block].get_falls_to()
        new_params = params.copy()
        new_params.global_state["pc"] = successor
        sym_exec_block(new_params, successor, block, depth, func_call, current_func_name)
    # if the jump type is a conditional jump
    elif jump_type[block] == "conditional": # executing "JUMPI"
        # A choice point, we proceed with depth first search
        # first get the branch expression
        branch_expression = vertices[block].get_branch_expression()
        log.debug("Branch expression: " + str(branch_expression))
        # set a boundary for the solver
        solver.push() # SET A BOUNDARY FOR SOLVER
        # add the branch expression to the solver
        solver.add(branch_expression)
        # the part below checks the case where the JUMPI condition is true
        try:
            # if the solver reports the constraints unsatisfiable
            if solver.check() == unsat:
                # log that this path is infeasible
                log.debug("INFEASIBLE PATH DETECTED")
            else:
                # otherwise go to the jump target
                left_branch = vertices[block].get_jump_target()
                new_params = params.copy()
                new_params.global_state["pc"] = left_branch
                # append this branch's expression to the path conditions
                new_params.path_conditions_and_vars["path_condition"].append(branch_expression)
                last_idx = len(new_params.path_conditions_and_vars["path_condition"]) - 1
                # remember which pc produced the condition at last_idx (kept under time_dependency_bug)
                new_params.analysis["time_dependency_bug"][last_idx] = global_state["pc"]
                # continue into the next block
                sym_exec_block(new_params, left_branch, block, depth, func_call, current_func_name)
        except TimeoutError:
            raise
        except Exception as e:
            if global_params.DEBUG_MODE:
                traceback.print_exc()
        # the part below checks the case where the JUMPI condition is false
        solver.pop() # POP SOLVER CONTEXT
        solver.push() # SET A BOUNDARY FOR SOLVER
        negated_branch_expression = Not(branch_expression)
        solver.add(negated_branch_expression)
        log.debug("Negated branch expression: " + str(negated_branch_expression))
        try:
            if solver.check() == unsat:
                # Note that this check can be optimized. I.e. if the previous check succeeds,
                # no need to check for the negated condition, but we can immediately go into
                # the else branch
                log.debug("INFEASIBLE PATH DETECTED")
            else:
                right_branch = vertices[block].get_falls_to()
                new_params = params.copy()
                new_params.global_state["pc"] = right_branch
                new_params.path_conditions_and_vars["path_condition"].append(negated_branch_expression)
                last_idx = len(new_params.path_conditions_and_vars["path_condition"]) - 1
                new_params.analysis["time_dependency_bug"][last_idx] = global_state["pc"]
                sym_exec_block(new_params, right_branch, block, depth, func_call, current_func_name)
        except TimeoutError:
            raise
        except Exception as e:
            if global_params.DEBUG_MODE:
                traceback.print_exc()
        solver.pop() # POP SOLVER CONTEXT
        updated_count_number = visited_edges[current_edge] - 1
        visited_edges.update({current_edge: updated_count_number})
    else:
        updated_count_number = visited_edges[current_edge] - 1
        visited_edges.update({current_edge: updated_count_number})
        raise Exception('Unknown Jump-Type')
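Stripped of the solver and the bug bookkeeping, the traversal in sym_exec_block is a depth-first search that counts how often each edge is used on the current path. The sketch below shows just that skeleton on a made-up three-block graph with a loop. The function explore, the value LOOP_LIMIT = 2 and the graph itself are assumptions for illustration, and unlike the code above the sketch decrements the edge counter on every return.

```python
from collections import namedtuple

Edge = namedtuple("Edge", ["v1", "v2"])
LOOP_LIMIT = 2   # assumed bound on how often one edge may appear on a path

def explore(graph, block, pre_block, visited_edges, path, paths):
    """Depth-first walk that stops following an edge once it exceeds LOOP_LIMIT."""
    edge = Edge(pre_block, block)
    visited_edges[edge] = visited_edges.get(edge, 0) + 1
    if visited_edges[edge] > LOOP_LIMIT:
        visited_edges[edge] -= 1
        return
    path = path + [block]
    successors = graph.get(block, [])
    if not successors:                 # terminal block: one complete path
        paths.append(path)
    for successor in successors:
        explore(graph, successor, block, visited_edges, path, paths)
    visited_edges[edge] -= 1           # leaving this edge on the way back up

# Block 2 can loop back to block 1 or exit to block 3.
graph = {1: [2], 2: [1, 3], 3: []}
paths = []
explore(graph, 1, 0, {}, [], paths)
print(paths)   # [[1, 2, 1, 2, 3], [1, 2, 3]]
```

Without the limit the search would follow the loop 1 -> 2 -> 1 forever; with it, the loop body is unrolled at most twice before the path is cut off.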