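A set of binary instructions with fixed interpretation rules forms an instruction set, and instructions of this kind are called bytecode.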
Language | Bytecode format |
---|---|
Python | pyc |
Java | class |
Android (APK) | dex (disassembled as smali) |
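Opcodes are what Python source code compiles into. When a Python program runs, CPython first compiles the source, and for imported modules it usually caches the compiled result in a .pyc file that contains the opcode sequence. Opcode names relate to bytecode roughly as assembly language relates to machine code. The difference is that bytecode runs on a virtual machine, while machine code runs on real hardware.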
```python
import opcode

# Print every possible opcode value (0-255) with its name;
# unassigned values show up as "<n>".
for op in range(len(opcode.opname)):
    print('0x%.2X(%.3d): %s' % (op, op, opcode.opname[op]))
```

Output (Python 3.7):

```
0x00(000): <0>
0x01(001): POP_TOP
0x02(002): ROT_TWO
0x03(003): ROT_THREE
0x04(004): DUP_TOP
0x05(005): DUP_TOP_TWO
0x06(006): <6>
0x07(007): <7>
0x08(008): <8>
0x09(009): NOP
0x0A(010): UNARY_POSITIVE
0x0B(011): UNARY_NEGATIVE
0x0C(012): UNARY_NOT
0x0D(013): <13>
0x0E(014): <14>
0x0F(015): UNARY_INVERT
0x10(016): BINARY_MATRIX_MULTIPLY
0x11(017): INPLACE_MATRIX_MULTIPLY
0x12(018): <18>
0x13(019): BINARY_POWER
0x14(020): BINARY_MULTIPLY
0x15(021): <21>
0x16(022): BINARY_MODULO
0x17(023): BINARY_ADD
0x18(024): BINARY_SUBTRACT
0x19(025): BINARY_SUBSCR
0x1A(026): BINARY_FLOOR_DIVIDE
0x1B(027): BINARY_TRUE_DIVIDE
0x1C(028): INPLACE_FLOOR_DIVIDE
0x1D(029): INPLACE_TRUE_DIVIDE
0x1E(030): <30>
0x1F(031): <31>
0x20(032): <32>
0x21(033): <33>
0x22(034): <34>
0x23(035): <35>
0x24(036): <36>
0x25(037): <37>
0x26(038): <38>
0x27(039): <39>
0x28(040): <40>
0x29(041): <41>
0x2A(042): <42>
0x2B(043): <43>
0x2C(044): <44>
0x2D(045): <45>
0x2E(046): <46>
0x2F(047): <47>
0x30(048): <48>
0x31(049): <49>
0x32(050): GET_AITER
0x33(051): GET_ANEXT
0x34(052): BEFORE_ASYNC_WITH
0x35(053): <53>
0x36(054): <54>
0x37(055): INPLACE_ADD
0x38(056): INPLACE_SUBTRACT
0x39(057): INPLACE_MULTIPLY
0x3A(058): <58>
0x3B(059): INPLACE_MODULO
0x3C(060): STORE_SUBSCR
0x3D(061): DELETE_SUBSCR
0x3E(062): BINARY_LSHIFT
0x3F(063): BINARY_RSHIFT
0x40(064): BINARY_AND
0x41(065): BINARY_XOR
0x42(066): BINARY_OR
0x43(067): INPLACE_POWER
0x44(068): GET_ITER
0x45(069): GET_YIELD_FROM_ITER
0x46(070): PRINT_EXPR
0x47(071): LOAD_BUILD_CLASS
0x48(072): YIELD_FROM
0x49(073): GET_AWAITABLE
0x4A(074): <74>
0x4B(075): INPLACE_LSHIFT
0x4C(076): INPLACE_RSHIFT
0x4D(077): INPLACE_AND
0x4E(078): INPLACE_XOR
0x4F(079): INPLACE_OR
0x50(080): BREAK_LOOP
0x51(081): WITH_CLEANUP_START
0x52(082): WITH_CLEANUP_FINISH
0x53(083): RETURN_VALUE
0x54(084): IMPORT_STAR
0x55(085): SETUP_ANNOTATIONS
0x56(086): YIELD_VALUE
0x57(087): POP_BLOCK
0x58(088): END_FINALLY
0x59(089): POP_EXCEPT
0x5A(090): STORE_NAME
0x5B(091): DELETE_NAME
0x5C(092): UNPACK_SEQUENCE
0x5D(093): FOR_ITER
0x5E(094): UNPACK_EX
0x5F(095): STORE_ATTR
0x60(096): DELETE_ATTR
0x61(097): STORE_GLOBAL
0x62(098): DELETE_GLOBAL
0x63(099): <99>
0x64(100): LOAD_CONST
0x65(101): LOAD_NAME
0x66(102): BUILD_TUPLE
0x67(103): BUILD_LIST
0x68(104): BUILD_SET
0x69(105): BUILD_MAP
0x6A(106): LOAD_ATTR
0x6B(107): COMPARE_OP
0x6C(108): IMPORT_NAME
0x6D(109): IMPORT_FROM
0x6E(110): JUMP_FORWARD
0x6F(111): JUMP_IF_FALSE_OR_POP
0x70(112): JUMP_IF_TRUE_OR_POP
0x71(113): JUMP_ABSOLUTE
0x72(114): POP_JUMP_IF_FALSE
0x73(115): POP_JUMP_IF_TRUE
0x74(116): LOAD_GLOBAL
0x75(117): <117>
0x76(118): <118>
0x77(119): CONTINUE_LOOP
0x78(120): SETUP_LOOP
0x79(121): SETUP_EXCEPT
0x7A(122): SETUP_FINALLY
0x7B(123): <123>
0x7C(124): LOAD_FAST
0x7D(125): STORE_FAST
0x7E(126): DELETE_FAST
0x7F(127): <127>
0x80(128): <128>
0x81(129): <129>
0x82(130): RAISE_VARARGS
0x83(131): CALL_FUNCTION
0x84(132): MAKE_FUNCTION
0x85(133): BUILD_SLICE
0x86(134): <134>
0x87(135): LOAD_CLOSURE
0x88(136): LOAD_DEREF
0x89(137): STORE_DEREF
0x8A(138): DELETE_DEREF
0x8B(139): <139>
0x8C(140): <140>
0x8D(141): CALL_FUNCTION_KW
0x8E(142): CALL_FUNCTION_EX
0x8F(143): SETUP_WITH
0x90(144): EXTENDED_ARG
0x91(145): LIST_APPEND
0x92(146): SET_ADD
0x93(147): MAP_ADD
0x94(148): LOAD_CLASSDEREF
0x95(149): BUILD_LIST_UNPACK
0x96(150): BUILD_MAP_UNPACK
0x97(151): BUILD_MAP_UNPACK_WITH_CALL
0x98(152): BUILD_TUPLE_UNPACK
0x99(153): BUILD_SET_UNPACK
0x9A(154): SETUP_ASYNC_WITH
0x9B(155): FORMAT_VALUE
0x9C(156): BUILD_CONST_KEY_MAP
0x9D(157): BUILD_STRING
0x9E(158): BUILD_TUPLE_UNPACK_WITH_CALL
0x9F(159): <159>
0xA0(160): LOAD_METHOD
0xA1(161): CALL_METHOD
0xA2(162): <162>
0xA3(163): <163>
0xA4(164): <164>
0xA5(165): <165>
0xA6(166): <166>
0xA7(167): <167>
0xA8(168): <168>
0xA9(169): <169>
0xAA(170): <170>
0xAB(171): <171>
0xAC(172): <172>
0xAD(173): <173>
0xAE(174): <174>
0xAF(175): <175>
0xB0(176): <176>
0xB1(177): <177>
0xB2(178): <178>
0xB3(179): <179>
0xB4(180): <180>
0xB5(181): <181>
0xB6(182): <182>
0xB7(183): <183>
0xB8(184): <184>
0xB9(185): <185>
0xBA(186): <186>
0xBB(187): <187>
0xBC(188): <188>
0xBD(189): <189>
0xBE(190): <190>
0xBF(191): <191>
0xC0(192): <192>
0xC1(193): <193>
0xC2(194): <194>
0xC3(195): <195>
0xC4(196): <196>
0xC5(197): <197>
0xC6(198): <198>
0xC7(199): <199>
0xC8(200): <200>
0xC9(201): <201>
0xCA(202): <202>
0xCB(203): <203>
0xCC(204): <204>
0xCD(205): <205>
0xCE(206): <206>
0xCF(207): <207>
0xD0(208): <208>
0xD1(209): <209>
0xD2(210): <210>
0xD3(211): <211>
0xD4(212): <212>
0xD5(213): <213>
0xD6(214): <214>
0xD7(215): <215>
0xD8(216): <216>
0xD9(217): <217>
0xDA(218): <218>
0xDB(219): <219>
0xDC(220): <220>
0xDD(221): <221>
0xDE(222): <222>
0xDF(223): <223>
0xE0(224): <224>
0xE1(225): <225>
0xE2(226): <226>
0xE3(227): <227>
0xE4(228): <228>
0xE5(229): <229>
0xE6(230): <230>
0xE7(231): <231>
0xE8(232): <232>
0xE9(233): <233>
0xEA(234): <234>
0xEB(235): <235>
0xEC(236): <236>
0xED(237): <237>
0xEE(238): <238>
0xEF(239): <239>
0xF0(240): <240>
0xF1(241): <241>
0xF2(242): <242>
0xF3(243): <243>
0xF4(244): <244>
0xF5(245): <245>
0xF6(246): <246>
0xF7(247): <247>
0xF8(248): <248>
0xF9(249): <249>
0xFA(250): <250>
0xFB(251): <251>
0xFC(252): <252>
0xFD(253): <253>
0xFE(254): <254>
0xFF(255): <255>
```
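As the name suggests, each bytecode opcode is a single byte, so there can be at most 256 distinct opcodes. Only some of these values are actually assigned; the unassigned slots appear as `<n>` in the listing above. In Python 2.7 (and up to Python 3.5), instructions come in two forms. Opcodes below 90 (`opcode.HAVE_ARGUMENT`) take no argument and occupy a single byte. Opcodes of 90 and above carry a 2-byte little-endian argument, for 3 bytes in total. When the argument is smaller than 256, its high byte is 0x00, which is why so many instructions in the hash.pyc hex dump later in this post end in `00`. Since Python 3.6 the format is "wordcode": every instruction is exactly 2 bytes, one opcode byte plus one argument byte (0 when unused), with `EXTENDED_ARG` prefixes for larger arguments.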
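The Python 2.7 layout can be made concrete with a minimal decoder sketch. It is written in Python 3 but parses Python 2.7-style bytes, and it assumes there are no `EXTENDED_ARG` prefixes. The name table `PY27_NAMES` is a hand-written subset used only for display, not the full opcode map.

```python
# Python 2.7 rule: opcodes >= HAVE_ARGUMENT (90) carry a 16-bit argument.
HAVE_ARGUMENT = 90

# Small hand-written subset of Python 2.7 opcode names, for display only.
PY27_NAMES = {0x17: 'BINARY_ADD', 0x5A: 'STORE_NAME', 0x64: 'LOAD_CONST',
              0x6C: 'IMPORT_NAME', 0x71: 'JUMP_ABSOLUTE'}


def decode_py27(co_code):
    """Yield (offset, opcode, arg) for Python 2.7-style bytecode; arg is None if absent."""
    i = 0
    while i < len(co_code):
        op = co_code[i]
        if op >= HAVE_ARGUMENT:
            # The argument is little-endian: low byte first, then high byte
            arg = co_code[i + 1] | (co_code[i + 2] << 8)
            yield i, op, arg
            i += 3
        else:
            yield i, op, None
            i += 1


# First 12 bytes of the hash.pyc co_code hex dump
for off, op, arg in decode_py27(bytes.fromhex('6400006401006c00005a0000')):
    print(off, PY27_NAMES.get(op, '<%d>' % op), arg)
```

For these bytes it yields offsets 0, 3, 6 and 9 with LOAD_CONST 0, LOAD_CONST 1, IMPORT_NAME 0 and STORE_NAME 0. This matches the first four lines of the hash.pyc disassembly.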
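In Python 2, a program's bytecode is stored at run time as a PyStringObject in the co_code field of a PyCodeObject (in Python 3 it is a bytes object). co_code contains only the instructions. Variable names, constants and other data live in separate fields such as co_varnames, co_names and co_consts, and instruction arguments are indices into those fields.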
The following two small examples show what the opcodes look like:
```python
import dis
def test(x, y):
    return x + y
dis.dis(test)  # disassemble the function's code object
```

Output:

```
  3           0 LOAD_FAST                0 (x)
              2 LOAD_FAST                1 (y)
              4 BINARY_ADD
              6 RETURN_VALUE
```
```python
import dis
def func():
    a = 1
    b = 2
    return a + b
dis.dis(func)  # disassemble the function's code object
```

Output:

```
  3           0 LOAD_CONST               1 (1)
              2 STORE_FAST               0 (a)

  4           4 LOAD_CONST               2 (2)
              6 STORE_FAST               1 (b)

  5           8 LOAD_FAST                0 (a)
             10 LOAD_FAST                1 (b)
             12 BINARY_ADD
             14 RETURN_VALUE
```
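The first column is the corresponding source line number. The second is the instruction's byte offset within co_code. The third is the opcode name, and the fourth is the instruction argument. The value in parentheses is the argument's human-readable meaning, such as the variable name or constant it indexes. A `>>` marker before an offset means that instruction is a jump target. Offsets advance by 2 here because, since Python 3.6, every instruction is exactly two bytes.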
The meaning of each bytecode instruction is documented at https://docs.python.org/3/library/dis.html
CPython exposes the fields of PyCodeObject to Python code through the `code_memberlist` table in `Objects/codeobject.c` (Python 2.7 version shown):

```c
#define OFF(x) offsetof(PyCodeObject, x)
static PyMemberDef code_memberlist[] = {
    {"co_argcount",     T_INT,      OFF(co_argcount),       READONLY},
    {"co_nlocals",      T_INT,      OFF(co_nlocals),        READONLY},
    {"co_stacksize",    T_INT,      OFF(co_stacksize),      READONLY},
    {"co_flags",        T_INT,      OFF(co_flags),          READONLY},
    {"co_code",         T_OBJECT,   OFF(co_code),           READONLY},
    {"co_consts",       T_OBJECT,   OFF(co_consts),         READONLY},
    {"co_names",        T_OBJECT,   OFF(co_names),          READONLY},
    {"co_varnames",     T_OBJECT,   OFF(co_varnames),       READONLY},
    {"co_freevars",     T_OBJECT,   OFF(co_freevars),       READONLY},
    {"co_cellvars",     T_OBJECT,   OFF(co_cellvars),       READONLY},
    {"co_filename",     T_OBJECT,   OFF(co_filename),       READONLY},
    {"co_name",         T_OBJECT,   OFF(co_name),           READONLY},
    {"co_firstlineno",  T_INT,      OFF(co_firstlineno),    READONLY},
    {"co_lnotab",       T_OBJECT,   OFF(co_lnotab),         READONLY},
    {NULL}      /* Sentinel */
};
```
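From Python, these members appear as attributes of a function's `__code__` object. The following is a minimal Python 3 sketch, where `co_code` is `bytes` rather than a Python 2 `str`. The exact instruction bytes vary between versions, so only version-stable fields are printed.

```python
def func():
    a = 1
    b = 2
    return a + b


code = func.__code__
print(type(code.co_code))  # <class 'bytes'>: raw instruction bytes only
print(code.co_varnames)    # ('a', 'b'): local names live in their own field
print(code.co_nlocals)     # 2
print(code.co_argcount)    # 0
print(code.co_name)        # func
```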
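The following Python 2.7 script loads hash.pyc, one of the challenge files listed in the readme below, and dumps its code object:

```python
import marshal
import dis

# Run with Python 2.7: the marshal format inside a .pyc is tied to the interpreter version
with open('hash.pyc', 'rb') as f:
    print(f.read(4))  # magic number (03 f3 0d 0a for Python 2.7), printed as raw bytes
    print(f.read(4))  # source modification timestamp, printed as raw bytes
    code = marshal.load(f)  # the rest of the file is the marshalled module code object

print(code.co_code.encode("hex"))  # Python 2 only: hex dump of the raw bytecode
print(code.co_consts)
print(code.co_names)
dis.disassemble(code)
```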
Output (the first two printed lines are the raw magic number and timestamp bytes; they are not printable text and are omitted here):

```
6400006401006c00005a00006402005a01006403005a02006404005a0300787e00650400640500640600830200445d6d005a05006505006407006b0500725d0065010065030065050018146502006503006505001714175a06006e1a0065010065030065050017146502006503006505001814175a06006500006a07008300005a08006508006a0900650600830100016508006a0a008300004748712e005764010053
(-1, None, 'deadbeaf', '3&!2309', 4, 0, 6, 3)
('hashlib', 'a', 'b', 'c', 'range', 'i', 'st', 'md5', 'm', 'update', 'hexdigest')
  1           0 LOAD_CONST               0 (-1)
              3 LOAD_CONST               1 (None)
              6 IMPORT_NAME              0 (hashlib)
              9 STORE_NAME               0 (hashlib)

  4          12 LOAD_CONST               2 ('deadbeaf')
             15 STORE_NAME               1 (a)

  5          18 LOAD_CONST               3 ('3&!2309')
             21 STORE_NAME               2 (b)

  6          24 LOAD_CONST               4 (4)
             27 STORE_NAME               3 (c)

  7          30 SETUP_LOOP             126 (to 159)
             33 LOAD_NAME                4 (range)
             36 LOAD_CONST               5 (0)
             39 LOAD_CONST               6 (6)
             42 CALL_FUNCTION            2
             45 GET_ITER
        >>   46 FOR_ITER               109 (to 158)
             49 STORE_NAME               5 (i)

  8          52 LOAD_NAME                5 (i)
             55 LOAD_CONST               7 (3)
             58 COMPARE_OP               5 (>=)
             61 POP_JUMP_IF_FALSE       93

  9          64 LOAD_NAME                1 (a)
             67 LOAD_NAME                3 (c)
             70 LOAD_NAME                5 (i)
             73 BINARY_SUBTRACT
             74 BINARY_MULTIPLY
             75 LOAD_NAME                2 (b)
             78 LOAD_NAME                3 (c)
             81 LOAD_NAME                5 (i)
             84 BINARY_ADD
             85 BINARY_MULTIPLY
             86 BINARY_ADD
             87 STORE_NAME               6 (st)
             90 JUMP_FORWARD            26 (to 119)

 11     >>   93 LOAD_NAME                1 (a)
             96 LOAD_NAME                3 (c)
             99 LOAD_NAME                5 (i)
            102 BINARY_ADD
            103 BINARY_MULTIPLY
            104 LOAD_NAME                2 (b)
            107 LOAD_NAME                3 (c)
            110 LOAD_NAME                5 (i)
            113 BINARY_SUBTRACT
            114 BINARY_MULTIPLY
            115 BINARY_ADD
            116 STORE_NAME               6 (st)

 12     >>  119 LOAD_NAME                0 (hashlib)
            122 LOAD_ATTR                7 (md5)
            125 CALL_FUNCTION            0
            128 STORE_NAME               8 (m)

 13         131 LOAD_NAME                8 (m)
            134 LOAD_ATTR                9 (update)
            137 LOAD_NAME                6 (st)
            140 CALL_FUNCTION            1
            143 POP_TOP

 14         144 LOAD_NAME                8 (m)
            147 LOAD_ATTR               10 (hexdigest)
            150 CALL_FUNCTION            0
            153 PRINT_ITEM
            154 PRINT_NEWLINE
            155 JUMP_ABSOLUTE           46
        >>  158 POP_BLOCK
        >>  159 LOAD_CONST               1 (None)
            162 RETURN_VALUE
```
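Reading the disassembly back gives roughly the following program. This is a hand reconstruction, not the original file. Source lines 2-3 and 10 produce no bytecode, so they are presumably blank lines and the `else:`. The sketch targets Python 3, whereas the original is Python 2.7, which passes the `str` straight to `m.update` and uses the `print` statement. The `digests` function is a hypothetical wrapper added for testing; the original runs at module level.

```python
import hashlib

a = 'deadbeaf'
b = '3&!2309'
c = 4


def digests():
    """Return the six MD5 hex digests that the original script prints."""
    result = []
    for i in range(0, 6):
        if i >= 3:
            st = a * (c - i) + b * (c + i)  # negative repeat counts give ''
        else:
            st = a * (c + i) + b * (c - i)
        m = hashlib.md5()
        m.update(st.encode())  # Python 3 hashes bytes, not str
        result.append(m.hexdigest())
    return result


for h in digests():
    print(h)
```

Because every string here is ASCII, the digests should be the same ones the Python 2.7 script prints.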
The challenge's readme reads:
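To protect his code, Xiaoxiang modified part of the Python bytecode instruction set and named the result JPython.
JPython runs only in his own custom-built environment, which nobody else can obtain.
Now Xiaoming, hoping to uncover the secret in Xiaoxiang's code, has collected three files:
hash.pyc: a script that runs directly under Python 2.7
Jhash.pyc: hash.pyc converted into a script that runs only in the JPython environment
Jflag.pyc: holds Xiaoxiang's biggest secret, but runs only in the JPython environment
Who can help Xiaoming get the secret out of Xiaoxiang's code?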
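From this we learn that hash.pyc runs directly, while Jhash.pyc is the same program converted to run on JPython. We are then given Jflag.pyc, which only runs on JPython, and must restore it so that it runs on standard Python. This suggests that the Python and JPython instruction sets correspond to each other in some way, and our main task is to find that correspondence.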
Opening hash.pyc and Jhash.pyc side by side in 010 Editor and comparing them byte by byte gives the following opcode mapping:
hash.pyc | Jhash.pyc | Instruction |
---|---|---|
0x64 | 0x94 | LOAD_CONST |
0x6c | 0x75 | IMPORT_NAME |
0x5a | 0x45 | STORE_NAME |
0x65 | 0x95 | LOAD_NAME |
0x18 | 0x27 | BINARY_SUBTRACT |
0x14 | 0x23 | BINARY_MULTIPLY |
0x17 | 0x26 | BINARY_ADD |
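The last three rows of the table are exactly subtraction, multiplication and addition, and in each of them the JPython opcode equals the standard opcode plus 0x0F. That suggests division is remapped the same way: BINARY_DIVIDE is 0x15 in Python 2.7, so its JPython opcode is probably 0x24.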
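The shift can be checked with a few lines of arithmetic. This sketch uses only the three arithmetic rows of the table. The other rows do not share this shift (LOAD_CONST moves by +0x30, STORE_NAME by -0x15), so it is a pattern within this group, not a global rule.

```python
# (standard Python 2.7 opcode, JPython opcode) pairs from the comparison table
ARITH = {
    'BINARY_SUBTRACT': (0x18, 0x27),
    'BINARY_MULTIPLY': (0x14, 0x23),
    'BINARY_ADD': (0x17, 0x26),
}

# Distinct differences between JPython and standard opcodes in this group
deltas = {jpy - std for std, jpy in ARITH.values()}
print(deltas)  # {15}

# Assumption: BINARY_DIVIDE (0x15 in Python 2.7) is shifted by the same amount
(delta,) = deltas  # raises ValueError if the group does not share one shift
predicted_divide = 0x15 + delta
print(hex(predicted_divide))  # 0x24
```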
Next, look at the bytecode of Jflag.pyc.
Using the mapping table above, patch Jflag.pyc back to standard opcodes to obtain flag.pyc.
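A blind find-and-replace over the file would also rewrite argument bytes that happen to equal a JPython opcode, so the patch has to walk the instruction stream. The following is a minimal sketch over a single `co_code` byte string. It assumes Jflag.pyc keeps the Python 2.7 layout, that is, 1-byte opcodes with a 2-byte argument whenever the translated opcode is >= 90. It also assumes opcodes missing from the table are unchanged. Nested code objects in `co_consts` would need the same treatment. The BINARY_DIVIDE entry is the guess derived above, and `remap_co_code` is a hypothetical helper.

```python
HAVE_ARGUMENT = 90  # Python 2.7: opcodes >= 90 carry a 2-byte argument

# JPython opcode -> standard Python 2.7 opcode, from the comparison table
JPY_TO_PY27 = {
    0x94: 0x64,  # LOAD_CONST
    0x75: 0x6C,  # IMPORT_NAME
    0x45: 0x5A,  # STORE_NAME
    0x95: 0x65,  # LOAD_NAME
    0x27: 0x18,  # BINARY_SUBTRACT
    0x23: 0x14,  # BINARY_MULTIPLY
    0x26: 0x17,  # BINARY_ADD
    0x24: 0x15,  # BINARY_DIVIDE (guessed, not observed)
}


def remap_co_code(co_code: bytes, table=JPY_TO_PY27) -> bytes:
    """Translate opcode bytes only, copying argument bytes through untouched."""
    out = bytearray()
    i = 0
    while i < len(co_code):
        # Opcodes not in the table are assumed to be unchanged
        op = table.get(co_code[i], co_code[i])
        out.append(op)
        i += 1
        # Decide on an argument using the *translated* opcode: JPython's
        # STORE_NAME (0x45) is below 90 but still takes an argument
        if op >= HAVE_ARGUMENT:
            out += co_code[i:i + 2]
            i += 2
    return bytes(out)


# LOAD_CONST 0x23 then BINARY_ADD; the argument byte 0x23 must not be remapped
print(remap_co_code(bytes.fromhex('94230026')).hex())  # 64230017
```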
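However, the patched file still fails to decompile. Disassembling it shows that the very first instruction is `1 0 JUMP_ABSOLUTE 12`, an absolute jump used for bytecode obfuscation. At run time it skips the instructions in between, but they still trip up the decompiler. NOP out its bytes `71 0c 00` (opcode 0x71 is JUMP_ABSOLUTE, and the argument 0x000c = 12 is stored little-endian). Decompilation still fails, so NOP out the first four instructions entirely. The decompiled code (Python 2) is then: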
```python
#!/usr/bin/env python
# visit http://tool.lu/pyc/ for more information
import base64
import sys
flag = sys.argv[1]
jd = 'jd'
if len(flag) == 30:
    base64_str = base64.b64encode(flag + '+1s+1s+1s' + jd * 2)
    b = ''
    for i in range(0, 44):
        head = ord(base64_str[i]) / 10  # computed but never used
        b += chr(ord(base64_str[i]) ^ 7)
    if b == '^P]mc@]0emE7VOE2_}A}VBwpbQ?5e5>lN4UwSSM>L}A}':
        print 'Congratulations!You Get Flag'
    else:
        print 'Wrong!'
```
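The NOP patch applied to Jflag.pyc above can be sketched as a same-length overwrite. In Python 2.7, NOP is the 1-byte opcode 0x09 with no argument, so a 3-byte instruction such as `71 0c 00` becomes `09 09 09`. Keeping the length unchanged means every later offset, and therefore every absolute jump target, stays valid. `nop_range` is a hypothetical helper for illustration.

```python
NOP = 0x09  # Python 2.7 NOP: 1 byte, no argument


def nop_range(co_code: bytes, start: int, end: int) -> bytes:
    """Overwrite co_code[start:end] with NOPs, keeping the total length unchanged."""
    if not 0 <= start <= end <= len(co_code):
        raise ValueError('range outside co_code')
    return co_code[:start] + bytes([NOP]) * (end - start) + co_code[end:]


# JUMP_ABSOLUTE 12 (71 0c 00) followed by LOAD_CONST 0 (64 00 00)
patched = nop_range(bytes.fromhex('710c00640000'), 0, 3)
print(patched.hex())  # 090909640000
```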
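To invert the check, XOR each character of the target string with 7 again (XOR is its own inverse) to recover the Base64 text, then decode it. Solver: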
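```python
import base64

enc = '^P]mc@]0emE7VOE2_}A}VBwpbQ?5e5>lN4UwSSM>L}A}'
# Undo the "^ 7" that the checker applied to each Base64 character
enc_ = ''.join(chr(ord(c) ^ 7) for c in enc)
# The 44 checked characters decode to 33 bytes: the 30-character flag plus
# the first '+1s' of the appended suffix, so keep only the first 30
flag = base64.b64decode(enc_).decode()[:30]
print(flag)
```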
This gives the flag: afctf{n0t@py_1s@Jpy_6ood#tiM2}
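As a final check, here is a Python 3 port of the decompiled checker. It is a sketch: `check` is a hypothetical function name, and `.encode()`/`.decode()` are added because Python 3's `b64encode` works on bytes. Only the first 44 Base64 characters are compared, and they cover the first 33 input bytes. Since 33 is a multiple of 3, that prefix does not depend on the rest of the input.

```python
import base64

TARGET = '^P]mc@]0emE7VOE2_}A}VBwpbQ?5e5>lN4UwSSM>L}A}'


def check(flag: str) -> bool:
    """Return True if flag passes the same test as the decompiled script."""
    jd = 'jd'
    if len(flag) != 30:
        return False
    base64_str = base64.b64encode((flag + '+1s+1s+1s' + jd * 2).encode()).decode()
    # XOR the first 44 Base64 characters with 7 and compare with the target
    b = ''.join(chr(ord(ch) ^ 7) for ch in base64_str[:44])
    return b == TARGET


print(check('afctf{n0t@py_1s@Jpy_6ood#tiM2}'))  # True
```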