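Translator's note: I learned assembly language from Wang Shuang's textbook, which teaches Intel syntax. In my actual work, though, I study C on Linux, and I found that I could not read the assembly code GCC produces.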
The difference
--------------
This document is more about coding than hacking, although assembly is a very useful programming language: it is machine level and provides direct access to the CPU, hardware, etc. On Unix-derived systems, compilers like gcc use AT&T syntax assembly by default, not Intel. For example: movl %esp, %ebp
This is unfortunate for DOS assembly programmers who recently switched to Unix-derived systems. They are used to Intel syntax, whereas Linux (and others) uses AT&T syntax. In Intel syntax, the example above would be written: mov ebp, esp
I wrote this because I have seen only one document that explains the differences between AT&T and Intel syntax. That document is the GAS (GNU assembler) reference manual.
You can get the GAS reference manual at: http://www.cs.utah.edu/csinfo/texinfo under "gas".
First let me give a few examples.
Intel: push 4
AT&T: pushl $4
In AT&T syntax, all immediate operands have a $ in front of them; in Intel syntax they have no prefix.
In AT&T syntax, register operands have a % in front of them; Intel has none.
Intel: mov eax, 4
AT&T: movl $4, %eax
You will notice that the source/destination order differs between Intel and AT&T.
Intel: the order is dst, src, as in mov ax, 2
AT&T: the order is the opposite, src, dst, as in movw $2, %ax
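The prefix and ordering rules above can be sketched in a few lines of C. This is only an illustration, not how any real assembler or converter works: the helper names att_operand and intel_to_att are made up here, and the sketch assumes operands are either bare register names or numeric immediates.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: rewrite one Intel-style operand in AT&T style.
 * Assumes the operand is either a numeric immediate or a bare register name. */
static void att_operand(const char *intel, char *out, size_t outsz)
{
    assert(intel != NULL && out != NULL && outsz > 0);
    if (isdigit((unsigned char)intel[0]) || intel[0] == '-')
        snprintf(out, outsz, "$%s", intel);    /* immediate: prefix with $ */
    else
        snprintf(out, outsz, "%%%s", intel);   /* register: prefix with % */
}

/* Hypothetical helper: Intel "mnemonic dst, src" becomes
 * AT&T "mnemonic+suffix src, dst". The caller picks the size suffix. */
void intel_to_att(const char *mnemonic, char suffix,
                  const char *dst, const char *src,
                  char *out, size_t outsz)
{
    char s[32], d[32];

    att_operand(src, s, sizeof s);
    att_operand(dst, d, sizeof d);
    /* Source first, destination second: the reverse of Intel order. */
    snprintf(out, outsz, "%s%c %s, %s", mnemonic, suffix, s, d);
}
```

Calling intel_to_att("mov", 'l', "eax", "4", buf, sizeof buf) fills buf with the AT&T form of "mov eax, 4" from the example above. Memory operands are deliberately left out of this sketch.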
In AT&T syntax, the size of the operand is given by a suffix on the instruction: 'b' for byte, 'w' for word, 'l' for long, etc.
movl, movb, movw, etc.
In Intel syntax you would put the size on the memory operand instead, as in mov al, byte ptr foo
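As a small illustration of the suffix rule, here is a sketch in C that maps an operand size in bytes to the matching AT&T suffix and to the Intel size keyword. The function names are hypothetical, and only the three sizes mentioned above are covered.

```c
#include <stddef.h>

/* Hypothetical helper: AT&T mnemonic suffix for an operand size in bytes.
 * Returns '\0' for sizes this sketch does not cover. */
char att_suffix(size_t bytes)
{
    switch (bytes) {
    case 1:  return 'b';   /* byte: movb */
    case 2:  return 'w';   /* word: movw */
    case 4:  return 'l';   /* long: movl */
    default: return '\0';
    }
}

/* Hypothetical helper: the Intel-style qualifier for the same size.
 * Returns NULL for sizes this sketch does not cover. */
const char *intel_size_keyword(size_t bytes)
{
    switch (bytes) {
    case 1:  return "byte ptr";
    case 2:  return "word ptr";
    case 4:  return "dword ptr";
    default: return NULL;
    }
}
```

So a 1-byte move is movb in AT&T syntax, while Intel syntax keeps the mnemonic mov and writes byte ptr on the memory operand.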
The far return instruction in AT&T syntax is lret $stack-adjust; in Intel syntax it is ret far stack-adjust.
The l at the end of movl marks the operand size (here a long), so no size qualifier is needed on the memory operand. This is actually more convenient, if you ask me.
In Intel syntax, a memory operand has the form:
section:[base + index*scale + disp]
where disp is the displacement and scale is 1 if not given.
In AT&T, however, you would have:
section:disp(base, index, scale)
So "es:[ebp-4]" in Intel would be "%es:-4(%ebp)" in AT&T syntax.
Intel: [foo] AT&T: foo(,1) (base and index are omitted; the 1 is the scale)
Intel: [foo + eax*4] AT&T: foo(, %eax, 4)
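To make the addressing form concrete, here is a sketch in C that computes base + index*scale + disp and prints a memory operand in the AT&T disp(base,index,scale) layout. The struct and function names are invented for this example, the displacement is assumed to be numeric (a symbol such as foo is not handled), and the section override is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical description of a memory operand.
 * An empty string means the component is omitted. */
struct mem_operand {
    const char *base;    /* register name such as "ebp", or "" */
    const char *index;   /* register name such as "eax", or "" */
    unsigned scale;      /* 1, 2, 4 or 8 */
    int32_t disp;        /* numeric displacement */
};

/* address = base + index*scale + disp, with 32-bit wraparound. */
uint32_t effective_address(uint32_t base, uint32_t index,
                           unsigned scale, int32_t disp)
{
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return base + index * scale + (uint32_t)disp;
}

/* Format the operand as disp(base,index,scale), AT&T style.
 * Assumes at least one of base and index is present. */
void format_att(const struct mem_operand *m, char *out, size_t outsz)
{
    char base[16] = "";

    if (m->base[0] != '\0')
        snprintf(base, sizeof base, "%%%s", m->base);

    if (m->index[0] != '\0')
        snprintf(out, outsz, "%d(%s,%%%s,%u)",
                 (int)m->disp, base, m->index, m->scale);
    else
        snprintf(out, outsz, "%d(%s)", (int)m->disp, base);
}
```

With base "ebp", no index and disp -4, format_att produces the -4(%ebp) operand from the example above; the matching Intel spelling would be [ebp-4].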
I hope this helps :)
How to Get some assembly examples in Unix
-----------------------------------------
Here is how to get some assembly code to look at on Unix.
Use this C program (assuming you called it test.c):
/* test.c: a minimal program with one library call */
#include <stdio.h>

int main(void)
{
    printf("test\n");
}
To compile it, run gcc -S test.c. This produces an assembly file, test.s.
Look at it: it contains a lot of useful information, including examples of
the assembler directives and other constructs described in the GAS (GNU
assembler) manual (which can be found at http://www.cs.utah.edu/csinfo/texinfo, under gas).
Here is what test.s looks like (this one came from GCC 2.7.2.1; other versions will produce different output):
    .file "test.c"
    .version "01.01"
gcc2_compiled.:
    .section .rodata
.LC0:
    .string "test\n"
    .text
    .align 4
    .globl main
    .type main,@function
main:
    pushl %ebp
    movl %esp,%ebp
    pushl $.LC0
    call printf
    addl $4,%esp
.L1:
    leave
    ret
.Lfe1:
    .size main,.Lfe1-main
    .ident "GCC: (GNU) 2.7.2.1"
As you can see, the l suffix on push, mov, add, etc. means the operand is of
type long. The % goes in front of all register operands, whereas in Intel
syntax registers are undelimited. Likewise, immediate operands have a
'$' in front of them, whereas in Intel syntax they are, once again, undelimited.
movl $3, %eax
is equal to:
mov eax, 3
in Intel syntax.
The other way to get asm code is with gdb. Compile your program
with gcc -g, or with gcc -g -a to see even more (in the GCC of that era, -a adds basic-block profiling code).
Here is our test.c in gdb, where we run 'disassemble main':
(gdb) disassemble main
Dump of assembler code for function main:
0x8048474 <main>: pushl %ebp
0x8048475 <main+1>: movl %esp,%ebp
0x8048477 <main+3>: pushl $0x80484c8
0x804847c <main+8>: call 0x8048378 <printf>
0x8048481 <main+13>: addl $0x4,%esp
0x8048484 <main+16>: leave
0x8048485 <main+17>: ret
End of assembler dump.
That is with just -g. With -a as well, you can see the difference
(more instructions show up that usually wouldn't):
(gdb) disassemble main
Dump of assembler code for function main:
0x80485d8 <main>: pushl %ebp
0x80485d9 <main+1>: movl %esp,%ebp
0x80485db <main+3>: cmpl $0x0,0x8049a6c
0x80485e2 <main+10>: jne 0x80485f1 <main+25>
0x80485e4 <main+12>: pushl $0x8049a6c
0x80485e9 <main+17>: call 0x80488fc <__bb_init_func>
0x80485ee <main+22>: addl $0x4,%esp
0x80485f1 <main+25>: incl 0x8049b78
0x80485f7 <main+31>: pushl $0x8048978
0x80485fc <main+36>: call 0x8048468 <printf>
0x8048601 <main+41>: addl $0x4,%esp
0x8048604 <main+44>: incl 0x8049b7c
0x804860a <main+50>: leave
0x804860b <main+51>: ret
End of assembler dump.
I of course need to give credit for this to the GAS manual, as parts were taken from there.