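Preface: In school, the assembly language I learned was Intel-syntax assembly, using the MASM assembler on DOS. I'm embarrassed to admit that I didn't study assembly in any depth back then. Many things I never really understood, and I've stayed somewhat confused about them ever since. Recently I was reading "Writing Your Own Operating System with Open-Source Software" (《使用开源软件-自己写操作系统》, http://code.google.com/p/writeos/) and "Writing an Operating System by Hand" (《自己动手写操作系统》). Both mention the GNU AS assembler and the NASM assembler, so I reviewed assembly programming again and came to understand the language better.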
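In the Linux 0.11 kernel source, bootsect.s and setup.s are 16-bit programs that run in real mode. They use a syntax close to Intel's and must be built with as86 and ld86, an assembler and linker for the Intel 8086. head.s, by contrast, is written for the GNU assembler and runs in protected mode. It must be assembled with GNU as (gas), which uses AT&T syntax.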
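Linus used two different assemblers because the GNU as of that time could not assemble 16-bit real-mode code. Starting with kernel 2.4.x, bootsect.s and setup.s are written entirely for the same as. For details on GNU as, see the GNU assembler manual "Using as: The GNU Assembler". This shows that an assembly syntax and its assembler go hand in hand. It seems I should learn some compiler theory...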
Assembly programming under DOS:
I installed a pure DOS system on my computer. For how to install it, see http://blog.csdn.net/sanlinux/archive/2010/05/01/5549311.aspx
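Once DOS is installed, download and install the MASM 6.11 assembler (MASM611), and the DOS assembly environment is ready. MASM uses Intel syntax, the same syntax I used in school, so I'm more familiar and comfortable with it.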
Assembly programming under Linux:
Most GNU/Linux systems come with the GNU Assembler (GAS) already installed, so there is nothing extra to install and you can use it right away. GAS uses AT&T syntax.
There is also another assembler, NASM, which can be used on both Linux and Windows. It uses Intel syntax, similar to MASM's.
Differences between Intel syntax and AT&T syntax:
The following description of the differences between the two comes from:
http://www1.imada.sdu.dk/~kslarsen/dm516/Litteratur/IntelnATT.htm
Intel-syntax and AT&T-syntax assembly look very different from each other. This can lead to confusion when one first comes across AT&T syntax after having learnt Intel syntax, or vice versa. So let's start with the basics.
In Intel syntax there are no register prefixes or immediate prefixes. In AT&T, however, registers are prefixed with a '%' and immediates are prefixed with a '$'. In Intel syntax, hexadecimal and binary immediates are suffixed with 'h' and 'b' respectively. If the first hexadecimal digit is a letter, the value is also prefixed with a '0'. AT&T syntax instead writes hexadecimal values with a C-style '0x' prefix.
Example:
Intel Syntax      AT&T Syntax
mov eax,1         movl $1,%eax
mov ebx,0ffh      movl $0xff,%ebx
int 80h           int $0x80
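To make the prefix and suffix rules concrete, here is a minimal C sketch that rewrites a single Intel-style register or immediate operand into AT&T form. The helper names `att_reg` and `intel_imm_to_att` are hypothetical and do not come from any assembler. The sketch only handles decimal numbers, `h`-suffixed hexadecimal and `b`-suffixed binary. The `$0b...` output assumes GAS's `0b` binary prefix.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* AT&T registers carry a '%' prefix: "eax" -> "%eax". */
void att_reg(const char *reg, char *out, size_t outlen)
{
    snprintf(out, outlen, "%%%s", reg);
}

/* Rewrite an Intel-style immediate in AT&T form:
 *   "1" -> "$1", "0ffh" -> "$0xff", "101b" -> "$0b101".
 * Returns 0 on success, -1 if the text is not one of these forms. */
int intel_imm_to_att(const char *intel, char *out, size_t outlen)
{
    char digits[64];
    char *end;
    size_t n = strlen(intel);

    assert(out != NULL && outlen > 0);
    if (n == 0 || n >= sizeof digits)
        return -1;

    char suffix = (char)tolower((unsigned char)intel[n - 1]);
    if (suffix == 'h' || suffix == 'b') {
        if (n == 1)
            return -1;
        memcpy(digits, intel, n - 1);          /* drop the 'h' / 'b' suffix */
        digits[n - 1] = '\0';
        for (size_t i = 0; digits[i] != '\0'; i++) {
            int c = (unsigned char)digits[i];
            if (suffix == 'h' ? !isxdigit(c) : (c != '0' && c != '1'))
                return -1;
        }
        if (suffix == 'b')                     /* GAS: 0b prefix for binary */
            snprintf(out, outlen, "$0b%s", digits);
        else if (!isdigit((unsigned char)digits[0]))
            return -1;                         /* Intel needs "0ffh", not "ffh" */
        else                                   /* the padding '0' disappears */
            snprintf(out, outlen, "$0x%lx", strtoul(digits, NULL, 16));
        return 0;
    }

    long v = strtol(intel, &end, 10);          /* plain decimal */
    if (end == intel || *end != '\0')
        return -1;
    snprintf(out, outlen, "$%ld", v);
    return 0;
}
```

For example, `intel_imm_to_att("0ffh", buf, sizeof buf)` leaves `$0xff` in `buf`. By contrast, `"ffh"` is rejected, because a hexadecimal value that starts with a letter needs the leading `0` in Intel syntax.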
The direction of the operands in Intel syntax is opposite to that of AT&T syntax. In Intel syntax the first operand is the destination and the second is the source. In AT&T syntax the first operand is the source and the second is the destination. The advantage of AT&T syntax here is obvious: we read from left to right and we write from left to right, so this order is only natural.
Example:
Intel Syntax          AT&T Syntax
instr dest,source     instr source,dest
mov eax,[ecx]         movl (%ecx),%eax
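Operand order is easiest to see with an instruction whose operands are not interchangeable, such as SUB. GCC-style inline assembly uses AT&T syntax by default on x86. The sketch below therefore writes `subl src, dst` where Intel syntax would write `sub dst, src`. It assumes GCC or Clang on an x86 or x86-64 host, and on other hosts it falls back to plain C. The function name `sub_att` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Returns dst - src computed by one SUB instruction.
 * Intel syntax: sub dst, src   (destination first)
 * AT&T syntax:  subl src, dst  (source first, destination last) */
uint32_t sub_att(uint32_t dst, uint32_t src)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__("subl %2, %0"          /* AT&T: subl source, destination  */
            : "=r"(dst)            /* %0: destination, written back    */
            : "0"(dst), "r"(src)   /* %1 tied to %0; %2 is the source  */
            : "cc");               /* SUB updates the flags            */
    return dst;
#else
    return dst - src;              /* portable fallback on non-x86 hosts */
#endif
}
```

`sub_att(10, 3)` returns 7. The operand written first in the template (`%2`, the source) is subtracted from the one written last (`%0`, the destination).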
Memory operands, as seen above, are also different. In Intel syntax the base register is enclosed in '[' and ']', whereas in AT&T syntax it is enclosed in '(' and ')'.
Example:
Intel Syntax          AT&T Syntax
mov eax,[ebx]         movl (%ebx),%eax
mov eax,[ebx+3]       movl 3(%ebx),%eax
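The bracket-to-parenthesis rewrite for the two simple forms `[reg]` and `[reg+disp]` can be sketched in C as follows. The function name `intel_mem_to_att` is hypothetical. It assumes a purely alphabetic base register name and a decimal displacement, and it does not handle index registers or scale factors.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Rewrite a simple Intel memory operand, [reg] or [reg+disp] / [reg-disp],
 * as AT&T disp(%reg). Returns 0 on success, -1 on anything else. */
int intel_mem_to_att(const char *intel, char *out, size_t outlen)
{
    char reg[16];
    size_t len = strlen(intel), i = 1, r = 0;

    assert(out != NULL && outlen > 0);
    if (len < 3 || intel[0] != '[' || intel[len - 1] != ']')
        return -1;

    /* Base register: the letters right after '['. */
    while (isalpha((unsigned char)intel[i]) && r < sizeof reg - 1)
        reg[r++] = intel[i++];
    reg[r] = '\0';
    if (r == 0)
        return -1;

    if (i == len - 1) {                        /* [ebx] -> (%ebx) */
        snprintf(out, outlen, "(%%%s)", reg);
        return 0;
    }
    if (intel[i] != '+' && intel[i] != '-')
        return -1;

    /* Decimal displacement between the sign and the closing ']'. */
    int negative = (intel[i++] == '-');
    size_t start = i;
    while (isdigit((unsigned char)intel[i]))
        i++;
    if (i == start || i != len - 1)
        return -1;

    /* [ebx+3] -> 3(%ebx), [ebp-8] -> -8(%ebp) */
    snprintf(out, outlen, "%s%.*s(%%%s)", negative ? "-" : "",
             (int)(i - start), intel + start, reg);
    return 0;
}
```

The displacement moves outside the parentheses and keeps its sign. The register gains its '%' prefix inside them.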
The AT&T form for instructions involving complex addressing is very obscure compared to Intel syntax. The Intel syntax form is segreg:[base+index*scale+disp]. The AT&T syntax form is %segreg:disp(base,index,scale).
Index, scale, disp and segreg are all optional and can simply be left out. If index is specified but scale is not, scale defaults to 1. Whether a segreg is needed depends on the instruction and on whether the program runs in real mode or protected mode (pmode). In real mode it depends on the instruction, whereas in pmode it is usually unnecessary. Immediate values used for scale or disp should not be prefixed with '$' in AT&T syntax.
Example:
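Intel Syntax                                AT&T Syntax
instr foo,segreg:[base+index*scale+disp]    instr %segreg:disp(base,index,scale),foo
mov eax,[ebx+20h]                           movl 0x20(%ebx),%eax
add eax,[ebx+ecx*2]                         addl (%ebx,%ecx,2),%eax
lea eax,[ebx+ecx]                           leal (%ebx,%ecx),%eax
sub eax,[ebx+ecx*4-20h]                     subl -0x20(%ebx,%ecx,4),%eax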
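The general form can be sketched in C as a formatter over the operand's components. The sketch also computes the offset arithmetic base + index*scale + disp that both notations describe. The names `struct mem_operand`, `format_att` and `effective_address` are hypothetical. Displacements are printed in hexadecimal, and a scale of 1 is omitted. Following the rule above, neither disp nor scale gets a '$'. The address function computes only the 32-bit offset and ignores the segment base.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Components of an x86 memory operand; NULL or 0 means "omitted". */
struct mem_operand {
    const char *segreg;  /* e.g. "es", or NULL */
    long disp;           /* displacement, 0 if none */
    const char *base;    /* e.g. "ebx", or NULL */
    const char *index;   /* e.g. "ecx", or NULL */
    int scale;           /* 1, 2, 4 or 8; 0 is treated as 1 */
};

/* Render as AT&T %segreg:disp(base,index,scale); no '$' on disp or scale. */
void format_att(const struct mem_operand *m, char *out, size_t outlen)
{
    char seg[16] = "", disp[32] = "", regs[48] = "";
    int scale = m->scale ? m->scale : 1;       /* scale defaults to 1 */

    assert(out != NULL && outlen > 0);
    if (m->segreg)
        snprintf(seg, sizeof seg, "%%%s:", m->segreg);
    if (m->disp != 0 || (!m->base && !m->index)) {
        unsigned long mag = m->disp < 0 ? 0UL - (unsigned long)m->disp
                                        : (unsigned long)m->disp;
        snprintf(disp, sizeof disp, "%s0x%lx", m->disp < 0 ? "-" : "", mag);
    }
    if (m->index && scale != 1)                /* (base,index,scale) */
        snprintf(regs, sizeof regs, "(%s%s,%%%s,%d)", m->base ? "%" : "",
                 m->base ? m->base : "", m->index, scale);
    else if (m->index)                         /* a scale of 1 may be left out */
        snprintf(regs, sizeof regs, "(%s%s,%%%s)", m->base ? "%" : "",
                 m->base ? m->base : "", m->index);
    else if (m->base)
        snprintf(regs, sizeof regs, "(%%%s)", m->base);

    snprintf(out, outlen, "%s%s%s", seg, disp, regs);
}

/* Offset described by either notation: base + index*scale + disp,
 * wrapping at 32 bits. The segment base is not included. */
uint32_t effective_address(uint32_t base, uint32_t index, int scale, int32_t disp)
{
    if (scale == 0)
        scale = 1;
    return base + index * (uint32_t)scale + (uint32_t)disp;
}
```

For instance, the components of an operand like `[ebx+ecx*4-20h]` (base ebx, index ecx, scale 4, disp -0x20) format as `-0x20(%ebx,%ecx,4)`. An index with no base comes out as `(,%esi,4)`.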
As you can see, AT&T is very obscure. [base+index*scale+disp] makes more sense at a glance than disp(base,index,scale).
As you may have noticed, AT&T syntax mnemonics have a suffix, which indicates the operand size: 'l' is for long, 'w' is for word, and 'b' is for byte. Intel syntax has similar directives for memory operands, namely byte ptr, word ptr and dword ptr, where "dword" corresponds to "long". This is similar to type casting in C. However, it usually isn't necessary when a register operand is involved, since the size of the register used is the assumed data type.
Example:
Intel Syntax                  AT&T Syntax
mov al,bl                     movb %bl,%al
mov ax,bx                     movw %bx,%ax
mov eax,ebx                   movl %ebx,%eax
mov eax, dword ptr [ebx]      movl (%ebx),%eax
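The suffix rule, and the point that a register operand already fixes the size, can be sketched as a small lookup in C. The helpers `reg_size`, `att_suffix`, `att_mnemonic` and `intel_ptr` are hypothetical. The register table covers only the 8-, 16- and 32-bit general-purpose registers.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Operand size in bytes implied by an 8/16/32-bit general-purpose
 * register name; 0 for names not in this small table. */
int reg_size(const char *reg)
{
    /* All three tables have the same number of entries. */
    static const char *const r8[]  = {"al", "bl", "cl", "dl", "ah", "bh", "ch", "dh"};
    static const char *const r16[] = {"ax", "bx", "cx", "dx", "si", "di", "sp", "bp"};
    static const char *const r32[] = {"eax", "ebx", "ecx", "edx", "esi", "edi", "esp", "ebp"};

    for (size_t i = 0; i < sizeof r8 / sizeof r8[0]; i++) {
        if (strcmp(reg, r8[i]) == 0)  return 1;
        if (strcmp(reg, r16[i]) == 0) return 2;
        if (strcmp(reg, r32[i]) == 0) return 4;
    }
    return 0;
}

/* AT&T suffix: b = byte, w = word, l = long (32 bits); '\0' if unknown. */
char att_suffix(int size)
{
    switch (size) {
    case 1:  return 'b';
    case 2:  return 'w';
    case 4:  return 'l';
    default: return '\0';
    }
}

/* Intel size directive for a memory operand; NULL if unknown. */
const char *intel_ptr(int size)
{
    switch (size) {
    case 1:  return "byte ptr";
    case 2:  return "word ptr";
    case 4:  return "dword ptr";
    default: return NULL;
    }
}

/* Append the suffix implied by a register operand: ("mov", "bx") -> "movw".
 * Returns -1 when the register's size is unknown. */
int att_mnemonic(const char *mnemonic, const char *reg, char *out, size_t outlen)
{
    char s = att_suffix(reg_size(reg));

    assert(out != NULL && outlen > 0);
    if (s == '\0')
        return -1;
    snprintf(out, outlen, "%s%c", mnemonic, s);
    return 0;
}
```

`att_mnemonic("mov", "bx", ...)` yields `movw`. When no operand is a register, as in storing an immediate to `[ebx]`, there is nothing to infer the size from. In that case Intel syntax needs a directive such as `dword ptr`, and AT&T needs the explicit suffix.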
Official websites and online documentation
NASM (Netwide Assembler)
The Netwide Assembler, NASM, is an 80x86 and x86-64 assembler designed for portability and modularity. It supports a range of object file formats, including Linux and *BSD a.out, ELF, COFF, Mach-O, Microsoft 16-bit OBJ, Win32 and Win64. It will also output plain binary files. Its syntax is designed to be simple and easy to understand, similar to Intel's but less complex. It supports all currently known x86 architectural extensions, and has strong support for macros.
The Netwide Assembler grew out of an idea on comp.lang.asm.x86 (or possibly alt.lang.asm - I forget which), which was essentially that there didn't seem to be a good free x86-series assembler around, and that maybe someone ought to write one.
a86 is good, but not free, and in particular you don't get any 32-bit capability until you pay. It's DOS only, too.
gas is free, and ports over to DOS and Unix, but it's not very good, since it's designed to be a back end to gcc, which always feeds it correct code. So its error checking is minimal. Also, its syntax is horrible, from the point of view of anyone trying to actually write anything in it. Plus you can't write 16-bit code in it (properly).
as86 is specific to Minix and Linux, and (my version at least) doesn't seem to have much (or any) documentation.
MASM isn't very good, and it's (was) expensive, and it runs only under DOS.
TASM is better, but still strives for MASM compatibility, which means millions of directives and tons of red tape. And its syntax is essentially MASM's, with the contradictions and quirks that entails (although it sorts out some of those by means of Ideal mode). It's expensive too. And it's DOS-only.
http://www.nasm.us/
http://www.nasm.us/doc/
GNU Assembler
http://tigcc.ticalc.org/doc/gnuasm.html
http://sources.redhat.com/binutils/docs-2.12/as.info/
"Using as, the GNU Assembler"
http://sourceware.org/binutils/docs/as/index.html