What ORG 0100h Actually Means in NASM

The Program Segment Prefix (PSP)

To understand ORG 0100h, you first need to understand the Program Segment Prefix (PSP).

 

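The program segment prefix is an operating-system (DOS) concept. When you type an external command, or when a program is loaded through the EXEC function (system call INT 21h, function 4Bh), COMMAND picks the lowest available memory as the start of the program segment, that is, the start of the memory the program is loaded into. In the first 256 (0100h) bytes of the memory the program occupies, DOS creates the program's prefix (PSP) data area. The PSP structure closely resembles the "control area" of CP/M, because DOS was closely modeled on CP/M.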

 

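DOS uses the PSP to communicate with the loaded program. The PSP holds information such as the program's return address and the command line it was started with.

For details, see:

http://www.programfan.com/blog/article.asp?id=25154 (PSP layout)

http://wenku.baidu.com/view/b5f039addd3383c4bb4cd293.html (introduction to the PSP)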
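
To make the PSP a little more concrete, here is a minimal Python sketch that builds a fake 256-byte PSP and reads the command-line tail back out of it. The field offsets used (INT 20h at 00h, tail length at 80h, tail text at 81h) are taken from standard DOS documentation rather than from this article, and the helper names make_fake_psp and read_command_tail are made up for illustration.

```python
PSP_SIZE = 0x100        # the PSP occupies the first 256 bytes of the segment
CMD_TAIL_LEN = 0x80     # assumed layout: one length byte at offset 80h
CMD_TAIL_TEXT = 0x81    # assumed layout: tail text, ended by a carriage return


def make_fake_psp(tail: str) -> bytearray:
    """Build a zero-filled PSP image holding only the fields used below."""
    psp = bytearray(PSP_SIZE)
    psp[0:2] = b"\xCD\x20"                       # INT 20h instruction at offset 0
    raw = tail.encode("ascii")
    psp[CMD_TAIL_LEN] = len(raw)
    psp[CMD_TAIL_TEXT:CMD_TAIL_TEXT + len(raw)] = raw
    psp[CMD_TAIL_TEXT + len(raw)] = 0x0D         # carriage-return terminator
    return psp


def read_command_tail(psp: bytes) -> str:
    """Return the command-line tail stored in a PSP image."""
    length = psp[CMD_TAIL_LEN]
    return psp[CMD_TAIL_TEXT:CMD_TAIL_TEXT + length].decode("ascii")


psp = make_fake_psp(" /v file.txt")
print(len(psp), repr(read_command_tail(psp)))
```

The point is only that the PSP is ordinary data sitting at offsets 0000h to 00FFh of the program's segment, which is why program code has to start at 0100h.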

 

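The second document says that the PSP and the loaded program are physically contiguous in memory but have different segment addresses. I believe this is wrong, at least for .com programs. If the PSP had a different segment address from the program, there would be no need for ORG 0100h: with its own segment address the program could reach the right memory directly, without adding the 0100h offset. And, in keeping with the name "program segment prefix", the PSP should share the program's segment base; it is simply a prefix.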

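A few rules you need to know

With that background, let's look at ORG 0100h again. Below I list a few rules. Some of them may not make sense at first, but that's fine: keep reading, and once you have finished the article, come back to them and they will be clear.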

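1.        In 16-bit DOS, physical address = segment * 16 + offset. For example, with segment 0xC0 and offset 0x50, the physical address is 0xC0 * 16 + 0x50 = 0xC50.

2.        ORG tells the assembler that the program's starting offset once loaded into memory is 0x100, so that addresses skip over the PSP.
For example, suppose a label Test sits at offset 0x0B from the start of the file. When the assembler sees ORG 0x100 it adds 0x100 to that offset, so in the assembled .com file every reference to the label uses 0x10B.
Without ORG 0100h the offset stays 0x0B, and any access through the label lands inside the PSP, because the first 0100h bytes of the program segment hold PSP data rather than user data.
Likewise, with ORG 0x200 the label's offset in the assembled .com file becomes 0x20B.

3.        Under DOS, a .com program is always loaded at CS:0x100, because the first 0100h (256) bytes of the segment are reserved for the PSP. This is a property of DOS and has nothing to do with ORG, so writing ORG 0200h will not make the program load at CS:0200h.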

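4.        Near and short JMP, CALL, and similar instructions that change IP encode their target as a displacement relative to the next instruction, not as an absolute offset. The encoded bytes are the same wherever the code is loaded; a debugger simply displays the target as IP + displacement. A jump that a disassembler with origin 0 shows as JMP 0x0B therefore appears as JMP 0x10B once the program sits at offset 0x100, even though nothing was patched. Instructions that carry an absolute value show no such difference: MOV AX, 0x000B is still MOV AX, 0x000B after the program is loaded.

5.        The ORG offset is an assemble-time concept: the assembled output already has 0100h added to every label address. The different jump targets a debugger shows are a consequence of relative addressing (rule 4), not an adjustment DOS makes when it loads the .com file.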
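
The arithmetic behind rules 1 and 2 can be sketched in a few lines of Python. This is only an illustration: physical_address and label_address are hypothetical helper names, and the sketch models what the assembler does to a label, not how NASM is implemented.

```python
def physical_address(segment: int, offset: int) -> int:
    """Rule 1: real-mode address translation, segment * 16 + offset."""
    return segment * 16 + offset


def label_address(org: int, file_offset: int) -> int:
    """Rule 2: the address an assembler substitutes for a label under ORG."""
    return org + file_offset


print(hex(physical_address(0xC0, 0x50)))     # 0xc50
print(hex(label_address(0x000, 0x0B)))       # 0xb   (no ORG: points into the PSP)
print(hex(label_address(0x100, 0x0B)))       # 0x10b (ORG 0x100)
print(hex(label_address(0x200, 0x0B)))       # 0x20b (ORG 0x200)
print(hex(physical_address(0x1445, 0x100)))  # 0x14550
```

Note that label_address has no effect on where DOS puts the program (rule 3); it only changes the numbers written into the file.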

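Examples

These examples are not meant to be run as real programs. Thanks to the what-you-write-is-what-you-get nature of the flat binary format (the output contains exactly what you assembled), these small examples show clearly how the PSP and ORG 0100h affect the offsets in a program.

Example 1: without ORG 0100h

Example 1: test1.asm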

    jmp test
    jmp short 0x04
test:
    mov ax, test
    mov ax, 0x0004

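This example is very simple: it just performs jmp and mov, some through a label and some through an immediate value.

Assemble it:

$ nasm -o test1.com test1.asm

First, disassemble it:

$ ndisasm test1.com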

00000000  EB02              jmp short 0x4
00000002  EB00              jmp short 0x4
00000004  B80400            mov ax,0x4
00000007  B80400            mov ax,0x4

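The label has been replaced by its actual offset, 0x04.

Now let's debug test1.com in Turbo Debugger (TD):


Figure 1


Figure 2

Figure 1 shows the machine code and assembly instructions as displayed in TD; Figure 2 shows the base address of each segment.

We can see that IP points to 0100h and that CS and DS both hold the segment address 1445h. In other words, the program was loaded at physical address 14550h (1445h * 16 + 0100h), just past the PSP.

Look first at the last two MOV instructions in Figure 1:

MOV ax, 0004 [the original statement was MOV ax, test]
MOV ax,0004

These match what we wrote in test1.asm. This shows that the program has a problem: MOV AX, test is supposed to put the address of the label test into AX, but 0004h is an offset inside the PSP, so any access through it would read PSP data.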

 

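We also notice that the target shown for the jmp instructions has changed: it is no longer 0x04, as in the source and the ndisasm output, but 0x104. The instruction bytes themselves (EB02 and EB00) are unchanged. A short jump stores only a displacement relative to the next instruction, and the debugger displays IP + displacement; because the code now sits at offset 0x100, the displayed target is 0x100 higher. This is the effect that rule 4 is about for JMP, CALL, and other instructions that change IP.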
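
To see why the same bytes can be displayed with two different jump targets, here is a minimal Python sketch of a disassembler for just the two opcodes used in test1.com (EB for jmp short, B8 for mov ax, imm16). The function name disassemble and its output format are made up for illustration; the only thing that varies between the two calls is the assumed origin.

```python
def disassemble(code: bytes, origin: int) -> list[str]:
    """Decode jmp short (EB) and mov ax, imm16 (B8), assuming a load origin."""
    lines = []
    pos = 0
    while pos < len(code):
        opcode = code[pos]
        addr = origin + pos
        if opcode == 0xEB:      # jmp short: 8-bit signed displacement
            rel = int.from_bytes(code[pos + 1:pos + 2], "little", signed=True)
            target = addr + 2 + rel          # relative to the next instruction
            lines.append(f"{addr:04X} jmp short 0x{target:X}")
            pos += 2
        elif opcode == 0xB8:    # mov ax, imm16: absolute 16-bit value
            imm = int.from_bytes(code[pos + 1:pos + 3], "little")
            lines.append(f"{addr:04X} mov ax,0x{imm:X}")
            pos += 3
        else:
            raise ValueError(f"unhandled opcode {opcode:#x}")
    return lines


TEST1 = bytes.fromhex("EB02 EB00 B80400 B80400")   # bytes from the listing above

print("\n".join(disassemble(TEST1, 0x000)))   # what ndisasm shows by default
print("\n".join(disassemble(TEST1, 0x100)))   # what a debugger shows at CS:0100
```

With origin 0 the jumps read jmp short 0x4; with origin 0x100 they read jmp short 0x104, while both mov instructions still show 0x4. ndisasm's -o option (for example ndisasm -o 0x100 test1.com) sets the origin in the same way.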

 

Because of the problem above, we need to add ORG 0100h so that labels resolve to the correct offsets. See example 2.

Example 2: with ORG 0100h

Example 2: test2.asm

    org 0x100
    jmp test
    jmp short 0x04
test:
    mov ax, test
    mov ax, 0x0004

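In this example we added org 0x100 at the top of the source, telling the assembler to add 0x100 to our label offsets. Assemble it and then disassemble it:

$ nasm -o test2.com test2.asm

$ ndisasm test2.com

For easy comparison, here is the disassembly of both example 1 and example 2.

Example 1: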

00000000  EB02    jmp short 0x4
00000002  EB00    jmp short 0x4
00000004  B80400  mov ax,0x4
00000007  B80400  mov ax,0x4

 

Example 2:

00000000  EB02    jmp short 0x4
00000002  EB00    jmp short 0x4
00000004  B80401  mov ax,0x104
00000007  B80400  mov ax,0x4

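Note that MOV AX, test is mov ax,0x4 in example 1 but mov ax,0x104 in example 2: 0x100 has been added to 0x4. The encoding of the JMP instructions, meanwhile, has not changed.

That is the purpose of ORG 0100h: it tells the assembler that the program's starting offset once loaded into memory is 0100h, so in the generated code every label offset used by instructions such as MOV has 0x100 added.

We also notice that the final MOV AX, 0x0004 was not changed to MOV AX, 0x104 in the disassembly. That is because it loads an immediate value into AX, and the assembler never adjusts immediate values.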
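
The difference between the two listings can be reproduced with a small hand-assembler in Python. This is a sketch under stated assumptions, not NASM's algorithm: assemble_example is a hypothetical helper, the label is hard-coded at file offset 4, and the bytes of the second jmp (the one with a literal target) are simply copied from the listings above.

```python
LABEL_OFFSET = 4   # file offset of the label `test` in both examples


def assemble_example(org: int) -> bytes:
    """Hand-assemble the four instructions of the examples for a given ORG."""
    label_addr = org + LABEL_OFFSET          # value substituted for `test`
    code = bytearray()
    # jmp test: displacement = target - address of next instruction.
    # Both terms include `org`, so it cancels out.
    rel = label_addr - (org + 2)
    code += bytes([0xEB, rel])
    code += bytes([0xEB, 0x00])              # second jmp, as shown in the listings
    code += b"\xB8" + label_addr.to_bytes(2, "little")   # mov ax, test
    code += b"\xB8" + (0x0004).to_bytes(2, "little")     # mov ax, 0x0004
    return bytes(code)


print(assemble_example(0x000).hex())   # eb02eb00b80400b80400
print(assemble_example(0x100).hex())   # eb02eb00b80401b80400
```

Only the third instruction differs between the two outputs (B80400 versus B80401), which matches the disassembly of example 1 and example 2.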

Now load it under DOS and open it in TD:


Figure 3


Figure 4

As Figure 3 shows, both mov and jmp now refer to the correct offsets.

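Summary

1.        See the rules listed in the rules section above.

2.        The ORG offset adjustment is an assemble-time concept. JMP and other instructions that modify IP need no adjustment at all, because they encode relative displacements; the different targets a debugger shows come from the address at which the code is loaded.

A natural question: if the PSP is an operating-system concept, how does DOS fix up the offsets of JMP and similar instructions when it loads the program? It does not need to. A .com file carries no relocation information, DOS copies it into memory at offset 0100h unchanged, and relative jumps work at any load address.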
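
As a rough picture of what loading a .com file involves, here is a simplified Python sketch. It is an assumption-laden model, not DOS code: load_com is a hypothetical name, the PSP is left as zero bytes, and real DOS also sets up registers and a stack, which is ignored here.

```python
def load_com(image: bytes, segment_size: int = 0x10000) -> tuple[bytearray, int]:
    """Copy a .com image into a fresh 64 KiB segment; return (memory, ip)."""
    load_offset = 0x100                  # fixed by DOS, independent of ORG
    if len(image) > segment_size - load_offset:
        raise ValueError("image does not fit in one segment")
    memory = bytearray(segment_size)     # bytes 0x000..0x0FF stand in for the PSP
    # Verbatim copy: no instruction is inspected or patched.
    memory[load_offset:load_offset + len(image)] = image
    return memory, load_offset


image = bytes.fromhex("EB02EB00B80401B80400")   # test2.com from example 2
memory, ip = load_com(image)
print(hex(ip), bytes(memory[ip:ip + len(image)]) == image)   # 0x100 True
```

The copy is byte-for-byte, so any address that must be correct at run time has to be correct in the file already, which is exactly what ORG 0100h arranges.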

Appendix

For convenience, the sections of the NASM manual that deal with ORG are reproduced below.

6.1.1 ORG: Binary File Program Origin

The bin format provides an additional directive to the list given in chapter 5: ORG. The function of the ORG directive is to specify the origin address which NASM will assume the program begins at when it is loaded into memory.

For example, the following code will generate the longword 0x00000104:

        org     0x100 
        dd      label 
label:

Unlike the ORG directive provided by MASM-compatible assemblers, which allows you to jump around in the object file and overwrite code you have already generated, NASM's ORG does exactly what the directive says: origin. Its sole function is to specify one offset which is added to all internal address references within the section; it does not permit any of the trickery that MASM's version does. See section 11.1.3 for further comments.

7.2.1 Using the bin Format To Generate .COM Files

.COM files expect to be loaded at offset 100h into their segment (though the segment may change). Execution then begins at 100h, i.e. right at the start of the program. So to write a .COM program, you would create a source file looking like

        org 100h 
 
section .text 
 
start: 
        ; put your code here 
 
section .data 
 
        ; put data items here 
 
section .bss 
 
        ; put uninitialized data here

The bin format puts the .text section first in the file, so you can declare data or BSS items before beginning to write code if you want to and the code will still end up at the front of the file where it belongs.

The BSS (uninitialized data) section does not take up space in the .COM file itself: instead, addresses of BSS items are resolved to point at space beyond the end of the file, on the grounds that this will be free memory when the program is run. Therefore you should not rely on your BSS being initialized to all zeros when you run.

To assemble the above program, you should use a command line like

nasm myprog.asm -fbin -o myprog.com

The bin format would produce a file called myprog if no explicit output file name were specified, so you have to override it and give the desired file name.

7.2.2 Using the obj Format To Generate .COM Files

If you are writing a .COM program as more than one module, you may wish to assemble several .OBJ files and link them together into a .COM program. You can do this, provided you have a linker capable of outputting .COM files directly (TLINK does this), or alternatively a converter program such as EXE2BIN to transform the .EXE file output from the linker into a .COM file.

If you do this, you need to take care of several things:

·        The first object file containing code should start its code segment with a line like RESB 100h. This is to ensure that the code begins at offset 100h relative to the beginning of the code segment, so that the linker or converter program does not have to adjust address references within the file when generating the .COM file. Other assemblers use an ORG directive for this purpose, but ORG in NASM is a format-specific directive to the bin output format, and does not mean the same thing as it does in MASM-compatible assemblers.

·        You don't need to define a stack segment.

·        All your segments should be in the same group, so that every time your code or data references a symbol offset, all offsets are relative to the same segment base. This is because, when a .COM file is loaded, all the segment registers contain the same value.

11.1.3 ORG Doesn't Work

People writing boot sector programs in the bin format often complain that ORG doesn't work the way they'd like: in order to place the 0xAA55 signature word at the end of a 512-byte boot sector, people who are used to MASM tend to code

        ORG 0 
 
        ; some boot sector code 
 
        ORG 510 
        DW 0xAA55

This is not the intended use of the ORG directive in NASM, and will not work. The correct way to solve this problem in NASM is to use the TIMES directive, like this:

        ORG 0 
 
        ; some boot sector code 
 
        TIMES 510-($-$$) DB 0 
        DW 0xAA55

The TIMES directive will insert exactly enough zero bytes into the output to move the assembly point up to 510. This method also has the advantage that if you accidentally fill your boot sector too full, NASM will catch the problem at assembly time and report it, so you won't end up with a boot sector that you have to disassemble to find out what's wrong with it.
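
The following is not part of the NASM manual. As a rough illustration of what TIMES 510-($-$$) DB 0 followed by DW 0xAA55 produces, here is a Python sketch that pads a blob of boot code to 510 bytes and appends the signature. pad_boot_sector is a hypothetical helper, and the two-byte stand-in code (EB FE, an endless jmp to itself) is only a placeholder.

```python
def pad_boot_sector(code: bytes) -> bytes:
    """Mimic `TIMES 510-($-$$) DB 0` followed by `DW 0xAA55`."""
    padding = 510 - len(code)            # same quantity as 510-($-$$)
    if padding < 0:
        # NASM would likewise refuse to assemble an over-full sector.
        raise ValueError("boot code exceeds 510 bytes")
    return code + bytes(padding) + (0xAA55).to_bytes(2, "little")


sector = pad_boot_sector(b"\xEB\xFE")    # placeholder boot code
print(len(sector), sector[-2:].hex())    # 512 55aa
```

The signature word is stored little-endian, so the last two bytes of the sector are 55 AA.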

 
