Understanding the PE/COFF File Format

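While developing an operating system on Windows, I ran into a problem when compiling source files with the Cygwin build of GCC. Running gcc -c bootpack.c produces bootpack.o, and opening that file in a hex editor shows something like this: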

[Figure 1: hex dump of bootpack.o, with the readable text marked by red boxes]

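The readable text inside the red boxes in Figure 1 is defined by the COFF file format. This text, along with part of the remaining binary data, is not the program itself; it is auxiliary data that helps the system load and run the program. Windows understands this auxiliary data, but Linux and other operating systems do not, let alone a small home-made operating system.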

So how can this be solved? Cygwin ships many tools for working with binary files, objcopy among them. Run objcopy -O binary bootpack.o bootpack.bin, then look at the contents of bootpack.bin.

[Figure 2: hex dump of bootpack.bin]

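There is far less content, and the red-boxed data from Figure 1 is gone. A closer look shows that the bytes in Figure 2 are exactly the bytes at offsets 0x0000008C through 0x000000CB in Figure 1. These are the actual machine instructions. So what does the rest of bootpack.o mean, and what is it for? Out of curiosity I downloaded the "Visual Studio, Microsoft Portable Executable and Common Object File Format Specification" from Microsoft and, following that document, wrote olink, a small program that parses .exe, .obj, .dll, and similar files.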
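The offset 0x8C can be derived from the format itself: an object file begins with a 20-byte file header followed by one 40-byte section header per section, and bootpack.o has three sections, so the first raw data starts at 20 + 3 × 40 = 140 = 0x8C. The following is a minimal sketch in C of how a tool could locate those bytes. It is not how objcopy or olink is implemented; find_section_raw and the rd16/rd32 helpers are hypothetical names, and the file is assumed to be already loaded into memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read little-endian integers byte by byte, so the code does not depend on
   host endianness or struct padding. */
static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

enum { COFF_HEADER_SIZE = 20, SECTION_HEADER_SIZE = 40 };

/* Hypothetical helper: find the raw data of the section called `name` in an
   object file held in `obj` (`len` bytes). Returns 0 and fills *offset and
   *size on success, -1 if the section is missing or the file is truncated. */
int find_section_raw(const uint8_t *obj, size_t len, const char *name,
                     uint32_t *offset, uint32_t *size)
{
    assert(obj && name && offset && size);
    if (len < COFF_HEADER_SIZE) return -1;

    uint16_t nsections = rd16(obj + 2);   /* NumberOfSections */
    uint16_t opt_size  = rd16(obj + 16);  /* SizeOfOptionalHeader, 0 in object files */
    size_t table = COFF_HEADER_SIZE + (size_t)opt_size;

    for (uint16_t i = 0; i < nsections; i++) {
        size_t hdr = table + (size_t)i * SECTION_HEADER_SIZE;
        if (hdr + SECTION_HEADER_SIZE > len) return -1;

        char sec_name[9] = {0};           /* 8-byte name, not always NUL-terminated */
        memcpy(sec_name, obj + hdr, 8);
        if (strcmp(sec_name, name) != 0) continue;

        uint32_t raw_size = rd32(obj + hdr + 16);  /* SizeOfRawData */
        uint32_t raw_ptr  = rd32(obj + hdr + 20);  /* PointerToRawData */
        if ((uint64_t)raw_ptr + raw_size > len) return -1;
        *offset = raw_ptr;
        *size = raw_size;
        return 0;
    }
    return -1;
}
```

Called with ".text" on bootpack.o, this should report offset 0x8C and size 64, which is the range 0x8C through 0xCB seen in Figure 1. Copying that range to a new file reproduces bootpack.bin only because .data and .bss are empty here; objcopy handles the general case.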

 

First, here is what the program outputs when it parses the bootpack.o from above.

The source of bootpack.c is very simple:

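/* Colimas Simple OS */

void io_hlt(void);
void write_mem8(int addr, int data);

/* entry */
void ColimasMain(void)
{
    int i;

    /* write the value 15 to every address from 0xa0000 through 0xaffff */
    for (i = 0xa0000; i <= 0xaffff; i++) {
        write_mem8(i, 15);
    }

    /* halt forever */
    for (;;)
        io_hlt();
}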

 

You can safely ignore what the source code actually does. The output of olink bootpack.o is as follows:

This is an image file.

1. Image file header info:

              Image file machine type:Intel 386 or later processors and compatible processors

              The number of sections:3

              Number of symbols:12

              Pointer of symbols table:0xe0

              Characteristics:

                            Machine is based on a 32-bit-word architecture.

2. The sections info of image file:

              1 .text:

                            The virtual size       :0

                            The virtual address    :0x0

                            The size of raw data   :64

                            The pointer to raw data:0x8c

                            The characteristics of the section:

                                          The section contains executable code.

                                          The section contains initialized data.

              The section has relocations.

              2 .data:

                            The virtual size       :0

                            The virtual address    :0x0

                            The size of raw data   :0

                            The pointer to raw data:0x0

                            The characteristics of the section:

                                          The section contains initialized data.

                                          The section contains uninitialized data.

              3 .bss:

                            The virtual size       :0

                            The virtual address    :0x0

                            The size of raw data   :0

                            The pointer to raw data:0x0

                            The characteristics of the section:

                                          The section contains initialized data.

                                          The section contains uninitialized data.

3. Symbol table of image file(12).

              1. .file

                            Value:Not yet assigned a section.

                            type:Base type.

                            Storage class:A value that Microsoft tools, as well as traditional COFF format, use for the source-file symbol record.

                            Number of section:-2

              2. Files

                            name:bootpack.c

              3. _ColimasMain

                            Value:Not yet assigned a section.

                            type:A function that returns a base type.

                            Storage class:A value that Microsoft tools use for external symbols.

                            Number of section:1

              4. Function Definitions

                            Tag index:0

                            Total size:0

                            Pointer to line number:0x0

                            Pointer to next function:0x0

              5. .text

                            Value:Not yet assigned a section.

                            type:Base type.

                            Storage class:The offset of the symbol within the section.

                            Number of section:1

              6. Section Definitions:

                            Length:55

                            Number of relocations:2

                            Number of line numbers:0

                            One-based index into the section table:0

              7. .data

                            Value:Not yet assigned a section.

                            type:Base type.

                            Storage class:The offset of the symbol within the section.

                            Number of section:2

              8. Section Definitions:

                            Length:0

                            Number of relocations:0

                            Number of line numbers:0

                            One-based index into the section table:0

              9. .bss

                            Value:Not yet assigned a section.

                            type:Base type.

                            Storage class:The offset of the symbol within the section.

                            Number of section:3

              10. Section Definitions:

                            Length:0

                            Number of relocations:0

                            Number of line numbers:0

                            One-based index into the section table:0

              11. _io_hlt

                            Value:Not yet assigned a section.

                            type:A function that returns a base type.

                            Storage class:A value that Microsoft tools use for external symbols.

                            Number of section:0

              12. _write_mem8

                            Value:Not yet assigned a section.

                            type:A function that returns a base type.

                            Storage class:A value that Microsoft tools use for external symbols.

                            Number of section:0

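That looks like a lot, but compared with a more complex EXE this output is quite short: the source file is very simple and uses no dynamic-link libraries. If you are eager to see a more complex result, run olink on an intermediate .obj file compiled in debug mode. A debug-mode .obj also stores the line numbers used for debugging, along with other information. That is why files built in debug mode are larger than those built in Release mode, and why a Release build without that information cannot be debugged at the source level.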
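The symbol names in the listing above are stored in two different ways. Each symbol record is 18 bytes, and its first 8 bytes hold the name directly when it fits (".file", ".text", "_io_hlt"). For a longer name such as "_ColimasMain" or "_write_mem8", the first 4 bytes are zero and the next 4 bytes are an offset into the string table, which starts immediately after the symbol table: for bootpack.o that is 0xE0 + 12 × 18 = 0x1B8. The sketch below illustrates the lookup; symbol_name is a hypothetical helper, not olink's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { SYMBOL_SIZE = 18 };

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: copy the name of symbol record `index` into `out`
   (`out_len` bytes, always NUL-terminated, truncated if too small).
   `symtab` and `nsyms` are PointerToSymbolTable and NumberOfSymbols from
   the file header. Returns 0 on success, -1 on a malformed file. */
int symbol_name(const uint8_t *obj, size_t len, uint32_t symtab,
                uint32_t nsyms, uint32_t index, char *out, size_t out_len)
{
    assert(obj && out && out_len > 0);

    /* The string table follows the last symbol record. */
    uint64_t strtab = (uint64_t)symtab + (uint64_t)nsyms * SYMBOL_SIZE;
    uint64_t rec = (uint64_t)symtab + (uint64_t)index * SYMBOL_SIZE;
    if (index >= nsyms || strtab > len) return -1;

    const uint8_t *name = obj + rec;
    if (rd32(name) != 0) {
        /* Short name: stored inline, NUL-padded, at most 8 bytes. */
        size_t n = 0;
        while (n < 8 && name[n] != 0) n++;
        if (n >= out_len) n = out_len - 1;
        memcpy(out, name, n);
        out[n] = '\0';
        return 0;
    }

    /* Long name: bytes 4..7 are an offset from the start of the string table. */
    uint64_t pos = strtab + rd32(name + 4);
    if (pos >= len) return -1;
    size_t n = 0;
    while (pos + n < len && obj[pos + n] != 0 && n + 1 < out_len) n++;
    memcpy(out, obj + pos, n);
    out[n] = '\0';
    return 0;
}
```

A caller walking the table has to skip auxiliary records (entries such as "Files", "Function Definitions" and "Section Definitions" in the listing), because they occupy symbol slots but do not follow this name layout. The last byte of each regular record, NumberOfAuxSymbols, says how many to skip.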

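olink is not complicated to implement, and my earlier experience parsing Java class files made the job easier still. The program has two steps: read the data, then print the results.

The data it reads is: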

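1. The PE/COFF file header. It contains: the target machine type, for example "Image file machine type:Intel 386 or later processors and compatible processors" in the output above; the number of sections, NumberOfSections, where a section is a region of the file holding one kind of content, for example code goes in the .text section and data is defined in the .data section; TimeDateStamp, which this article skips; the symbol table address, PointerToSymbolTable, where the symbol table holds the ASCII symbols that serve as identifiers in the file together with some attribute values, for example a function name and the address of that function's instructions; the number of symbols, NumberOfSymbols; the size of the optional header, whose contents olink prints as "Optional header info of image file" and which exists in .exe and .dll files, so it is absent from the object-file output above; and the file Characteristics, for example "Image only, Windows CE, and Microsoft Windows NT and later", "The image file is a dynamic-link library (DLL)", or "The image file is a system file, not a user program".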

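2. The optional header. It contains: the file identifier Magic, which is 0x10b for every PE32 file and 0x20b for PE32+, the variant that allows a 64-bit address space; the linker version; the total size of the code (.text sections); the size of the initialized data (.data); the size of the uninitialized data (.bss); the subsystem the file requires, for example "Device drivers and native Windows processes", "The Windows graphical user interface (GUI) subsystem", "Windows CE", or "XBOX"; the address of the program entry point, that is, the startup code that eventually reaches WinMain, main, and so on; the Data Directories, an array of entries that each hold an address and a size and that describe the Export Table, Import Table, Resource Table, Exception Table, Base Relocation Table, Debug data, and so on; and a number of other fields, for which see the PE/COFF specification.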

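3. The section table. Each entry is a fixed 40 bytes and includes: the name, for example .text, .data, or .bss; the size; the address; and the section flags, which say for example whether the section contains executable code, initialized data, or uninitialized data.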

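4. The symbol table. Each entry includes: the symbol name, for example a function name or a section name; the number of the section the symbol belongs to; and its type, for example whether the symbol is a type name or a function name.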

5. The string table, which stores every string needed by the symbol table that is longer than 8 bytes.
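As an illustration of the first two items, here is a minimal sketch in C that decodes the 20-byte file header and classifies the optional header by its Magic value. The struct, the function names, and the rd16/rd32 helpers are assumptions made for this example rather than olink's real code. The pointer passed in must already point at the COFF file header: offset 0 in an object file, or just past the "PE\0\0" signature in an .exe or .dll.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fields of the COFF file header, in file order. */
typedef struct {
    uint16_t machine;                 /* e.g. 0x14c for Intel 386 */
    uint16_t number_of_sections;
    uint32_t time_date_stamp;
    uint32_t pointer_to_symbol_table;
    uint32_t number_of_symbols;
    uint16_t size_of_optional_header; /* 0 in object files */
    uint16_t characteristics;         /* bit flags, e.g. 0x0100 = 32-bit machine */
} CoffFileHeader;

/* Little-endian readers, independent of host byte order and struct padding. */
static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: decode the header at `p`. Returns 0 on success,
   -1 if fewer than 20 bytes are available. */
int parse_file_header(const uint8_t *p, size_t len, CoffFileHeader *h)
{
    assert(p && h);
    if (len < 20) return -1;
    h->machine                 = rd16(p + 0);
    h->number_of_sections      = rd16(p + 2);
    h->time_date_stamp         = rd32(p + 4);
    h->pointer_to_symbol_table = rd32(p + 8);
    h->number_of_symbols       = rd32(p + 12);
    h->size_of_optional_header = rd16(p + 16);
    h->characteristics         = rd16(p + 18);
    return 0;
}

/* The optional header begins with a 2-byte Magic that selects its layout. */
const char *optional_header_kind(uint16_t magic)
{
    switch (magic) {
    case 0x10b: return "PE32";
    case 0x20b: return "PE32+";
    default:    return "unknown";
    }
}
```

For bootpack.o the decoded values would match the listing above: machine 0x14c, 3 sections, symbol table at 0xe0, 12 symbols, and an optional header size of 0, so optional_header_kind is never consulted for an object file.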

The contents that olink parses out show how complex, and how complete, the PE/COFF file format is.

 
