The pefile Module for Python (Parsing PE Files)

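Many people here regularly do development work involving the PE format, such as parsing PE files and extracting their contents. Typical uses include building static heuristic virus-detection models, classifying malware samples, and detecting or removing packers.
I searched the forum and found nothing about the tool I want to cover, so I am recommending the Python library pefile here.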

It is an open-source project released under the MIT licence, so you are free to build on it.
Download page for the package:
http://code.google.com/p/pefile/

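A few points are worth noting:
1. You work in Python, which makes development fast and convenient, and the source code is readable and easy to follow. I would not expect it to end up in commercial products, but it is very handy for study and research.
2. pefile already parses the PE structure very thoroughly, so building on top of it is easy, and the key data structures are readily accessible.
3. Python is quick to write and has a low barrier to entry, and pefile already provides a lot of functionality, so the module suits both people who need results quickly and people who are just getting started.
4. It is a free, open-source project.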

Enough talk. Let's go straight to using it; once you have seen it in action you will appreciate how capable pefile is.

Code:
1. Install Python, of course.
2. Download pefile, unpack it, and create a new file named petest.py.
Start with Experiment 1.

Code:
Experiment 1
import pefile  # remember to import pefile

PEfile_Path = r"C:\temp\test.exe"

pe = pefile.PE(PEfile_Path)
print(PEfile_Path)
print(pe)  # printing the PE object dumps every parsed structure
Code:
Experiment 1 output
C:\temp\test.exe
----------DOS_HEADER----------

[IMAGE_DOS_HEADER]
e_magic:                       0x5A4D    
e_cblp:                        0x90      
e_cp:                          0x3       
e_crlc:                        0x0       
e_cparhdr:                     0x4       
e_minalloc:                    0x0       
e_maxalloc:                    0xFFFF    
e_ss:                          0x0       
e_sp:                          0xB8      
e_csum:                        0x0       
e_ip:                          0x0       
e_cs:                          0x0       
e_lfarlc:                      0x40      
e_ovno:                        0x0       
e_res:                         
e_oemid:                       0x0       
e_oeminfo:                     0x0       
e_res2:                        
e_lfanew:                      0xD0      

----------NT_HEADERS----------

[IMAGE_NT_HEADERS]
Signature:                     0x4550    

----------FILE_HEADER----------

[IMAGE_FILE_HEADER]
Machine:                       0x14C     
NumberOfSections:              0x2       
TimeDateStamp:                 0x46A8C07C [Thu Jul 26 15:40:44 2007 UTC]
PointerToSymbolTable:          0x0       
NumberOfSymbols:               0x0       
SizeOfOptionalHeader:          0xE0      
Characteristics:               0x10F     
Flags: IMAGE_FILE_LOCAL_SYMS_STRIPPED, IMAGE_FILE_32BIT_MACHINE, IMAGE_FILE_EXECUTABLE_IMAGE, IMAGE_FILE_LINE_NUMS_STRIPPED, IMAGE_FILE_RELOCS_STRIPPED

----------OPTIONAL_HEADER----------

[IMAGE_OPTIONAL_HEADER]
Magic:                         0x10B     
MajorLinkerVersion:            0x6       
MinorLinkerVersion:            0x0       
SizeOfCode:                    0x420     
SizeOfInitializedData:         0x130     
SizeOfUninitializedData:       0x0       
AddressOfEntryPoint:           0x522     
BaseOfCode:                    0x220     
BaseOfData:                    0x640     
ImageBase:                     0x400000  
SectionAlignment:              0x10      
FileAlignment:                 0x10      
MajorOperatingSystemVersion:   0x4       
MinorOperatingSystemVersion:   0x0       
MajorImageVersion:             0x0       
MinorImageVersion:             0x0       
MajorSubsystemVersion:         0x4       
MinorSubsystemVersion:         0x0       
Reserved1:                     0x0       
SizeOfImage:                   0x768     
SizeOfHeaders:                 0x420     
CheckSum:                      0x0       
Subsystem:                     0x2       
DllCharacteristics:            0x0       
SizeOfStackReserve:            0x100000  
SizeOfStackCommit:             0x1000    
SizeOfHeapReserve:             0x100000  
SizeOfHeapCommit:              0x1000    
LoaderFlags:                   0x0       
NumberOfRvaAndSizes:           0x10      
DllCharacteristics: 

----------PE Sections----------

[IMAGE_SECTION_HEADER]
Name:                          .text
Misc:                          0x418     
Misc_PhysicalAddress:          0x418     
Misc_VirtualSize:              0x418     
VirtualAddress:                0x220     
SizeOfRawData:                 0x420     
PointerToRawData:              0x420     
PointerToRelocations:          0x0       
PointerToLinenumbers:          0x0       
NumberOfRelocations:           0x0       
NumberOfLinenumbers:           0x0       
Characteristics:               0x60000020
Flags: IMAGE_SCN_CNT_CODE, IMAGE_SCN_MEM_EXECUTE, IMAGE_SCN_MEM_READ
Entropy: 6.385628 (Min=0.0, Max=8.0)
MD5     hash: 37ae973124ba5655ce156536f4018759
SHA-1   hash: 6354d772105b66ac33fb8950b76a289edafa230f
SHA-256 hash: f6dfe337c6c6278e60a687552d8fc3be2a2ed41a4278713cfd0dc631296befdc
SHA-512 hash: 9d22cdd011d7276f47e3b1844804d58be2e73eef826ad285769d449f03dbfcde743303b31a9172e513be571432b7b2080afe571e5819ec7968acd76c0d82207a

[IMAGE_SECTION_HEADER]
Name:                          .rsrc
Misc:                          0x128     
Misc_PhysicalAddress:          0x128     
Misc_VirtualSize:              0x128     
VirtualAddress:                0x640     
SizeOfRawData:                 0x130     
PointerToRawData:              0x840     
PointerToRelocations:          0x0       
PointerToLinenumbers:          0x0       
NumberOfRelocations:           0x0       
NumberOfLinenumbers:           0x0       
Characteristics:               0x40000040
Flags: IMAGE_SCN_CNT_INITIALIZED_DATA, IMAGE_SCN_MEM_READ
Entropy: 2.905524 (Min=0.0, Max=8.0)
MD5     hash: cfd4f1a98445485c616ea2ff9390278e
SHA-1   hash: 7480ffe5427a540e17353df9c490dbba86fd0c3b
SHA-256 hash: 93f9ad56e464614b6aa9521f2b80f3f7f2fd5e2b6d8d6fd6489a0b1cdb1f948e
SHA-512 hash: b054ba77825a4bb92d9beecb606d04f7a4bf4d16529d909e03e6b882175e23fb495c1c3dc9d921c3124210a6567bf68e70879d3163ece1a1cbb786f3ec94af43

----------Directories----------

[IMAGE_DIRECTORY_ENTRY_EXPORT]
VirtualAddress:                0x0       
Size:                          0x0       
[IMAGE_DIRECTORY_ENTRY_IMPORT]
VirtualAddress:                0x574     
Size:                          0x3C      
[IMAGE_DIRECTORY_ENTRY_RESOURCE]
VirtualAddress:                0x640     
Size:                          0x128     
[IMAGE_DIRECTORY_ENTRY_EXCEPTION]
VirtualAddress:                0x0       
Size:                          0x0       
[IMAGE_DIRECTORY_ENTRY_SECURITY]
VirtualAddress:                0x0       
Size:                          0x0       
[IMAGE_DIRECTORY_ENTRY_BASERELOC]
VirtualAddress:                0x0       
Size:                          0x0       
[IMAGE_DIRECTORY_ENTRY_DEBUG]
VirtualAddress:                0x0       
Size:                          0x0       
[IMAGE_DIRECTORY_ENTRY_COPYRIGHT]
VirtualAddress:                0x0       
Size:                          0x0       
[IMAGE_DIRECTORY_ENTRY_GLOBALPTR]
VirtualAddress:                0x0       
Size:                          0x0       
[IMAGE_DIRECTORY_ENTRY_TLS]
VirtualAddress:                0x0       
Size:                          0x0       
[IMAGE_DIRECTORY_ENTRY_LOAD_CONFIG]
VirtualAddress:                0x0       
Size:                          0x0       
[IMAGE_DIRECTORY_ENTRY_BOUND_IMPORT]
VirtualAddress:                0x0       
Size:                          0x0       
[IMAGE_DIRECTORY_ENTRY_IAT]
VirtualAddress:                0x220     
Size:                          0x1C      
[IMAGE_DIRECTORY_ENTRY_DELAY_IMPORT]
VirtualAddress:                0x0       
Size:                          0x0       
[IMAGE_DIRECTORY_ENTRY_COM_DESCRIPTOR]
VirtualAddress:                0x0       
Size:                          0x0       
[IMAGE_DIRECTORY_ENTRY_RESERVED]
VirtualAddress:                0x0       
Size:                          0x0       

----------Imported symbols----------

[IMAGE_IMPORT_DESCRIPTOR]
OriginalFirstThunk:            0x5B0     
Characteristics:               0x5B0     
TimeDateStamp:                 0x0        [Thu Jan 01 00:00:00 1970 UTC]
ForwarderChain:                0x0       
Name:                          0x5E0     
FirstThunk:                    0x220     

KERNEL32.dll.GetModuleHandleA Hint[294]

[IMAGE_IMPORT_DESCRIPTOR]
OriginalFirstThunk:            0x5B8     
Characteristics:               0x5B8     
TimeDateStamp:                 0x0        [Thu Jan 01 00:00:00 1970 UTC]
ForwarderChain:                0x0       
Name:                          0x62C     
FirstThunk:                    0x228     

USER32.dll.EndDialog Hint[185]
USER32.dll.GetDlgItemTextA Hint[260]
USER32.dll.DialogBoxParamA Hint[147]
USER32.dll.MessageBoxA Hint[446]

----------Resource directory----------

[IMAGE_RESOURCE_DIRECTORY]
Characteristics:               0x0       
TimeDateStamp:                 0x0        [Thu Jan 01 00:00:00 1970 UTC]
MajorVersion:                  0x0       
MinorVersion:                  0x0       
NumberOfNamedEntries:          0x0       
NumberOfIdEntries:             0x1       
  Id: [0x5] (RT_DIALOG)
  [IMAGE_RESOURCE_DIRECTORY_ENTRY]
  Name:                          0x5       
  OffsetToData:                  0x80000018
    [IMAGE_RESOURCE_DIRECTORY]
    Characteristics:               0x0       
    TimeDateStamp:                 0x0        [Thu Jan 01 00:00:00 1970 UTC]
    MajorVersion:                  0x0       
    MinorVersion:                  0x0       
    NumberOfNamedEntries:          0x0       
    NumberOfIdEntries:             0x1       
      Id: [0x65]
      [IMAGE_RESOURCE_DIRECTORY_ENTRY]
      Name:                          0x65      
      OffsetToData:                  0x80000030
        [IMAGE_RESOURCE_DIRECTORY]
        Characteristics:               0x0       
        TimeDateStamp:                 0x0        [Thu Jan 01 00:00:00 1970 UTC]
        MajorVersion:                  0x0       
        MinorVersion:                  0x0       
        NumberOfNamedEntries:          0x0       
        NumberOfIdEntries:             0x1       
          [IMAGE_RESOURCE_DIRECTORY_ENTRY]
          Name:                          0x804     
          OffsetToData:                  0x48      
            [IMAGE_RESOURCE_DATA_ENTRY]
            OffsetToData:                  0x6A0     
            Size:                          0xC8      
            CodePage:                      0x0       
            Reserved:                      0x0       
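Experiment 1 does nothing more than a plain print, yet it shows that pefile parses test.exe in full, from the DOS header through the OPTIONAL_HEADER to the PE sections, and every structure is available. Attentive readers will also notice that for each section it reports the entropy and the MD5, SHA-1, SHA-256 and SHA-512 hashes of the section data, and that it lists the imported functions (this sample has an empty export directory, so no exports appear).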
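To make the Entropy and hash lines in the dump less mysterious, here is a minimal sketch of the same kind of calculation using only the standard library. The helper names shannon_entropy and describe_bytes are made up for this illustration. With pefile you would presumably feed it the bytes from a section's get_data() method, and pefile's own section methods such as get_entropy() and get_hash_md5() should give comparable results, though I have not verified that they match digit for digit.

```python
import hashlib
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    entropy = 0.0
    # Counter over a bytes object counts how often each byte value occurs.
    for count in Counter(data).values():
        p = count / total
        entropy -= p * math.log2(p)
    return entropy


def describe_bytes(data: bytes) -> dict:
    """Hypothetical helper: entropy plus the four hashes shown in the dump."""
    return {
        "entropy": shannon_entropy(data),
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
        "sha512": hashlib.sha512(data).hexdigest(),
    }


if __name__ == "__main__":
    # Sample input only; with pefile this would be something like
    # describe_bytes(pe.sections[0].get_data()).
    info = describe_bytes(bytes(range(256)))
    print("Entropy: %f (Min=0.0, Max=8.0)" % info["entropy"])
    print("MD5     hash: " + info["md5"])
```

A low value (such as the 2.9 reported for .rsrc) suggests repetitive data, while values close to 8.0 are typical of compressed or encrypted content, which is why entropy is often used as a packer hint.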
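You may well ask: surely we are not meant to print everything and then parse and match strings to get the information we want? pefile is not that crude; it offers proper interfaces. For example, to get the entry point:
Code:
print(hex(pe.OPTIONAL_HEADER.AddressOfEntryPoint))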
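Once the fields are available as plain integers, further processing is ordinary Python. The sketch below uses the values from the Experiment 1 dump to show two common follow-ups: turning the entry point RVA into a virtual address at the preferred image base, and decoding TimeDateStamp. The function names are invented for this example; with a real file you would pass pe.OPTIONAL_HEADER.ImageBase, pe.OPTIONAL_HEADER.AddressOfEntryPoint and pe.FILE_HEADER.TimeDateStamp instead of the literals.

```python
from datetime import datetime, timezone


def entry_point_va(image_base: int, entry_rva: int) -> int:
    """Virtual address of the entry point when loaded at the preferred base."""
    return image_base + entry_rva


def format_timestamp(time_date_stamp: int) -> str:
    """Render a PE TimeDateStamp (seconds since 1970-01-01 UTC) as text."""
    moment = datetime.fromtimestamp(time_date_stamp, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S UTC")


if __name__ == "__main__":
    # Literals copied from the Experiment 1 dump of test.exe.
    print(hex(entry_point_va(0x400000, 0x522)))
    print(format_timestamp(0x46A8C07C))
```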
Experiment 2
Code:
Experiment 2: section table
import pefile  # remember to import pefile

PEfile_Path = r"C:\temp\test.exe"

pe = pefile.PE(PEfile_Path)
print(PEfile_Path)

# pe.sections is a list with one entry per section header
for section in pe.sections:
    print(section)
Code:
Experiment 2 output
C:\temp\test.exe
[IMAGE_SECTION_HEADER]
Name:                          .text
Misc:                          0x418     
Misc_PhysicalAddress:          0x418     
Misc_VirtualSize:              0x418     
VirtualAddress:                0x220     
SizeOfRawData:                 0x420     
PointerToRawData:              0x420     
PointerToRelocations:          0x0       
PointerToLinenumbers:          0x0       
NumberOfRelocations:           0x0       
NumberOfLinenumbers:           0x0       
Characteristics:               0x60000020
[IMAGE_SECTION_HEADER]
Name:                          .rsrc
Misc:                          0x128     
Misc_PhysicalAddress:          0x128     
Misc_VirtualSize:              0x128     
VirtualAddress:                0x640     
SizeOfRawData:                 0x130     
PointerToRawData:              0x840     
PointerToRelocations:          0x0       
PointerToLinenumbers:          0x0       
NumberOfRelocations:           0x0       
NumberOfLinenumbers:           0x0       
Characteristics:               0x40000040
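As you can see, the file has two sections, .text and .rsrc, and the relevant information is shown for each. If you need one specific field of a particular section, such as Characteristics, you can use the following, where i is the index of the section in pe.sections:
Code:
print(hex(pe.sections[i].Characteristics))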
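The raw Characteristics value is a bit field, so it usually needs decoding before it is useful. The sketch below does that by hand for a small subset of the section flags defined in the PE format; the table and the function name decode_section_flags are my own for this example and are not part of pefile. The two sample values are the ones shown for .text and .rsrc above.

```python
# A subset of the IMAGE_SCN_* section flags from the PE format.
SECTION_FLAGS = {
    0x00000020: "IMAGE_SCN_CNT_CODE",
    0x00000040: "IMAGE_SCN_CNT_INITIALIZED_DATA",
    0x00000080: "IMAGE_SCN_CNT_UNINITIALIZED_DATA",
    0x20000000: "IMAGE_SCN_MEM_EXECUTE",
    0x40000000: "IMAGE_SCN_MEM_READ",
    0x80000000: "IMAGE_SCN_MEM_WRITE",
}


def decode_section_flags(characteristics):
    """Return the names of the known flags set in a Characteristics value."""
    return [name
            for mask, name in sorted(SECTION_FLAGS.items())
            if characteristics & mask]


if __name__ == "__main__":
    # Values taken from the Experiment 2 output.
    print(decode_section_flags(0x60000020))  # .text
    print(decode_section_flags(0x40000040))  # .rsrc
```

A check such as "is this section both writable and executable" then becomes a simple membership test on the returned list, which is a common heuristic when looking for packed samples.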
Experiment 3
Code:
Experiment 3: import table
import pefile  # remember to import pefile

PEfile_Path = r"C:\temp\test.exe"

pe = pefile.PE(PEfile_Path)
print(PEfile_Path)

# Note: under Python 3 pefile returns these names as bytes, so they print as b'...'
for importeddll in pe.DIRECTORY_ENTRY_IMPORT:
    print(importeddll.dll)
    # or index directly:
    # print(pe.DIRECTORY_ENTRY_IMPORT[0].dll)
    for importedapi in importeddll.imports:
        print(importedapi.name)
        # or index directly:
        # print(pe.DIRECTORY_ENTRY_IMPORT[0].imports[0].name)
Code:
Experiment 3 output
C:\temp\test.exe
KERNEL32.dll
GetModuleHandleA
USER32.dll
EndDialog
GetDlgItemTextA
DialogBoxParamA
MessageBoxA
Experiment 3 shows that test.exe imports kernel32.dll and user32.dll, taking one API function from the first and four from the second.

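By now you should have a feel for how to use pefile and how capable it is. It has many other features as well, such as modifying PE structures, and if you load the PEiD signature database it can also identify packers. Give it a try.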

I hope pefile's capabilities and Python's ease of use prove helpful.

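# -*- coding: utf-8 -*-
# Collect the import tables of every .exe under a file or directory path.
# Written for Python 3: input() and print() replace raw_input and the print statement.
import os
import pefile


def parsePe(filePath, dllDict):
    """Record the DLLs and functions imported by filePath in dllDict."""
    try:
        pe = pefile.PE(filePath)
    except pefile.PEFormatError:
        return  # not a valid PE file, skip it
    # A file without an import table has no DIRECTORY_ENTRY_IMPORT attribute.
    for importeddll in getattr(pe, 'DIRECTORY_ENTRY_IMPORT', []):
        # Under Python 3 pefile returns names as bytes, so decode them.
        dllName = importeddll.dll.decode('ascii', 'replace')
        # Merge with entries from earlier files instead of overwriting them.
        funcList = dllDict.setdefault(dllName, [])
        for importedapi in importeddll.imports:
            if importedapi.name is None:
                continue  # imported by ordinal, no name to record
            funcName = importedapi.name.decode('ascii', 'replace')
            if funcName not in funcList:
                funcList.append(funcName)
    pe.close()


def traversalFiles(filePath, dllDict):
    """Parse filePath if it is an .exe, or recurse into it if it is a directory."""
    if os.path.isfile(filePath):
        if filePath.lower().endswith('.exe'):
            parsePe(filePath, dllDict)
    elif os.path.isdir(filePath):
        for item in os.listdir(filePath):
            traversalFiles(os.path.join(filePath, item), dllDict)


def main():
    dirPath = input("Enter the path of a PE file or a directory => ")
    # Keep asking until the path names an existing file or directory.
    while not (os.path.isdir(dirPath) or os.path.isfile(dirPath)):
        dirPath = input("That path does not exist, please enter it again => ")
    importDllDic = dict()
    traversalFiles(dirPath, importDllDic)
    for key in importDllDic:
        print('\n' + key, importDllDic[key])


if __name__ == '__main__':
    main()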
    

