https://pkware.cachefly.net/webdocs/APPNOTE/APPNOTE-6.2.0.txt
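This article uses that document as a reference to analyze the format of a sample zip file.
It is only a rough analysis of the overall zip file structure; some of the fields will be analyzed in more detail later.
The following excerpts from appnote.txt cover the data structures that a typical zip file uses. Some scenarios also use records not covered here; for those, consult appnote.txt.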
Overall .ZIP file format:
[local file header 1]
[file data 1]
[data descriptor 1]
.
.
.
[local file header n]
[file data n]
[data descriptor n]
[archive decryption header] (EFS)
[archive extra data record] (EFS)
[central directory]
[zip64 end of central directory record]
[zip64 end of central directory locator]
[end of central directory record]
A. Local file header:
local file header signature 4 bytes (0x04034b50)
version needed to extract 2 bytes
general purpose bit flag 2 bytes
compression method 2 bytes
last mod file time 2 bytes
last mod file date 2 bytes
crc-32 4 bytes
compressed size 4 bytes
uncompressed size 4 bytes
file name length 2 bytes
extra field length 2 bytes
file name (variable size)
extra field (variable size)
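The fixed part of the local file header is 30 bytes, and every multi-byte integer in it is little-endian. The following is a minimal sketch using only the Python 3 standard library. It unpacks the fields with `struct` and checks them against the first 38 bytes of the sample archive analyzed later in this article. `parse_local_header` is a hypothetical helper, not an existing API. Decoding names as cp437 unless bit 11 is set is an assumption borrowed from common practice.

```python
import struct

# Fixed 30-byte part of the local file header, little-endian ("<"):
# signature, version needed, flags, method, mod time, mod date,
# crc-32, compressed size, uncompressed size, name length, extra length
LOCAL_HEADER = struct.Struct("<IHHHHHIIIHH")
LOCAL_SIG = 0x04034B50


def parse_local_header(buf: bytes, offset: int = 0) -> dict:
    """Hypothetical helper: decode one local file header starting at offset."""
    (sig, version_needed, flags, method, mod_time, mod_date,
     crc32, comp_size, uncomp_size, name_len, extra_len) = LOCAL_HEADER.unpack_from(buf, offset)
    if sig != LOCAL_SIG:
        raise ValueError(f"no local file header at offset {offset:#x}")
    name_start = offset + LOCAL_HEADER.size
    extra_start = name_start + name_len
    data_start = extra_start + extra_len  # file data follows immediately
    # Bit 11 (defined in later APPNOTE versions) marks UTF-8 names; cp437 otherwise (assumption)
    encoding = "utf-8" if flags & 0x0800 else "cp437"
    return {
        "version_needed": version_needed,
        "flags": flags,
        "method": method,
        "mod_time": mod_time,
        "mod_date": mod_date,
        "crc32": crc32,
        "compressed_size": comp_size,
        "uncompressed_size": uncomp_size,
        "file_name": buf[name_start:extra_start].decode(encoding),
        "extra": buf[extra_start:data_start],
        "data_offset": data_start,
    }


# The first 38 bytes of the sample archive (fixed header + file name)
sample = bytes.fromhex(
    "504b0304 1400 0000 0800 6f9d d946 d01efeb9"
    " a4010000 72040000 0800 0000 7072696d652e7079"
)
hdr = parse_local_header(sample)
print(hdr["file_name"], hex(hdr["compressed_size"]), hex(hdr["data_offset"]))
# prime.py 0x1a4 0x26
```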
B. File data
Immediately following the local header for a file
is the compressed or stored data for the file.
The series of [local file header][file data][data
descriptor] repeats for each file in the .ZIP archive.
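Every local header records its compressed size, so the [local file header][file data] pairs can be walked one after another starting from offset 0. The Python sketch below assumes that no entry uses a data descriptor (bit 3 of the general purpose flag is clear) and that there is no ZIP64. `iter_local_entries` is a hypothetical helper. Real readers usually start from the central directory instead, because a data descriptor or data prepended to the archive breaks this forward scan.

```python
import io
import struct
import zipfile

LOCAL_HEADER = struct.Struct("<IHHHHHIIIHH")  # fixed 30-byte part
LOCAL_SIG = 0x04034B50


def iter_local_entries(buf: bytes):
    """Hypothetical helper: yield (offset, file_name, compressed_size) per entry.

    Assumes no data descriptor (flag bit 3 clear) and no ZIP64, so the sizes
    in each local header are final.
    """
    offset = 0
    while offset + LOCAL_HEADER.size <= len(buf):
        fields = LOCAL_HEADER.unpack_from(buf, offset)
        if fields[0] != LOCAL_SIG:
            break  # no more local headers: the central directory (or EOCD) starts here
        flags, comp_size, name_len, extra_len = fields[2], fields[7], fields[9], fields[10]
        if flags & 0x0008:
            raise ValueError(f"entry at {offset:#x} uses a data descriptor")
        name_start = offset + LOCAL_HEADER.size
        name = buf[name_start:name_start + name_len].decode("cp437")
        yield offset, name, comp_size
        # Skip header + name + extra field + file data to reach the next header
        offset = name_start + name_len + extra_len + comp_size


# Build a two-file archive in memory to walk
mem = io.BytesIO()
with zipfile.ZipFile(mem, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("a.txt", "hello " * 100)
    zf.writestr("b.txt", "world")
entries = list(iter_local_entries(mem.getvalue()))
for offset, name, size in entries:
    print(f"{offset:#06x} {name} {size}")
```

The usage part relies on CPython's `zipfile` writing final sizes into the local headers when the output stream is seekable, which an in-memory `BytesIO` is.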
F. Central directory structure:
[file header 1]
.
.
.
[file header n]
[digital signature]
File header:
central file header signature 4 bytes (0x02014b50)
version made by 2 bytes
version needed to extract 2 bytes
general purpose bit flag 2 bytes
compression method 2 bytes
last mod file time 2 bytes
last mod file date 2 bytes
crc-32 4 bytes
compressed size 4 bytes
uncompressed size 4 bytes
file name length 2 bytes
extra field length 2 bytes
file comment length 2 bytes
disk number start 2 bytes
internal file attributes 2 bytes
external file attributes 4 bytes
relative offset of local header 4 bytes
file name (variable size)
extra field (variable size)
file comment (variable size)
Digital signature:
header signature 4 bytes (0x05054b50)
size of data 2 bytes
signature data (variable size)
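The central directory file header repeats most of the local header fields. It adds "version made by", the comment length, the attributes, and the offset of the matching local header, for 46 fixed bytes in total. A minimal Python sketch follows, with the same assumptions as a plain `struct` decoder: no ZIP64 and cp437 names. `parse_central_header` is a hypothetical name. It is checked against the 90-byte central directory header of the sample archive analyzed later in this article.

```python
import struct

# Fixed 46-byte part of a central directory file header, little-endian.
CENTRAL_HEADER = struct.Struct("<IHHHHHHIIIHHHHHII")
CENTRAL_SIG = 0x02014B50


def parse_central_header(buf: bytes, offset: int = 0) -> dict:
    """Hypothetical helper: decode one central directory file header at offset."""
    (sig, made_by, needed, flags, method, mod_time, mod_date, crc32,
     comp_size, uncomp_size, name_len, extra_len, comment_len,
     disk_start, internal_attr, external_attr,
     local_offset) = CENTRAL_HEADER.unpack_from(buf, offset)
    if sig != CENTRAL_SIG:
        raise ValueError(f"no central directory header at offset {offset:#x}")
    name_start = offset + CENTRAL_HEADER.size
    extra_start = name_start + name_len
    comment_start = extra_start + extra_len
    end = comment_start + comment_len
    return {
        "made_by_host": made_by >> 8,        # upper byte: host system, 0 = MS-DOS/OS2 (FAT)
        "made_by_version": made_by & 0xFF,   # lower byte: spec version x 10 (63 -> 6.3)
        "version_needed": needed,
        "flags": flags,
        "method": method,
        "mod_time": mod_time,
        "mod_date": mod_date,
        "crc32": crc32,
        "compressed_size": comp_size,
        "uncompressed_size": uncomp_size,
        "disk_start": disk_start,
        "internal_attr": internal_attr,
        "external_attr": external_attr,
        "local_header_offset": local_offset,
        "file_name": buf[name_start:extra_start].decode("cp437"),
        "extra": buf[extra_start:comment_start],
        "comment": buf[comment_start:end],
        "record_size": end - offset,         # 46 + name + extra + comment
    }


# The sample's central directory header (offset 0x1CA in the archive), 90 bytes
sample_cd = bytes.fromhex(
    "504b0102 3f00 1400 0000 0800 6f9d d946 d01efeb9 a4010000 72040000"
    " 0800 2400 0000 0000 0000 20000000 00000000 7072696d652e7079"
    " 0a002000000000000100180076363727"
    " 3cafd001d780ce6b39afd001d780ce6b 39afd001"
)
cd = parse_central_header(sample_cd)
print(cd["file_name"], hex(cd["record_size"]), cd["local_header_offset"])
# prime.py 0x5a 0
```

For an MS-DOS-compatible host (upper byte 0), the low byte of the external file attributes is the MS-DOS attribute byte. The sample's value 0x20 is the archive bit.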
Every zip file must contain exactly one EOCD (end of central directory) record.
I. End of central directory record:
end of central dir signature 4 bytes (0x06054b50)
number of this disk 2 bytes
number of the disk with the
start of the central directory 2 bytes
total number of entries in the
central directory on this disk 2 bytes
total number of entries in
the central directory 2 bytes
size of the central directory 4 bytes
offset of start of central
directory with respect to
the starting disk number 4 bytes
.ZIP file comment length 2 bytes
.ZIP file comment (variable size)
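The EOCD record ends with a variable-length comment of at most 65535 bytes, so readers typically locate it by scanning backwards from the end of the file for the signature 0x06054b50. The Python sketch below uses a hypothetical function name. It accepts a candidate only if its comment length reaches exactly to the end of the file, which guards against the signature bytes appearing inside the comment. ZIP64 and multi-disk archives are not handled.

```python
import struct

EOCD = struct.Struct("<IHHHHIIH")  # fixed 22-byte part
EOCD_SIG = b"PK\x05\x06"           # 0x06054b50 stored little-endian


def find_eocd(buf: bytes) -> dict:
    """Hypothetical helper: locate and decode the end of central directory record."""
    lowest = max(0, len(buf) - EOCD.size - 0xFFFF)  # comment is at most 65535 bytes
    pos = buf.rfind(EOCD_SIG, lowest)
    while pos != -1:
        if pos + EOCD.size <= len(buf):
            (_, disk, cd_disk, entries_here, entries_total,
             cd_size, cd_offset, comment_len) = EOCD.unpack_from(buf, pos)
            # A genuine record's comment must end exactly at end of file
            if pos + EOCD.size + comment_len == len(buf):
                return {
                    "eocd_offset": pos,
                    "disk": disk,
                    "cd_disk": cd_disk,
                    "entries_on_disk": entries_here,
                    "entries_total": entries_total,
                    "cd_size": cd_size,
                    "cd_offset": cd_offset,
                    "comment": buf[pos + EOCD.size:],
                }
        pos = buf.rfind(EOCD_SIG, lowest, pos)  # try the next candidate further back
    raise ValueError("end of central directory record not found")


# The sample's 22-byte EOCD record, placed at offset 0x224 as in the archive;
# the zero bytes stand in for everything before it.
sample_eocd = bytes.fromhex("504b0506 0000 0000 0100 0100 5a000000 ca010000 0000")
info = find_eocd(b"\x00" * 0x224 + sample_eocd)
print(hex(info["cd_offset"]), hex(info["cd_size"]))  # 0x1ca 0x5a
```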
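Only the simplest case is considered here: a zip file that contains a single text file. With several files, some of the records above simply appear multiple times.
The analysis of the binary content follows. The attachment linked at the end contains the original file, the zip file, the annotated binary analysis, and a color-coded version of that analysis.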
[Local File Header 1]
00000000h: 50 4B 03 04 --- local file header signature(4 bytes, 0x04034b50)
00000004h: 14 00 --- version needed to extract(2 bytes)
00000006h: 00 00 --- general purpose bit flag(2 bytes)
00000008h: 08 00 --- compression method(2 bytes)
0000000ah: 6F 9D --- last mod file time(2 bytes)
0000000ch: D9 46 --- last mod file date(2 bytes)
0000000eh: D0 1E FE B9 --- crc-32(4 bytes)
00000012h: A4 01 00 00 --- compressed size(4 bytes)
00000016h: 72 04 00 00 --- uncompressed size(4 bytes)
0000001ah: 08 00 --- file name length(2 bytes)
0000001ch: 00 00 --- extra field length(2 bytes)
0000001eh: 70 72 69 6D 65 2E 70 79 --- file name (variable size, 8 bytes)
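The last mod file time and date fields use the packed MS-DOS format:

- the time holds hours in bits 15-11, minutes in bits 10-5, and seconds/2 in bits 4-0;
- the date holds year-1980 in bits 15-9, month in bits 8-5, and day in bits 4-0.

The small Python sketch below uses a hypothetical function name and applies this layout to the sample bytes 6F 9D and D9 46, which read as 0x9D6F and 0x46D9 after little-endian decoding.

```python
from datetime import datetime


def dos_datetime(dos_date: int, dos_time: int) -> datetime:
    """Hypothetical helper: unpack MS-DOS date/time fields (2-second resolution)."""
    year = ((dos_date >> 9) & 0x7F) + 1980
    month = (dos_date >> 5) & 0x0F
    day = dos_date & 0x1F
    hour = (dos_time >> 11) & 0x1F
    minute = (dos_time >> 5) & 0x3F
    second = (dos_time & 0x1F) * 2
    return datetime(year, month, day, hour, minute, second)


# Values from the sample header: bytes 6F 9D -> 0x9D6F, D9 46 -> 0x46D9
print(dos_datetime(0x46D9, 0x9D6F))  # 2015-06-25 19:43:30
```

The MS-DOS format carries no time zone, so the result is a naive local time.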
[File Data 1]
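compressed size = 0x01A4 (all multi-byte fields are little-endian, so A4 01 00 00 reads as 0x000001A4 = 420 bytes)
The file data starts right after the file name: 0x1E (30-byte fixed header) + 0x08 (file name) + 0x00 (extra field) = 0x26.
0x0026 (start address) + 0x01A4 = 0x01CA
Bit 3 of the general purpose bit flag is 0, so the crc-32 and sizes are already in the local header and no data descriptor follows the file data.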
00000026h: 7D 53 C1 4E C3 30 0C BD 4F DA
00000030h: 3F 98 03 A2 15 65 5A 87 76 41 94 23 12 17 84 04
00000040h: 37 84 A2 B0 79 2C A8 75 47 92 02 9F 8F 9D B6 5B
00000050h: B6 6E F4 50 29 F6 CB F3 F3 73 6C AA 4D 6D 3D 54
00000060h: DA AF C7 A3 F1 68 89 2B 30 4E 6D AC A9 30 A1 F4
00000070h: 66 3C 02 FE CC 0A 08 6E 61 D6 1D E5 B3 E8 1B 4B
00000080h: 70 AF 4B 87 72 51 62 48 4B 45 4D F5 8E 16 0A 30
00000090h: E4 13 61 9D B8 2F EB 99 2A 85 4B C8 5B DC AA B6
000000a0h: 60 18 00 56 D3 07 26 B3 2C BA 99 46 35 42 D9 73
000000b0h: 86 16 05 4C A3 F8 B0 BE 44 DA 7F 17 7F B1 4D 90
000000c0h: B5 28 B5 73 F0 14 DA A9 DF 3F 71 E1 FB 02 D2 A9
000000d0h: 52 86 8C 57 2A 71 58 AE 32 70 BE DE 14 8F 35 E1
000000e0h: 81 08 89 B3 2B 40 B5 07 49 1F 48 91 CB 93 80 29
000000f0h: 02 74 97 45 16 F7 0F F8 2A 8F 33 BD 8B 5B D4 A2
00000100h: B1 0F B4 C4 5F 46 E6 87 A0 4E BD 47 DB A9 4F 87
00000110h: B3 91 F0 B1 7B 3C 5F F6 FB BF 5B AD C0 B3 A0 50
00000120h: D3 F2 40 CE 5D B1 03 C5 BC 84 BF 7E C0 4A AC 5E
00000130h: 3C DB 45 7E D6 A6 44 8E B3 A1 47 CC 64 BB C3 B4
00000140h: 26 27 54 6E D5 6A E3 10 9E 59 C2 03 9B A0 BD A9
00000150h: 29 36 B0 A3 DA 3E E5 BD 0E 8E F1 89 CE FD 36 4F
00000160h: 8C 66 38 9E CB 42 1E F6 C0 48 EA F7 49 79 74 3E
00000170h: E9 8B 06 3D 5C AC 7D 93 F9 74 9A 46 4B E1 B1 92
00000180h: BD 08 98 48 24 9F C9 87 6C B6 47 DA 35 D7 53 7F
00000190h: EB B2 41 C7 DC AF BC 52 D7 19 CC 33 C8 E7 6F 3B
000001a0h: FA 90 17 FE 16 38 28 10 C2 19 0C EA F6 26 86 7C
000001b0h: 2A 0A 8C BC 3E D2 15 2A 25 CB 79 A1 54 A5 0D 29
000001c0h: 75 D1 71 EE A9 8B 42 72 F8 03 --- the file data ends here; the next record starts at 0x01CA
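Compression method 8 is Deflate. A zip entry stores a raw Deflate stream, with no zlib header or trailer around it. One way to check an entry is to inflate the bytes and compare the result with the crc-32 and uncompressed size recorded in the header. The Python sketch below does this with the standard `zlib` module, where a negative `wbits` selects raw Deflate. The function name is hypothetical, and the test data is generated locally rather than taken from the sample.

```python
import zlib


def inflate_entry(file_data: bytes, expected_crc: int, expected_size: int) -> bytes:
    """Hypothetical helper: inflate raw Deflate file data and verify it."""
    out = zlib.decompress(file_data, wbits=-zlib.MAX_WBITS)  # raw Deflate, no header
    if len(out) != expected_size:
        raise ValueError(f"size mismatch: {len(out)} != {expected_size}")
    if zlib.crc32(out) != expected_crc:
        raise ValueError("crc-32 mismatch")
    return out


# Self-made raw Deflate data standing in for an entry's file data
original = b"print('hello')\n" * 50
packer = zlib.compressobj(wbits=-zlib.MAX_WBITS)
raw = packer.compress(original) + packer.flush()
restored = inflate_entry(raw, zlib.crc32(original), len(original))
```

For the sample archive, the arguments would be the 0x1A4 bytes starting at offset 0x26, the crc-32 0xB9FE1ED0, and the uncompressed size 0x472 (1138 bytes).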
[Central Directory Header 1]
000001cah: 50 4B 01 02 --- central file header signature(4 bytes, 0x02014b50)
000001ceh: 3F 00 --- version made by(2 bytes)
000001d0h: 14 00 --- version needed to extract(2 bytes)
000001d2h: 00 00 --- general purpose bit flag(2 bytes)
000001d4h: 08 00 --- compression method(2 bytes)
000001d6h: 6F 9D --- last mod file time(2 bytes)
000001d8h: D9 46 --- last mod file date(2 bytes)
000001dah: D0 1E FE B9 --- crc-32(4 bytes)
000001deh: A4 01 00 00 --- compressed size(4 bytes)
000001e2h: 72 04 00 00 --- uncompressed size(4 bytes)
000001e6h: 08 00 --- file name length(2 bytes)
000001e8h: 24 00 --- extra field length(2 bytes)
000001eah: 00 00 --- file comment length(2 bytes)
000001ech: 00 00 --- disk number start(2 bytes)
000001eeh: 00 00 --- internal file attributes(2 bytes)
000001f0h: 20 00 00 00 --- external file attributes(4 bytes)
000001f4h: 00 00 00 00 --- relative offset of local header(4 bytes)
000001f8h: 70 72 69 6D 65 2E 70 79 --- file name (variable size, 8 bytes)
00000200h: 0A 00 20 00 00 00 00 00 01 00 18 00 76 36 37 27
00000210h: 3C AF D0 01 D7 80 CE 6B 39 AF D0 01 D7 80 CE 6B
00000220h: 39 AF D0 01 --- extra field (variable size, 0x24 = 36 bytes)
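The 36-byte extra field above is a sequence of (header ID, data size, data) blocks. Its first two bytes, 0A 00, give header ID 0x000A, which APPNOTE lists as the NTFS extra field. In that field, attribute tag 0x0001 carries the modification, access, and creation times as 64-bit Windows FILETIME values (100 ns units since 1601-01-01 UTC). The Python sketch below splits the blocks and decodes those timestamps. The function names are hypothetical, and only tag 0x0001 is handled.

```python
import struct
from datetime import datetime, timedelta, timezone

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def split_extra(extra: bytes):
    """Hypothetical helper: yield (header_id, data) for each extra field block."""
    pos = 0
    while pos + 4 <= len(extra):
        header_id, size = struct.unpack_from("<HH", extra, pos)
        yield header_id, extra[pos + 4:pos + 4 + size]
        pos += 4 + size


def ntfs_times(data: bytes) -> dict:
    """Hypothetical helper: decode a 0x000A block (4 reserved bytes, then tag/size/value)."""
    times = {}
    pos = 4  # skip the reserved field
    while pos + 4 <= len(data):
        tag, size = struct.unpack_from("<HH", data, pos)
        if tag == 0x0001 and size == 24:
            mtime, atime, ctime = struct.unpack_from("<QQQ", data, pos + 4)
            for key, value in (("mtime", mtime), ("atime", atime), ("ctime", ctime)):
                # FILETIME counts 100 ns ticks; integer division keeps microseconds exact
                times[key] = FILETIME_EPOCH + timedelta(microseconds=value // 10)
        pos += 4 + size
    return times


# The sample's extra field bytes (offsets 0x200-0x223)
extra = bytes.fromhex(
    "0a002000000000000100180076363727"
    "3cafd001d780ce6b39afd001d780ce6b"
    "39afd001"
)
blocks = dict(split_extra(extra))
times = ntfs_times(blocks[0x000A])
print(times["mtime"])  # 2015-06-25 11:43:28.457279+00:00
```

This decodes to 11:43:28 UTC, eight hours from the MS-DOS time 0x9D6F (19:43:30) in the headers. That gap is consistent with the archive having been created in a UTC+8 time zone, but this is an inference; the file does not state it.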
[End of Central Directory Record]
00000224h: 50 4B 05 06 --- end of central dir signature(4 bytes, 0x06054b50)
00000228h: 00 00 --- number of this disk(2 bytes)
0000022ah: 00 00 --- number of the disk with the start of the central directory(2 bytes)
0000022ch: 01 00 --- total number of entries in the central directory on this disk(2 bytes)
0000022eh: 01 00 --- total number of entries in the central directory(2 bytes)
00000230h: 5A 00 00 00 --- size of the central directory(4 bytes)
00000234h: CA 01 00 00 --- offset of start of central directory with respect to the starting disk number(4 bytes)
00000238h: 00 00 --- .ZIP file comment length(2 bytes)
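The numbers in this sample are consistent with each other:

- the central directory starts at 0x01CA, right where the file data ends;
- its size is 0x5A = 46 + 8 + 0x24 bytes (fixed header, file name, extra field);
- 0x01CA + 0x5A = 0x0224 is exactly where the EOCD record begins.

The same checks can be run programmatically. The Python sketch below builds a one-file archive with the standard `zipfile` module and verifies these relationships. `check_layout` is a hypothetical name, and the sketch only covers single-disk, non-ZIP64 archives without an archive comment.

```python
import io
import struct
import zipfile

LOCAL = struct.Struct("<IHHHHHIIIHH")          # 30 bytes
CENTRAL = struct.Struct("<IHHHHHHIIIHHHHHII")  # 46 bytes
EOCD = struct.Struct("<IHHHHIIH")              # 22 bytes


def check_layout(buf: bytes) -> list:
    """Hypothetical checker: returns [(file_name, local_header_offset), ...] or raises ValueError."""
    eocd_pos = buf.rfind(b"PK\x05\x06")  # assumes no comment containing the signature
    if eocd_pos < 0 or eocd_pos + EOCD.size > len(buf):
        raise ValueError("no end of central directory record")
    _, _, _, _, total, cd_size, cd_offset, _ = EOCD.unpack_from(buf, eocd_pos)
    if cd_offset + cd_size != eocd_pos:
        raise ValueError("central directory does not end where the EOCD starts")
    entries = []
    pos = cd_offset
    for _ in range(total):
        f = CENTRAL.unpack_from(buf, pos)
        if f[0] != 0x02014B50:
            raise ValueError(f"bad central header signature at {pos:#x}")
        name_len, extra_len, comment_len, local_off = f[10], f[11], f[12], f[16]
        name = buf[pos + CENTRAL.size:pos + CENTRAL.size + name_len]
        loc = LOCAL.unpack_from(buf, local_off)
        # The local header must exist at the recorded offset and repeat the name
        local_name = buf[local_off + LOCAL.size:local_off + LOCAL.size + loc[9]]
        if loc[0] != 0x04034B50 or local_name != name:
            raise ValueError(f"local header mismatch at {local_off:#x}")
        entries.append((name.decode("cp437"), local_off))
        pos += CENTRAL.size + name_len + extra_len + comment_len
    if pos != eocd_pos:
        raise ValueError("central directory size does not match its entries")
    return entries


mem = io.BytesIO()
with zipfile.ZipFile(mem, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("prime.py", "print(2)\n")
layout = check_layout(mem.getvalue())
print(layout)  # [('prime.py', 0)]
```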
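The sample files used in this article are available as a download: http://download.csdn.net/detail/u013344915/8839437
The download also includes a version of the analysis in which the fields are distinguished by font color.
A version of UltraEdit that runs on 64-bit Windows: http://download.csdn.net/detail/leandzgc/5380771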