Before studying the details of LZ77, let us look at a simple example (J. Weiss and D. Schremp, “Putting Data on a Diet”, IEEE Spectrum, August 1993). Consider the following sentence: the brown fox jumped over the brown foxy jumping frog
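The sentence is 53 octets long, or 424 bits. The algorithm processes the text from left to right. Initially, each character is mapped to a 9-bit code: a binary 1 followed by the character's 8-bit ASCII code. As processing proceeds, the algorithm looks for repeated sequences. When it meets a repetition, it keeps scanning until the repeated sequence ends. In other words, each time a repetition occurs, the algorithm includes as many characters as possible. The first such sequence encountered is the brown fox. This sequence is replaced by a pointer to the earlier sequence and the length of the sequence. Here, the earlier occurrence of the brown fox starts 26 characters before, and the sequence is 13 characters long.

For this example, assume there are two encoding options: an 8-bit pointer with a 4-bit length, or a 12-bit pointer with a 6-bit length. A 2-bit header indicates which option was chosen: 00 for the first option and 01 for the second. Thus the second occurrence of the brown fox is encoded as <00b><26d><13d> (the subscripts b and d mean binary and decimal), that is, 00 00011010 1101.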
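A minimal sketch of this scheme, offered for illustration only and resting on two assumptions the text does not state. First, repeats shorter than 3 characters are emitted as literals. This reproduces the parse in the example, where short repeats such as " f" and "ro" in frog stay literal. Second, the length field stores the match length directly (13 becomes 1101). The pointer is the distance back to the start of the earlier occurrence. The function names `encode_literal`, `encode_match`, `greedy_parse` and `encoded_bits` are hypothetical. A literal always starts with 1 and a match header always starts with 0, so a decoder can tell the two apart.

```python
# Sketch of the LZ77-style example: greedy longest-match parsing plus the
# two-option bit encoding (00: 8-bit pointer + 4-bit length,
# 01: 12-bit pointer + 6-bit length). All names are hypothetical.

def encode_literal(ch):
    """A literal costs 9 bits: binary 1 followed by the 8-bit ASCII code."""
    return "1" + format(ord(ch), "08b")


def encode_match(distance, length):
    """Encode a back-reference, prefixed by the 2-bit option header."""
    if distance < 2 ** 8 and length < 2 ** 4:
        return "00" + format(distance, "08b") + format(length, "04b")
    if distance < 2 ** 12 and length < 2 ** 6:
        return "01" + format(distance, "012b") + format(length, "06b")
    raise ValueError("match does not fit either encoding option")


def greedy_parse(text, min_len=3):
    """At each position take the longest repeat of earlier text.

    min_len=3 is an assumption chosen to match the example's parse.
    """
    tokens, pos = [], 0
    while pos < len(text):
        best_len, best_dist = 0, 0
        for start in range(pos):  # brute force over every earlier start
            length = 0
            while (pos + length < len(text)
                   and text[start + length] == text[pos + length]):
                length += 1
            if length > best_len:
                best_len, best_dist = length, pos - start
        if best_len >= min_len:
            tokens.append(("match", best_dist, best_len))
            pos += best_len
        else:
            tokens.append(("literal", text[pos]))
            pos += 1
    return tokens


def encoded_bits(tokens):
    """Concatenate the bit strings of all tokens."""
    return "".join(encode_literal(t[1]) if t[0] == "literal"
                   else encode_match(t[1], t[2]) for t in tokens)


if __name__ == "__main__":
    sentence = "the brown fox jumped over the brown foxy jumping frog"
    tokens = greedy_parse(sentence)
    print([t for t in tokens if t[0] == "match"])  # [('match', 26, 13), ('match', 27, 5)]
    print(encode_match(26, 13))                    # 00000110101101
    print(len(encoded_bits(tokens)))               # 343 (the raw text is 424 bits)
```

With these assumptions the parse is 35 literals (9 bits each) plus two 14-bit matches, giving 343 bits. The brute-force search is quadratic and suits only short examples like this one. Real LZ77 implementations bound the search to a sliding window and usually index it with a hash table.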
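The remainder of the compressed message consists of the letter y; the sequence <00b><27d><5d>, which replaces the sequence made up of a space followed by jump; and the character sequence ing frog. The program below differs from this pointer-and-length scheme. It builds an explicit code table and outputs (index, next character) pairs, which is the LZ78 form of Lempel-Ziv coding. Part of the code follows: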
```python
# -*- coding: utf-8 -*-
# Lempel-Ziv algorithm (dictionary-based, LZ78 style)
my_str="""Ubuntu 14.04 LTS includes a wealth of smart filters to make it faster and easier to find the content you need, whether it’s stored on your computer or on the web.Type any query into the Dash home and the Smart Scopes server will determine which categories of content are the most relevant to your search, returning only the best results. The server constantly improves its results by learning which categories and results are most useful to you over time."""
# Code table: maps each phrase (code word) to its index
codeword_dictionary = {}
# Length of the text to be compressed
str_len = len(my_str)
# Maximum code-word length in the table
dict_maxlen = 1
# Position of the next text segment to parse (start of the next parse)
now_index = 0
# Largest index assigned in the code table so far
max_index = 0
# Compressed output
compressed_str = ""
while now_index < str_len:
    # Try the longest code-word length first, clamped to the remaining text
    # (these lines restore part of a line that was garbled in the original)
    now_len = dict_maxlen
    if now_len > str_len - now_index:
        now_len = str_len - now_index
    # Index of the matching code word; 0 means no match was found
    cw_addr = 0
    while now_len > 0:
        cw_index = codeword_dictionary.get(my_str[now_index:now_index + now_len])
        if cw_index is not None:
            # Found a code word
            cw_addr = cw_index
            mystep = now_len
            break
        now_len -= 1
    if cw_addr == 0:
        # No code word found: add the single character as a new code word
        max_index += 1
        mystep = 1
        codeword_dictionary[my_str[now_index:now_index + mystep]] = max_index
        print("don't find the Code word,add Code word:%s index:%d" % (my_str[now_index:now_index + mystep], max_index))
    else:
        # Code word found: add the match plus the next character as a new code word
        max_index += 1
        codeword_dictionary[my_str[now_index:now_index + mystep + 1]] = max_index
        if mystep + 1 > dict_maxlen:
            dict_maxlen = mystep + 1
        print("find the Code word:%s add Code word:%s index:%d" % (my_str[now_index:now_index + now_len], my_str[now_index:now_index + mystep + 1], max_index))
    # .....................
    # ......................
```
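Running the program prints the following. The log was produced under Python 2, which processes the UTF-8 string byte by byte. As a result, the curly apostrophe ’ in "it’s" (three bytes in UTF-8) appears as the garbled entries at indices 68 to 70: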
don't find the Code word,add Code word:U index:1
don't find the Code word,add Code word:b index:2
don't find the Code word,add Code word:u index:3
don't find the Code word,add Code word:n index:4
don't find the Code word,add Code word:t index:5
find the Code word:u add Code word:u index:6
don't find the Code word,add Code word:1 index:7
don't find the Code word,add Code word:4 index:8
don't find the Code word,add Code word:. index:9
don't find the Code word,add Code word:0 index:10
find the Code word:4 add Code word:4 index:11
don't find the Code word,add Code word:L index:12
don't find the Code word,add Code word:T index:13
don't find the Code word,add Code word:S index:14
don't find the Code word,add Code word: index:15
don't find the Code word,add Code word:i index:16
find the Code word:n add Code word:nc index:17
don't find the Code word,add Code word:l index:18
find the Code word:u add Code word:ud index:19
don't find the Code word,add Code word:e index:20
don't find the Code word,add Code word:s index:21
find the Code word: add Code word: a index:22
find the Code word: add Code word: w index:23
find the Code word:e add Code word:ea index:24
find the Code word:l add Code word:lt index:25
don't find the Code word,add Code word:h index:26
find the Code word: add Code word: o index:27
don't find the Code word,add Code word:f index:28
find the Code word: add Code word: s index:29
don't find the Code word,add Code word:m index:30
don't find the Code word,add Code word:a index:31
don't find the Code word,add Code word:r index:32
find the Code word:t add Code word:t index:33
find the Code word:f add Code word:fi index:34
find the Code word:lt add Code word:lte index:35
find the Code word:r add Code word:rs index:36
find the Code word: add Code word: t index:37
don't find the Code word,add Code word:o index:38
find the Code word: add Code word: m index:39
find the Code word:a add Code word:ak index:40
find the Code word:e add Code word:e index:41
find the Code word:i add Code word:it index:42
find the Code word: add Code word: f index:43
find the Code word:a add Code word:as index:44
find the Code word:t add Code word:te index:45
find the Code word:r add Code word:r index:46
find the Code word:a add Code word:an index:47
don't find the Code word,add Code word:d index:48
find the Code word: add Code word: e index:49
find the Code word:as add Code word:asi index:50
find the Code word:e add Code word:er index:51
find the Code word: t add Code word: to index:52
find the Code word: f add Code word: fi index:53
find the Code word:n add Code word:nd index:54
find the Code word: t add Code word: th index:55
find the Code word:e add Code word:e c index:56
find the Code word:o add Code word:on index:57
find the Code word:te add Code word:ten index:58
find the Code word:t add Code word:t y index:59
find the Code word:o add Code word:ou index:60
find the Code word: add Code word: n index:61
find the Code word:e add Code word:ee index:62
find the Code word:d add Code word:d, index:63
find the Code word: w add Code word: wh index:64
find the Code word:e add Code word:et index:65
find the Code word:h add Code word:he index:66
find the Code word:r add Code word:r i index:67
find the Code word:t add Code word:tindex:68
don't find the Code word,add Code word:€ index:69
don't find the Code word,add Code word:index:70
find the Code word:s add Code word:s index:71
find the Code word:s add Code word:st index:72
find the Code word:o add Code word:or index:73
find the Code word:e add Code word:ed index:74
find the Code word: o add Code word: on index:75
find the Code word: add Code word: y index:76
find the Code word:ou add Code word:our index:77
find the Code word: add Code word: c index:78
find the Code word:o add Code word:om index:79
don't find the Code word,add Code word:p index:80
find the Code word:u add Code word:ut index:81
find the Code word:er add Code word:er index:82
find the Code word:or add Code word:or index:83
find the Code word:on add Code word:on index:84
find the Code word:t add Code word:th index:85
find the Code word:e add Code word:e w index:86
find the Code word:e add Code word:eb index:87
find the Code word:. add Code word:.T index:88
don't find the Code word,add Code word:y index:89
find the Code word:p add Code word:pe index:90
find the Code word: a add Code word: an index:91
find the Code word:y add Code word:y index:92
don't find the Code word,add Code word:q index:93
find the Code word:u add Code word:ue index:94
find the Code word:r add Code word:ry index:95
find the Code word: add Code word: i index:96
find the Code word:n add Code word:nt index:97
find the Code word:o add Code word:o index:98
find the Code word:th add Code word:the index:99
find the Code word: add Code word: D index:100
find the Code word:as add Code word:ash index:101
find the Code word: add Code word: h index:102
find the Code word:om add Code word:ome index:103
find the Code word: an add Code word: and index:104
find the Code word: th add Code word: the index:105
find the Code word: add Code word: S index:106
find the Code word:m add Code word:ma index:107
find the Code word:r add Code word:rt index:108
find the Code word: S add Code word: Sc index:109
find the Code word:o add Code word:op index:110
find the Code word:e add Code word:es index:111
find the Code word: s add Code word: se index:112
find the Code word:r add Code word:rv index:113
find the Code word:er add Code word:er w index:114
find the Code word:i add Code word:il index:115
find the Code word:l add Code word:l index:116
find the Code word:d add Code word:de index:117
find the Code word:te add Code word:ter index:118
find the Code word:m add Code word:mi index:119
find the Code word:n add Code word:ne index:120
find the Code word: wh add Code word: whi index:121
don't find the Code word,add Code word:c index:122
find the Code word:h add Code word:h index:123
find the Code word:c add Code word:ca index:124
find the Code word:te add Code word:teg index:125
find the Code word:or add Code word:ori index:126
find the Code word:es add Code word:es index:127
find the Code word:o add Code word:of index:128
find the Code word: c add Code word: co index:129
find the Code word:nt add Code word:nte index:130
find the Code word:nt add Code word:nt index:131
find the Code word:a add Code word:ar index:132
find the Code word:e add Code word:e t index:133
find the Code word:he add Code word:he index:134
find the Code word:m add Code word:mo index:135
find the Code word:st add Code word:st index:136
find the Code word:r add Code word:re index:137
find the Code word:l add Code word:le index:138
don't find the Code word,add Code word:v index:139
find the Code word:an add Code word:ant index:140
find the Code word: to add Code word: to index:141
find the Code word:y add Code word:yo index:142
find the Code word:u add Code word:ur index:143
find the Code word: se add Code word: sea index:144
find the Code word:r add Code word:rc index:145
find the Code word:h add Code word:h, index:146
find the Code word: add Code word: r index:147
find the Code word:et add Code word:etu index:148
find the Code word:r add Code word:rn index:149
find the Code word:i add Code word:in index:150
don't find the Code word,add Code word:g index:151
find the Code word: on add Code word: onl index:152
find the Code word:y add Code word:y t index:153
find the Code word:he add Code word:he b index:154
find the Code word:es add Code word:est index:155
find the Code word: r add Code word: re index:156
find the Code word:s add Code word:su index:157
find the Code word:lt add Code word:lts index:158
find the Code word:. add Code word:. index:159
find the Code word:T add Code word:Th index:160
find the Code word:e add Code word:e s index:161
find the Code word:er add Code word:erv index:162
find the Code word:er add Code word:er c index:163
find the Code word:on add Code word:ons index:164
find the Code word:t add Code word:ta index:165
find the Code word:nt add Code word:ntl index:166
find the Code word:y add Code word:y i index:167
find the Code word:m add Code word:mp index:168
find the Code word:r add Code word:ro index:169
find the Code word:v add Code word:ve index:170
find the Code word:s add Code word:s i index:171
find the Code word:t add Code word:ts index:172
find the Code word: re add Code word: res index:173
find the Code word:u add Code word:ul index:174
find the Code word:ts add Code word:ts index:175
find the Code word:b add Code word:by index:176
find the Code word: add Code word: l index:177
find the Code word:ea add Code word:ear index:178
find the Code word:n add Code word:ni index:179
find the Code word:n add Code word:ng index:180
find the Code word: whi add Code word: whic index:181
find the Code word:h add Code word:h c index:182
find the Code word:a add Code word:at index:183
find the Code word:e add Code word:eg index:184
find the Code word:ori add Code word:orie index:185
find the Code word:s add Code word:s a index:186
find the Code word:nd add Code word:nd index:187
find the Code word:re add Code word:res index:188
find the Code word:ul add Code word:ult index:189
find the Code word:s a add Code word:s ar index:190
find the Code word:e add Code word:e m index:191
find the Code word:o add Code word:os index:192
find the Code word:t add Code word:t u index:193
find the Code word:s add Code word:se index:194
find the Code word:f add Code word:fu index:195
find the Code word:l add Code word:l t index:196
find the Code word:o add Code word:o y index:197
find the Code word:ou add Code word:ou index:198
find the Code word:o add Code word:ov index:199
find the Code word:er add Code word:er t index:200
find the Code word:i add Code word:im index:201
find the Code word:e add Code word:e. index:202
...............
.............
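Part of the program is omitted above, so here is a self-contained sketch of the same procedure written as a function. It assumes that after a match the cursor moves past the match plus one extra character. The log is consistent with this: "u" is matched, "u " is added, and the next entry is "1". It also assumes each step produces one (index, next character) pair, where index 0 means "no earlier phrase". The names `lz78_compress` and `to_text` are hypothetical, not from the original code.

```python
def lz78_compress(text):
    """LZ78-style parse: return a list of (index, next_char) pairs.

    index is the code-table entry of the longest known phrase at the cursor
    (0 if none); next_char is the character that follows it ('' at the end).
    """
    table = {}        # phrase -> code-table index (1-based, like max_index)
    max_len = 0       # longest phrase stored so far (dict_maxlen in the original)
    pairs = []
    pos = 0
    while pos < len(text):
        match_len, match_index = 0, 0
        # Try the longest candidate first, never running past the end of text.
        for length in range(min(max_len, len(text) - pos), 0, -1):
            index = table.get(text[pos:pos + length])
            if index is not None:
                match_len, match_index = length, index
                break
        next_char = text[pos + match_len:pos + match_len + 1]
        # New code word: the match extended by one character.
        new_phrase = text[pos:pos + match_len + 1]
        table[new_phrase] = len(pairs) + 1
        max_len = max(max_len, len(new_phrase))
        pairs.append((match_index, next_char))
        pos += match_len + 1
    return pairs


def to_text(pairs):
    """Write pairs the way the program output does: index, then character."""
    return "".join(f"{index}{char}" for index, char in pairs)


if __name__ == "__main__":
    print(to_text(lz78_compress("Ubuntu 14.04 LTS includes a")))
    # 0U0b0u0n0t3 01040.008 0L0T0S0 0i4c0l3d0e0s15a
```

Under Python 3 the curly apostrophe is a single character. Starting at index 68, this sketch's numbering will therefore not match the Python 2 log exactly.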
Finally, the following text is compressed:
Ubuntu 14.04 LTS includes a wealth of smart filters to make it faster and easier to find the content you need, whether it’s stored on your computer or on the web.Type any query into the Dash home and the Smart Scopes server will determine which categories of content are the most relevant to your search, returning only the best results. The server constantly improves its results by learning which categories and results are most useful to you over time.
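The result is the sequence below. Each code-table index (0 when no earlier phrase matches) is immediately followed by the next character. For example, 0U adds the new single character U, and 15a means entry 15 (a space) followed by a: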
0U0b0u0n0t3 01040.008 0L0T0S0 0i4c0l3d0e0s15a15w20a18t0h15o0f15s0m0a0r5 28i25e32s15t0o15m31k20 16t15f31s5e32 31n0d15e44i20r37o43i4d37h41c38n45n33y38u15n20e48,23h20t26e46i5€01 21t38r20d27n15y60r15c38m0p3t51 73 57 5h41w20b9T0y80e22n89 0q3e32y15i4t38 85e15D44h15h79e91d55e15S30a32t106c38p20s29e32v82w16l18 48e45r30i4e64i0c26 122a45g73i111 38f78o97e97 31r41t66 30o72 32e18e0v47t52 89o3r112a32c26,15r65u32n16n0g75l92t134b111t147e21u25s9 13h41s51v82c57s5a97l92i30p32o139e71i5s156s3l172 2y15l24r4i4g121c123c31t20g126e71a54 137s174t186r41m38s33u21e28u116t98y60 38v82t16m20.
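Decoding reverses the process by rebuilding the same code table. Entry k is the phrase produced by the k-th pair, so the table itself never has to be transmitted. The sketch below works on (index, next character) pairs rather than on the concatenated string above, because that string cannot always be split back unambiguously. When the next character is a digit, pairs such as 01 and 04 run together as 0104. `lz78_decompress` is a hypothetical name.

```python
def lz78_decompress(pairs):
    """Rebuild the text from (index, next_char) pairs produced by an LZ78 parse."""
    phrases = [""]          # entry 0 is the empty phrase ("no earlier match")
    pieces = []
    for index, next_char in pairs:
        phrase = phrases[index] + next_char
        phrases.append(phrase)   # becomes entry len(phrases) - 1, as in the encoder
        pieces.append(phrase)
    return "".join(pieces)


if __name__ == "__main__":
    # The first 22 pairs of the compressed output above, written out by hand.
    pairs = [(0, "U"), (0, "b"), (0, "u"), (0, "n"), (0, "t"), (3, " "),
             (0, "1"), (0, "4"), (0, "."), (0, "0"), (8, " "), (0, "L"),
             (0, "T"), (0, "S"), (0, " "), (0, "i"), (4, "c"), (0, "l"),
             (3, "d"), (0, "e"), (0, "s"), (15, "a")]
    print(lz78_decompress(pairs))  # Ubuntu 14.04 LTS includes a
```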