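LZW, like Huffman coding, is a lossless compression method. The algorithm builds a dictionary so that recurring strings are stored once and replaced by codes, which makes it well suited to text whose source is highly repetitive. This article first explains how LZW encoding and decoding work, then gives an implementation of LZW.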
*********************Principle*********************
Encoding:
Algorithm flow:
Example:
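To make the encoding flow concrete, here is a minimal sketch that records each step of the encoder. It assumes the same 256-entry single-byte initial dictionary used by the implementation in this article. The names `EncodeStep` and `TraceEncode` are hypothetical and exist only for this illustration.

```cpp
#include <map>
#include <string>
#include <vector>

// One encoder step: the phrase S that was emitted, its code, and the new
// dictionary entry (S + z) created at the same moment.
struct EncodeStep {
    std::string emitted;  // phrase S written to the output
    int code;             // dictionary code of S
    std::string added;    // new phrase S + z ("" for the final flush)
    int addedCode;        // code given to the new phrase (-1 for the final flush)
};

// Hypothetical helper (not part of LZW.cpp): runs LZW encoding and records
// every output step so the growth of the dictionary can be inspected.
std::vector<EncodeStep> TraceEncode(const std::string& input) {
    std::map<std::string, int> dictionary;
    int nextCode = 256;
    for (int i = 0; i < 256; ++i)
        dictionary[std::string(1, static_cast<char>(i))] = i;

    std::vector<EncodeStep> steps;
    std::string S;  // longest phrase matched so far
    for (char z : input) {
        if (dictionary.count(S + z)) {
            S += z;  // S + z is known: keep extending the match
        } else {
            // S + z is new: emit the code of S, then register S + z
            steps.push_back({S, dictionary[S], S + z, nextCode});
            dictionary[S + z] = nextCode++;
            S = std::string(1, z);
        }
    }
    if (!S.empty())
        steps.push_back({S, dictionary[S], "", -1});  // flush the last phrase
    return steps;
}
```

Calling `TraceEncode("ababab")` gives four steps with the codes 97, 98, 256, 256. The first step emits "a" and adds "ab" as code 256, the second emits "b" and adds "ba" as 257, the third emits "ab" and adds "aba" as 258, and the last one flushes the remaining "ab".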
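Decoding:
Decoding is the inverse of encoding: if encoding is a mapping from string to int, decoding can be described as a mapping from int to string.
Algorithm flow:
For a decoding example, readers are encouraged to step through the code below in a debugger.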
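As a small aid for that kind of step-by-step debugging, the sketch below decodes a code sequence and returns the phrase recovered for each code. It assumes the standard LZW decoding rule and a 256-entry initial dictionary. `TraceDecode` is a hypothetical name introduced for this illustration only.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not part of LZW.cpp): decodes LZW codes and returns
// the phrase recovered for each input code, in order.
std::vector<std::string> TraceDecode(const std::vector<int>& codes) {
    std::map<int, std::string> inverse;
    int nextCode = 256;
    for (int i = 0; i < 256; ++i)
        inverse[i] = std::string(1, static_cast<char>(i));

    std::vector<std::string> phrases;
    std::string previous;  // phrase decoded from the previous code
    for (int k : codes) {
        std::string entry;
        if (inverse.count(k)) {
            entry = inverse[k];
        } else if (k == nextCode && !previous.empty()) {
            // The encoder used this code immediately after creating it, so
            // the phrase must be `previous` plus its own first character.
            entry = previous + previous[0];
        } else {
            throw std::runtime_error("bad LZW code");
        }
        // Mirror the encoder: register previous + first character of entry.
        // The very first code has no predecessor, so it adds nothing.
        if (!previous.empty())
            inverse[nextCode++] = previous + entry[0];
        phrases.push_back(entry);
        previous = entry;
    }
    return phrases;
}
```

The sequence {97, 256, 97} exercises the special case: code 256 is not yet in the decoder's dictionary when it arrives, so it is rebuilt as "a" + "a". The recovered phrases are "a", "aa", "a", which concatenate to "aaaa".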
*********************Implementation*********************
I implemented it in C++; the two functions Compress and Decompress perform encoding and decoding, respectively.
/************************************************************************/
/* File Name: LZW.cpp
*  @Function: Lossless Compression
*  @Author: Sophia Zhang
*  @Create Time: 2012-9-19 10:00
*  @Last Modify: 2012-9-19 11:10
*/
/************************************************************************/

#include <iostream>
#include <iterator>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>
using namespace std;

/************************************************************************/
/* Compress Module
*  input:
*    str    - the string to be compressed
*    result - output iterator that receives the compressed codes
*/
/************************************************************************/
template<typename TypeIterator>
TypeIterator Compress(const string& str, TypeIterator result)
{
    // Build the initial dictionary: one entry per single-byte string
    map<string,int> dictionary;
    int Dictsize = 256;
    for (int i = 0; i < Dictsize; i++)
        dictionary[string(1, static_cast<char>(i))] = i;

    string S;
    for (string::const_iterator it = str.begin(); it != str.end(); ++it)
    {
        char z = *it;
        if (dictionary.count(S + z))    // S+z is in the dictionary: extend S
            S += z;
        else                            // S+z is not in the dictionary
        {
            *result++ = dictionary[S];          // output the code of S
            dictionary[S + z] = Dictsize++;     // add S+z to the dictionary
            S = string(1, z);
        }
    }
    if (!S.empty())
        *result++ = dictionary[S];      // flush the last matched string
    return result;
}

/************************************************************************/
/* Decompress Module
*  input:
*    codes - container holding the compression result, to be decompressed
*/
/************************************************************************/
template<typename TypeContainer>
string Decompress(const TypeContainer& codes)
{
    // Build the initial inverse dictionary: code -> single-byte string
    map<int,string> inv_dictionary;
    int Dictsize = 256;
    for (int i = 0; i < Dictsize; i++)
        inv_dictionary[i] = string(1, static_cast<char>(i));

    string S;       // entry decoded from the previous code
    string entry;
    string res;
    for (typename TypeContainer::const_iterator it = codes.begin(); it != codes.end(); ++it)
    {
        int k = *it;
        if (inv_dictionary.count(k))
            entry = inv_dictionary[k];
        else if (k == Dictsize && !S.empty())   // code used right after it was created
            entry = S + S[0];
        else
            throw runtime_error("Bad compression code");
        res += entry;
        // The first code has no predecessor, so it adds no entry. This
        // replaces the earlier "Dictsize--" trick, which overwrote code 255.
        if (!S.empty())
            inv_dictionary[Dictsize++] = S + entry[0];
        S = entry;
    }
    return res;
}

int main()
{
    typedef vector<int> CodeList;
    CodeList compress_res;
    string S = "the/rain/in/Spain/falls/mainly/on/the/plain";
    Compress(S, std::back_inserter(compress_res));

    // copy(compress_res.begin(),compress_res.end(),std::ostream_iterator<int>(std::cout,","));
    // std::cout<<std::endl;

    // output the compressed result
    for (CodeList::iterator it = compress_res.begin(); it != compress_res.end(); ++it)
        cout << (*it) << endl;

    // decompress the compressed result
    string decompress = Decompress(compress_res);
    cout << decompress << endl;
    return 0;
}
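To illustrate why LZW suits highly repetitive input, the following sketch counts how many codes the encoder would emit for a given string, without storing them. It assumes the same 256-entry initial dictionary and no limit on dictionary size. `CountLzwCodes` is a hypothetical helper, not part of LZW.cpp.

```cpp
#include <cstddef>
#include <set>
#include <string>

// Hypothetical helper: returns the number of codes LZW would emit for `input`.
// Only membership matters for counting, so a set stands in for the dictionary.
std::size_t CountLzwCodes(const std::string& input) {
    std::set<std::string> dictionary;
    for (int i = 0; i < 256; ++i)
        dictionary.insert(std::string(1, static_cast<char>(i)));

    std::size_t codes = 0;
    std::string S;
    for (char z : input) {
        if (dictionary.count(S + z)) {
            S += z;                    // keep extending the current match
        } else {
            ++codes;                   // one code is emitted for S
            dictionary.insert(S + z);  // the new phrase becomes reusable
            S = std::string(1, z);
        }
    }
    if (!S.empty())
        ++codes;                       // final flush
    return codes;
}
```

For sixteen repeated characters, `CountLzwCodes(std::string(16, 'a'))` returns 6, because the matched phrases grow as "a", "aa", "aaa", and so on. For sixteen distinct characters such as "abcdefghijklmnop" it returns 16, one code per character, so nothing is gained.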
More study material on compression will continue to be posted; please follow this blog and Sina Weibo Sophia_qing.