LZW (Lempel-Ziv-Welch) was the first widely used universal data compression method on computers. It typically compresses large English texts to about half of their original size. LZW is still used today in GIF and PDF.
The basic idea: a sequence of adjacent input symbols is called a phrase. As the input stream is read, phrases are added to a table, and the indices of those phrases in the table form the output.
The table has two columns: the phrase and its index. Each phrase is composed of a prefix and a symbol: the prefix is an index in the table that references another phrase, and the symbol is appended to that phrase to form the new one.
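To make the prefix-plus-symbol layout concrete, here is a minimal Python sketch. The two-symbol starting alphabet, the `(prefix_index, symbol)` tuple layout, and the helper name `expand` are assumptions chosen for illustration; they are not taken from any particular implementation.

```python
# Each entry is (prefix_index, symbol). prefix_index is None for the
# single-symbol phrases the table starts with (an assumed convention).
table = [(None, "a"), (None, "b")]  # index 0 -> "a", index 1 -> "b"
table.append((0, "b"))              # index 2 -> phrase 0 + "b" = "ab"
table.append((2, "a"))              # index 3 -> phrase 2 + "a" = "aba"


def expand(table, index):
    """Rebuild the phrase at `index` by following prefix links to a root."""
    symbols = []
    while index is not None:
        prefix, symbol = table[index]
        symbols.append(symbol)
        index = prefix
    # Symbols were collected from last to first, so reverse them.
    return "".join(reversed(symbols))
```

Storing only a prefix index and one symbol keeps every entry the same size, at the cost of walking the prefix chain whenever the full phrase is needed.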
Encode Algorithm:
initialize table;
word <- NIL;
while (there is input)
{
    symbol <- next symbol from input;
    phrase <- word + symbol;
    if (phrase exists in the table)
    {
        word <- phrase;
    }
    else
    {
        output (index(word));
        add phrase to the table;
        word <- symbol;
    }
}
output (index(word));
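The encoder pseudocode above can be sketched in Python as follows. This is an illustration under stated assumptions, not the PDF or GIF variant: it assumes the table starts with all 256 single-byte phrases, it emits plain integers rather than packed variable-width codes, and it never resets the table. The function name `lzw_encode` is hypothetical.

```python
def lzw_encode(data: bytes) -> list:
    """Encode bytes into a list of table indices, following the pseudocode."""
    # Assumption: the initial table holds every single-byte phrase (0..255).
    table = {bytes([i]): i for i in range(256)}
    word = b""
    codes = []
    for value in data:
        symbol = bytes([value])
        phrase = word + symbol
        if phrase in table:
            # Keep extending the current match.
            word = phrase
        else:
            # Emit the longest known phrase, then learn the new one.
            codes.append(table[word])
            table[phrase] = len(table)
            word = symbol
    if word:
        # Flush the last pending phrase (skipped for empty input).
        codes.append(table[word])
    return codes
```

For example, `lzw_encode(b"ABABABA")` returns `[65, 66, 256, 258]`: seven input bytes become four codes, because `AB` (256) and `ABA` (258) are added to the table while encoding and then reused.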
Decode Algorithm:
initialize table;
phrase <- NIL;
while (there is input)
{
    wordIndex <- next code from input;
    if (wordIndex exists in the table)
    {
        word <- table[wordIndex];
        phrase <- phrase + head(word);
        if (phrase.Length > 1) // false only for the very first code
        {
            add phrase to the table;
        }
    }
    else
    {
        // the code refers to the phrase that is about to be added
        phrase <- phrase + head(phrase);
        add phrase to the table;
        word <- phrase; // phrase is now table[wordIndex]
    }
    phrase <- word;
    output (word);
}
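A matching Python sketch of the decoder is shown below, under the same assumptions as before: a 256-entry initial table, plain integer codes, and no table reset. The function name `lzw_decode` is hypothetical. Instead of testing the phrase length, it checks whether a previous phrase exists, which has the same effect of skipping the table update on the first code.

```python
def lzw_decode(codes) -> bytes:
    """Decode a sequence of table indices back into the original bytes."""
    # Assumption: the initial table holds every single-byte phrase (0..255).
    table = [bytes([i]) for i in range(256)]
    phrase = b""  # the previously decoded word; empty before the first code
    out = []
    for code in codes:
        if code < len(table):
            word = table[code]
            if phrase:
                # Previous word plus the first symbol of the current word.
                table.append(phrase + word[:1])
        elif code == len(table) and phrase:
            # The code refers to the entry that is about to be added, so it
            # must be the previous word plus that word's own first symbol.
            word = phrase + phrase[:1]
            table.append(word)
        else:
            raise ValueError(f"invalid LZW code: {code}")
        out.append(word)
        phrase = word
    return b"".join(out)
```

Decoding `[65, 66, 256, 258]` yields `b"ABABABA"`. The last code, 258, is not yet in the table when it arrives, so it exercises the second branch.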
I implemented the algorithm in C# following the PDF Reference, which specifies additional encoding details: