Text Compression Notes (1): Overview

1. Huffman coding: about five bits per character.

 

2. Ziv-Lempel coding and arithmetic coding both use adaptive compression.

 

Of the two, arithmetic coding is the more fundamental: it makes possible a whole class of adaptive compression methods built on top of it (a toy arithmetic coder is sketched after this list).

 

Ziv-Lempel coding: about four bits per character.

 

Arithmetic coding: over two bits per character, but both compression and decompression are slower and use more memory.

 

3. PPM (Prediction by Partial Matching) is based on arithmetic coding.

 

4. Compression methods can be divided into two classes: symbolwise methods and dictionary methods.

 

Symbolwise methods are usually based on either Huffman coding or arithmetic coding; they differ mainly in how they estimate probabilities for symbols.

 

Dictionary methods generally use quite simple representations to code references to entries in the dictionary. The most significant dictionary methods are based on Ziv-Lempel coding, which uses the idea of replacing strings of characters with a reference to a previous occurrence of the string.
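
A toy illustration of this string-replacement idea in Python (a greedy, quadratic search kept deliberately simple, not any particular Ziv-Lempel variant; the names lz_compress/lz_decompress are made up for this sketch):

def lz_compress(text, window=4096, min_match=3):
    """Toy Ziv-Lempel-style compressor: emit either a literal character
    or a (distance, length) reference to an earlier occurrence."""
    i, out = 0, []
    while i < len(text):
        start = max(0, i - window)
        best_len, best_dist = 0, 0
        # Greedy search for the longest earlier match (quadratic, for clarity only).
        for j in range(start, i):
            length = 0
            while (i + length < len(text)
                   and text[j + length] == text[i + length]
                   and j + length < i):
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
        if best_len >= min_match:
            out.append(("ref", best_dist, best_len))   # reference to a previous occurrence
            i += best_len
        else:
            out.append(("lit", text[i]))               # literal character
            i += 1
    return out

def lz_decompress(tokens):
    buf = []
    for tok in tokens:
        if tok[0] == "lit":
            buf.append(tok[1])
        else:
            _, dist, length = tok
            for _ in range(length):
                buf.append(buf[-dist])
    return "".join(buf)

assert lz_decompress(lz_compress("abracadabra abracadabra")) == "abracadabra abracadabra"

Real Ziv-Lempel coders find matches with hash tables or tries over a sliding window rather than this quadratic scan, and pack the references into bits rather than Python tuples.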

 
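Since items 2 and 3 above lean on arithmetic coding, here is a toy float-based arithmetic coder in Python (short messages only; the function names and the fixed probability table are assumptions for this sketch, not anything from the text):

def arithmetic_encode(text, probs):
    """Toy arithmetic encoder: narrow the interval [low, high) once per
    symbol; any number inside the final interval identifies the message."""
    cum, c = {}, 0.0
    for sym, p in probs.items():        # cumulative interval for each symbol
        cum[sym] = (c, c + p)
        c += p
    low, high = 0.0, 1.0
    for sym in text:
        lo, hi = cum[sym]
        span = high - low
        low, high = low + span * lo, low + span * hi
    return (low + high) / 2             # any value inside the final interval

def arithmetic_decode(value, length, probs):
    cum, c = {}, 0.0
    for sym, p in probs.items():
        cum[sym] = (c, c + p)
        c += p
    out, low, high = [], 0.0, 1.0
    for _ in range(length):
        span = high - low
        for sym, (lo, hi) in cum.items():
            if low + span * lo <= value < low + span * hi:
                out.append(sym)
                low, high = low + span * lo, low + span * hi
                break
    return "".join(out)

probs = {"a": 0.5, "b": 0.25, "c": 0.25}
x = arithmetic_encode("abac", probs)
assert arithmetic_decode(x, 4, probs) == "abac"

Real arithmetic coders use incremental integer arithmetic and emit bits as the interval narrows; the float version above is only meant to show how a single number can stand for a whole message.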

2.2 Adaptive models

A method that always uses the same model, regardless of what text is being coded, is called static modeling.

A static model compresses poorly when the text being coded does not match the statistics the model was built from. One solution is to generate a model specifically for each file that is to be compressed. An initial pass is made through the file to estimate symbol probabilities, and these are transmitted to the decoder before the encoded symbols. This approach is called semi-static modeling.
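
A minimal sketch of the semi-static first pass in Python (the function name is made up): count symbols over the whole file; the resulting table is what would be transmitted to the decoder ahead of the coded symbols.

from collections import Counter

def semi_static_model(text):
    """First pass: estimate symbol probabilities from the whole file."""
    counts = Counter(text)
    total = sum(counts.values())
    return {sym: n / total for sym, n in counts.items()}

probs = semi_static_model("mississippi")
# probs == {'m': 1/11, 'i': 4/11, 's': 4/11, 'p': 2/11}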

Semi-static modeling, however, requires two passes over the text and the extra cost of transmitting the model. Adaptive modeling is an elegant solution to these problems: an adaptive model begins with a bland probability distribution and gradually refines it as more symbols are encountered.

The zero-frequency problem (what probability to give a symbol that has not yet been seen) has two common solutions. One is to allow one extra count, which is divided evenly among any symbols that have not been observed in the input. Another possibility is to artificially inflate the count of every character in the alphabet by one, thereby ensuring that none has a zero frequency.
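
A small Python sketch of an adaptive order-0 model using the second fix (every count starts at one); the class and its methods are illustrative, not from the text:

class AdaptiveModel:
    """Starts with a bland (uniform) distribution and refines it as
    symbols are seen. Counts start at 1, so no symbol ever has zero
    probability (the add-one fix for the zero-frequency problem)."""
    def __init__(self, alphabet):
        self.counts = {sym: 1 for sym in alphabet}
        self.total = len(alphabet)

    def prob(self, sym):
        return self.counts[sym] / self.total

    def update(self, sym):
        self.counts[sym] += 1
        self.total += 1

model = AdaptiveModel("abc")
for s in "aab":
    p = model.prob(s)   # probability the coder would use for this symbol
    model.update(s)     # the decoder makes the same update after decoding it

Because the decoder repeats exactly the same updates after decoding each symbol, both sides keep identical probability estimates without any table being transmitted.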

2.3 Huffman coding

Coding is the task of determining the output representation of a symbol, based on
a probability distribution supplied by a model.
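
As a concrete example, a minimal Huffman code construction in Python, built from a probability table such as the one a semi-static or adaptive model would supply:

import heapq

def huffman_code(probs):
    """Build a prefix code from {symbol: probability} by repeatedly
    merging the two least probable nodes."""
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    if len(heap) == 1:                     # degenerate one-symbol alphabet
        return {sym: "0" for sym in probs}
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)    # two least probable subtrees
        p2, i2, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (p1 + p2, i2, merged))
    return heap[0][2]

code = huffman_code({"a": 0.5, "b": 0.25, "c": 0.25})
# e.g. {'a': '0', 'b': '10', 'c': '11'}

More probable symbols get shorter codewords, and no codeword is a prefix of another.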
