Choosing a Data Compression Method for HBase

Official documentation: http://hbase.apache.org/book.html#_which_compressor_or_data_block_encoder_to_use


The compression or codec type to use depends on the characteristics of your data. Choosing the wrong type could cause your data to take more space rather than less, and can have performance implications.
In general, you need to weigh your options between smaller size and faster compression/decompression. Following are some general guidelines, expanded from a discussion at Documenting Guidance on compression and codecs.
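
The claim that a poorly chosen compressor can make data larger rather than smaller can be checked with a small sketch. It uses the JDK's built-in GZIP stream, not HBase's own codec path, and the class name `CompressionSizeDemo`, the helper names, and the 100,000-byte sample size are all assumptions made for illustration. Pseudo-random bytes stand in for values that are already compressed.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Random;
import java.util.zip.GZIPOutputStream;

// Illustrative only: compares GZIP output size for compressible and incompressible input.
class CompressionSizeDemo {

    // Returns the number of bytes produced by GZIP-compressing the input.
    static int gzipSize(byte[] input) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Closing the GZIP stream flushes the trailer before the size is read.
        try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
            gzip.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.size();
    }

    // Highly repetitive bytes, standing in for easily compressible values.
    static byte[] repetitiveData(int length) {
        byte[] data = new byte[length];
        Arrays.fill(data, (byte) 'a');
        return data;
    }

    // Pseudo-random bytes, standing in for values that are already compressed.
    static byte[] randomData(int length) {
        byte[] data = new byte[length];
        new Random(42).nextBytes(data);
        return data;
    }

    public static void main(String[] args) {
        int length = 100_000;
        System.out.println("repetitive: " + length + " -> " + gzipSize(repetitiveData(length)) + " bytes");
        System.out.println("random:     " + length + " -> " + gzipSize(randomData(length)) + " bytes");
    }
}
```

The repetitive input shrinks to a tiny fraction of its size, while the random input comes out slightly larger than it went in, because the compressor adds framing overhead without finding anything to remove.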

  • If you have long keys (compared to the values) or many columns, use a prefix encoder. FAST_DIFF is recommended, as more testing is needed for Prefix Tree encoding.

  • If the values are large (and not precompressed, such as images), use a data block compressor.

  • Use GZIP for cold data, which is accessed infrequently. GZIP compression uses more CPU resources than Snappy or LZO, but provides a higher compression ratio.
    Note: GZIP suits cold data. Compared with Snappy and LZO it achieves a higher compression ratio, but it also consumes more CPU.

  • Use Snappy or LZO for hot data, which is accessed frequently. Snappy and LZO use fewer CPU resources than GZIP, but do not provide as high a compression ratio.

  • In most cases, enabling Snappy or LZO by default is a good choice, because they have a low performance overhead and provide space savings.

  • Before Google released Snappy in 2011, LZO was the default. Snappy has qualities similar to LZO's but has been shown to perform better.
    Note: before Snappy appeared, LZO was the usual default choice. Because Snappy performs better, it has since become the preferred one.
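
The guidelines above can be condensed into a small decision helper. This is only a sketch of one way to read them: `CompressionAdvisor` and its method and parameter names are hypothetical and not part of HBase, and returning `NONE` for precompressed values is an assumption, since the guidelines only say that a block compressor is meant for values that are not already compressed.

```java
// Illustrative only: encodes the guidelines above as two simple rules.
class CompressionAdvisor {

    // Long keys (relative to values) or many columns: use a prefix-style
    // data block encoder, for which the guidelines recommend FAST_DIFF.
    static String recommendEncoding(boolean longKeysOrManyColumns) {
        return longKeysOrManyColumns ? "FAST_DIFF" : "NONE";
    }

    // Hot data: Snappy (low CPU cost). Cold data: GZIP (higher ratio, more CPU).
    // Assumption: values that are already compressed get no block compressor.
    static String recommendCompression(boolean precompressedValues, boolean hotData) {
        if (precompressedValues) {
            return "NONE";
        }
        return hotData ? "SNAPPY" : "GZ";
    }

    public static void main(String[] args) {
        System.out.println("hot, long keys:  " + recommendCompression(false, true)
                + " + " + recommendEncoding(true));
        System.out.println("cold, short keys: " + recommendCompression(false, false)
                + " + " + recommendEncoding(false));
    }
}
```

The returned strings are chosen to match the names HBase uses for a column family's `COMPRESSION` and `DATA_BLOCK_ENCODING` attributes, so the result could be carried over when creating or altering a table. LZO is left out of the sketch only because the guidelines treat it as interchangeable with Snappy for hot data.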


Configuring Snappy compression in HBase: http://blog.csdn.net/maomaosi2009/article/details/47019913
