How many column families are appropriate in HBase?

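The main point here is that when designing an HBase schema you should try to use only one column family. The reason comes down to flushes and compactions, both of which are triggered at the Region level. When one column family holds a large amount of data and triggers a flush, the memstores of every other column family in the same region are flushed too, even though those memstores may hold very little data and do not need flushing yet. Compaction, in turn, is triggered when the number of store files (not their total size) reaches a threshold, and the many small store files produced by these flushes will usually lead to compactions. Flushes and compactions generate a lot of I/O load, which has a significant impact on overall HBase performance, so choosing an appropriate number of column families matters.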
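The interaction described above can be illustrated with a small toy model. This is only a sketch, not HBase code: the `Region` and `Store` classes, the `FLUSH_THRESHOLD` and `COMPACTION_MIN_FILES` values, and the rule that a flush fires when the region's total memstore size reaches the threshold are all simplifying assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical values for this toy model, not HBase defaults.
FLUSH_THRESHOLD = 128      # region-wide memstore size that triggers a flush
COMPACTION_MIN_FILES = 3   # store-file count that triggers a compaction


@dataclass
class Store:
    """One column family inside a region: a memstore plus its store files."""
    memstore: int = 0
    store_files: list = field(default_factory=list)
    compactions: int = 0


class Region:
    def __init__(self, families):
        self.stores = {name: Store() for name in families}
        self.flushes = 0

    def put(self, family, size):
        """Buffer a write, then flush the whole region if it is over the limit."""
        self.stores[family].memstore += size
        total = sum(s.memstore for s in self.stores.values())
        if total >= FLUSH_THRESHOLD:
            self.flush()

    def flush(self):
        """Region-level flush: every family with buffered data writes a file."""
        self.flushes += 1
        for store in self.stores.values():
            if store.memstore > 0:
                store.store_files.append(store.memstore)
                store.memstore = 0
            # Compaction is driven by file count, not by file size.
            if len(store.store_files) >= COMPACTION_MIN_FILES:
                store.store_files = [sum(store.store_files)]
                store.compactions += 1


# One family carries almost all of the data, the other almost none.
region = Region(["big", "small"])
for _ in range(6):
    region.put("big", 127)
    region.put("small", 1)

for name, store in region.stores.items():
    print(name, "files:", store.store_files, "compactions:", store.compactions)
print("region flushes:", region.flushes)
```

In this run the `small` family is flushed six times and compacted twice although it only ever received 6 units of data. All of that extra I/O is caused by the `big` family sharing the same region.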

 

The original English text on this topic follows:

 

HBase currently does not do well with anything above two or three column families so keep the number of column families in your schema low. Currently, flushing and compactions are done on a per Region basis so if one column family is carrying the bulk of the data bringing on flushes, the adjacent families will also be flushed though the amount of data they carry is small. Compaction is currently triggered by the total number of files under a column family. It's not size based. When many column families exist the flushing and compaction interaction can make for a bunch of needless i/o loading (To be addressed by changing flushing and compaction to work on a per column family basis).

 

Try to make do with one column family if you can in your schemas. Only introduce a second and third column family in the case where data access is usually column scoped; i.e. you query one column family or the other but usually not both at the one time.

 

 
