HBase: The Definitive Guide - reading notes on the Advanced Usage chapter

Reading notes on advanced HBase usage.


2. Two HBase table designs: tall-narrow and flat-wide.
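One way to picture the difference, as a minimal sketch: plain strings stand in for row keys and column qualifiers, and the class name, method names, and the user/message example are illustrative assumptions, not HBase API.

```java
// Hypothetical helper: strings stand in for HBase row keys and qualifiers.
final class TableDesignSketch {
    // Flat-wide: one row per user, one column per message.
    // The row key is just the user id; each message id becomes a column qualifier.
    static String wideRowKey(String userId) {
        return userId;
    }

    static String wideQualifier(String messageId) {
        return messageId;
    }

    // Tall-narrow: one row per (user, message) pair with a single fixed column.
    // The message id moves into the row key, so the table has many short rows.
    static String tallRowKey(String userId, String messageId) {
        return userId + "-" + messageId;
    }
}
```

With the tall-narrow layout, all rows of one user share the prefix `<userId>-`, which is what makes a range scan over a single user possible.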





3. Partial key scan: set the start row and the end row to query a contiguous range of row keys. (Using startrow + 1 as the end row is a handy trick.)
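The "startrow + 1" trick can be sketched in plain Java, without the HBase client. The helper name `stopRowForPrefix` is an assumption made for this note. It computes the smallest key that sorts after every key beginning with the given prefix, which can then serve as the exclusive end row of a scan.

```java
import java.util.Arrays;

// Hypothetical helper, not part of the HBase API.
final class PartialKeyScanSketch {
    // Returns the exclusive stop row for all keys that start with prefix:
    // drop trailing 0xFF bytes (they cannot be incremented), then add 1
    // to the last remaining byte.
    static byte[] stopRowForPrefix(byte[] prefix) {
        int end = prefix.length;
        while (end > 0 && prefix[end - 1] == (byte) 0xFF) {
            end--;
        }
        if (end == 0) {
            // Every byte was 0xFF: there is no upper bound.
            // An empty stop row means "scan to the end of the table".
            return new byte[0];
        }
        byte[] stop = Arrays.copyOf(prefix, end); // copy, so the input is untouched
        stop[end - 1]++;
        return stop;
    }
}
```

For a start row of `12345-`, the computed stop row is `12345.`, since `.` is the byte right after `-`.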


4. Pagination: paging can be implemented with a partial key scan. The idea is to first fix a range by setting a start key and a stop key, and then, on the client side
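A minimal sketch of paging over a key range follows. A sorted in-memory map stands in for the table, and the class and method names are made up for this note. The client-side step (skip `offset` rows, then keep `limit` rows) is an assumption about how the range is turned into a page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;

// Hypothetical sketch: a NavigableMap plays the role of a sorted HBase table.
final class PaginationSketch {
    // Returns the row keys of one page inside [startKey, stopKey).
    static List<String> page(NavigableMap<String, String> table,
                             String startKey, String stopKey,
                             int offset, int limit) {
        List<String> rows = new ArrayList<>();
        int seen = 0;
        // subMap mimics a scan with an inclusive start and an exclusive stop key.
        for (String key : table.subMap(startKey, true, stopKey, false).keySet()) {
            if (seen++ < offset) {
                continue; // still skipping rows that belong to earlier pages
            }
            if (rows.size() >= limit) {
                break; // page is full, stop reading
            }
            rows.add(key);
        }
        return rows;
    }
}
```

Skipping rows on the client gets slower as the offset grows, since every skipped row is still read.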







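// Salted row key: a prefix derived from the timestamp hash spreads sequential writes across region servers.
byte prefix = (byte) (Long.hashCode(timestamp) % <number of regionservers>);
byte[] rowkey = Bytes.add(Bytes.toBytes(prefix), Bytes.toBytes(timestamp));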
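The two lines above can be turned into a self-contained sketch that uses only the JDK. The class name, the method name, and the `numRegionServers` parameter are assumptions made for this note. It uses `Math.floorMod` instead of `%` because a negative hash would otherwise give a negative prefix.

```java
import java.nio.ByteBuffer;

// Hypothetical helper that builds a salted row key without the HBase Bytes utility.
final class SaltedKeySketch {
    // Assumes numRegionServers is between 1 and 128 so the prefix fits in one byte.
    static byte[] saltedRowKey(long timestamp, int numRegionServers) {
        // floorMod keeps the prefix in [0, numRegionServers) even for negative hashes.
        byte prefix = (byte) Math.floorMod(Long.hashCode(timestamp), numRegionServers);
        // One prefix byte followed by the 8-byte big-endian timestamp.
        return ByteBuffer.allocate(1 + Long.BYTES)
                .put(prefix)
                .putLong(timestamp)
                .array();
    }
}
```

The cost of salting is on the read side: a time-range query has to scan once per prefix value and merge the results.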












6. Bloom filter:


Type      Description
NONE      Disables the filter (default)
ROW       Use the row key for the filter
ROWCOL    Use the row key and column key (family+qualifier) for the filter


The final question is whether to use a row or a row+column Bloom filter. The answer depends on your usage pattern. If you are doing only row scans, having the more specific row+column filter will not help at all: having a row-level Bloom filter enables you to narrow down the number of files that need to be checked, even when you do row+column read operations, but not the other way around.

The row+column Bloom filter is useful when you cannot batch updates for a specific row, and end up with store files which all contain parts of the row. The more specific row+column filter can then identify which of the files contain the data you are requesting. Obviously, if you always load the entire row, this filter is once again hardly useful, as the region server will need to load the matching block out of each file anyway.

Since the row+column filter will require more storage, you need to do the math to determine whether it is worth the extra resources. It is also interesting to know that there is a maximum number of elements a Bloom filter can hold. If you have too many cells in your store file, you might exceed that number and would need to fall back to the row-level filter.

Depending on your use case, it may be useful to enable Bloom filters, to increase the overall performance of your system. If possible, you should try to use the row-level Bloom filter, as it strikes a good balance between the additional space requirements and the gain in performance coming from its store file selection filtering. Only resort to the more costly row+column Bloom filter when you would otherwise gain no advantage from using the row-level one.


In other words, if you are going to use a Bloom filter, try to apply it at the row level rather than on row+column together. The third option, ROWCOL, will



If you only do scans, the row-level Bloom filter is enough; the row+column one is of no use at all.
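To make the ROW versus ROWCOL distinction concrete, here is a toy Bloom filter in plain Java. It is a sketch for illustration only, not how HBase implements its filters. The class name, the hash scheme, and the `row/family:qualifier` key format are all assumptions. The only difference between the two modes is which key is inserted per cell.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.BitSet;

// Toy Bloom filter; one instance would correspond to one store file.
final class BloomSketch {
    private final BitSet bits;
    private final int size;
    private final int hashes;

    BloomSketch(int size, int hashes) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashes = hashes;
    }

    // Double hashing: derive the i-th bit position from two base hashes.
    private int index(String key, int i) {
        int h1 = Arrays.hashCode(key.getBytes(StandardCharsets.UTF_8));
        int h2 = Integer.rotateLeft(h1 * 0x9E3779B9, 15) | 1;
        return Math.floorMod(h1 + i * h2, size);
    }

    void add(String key) {
        for (int i = 0; i < hashes; i++) {
            bits.set(index(key, i));
        }
    }

    // false means "definitely absent"; true means "possibly present".
    boolean mightContain(String key) {
        for (int i = 0; i < hashes; i++) {
            if (!bits.get(index(key, i))) {
                return false;
            }
        }
        return true;
    }

    // ROW mode: one filter entry per row.
    static String rowKey(String row) {
        return row;
    }

    // ROWCOL mode: one filter entry per cell, so the filter holds more entries.
    static String rowColKey(String row, String family, String qualifier) {
        return row + "/" + family + ":" + qualifier;
    }
}
```

A file whose filter answers false for a key can be skipped entirely. In ROWCOL mode the filter has one entry per cell instead of one per row, which is where the extra storage mentioned above comes from.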
