(That said, the table schema was designed around our requirements: one of my tables keeps each day's data in its own column family, and 99% of each day's put operations land in a single family, so the MemStore size of every other Store stays at 0.)
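As a rough illustration, here is a minimal sketch of how such a per-day-column-family table might be created with the 0.90-era client API. The table name matches the logs below; everything else is made up for illustration:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class CreatePerDayTable {
    public static void main(String[] args) throws Exception {
        HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
        // One column family per day of the month, e.g. log_1 .. log_31,
        // matching the log_31 family that shows up in the flush logs below.
        HTableDescriptor desc = new HTableDescriptor("acookie_log_201110");
        for (int day = 1; day <= 31; day++) {
            desc.addFamily(new HColumnDescriptor("log_" + day));
        }
        admin.createTable(desc);
    }
}

With this layout, on any given day writes hit only that day's Store, which is why only one MemStore in the region ever grows.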
Let's look at what the regionserver log records during a flush:
2011-11-01 00:00:09,737 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Flush requested on acookie_log_201110,\x08,1317949412354.9ca121717e4e8545d3d6b5806b6dccb0.
2011-11-01 00:00:09,737 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for acookie_log_201110,\x08,1317949412354.9ca121717e4e8545d3d6b5806b6dccb0., current region memstore size 64.0m
2011-11-01 00:00:10,606 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Finished snapshotting, commencing flushing stores
2011-11-01 00:00:12,432 INFO org.apache.hadoop.hbase.regionserver.Store: Renaming flushed file at hdfs://peckermaster:9000/hbase-pecker/acookie_log_201110/9ca121717e4e8545d3d6b5806b6dccb0/.tmp/3648164384178649648 to hdfs://peckermaster:9000/hbase-pecker/acookie_log_201110/9ca121717e4e8545d3d6b5806b6dccb0/log_31/6967914182982470371
2011-11-01 00:00:12,448 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://peckermaster:9000/hbase-pecker/acookie_log_201110/9ca121717e4e8545d3d6b5806b6dccb0/log_31/6967914182982470371, entries=65501, sequenceid=4492353, memsize=64.4m, filesize=14.9m
2011-11-01 00:00:12,451 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~64.4m for region acookie_log_201110,\x08,1317949412354.9ca121717e4e8545d3d6b5806b6dccb0. in 2714ms, sequenceid=4492353, compaction requested=true
Next: if these 14.9m files keep accumulating, what more will HBase do to the stores under this region? Stay tuned for the next post, "HBase Region Operations in Practice: StoreFile Compaction".
P.S.: a supplement on the flush code path.
It starts from HRegion's public void put(Put put, Integer lockid, boolean writeToWAL).
1) Inside put, private void checkResources() is called first. It checks whether the region's memstore size has exceeded the blocking size (hbase.hregion.memstore.flush.size * hbase.hregion.memstore.block.multiplier); if it has, updates are blocked and a flush happens first.
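A condensed sketch of that blocking check, assuming 0.90-era defaults (64m flush size, multiplier 2); the class, field, and helper names here are illustrative, not the exact HBase source:

import java.util.concurrent.atomic.AtomicLong;
import org.apache.hadoop.conf.Configuration;

class CheckResourcesSketch {
    private final AtomicLong memstoreSize = new AtomicLong(); // region's total memstore bytes
    private final long blockingSize;

    CheckResourcesSketch(Configuration conf) {
        long flushSize = conf.getLong("hbase.hregion.memstore.flush.size", 64L * 1024 * 1024);
        int multiplier = conf.getInt("hbase.hregion.memstore.block.multiplier", 2);
        this.blockingSize = flushSize * multiplier;
    }

    // Called at the top of put(): while the memstore is over the blocking size,
    // ask for a flush and make the writer wait before applying its update.
    synchronized void checkResources() throws InterruptedException {
        while (memstoreSize.get() > blockingSize) {
            requestFlush();
            wait(1000); // re-check once the flusher has had a chance to run
        }
    }

    void requestFlush() { /* enqueue this region with the MemStoreFlusher (see step 3) */ }
}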
2) The put's familyMap is extracted and private void put(final Map<byte [], List<KeyValue>> familyMap, boolean writeToWAL) is called. This function applies the familyMap's contents to the memstore and then checks whether the memstore size exceeds hbase.hregion.memstore.flush.size; if it does, it triggers HRegion::requestFlush.
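Roughly, the inner put looks like this (a simplified sketch with the WAL append omitted; getStore, memstoreSize, memstoreFlushSize, and requestFlush stand in for the region's actual members):

// Apply every KeyValue in the familyMap to its Store's memstore, then request
// a flush if the region's total memstore size crosses the flush threshold.
private void put(final Map<byte[], List<KeyValue>> familyMap, boolean writeToWAL) {
    long addedSize = 0;
    for (Map.Entry<byte[], List<KeyValue>> entry : familyMap.entrySet()) {
        Store store = getStore(entry.getKey());
        for (KeyValue kv : entry.getValue()) {
            addedSize += store.add(kv); // returns the heap size this KV added
        }
    }
    if (this.memstoreSize.addAndGet(addedSize) > this.memstoreFlushSize) {
        requestFlush(); // hand the region off to the MemStoreFlusher queue
    }
}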
3) HRegion::requestFlush triggers MemStoreFlusher::requestFlush, which adds a flush entry for this region to MemStoreFlusher's flush queue.
4) MemStoreFlusher is a background thread; in its run() method it keeps popping entries that need flushing off this queue.
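The queue itself is a classic producer/consumer setup. A minimal self-contained sketch of the pattern (FlushJob is a stand-in for the real queue entry type, not an HBase class):

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class FlusherSketch implements Runnable {
    private final BlockingQueue<FlushJob> flushQueue = new LinkedBlockingQueue<FlushJob>();
    private volatile boolean stopped = false;

    // Called from the write path (step 3): enqueue a region that needs flushing.
    void requestFlush(FlushJob job) {
        flushQueue.offer(job);
    }

    // Background thread: drain the queue and flush one region at a time.
    public void run() {
        while (!stopped) {
            try {
                FlushJob job = flushQueue.poll(1, TimeUnit.SECONDS); // wake up periodically
                if (job != null) {
                    job.flush(); // flushRegion -> HRegion::flushcache
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    interface FlushJob { void flush(); }
}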
5) Once an entry (fre) is dequeued, MemStoreFlusher::flushRegion is called; it calls HRegion::flushcache, which in turn calls HRegion::internalFlushcache. This function is the key point; simplified, it looks like this:
Inside internalFlushcache:
1) for (Store s : stores.values()) {
       storeFlushers.add(s.getStoreFlusher(completeSequenceId));
   }
2) // prepare flush (take a snapshot)
   for (StoreFlusher flusher : storeFlushers) {
       flusher.prepare();
   }
   Iterate over all of the region's stores, get each store's StoreFlusher, and take a snapshot of its memstore.
3) for (StoreFlusher flusher : storeFlushers) {
       flusher.flushCache();
   }
The StoreFlusher implementation here is StoreFlusherImpl; flushCache() ends up in internalFlushCache in Store.java, which performs each Store's actual flush (see the sketch after this walkthrough).
4) for (StoreFlusher flusher : storeFlushers) {
       boolean needsCompaction = flusher.commit();
       if (needsCompaction) {
           compactionRequested = true;
       }
   }
flusher.commit() evaluates return this.storefiles.size() >= this.compactionThreshold; if any store's StoreFile count has reached the configured threshold, that store needs a compaction and internalFlushcache returns true.
The returned compactionRequested propagates back up to MemStoreFlusher::flushRegion, telling the flush thread whether a compaction should follow the flush; if so, it calls this.server.compactSplitThread.requestCompaction to kick one off.
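Putting steps 2)-4) together for a single Store, here is a condensed, hypothetical view of the prepare/flushCache/commit sequence. The helper names are invented, but the phases map onto MemStore's snapshot and Store's internal flush, and the .tmp-then-rename step corresponds to the "Renaming flushed file" line in the log at the top of this post:

import java.io.IOException;
import java.util.SortedSet;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.KeyValue;

abstract class StoreFlushSketch {
    // One Store's flush, condensed. Returns true if this Store now has enough
    // StoreFiles to warrant a compaction (what flusher.commit() reports).
    boolean flushOneStore(long sequenceId) throws IOException {
        SortedSet<KeyValue> snapshot = snapshotMemstore();            // prepare(): freeze the live memstore
        Path tmpFile = writeSnapshotToTmpHFile(snapshot, sequenceId); // flushCache(): write KVs under .tmp
        renameIntoFamilyDir(tmpFile);                                 // commit(): the "Renaming flushed file" log line
        return storefileCount() >= compactionThreshold();             // drives compactionRequested
    }

    // Hypothetical stand-ins for MemStore/Store internals:
    abstract SortedSet<KeyValue> snapshotMemstore();
    abstract Path writeSnapshotToTmpHFile(SortedSet<KeyValue> kvs, long seqId) throws IOException;
    abstract void renameIntoFamilyDir(Path tmpFile) throws IOException;
    abstract int storefileCount();
    abstract int compactionThreshold();
}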