今天同事问到hbase中in-memory属性的作用,以前没有注意过,今天仔细看了下代码:
// Instantiate priority buckets
BlockBucket bucketSingle = new BlockBucket(bytesToFree, blockSize,
singleSize());
BlockBucket bucketMulti = new BlockBucket(bytesToFree, blockSize,
multiSize());
BlockBucket bucketMemory = new BlockBucket(bytesToFree, blockSize,
memorySize());
// Scan entire map putting into appropriate buckets
for(CachedBlock cachedBlock : map.values()) {
switch(cachedBlock.getPriority()) {
case SINGLE: {
bucketSingle.add(cachedBlock);
break;
}
case MULTI: {
bucketMulti.add(cachedBlock);
break;
}
case MEMORY: {
bucketMemory.add(cachedBlock);
break;
}
}
}
PriorityQueue<BlockBucket> bucketQueue =
new PriorityQueue<BlockBucket>(3);
bucketQueue.add(bucketSingle);
bucketQueue.add(bucketMulti);
bucketQueue.add(bucketMemory);
int remainingBuckets = 3;
long bytesFreed = 0;
BlockBucket bucket;
while((bucket = bucketQueue.poll()) != null) {
long overflow = bucket.overflow();
if(overflow > 0) {
long bucketBytesToFree = Math.min(overflow,
(bytesToFree - bytesFreed) / remainingBuckets);
bytesFreed += bucket.free(bucketBytesToFree);
}
remainingBuckets--;
}
hbase内部的blockcache分三个队列:single、multi以及memory,分别占用25%,50%,25%的大小。这涉及到family属性中的in-memory选项,默认是false。
设为false的话,第一次访问到该数据时,会将它写入single队列,否则写入memory队列。当再次访问该数据并且在single中读到了该数据时,single会升级为multi
这三个队列其实是在共用blockcache的资源,区别是在LRU淘汰数据时,single会优先淘汰,其次为multi,最后为memory。
所以结论有两点:
1 同一个family不会占用全部的blockcache资源
2 当某些family特别重要时,可以将它的in-memory设为true,单独使用一个缓存队列,保证cache的优先使用