2011-04-18 14:50:22,942 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 6 on 60020' on region data1,70712707089004,1303109282990.da1eccd9d9ebf0f8bfe1116fe7046763.: memstore size 128.1m is >= than blocking 128.0m size
2011-04-18 14:50:22,944 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 4 on 60020' on region data1,70712707089004,1303109282990.da1eccd9d9ebf0f8bfe1116fe7046763.: memstore size 128.3m is >= than blocking 128.0m size
2011-04-18 14:50:22,955 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 8 on 60020' on region data1,70712707089004,1303109282990.da1eccd9d9ebf0f8bfe1116fe7046763.: memstore size 128.3m is >= than blocking 128.0m size
2011-04-18 14:50:22,955 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 7 on 60020' on region data1,70712707089004,1303109282990.da1eccd9d9ebf0f8bfe1116fe7046763.: memstore size 128.3m is >= than blocking 128.0m size
Searching the source code, I found the following snippet in HRegion:
  private void checkResources() {
    // If catalog region, do not impose resource constraints or block updates.
    if (this.getRegionInfo().isMetaRegion()) return;

    boolean blocked = false;
    while (this.memstoreSize.get() > this.blockingMemStoreSize) {
      requestFlush();
      if (!blocked) {
        LOG.info("Blocking updates for '" + Thread.currentThread().getName() +
          "' on region " + Bytes.toStringBinary(getRegionName()) +
          ": memstore size " +
          StringUtils.humanReadableInt(this.memstoreSize.get()) +
          " is >= than blocking " +
          StringUtils.humanReadableInt(this.blockingMemStoreSize) + " size");
      }
      blocked = true;
      synchronized(this) {
        try {
          wait(threadWakeFrequency);
        } catch (InterruptedException e) {
          // continue;
        }
      }
    }
    if (blocked) {
      LOG.info("Unblocking updates for region " + this + " '" +
        Thread.currentThread().getName() + "'");
    }
  }
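To see how the loop behaves, here is a toy model of the same wait pattern, runnable with only the JDK. It is not HBase code: the class MemstoreBlockingDemo and its write()/flush() methods are hypothetical, and the flusher's notifyAll() is an assumption of the model (check your HBase version to see whether its flush path wakes waiters this way). It shows that wait(threadWakeFrequency) is an upper bound on each round: a writer returns as soon as it is notified and sees the size under the limit, and otherwise rechecks only when the timeout expires.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical toy model of the checkResources() wait loop; not the HBase implementation.
final class MemstoreBlockingDemo {
    private final AtomicLong memstoreSize = new AtomicLong();
    private final long blockingMemStoreSize;
    private final long threadWakeFrequencyMs;

    MemstoreBlockingDemo(long blockingMemStoreSize, long threadWakeFrequencyMs) {
        this.blockingMemStoreSize = blockingMemStoreSize;
        this.threadWakeFrequencyMs = threadWakeFrequencyMs;
    }

    /** Simulates buffering a write in the memstore. */
    void write(long bytes) {
        memstoreSize.addAndGet(bytes);
    }

    /** Writer side, same shape as checkResources(); returns how many wait rounds it took. */
    int checkResources() {
        int rounds = 0;
        while (memstoreSize.get() > blockingMemStoreSize) {
            rounds++;
            synchronized (this) {
                try {
                    // Returns when the timeout expires OR when flush() calls notifyAll().
                    wait(threadWakeFrequencyMs);
                } catch (InterruptedException e) {
                    // Unlike the original, which swallows the interrupt, the toy gives up.
                    Thread.currentThread().interrupt();
                    return rounds;
                }
            }
        }
        return rounds;
    }

    /** Flusher side (assumed behavior): empties the memstore and wakes blocked writers. */
    void flush() {
        memstoreSize.set(0);
        synchronized (this) {
            notifyAll();
        }
    }

    public static void main(String[] args) throws Exception {
        long mb = 1024L * 1024L;
        MemstoreBlockingDemo region = new MemstoreBlockingDemo(128 * mb, 10_000);
        region.write(129 * mb); // above the 128 MB blocking size, so the writer blocks
        Thread flusher = new Thread(() -> {
            try {
                Thread.sleep(200); // pretend the flush takes 200 ms
            } catch (InterruptedException ignored) {
            }
            region.flush();
        });
        long start = System.nanoTime();
        flusher.start();
        int rounds = region.checkResources();
        long waitedMs = (System.nanoTime() - start) / 1_000_000;
        flusher.join();
        System.out.println("unblocked after " + rounds + " wait round(s), ~" + waitedMs + " ms");
    }
}
```

In a typical run the writer is woken after one round, at roughly 200 ms rather than 10 000 ms. If the flush happens to finish between the size check and the call to wait(), the notification is missed and the writer sleeps for the full timeout, which is why the wake frequency still bounds the worst-case stall.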
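It turns out that on every write the region server checks whether the total memstore size of the target region exceeds the blocking size, which is the memstore flush size (hbase.hregion.memstore.flush.size) multiplied by hbase.hregion.memstore.block.multiplier (2 by default). If it does, new writes to that region are blocked and a flush is requested, so that the region server does not run out of memory (OOM). Because a flush can in turn trigger compactions and splits, this usually takes a while, and writes stay blocked until the memstore has been flushed to disk. Meanwhile each blocked handler thread calls wait(threadWakeFrequency), i.e. it waits up to hbase.server.thread.wakefrequency (10 s by default) before rechecking whether the memstore has dropped below the threshold.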
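The threshold arithmetic can be checked against the log above. The sketch below assumes a flush size of 64 MB, which is consistent with the "blocking 128.0m" in the log combined with the default multiplier of 2; the class BlockingThreshold and its methods are hypothetical helpers, not HBase API.

```java
// Hypothetical helper reproducing the blocking-size arithmetic; not HBase API.
final class BlockingThreshold {
    static final long MB = 1024L * 1024L;

    /** Blocking size = memstore flush size x block multiplier. */
    static long blockingMemStoreSize(long flushSizeBytes, int blockMultiplier) {
        return flushSizeBytes * blockMultiplier;
    }

    /** checkResources() keeps blocking while size > blocking (strictly greater, despite the ">=" in the log text). */
    static boolean isBlocked(long memstoreSizeBytes, long blockingBytes) {
        return memstoreSizeBytes > blockingBytes;
    }

    public static void main(String[] args) {
        long flushSize = 64 * MB; // assumed flush size for the cluster in the log
        long blocking = blockingMemStoreSize(flushSize, 2);
        System.out.println(blocking / MB + " MB"); // prints "128 MB"
        // 128.1m, as in the first log line, is above the limit, so the handler blocks
        System.out.println(isBlocked((long) (128.1 * MB), blocking)); // prints "true"
    }
}
```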
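For an online application a 10 s stall is unacceptable, yet the flush really can take that long, so the following settings can be tuned to reduce or avoid these stalls.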
hbase.hregion.memstore.block.multiplier 8 // raise this value only if there is enough memory to rule out OOM
hbase.server.thread.wakefrequency 100 // shorten the wait between rechecks, in ms; the default is 10000
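The effect of the two settings can be sketched as follows. Here java.util.Properties stands in for hbase-site.xml, and the fallback values (a 64 MB flush size, a multiplier of 2, a 10000 ms wake frequency) are assumptions matching the log and the defaults quoted above; the class TunedSettings is hypothetical.

```java
import java.util.Properties;

// Hypothetical sketch: Properties stands in for hbase-site.xml; fallback values are assumed defaults.
final class TunedSettings {
    static final long MB = 1024L * 1024L;

    /** Per-region memstore size at which updates start blocking. */
    static long blockingBytes(Properties p) {
        long flush = Long.parseLong(p.getProperty("hbase.hregion.memstore.flush.size", String.valueOf(64 * MB)));
        int mult = Integer.parseInt(p.getProperty("hbase.hregion.memstore.block.multiplier", "2"));
        return flush * mult;
    }

    /** Longest a blocked handler waits before rechecking the memstore size. */
    static long wakeFrequencyMs(Properties p) {
        return Long.parseLong(p.getProperty("hbase.server.thread.wakefrequency", "10000"));
    }

    public static void main(String[] args) {
        Properties defaults = new Properties();
        Properties tuned = new Properties();
        tuned.setProperty("hbase.hregion.memstore.block.multiplier", "8");
        tuned.setProperty("hbase.server.thread.wakefrequency", "100");
        // prints "default: block at 128 MB, recheck every 10000 ms"
        System.out.println("default: block at " + blockingBytes(defaults) / MB
            + " MB, recheck every " + wakeFrequencyMs(defaults) + " ms");
        // prints "tuned: block at 512 MB, recheck every 100 ms"
        System.out.println("tuned: block at " + blockingBytes(tuned) / MB
            + " MB, recheck every " + wakeFrequencyMs(tuned) + " ms");
    }
}
```

The trade-off is visible in the numbers: with a multiplier of 8, each region may hold up to 512 MB in its memstore before writes block, so the heap has to accommodate that across all actively written regions. hbase.server.thread.wakefrequency may also be read by other periodic threads in the region server, so it is worth checking your version's source before lowering it globally.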