Starting the region server produced the following errors:
2013-09-09 11:23:05,863 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException; Region is not online: -ROOT-,,0
2013-09-09 11:23:08,874 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException; Region is not online: -ROOT-,,0
2013-09-09 11:23:11,898 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException; Region is not online: -ROOT-,,0
2013-09-09 11:24:15,344 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Stats: total=2.05 MB, free=247.44 MB, max=249.48 MB, blocks=0, accesses=0, hits=0, hitRatio=0, cachingAccesses=0, cachingHits=0, cachingHitsRatio=0, evictions=0, evicted=0, evictedPerRun=NaN
2013-09-09 11:24:19,977 ERROR org.apache.hadoop.hbase.regionserver.wal.HLog: Can't open after 300 attempts and 300518ms for hdfs://opentsdb:8020/hbase/.logs/opentsdb,60020,1378358082016-splitting/opentsdb%2C60020%2C1378358082016.1378397697610
2013-09-09 11:24:19,978 INFO org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Processed 0 edits across 0 regions threw away edits for 0 regions; log file=hdfs://opentsdb:8020/hbase/.logs/opentsdb,60020,1378358082016-splitting/opentsdb%2C60020%2C1378358082016.1378397697610 is corrupted = false progress failed = false
2013-09-09 11:24:19,978 WARN org.apache.hadoop.hbase.regionserver.SplitLogWorker: log splitting of hdfs://opentsdb:8020/hbase/.logs/opentsdb,60020,1378358082016-splitting/opentsdb%2C60020%2C1378358082016.1378397697610 failed, returning error
java.io.IOException: Cannot obtain block length for LocatedBlock{BP-17274449-192.168.0.75-1376541308222:blk_4420133534962983319_1645; getBlockSize()=0; corrupt=false; offset=0; locs=[192.168.0.75:50010]}
    at org.apache.hadoop.hdfs.DFSInputStream.readBlockLength(DFSInputStream.java:319)
    at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:263)
    at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:205)
    at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:198)
    at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1117)
    at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:249)
    at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:82)
    at org.apache.hadoop.io.SequenceFile$Reader.openFile(SequenceFile.java:1787)
    at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.openFile(SequenceFileLogReader.java:62)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1707)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1728)
    at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.<init>(SequenceFileLogReader.java:55)
    at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.init(SequenceFileLogReader.java:177)
    at org.apache.hadoop.hbase.regionserver.wal.HLog.getReader(HLog.java:713)
    at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.getReader(HLogSplitter.java:825)
    at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.getReader(HLogSplitter.java:738)
    at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:382)
    at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:350)
    at org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:115)
    at org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:283)
    at org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:214)
    at org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:182)
    at java.lang.Thread.run(Thread.java:662)
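The errors show two things: the -ROOT- region is not online, and the Region Server's HLog split failed because an error occurred while reading the length of the HLog file. Checking the Region Server status confirmed that there was indeed no .META. region either. My guess was that this HLog file was corrupted, so I tried to dump it: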
hbase hlog /hbase/.logs/opentsdb,60020,1378358082016-splitting/opentsdb%2C60020%2C1378358082016.1378397697610
Dumping the HLog failed with the same error as in the log above, so I deleted the log file:
hadoop fs -rmr /hbase/.logs/opentsdb,60020,1378358082016-splitting/opentsdb%2C60020%2C1378358082016.1378397697610
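After restarting the Region Server, everything worked again. The root cause was that a user had forcibly stopped the RS while data was being inserted into HBase, which left the HLog file in a broken state.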
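The path passed to the two commands above is just the path component of the hdfs:// URI printed in the "Can't open after ..." ERROR line. As a small sketch of how that path could be pulled out of a region server log line programmatically: the class name WalPathExtractor and the regular expression below are my own assumptions for illustration, not part of HBase. The raw (still percent-encoded) path is kept, because the file name on HDFS literally contains %2C.

```java
import java.net.URI;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of HBase: extracts the HDFS path of the
// HLog that could not be opened from a region server log line.
public class WalPathExtractor {
    // Assumed message shape, taken from the ERROR line shown in this post.
    private static final Pattern CANT_OPEN =
        Pattern.compile("Can't open after \\d+ attempts and \\d+ms for (\\S+)");

    static Optional<String> extractPath(String logLine) {
        Matcher m = CANT_OPEN.matcher(logLine);
        if (!m.find()) {
            return Optional.empty();
        }
        // getRawPath() keeps "%2C" as-is, which matches the real file name.
        return Optional.of(URI.create(m.group(1)).getRawPath());
    }

    public static void main(String[] args) {
        String line = "2013-09-09 11:24:19,977 ERROR org.apache.hadoop.hbase.regionserver.wal.HLog: "
            + "Can't open after 300 attempts and 300518ms for "
            + "hdfs://opentsdb:8020/hbase/.logs/opentsdb,60020,1378358082016-splitting/"
            + "opentsdb%2C60020%2C1378358082016.1378397697610";
        // Prints the path that was passed to "hbase hlog" and "hadoop fs -rmr".
        System.out.println(extractPath(line).orElse("no unreadable HLog found in this line"));
    }
}
```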
Searching online for "Region is not online: -ROOT-,,0" did not turn up a correct answer either, so I looked at the source code. The error is thrown here:
protected HRegion getRegion(final byte[] regionName)
    throws NotServingRegionException {
  HRegion region = null;
  region = getOnlineRegion(regionName);
  if (region == null) {
    throw new NotServingRegionException("Region is not online: " +
        Bytes.toStringBinary(regionName));
  }
  return region;
}
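In other words, this error is thrown whenever regionName is not in the Map of online regions. Why the region is missing from that map still has to be analyzed case by case.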
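To make the behaviour concrete, here is a minimal, self-contained sketch of the same lookup pattern. It is not the HBase implementation: the class OnlineRegionLookupSketch, the String-keyed map, the open method, and the stand-in exception class are all simplifications I am assuming for illustration. It only shows that a region which was never put into the map (for example because log splitting failed and the region was never opened) triggers exactly this message.

```java
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified illustration only; the names below are stand-ins, not HBase classes.
public class OnlineRegionLookupSketch {
    // Stand-in for HBase's NotServingRegionException.
    static class NotServingRegionException extends IOException {
        NotServingRegionException(String message) {
            super(message);
        }
    }

    // Assumed stand-in for the region server's map of online regions.
    private final Map<String, Object> onlineRegions = new ConcurrentHashMap<>();

    // Hypothetical helper: marks a region as online.
    void open(String regionName) {
        onlineRegions.put(regionName, new Object());
    }

    // Same shape as getRegion above: a missing map entry becomes an exception.
    Object getRegion(String regionName) throws NotServingRegionException {
        Object region = onlineRegions.get(regionName);
        if (region == null) {
            throw new NotServingRegionException("Region is not online: " + regionName);
        }
        return region;
    }

    public static void main(String[] args) {
        OnlineRegionLookupSketch server = new OnlineRegionLookupSketch();
        try {
            server.getRegion("-ROOT-,,0"); // never opened, so this throws
        } catch (NotServingRegionException e) {
            System.out.println(e.getMessage());
        }
        server.open("-ROOT-,,0");
        try {
            server.getRegion("-ROOT-,,0"); // now present in the map
            System.out.println("lookup succeeded after the region was opened");
        } catch (NotServingRegionException e) {
            System.out.println("unexpected: " + e.getMessage());
        }
    }
}
```

The point of the sketch is that the exception describes a symptom (the region is absent from the map), so the real investigation is why the region never came online, as with the unreadable HLog in this case.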