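Environment:
Cloudera Express 5.3.3
4 nodes
After the secondary namenode was deployed on the datanode1 node of the HDFS cluster and the cluster was restarted, the DataNode on datanode1 failed to start, although the secondary namenode was reported as started. Error log: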
2015-06-25 13:45:08,280 WARN org.apache.hadoop.hdfs.server.common.Storage: org.apache.hadoop.hdfs.server.common.IncorrectVersionException: Unexpected version of storage directory /data0/dfs/nn. Reported: -59. Expecting = -56.
2015-06-25 13:45:08,280 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for Block pool <registering> (Datanode Uuid unassigned) service to scm.data.com/192.168.23.45:8022. Exiting.
java.io.IOException: All specified directories are failed to load.
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:473)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1322)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1292)
at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:320)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:225)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:862)
at java.lang.Thread.run(Thread.java:745)
2015-06-25 13:45:08,290 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service for: Block pool <registering> (Datanode Uuid unassigned) service to scm.data.dingkai.com/103.231.66.62:8022
2015-06-25 13:45:08,391 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Removed Block pool <registering> (Datanode Uuid unassigned)
2015-06-25 13:45:10,392 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting Datanode
2015-06-25 13:45:10,393 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 0
2015-06-25 13:45:10,395 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at hadoop1.data.com/192.168.23.45
************************************************************/
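Analysis:
The first WARN line in the log above shows that the storage directory /data0/dfs/nn failed to initialize: the directory reports layout version -59 while the DataNode expects -56, which suggests that its contents were not written by the DataNode. The cause is that /data0/dfs/nn is the HDFS dfs.namenode.name.dir directory, and because the secondary namenode on this node had already started, /data0/dfs/nn was being used as the secondary namenode's checkpoint directory (fs.checkpoint.dir, or dfs.namenode.checkpoint.dir under its current name). A directory that the HDFS cluster uses for storage cannot be shared with the checkpoint directory, so the DataNode on node 1 failed to start.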
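One way to confirm which role wrote a storage directory is to look at its VERSION file, which records a storageType and a layoutVersion as key=value lines. The following is a minimal sketch only: the helper names are hypothetical, and the sample contents are illustrative values, not data taken from this cluster. On a real node the text would presumably be read from /data0/dfs/nn/current/VERSION.

```python
def parse_version_file(text):
    """Parse the key=value lines of an HDFS storage VERSION file."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the '#' comment header at the top of the file.
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def describe_storage(text):
    """Return (storageType, layoutVersion) as recorded in a VERSION file."""
    props = parse_version_file(text)
    return props.get("storageType"), int(props["layoutVersion"])


# Illustrative contents only; the IDs below are made up for this example.
SAMPLE_VERSION = """\
#Thu Jun 25 13:40:00 CST 2015
namespaceID=123456789
clusterID=cluster-example
cTime=0
storageType=NAME_NODE
blockpoolID=BP-example
layoutVersion=-59
"""

if __name__ == "__main__":
    print(describe_storage(SAMPLE_VERSION))
```

If the storage type found in a directory is not the one the starting role expects, that directory is very likely being shared between roles.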
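Solution:
Change the secondary namenode's checkpoint directory to a directory of its own, then restart the HDFS cluster; the DataNode then starts normally.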
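Before restarting, it may help to check that no directory is listed under more than one of the relevant properties. The sketch below assumes the property values have already been collected into a plain dict; the function name find_shared_dirs and the paths /data0/dfs/dn and /data0/dfs/snn are hypothetical, chosen only for illustration.

```python
import posixpath


def find_shared_dirs(conf, props):
    """Return {directory: [properties]} for directories listed by more than one property.

    conf maps a property name to a comma-separated list of directories.
    """
    seen = {}
    for prop in props:
        for raw in conf.get(prop, "").split(","):
            raw = raw.strip()
            if not raw:
                continue
            # Directory values may carry a file:// prefix; compare plain paths.
            if raw.startswith("file://"):
                raw = raw[len("file://"):]
            path = posixpath.normpath(raw)
            owners = seen.setdefault(path, [])
            if prop not in owners:
                owners.append(prop)
    return {path: owners for path, owners in seen.items() if len(owners) > 1}


PROPS = [
    "dfs.namenode.name.dir",
    "dfs.namenode.checkpoint.dir",
    "dfs.datanode.data.dir",
]

# The situation described above: the checkpoint directory reuses the name directory.
broken = {
    "dfs.namenode.name.dir": "/data0/dfs/nn",
    "dfs.namenode.checkpoint.dir": "/data0/dfs/nn",
    "dfs.datanode.data.dir": "/data0/dfs/dn",  # hypothetical path
}

# After the fix: the checkpoint directory has its own (hypothetical) path.
fixed = dict(broken, **{"dfs.namenode.checkpoint.dir": "/data0/dfs/snn"})

if __name__ == "__main__":
    print(find_shared_dirs(broken, PROPS))
    print(find_shared_dirs(fixed, PROPS))
```

An empty result for the corrected settings indicates that each role has a directory to itself.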