datanode1 fails to start after deploying the secondary namenode onto the datanode1 node and restarting the HDFS cluster

Environment:

Cloudera Express 5.3.3

4 nodes



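After the secondary namenode was deployed onto the datanode1 node and the HDFS cluster was restarted, the datanode role on datanode1 failed to start, even though the secondary namenode on the same node showed as started. The datanode log reported: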


2015-06-25 13:45:08,280 WARN org.apache.hadoop.hdfs.server.common.Storage: org.apache.hadoop.hdfs.server.common.IncorrectVersionException: Unexpected version of storage directory /data0/dfs/nn. Reported: -59. Expecting = -56.

2015-06-25 13:45:08,280 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for Block pool <registering> (Datanode Uuid unassigned) service to scm.data.com/192.168.23.45:8022. Exiting. 

java.io.IOException: All specified directories are failed to load.

at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:473)

at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1322)

at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1292)

at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:320)

at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:225)

at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:862)

at java.lang.Thread.run(Thread.java:745)

2015-06-25 13:45:08,290 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service for: Block pool <registering> (Datanode Uuid unassigned) service to scm.data.dingkai.com/103.231.66.62:8022

2015-06-25 13:45:08,391 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Removed Block pool <registering> (Datanode Uuid unassigned)

2015-06-25 13:45:10,392 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting Datanode

2015-06-25 13:45:10,393 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 0

2015-06-25 13:45:10,395 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG: 

/************************************************************

SHUTDOWN_MSG: Shutting down DataNode at hadoop1.data.com/192.168.23.45

************************************************************/


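Analysis:

The first WARN line in the log above already points to the cause: the storage directory /data0/dfs/nn could not be loaded, because it reports layout version -59 while the DataNode expects -56. /data0/dfs/nn is the HDFS dfs.namenode.name.dir directory, and since the secondary namenode on this node had already started, /data0/dfs/nn had been taken over as the secondary namenode's checkpoint directory (fs.checkpoint.dir, dfs.namenode.checkpoint.dir). An HDFS storage directory cannot be shared with the checkpoint directory, so the datanode on node 1 failed to start.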
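One way to confirm what a storage directory currently holds is to read its `current/VERSION` file, which Hadoop writes as plain `key=value` lines including `storageType` and `layoutVersion`. The sketch below parses such a file from a string. The sample content is hypothetical (IDs and timestamp are made up for illustration), and the helper names `parse_version_file` and `describe_storage` are not part of Hadoop.

```python
# Hypothetical VERSION file content, e.g. from /data0/dfs/nn/current/VERSION.
# The IDs below are placeholders, not values from the original cluster.
SAMPLE_VERSION = """\
#Thu Jun 25 13:40:00 CST 2015
namespaceID=123456789
clusterID=CID-example
cTime=0
storageType=NAME_NODE
blockpoolID=BP-example
layoutVersion=-59
"""


def parse_version_file(text):
    """Parse the key=value lines of a Hadoop storage VERSION file."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the leading timestamp comment.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def describe_storage(text):
    """Return (storageType, layoutVersion) for a VERSION file's content."""
    props = parse_version_file(text)
    return props.get("storageType"), int(props["layoutVersion"])


if __name__ == "__main__":
    storage_type, layout = describe_storage(SAMPLE_VERSION)
    print(f"storageType={storage_type}, layoutVersion={layout}")
```

If the directory a datanode tries to load reports a `storageType` other than `DATA_NODE`, or a `layoutVersion` that differs from the one named after "Expecting" in the log, the directory was most likely written by a different role.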


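Solution:

Change the secondary namenode's checkpoint directory to a path of its own, then restart the HDFS cluster. After that, datanode1 starts normally.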
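Before restarting, it can help to check that no path is shared between the directory settings. The following is a minimal sketch, assuming the effective configuration is available as `hdfs-site.xml` text; on a Cloudera Manager cluster these values are normally set in the CM UI, so where to obtain the XML is left open. The sample XML and the helper names `read_dirs` and `find_shared_dirs` are hypothetical.

```python
import xml.etree.ElementTree as ET
from itertools import combinations

# Directory properties that must not point at the same path.
DIR_KEYS = (
    "dfs.namenode.name.dir",
    "dfs.namenode.checkpoint.dir",
    "dfs.datanode.data.dir",
)

# Hypothetical configuration reproducing the conflict described above.
SAMPLE_XML = """\
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///data0/dfs/nn</value>
  </property>
  <property>
    <name>dfs.namenode.checkpoint.dir</name>
    <value>/data0/dfs/nn/</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/data0/dfs/dn,/data1/dfs/dn</value>
  </property>
</configuration>
"""


def read_dirs(xml_text):
    """Map each directory property to its set of normalized local paths."""
    root = ET.fromstring(xml_text)
    dirs = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name not in DIR_KEYS:
            continue
        paths = set()
        for raw in (prop.findtext("value") or "").split(","):
            path = raw.strip()
            if not path:
                continue
            # Values may be written as file:// URIs or as plain paths.
            if path.startswith("file://"):
                path = path[len("file://"):]
            paths.add(path.rstrip("/") or "/")
        dirs[name] = paths
    return dirs


def find_shared_dirs(xml_text):
    """Return (path, property_a, property_b) for every shared path."""
    dirs = read_dirs(xml_text)
    conflicts = []
    for a, b in combinations(sorted(dirs), 2):
        for path in sorted(dirs[a] & dirs[b]):
            conflicts.append((path, a, b))
    return conflicts


if __name__ == "__main__":
    for path, a, b in find_shared_dirs(SAMPLE_XML):
        print(f"{path} is used by both {a} and {b}")
```

An empty result means the three settings use distinct paths. The check only compares exact paths and does not detect one directory nested inside another.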



