HDFS startup: DataNode fails with java.io.IOException: Incompatible clusterIDs

Error details:
After HDFS was started, a few DataNodes died almost immediately. Their log files showed the following error:

2020-07-15 09:55:09,406 INFO  common.Storage (Storage.java:tryLock(776)) - Lock on /opt/hadoop/hadoop/hdfs/data/in_use.lock acquired by nodename 20657@node4
2020-07-15 09:55:09,408 WARN  common.Storage (DataStorage.java:loadDataStorage(449)) - Failed to add storage directory [DISK]file:/opt/hadoop/hadoop/hdfs/data/
java.io.IOException: Incompatible clusterIDs in /opt/hadoop/hadoop/hdfs/data: namenode clusterID = CID-64742c14-5c74-439c-a95d-e82e8a332914; datanode clusterID = CID-35f03ca6-ee0c-4bd4-babd-c77dfccfab27
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:801)
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.loadStorageDirectory(DataStorage.java:322)
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.loadDataStorage(DataStorage.java:438)
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.addStorageLocations(DataStorage.java:417)
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:595)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1543)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1504)
    at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:319)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:272)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:768)
    at java.lang.Thread.run(Thread.java:748)
2020-07-15 09:55:09,411 ERROR datanode.DataNode (BPServiceActor.java:run(780)) - Initialization failed for Block pool  (Datanode Uuid 2817ee6c-67af-4f2b-a62c-3ebaf8cc260f) service to node1/25.211.142.57:8020. Exiting. 
java.io.IOException: All specified directories are failed to load.
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:596)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1543)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1504)
    at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:319)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:272)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:768)
    at java.lang.Thread.run(Thread.java:748)
2020-07-15 09:55:09,411 WARN  datanode.DataNode (BPServiceActor.java:run(804)) - Ending block pool service for: Block pool  (Datanode Uuid 2817ee6c-67af-4f2b-a62c-3ebaf8cc260f) service to node1/25.211.142.57:8020
2020-07-15 09:55:09,514 INFO  datanode.DataNode (BlockPoolManager.java:remove(103)) - Removed Block pool  (Datanode Uuid 2817ee6c-67af-4f2b-a62c-3ebaf8cc260f)
2020-07-15 09:55:11,515 WARN  datanode.DataNode (DataNode.java:secureMain(2699)) - Exiting Datanode
2020-07-15 09:55:11,517 INFO  util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 0

Cause:
The exception is raised because the NameNode's clusterID and the DataNode's clusterID do not match. This mismatch typically appears after the NameNode has been reformatted (hdfs namenode -format generates a new clusterID) while the DataNodes keep their old storage directories. To compare the two IDs:
The NameNode's clusterID is recorded in the VERSION file under the current subdirectory of the path configured by dfs.namenode.name.dir in hdfs-site.xml.
The DataNode's clusterID is recorded in the VERSION file under the current subdirectory of the path configured by dfs.datanode.data.dir in hdfs-site.xml.
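A quick way to compare the two IDs is to grep both VERSION files. This is a sketch: the directory paths below are taken from the log above and are assumptions, so substitute the actual values of dfs.namenode.name.dir and dfs.datanode.data.dir from your hdfs-site.xml.

```shell
# Assumed paths; replace with the values from your hdfs-site.xml.
NAME_DIR=/opt/hadoop/hadoop/hdfs/name   # dfs.namenode.name.dir (on the NameNode host)
DATA_DIR=/opt/hadoop/hadoop/hdfs/data   # dfs.datanode.data.dir (on the failing DataNode host)

# Print the clusterID recorded on each side; the two lines should match.
grep '^clusterID=' "$NAME_DIR/current/VERSION"
grep '^clusterID=' "$DATA_DIR/current/VERSION"
```

If the two grep lines print different CID values, you have confirmed the mismatch reported in the log.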

Solution:
On each affected DataNode, edit the clusterID in the VERSION file described above so that it matches the NameNode's clusterID, then restart the DataNode service.
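The fix can be sketched as the commands below, run on the failing DataNode. The data directory path is an assumption taken from the log above, and the clusterID value is the namenode clusterID reported in the error message; use your own values. The restart command also varies by version: hdfs --daemon start datanode on Hadoop 3.x, hadoop-daemon.sh start datanode on 2.x.

```shell
# Assumed values; take the real ID from the NameNode's VERSION file
# and the real path from dfs.datanode.data.dir.
NN_CLUSTER_ID=CID-64742c14-5c74-439c-a95d-e82e8a332914
DATA_DIR=/opt/hadoop/hadoop/hdfs/data

# Overwrite the DataNode's clusterID line with the NameNode's ID.
sed -i "s/^clusterID=.*/clusterID=${NN_CLUSTER_ID}/" "$DATA_DIR/current/VERSION"

# Restart the DataNode (Hadoop 3.x syntax; use hadoop-daemon.sh on 2.x).
hdfs --daemon start datanode
```

This preserves the existing block data. If the DataNode's data is disposable (e.g. a test cluster that was just reformatted), wiping the data directory and letting the DataNode re-register is an alternative, but it destroys the blocks stored there.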
