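After the NameNode went down because of a network problem, the DataNodes and other services across the cluster went down one after another. Once the network was fixed and the cluster was restarted, two DataNodes still would not start. Their logs showed the following errors: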
2017-12-20 23:55:17,542 WARN org.apache.hadoop.hdfs.server.common.Storage: org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /data1/data/hadoop/datanode is in an inconsistent state: Root /data1/data/hadoop/datanode: DatanodeUuid=c9ee0ab8-45a3-4709-8fc2-35fe365ed598, does not match b6c8b918-fa63-4812-95bc-c399b4f30031 from other StorageDirectory.
2017-12-20 23:55:17,542 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Setting up storage: nsid=299045286;bpid=BP-1735478683-10.1.0.31-1433992326763;lv=-56;nsInfo=lv=-59;cid=CID-c1b0775b-e6f8-4bf9-bd3c-d3cd953ae8b3;nsid=299045286;c=0;bpid=BP-1735478683-10.1.0.31-1433992326763;dnuuid=b6c8b918-fa63-4812-95bc-c399b4f30031
2017-12-20 23:55:17,554 WARN org.apache.hadoop.hdfs.server.common.Util: Path /data/data/hadoop/datanode should be specified as a URI in configuration files. Please update hdfs configuration.
2017-12-20 23:55:17,555 WARN org.apache.hadoop.hdfs.server.common.Util: Path /data1/data/hadoop/datanode should be specified as a URI in configuration files. Please update hdfs configuration.
2017-12-20 23:55:17,555 INFO org.apache.hadoop.hdfs.server.common.Storage: Storage directory [DISK]file:/data/data/hadoop/datanode/ has already been used.
2017-12-20 23:55:17,603 INFO org.apache.hadoop.hdfs.server.common.Storage: Analyzing storage directories for bpid BP-1735478683-10.1.0.31-1433992326763
2017-12-20 23:55:17,604 WARN org.apache.hadoop.hdfs.server.common.Storage: Failed to analyze storage directories for block pool BP-1735478683-10.1.0.31-1433992326763
java.io.IOException: BlockPoolSliceStorage.recoverTransitionRead: attempt to load an used block storage: /data/data/hadoop/datanode/current/BP-1735478683-10.1.0.31-1433992326763
at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.loadBpStorageDirectories(BlockPoolSliceStorage.java:210)
at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.recoverTransitionRead(BlockPoolSliceStorage.java:242)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.addStorageLocations(DataStorage.java:381)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:462)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1318)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1288)
at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:320)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:221)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:829)
at java.lang.Thread.run(Thread.java:722)
2017-12-20 23:55:17,606 WARN org.apache.hadoop.hdfs.server.common.Storage: Failed to add storage for block pool: BP-1735478683-10.1.0.31-1433992326763 : BlockPoolSliceStorage.recoverTransitionRead: attempt to load an used block storage: /data/data/hadoop/datanode/current/BP-1735478683-10.1.0.31-1433992326763
2017-12-20 23:55:17,644 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /data1/data/hadoop/datanode/in_use.lock acquired by nodename 5774@NDAPP-DATA-11
2017-12-20 23:55:17,645 WARN org.apache.hadoop.hdfs.server.common.Storage: org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /data1/data/hadoop/datanode is in an inconsistent state: Root /data1/data/hadoop/datanode: DatanodeUuid=c9ee0ab8-45a3-4709-8fc2-35fe365ed598, does not match b6c8b918-fa63-4812-95bc-c399b4f30031 from other StorageDirectory.
2017-12-20 23:55:17,645 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for Block pool (Datanode Uuid unassigned) service to NDAPP-DATA-09/10.1.0.32:9000. Exiting.
java.io.IOException: All specified directories are failed to load.
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:463)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1318)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1288)
at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:320)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:221)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:829)
at java.lang.Thread.run(Thread.java:722)
2017-12-20 23:55:17,645 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for Block pool (Datanode Uuid unassigned) service to NDAPP-DATA-08/10.1.0.31:9000. Exiting.
org.apache.hadoop.util.DiskChecker$DiskErrorException: Too many failed volumes - current valid volumes: 1, volumes configured: 2, volumes failed: 1, volume failures tolerated: 0
at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.<init>(FsDatasetImpl.java:247)
at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetFactory.newInstance(FsDatasetFactory.java:34)
at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetFactory.newInstance(FsDatasetFactory.java:30)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1331)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1288)
at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:320)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:221)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:829)
at java.lang.Thread.run(Thread.java:722)
2017-12-20 23:55:17,645 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service for: Block pool (Datanode Uuid unassigned) service to NDAPP-DATA-09/10.1.0.32:9000
2017-12-20 23:55:17,646 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service for: Block pool (Datanode Uuid unassigned) service to NDAPP-DATA-08/10.1.0.31:9000
2017-12-20 23:55:17,747 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Removed Block pool (Datanode Uuid unassigned)
2017-12-20 23:55:19,747 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting Datanode
2017-12-20 23:55:19,749 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 0
2017-12-20 23:55:19,751 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at XXXXXX/10.1.0.71
************************************************************/
The startup failure ultimately came down to this message:
Root /data1/data/hadoop/datanode: DatanodeUuid=c9ee0ab8-45a3-4709-8fc2-35fe365ed598, does not match b6c8b918-fa63-4812-95bc-c399b4f30031 from other StorageDirectory.
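The message says that every storage directory of a DataNode must carry the same datanodeUuid, and /data1 disagrees with the other directory. Before editing anything, it can help to print the datanodeUuid from each directory's current/VERSION and confirm which one is out of step. Below is a minimal Python sketch of that check. It assumes the two directories seen in the log are the complete dfs.datanode.data.dir list on this node, and it uses a deliberately simplified key=value parser rather than full java.util.Properties semantics; the function names are hypothetical.

```python
import os
import sys
from typing import Dict, List

# Assumption: the two storage directories from the log above are the
# complete dfs.datanode.data.dir list on this node.
DEFAULT_DIRS = ["/data/data/hadoop/datanode", "/data1/data/hadoop/datanode"]


def parse_version(text: str) -> Dict[str, str]:
    """Parse VERSION contents as simple key=value lines.

    Simplified on purpose: '#' comment lines are skipped and no
    java.util.Properties escaping is handled.
    """
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the timestamp comment
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def collect_uuids(data_dirs: List[str]) -> Dict[str, str]:
    """Map each storage directory to the datanodeUuid in its current/VERSION."""
    uuids: Dict[str, str] = {}
    for d in data_dirs:
        path = os.path.join(d, "current", "VERSION")
        if not os.path.isfile(path):
            print(f"{d}: no current/VERSION found, skipped")
            continue
        with open(path, encoding="utf-8") as f:
            uuids[d] = parse_version(f.read()).get("datanodeUuid", "")
    return uuids


def report_mismatch(uuids: Dict[str, str]) -> bool:
    """Print every directory's UUID; return True if they are not all the same."""
    for d, uuid in uuids.items():
        print(f"{d}: datanodeUuid={uuid}")
    return len(set(uuids.values())) > 1


if __name__ == "__main__":
    # Pass your own storage directories as arguments to override the defaults.
    dirs = sys.argv[1:] or DEFAULT_DIRS
    if report_mismatch(collect_uuids(dirs)):
        print("datanodeUuid differs between storage directories")
```

On the node from this post, the output would be expected to show c9ee0ab8-45a3-4709-8fc2-35fe365ed598 for /data1 and b6c8b918-fa63-4812-95bc-c399b4f30031 for /data, matching the log.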
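Solution:
vim /data1/data/hadoop/datanode/current/VERSION
As the log indicates, replace the mismatched datanodeUuid in this directory (c9ee0ab8-45a3-4709-8fc2-35fe365ed598) with the one used by the other storage directory (b6c8b918-fa63-4812-95bc-c399b4f30031). Before: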
#Thu Dec 21 00:02:59 CST 2017
storageID=DS-14885ae9-613f-4e9d-b9f3-6e672101bcd9
clusterID=CID-c1b0775b-e6f8-4bf9-bd3c-d3cd953ae8b3
cTime=0
datanodeUuid=c9ee0ab8-45a3-4709-8fc2-35fe365ed598
storageType=DATA_NODE
layoutVersion=-56
After:
#Thu Dec 21 00:02:59 CST 2017
storageID=DS-14885ae9-613f-4e9d-b9f3-6e672101bcd9
clusterID=CID-c1b0775b-e6f8-4bf9-bd3c-d3cd953ae8b3
cTime=0
datanodeUuid=b6c8b918-fa63-4812-95bc-c399b4f30031
storageType=DATA_NODE
layoutVersion=-56
Then start the DataNode again and it comes up normally. I still don't understand why this ID changed in the first place.
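The VERSION edit described above can also be scripted, which may save time if several nodes hit the same error. Below is a minimal Python sketch. It assumes the DataNode is not running while the file is edited, and that the correct value is the UUID recorded in the other storage directory, as in this post. The script name fix_uuid.py and the .bak backup naming are hypothetical choices, not anything Hadoop provides.

```python
import shutil
import sys


def set_datanode_uuid(text: str, new_uuid: str) -> str:
    """Return VERSION text with only the datanodeUuid value replaced.

    Every other line (timestamp comment, storageID, clusterID, ...) is kept as-is.
    """
    out = []
    replaced = False
    for line in text.splitlines(keepends=True):
        key, sep, _ = line.partition("=")
        if sep and key.strip() == "datanodeUuid":
            newline = "\n" if line.endswith("\n") else ""
            out.append(f"datanodeUuid={new_uuid}{newline}")
            replaced = True
        else:
            out.append(line)
    if not replaced:
        raise ValueError("no datanodeUuid line found")
    return "".join(out)


def rewrite_version_file(path: str, new_uuid: str) -> None:
    """Copy the VERSION file to <path>.bak (hypothetical name), then rewrite it."""
    shutil.copy2(path, path + ".bak")
    with open(path, encoding="utf-8") as f:
        text = f.read()
    with open(path, "w", encoding="utf-8") as f:
        f.write(set_datanode_uuid(text, new_uuid))


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("usage: fix_uuid.py <path/to/current/VERSION> <datanodeUuid>")
    else:
        rewrite_version_file(sys.argv[1], sys.argv[2])
```

For the node in this post, the call would be: python fix_uuid.py /data1/data/hadoop/datanode/current/VERSION b6c8b918-fa63-4812-95bc-c399b4f30031, followed by starting the DataNode. Editing only the one line, rather than regenerating the file, keeps storageID, clusterID and layoutVersion untouched.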