- After starting HDFS, listing the processes with jps shows that no DataNode process is running
[hadoop@hadoop001 sbin]$ start-dfs.sh
Starting namenodes on [hadoop001]
hadoop001: starting namenode, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.15.1/logs/hadoop-hadoop-namenode-hadoop001.out
hadoop001: starting datanode, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.15.1/logs/hadoop-hadoop-datanode-hadoop001.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.15.1/logs/hadoop-hadoop-secondarynamenode-hadoop001.out
[hadoop@hadoop001 sbin]$ jps
2770 ResourceManager
2883 NodeManager
5880 SecondaryNameNode
5995 Jps
5599 NameNode
- Starting the DataNode on its own fails as well
[hadoop@hadoop001 sbin]$ hadoop-daemon.sh start datanode
starting datanode, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.15.1/logs/hadoop-hadoop-datanode-hadoop001.out
[hadoop@hadoop001 sbin]$ jps
2770 ResourceManager
2883 NodeManager
5880 SecondaryNameNode
6107 Jps
5599 NameNode
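- Next, check the DataNode log in the Hadoop logs directory (hadoop-hadoop-datanode-hadoop001.log, alongside the .out file named above)
The log contains the following errors: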
2019-08-26 08:11:56,368 WARN org.apache.hadoop.hdfs.server.common.Storage: Failed to add storage directory [DISK]file:/tmp/hadoop-hadoop/dfs/data/
java.io.IOException: Incompatible clusterIDs in /tmp/hadoop-hadoop/dfs/data: namenode clusterID = CID-56cb1b3a-d272-4b55-a560-93d34f3ea536; datanode clusterID = CID-f06280d7-1870-452d-a155-419b58c23f55
at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:779)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.loadStorageDirectory(DataStorage.java:302)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.loadDataStorage(DataStorage.java:418)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.addStorageLocations(DataStorage.java:397)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:575)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1560)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1520)
at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:354)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:219)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:673)
at java.lang.Thread.run(Thread.java:745)
2019-08-26 08:11:56,371 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool ID needed, but service not yet registered with NN, trace:
java.lang.Exception
at org.apache.hadoop.hdfs.server.datanode.BPOfferService.getBlockPoolId(BPOfferService.java:190)
at org.apache.hadoop.hdfs.server.datanode.BPOfferService.hasBlockPoolId(BPOfferService.java:200)
at org.apache.hadoop.hdfs.server.datanode.BPOfferService.shouldRetryInit(BPOfferService.java:799)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.shouldRetryInit(BPServiceActor.java:712)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:678)
at java.lang.Thread.run(Thread.java:745)
2019-08-26 08:11:56,371 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for Block pool (Datanode Uuid c7fd2b2f-13af-4ec2-9d60-17a9122bc43d) service to hadoop001/172.19.6.118:9000. Exiting.
java.io.IOException: All specified directories are failed to load.
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:576)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1560)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1520)
at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:354)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:219)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:673)
at java.lang.Thread.run(Thread.java:745)
2019-08-26 08:11:56,371 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service for: Block pool (Datanode Uuid c7fd2b2f-13af-4ec2-9d60-17a9122bc43d) service to hadoop001/172.19.6.118:9000
2019-08-26 08:11:56,472 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool ID needed, but service not yet registered with NN, trace:
java.lang.Exception
at org.apache.hadoop.hdfs.server.datanode.BPOfferService.getBlockPoolId(BPOfferService.java:190)
at org.apache.hadoop.hdfs.server.datanode.BPOfferService.hasBlockPoolId(BPOfferService.java:200)
at org.apache.hadoop.hdfs.server.datanode.BlockPoolManager.remove(BlockPoolManager.java:91)
at org.apache.hadoop.hdfs.server.datanode.DataNode.shutdownBlockPool(DataNode.java:1475)
at org.apache.hadoop.hdfs.server.datanode.BPOfferService.shutdownActor(BPOfferService.java:437)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.cleanUp(BPServiceActor.java:457)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:707)
at java.lang.Thread.run(Thread.java:745)
2019-08-26 08:11:56,473 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Removed Block pool (Datanode Uuid c7fd2b2f-13af-4ec2-9d60-17a9122bc43d)
2019-08-26 08:11:56,473 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool ID needed, but service not yet registered with NN, trace:
The cause is visible in the first exception:
namenode clusterID = CID-56cb1b3a-d272-4b55-a560-93d34f3ea536
datanode clusterID = CID-f06280d7-1870-452d-a155-419b58c23f55
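The two IDs do not match, so the DataNode refuses to load its storage directory and exits. This usually happens when the NameNode has been reformatted (which generates a new clusterID) while the DataNode's data directory still holds the old one.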
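The mismatch can also be confirmed directly from the VERSION files instead of reading the stack trace. The following is a minimal sketch, assuming the storage layout shown in the log (name/current/VERSION and data/current/VERSION under the same dfs directory); the function names get_cluster_id and compare_cluster_ids are made up for this example.

```shell
# Print the clusterID recorded in a Hadoop VERSION file.
# VERSION is a properties file; the relevant line looks like clusterID=CID-...
get_cluster_id() {
  grep '^clusterID=' "$1" | cut -d'=' -f2
}

# Compare the NameNode and DataNode clusterIDs under one dfs directory.
# Returns 0 when they match, non-zero otherwise.
compare_cluster_ids() {
  dfs_dir="$1"
  nn_id=$(get_cluster_id "$dfs_dir/name/current/VERSION")
  dn_id=$(get_cluster_id "$dfs_dir/data/current/VERSION")
  echo "namenode clusterID = $nn_id"
  echo "datanode clusterID = $dn_id"
  [ "$nn_id" = "$dn_id" ]
}
```

On this machine it would be called as `compare_cluster_ids /tmp/hadoop-hadoop/dfs`, run as a user that can read the data directory (hadoop or root).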
- Fixes
Method 1: go to the parent of the path given in the log, /tmp/hadoop-hadoop/dfs/data (the listing below was taken as root)
[root@hadoop001 dfs]# ll
total 12
drwx------ 3 hadoop hadoop 4096 Aug 26 08:48 data
drwxrwxr-x 3 hadoop hadoop 4096 Aug 26 08:33 name
drwxrwxr-x 3 hadoop hadoop 4096 Aug 26 08:33 namesecondary
# Copy the clusterID value from name/current/VERSION
# into data/current/VERSION, after the = on its clusterID line,
# so that name and data carry the same clusterID
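Method 1 can be scripted rather than edited by hand. This is only a sketch under the assumption that both VERSION files contain a single clusterID= line; sync_cluster_id is a hypothetical helper name, and it keeps a .bak copy of the DataNode file before rewriting it.

```shell
# Copy the clusterID from the NameNode VERSION file into the DataNode VERSION file.
# Usage: sync_cluster_id <name VERSION file> <data VERSION file>
sync_cluster_id() {
  name_version="$1"
  data_version="$2"
  nn_id=$(grep '^clusterID=' "$name_version" | cut -d'=' -f2)
  if [ -z "$nn_id" ]; then
    echo "no clusterID found in $name_version" >&2
    return 1
  fi
  # Keep a backup, then rewrite only the clusterID line from that backup.
  cp "$data_version" "$data_version.bak"
  sed "s/^clusterID=.*/clusterID=$nn_id/" "$data_version.bak" > "$data_version"
}
```

With the paths from the log, the call would be `sync_cluster_id /tmp/hadoop-hadoop/dfs/name/current/VERSION /tmp/hadoop-hadoop/dfs/data/current/VERSION`, run as the hadoop user since data is only accessible to its owner.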
Method 2: delete the directories under data and name, then reformat the NameNode (this wipes all existing HDFS data and metadata)
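Method 2 might look like the following. It is a sketch, not a tested procedure for this cluster: wipe_dfs_dirs is a hypothetical helper that empties only data and name under the given directory, and the Hadoop commands are left as comments because the sequence is irreversible.

```shell
# Remove everything under <dfs_dir>/data and <dfs_dir>/name (irreversible).
# Refuses to run unless both directories exist, to guard against a mistyped path.
wipe_dfs_dirs() {
  dfs_dir="$1"
  if [ ! -d "$dfs_dir/data" ] || [ ! -d "$dfs_dir/name" ]; then
    echo "expected data/ and name/ under $dfs_dir" >&2
    return 1
  fi
  rm -rf "$dfs_dir/data/"* "$dfs_dir/name/"*
}

# Intended sequence on the node, run as the hadoop user:
#   stop-dfs.sh
#   wipe_dfs_dirs /tmp/hadoop-hadoop/dfs
#   hdfs namenode -format
#   start-dfs.sh
```

Because the format step generates a fresh clusterID and the DataNode then initializes an empty data directory against it, the two IDs agree again, at the cost of everything previously stored in HDFS.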
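- Start HDFS again (the NameNode and SecondaryNameNode are still running, so only the DataNode is started)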
[hadoop@hadoop001 sbin]$ start-dfs.sh
Starting namenodes on [hadoop001]
hadoop001: namenode running as process 5599. Stop it first.
hadoop001: starting datanode, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.15.1/logs/hadoop-hadoop-datanode-hadoop001.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: secondarynamenode running as process 5880. Stop it first.
[hadoop@hadoop001 sbin]$ jps
2770 ResourceManager
2883 NodeManager
5880 SecondaryNameNode
6331 DataNode # started normally
6556 Jps
5599 NameNode
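The check done by eye above can be turned into a one-line test, for example in a startup script. A small sketch, where datanode_running is a made-up name and the function simply reads jps output on standard input:

```shell
# Succeeds (exit status 0) if the jps output on stdin contains a DataNode process.
datanode_running() {
  awk '$2 == "DataNode" { found = 1 } END { exit !found }'
}
```

Usage: `jps | datanode_running && echo "DataNode is up"`.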