Problems caused by formatting the NameNode multiple times


After deleting everything under Hadoop's tmp directory, reformatting the NameNode, and starting the Hadoop cluster, I found that only one of the NameNodes would start.

On the node that failed to start, I ran hadoop-daemon.sh start namenode, but the NameNode still would not come up. Its log said the NameNode had not been formatted, which left me baffled, since I clearly had just formatted it. So I decided to try again, this time running hadoop namenode -format on both of the configured nodes, nn1 and nn2. After restarting the cluster, still only one NameNode started; but this time the log on the failed node no longer complained about formatting, and instead showed the following:

2019-02-17 16:16:36,028 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Starting services required for active state
2019-02-17 16:16:36,514 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Starting recovery process for unclosed journal segments...
2019-02-17 16:16:36,936 FATAL org.apache.hadoop.hdfs.server.namenode.FSEditLog: Error: recoverUnfinalizedSegments failed for required journal (JournalAndStream(mgr=QJM to [192.168.195.134:8485, 192.168.195.135:8485, 192.168.195.136:8485], stream=null))
org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many exceptions to achieve quorum size 2/3. 3 exceptions thrown:
192.168.195.134:8485: Incompatible namespaceID for journal Storage Directory /home/hadoop-2.7.1/journal/ns: NameNode has nsId 1540108126 but storage has nsId 1584393958
        at org.apache.hadoop.hdfs.qjournal.server.JNStorage.checkConsistentNamespace(JNStorage.java:234)
        at org.apache.hadoop.hdfs.qjournal.server.Journal.newEpoch(Journal.java:289)
        at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.newEpoch(JournalNodeRpcServer.java:135)
        at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.newEpoch(QJournalProtocolServerSideTranslatorPB.java:133)
        at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25417)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)

The key line is:

Incompatible namespaceID for journal Storage Directory /home/hadoop-2.7.1/journal/ns: NameNode has nsId 1540108126 but storage has nsId 1584393958

The log tells us that the namespaceID (the nsId in the log) in the VERSION file under ${HADOOP_HOME}/journal/ns/current/ does not match the namespaceID in the VERSION file under ${HADOOP_HOME}/tmp/namenode/current/.

In fact, the namespaceID and clusterID in the JournalNode's and NameNode's VERSION files must all agree, and the clusterID must also match the one in the DataNode's VERSION file, or the cluster will not start properly.
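A quick way to verify this is to compare the IDs directly. The sketch below is a shell helper; the paths match this post's layout but are assumptions for your cluster, so adjust them to your own data directories. It extracts namespaceID/clusterID from each VERSION file and reports any mismatch:

```shell
# Extract a key's value from a Hadoop VERSION file.
get_id() {            # get_id <key> <VERSION-file>
  grep "^$1=" "$2" | cut -d= -f2
}

# Report whether a key's value matches between two VERSION files.
check_match() {       # check_match <key> <file1> <file2>
  a=$(get_id "$1" "$2"); b=$(get_id "$1" "$3")
  if [ "$a" = "$b" ]; then
    echo "$1 OK ($a)"
  else
    echo "$1 MISMATCH: $2 has '$a', $3 has '$b'"
  fi
}

# Paths from this post -- adjust to your own layout.
NN=/home/hadoop-2.7.1/tmp/namenode/current/VERSION
JN=/home/hadoop-2.7.1/journal/ns/current/VERSION
DN=/home/hadoop-2.7.1/tmp/datanode/current/VERSION

if [ -f "$NN" ] && [ -f "$JN" ] && [ -f "$DN" ]; then
  check_match namespaceID "$NN" "$JN"   # NameNode vs JournalNode
  check_match clusterID   "$NN" "$JN"
  check_match clusterID   "$NN" "$DN"   # DataNode VERSION only carries clusterID
fi
```

Run against the VERSION files shown below, this would flag the namespaceID mismatch that the log complains about.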

The VERSION files look like this:

NAME_NODE:${HADOOP_HOME}/tmp/namenode/current/

#Sun Feb 17 16:12:44 GMT 2019
namespaceID=1540108126
clusterID=CID-18ffe42d-6abc-489d-b414-17da67b9819a
cTime=0
storageType=NAME_NODE
blockpoolID=BP-86241208-192.168.195.134-1550419964338
layoutVersion=-63

JOURNAL_NODE:${HADOOP_HOME}/journal/ns/current/

#Sun Feb 17 17:05:14 GMT 2019
namespaceID=1584393958
clusterID=CID-26f289e9-20ca-4349-b72f-ba1ff66cb7cb
cTime=0
storageType=JOURNAL_NODE
layoutVersion=-63

DATA_NODE:${HADOOP_HOME}/tmp/datanode/current/

#Sun Feb 17 16:16:26 GMT 2019
storageID=DS-c01212bb-aab9-452b-b621-1919b722be80
clusterID=CID-18ffe42d-6abc-489d-b414-17da67b9819a
cTime=0
datanodeUuid=80718228-4f56-4fbb-8f2e-355419bfa5be
storageType=DATA_NODE
layoutVersion=-56

Solutions:

Option 1 is to manually edit each VERSION file so that its namespaceID and clusterID match the active NameNode's values, but doing this by hand is tedious and error-prone.
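If you do go the manual route, it can at least be scripted. This sketch (paths assumed, same as above) copies the active NameNode's IDs into each target VERSION file in place; stop the cluster before touching these files:

```shell
# Overwrite <key> in the target VERSION file with the value from the source.
# sed only rewrites an existing line, so keys absent from the target (e.g.
# namespaceID in a DataNode's VERSION) are left untouched.
sync_id() {           # sync_id <key> <source-VERSION> <target-VERSION>
  val=$(grep "^$1=" "$2" | cut -d= -f2)
  sed -i "s/^$1=.*/$1=$val/" "$3"
}

NN=/home/hadoop-2.7.1/tmp/namenode/current/VERSION   # active NameNode (source)
for target in /home/hadoop-2.7.1/journal/ns/current/VERSION \
              /home/hadoop-2.7.1/tmp/datanode/current/VERSION; do
  [ -f "$NN" ] && [ -f "$target" ] || continue
  sync_id namespaceID "$NN" "$target"
  sync_id clusterID   "$NN" "$target"
done
```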

Option 2 is simply to format once more, and this time only on the primary node. After running hadoop namenode -format there, that node's tmp/namenode/ directory contains freshly generated namespaceID and clusterID values. Start the cluster with start-all.sh; you will find the standby NameNode still isn't up, but don't panic: just run hdfs namenode -bootstrapStandby on that node to sync the metadata over. Its output looks like this:

STARTUP_MSG:   java = 1.8.0_111
************************************************************/
19/02/17 17:32:00 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
19/02/17 17:32:00 INFO namenode.NameNode: createNameNode [-bootstrapStandby]
19/02/17 17:32:01 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
=====================================================
About to bootstrap Standby ID nn2 from:
           Nameservice ID: ns
        Other Namenode ID: nn1
  Other NN's HTTP address: http://hadoop01:50070
  Other NN's IPC  address: hadoop01/192.168.195.134:9000
             Namespace ID: 1905637806
            Block pool ID: BP-566088330-192.168.195.134-1550424334682
               Cluster ID: CID-5faf3356-0205-4903-9e68-4f0a81f1af68
           Layout version: -63
       isUpgradeFinalized: true
=====================================================
19/02/17 17:32:02 INFO common.Storage: Storage directory /home/hadoop-2.7.1/tmp/namenode has been successfully formatted.
19/02/17 17:32:03 INFO namenode.TransferFsImage: Opening connection to http://hadoop01:50070/imagetransfer?getimage=1&txid=0&storageInfo=-63:1905637806:0:CID-5faf3356-0205-4903-9e68-4f0a81f1af68
19/02/17 17:32:03 INFO namenode.TransferFsImage: Image Transfer timeout configured to 60000 milliseconds
19/02/17 17:32:03 INFO namenode.TransferFsImage: Transfer took 0.02s at 0.00 KB/s
19/02/17 17:32:03 INFO namenode.TransferFsImage: Downloaded file fsimage.ckpt_0000000000000000000 size 351 bytes.
19/02/17 17:32:03 INFO util.ExitUtil: Exiting with status 0
19/02/17 17:32:03 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hadoop02/192.168.195.135
************************************************************/

Once it reports success, run hadoop-daemon.sh start namenode on that node, and the NameNode will come up.

You can check the states of nn1 and nn2 with hdfs haadmin -getServiceState nn1 (or nn2).
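Putting option 2 together, the fix is a short sequence of commands. The sketch below wraps them in a dry-run helper so the plan can be reviewed before anything destructive happens (set DRY_RUN=0 to actually execute; nn1/nn2 and the daemons are from this post's setup):

```shell
DRY_RUN=${DRY_RUN:-1}
run() {               # echo the command in dry-run mode, execute it otherwise
  if [ "$DRY_RUN" = 1 ]; then echo "would run: $*"; else "$@"; fi
}

run hdfs namenode -format                  # 1. on the primary node only
run start-all.sh                           # 2. start the cluster
run hdfs namenode -bootstrapStandby        # 3. on the standby node (nn2)
run hadoop-daemon.sh start namenode        # 4. still on nn2
run hdfs haadmin -getServiceState nn1      # 5. verify both NameNode states
run hdfs haadmin -getServiceState nn2
```

Note that steps 1-2 run on the primary and steps 3-4 on the standby, so treat this as a checklist rather than a single script to run on one machine.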
