Troubleshooting a JournalNode That Fails to Start

1. Problem description

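  • 1.1 After a restart the JournalNode failed again, and repeated restart attempts kept failing. The log showed the error below; investigation traced it to a corrupted edit log.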
2018-05-28 16:06:07,896 WARN  namenode.FSImage (EditLogFileInputStream.java:scanEditLog(359)) - Caught exception after scanning through 0 ops from /hadoop/hdfs/journal/DHTestCluster/current/edits_inprogress_0000000000019770365 while determining its valid length. Position was 1044480
java.io.IOException: Can't scan a pre-transactional edit log.
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$LegacyReader.scanOp(FSEditLogOp.java:4974)
	at org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream.scanNextOp(EditLogFileInputStream.java:245)
	at org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream.scanEditLog(EditLogFileInputStream.java:355)
	at org.apache.hadoop.hdfs.server.namenode.FileJournalManager$EditLogFile.scanLog(FileJournalManager.java:551)
	at org.apache.hadoop.hdfs.qjournal.server.Journal.scanStorageForLatestEdits(Journal.java:192)
	at org.apache.hadoop.hdfs.qjournal.server.Journal.<init>(Journal.java:152)
	at org.apache.hadoop.hdfs.qjournal.server.JournalNode.getOrCreateJournal(JournalNode.java:90)
	at org.apache.hadoop.hdfs.qjournal.server.JournalNode.getOrCreateJournal(JournalNode.java:99)
	at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.getEditLogManifest(JournalNodeRpcServer.java:189)
	at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.getEditLogManifest(QJournalProtocolServerSideTranslatorPB.java:224)
	at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25431)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2345)
2018-05-28 16:06:07,896 WARN  namenode.FSImage (EditLogFileInputStream.java:scanEditLog(364)) - After resync, position is 1044480
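
To see which edit log file a warning like the one above points at, the path can be pulled out of the JournalNode log. The following is only a sketch: the function name find_bad_edit_logs is hypothetical, and it assumes the log line has exactly the wording shown in the excerpt above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Hadoop): print the edit log files that a
# JournalNode log reports as unscannable. Assumes the WARN line is worded
# exactly as in the excerpt above.
find_bad_edit_logs() {
  local log_file="$1"
  # Keep only the scan warnings, cut out the path between "ops from" and
  # "while determining its valid length", then de-duplicate.
  grep 'Caught exception after scanning through' "$log_file" \
    | sed -n 's/.* ops from \(.*\) while determining its valid length.*/\1/p' \
    | sort -u
}
```

It would be called with the path of the JournalNode log file, which depends on your installation, for example: find_bad_edit_logs /path/to/journalnode.log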

2. Solution

  • 2.1 Copy the edit logs from a JournalNode machine that is running normally

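  • 2.2 On the healthy node, change to the directory that contains the edit logs' current directory

    • cd /hadoop/hdfs/journal/DHTestCluster (adjust the path to your own configuration; the current directory lives under the path set by dfs.journalnode.edits.dir)
  • 2.3 Compress the current directory

    • tar -zcvf current.tar.gz ./current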
  • 2.4 On the failed node, delete the corrupted edit logs

    • cd /hadoop/hdfs/journal/DHTestCluster/

    • rm -rf current/

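  • 2.5 Copy current.tar.gz to the target machine and extract it there

    • scp current.tar.gz hdfs@hadoop:/hadoop/hdfs/journal/DHTestCluster

    • tar -zxvf current.tar.gz (run on the target machine, in /hadoop/hdfs/journal/DHTestCluster)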

  • 2.6 Restart the JournalNode
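
The pack and restore steps above can be sketched as two small shell functions. This is a minimal sketch, not a definitive procedure: the function names pack_journal_current and restore_journal_current are hypothetical, and instead of the rm -rf used in step 2.4 the sketch moves the damaged directory aside so it can still be inspected later. Copying the archive between machines (step 2.5) and restarting the JournalNode (step 2.6) are left out.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Hadoop). JOURNAL_DIR is the directory
# that contains "current", e.g. /hadoop/hdfs/journal/DHTestCluster.

# Run on the healthy node: archive JOURNAL_DIR/current into ARCHIVE.
pack_journal_current() {
  local journal_dir="$1" archive="$2"
  [ -d "$journal_dir/current" ] || { echo "no current dir in $journal_dir" >&2; return 1; }
  # -C makes the archive contain ./current regardless of the caller's cwd.
  tar -zcf "$archive" -C "$journal_dir" ./current
}

# Run on the failed node: set the damaged directory aside, then unpack ARCHIVE.
restore_journal_current() {
  local journal_dir="$1" archive="$2"
  local backup
  backup="$journal_dir/current.damaged.$(date +%Y%m%d%H%M%S)"
  [ -f "$archive" ] || { echo "archive not found: $archive" >&2; return 1; }
  if [ -d "$journal_dir/current" ]; then
    # Assumption: keeping a copy is preferable to deleting it outright.
    mv "$journal_dir/current" "$backup" || return 1
  fi
  tar -zxf "$archive" -C "$journal_dir" || return 1
  # Succeed only if the restored directory is actually there.
  [ -d "$journal_dir/current" ]
}
```

On the healthy node this would be used as pack_journal_current /hadoop/hdfs/journal/DHTestCluster /tmp/current.tar.gz, and on the failed node, after copying the archive over, as restore_journal_current /hadoop/hdfs/journal/DHTestCluster /tmp/current.tar.gz. The /tmp location is only an example. Run both as the user that owns the journal directory so the restored files keep usable permissions.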
