Big Data: A Pitfall Encountered with Hadoop
Environment:
Hadoop 2.7.4 cluster
Problem:
The following appears in the log when starting the cluster:
Serving checkpoints at http://Node1:50070
2018-01-21 07:12:28,734 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: Node2/202.96.64.21:8485. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-01-21 07:12:28,735 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: Node3/202.96.64.22:8485. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-01-21 07:12:28,736 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: Node4/202.96.64.23:8485. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-01-21 07:12:29,736 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: Node2/202.96.64.21:8485. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-01-21 07:12:29,737 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: Node3/202.96.64.22:8485. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-01-21 07:12:29,738 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: Node4/202.96.64.23:8485. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-01-21 07:12:30,740 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: Node4/202.96.64.23:8485. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-01-21 07:12:30,741 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: Node3/202.96.64.22:8485. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-01-21 07:12:30,741 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: Node2/202.96.64.21:8485. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-01-21 07:12:31,743 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: Node3/202.96.64.22:8485. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-01-21 07:12:31,744 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: Node2/202.96.64.21:8485. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-01-21 07:12:31,745 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: Node4/202.96.64.23:8485. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-01-21 07:12:32,746 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: Node2/202.96.64.21:8485. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-01-21 07:12:32,747 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: Node4/202.96.64.23:8485. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-01-21 07:12:32,748 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: Node3/202.96.64.22:8485. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-01-21 07:12:33,731 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 6001 ms (timeout=20000 ms) for a response for selectInputStreams. No responses yet.
2018-01-21 07:12:33,749 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: Node4/202.96.64.23:8485. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-01-21 07:12:33,749 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: Node2/202.96.64.21:8485. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-01-21 07:12:33,750 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: Node3/202.96.64.22:8485. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-01-21 07:12:34,732 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 7002 ms (timeout=20000 ms) for a response for selectInputStreams. Succeeded so far: [202.96.64.23:8485]
2018-01-21 07:12:34,752 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: Node2/202.96.64.21:8485. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-01-21 07:12:34,753 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: Node3/202.96.64.22:8485. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-01-21 07:12:36,981 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* registerDatanode: from DatanodeRegistration(202.96.64.23:50010, datanodeUuid=25b7d59f-b67b-4a08-a6a9-972bfcc6cc91, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-56;cid=CID-ccbb5d05-fee9-44af-a7f0-d42e4448730d;nsid=1906906714;c=0) storage 25b7d59f-b67b-4a08-a6a9-972bfcc6cc91
2018-01-21 07:12:36,982 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeDescriptor: Number of failed storage changes from 0 to 0
2018-01-21 07:12:36,983 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/202.96.64.23:50010
2018-01-21 07:12:37,385 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeDescriptor: Number of failed storage changes from 0 to 0
2018-01-21 07:12:37,385 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeDescriptor: Adding new storage ID DS-00a7e94e-7637-40da-ac72-3239263450c4 for DN 202.96.64.23:50010
2018-01-21 07:12:37,620 INFO org.apache.hadoop.hdfs.StateChange: STATE* Safe mode extension entered.
The reported blocks 1 has reached the threshold 0.9990 of total blocks 2. The number of live datanodes 1 has reached the minimum number 0. In safe mode extension. Safe mode will be turned off automatically in 29 seconds.
2018-01-21 07:12:37,621 INFO BlockStateChange: BLOCK* processReport 0x408163610a0: from storage DS-00a7e94e-7637-40da-ac72-3239263450c4 node DatanodeRegistration(202.96.64.23:50010, datanodeUuid=25b7d59f-b67b-4a08-a6a9-972bfcc6cc91, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-56;cid=CID-ccbb5d05-fee9-44af-a7f0-d42e4448730d;nsid=1906906714;c=0), blocks: 2, hasStaleStorage: false, processing time: 47 msecs
2018-01-21 07:12:37,819 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* registerDatanode: from DatanodeRegistration(202.96.64.22:50010, datanodeUuid=3d0e9d33-3bcd-4bcb-b482-777bb09f380d, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-56;cid=CID-ccbb5d05-fee9-44af-a7f0-d42e4448730d;nsid=1906906714;c=0) storage 3d0e9d33-3bcd-4bcb-b482-777bb09f380d
2018-01-21 07:12:37,820 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeDescriptor: Number of failed storage changes from 0 to 0
2018-01-21 07:12:37,820 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/202.96.64.22:50010
2018-01-21 07:12:38,124 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeDescriptor: Number of failed storage changes from 0 to 0
2018-01-21 07:12:38,124 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeDescriptor: Adding new storage ID DS-61d63ee2-f197-4b16-9ac4-f2b99f90f3fc for DN 202.96.64.22:50010
2018-01-21 07:12:38,280 INFO BlockStateChange: BLOCK* processReport 0x4084f056804: from storage DS-61d63ee2-f197-4b16-9ac4-f2b99f90f3fc node DatanodeRegistration(202.96.64.22:50010, datanodeUuid=3d0e9d33-3bcd-4bcb-b482-777bb09f380d, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-56;cid=CID-ccbb5d05-fee9-44af-a7f0-d42e4448730d;nsid=1906906714;c=0), blocks: 2, hasStaleStorage: false, processing time: 0 msecs
2018-01-21 07:12:38,454 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* registerDatanode: from DatanodeRegistration(202.96.64.21:50010, datanodeUuid=1c103aa2-3639-4a7d-a7fd-21e04b1112ff, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-56;cid=CID-ccbb5d05-fee9-44af-a7f0-d42e4448730d;nsid=1906906714;c=0) storage 1c103aa2-3639-4a7d-a7fd-21e04b1112ff
2018-01-21 07:12:38,455 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeDescriptor: Number of failed storage changes from 0 to 0
2018-01-21 07:12:38,455 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/202.96.64.21:50010
2018-01-21 07:12:38,859 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeDescriptor: Number of failed storage changes from 0 to 0
2018-01-21 07:12:38,859 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeDescriptor: Adding new storage ID DS-24676d02-cc62-4153-bdf1-a1d778cb35de for DN 202.96.64.21:50010
2018-01-21 07:12:39,002 INFO BlockStateChange: BLOCK* processReport 0x408793e723c: from storage DS-24676d02-cc62-4153-bdf1-a1d778cb35de node DatanodeRegistration(202.96.64.21:50010, datanodeUuid=1c103aa2-3639-4a7d-a7fd-21e04b1112ff, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-56;cid=CID-ccbb5d05-fee9-44af-a7f0-d42e4448730d;nsid=1906906714;c=0), blocks: 2, hasStaleStorage: false, processing time: 1 msecs
2018-01-21 07:12:41,208 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Stopping services started for standby state
2018-01-21 07:12:41,209 WARN org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Edit log tailer interrupted
java.lang.InterruptedException: sleep interrupted
at java.lang.Thread.sleep(Native Method)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:347)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:284)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:301)
at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:415)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:297)
2018-01-21 07:12:41,214 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Starting services required for active state
2018-01-21 07:12:41,246 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Starting recovery process for unclosed journal segments...
2018-01-21 07:12:41,393 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Successfully started new epoch 7
2018-01-21 07:12:41,393 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Beginning recovery of unclosed segment starting at txid 278
2018-01-21 07:12:41,447 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Recovery prepare phase complete. Responses:
202.96.64.22:8485: segmentState { startTxId: 278 endTxId: 278 isInProgress: true } lastWriterEpoch: 5 lastCommittedTxId: 590
202.96.64.23:8485: lastWriterEpoch: 0
2018-01-21 07:12:41,451 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Using longest log: 202.96.64.22:8485=segmentState {
startTxId: 278
endTxId: 278
isInProgress: true
}
lastWriterEpoch: 5
lastCommittedTxId: 590
2018-01-21 07:12:41,451 FATAL org.apache.hadoop.hdfs.server.namenode.FSEditLog: Error: recoverUnfinalizedSegments failed for required journal (JournalAndStream(mgr=QJM to [202.96.64.21:8485, 202.96.64.22:8485, 202.96.64.23:8485], stream=null))
java.lang.AssertionError: Decided to synchronize log to startTxId: 278
endTxId: 278
isInProgress: true
but logger 202.96.64.22:8485 had seen txid 590 committed
at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.recoverUnclosedSegment(QuorumJournalManager.java:338)
at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.recoverUnfinalizedSegments(QuorumJournalManager.java:455)
at org.apache.hadoop.hdfs.server.namenode.JournalSet$8.apply(JournalSet.java:624)
at org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:393)
at org.apache.hadoop.hdfs.server.namenode.JournalSet.recoverUnfinalizedSegments(JournalSet.java:621)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.recoverUnclosedStreams(FSEditLog.java:1490)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:1114)
at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:1722)
at org.apache.hadoop.hdfs.server.namenode.ha.ActiveState.enterState(ActiveState.java:61)
at org.apache.hadoop.hdfs.server.namenode.ha.HAState.setStateInternal(HAState.java:64)
at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.setState(StandbyState.java:49)
at org.apache.hadoop.hdfs.server.namenode.NameNode.transitionToActive(NameNode.java:1595)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.transitionToActive(NameNodeRpcServer.java:1499)
at org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslatorPB.java:107)
at org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:4460)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2217)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2213)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2211)
2018-01-21 07:12:41,454 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2018-01-21 07:12:41,455 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at Node1/202.96.64.20
************************************************************/
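Solution:
Run the following command on the NameNode. It reinitializes the shared edits on the JournalNodes from the NameNode's local edits directory: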
hdfs namenode -initializeSharedEdits
Then restart the cluster.
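
The steps above can be sketched as a small script. This is only a sketch under several assumptions that are not in the original post: the JournalNodes run on Node2, Node3 and Node4 (the hosts the log retries on port 8485), the script runs on the NameNode (Node1) with passwordless ssh to those hosts, and the Hadoop 2.x scripts stop-dfs.sh, start-dfs.sh and hadoop-daemon.sh are on the PATH. The helper names run and recover_shared_edits are made up for illustration. The script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
# Sketch only: prints each step unless DRY_RUN=0 is set.
# Assumption: JournalNode hosts are the ones seen in the log (port 8485).
JN_HOSTS="${JN_HOSTS:-Node2 Node3 Node4}"
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, otherwise run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Hypothetical helper wrapping the fix from this post.
recover_shared_edits() {
  # 1. Stop HDFS so that no NameNode is writing to the journal.
  run stop-dfs.sh
  # 2. Assumption: the JournalNodes must be up so the next step can reach them.
  for host in $JN_HOSTS; do
    run ssh "$host" hadoop-daemon.sh start journalnode
  done
  # 3. The command from this post; do not restart if it fails.
  run hdfs namenode -initializeSharedEdits || return 1
  # 4. Restart the cluster.
  run start-dfs.sh
}

recover_shared_edits
```

Review the printed commands first, then rerun with DRY_RUN=0. Because the command rewrites the shared edits, it may ask for confirmation, and keeping a copy of the JournalNode edits directories beforehand is a reasonable precaution.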