Hadoop startup error: No Route to Host from lida1/10.30.12.87 to lida3:8485 failed on socket timeout exception

Problem: when starting the Hadoop cluster, one of the NameNodes (NN) never comes up. The NameNode log shows the following errors:

2016-05-04 15:12:27,837 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Get corrupt file blocks returned error: Operation category READ is not supported in state standby
2016-05-04 15:12:36,124 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 8020, call org.apache.hadoop.hdfs.server.protocol.NamenodeProtocol.rollEditLog from 10.30.12.88:49509 Call#185 Retry#0: org.apache.hadoop.ipc.StandbyException: Operation category JOURNAL is not supported in state standby
2016-05-04 15:12:52,584 INFO org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Triggering log roll on remote NameNode lida2/10.30.12.88:8020
2016-05-04 15:12:52,598 WARN org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Unable to trigger a roll of the active NN
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category JOURNAL is not supported in state standby
	at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:87)
	at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1688)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1258)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:5765)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:886)
	at org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:139)
	at org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:11214)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)

	at org.apache.hadoop.ipc.Client.call(Client.java:1411)
	at org.apache.hadoop.ipc.Client.call(Client.java:1364)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
	at com.sun.proxy.$Proxy14.rollEditLog(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolTranslatorPB.rollEditLog(NamenodeProtocolTranslatorPB.java:139)
	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.triggerActiveLogRoll(EditLogTailer.java:271)
	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.access$600(EditLogTailer.java:61)
	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:313)
	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:282)
	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:299)
	at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:411)
	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:295)
2016-05-04 15:12:53,632 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: lida4/10.30.12.90:8485. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-05-04 15:12:53,636 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: lida3/10.30.12.89:8485. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-05-04 15:12:54,635 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: lida4/10.30.12.90:8485. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-05-04 15:12:54,638 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: lida3/10.30.12.89:8485. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-05-04 15:12:55,636 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: lida4/10.30.12.90:8485. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-05-04 15:12:55,639 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: lida3/10.30.12.89:8485. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-05-04 15:12:56,638 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: lida4/10.30.12.90:8485. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-05-04 15:12:56,640 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: lida3/10.30.12.89:8485. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-05-04 15:12:57,641 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: lida4/10.30.12.90:8485. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-05-04 15:12:57,642 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: lida3/10.30.12.89:8485. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-05-04 15:12:58,632 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 6001 ms (timeout=20000 ms) for a response for selectInputStreams. Succeeded so far: [10.30.12.88:8485]
2016-05-04 15:12:58,644 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: lida3/10.30.12.89:8485. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-05-04 15:12:58,646 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: lida4/10.30.12.90:8485. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-05-04 15:12:59,633 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 7003 ms (timeout=20000 ms) for a response for selectInputStreams. Succeeded so far: [10.30.12.88:8485]
2016-05-04 15:12:59,647 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: lida3/10.30.12.89:8485. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-05-04 15:12:59,648 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: lida4/10.30.12.90:8485. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-05-04 15:13:00,635 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 8005 ms (timeout=20000 ms) for a response for selectInputStreams. Succeeded so far: [10.30.12.88:8485]
2016-05-04 15:13:00,652 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: lida3/10.30.12.89:8485. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-05-04 15:13:00,653 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: lida4/10.30.12.90:8485. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-05-04 15:13:01,637 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 9007 ms (timeout=20000 ms) for a response for selectInputStreams. Succeeded so far: [10.30.12.88:8485]
2016-05-04 15:13:01,655 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: lida4/10.30.12.90:8485. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-05-04 15:13:01,656 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: lida3/10.30.12.89:8485. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-05-04 15:13:02,638 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 10008 ms (timeout=20000 ms) for a response for selectInputStreams. Succeeded so far: [10.30.12.88:8485]
2016-05-04 15:13:02,659 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: lida4/10.30.12.90:8485. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-05-04 15:13:02,659 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: lida3/10.30.12.89:8485. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-05-04 15:13:02,663 WARN org.apache.hadoop.hdfs.server.namenode.FSEditLog: Unable to determine input streams from QJM to [10.30.12.88:8485, 10.30.12.89:8485, 10.30.12.90:8485]. Skipping.
org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many exceptions to achieve quorum size 2/3. 1 successful responses:
10.30.12.88:8485: [[7162,7163], [7164,7165], [7166,7167], [7168,7169], [7170,7171], [7172,7173], [7174,7175], [7176,7177], [7178,7179], [7180,7181], [7182,7183], [7184,7185], [7186,7187], [7188,7189], [7190,7191]]
2 exceptions thrown:
10.30.12.90:8485: No Route to Host from  lida1/10.30.12.87 to lida4:8485 failed on socket timeout exception: java.net.NoRouteToHostException: No route to host; For more details see:  http://wiki.apache.org/hadoop/NoRouteToHost
10.30.12.89:8485: No Route to Host from  lida1/10.30.12.87 to lida3:8485 failed on socket timeout exception: java.net.NoRouteToHostException: No route to host; For more details see:  http://wiki.apache.org/hadoop/NoRouteToHost
	at org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)
	at org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:223)
	at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:142)
	at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.selectInputStreams(QuorumJournalManager.java:471)
	at org.apache.hadoop.hdfs.server.namenode.JournalSet.selectInputStreams(JournalSet.java:260)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1430)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1450)
	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:212)
	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:324)
	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:282)
	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:299)
	at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:411)
	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:295)
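Cause: the first error in the log (the StandbyException) is what triggers the second one (the QuorumException with No Route to Host). In other words, both NNs are in the standby state.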
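The QuorumException in the log reports "quorum size 2/3" with only one successful response (10.30.12.88:8485), so it may also be worth confirming from the NameNode host which JournalNodes actually accept connections. The following is a minimal sketch, not part of the original troubleshooting: the addresses are copied from the log, while the function names `quorum_size`, `can_connect` and `check_quorum` are hypothetical helpers introduced for illustration.

```python
import socket

# JournalNode addresses copied from the QuorumException in the log above.
JOURNAL_NODES = [
    ("10.30.12.88", 8485),
    ("10.30.12.89", 8485),
    ("10.30.12.90", 8485),
]


def quorum_size(total):
    """Majority of the JournalNodes, e.g. 2 out of 3."""
    return total // 2 + 1


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, "connection refused" and "no route to host".
        return False


def check_quorum(nodes, probe=can_connect):
    """Probe every node and return (reachable_nodes, quorum_met)."""
    reachable = [node for node in nodes if probe(*node)]
    return reachable, len(reachable) >= quorum_size(len(nodes))
```

Calling `check_quorum(JOURNAL_NODES)` on lida1 would return the reachable JournalNodes and whether they form a majority. If lida3 and lida4 still fail, a firewall on those hosts or a stopped JournalNode process are common things to rule out.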

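Solution (either of the following):

1. Restart the servers and then the Hadoop cluster, so that one of the NNs becomes active.
2. Manually switch the NN on one of the servers to active with the command: hdfs haadmin -transitionToActive nn2. This changes the state of nn2 to active.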
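Before forcing a transition it can help to confirm that no NN is active already. The sketch below wraps the `hdfs haadmin -getServiceState` and `hdfs haadmin -transitionToActive` commands. It assumes the `hdfs` binary is on the PATH and that the two service IDs are `nn1` and `nn2` (only `nn2` appears in the original; `nn1` is an assumption). The helper names `run_haadmin` and `ensure_one_active` are hypothetical.

```python
import subprocess


def run_haadmin(args):
    """Run `hdfs haadmin <args>` and return its stripped stdout."""
    result = subprocess.run(
        ["hdfs", "haadmin", *args],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the command fails
    )
    return result.stdout.strip()


def ensure_one_active(nn_ids, promote, run=run_haadmin):
    """Promote `promote` only if none of `nn_ids` reports 'active'.

    Returns a dict mapping each service ID to its reported state.
    """
    states = {nn: run(["-getServiceState", nn]) for nn in nn_ids}
    if "active" in states.values():
        return states  # an active NN already exists, nothing to do
    run(["-transitionToActive", promote])
    # Re-read the state to confirm the transition took effect.
    states[promote] = run(["-getServiceState", promote])
    return states
```

For example, `ensure_one_active(["nn1", "nn2"], promote="nn2")` would promote nn2 only when both NNs report standby. Note that when automatic failover is enabled, `-transitionToActive` is refused unless `--forcemanual` is added, so this sketch applies to manually managed HA setups.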

