RM崩溃:KeeperErrorCode = NoNode

直接上日志:
前菜

2016-11-03 18:38:15,433 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_e08_1478051449455_0003_01_000001 Container Transitioned from RUNNING to COMPLETED
2016-11-03 18:38:15,433 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp: Completed container: container_e08_1478051449455_0003_01_000001 in state: COMPLETED event:FINISHED
2016-11-03 18:38:15,434 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=appuser  OPERATION=AM Released Container TARGET=SchedulerApp RESULT=SUCCESS  APPID=application_1478051449455_0003    CONTAINERID=container_e08_1478051449455_0003_01_000001
2016-11-03 18:38:15,434 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode: Released container container_e08_1478051449455_0003_01_000001 of capacity 2048, vCores:1> on host SCSV00463:35617, which currently has 2 containers, 4096, vCores:2> used and 4096, vCores:6> available, release resources=true
2016-11-03 18:38:15,434 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: default used=14336, vCores:7> numContainers=7 user=appuser user-resources=14336, vCores:7>
2016-11-03 18:38:15,434 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: completedContainer container=Container: [ContainerId: container_e08_1478051449455_0003_01_000001, NodeId: SCSV00463:35617, NodeHttpAddress: SCSV00463:8042, Resource: 2048, vCores:1>, Priority: 0, Token: Token { kind: ContainerToken, service: 10.129.12.41:35617 }, ] queue=default: capacity=1.0, absoluteCapacity=1.0, usedResources=14336, vCores:7>, usedCapacity=0.29166666, absoluteUsedCapacity=0.29166666, numApps=2, numContainers=7 cluster=49152, vCores:48>
2016-11-03 18:38:15,434 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: completedContainer queue=root usedCapacity=0.29166666 absoluteUsedCapacity=0.29166666 used=14336, vCores:7> cluster=49152, vCores:48>
2016-11-03 18:38:15,434 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Re-sorting completed queue: root.default stats: default: capacity=1.0, absoluteCapacity=1.0, usedResources=14336, vCores:7>, usedCapacity=0.29166666, absoluteUsedCapacity=0.29166666, numApps=2, numContainers=7
2016-11-03 18:38:15,434 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application attempt appattempt_1478051449455_0003_000001 released container container_e08_1478051449455_0003_01_000001 on node: host: SCSV00463:35617 #containers=2 available=4096, vCores:6> used=4096, vCores:2> with event: FINISHED
2016-11-03 18:38:15,435 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Updating application attempt appattempt_1478051449455_0003_000001 with final state: FAILED, and exit status: 15
2016-11-03 18:38:15,436 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1478051449455_0003_000001 State change from RUNNING to FINAL_SAVING

然后就开始报错了:

2016-11-03 18:38:15,446 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Exception while executing a ZK operation.
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
    at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
    at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:920)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:917)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1095)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1116)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:917)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:931)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:964)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:674)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:275)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:260)
    at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
    at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
    at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
    at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:837)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:900)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:895)
    at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175)
    at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108)
    at java.lang.Thread.run(Thread.java:745)
2016-11-03 18:38:15,447 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Maxed out ZK retries. Giving up!
2016-11-03 18:38:15,448 ERROR org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error updating appAttempt: appattempt_1478051449455_0003_000001
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
    at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
    at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:920)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:917)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1095)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1116)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:917)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:931)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:964)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:674)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:275)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:260)
    at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
    at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
    at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
    at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:837)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:900)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:895)
    at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175)
    at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108)
    at java.lang.Thread.run(Thread.java:745)
2016-11-03 18:38:15,451 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type STATE_STORE_OP_FAILED. Cause:org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
    at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
    at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:920)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:917)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1095)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1116)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:917)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:931)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:964)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:674)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:275)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:260)
    at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
    at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
    at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
    at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:837)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:900)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:895)
    at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175)
    at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108)
    at java.lang.Thread.run(Thread.java:745)

最后

2016-11-03 18:38:15,453 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2016-11-03 18:38:15,460 ERROR org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted
2016-11-03 18:38:15,462 INFO org.mortbay.log: Stopped HttpServer2$SelectChannelConnectorWithSafeStartup@10.129.12.36:8088
2016-11-03 18:38:15,564 INFO org.apache.hadoop.ipc.Server: Stopping server on 8033
2016-11-03 18:38:15,565 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 8033
2016-11-03 18:38:15,565 INFO org.apache.hadoop.ha.ActiveStandbyElector: Yielding from election
2016-11-03 18:38:15,565 INFO org.apache.hadoop.ha.ActiveStandbyElector: Deleting bread-crumb of active node...
2016-11-03 18:38:15,565 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2016-11-03 18:38:15,577 INFO org.apache.zookeeper.ZooKeeper: Session: 0x8582187f4d80000 closed
2016-11-03 18:38:15,577 WARN org.apache.hadoop.ha.ActiveStandbyElector: Ignoring stale result from old client with sessionId 0x8582187f4d80000
2016-11-03 18:38:15,578 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down

这个问题在2.8的时候解决了
详见:YARN-3798

你可能感兴趣的:(RM崩溃:KeeperErrorCode = NoNode)