Flink error: JobManager responsible for xxx lost the leadership

Check the JobManager log, standalonesession-0-master.log:

2020-05-16 21:46:53,511 WARN  akka.remote.ReliableDeliverySupervisor                        - Association with remote system [akka.tcp://flink-metrics@master:3821] has failed, address is now gated for [50] ms. Reason: [Disassociated]
2020-05-16 21:46:53,511 WARN  akka.remote.ReliableDeliverySupervisor                        - Association with remote system [akka.tcp://flink@master:6123] has failed, address is now gated for [50] ms. Reason: [Disassociated]
2020-05-16 21:47:36,620 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor            - The heartbeat of JobManager with id bc6c72e5dded7b29a59ecc5417a12aee timed out.
2020-05-16 21:47:36,620 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor            - Close JobManager connection for job 53a1798d17013773e5622cc42c9bb39b.
2020-05-16 21:47:36,621 INFO  org.apache.flink.runtime.taskmanager.Task                     - Attempting to fail task externally Source: Custom Source -> Map (1/2) (2dad626977317551ef95c85e3f44cc3a).
2020-05-16 21:47:36,621 INFO  org.apache.flink.runtime.taskmanager.Task                     - Source: Custom Source -> Map (1/2) (2dad626977317551ef95c85e3f44cc3a) switched from RUNNING to FAILED.
org.apache.flink.util.FlinkException: JobManager responsible for 53a1798d17013773e5622cc42c9bb39b lost the leadership.
        at org.apache.flink.runtime.taskexecutor.TaskExecutor.closeJobManagerConnection(TaskExecutor.java:1272)
        at org.apache.flink.runtime.taskexecutor.TaskExecutor.access$1200(TaskExecutor.java:154)
        at org.apache.flink.runtime.taskexecutor.TaskExecutor$JobManagerHeartbeatListener.notifyHeartbeatTimeout(TaskExecutor.java:1791)
        at org.apache.flink.runtime.heartbeat.HeartbeatMonitorImpl.run(HeartbeatMonitorImpl.java:109)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:397)
        at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:190)
        at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152)
        at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
        at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
        at scala.PartialFunction.applyOrElse(PartialFunction.scala:123)
        at scala.PartialFunction.applyOrElse$(PartialFunction.scala:122)
        at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
        at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
        at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
        at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
        at akka.actor.Actor.aroundReceive(Actor.scala:517)
        at akka.actor.Actor.aroundReceive$(Actor.scala:515)
        at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
        at akka.actor.ActorCell.invoke(ActorCell.scala:561)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
        at akka.dispatch.Mailbox.run(Mailbox.scala:225)
        at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
        at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.util.concurrent.TimeoutException: The heartbeat of JobManager with id bc6c72e5dded7b29a59ecc5417a12aee timed out.
        at org.apache.flink.runtime.taskexecutor.TaskExecutor$JobManagerHeartbeatListener.notifyHeartbeatTimeout(TaskExecutor.java:1792)
        ... 26 more

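From the log alone I could not find the root cause (I am still too inexperienced). I later asked a colleague, and it turned out that one of the submitted jobs contained this statement: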

System.exit(-1);

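When the job hit an exception, this line was executed and shut down the JVM that Flink was running in. This matches the log above: the TaskExecutor stopped receiving heartbeats from the JobManager, the heartbeat timed out, and the running task was failed with "lost the leadership".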

So avoid calling System.exit() in job code wherever possible.
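As a rough illustration of the alternative, the sketch below throws an exception instead of exiting the JVM. It is plain Java with no Flink dependency. The class `FailFastWithoutExit` and the helper `parseCount` are hypothetical names standing in for whatever logic the job ran when it hit the error; they are not from the original job.

```java
public class FailFastWithoutExit {

    // Hypothetical helper standing in for user logic inside a job.
    public static int parseCount(String raw) {
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            // Do NOT call System.exit(-1) here: it terminates the whole JVM,
            // including any framework code running in the same process.
            // Throwing instead leaves the decision to the caller.
            throw new IllegalStateException("Cannot parse record: " + raw, e);
        }
    }

    public static void main(String[] args) {
        // Valid input is parsed normally.
        System.out.println(parseCount(" 42 "));

        // Invalid input surfaces as an exception; the JVM keeps running.
        try {
            parseCount("oops");
        } catch (IllegalStateException e) {
            System.out.println("failed: " + e.getMessage());
        }
    }
}
```

Inside a Flink user function, an exception thrown like this fails the task and is handled by the job's configured restart strategy, while the Flink processes themselves stay alive.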
