Summary of Flink Exception Issues

Flink job suddenly exits abnormally during execution (NoResourceAvailableException)

Sink: time-kafka(1/1) switched to SCHEDULED
04/29/2019 10:10:20     Job execution switched to status FAILING.
org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Not enough free slots available to run the job. You can decrease the operator parallelism or increase the number of slots per TaskManager in the configuration. Task to schedule: < Attempt #10 (Source: source -> (Filter, Timestamps/Watermarks -> Filter) (12/12)) @ (unassigned) - [SCHEDULED] > with groupID < d460da9a057758d795825417554f0e72 > in sharing group < SlotSharingGroup [d460da9a057758d795825417554f0e72, 0f5d1bbb1c312ef7bcca697263389b15, 3b928584ed2bd5c041cea2f3dba3aa0e, a57d18a89c6c239247f95ebb9819ce1e, dabc4aa3951942f45c2de75c800930c3] >. Resources available to scheduler: Number of instances=11, total number of slots=11, available slots=0
        at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleTask(Scheduler.java:263)
        at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.allocateSlot(Scheduler.java:142)
        at org.apache.flink.runtime.executiongraph.Execution.lambda$allocateAndAssignSlotForExecution$1(Execution.java:440)
        at java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:981)
        at java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2124)
        at org.apache.flink.runtime.executiongraph.Execution.allocateAndAssignSlotForExecution(Execution.java:438)
        at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.allocateResourcesForAll(ExecutionJobVertex.java:503)
        at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleEager(ExecutionGraph.java:891)
        at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleForExecution(ExecutionGraph.java:845)
        at org.apache.flink.runtime.executiongraph.ExecutionGraph.restart(ExecutionGraph.java:1193)
        at org.apache.flink.runtime.executiongraph.restart.ExecutionGraphRestartCallback.triggerFullRecovery(ExecutionGraphRestartCallback.java:59)
        at org.apache.flink.runtime.executiongraph.restart.FixedDelayRestartStrategy$1.run(FixedDelayRestartStrategy.java:68)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
04/29/2019 10:10:20     Source: source -> (Filter, Timestamps/Watermarks -> Filter)(1/12) switched to CANCELED
04/29/2019 10:10:20     Source: source -> (Filter, Timestamps/Watermarks -> Filter)(2/12) switched to CANCELED
04/29/2019 10:10:20     Source: source -> (Filter, Timestamps/Watermarks -> Filter)(3/12) switched to CANCELED
04/29/2019 10:10:20     Source: source -> (Filter, Timestamps/Watermarks -> Filter)(4/12) switched to CANCELED
04/29/2019 10:10:20     Source: source -> (Filter, Timestamps/Watermarks -> Filter)(5/12) switched to CANCELED
04/29/2019 10:10:20     Source: source -> (Filter, Timestamps/Watermarks -> Filter)(6/12) switched to CANCELED
04/29/2019 10:10:20     Source: source -> (Filter, Timestamps/Watermarks -> Filter)(7/12) switched to CANCELED
04/29/2019 10:10:20     Source: source -> (Filter, Timestamps/Watermarks -> Filter)(8/12) switched to CANCELED
04/29/2019 10:10:20     Source: source -> (Filter, Timestamps/Watermarks -> Filter)(9/12) switched to CANCELED
04/29/2019 10:10:20     Source: source -> (Filter, Timestamps/Watermarks -> Filter)(10/12) switched to CANCELED
04/29/2019 10:10:20     Source: source -> (Filter, Timestamps/Watermarks -> Filter)(11/12) switched to CANCELED
04/29/2019 10:10:20     Source: source -> (Filter, Timestamps/Watermarks -> Filter)(12/12) switched to CANCELED
04/29/2019 10:10:20     counter(1/12) switched to CANCELED
04/29/2019 10:10:20     counter(2/12) switched to CANCELED
04/29/2019 10:10:20     counter(3/12) switched to CANCELED
04/29/2019 10:10:20     counter(4/12) switched to CANCELED
04/29/2019 10:10:20     counter(5/12) switched to CANCELED
04/29/2019 10:10:20     counter(6/12) switched to CANCELED
04/29/2019 10:10:20     counter(7/12) switched to CANCELED
04/29/2019 10:10:20     counter(8/12) switched to CANCELED
04/29/2019 10:10:20     counter(9/12) switched to CANCELED
04/29/2019 10:10:20     counter(10/12) switched to CANCELED
04/29/2019 10:10:20     counter(11/12) switched to CANCELED
04/29/2019 10:10:20     counter(12/12) switched to CANCELED
04/29/2019 10:10:20     Sink: counter-kafka(1/1) switched to CANCELED
04/29/2019 10:10:20     timer1(1/12) switched to CANCELED
04/29/2019 10:10:20     timer1(2/12) switched to CANCELED
04/29/2019 10:10:20     timer1(3/12) switched to CANCELED
04/29/2019 10:10:20     timer1(4/12) switched to CANCELED
04/29/2019 10:10:20     timer1(5/12) switched to CANCELED
04/29/2019 10:10:20     timer1(6/12) switched to CANCELED
04/29/2019 10:10:20     timer1(7/12) switched to CANCELED
04/29/2019 10:10:20     timer1(8/12) switched to CANCELED
04/29/2019 10:10:20     timer1(9/12) switched to CANCELED
04/29/2019 10:10:20     timer1(10/12) switched to CANCELED
04/29/2019 10:10:20     timer1(11/12) switched to CANCELED
04/29/2019 10:10:20     timer1(12/12) switched to CANCELED
04/29/2019 10:10:20     Sink: time-kafka(1/1) switched to CANCELED
04/29/2019 10:10:20     Job execution switched to status FAILED.
2019-04-29 10:10:20,666 INFO  org.apache.flink.yarn.YarnClusterClient                       - Sending shutdown request to the Application Master
2019-04-29 10:10:20,666 INFO  org.apache.flink.yarn.YarnClusterClient                       - Start application client.
2019-04-29 10:10:20,859 INFO  org.apache.flink.yarn.ApplicationClient                       - Notification about new leader address akka.tcp://flink@<host>:36513/user/jobmanager with session ID 00000000-0000-0000-0000-000000000000.
2019-04-29 10:10:20,868 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.
2019-04-29 10:10:20,869 INFO  org.apache.flink.yarn.ApplicationClient                       - Received address of new leader akka.tcp://flink@<host>:36513/user/jobmanager with session ID 00000000-0000-0000-0000-000000000000.
2019-04-29 10:10:20,869 INFO  org.apache.flink.yarn.ApplicationClient                       - Disconnect from JobManager null.
2019-04-29 10:10:20,872 INFO  org.apache.flink.yarn.ApplicationClient                       - Trying to register at JobManager akka.tcp://flink@<host>:36513/user/jobmanager.
2019-04-29 10:10:20,878 INFO  org.apache.flink.yarn.ApplicationClient                       - Successfully registered at the ResourceManager using JobManager Actor[akka.tcp://flink@<host>:36513/user/jobmanager#-153942343]
2019-04-29 10:10:21,888 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.
2019-04-29 10:10:23,747 INFO  org.apache.flink.yarn.YarnClusterClient                       - Application application_1556227576661_0231 finished with state FINISHED and final state SUCCEEDED at 1556503821989
2019-04-29 10:10:23,747 INFO  org.apache.flink.yarn.YarnClusterClient                       - YARN Client is shutting down
2019-04-29 10:10:23,911 INFO  org.apache.flink.yarn.ApplicationClient                       - Stopped Application client.
2019-04-29 10:10:23,911 INFO  org.apache.flink.yarn.ApplicationClient                       - Disconnect from JobManager Actor[akka.tcp://flink@<host>:36513/user/jobmanager#-153942343].
2019-04-29 10:10:25.282 [main] ERROR c.a.e.f.a.j.l.impl.CommonShellJobLauncherImpl - [FNI-09F180DD19111D0F_0] Failed to execute command, exit code=1
2019-04-29 10:10:25.296 [main] INFO  c.a.e.f.a.j.l.impl.CommonShellJobLauncherImpl - [FNI-09F180DD19111D0F_0] Finished command line, exit code=1.
Mon Apr 29 10:10:25 CST 2019 [JobLauncherRunner] INFO Closing job launcher ...
2019-04-29 10:10:25.298 [main] INFO  c.a.emr.flow.agent.jobs.launcher.JobLauncherBase - [FNI-09F180DD19111D0F_0] Closing ...
2019-04-29 10:10:25.298 [main] INFO  c.a.e.f.a.j.l.impl.CommonShellJobLauncherImpl - [FNI-09F180DD19111D0F_0] Stopping command executor ...
Mon Apr 29 10:10:25 CST 2019 [YarnJobLauncherAM] INFO Closing launcher am ...
Mon Apr 29 10:10:25 CST 2019 [YarnJobLauncherAM] INFO Emr flow launcher is quit.
Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1672)
        at com.aliyun.emr.flow.agent.jobs.launcher.yarn.YarnJobLauncherAM.doMain(YarnJobLauncherAM.java:72)
        at com.aliyun.emr.flow.agent.jobs.launcher.yarn.YarnJobLauncherAM.main(YarnJobLauncherAM.java:137)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at com.aliyun.emr.flow.agent.jobs.launcher.JobLauncherRunner.run(JobLauncherRunner.java:59)
        at com.aliyun.emr.flow.agent.jobs.launcher.yarn.YarnJobLauncherAM.launchJob(YarnJobLauncherAM.java:104)
        at com.aliyun.emr.flow.agent.jobs.launcher.yarn.YarnJobLauncherAM.access$000(YarnJobLauncherAM.java:32)
        at com.aliyun.emr.flow.agent.jobs.launcher.yarn.YarnJobLauncherAM$1.run(YarnJobLauncherAM.java:75)
        at com.aliyun.emr.flow.agent.jobs.launcher.yarn.YarnJobLauncherAM$1.run(YarnJobLauncherAM.java:72)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        ... 2 more
Caused by: com.aliyun.emr.flow.agent.common.exceptions.EmrFlowRuntimeException: ###[E10012,JOB]:  Execute job FNI-09F180DD19111D0F_0 failed, exit code: 1, message: .
        at com.aliyun.emr.flow.agent.common.utils.Throwables.propagate(Throwables.java:68)
        at com.aliyun.emr.flow.agent.jobs.launcher.impl.CommonShellJobLauncherImpl.doLaunch(CommonShellJobLauncherImpl.java:221)
        at com.aliyun.emr.flow.agent.jobs.launcher.impl.CommonShellJobLauncherImpl.launch(CommonShellJobLauncherImpl.java:207)
        ... 14 more
2019-04-29 10:10:25.613 [Shutdown-FNI-09F180DD19111D0F_0] INFO  c.a.emr.flow.agent.jobs.launcher.JobLauncherBase - [FNI-09F180DD19111D0F_0] Call shutdown hook.
2019-04-29 10:10:25.614 [Shutdown-FNI-09F180DD19111D0F_0] INFO  c.a.emr.flow.agent.jobs.launcher.JobLauncherBase - [FNI-09F180DD19111D0F_0] Closing ...
2019-04-29 10:10:25.614 [Shutdown-FNI-09F180DD19111D0F_0] INFO  c.a.emr.flow.agent.jobs.launcher.JobLauncherBase - [FNI-09F180DD19111D0F_0] This launcher is closed already, skip.
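The exception message contains the numbers that explain the scheduling failure:

- The operators run with parallelism 12, as the `(12/12)` suffixes show.
- All five job vertices belong to one slot sharing group.
- The scheduler reports 11 instances with 11 slots in total.

The failure was raised inside `FixedDelayRestartStrategy` during a restart, on attempt #10. This suggests the job had been running and then lost a TaskManager, after which it could no longer obtain 12 slots.

The Java sketch below reproduces that arithmetic. `SlotCheck` and `requiredSlots` are hypothetical names, not Flink APIs. The rule it encodes is the usual slot-sharing behavior, stated here as an assumption: a sharing group needs as many slots as its highest operator parallelism.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper (not a Flink API) that estimates the slots a job needs.
 * Assumption: with slot sharing, one slot hosts one parallel subtask of every
 * operator in the same sharing group, so the group needs max(parallelism) slots.
 */
public class SlotCheck {

    /** Slots needed by one sharing group: the highest parallelism among its operators. */
    public static int requiredSlots(Map<String, Integer> operatorParallelism) {
        int max = 0;
        for (int p : operatorParallelism.values()) {
            max = Math.max(max, p);
        }
        return max;
    }

    public static void main(String[] args) {
        // Operators and parallelism as they appear in the log (one sharing group).
        Map<String, Integer> job = new LinkedHashMap<>();
        job.put("Source: source -> (Filter, Timestamps/Watermarks -> Filter)", 12);
        job.put("counter", 12);
        job.put("Sink: counter-kafka", 1);
        job.put("timer1", 12);
        job.put("Sink: time-kafka", 1);

        int required = requiredSlots(job);
        int totalSlots = 11; // "total number of slots=11" from the exception message

        System.out.println("required=" + required + ", available=" + totalSlots);
        if (required > totalSlots) {
            System.out.println("Scheduling fails: lower the parallelism or add slots/TaskManagers");
        }
    }
}
```

As the exception message itself says, there are two ways out:

- Run with a parallelism no greater than the number of slots still available.
- Provide more slots, either by adding TaskManagers or by raising `taskmanager.numberOfTaskSlots`.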

Bug: Flink fails to start after the number of slots is increased in the configuration

/2019 14:07:03     counter(62/96) switched to FAILED
java.io.IOException: Insufficient number of network buffers: required 96, but only 25 available. The total number of network buffers is currently set to 2048 of 32768 bytes each. You can increase this number by setting the configuration keys 'taskmanager.network.memory.fraction', 'taskmanager.network.memory.min', and 'taskmanager.network.memory.max'.
        at org.apache.flink.runtime.io.network.buffer.NetworkBufferPool.createBufferPool(NetworkBufferPool.java:257)
        at org.apache.flink.runtime.io.network.NetworkEnvironment.registerTask(NetworkEnvironment.java:235)
        at org.apache.flink.runtime.taskmanager.Task.run(Task.java:618)
        at java.lang.Thread.run(Thread.java:748)

04/29/2019 14:07:03     counter(63/96) switched to FAILED
java.io.IOException: Insufficient number of network buffers: required 96, but only 26 available. The total number of network buffers is currently set to 2048 of 32768 bytes each. You can increase this number by setting the configuration keys 'taskmanager.network.memory.fraction', 'taskmanager.network.memory.min', and 'taskmanager.network.memory.max'.
        at org.apache.flink.runtime.io.network.buffer.NetworkBufferPool.createBufferPool(NetworkBufferPool.java:257)
        at org.apache.flink.runtime.io.network.NetworkEnvironment.registerTask(NetworkEnvironment.java:235)
        at org.apache.flink.runtime.taskmanager.Task.run(Task.java:618)
        at java.lang.Thread.run(Thread.java:748)

04/29/2019 14:07:03     timer1(57/96) switched to FAILED
java.io.IOException: Insufficient number of network buffers: required 96, but only 26 available. The total number of network buffers is currently set to 2048 of 32768 bytes each. You can increase this number by setting the configuration keys 'taskmanager.network.memory.fraction', 'taskmanager.network.memory.min', and 'taskmanager.network.memory.max'.
        at org.apache.flink.runtime.io.network.buffer.NetworkBufferPool.createBufferPool(NetworkBufferPool.java:257)
        at org.apache.flink.runtime.io.network.NetworkEnvironment.registerTask(NetworkEnvironment.java:235)
        at org.apache.flink.runtime.taskmanager.Task.run(Task.java:618)
        at java.lang.Thread.run(Thread.java:748)

04/29/2019 14:07:03     Job execution switched to status FAILING.
java.io.IOException: Insufficient number of network buffers: required 96, but only 25 available. The total number of network buffers is currently set to 2048 of 32768 bytes each. You can increase this number by setting the configuration keys 'taskmanager.network.memory.fraction', 'taskmanager.network.memory.min', and 'taskmanager.network.memory.max'.
        at org.apache.flink.runtime.io.network.buffer.NetworkBufferPool.createBufferPool(NetworkBufferPool.java:257)
        at org.apache.flink.runtime.io.network.NetworkEnvironment.registerTask(NetworkEnvironment.java:235)
        at org.apache.flink.runtime.taskmanager.Task.run(Task.java:618)
        at java.lang.Thread.run(Thread.java:748)
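This failure hits a different resource limit. Each TaskManager has a fixed pool of network buffers, here 2048 buffers of 32768 bytes (64 MB). Every task that registers reserves a minimum number of them. Here that minimum is 96, which matches the parallelism of 96, presumably one buffer per channel.

The toy model below mimics only the check that the error message reports. `ToyBufferPool` is a hypothetical class, not Flink's `NetworkBufferPool`.

```java
import java.io.IOException;

/**
 * Toy model (not Flink's NetworkBufferPool): a fixed global pool of network
 * buffers from which each registering task reserves a minimum number.
 */
public class ToyBufferPool {
    private final int totalBuffers;
    private int reserved = 0;

    public ToyBufferPool(int totalBuffers) {
        this.totalBuffers = totalBuffers;
    }

    /** Reserves `required` buffers, or fails the way the log message describes. */
    public void register(int required) throws IOException {
        int available = totalBuffers - reserved;
        if (required > available) {
            throw new IOException("Insufficient number of network buffers: required "
                    + required + ", but only " + available + " available.");
        }
        reserved += required;
    }

    public int available() {
        return totalBuffers - reserved;
    }

    public static void main(String[] args) {
        ToyBufferPool pool = new ToyBufferPool(2048); // 2048 buffers, as in the log
        int registered = 0;
        try {
            while (true) {
                pool.register(96); // each task reserves 96 buffers, as in the log
                registered++;
            }
        } catch (IOException e) {
            System.out.println(registered + " tasks registered; next one failed: " + e.getMessage());
        }
    }
}
```

With 2048 buffers, 21 reservations of 96 fit (21 × 96 = 2016), and the 22nd fails with 32 buffers left. The real log reports only 25 or 26 left, so other reservations evidently draw on the same pool as well; the model ignores them.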
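Solution: add the following parameters to Flink's flink-conf.yaml. They enlarge the memory reserved for network buffers, so that the increased number of slots can be supported: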

taskmanager.network.memory.fraction: 0.1      # fraction of JVM memory used for network buffers
taskmanager.network.memory.min: 268435456     # lower bound: 256 MB
taskmanager.network.memory.max: 4294967296    # upper bound: 4 GB
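Going by their documented meaning, the three keys combine roughly like this:

1. The network memory is a fraction of the TaskManager's JVM memory.
2. That value is clamped to the range [min, max].
3. The result is cut into buffers of the segment size, which is 32768 bytes in the log.

The original pool of 2048 × 32768 bytes is exactly 64 MB. That equals the default `taskmanager.network.memory.min` in the Flink 1.x releases that use these keys, which suggests the floor was in effect.

The sketch below makes two assumptions. It uses a plain `fraction × memory` base, although Flink's exact base differs by version. The 2 GB TaskManager size in `main` is hypothetical.

```java
/**
 * Sketch (assumed semantics, not Flink's code) of how the taskmanager.network.memory.*
 * keys combine: fraction * TaskManager memory, clamped into [min, max], then split
 * into fixed-size buffers.
 */
public class NetworkBufferSizing {
    static final long SEGMENT_SIZE = 32768L; // bytes per buffer, from the log message

    /** Network memory = fraction * tmMemoryBytes, clamped into [min, max]. */
    public static long networkMemoryBytes(double fraction, long min, long max, long tmMemoryBytes) {
        long fromFraction = (long) (fraction * tmMemoryBytes);
        return Math.min(max, Math.max(min, fromFraction));
    }

    /** Number of whole buffers that fit into the given network memory. */
    public static long numberOfBuffers(long networkMemoryBytes) {
        return networkMemoryBytes / SEGMENT_SIZE;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long tmMemory = 2048 * mb; // hypothetical 2 GB TaskManager

        // Values from the flink-conf.yaml snippet: fraction 0.1, min 256 MB, max 4 GB.
        long bytes = networkMemoryBytes(0.1, 256 * mb, 4096 * mb, tmMemory);
        System.out.println("network memory = " + bytes / mb + " MB, buffers = " + numberOfBuffers(bytes));
    }
}
```

With the new settings, a 2 GB TaskManager is still held at the 256 MB floor, because 0.1 × 2 GB is about 205 MB. That gives 268435456 / 32768 = 8192 buffers, four times the original 2048.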
