Flink1.8开发过程遇到的问题

最近在开发flink1.8过程中,提交任务到yarn遇到了一些问题,特将问题及解决办法整理出来。

一、Caused by: java.lang.ClassCastException: org.apache.hadoop.yarn.proto.YarnServiceProtos$RegisterApplicationMasterRequestProto cannot be cast to com.google.protobuf.Message

2019-07-24 02:12:21.611 [flink-akka.actor.default-dispatcher-5] ERROR org.apache.flink.runtime.entrypoint.ClusterEntrypoint  - Fatal error occurred in the cluster entrypoint.
org.apache.flink.runtime.resourcemanager.exceptions.ResourceManagerException: Could not start the ResourceManager akka.tcp://flink@local-vm-229:47467/user/resourcemanager
        at org.apache.flink.runtime.resourcemanager.ResourceManager.onStart(ResourceManager.java:209)
        at org.apache.flink.runtime.rpc.akka.AkkaRpcActor$StoppedState.start(AkkaRpcActor.java:539)
        at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleControlMessage(AkkaRpcActor.java:164)
        at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.onReceive(AkkaRpcActor.java:142)
        at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.onReceive(FencedAkkaRpcActor.java:40)
        at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:165)
        at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
        at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
        at akka.actor.ActorCell.invoke(ActorCell.scala:495)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
        at akka.dispatch.Mailbox.run(Mailbox.scala:224)
        at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: org.apache.flink.runtime.resourcemanager.exceptions.ResourceManagerException: Could not start resource manager client.
        at org.apache.flink.yarn.YarnResourceManager.initialize(YarnResourceManager.java:250)
        at org.apache.flink.runtime.resourcemanager.ResourceManager.startResourceManagerServices(ResourceManager.java:219)
        at org.apache.flink.runtime.resourcemanager.ResourceManager.onStart(ResourceManager.java:207)
        ... 16 common frames omitted
Caused by: java.lang.ClassCastException: org.apache.hadoop.yarn.proto.YarnServiceProtos$RegisterApplicationMasterRequestProto cannot be cast to com.google.protobuf.Message
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:224)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
        at com.sun.proxy.$Proxy16.registerApplicationMaster(Unknown Source)
        at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.registerApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:107)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
        at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
        at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
        at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
        at com.sun.proxy.$Proxy17.registerApplicationMaster(Unknown Source)
        at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.registerApplicationMaster(AMRMClientImpl.java:223)
        at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.registerApplicationMaster(AMRMClientImpl.java:214)
        at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl.registerApplicationMaster(AMRMClientAsyncImpl.java:139)
        at org.apache.flink.yarn.YarnResourceManager.createAndStartResourceManagerClient(YarnResourceManager.java:216)
        at org.apache.flink.yarn.YarnResourceManager.initialize(YarnResourceManager.java:245)
        ... 18 common frames omitted

原因分析:

这种问题一般是由于自己工程的hadoop的jar包和flink集群的jar包冲突导致的。

解决办法:

剔除自己工程中的hadoop相关的jar,打包的时候不要打进来。



    org.apache.hadoop
    hadoop-common
    ${hadoop.version}
    provided


    org.apache.hadoop
    hadoop-hdfs
    ${hadoop.version}
    provided
    
        
            xml-apis
            xml-apis
        
    

二、Could not build the program from JAR file.

[root@master bin]# ./flink run -m yarn-cluster -yn 1 -p 2 -yjm 1024 -ytm 1024 -ynm FlinkOnYarnSession-MemberLogInfoProducer -d -c com.igg.flink.tool.member.rabbitmq.producer.MqMemberProducer /home/test_gjm/igg-flink-tool/igg-flink-tool-1.0.0-SNAPSHOT.jar
Could not build the program from JAR file.

Use the help option (-h or --help) to get help on the command.

原因分析:

flink1.8没有包括hadoop相关的包。

解决办法:

官网下载flink-shaded-hadoop-2-uber-2.8.3-7.0.jar,放到flink_home/lib目录下

三、Deployment took more than 60 seconds. Please check if the requested resources are available in the YARN cluster

2019-07-25 15:51:59,717 INFO  org.apache.flink.yarn.AbstractYarnClusterDescriptor           - Deployment took more than 60 seconds. Please check if the requested resources are available in the YARN cluster
2019-07-25 15:51:59,969 INFO  org.apache.flink.yarn.AbstractYarnClusterDescriptor           - Deployment took more than 60 seconds. Please check if the requested resources are available in the YARN cluster
2019-07-25 15:52:00,220 INFO  org.apache.flink.yarn.AbstractYarnClusterDescriptor           - Deployment took more than 60 seconds. Please check if the requested resources are available in the YARN cluster
2019-07-25 15:52:00,472 INFO  org.apache.flink.yarn.AbstractYarnClusterDescriptor           - Deployment took more than 60 seconds. Please check if the requested resources are available in the YARN cluster

原因分析:

yarn没有足够的资源进行分配,如内存,vcores等。

通过yarn ui界面,可以看到可用的vcores=3,有4个节点处于不健康的状态。点击"4",查看不健康的节点。

通过查看可知,磁盘空间不足。

解决办法:

清理对应的磁盘空间即可。

 

如果有写的不对的地方,欢迎大家指正。有什么疑问,欢迎加QQ群:176098255

 

 

你可能感兴趣的:(Flink)