Futures timed out after [10 seconds]. This timeout is controlled by spark.executor.heartbeatInterva

问题描述:提交spark-submit时,在ssh终端提交没有问题,但是在代码中ssh登陆后,再提交命令就出现以下问题了,开始怀疑是用户问题,但是如果是用户问题,那么我在ssh终端同一个用户执行又执行正确,故排除了此情况。第二感觉是环境变量,我在.sh添加环境后,还是报错。

sh文件

#!/bin/bash
#调用java 程序需要添加上,否则直接跳出shell
#export JAVA_HOME=/usr/java/jdk1.8.0_201
export JAVA_HOME=/usr/java/jdk1.8
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin

arg1=$1
arg2=$2
arg3=$3
arg4=$4

spark-submit \
 --master yarn \
 --deploy-mode client \
 --class Main  \
 --executor-cores 6 \
 --executor-memory 10G \
 --num-executors 8 \
 --conf spark.driver.cores=6 \
 --conf spark.driver.maxResultSize=0 \
 --conf spark.driver.memony=5g \
 --jars hdfs://bdpcluster/business/oozie/lib/*.jar,hdfs://bdpcluster/business/oozie/hub/lib/* \
   hdfs://bdpcluster/business/oozie/dhe-dipc.jar \
  "$arg1" "$arg2" "$arg3" "$arg4"

错误信息

org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [10 seconds]. This timeout is controlled by spark.executor.heartbeatInterval
	at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:47)
	at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:62)
	at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:58)
	at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
	at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:76)
	at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:92)
	at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$reportHeartBeat(Executor.scala:785)
	at org.apache.spark.executor.Executor$$anon$2$$anonfun$run$1.apply$mcV$sp(Executor.scala:814)
	at org.apache.spark.executor.Executor$$anon$2$$anonfun$run$1.apply(Executor.scala:814)
	at org.apache.spark.executor.Executor$$anon$2$$anonfun$run$1.apply(Executor.scala:814)
	at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1992)
	at org.apache.spark.executor.Executor$$anon$2.run(Executor.scala:814)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [10 seconds]
	at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
	at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
	at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:201)
	at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
	... 14 more

19/05/31 11:17:01 ERROR ApplicationMaster: Uncaught exception:
org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [120 seconds]. This timeout is controlled by spark.rpc.askTimeout
at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:47)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:62)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:58)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:76)
at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:92)
at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:76)
at org.apache.spark.deploy.yarn.YarnAllocator.(YarnAllocator.scala:103)
at org.apache.spark.deploy.yarn.YarnRMClient.register(YarnRMClient.scala:78)
at org.apache.spark.deploy.yarn.ApplicationMaster.registerAM(ApplicationMaster.scala:462)
at org.apache.spark.deploy.yarn.ApplicationMaster.runExecutorLauncher(ApplicationMaster.scala:534)
at org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:347)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply$mcV$sp(ApplicationMaster.scala:260)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply(ApplicationMaster.scala:260)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply(ApplicationMaster.scala:260)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$5.run(ApplicationMaster.scala:815)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:814)
at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:259)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:839)
at org.apache.spark.deploy.yarn.ExecutorLauncher$.main(ApplicationMaster.scala:869)
at org.apache.spark.deploy.yarn.ExecutorLauncher.main(ApplicationMaster.scala)
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [120 seconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:201)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
... 19 more

ssh终端提交:

/home/xx.sh “2019-05-01 00:00:00” “2019-05-31 00:00:00” “week”
“28,29,30,33,43,74,77”

执行正常,spark没有报错的的task

java代码提交关键代码:
SshUtil rec = new SshUtil(DATAX_HOST_LIST.get(0), DATAX_USER, DATAX_PASSWORD, DATAX_PORT);

        String 
        script = "source /etc/profile;/home/x.sh \"2019-05-01 00:00:00\" \"2019-06-01 00:00:00\" \"month\" \"28,29,30,33,43,74,77\"";

解决方案:
java代码提交命令更改为nohub,即

 	script = "source /etc/profile;nohup /home/x.sh \"2019-05-01 00:00:00\" \"2019-06-01 00:00:00\" \"month\" \"28,29,30,33,43,74,77\" &";

 	打印标准和输出日志:
 		script = "source /etc/profile;nohup /home/x.sh \"2019-05-01 00:00:00\" \"2019-06-01 00:00:00\" \"month\" \"28,29,30,33,43,74,77\" >>/home/dev/debug.log 2>&1 &";

你可能感兴趣的:(经验分享,hadoop)