ERROR: Timeout on the Spark engine during the broadcast join

When running a Spark query, the following error is reported:

When the Spark engine runs applications with broadcast join enabled, the Spark driver broadcasts the cached table to the Spark executors running on data nodes in the Hadoop cluster. With broadcast join enabled, applications might fail with an error similar to the following:

java.util.concurrent.TimeoutException: Futures timed out after [300 seconds]
    at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
    at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
    at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:201)
    at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.doExecuteBroadcast(BroadcastExchangeExec.scala:123)
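For context, Spark plans a broadcast hash join automatically when one side of the join is estimated to be smaller than spark.sql.autoBroadcastJoinThreshold (10 MB by default), or when a broadcast hint is given. Below is a minimal Scala sketch of the operation that can hit this timeout; the table names large_fact and small_dim are hypothetical:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

val spark = SparkSession.builder()
  .appName("broadcast-join-example")
  .getOrCreate()

// Hypothetical tables: a large fact table joined to a small dimension table.
val fact = spark.table("large_fact")
val dim  = spark.table("small_dim")

// The driver must collect `dim` and broadcast it to every executor within
// spark.sql.broadcastTimeout (300 s by default); if that step stalls,
// BroadcastExchangeExec.doExecuteBroadcast throws the TimeoutException above.
val joined = fact.join(broadcast(dim), Seq("id"))
joined.show()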

Solutions

1) Disable broadcast join

set spark.sql.autoBroadcastJoinThreshold=-1
2) Increase the broadcast timeout (the default is 300 s)

set spark.sql.broadcastTimeout=2000
3) Increase the number of application attempts so YARN retries the job on failure (all three settings are shown together in the sketch after this list)

set spark.yarn.maxAppAttempts=2
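
All three settings can also be applied when the session is created. This matters for spark.yarn.maxAppAttempts in particular: it is read when the application is submitted to YARN, so it is usually passed at spark-submit time or in the session builder rather than changed from a running session. A minimal Scala sketch, assuming you control the SparkSession (the values mirror the set commands above):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("broadcast-timeout-workarounds")
  // Option 1: disable automatic broadcast joins (-1 turns the threshold off).
  .config("spark.sql.autoBroadcastJoinThreshold", "-1")
  // Option 2: keep broadcast joins but allow more time (in seconds; default 300).
  .config("spark.sql.broadcastTimeout", "2000")
  // Option 3: let YARN retry the application once after a failure.
  .config("spark.yarn.maxAppAttempts", "2")
  .getOrCreate()

// Options 1 and 2 can also be changed at runtime on an existing session:
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")
spark.conf.set("spark.sql.broadcastTimeout", "2000")

Note that options 1 and 2 pull in opposite directions: disable the broadcast when the "small" table is not actually small, or raise the timeout when the broadcast is legitimate but slow; you would normally pick one.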
