Common Spark exception: java.util.concurrent.TimeoutException: Futures timed out

When running a Spark on YARN job, the following error is raised:

Caused by: java.util.concurrent.TimeoutException: Futures timed out after 60s

See this Stack Overflow question: https://stackoverflow.com/questions/41123846/why-does-join-fail-with-java-util-concurrent-timeoutexception-futures-timed-ou

This happens because Spark tries to do Broadcast Hash Join and one of the DataFrames is very large, so sending it consumes much time.

You can:

  1. Set higher spark.sql.broadcastTimeout to increase timeout - spark.conf.set("spark.sql.broadcastTimeout", newValueForExample36000)
  2. persist() both DataFrames, then Spark will use Shuffle Join

So you can:

  1. Increase the value of spark.sql.broadcastTimeout;
  2. persist() both DataFrames, so Spark uses a shuffle join instead;
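
A minimal Scala sketch of both remedies (the input paths, DataFrame names, and the join key "id" are hypothetical placeholders, not from the original error):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("BroadcastTimeoutFix").getOrCreate()

// Remedy 1: raise the broadcast timeout (in seconds; the default is 300).
spark.conf.set("spark.sql.broadcastTimeout", "36000")

// Remedy 2: persist() both DataFrames before joining; per the Stack Overflow
// answer above, Spark then plans a shuffle join instead of broadcasting.
val largeDf = spark.read.parquet("/path/to/large").persist() // hypothetical path
val smallDf = spark.read.parquet("/path/to/small").persist() // hypothetical path
val joined = largeDf.join(smallDf, Seq("id"))
joined.show()
```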

In addition, you can also consider disabling broadcast joins entirely, as well as increasing the value of spark.driver.memory.

In addition to increasing spark.sql.broadcastTimeout or persisting both DataFrames, you may try to:

  1. disable broadcast joins by setting spark.sql.autoBroadcastJoinThreshold to -1;
  2. increase the Spark driver memory by setting spark.driver.memory to a higher value.
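
A minimal sketch of both settings (the 8g value is illustrative; note that spark.driver.memory must be set before the driver JVM starts, so it belongs on the spark-submit command line rather than in spark.conf.set):

```scala
import org.apache.spark.sql.SparkSession

// With the threshold at -1, Spark never broadcasts a table automatically
// and plans shuffle-based joins (e.g. sort-merge join) instead.
val spark = SparkSession.builder()
  .appName("DisableBroadcastJoin")
  .config("spark.sql.autoBroadcastJoinThreshold", "-1")
  .getOrCreate()

// Driver memory cannot be changed at runtime; pass it at submission time:
//   spark-submit --driver-memory 8g \
//     --conf spark.sql.autoBroadcastJoinThreshold=-1 \
//     your-app.jar
```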
