Common Spark exception: java.util.concurrent.TimeoutException: Futures timed out

The following error is reported when running a Spark on YARN job:

Caused by : java.util.concurrent.TimeoutException: Futures timed out after 60s


This happens because Spark attempts a Broadcast Hash Join and one of the DataFrames is very large, so broadcasting it takes longer than the timeout allows.
You can:
1. Set a higher spark.sql.broadcastTimeout to increase the timeout, for example spark.conf.set("spark.sql.broadcastTimeout", 36000)
2. Call persist() on both DataFrames, after which Spark will use a Shuffle Join


  1. Increase the value of spark.sql.broadcastTimeout;
  2. Persist both DataFrames.
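
As a rough illustration of the first option, the sketch below raises the broadcast timeout on an already running session. It assumes `spark` is a `pyspark.sql.SparkSession`; the helper name `set_broadcast_timeout` and the 36000 default are hypothetical, chosen only to mirror the example value above.

```python
def set_broadcast_timeout(spark, timeout_seconds=36000):
    """Raise the broadcast-join timeout on an existing session.

    `spark` is assumed to be a pyspark.sql.SparkSession (any object that
    exposes `conf.set(key, value)` and `conf.get(key)` will work).
    The helper name and the default value are illustrative only.
    """
    if timeout_seconds <= 0:
        raise ValueError("timeout_seconds must be positive")
    # spark.sql.broadcastTimeout is expressed in seconds.
    spark.conf.set("spark.sql.broadcastTimeout", str(timeout_seconds))
    # Read the value back so the caller can confirm what is now in effect.
    return spark.conf.get("spark.sql.broadcastTimeout")
```

With a real session this would be called as `set_broadcast_timeout(spark, 36000)` before the join is executed. A longer timeout only gives the broadcast more time to finish; it does not make the broadcast itself any cheaper.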


In addition to increasing spark.sql.broadcastTimeout or calling persist() on both DataFrames, you may try:
1. Disable broadcast joins by setting spark.sql.autoBroadcastJoinThreshold to -1.
2. Increase the Spark driver memory by setting spark.driver.memory to a higher value.
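
These two settings could be passed at submit time, as in the minimal sketch below, which only assembles a `spark-submit` command line. The helper name `build_spark_submit_args`, the script name `my_job.py`, and the `4g` value are placeholders, not recommendations. Driver memory is set through `--driver-memory` here because in client mode the driver JVM has already started by the time application code runs, so setting spark.driver.memory from inside the job has no effect.

```python
def build_spark_submit_args(app_path, driver_memory="4g", disable_broadcast=True):
    """Assemble a spark-submit argument list for a YARN job.

    All names and default values here are illustrative placeholders.
    """
    args = ["spark-submit", "--master", "yarn",
            # Driver memory has to be fixed before the driver JVM starts.
            "--driver-memory", driver_memory]
    if disable_broadcast:
        # -1 disables automatic broadcast joins entirely.
        args += ["--conf", "spark.sql.autoBroadcastJoinThreshold=-1"]
    args.append(app_path)
    return args
```

Joining the returned list with spaces, for example `" ".join(build_spark_submit_args("my_job.py"))`, gives a command that can be pasted into a shell. The list can also be handed to `subprocess.run` as-is.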
