Spark SQL broadcast settings

Spark SQL broadcast join configuration (the threshold is given in bytes; 31457280 is 30 MB):
--conf spark.sql.autoBroadcastJoinThreshold=31457280 \
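
As a small illustration of where the value 31457280 comes from, the sketch below converts a size in megabytes to the byte count expected by `spark.sql.autoBroadcastJoinThreshold`. The object name `BroadcastThreshold` and the helper `mbToBytes` are hypothetical names introduced only for this example; they are not part of Spark.

```scala
object BroadcastThreshold {
  // Hypothetical helper: convert megabytes (1 MB = 1024 * 1024 bytes) to bytes.
  def mbToBytes(mb: Long): Long = mb * 1024L * 1024L

  // Build the spark-submit flag for a given threshold in megabytes.
  def confFlag(mb: Long): String =
    s"--conf spark.sql.autoBroadcastJoinThreshold=${mbToBytes(mb)}"

  def main(args: Array[String]): Unit = {
    // Print the flag for a 30 MB threshold.
    println(confFlag(30))
  }
}
```

Inside a running spark-shell session the same setting can also be changed with `spark.conf.set("spark.sql.autoBroadcastJoinThreshold", ...)`; setting it to -1 disables automatic broadcasting.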

Some good introductions to broadcast joins:
https://blog.csdn.net/lsshlsw/article/details/48662669
https://jaceklaskowski.gitbooks.io/mastering-spark-sql/spark-sql-joins-broadcast.html

// Force BroadcastHashJoin using SQL's BROADCAST hint
// Supported hints: BROADCAST, BROADCASTJOIN or MAPJOIN
val qBroadcastLeft = """
SELECT /*+ BROADCAST (lf) */ *
FROM range(100) lf, range(1000) rt
WHERE lf.id = rt.id
"""
scala> sql(qBroadcastLeft).explain
== Physical Plan ==
*BroadcastHashJoin [id#34L], [id#35L], Inner, BuildRight
:- *Range (0, 100, step=1, splits=8)
+- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, false]))
   +- *Range (0, 1000, step=1, splits=8)

val qBroadcastRight = """
SELECT /*+ MAPJOIN (rt) */ *
FROM range(100) lf, range(1000) rt
WHERE lf.id = rt.id
"""
scala> sql(qBroadcastRight).explain
== Physical Plan ==
*BroadcastHashJoin [id#42L], [id#43L], Inner, BuildRight
:- *Range (0, 100, step=1, splits=8)
+- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, false]))
   +- *Range (0, 1000, step=1, splits=8)
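
The SQL hints above also have a counterpart in the Dataset API: the `broadcast` function in `org.apache.spark.sql.functions` marks one side of a join as the side to broadcast. The following is a minimal sketch, assuming a local SparkSession; the object name `BroadcastHintSketch`, the method `joinedPlan`, and the `local[*]` master are illustrative assumptions rather than anything taken from the posts linked earlier.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

object BroadcastHintSketch {
  // Returns the physical plan (as text) of a join whose right side carries a broadcast hint.
  def joinedPlan(spark: SparkSession): String = {
    // Disable automatic broadcasting so that only the explicit hint is in effect.
    spark.conf.set("spark.sql.autoBroadcastJoinThreshold", -1L)

    val lf = spark.range(100)
    val rt = spark.range(1000)

    // broadcast(rt) marks rt as the side to broadcast in the equi-join on "id".
    val joined = lf.join(broadcast(rt), "id")
    joined.queryExecution.executedPlan.toString
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, used here only for illustration.
    val spark = SparkSession.builder()
      .appName("broadcast-hint-sketch")
      .master("local[*]")
      .getOrCreate()

    println(joinedPlan(spark))
    spark.stop()
  }
}
```

Because the threshold is set to -1 in this sketch, any broadcast join that appears in the printed plan is due to the explicit hint and not to the size-based automatic choice.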
