Spark SQL pitfall

1. NULL literals in SQL need an explicit type

spark.sql(
  """
    |select null as id
  """.stripMargin).createOrReplaceTempView("a")

spark.sql(
  """
    |select 'a' id
  """.stripMargin).createOrReplaceTempView("b")

spark.sql(
  """
    |select a.id, b.id
    |from a
    |full outer join b
    |on a.id = b.id
  """.stripMargin)

Running the join throws:

Exception in thread "main" org.apache.spark.sql.AnalysisException: Detected cartesian product for FULL OUTER join between logical plans
Project [null AS device#0]
+- OneRowRelation$
and
Project [a AS device#4]
+- OneRowRelation$
Join condition is missing or trivial.
Use the CROSS JOIN syntax to allow cartesian products between these relations.;

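Because the null in table a has no declared type, Spark gets the cartesian-product check wrong and treats the join condition as missing or trivial. Giving the null an explicit type fixes it: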

spark.sql(
  """
    |select cast(null as string) as id
  """.stripMargin).createOrReplaceTempView("a")

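To see what "no declared type" means in practice, you can inspect the schema Spark infers for each query. The following is a minimal sketch, not part of the original snippets: the `NullTypeCheck` object, the `idType` helper and the local `SparkSession` are assumptions introduced for illustration. It only checks the column types and does not rerun the join.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{DataType, NullType, StringType}

object NullTypeCheck {
  // Hypothetical helper: returns the type Spark assigns to the `id` column of a query.
  def idType(spark: SparkSession, query: String): DataType =
    spark.sql(query).schema("id").dataType

  def main(args: Array[String]): Unit = {
    // Assumption: a throwaway local session; the snippets above use an existing `spark`.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("null-type-check")
      .getOrCreate()

    // A bare NULL literal gets Spark's NullType rather than a string type.
    assert(idType(spark, "select null as id") == NullType)

    // After the cast, a.id has the same type as b.id.
    assert(idType(spark, "select cast(null as string) as id") == StringType)
    assert(idType(spark, "select 'a' id") == StringType)

    spark.stop()
  }
}
```

Calling `printSchema()` on either DataFrame shows the same information in readable form, which is handy when a view is built from a longer query.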