(Repost) Scala error: CSV data source does not support struct type:tinyint,size:int,indices:array int

Running a Spark job fails with: Exception in thread "main" java.lang.UnsupportedOperationException: CSV data source does not support struct data type.

Several blog posts point to the same cause: a DenseVector column cannot be saved directly to a CSV file.

  • There are two ways to fix it (both convert the vector column to a String):
  1. Use a UDF
import org.apache.spark.sql.functions.udf

// Join the array/vector elements into a single comma-separated String
val stringify = udf((vs: Seq[String]) => vs.mkString(","))

    df.withColumn("columnA", stringify($"columnA"))
      .withColumn("columnB", stringify($"columnB"))
      .write.csv("xxxxx")
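To see what the UDF produces, here is a minimal plain-Scala sketch of the same stringify logic (no Spark required; the element type Double and the sample values are assumptions for illustration):

```scala
object StringifyDemo {
  // Mirrors the stringify UDF above: join the elements of a sequence
  // into one comma-separated String that CSV can store directly.
  def stringify(vs: Seq[Double]): String = vs.mkString(",")

  def main(args: Array[String]): Unit = {
    // e.g. the values of a DenseVector, here as a plain Seq
    println(stringify(Seq(1.0, 0.5, 0.25))) // prints 1.0,0.5,0.25
  }
}
```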
  2. Convert directly
// Note: toDF() requires the SparkSession implicits: import spark.implicits._
case class Asso(antecedent: String, consequent: String, confidence: String)

df.rdd.map { line => Asso(line(0).toString, line(1).toString, line(2).toString) }
      .toDF().write.csv("xxxx")
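The field-by-field conversion can be exercised without a SparkSession. In this sketch a plain Seq[Any] stands in for a Spark Row, and the case-class name AssoRow and the sample values are made up for illustration:

```scala
case class AssoRow(antecedent: String, consequent: String, confidence: String)

object RowMapDemo {
  def main(args: Array[String]): Unit = {
    // Stand-in for one Spark Row: index into it and call toString per field,
    // exactly as the rdd.map above does with line(0), line(1), line(2).
    val line: Seq[Any] = Seq(Seq("bread"), Seq("butter"), 0.85)
    val asso = AssoRow(line(0).toString, line(1).toString, line(2).toString)
    println(asso) // prints AssoRow(List(bread),List(butter),0.85)
  }
}
```

Whatever the original column types were (arrays, doubles, vectors), every field ends up as a String, which the CSV writer accepts.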

References:
https://jimolonely.github.io/2018/01/03/spark/02-write-csv/
https://cloud.tencent.com/developer/article/1531999
