SparkR error when reading a CSV file: java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.spark.u

Start the SparkR shell with the following command:

bin/sparkR --packages com.databricks:spark-csv_2.10:1.0.3

Then read in the CSV file:

flights <- read.df(sqlContext, "/sparktest/nycflights13.csv", "com.databricks.spark.csv", header="true")

head(flights)

This fails with the following error:

16/04/07 23:06:46 ERROR CsvRelation$: Exception while parsing line: 2013,1,1,914,-6,1244,4,"AA","N517AA",1589,"EWR","DFW",238,1372,9,14. 
java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.spark.unsafe.types.UTF8String
        at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46)
        at org.apache.spark.sql.catalyst.expressions.GenericMutableRow.getUTF8String(rows.scala:248)
        at org.apache.spark.sql.catalyst.expressions.BoundReference.eval(BoundAttribute.scala:49)
        at org.apache.spark.sql.catalyst.expressions.UnaryExpression.eval(Expression.scala:295)
        at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(Projection.scala:84)
        at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(Projection.scala:60)
        at com.databricks.spark.csv.CsvRelation$$anonfun$com$databricks$spark$csv$CsvRelation$$parseCSV$1.apply(CsvRelation.scala:150)
        at com.databricks.spark.csv.CsvRelation$$anonfun$com$databricks$spark$csv$CsvRelation$$parseCSV$1.apply(CsvRelation.scala:130)
        at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
        at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:308)
        at scala.collection.Iterator$class.foreach(Iterator.scala:727)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
        at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
        at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
        at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
        at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
        at scala.collection.AbstractIterator.to(Iterator.scala:1157)
        at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
        at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157


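Cause:

The wrong version of the CSV package was loaded: com.databricks:spark-csv_2.10:1.0.3. As the stack trace shows, this release passes a java.lang.String to Spark where the running Spark version expects a UTF8String, so it is not compatible with the Spark version in use.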

Solution:

Change the SparkR shell launch command so that it loads spark-csv 1.3.0 instead:

bin/sparkR --packages com.databricks:spark-csv_2.10:1.3.0
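
Because the only difference between the failing and the working command is the version at the end of the package coordinate, it can help to build that coordinate from named parts. The following is a minimal sketch, not part of Spark or spark-csv: the helper names csv_coordinate and sparkr_command are hypothetical, and it assumes the Spark build uses Scala 2.10, as the _2.10 suffix in this post implies. It only prints the launch command so it can be checked before running.

```shell
#!/bin/sh
# Hypothetical helper (not part of Spark): build the Maven coordinate
# for spark-csv from the Scala binary version and the package release.
csv_coordinate() {
    scala_version="$1"   # Scala version of the Spark build, e.g. 2.10 (assumed)
    csv_version="$2"     # spark-csv release, e.g. 1.3.0
    printf 'com.databricks:spark-csv_%s:%s\n' "$scala_version" "$csv_version"
}

# Hypothetical helper: print the SparkR launch command instead of
# executing it, so the coordinate can be reviewed first.
sparkr_command() {
    printf 'bin/sparkR --packages %s\n' "$(csv_coordinate "$1" "$2")"
}

sparkr_command 2.10 1.3.0
```

After relaunching the shell with the printed command, the same read.df and head calls shown earlier can be run again unchanged.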
