Scala: Converting a CSV File to an RDD

How do I convert a CSV file to an RDD?

Suppose the CSV file has the following format:

user, topic, hits
om,  scala, 120
daniel, spark, 80
3754978, spark, 1

We can use the first line to define a header class that maps each column name to its position:

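class SimpleCSVHeader(header: Array[String]) extends Serializable {
  // Map each column name to its position in the header row
  val index: Map[String, Int] = header.zipWithIndex.toMap
  // Look up a field of a row by column name
  def apply(array: Array[String], key: String): String = array(index(key))
}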
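
To see what the header class does in isolation, here is a minimal sketch that exercises it on plain local arrays, with no Spark involved. The `HeaderDemo` object and its `parse` helper are hypothetical names introduced only for this illustration; the class itself is repeated so that the snippet compiles on its own.

```scala
// Same header class as in the post, repeated so the snippet is self-contained.
class SimpleCSVHeader(header: Array[String]) extends Serializable {
  val index: Map[String, Int] = header.zipWithIndex.toMap
  def apply(array: Array[String], key: String): String = array(index(key))
}

object HeaderDemo {
  // Hypothetical helper: split a line on commas and trim each field.
  def parse(line: String): Array[String] = line.split(",").map(_.trim)

  // Build the header from the first line of the sample file.
  val header = new SimpleCSVHeader(parse("user, topic, hits"))
  // Parse one data line from the sample file.
  val row: Array[String] = parse("daniel, spark, 80")

  // Fields are looked up by column name rather than by position.
  val user: String = header(row, "user")
  val hits: Int = header(row, "hits").toInt

  def main(args: Array[String]): Unit =
    println(s"$user -> $hits") // prints "daniel -> 80"
}
```

Looking fields up by name keeps the row-processing code readable and means it does not have to change if the column order in the file changes.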

We can then use this header class to extract the data:

val csv = sc.textFile("file.csv") // original file
val data = csv.map(line => line.split(",").map(elem => elem.trim)) // split each line into trimmed fields
// Take the first line to build the header
val header = new SimpleCSVHeader(data.take(1)(0))
val rows = data.filter(line => header(line, "user") != "user") // drop the header row
val users = rows.map(row => header(row, "user"))
val usersByHits = rows.map(row => header(row, "user") -> header(row, "hits").toInt)
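
The filter above removes the header by comparing values, so it would also drop a genuine data row whose user happens to be the string "user". One possible alternative, sketched here under stated assumptions, is to drop the header by position with `mapPartitionsWithIndex`, since the first line of the input ends up in partition 0. This is a swapped-in technique, not the method used in the post. The `DropHeaderByPosition` object and its `run` method are hypothetical names, and `sc.parallelize` stands in for `sc.textFile("file.csv")` so that the sketch needs no input file; it still needs Spark on the classpath.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object DropHeaderByPosition {
  // Builds (user, hits) pairs from raw CSV lines, skipping the header by position.
  def run(sc: SparkContext, lines: Seq[String]): Array[(String, Int)] = {
    // Assumption: parallelize replaces sc.textFile("file.csv") for this demo.
    val csv = sc.parallelize(lines, 2)
    val data = csv.map(_.split(",").map(_.trim))

    // Read the column positions once from the first row.
    val names = data.first()
    val userIdx = names.indexOf("user")
    val hitsIdx = names.indexOf("hits")

    // The header is the first element of partition 0, so drop it there only.
    val rows = data.mapPartitionsWithIndex { (partition, iter) =>
      if (partition == 0) iter.drop(1) else iter
    }
    rows.map(r => r(userIdx) -> r(hitsIdx).toInt).collect()
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("csv-to-rdd").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val lines = Seq(
        "user, topic, hits",
        "om,  scala, 120",
        "daniel, spark, 80",
        "3754978, spark, 1")
      run(sc, lines).foreach(println)
    } finally sc.stop()
  }
}
```

Both approaches split on bare commas, so neither handles quoted fields that contain commas.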
