Grouped ranking with Spark RDD using groupByKey + flatMap + zipWithIndex

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

val conf = new SparkConf().setAppName("name").setMaster("local[2]")
val context = new SparkContext(conf)
val ssh = List(("ma", 3), ("ma", 4), ("ma", 5), ("mb", 2), ("mb", 5))
val unit: RDD[(String, Int)] = context.parallelize(ssh)
// Group by key, sort each group's values ascending, then attach a 1-based rank
val result: RDD[(String, Int, Int)] = unit.groupByKey().flatMap(gg => {
  gg._2.toList.sorted.zipWithIndex.map { case (marks, index) => (gg._1, marks, index + 1) }
})
result.foreach(print) // with local[2], output from the two partitions interleaves

Sample output (ordering varies between runs):
(ma,3,1)
(mb,2,1)
(ma,4,2)
(mb,5,2)
(ma,5,3)
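
If the rank should instead run from the highest value to the lowest, only the per-group sort changes. A minimal sketch reusing the unit RDD from above (descResult is a name introduced here for illustration):

// Descending rank: sort each group's values largest-first before indexing
val descResult: RDD[(String, Int, Int)] = unit.groupByKey().flatMap(gg => {
  gg._2.toList.sorted(Ordering[Int].reverse).zipWithIndex.map { case (marks, index) => (gg._1, marks, index + 1) }
})
descResult.foreach(print) // e.g. (ma,5,1)(ma,4,2)(ma,3,3)(mb,5,1)(mb,2,2) in some interleaved order

Note that groupByKey materializes all of a key's values in memory on one executor, so this pattern is best suited to groups of modest size.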
