Spark RDD Basic Transformations: the sortBy Operation

18. The sortBy operation

Sort the word-count results in descending order of occurrence count.
scala> val rddData1 = sc.parallelize(Array(("dog",3),("cat",1),("hadoop",2),("spark",3),("apple",2)))
rddData1: org.apache.spark.rdd.RDD[(String, Int)] = ParallelCollectionRDD[0] at parallelize at :24

scala> val rddData2 = rddData1.sortBy(_._2,false)
rddData2: org.apache.spark.rdd.RDD[(String, Int)] = MapPartitionsRDD[3] at sortBy at :26

scala> rddData2.collect
res0: Array[(String, Int)] = Array((dog,3), (spark,3), (hadoop,2), (apple,2), (cat,1))

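Note:
sortBy returns a new RDD whose elements are sorted by the key that the given function extracts (here _._2, the count). The order is ascending by default; passing false as the second argument (ascending) sorts in descending order.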
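
The same idea can be sketched as a small standalone program. It uses the sample data from the transcript and shows the default ascending sort, a descending sort written with the named ascending argument, and one possible way to make the order of equal counts (such as dog and spark) deterministic by sorting on a composite key. The object name SortByExample, the helper sortedByCountThenWord, and the local[*] master are assumptions made for illustration; they are not part of the original session.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical example object; the names here are illustrative only.
object SortByExample {
  // Same sample data as in the REPL transcript.
  val wordCounts: Seq[(String, Int)] =
    Seq(("dog", 3), ("cat", 1), ("hadoop", 2), ("spark", 3), ("apple", 2))

  // Sort by count descending, then by word ascending to break ties.
  // Negating the count is a simple trick that assumes counts are
  // ordinary non-negative Ints.
  def sortedByCountThenWord(sc: SparkContext): Array[(String, Int)] =
    sc.parallelize(wordCounts)
      .sortBy { case (word, count) => (-count, word) }
      .collect()

  def main(args: Array[String]): Unit = {
    // Assumption: run locally; adjust the master for a real cluster.
    val conf = new SparkConf().setAppName("SortByExample").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val rdd = sc.parallelize(wordCounts)

      // Default: ascending by count.
      val lowToHigh = rdd.sortBy(_._2).collect()

      // Descending by count; equivalent to sortBy(_._2, false).
      val highToLow = rdd.sortBy(_._2, ascending = false).collect()

      println(lowToHigh.mkString(", "))
      println(highToLow.mkString(", "))

      // prints: (dog,3), (spark,3), (apple,2), (hadoop,2), (cat,1)
      println(sortedByCountThenWord(sc).mkString(", "))
    } finally {
      sc.stop()
    }
  }
}
```

Sorting on the tuple (-count, word) relies on Scala's built-in Ordering for tuples, so no custom comparator is needed. Where the order of equal counts does not matter, the plain sortBy(_._2, false) call from the transcript is enough.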
