Five ways to sort an RDD with sortBy in Spark

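On an RDD, sortBy lets you choose what to sort on: the key, the value, or any field derived from the element. The sort itself can be implemented in the five ways shown below.
Suppose the data has the following format: each element of the list contains a name, a unit price, and a quantity, joined by spaces. The goal is to sort by unit price and quantity in descending order.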

val products = sc.parallelize(List("A 100 10","B 200 20","C 200 30","D 400 30"))

1. Using a tuple: map each record to a tuple and sort on the tuple's elements. The code is as follows:

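products.map(x => {
      // Split each record into name, unit price, and quantity
      val splits  = x.split(" ")
      val name = splits(0)
      val price = splits(1).toDouble
      val amount = splits(2).toInt
      (name,price,amount)
    }).sortBy(x =>(-x._2,-x._3)) // negate both fields to sort in descending order
      .collect()                 // bring the results to the driver so they print in sorted order
      .foreach(println)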
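The effect of the negated tuple key can be checked without a cluster, because both RDD.sortBy and the sortBy on Scala collections take an implicit Ordering for the key type. The following is a minimal local sketch using a plain List in place of the RDD; the object name TupleSortSketch is only for illustration.

```scala
object TupleSortSketch {
  // Same sample rows as above: name, unit price, quantity.
  val rows = List("A 100 10", "B 200 20", "C 200 30", "D 400 30")

  // Parse each row into a (name, price, amount) tuple.
  val parsed: List[(String, Double, Int)] = rows.map { line =>
    val splits = line.split(" ")
    (splits(0), splits(1).toDouble, splits(2).toInt)
  }

  // The default tuple ordering is ascending, field by field.
  // Negating both fields yields price descending, then amount descending.
  val sorted: List[(String, Double, Int)] =
    parsed.sortBy(x => (-x._2, -x._3))

  def main(args: Array[String]): Unit = sorted.foreach(println)
}
```

Negation only works for numeric fields; for other key types a reversed Ordering would be needed instead.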

2. Define a custom class that extends Ordered and mixes in Serializable, and implement the compare method

class Products(val name :String, val price:Double, val amount:Int) extends Ordered[Products] with Serializable {
  // Compare on amount, ascending
  override def compare(that: Products): Int = {
    this.amount - that.amount
  }

  override def toString(): String = name + "\t" + price + "\t" + amount
}

When sorting, the code is as follows:

 products.map(x => {
      val splits  = x.split(" ")
      val name = splits(0)
      val price = splits(1).toDouble
      val amount = splits(2).toInt

      new Products(name,price ,amount)
    }).sortBy(x => x).collect().foreach(println)
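The class above compares on the amount field only. If the goal stated at the top (unit price descending, then quantity descending) is wanted, compare could be written along the following lines. This is a sketch, not the original author's code: the class name PricedProduct is hypothetical, and it is exercised on a local List rather than an RDD.

```scala
object OrderedSketch {
  // Hypothetical class; its fields mirror the Products class above.
  class PricedProduct(val name: String, val price: Double, val amount: Int)
      extends Ordered[PricedProduct] with Serializable {

    // Higher price sorts first; ties are broken by higher amount first.
    override def compare(that: PricedProduct): Int = {
      val byPrice = java.lang.Double.compare(that.price, this.price)
      if (byPrice != 0) byPrice
      else Integer.compare(that.amount, this.amount)
    }

    override def toString: String = name + "\t" + price + "\t" + amount
  }

  val items = List(
    new PricedProduct("A", 100.0, 10),
    new PricedProduct("B", 200.0, 20),
    new PricedProduct("C", 200.0, 30),
    new PricedProduct("D", 400.0, 30)
  )

  // sorted picks up the Ordering derived from Ordered[PricedProduct].
  val sortedNames: List[String] = items.sorted.map(_.name)
}
```

Using Double.compare and Integer.compare rather than subtraction avoids overflow and rounding surprises when the difference is turned into an Int.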

3. Using a case class (recommended)
A case class is serializable by default and gets toString, equals, and hashCode generated automatically. The class is as follows:

case class Products2(name: String, price : Double, amount: Int) extends Ordered[Products2]{
  override def compare(that: Products2) = {
    this.amount - that.amount
  }
}

When sorting, the code is as follows:

 products.map(x => {
      val splits  = x.split(" ")
      val name = splits(0)
      val price = splits(1).toDouble
      val amount = splits(2).toInt

      Products2(name,price ,amount)
    }).sortBy(x => x).collect().foreach(println)
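What the case class provides for free can be seen in a small local sketch. The name Item is hypothetical and stands in for Products2; the sort runs on a plain List, under the assumption that the same Ordered-derived ordering is what sortBy(x => x) would pick up on an RDD.

```scala
object CaseClassSketch {
  // Hypothetical stand-in for Products2: ordered by amount, ascending.
  case class Item(name: String, price: Double, amount: Int) extends Ordered[Item] {
    override def compare(that: Item): Int =
      Integer.compare(this.amount, that.amount)
  }

  val items = List(Item("D", 400.0, 30), Item("A", 100.0, 10), Item("B", 200.0, 20))

  // Sorting uses the Ordering derived from Ordered[Item].
  val sorted: List[Item] = items.sorted

  // Structural equality and a readable toString are generated by the compiler.
  val sameValue: Boolean = Item("A", 100.0, 10) == Item("A", 100.0, 10)
  val rendered: String = Item("A", 100.0, 10).toString
}
```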

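4. Using an implicit conversion to give a plain class ordering behavior. This uses the Products3 class defined under method 5 below, and the implicit conversion must be in scope before sortBy is called. The code is as follows:

    // Implicit view from Products3 to Ordered[Products3], comparing on amount (ascending)
    implicit  def products3ToOrdered(products3:Products3) :Ordered[Products3] = new Ordered[Products3] {
      override def compare(that: Products3): Int = {
        products3.amount - that.amount
      }
    }

    products.map(x => {
      val splits  = x.split(" ")
      val name = splits(0)
      val price = splits(1).toDouble
      val amount = splits(2).toInt

      new Products3(name,price ,amount)
    }).sortBy(x => x).collect().foreach(println)
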
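As an alternative to the implicit conversion, an implicit Ordering can be supplied directly, since an Ordering for the key type is what sortBy ultimately looks for. This is a different technique from the one above, shown as a local sketch on a List; PlainProduct and byAmount are hypothetical names.

```scala
object ImplicitOrderingSketch {
  // Hypothetical plain class with no ordering of its own.
  class PlainProduct(val name: String, val price: Double, val amount: Int)
      extends Serializable

  // Supply the ordering as an implicit value: compare on amount, ascending.
  implicit val byAmount: Ordering[PlainProduct] =
    Ordering.by((p: PlainProduct) => p.amount)

  val items = List(
    new PlainProduct("C", 200.0, 30),
    new PlainProduct("A", 100.0, 10),
    new PlainProduct("B", 200.0, 20)
  )

  // sortBy(x => x) resolves the implicit Ordering[PlainProduct] defined above.
  val sortedNames: List[String] = items.sortBy(x => x).map(_.name)
}
```

Giving the implicit value an explicit type annotation keeps it visible regardless of where in the enclosing scope it is declared.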

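5. Using Ordering.on. This approach is fairly elegant, but it is not recommended.
First, the class definition (this is the Products3 class used in method 4; the Ordering.on code further down works on tuples instead):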

class Products3(val name: String, val price : Double, val amount : Int) extends Serializable {
  override def toString(): String = name + "\t" + price + "\t" + amount
}

When sorting, the code is as follows:

// Ordering.on syntax
val product4 =  products.map(x => {
  val splits  = x.split(" ")
  val name = splits(0)
  val price = splits(1).toDouble
  val amount = splits(2).toInt
  (name,price,amount)
})

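// Derive an Ordering for the whole tuple from the built-in Ordering[(Double, Int)]
implicit val ord: Ordering[(String, Double, Int)] = Ordering[(Double,Int)].on[(String, Double, Int)](x =>(-x._2,-x._3))
product4.sortBy(x=>x).collect().foreach(println)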
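What Ordering.on builds can be inspected locally. The sketch below constructs the same ordering as above and passes it explicitly instead of declaring it implicit, so the default tuple ordering is not shadowed for unrelated code. It also shows a variant based on Ordering.by and reverse, which gives the same result here. The names Row, priceThenAmountDesc, and viaReverse are hypothetical.

```scala
object OrderingOnSketch {
  type Row = (String, Double, Int)

  val rows: List[Row] =
    List(("A", 100.0, 10), ("B", 200.0, 20), ("C", 200.0, 30), ("D", 400.0, 30))

  // on maps each Row to a (Double, Int) key and reuses that key's ordering.
  val priceThenAmountDesc: Ordering[Row] =
    Ordering[(Double, Int)].on[Row](x => (-x._2, -x._3))

  // Variant: order by (price, amount) ascending, then reverse the whole ordering.
  val viaReverse: Ordering[Row] =
    Ordering.by((x: Row) => (x._2, x._3)).reverse

  // Pass the ordering explicitly rather than through implicit scope.
  val sorted: List[Row] = rows.sorted(priceThenAmountDesc)
  val sortedViaReverse: List[Row] = rows.sorted(viaReverse)
}
```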

 
