Computing average scores with Spark

Data (one record per line: name, subject, score):

zhangsan math 88
zhangsan china 78
zhangsan english 80
lisi math 99
lisi china 89
lisi english 82
wangwu math 66
wangwu china 96
wangwu english 84
zhaoliu math 77
zhaoliu china 67
zhaoliu english 86.55

Code:

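import org.apache.spark.{SparkConf, SparkContext}

// Usage: Avg <input path> <output path>
object Avg {

  // An explicit main method is used instead of extending scala.App,
  // which the Spark documentation warns may not work correctly
  def main(args: Array[String]): Unit = {
    // Windows only: location of the local Hadoop install (adjust or remove elsewhere)
    System.setProperty("hadoop.home.dir", "d://soft/hadoop/hadoop-2.9.2")
    val conf = new SparkConf().setMaster("local[*]").setAppName("avg")
    val sc = new SparkContext(conf)

    // Each line is "<name> <subject> <score>"; split once and key by subject
    val lineRdd = sc.textFile(args(0))
    val pairRdd = lineRdd.map { line =>
      val fields = line.split(" ")
      (fields(1), fields(2))
    }
    val groupByRdd = pairRdd.groupByKey()

    // Average the scores of each subject, formatted to two decimal places
    val resultRdd = groupByRdd.map { case (subject, scores) =>
      val avg = scores.map(_.toDouble).sum / scores.size
      (subject, f"$avg%.2f")
    }.coalesce(1) // a single partition, so a single output file

    resultRdd.saveAsTextFile(args(1))

    sc.stop()
  }
}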
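
groupByKey sends every individual score through the shuffle before the average is taken. A possible variant, sketched here on the assumption that the input format stays the same, keeps a running (sum, count) pair per subject with aggregateByKey, so partial sums are combined inside each partition first. The object name AvgAggregate and the helpers addScore and merge are invented for this sketch and are not part of the original program.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object AvgAggregate {
  // Running (sum of scores, number of scores) for one subject
  type SumCount = (Double, Long)

  // Hypothetical helper: fold one score into a partition-local accumulator
  def addScore(acc: SumCount, score: Double): SumCount = (acc._1 + score, acc._2 + 1)

  // Hypothetical helper: combine the accumulators of two partitions
  def merge(a: SumCount, b: SumCount): SumCount = (a._1 + b._1, a._2 + b._2)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("avg-aggregate")
    val sc = new SparkContext(conf)
    try {
      // Key each record by subject and parse the score up front
      val pairRdd = sc.textFile(args(0)).map { line =>
        val fields = line.split(" ")
        (fields(1), fields(2).toDouble)
      }

      // Reduce to one (sum, count) per subject, then format the mean
      val resultRdd = pairRdd
        .aggregateByKey((0.0, 0L))(addScore, merge)
        .mapValues { case (sum, count) => f"${sum / count}%.2f" }
        .coalesce(1)

      resultRdd.saveAsTextFile(args(1))
    } finally {
      sc.stop()
    }
  }
}
```

On a dataset as small as the sample this makes no practical difference; the point is only that the data shuffled per subject shrinks from every score to one pair per partition.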

Result (average score per subject):

(math,82.50)
(english,83.14)
(china,82.50)
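
These figures can be checked by hand, for example math: (88 + 99 + 66 + 77) / 4 = 82.50. As a minimal sketch, assuming the twelve sample records listed under the data heading, the same check can be run with plain Scala collections and no Spark at all. The object name AvgCheck and the helper averages are made up for this illustration.

```scala
object AvgCheck {
  // The sample records: "<name> <subject> <score>"
  val lines: Seq[String] = Seq(
    "zhangsan math 88", "zhangsan china 78", "zhangsan english 80",
    "lisi math 99", "lisi china 89", "lisi english 82",
    "wangwu math 66", "wangwu china 96", "wangwu english 84",
    "zhaoliu math 77", "zhaoliu china 67", "zhaoliu english 86.55"
  )

  // Hypothetical helper: group the scores by subject and
  // format each mean to two decimal places
  def averages(records: Seq[String]): Map[String, String] =
    records
      .map(_.split(" "))
      .groupBy(fields => fields(1))
      .map { case (subject, rows) =>
        val scores = rows.map(fields => fields(2).toDouble)
        subject -> f"${scores.sum / scores.size}%.2f"
      }

  def main(args: Array[String]): Unit =
    // Print one (subject,average) tuple per line, sorted by subject
    averages(lines).toSeq.sortBy(_._1).foreach(println)
}
```

With a locale that uses a dot as the decimal separator, this should print the same three tuples as the Spark job, in alphabetical order of subject.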
