Spark: WordCount with 10 Different Operators

1. aggregate

import org.apache.spark.rdd.RDD
import scala.collection.mutable

val rdd: RDD[String] = sc.makeRDD(List("hello Spark", "hello Scala", "hello hadoop"), 2)
// hello Spark hello Scala hello hadoop
val flatMapRDD: RDD[String] = rdd.flatMap(_.split(" "))
flatMapRDD.aggregate(mutable.Map[String, Int]())(
  // seqOp: within each partition, count the word into the accumulator map
  (map, s) => {
    map(s) = map.getOrElse(s, 0) + 1
    map
  },
  // combOp: merge the per-partition maps by summing the counts per word
  (map1, map2) => {
    map1.foldLeft(map2)(
      (innerMap, kv) => {
        innerMap(kv._1) = innerMap.getOrElse(kv._1, 0) + kv._2
        innerMap
      }
    )
  }
).foreach(println)
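aggregate takes a zero value and two functions: seqOp folds raw elements into the accumulator within each partition, while combOp merges the per-partition accumulators. Because seqOp can consume the raw String words directly, no preliminary map step is needed here.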

2. fold

flatMapRDD.map(s => mutable.Map(s -> 1)).fold(mutable.Map[String, Int]())(
  // Merge two maps by summing the counts per word.
  (map1, map2) => {
    map1.foldLeft(map2)(
      (innerMap, kv) => {
        innerMap(kv._1) = innerMap.getOrElse(kv._1, 0) + kv._2
        innerMap
      }
    )
  }
).foreach(println)
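fold is a simplified aggregate that applies one function both within and across partitions, so the zero value, the elements, and the result must all share one type. That is why each word is first wrapped in a one-entry mutable.Map, and why the zero value is an empty map (the neutral element of the merge).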

3. The remaining operators are simpler and left as an exercise; a sketch of several of them follows below.
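As a minimal sketch of what those look like (assuming the same sc, flatMapRDD, and import scala.collection.mutable from above; all calls below are standard Spark RDD APIs):

val mapRDD: RDD[(String, Int)] = flatMapRDD.map((_, 1))

// reduceByKey: merge the values of each key with a binary function
mapRDD.reduceByKey(_ + _).foreach(println)

// groupBy: group the raw words by themselves, then count each group
flatMapRDD.groupBy(word => word).mapValues(_.size).foreach(println)

// groupByKey: group the (word, 1) pairs by key, then sum each group
mapRDD.groupByKey().mapValues(_.sum).foreach(println)

// aggregateByKey: like aggregate, but per key; the zero value is 0
mapRDD.aggregateByKey(0)(_ + _, _ + _).foreach(println)

// foldByKey: aggregateByKey with the same function within and across partitions
mapRDD.foldByKey(0)(_ + _).foreach(println)

// combineByKey: the most general per-key aggregation
mapRDD.combineByKey(
  (v: Int) => v,                // createCombiner: first value of a key
  (c: Int, v: Int) => c + v,    // mergeValue: within a partition
  (c1: Int, c2: Int) => c1 + c2 // mergeCombiners: across partitions
).foreach(println)

// countByKey / countByValue: actions that return local Maps
println(mapRDD.countByKey())       // occurrences of each key
println(flatMapRDD.countByValue()) // occurrences of each word

// reduce: merge one-entry maps into one map, like fold but without a zero value
flatMapRDD.map(s => mutable.Map(s -> 1)).reduce(
  (map1, map2) => {
    map1.foldLeft(map2)((innerMap, kv) => {
      innerMap(kv._1) = innerMap.getOrElse(kv._1, 0) + kv._2
      innerMap
    })
  }
).foreach(println)

Note that countByKey and countByValue are actions that collect a local Map on the driver, so they are only suitable when the set of distinct words is small.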
