map vs. flatMap on Spark RDDs

To make it easier to understand the values these two operators produce once an action is applied, let's start with an example:

import org.apache.spark.{SparkConf, SparkContext}

object MapAndFlatMap {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("map_flatMap_demo").setMaster("local"))
    val arrayRDD = sc.parallelize(Array("a_b", "c_d", "e_f"))
    arrayRDD.foreach(println) // output 1

    // map: each input string produces exactly one Array[String]
    arrayRDD.map(string => {
      string.split("_")
    }).foreach(x => {
      println(x.mkString(",")) // output 2 (x is an Array[String])
    })

    // flatMap: each returned Array[String] is flattened into individual elements
    arrayRDD.flatMap(string => {
      string.split("_")
    }).foreach(x => {
      println(x.mkString(",")) // output 3 (x is a single String here)
    })

    sc.stop()
  }
}

Output 1:

a_b
c_d
e_f

Output 2:

a,b
c,d
e,f

Output 3:

a
b
c
d
e
f

Conclusion:

After map, the RDD contains Array(Array("a","b"), Array("c","d"), Array("e","f"))

After flatMap, the RDD contains Array("a","b","c","d","e","f")

In short, flatMap splits apart every array its function returns and merges all the elements into one flat collection.
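The same behavior can be checked without a Spark cluster, since Scala collections define map and flatMap with the same semantics that RDDs follow. The sketch below (plain Scala standard library only; the object name MapVsFlatMap is just for illustration) also shows that flatMap is equivalent to map followed by flatten:

```scala
object MapVsFlatMap {
  def main(args: Array[String]): Unit = {
    val arr = Array("a_b", "c_d", "e_f")

    // map keeps one output element per input: an Array of Arrays
    val mapped: Array[Array[String]] = arr.map(_.split("_"))
    println(mapped.map(_.mkString(",")).mkString(" | ")) // a,b | c,d | e,f

    // flatMap flattens each returned array into one collection
    val flatMapped: Array[String] = arr.flatMap(_.split("_"))
    println(flatMapped.mkString(",")) // a,b,c,d,e,f

    // flatMap(f) behaves like map(f) followed by flatten
    assert(arr.map(_.split("_")).flatten.sameElements(flatMapped))
  }
}
```

The assertion at the end is the key point: map preserves the one-to-one shape, and flatten is the extra step that flatMap performs for you.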
