3 Basic RDD Operations: Transformations

1 Introduction to Transformations

Transformations
A transformation builds a new RDD from an existing one, for example with map() or filter().
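To make "builds a new RDD from an existing one" concrete, here is a minimal sketch. It assumes Spark is on the classpath and runs with a local master; the object name TransformationBasics and the helper originalAndDoubled are hypothetical names chosen for illustration, not part of the original post.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object TransformationBasics {
  // Hypothetical helper: returns (elements of the source RDD, elements of the derived RDD).
  def originalAndDoubled(sc: SparkContext): (Array[Int], Array[Int]) = {
    val nums = sc.parallelize(Seq(1, 2, 3, 4))
    // map() is a transformation: it returns a new RDD and leaves nums unchanged.
    val doubled = nums.map(_ * 2)
    // collect() is an action that brings the elements back to the driver.
    (nums.collect(), doubled.collect())
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local master is enough for this demonstration.
    val conf = new SparkConf().setAppName("TransformationBasics").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val (original, doubled) = originalAndDoubled(sc)
      println(original.mkString(", ")) // 1, 2, 3, 4
      println(doubled.mkString(", "))  // 2, 4, 6, 8
    } finally {
      sc.stop()
    }
  }
}
```

The source RDD still holds its original elements after map() is called, which is why transformations can be chained without affecting earlier RDDs.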

map()

map() takes a function, applies it to every element of the RDD, and returns a new RDD.

// Build an RDD from a local array
val lines = sc.parallelize(Array("hello", "spark", "hello", "world", "!"))
lines.foreach(println)
// Pair every word with the count 1
val lines2 = lines.map(word => (word, 1))
lines2.foreach(println)
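The same map() call can be sketched as a standalone program that uses collect() to bring the results back to the driver, which gives a predictable order to inspect. This assumes Spark is on the classpath and a local master; the object MapExample and the helpers toPairs and toLengths are hypothetical names, not from the original.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object MapExample {
  // Pair every word with the count 1, as in the snippet above.
  def toPairs(words: RDD[String]): RDD[(String, Int)] =
    words.map(word => (word, 1))

  // map() may also change the element type: here String becomes Int.
  def toLengths(words: RDD[String]): RDD[Int] =
    words.map(_.length)

  def main(args: Array[String]): Unit = {
    // Assumption: a local master is enough for this demonstration.
    val conf = new SparkConf().setAppName("MapExample").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val lines = sc.parallelize(Seq("hello", "spark", "hello", "world", "!"))
      // Prints (hello,1), (spark,1), (hello,1), (world,1), (!,1)
      println(toPairs(lines).collect().mkString(", "))
      // Prints 5, 5, 5, 5, 1
      println(toLengths(lines).collect().mkString(", "))
    } finally {
      sc.stop()
    }
  }
}
```

The output RDD always has exactly as many elements as the input RDD, since map() produces one result per input element.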

filter()

filter() takes a function and returns a new RDD containing only the elements for which that function returns true.

val lines3=lines.filter(word=>word.contains("hello"))
lines3.foreach(println)
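A minimal standalone sketch of the same filter, assuming Spark is on the classpath and a local master. The object FilterExample and the helper keepContaining are hypothetical names introduced for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object FilterExample {
  // Keep only the elements that contain the given keyword.
  def keepContaining(words: RDD[String], keyword: String): RDD[String] =
    words.filter(word => word.contains(keyword))

  def main(args: Array[String]): Unit = {
    // Assumption: a local master is enough for this demonstration.
    val conf = new SparkConf().setAppName("FilterExample").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val lines = sc.parallelize(Seq("hello", "spark", "hello", "world", "!"))
      // Prints hello, hello
      println(keepContaining(lines, "hello").collect().mkString(", "))
      // No element contains "scala", so the resulting RDD is empty: prints 0
      println(keepContaining(lines, "scala").count())
    } finally {
      sc.stop()
    }
  }
}
```

Unlike map(), filter() never changes the element type; it can only reduce the number of elements.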

flatMap()

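For each input element, flatMap() can produce multiple output elements.
"flat" means to flatten: the results produced for each element are flattened into a single new RDD.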

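// Read the text file as an RDD with one element per line
val inputs = sc.textFile("/home/helloSpark.txt")
inputs.foreach(println)
// Split every line on spaces and flatten the pieces into one RDD of words
val words = inputs.flatMap(line => line.split(" "))
words.foreach(println)
words.foreach(print)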
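The difference between map() and flatMap() is easiest to see side by side. The sketch below assumes Spark is on the classpath and a local master, and it uses two in-memory sample lines instead of the text file so that it does not depend on a file being present. The object FlatMapExample, the helper names, and the sample lines are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object FlatMapExample {
  // map() keeps one output per line: each element is an array of words.
  def splitWithMap(lines: RDD[String]): RDD[Array[String]] =
    lines.map(line => line.split(" "))

  // flatMap() flattens those arrays: each element is a single word.
  def splitWithFlatMap(lines: RDD[String]): RDD[String] =
    lines.flatMap(line => line.split(" "))

  def main(args: Array[String]): Unit = {
    // Assumption: a local master is enough for this demonstration.
    val conf = new SparkConf().setAppName("FlatMapExample").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Hypothetical sample data standing in for the contents of a text file.
      val lines = sc.parallelize(Seq("hello spark", "hello world !"))
      println(splitWithMap(lines).count())     // 2 (one array per line)
      println(splitWithFlatMap(lines).count()) // 5 (one element per word)
      // Prints hello, spark, hello, world, !
      println(splitWithFlatMap(lines).collect().mkString(", "))
    } finally {
      sc.stop()
    }
  }
}
```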

Set operations

val rdd1 = sc.parallelize(Array("coffe", "coffe", "panda", "monkey", "tea"))
rdd1.foreach(println)
val rdd2 = sc.parallelize(Array("coffe", "monkey", "kitty"))
rdd2.foreach(println)
val rdd_distinct = rdd1.distinct()       // remove duplicates
rdd_distinct.foreach(println)
val rdd_union = rdd1.union(rdd2)         // union (duplicates are kept)
rdd_union.foreach(println)
val rdd_inter = rdd1.intersection(rdd2)  // intersection
rdd_inter.foreach(println)
val rdd_sub = rdd1.subtract(rdd2)        // difference: elements of rdd1 that are not in rdd2
rdd_sub.foreach(println)
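The four set operations can be checked with a small standalone sketch. It assumes Spark is on the classpath and a local master; the object SetOperationsExample and the helper sortedElements are hypothetical names. Results are sorted on the driver before printing because distinct(), intersection(), and subtract() do not guarantee an output order.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object SetOperationsExample {
  // Collect to the driver and sort so the printed result is deterministic.
  def sortedElements(rdd: RDD[String]): List[String] =
    rdd.collect().sorted.toList

  def main(args: Array[String]): Unit = {
    // Assumption: a local master is enough for this demonstration.
    val conf = new SparkConf().setAppName("SetOperationsExample").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val rdd1 = sc.parallelize(Seq("coffe", "coffe", "panda", "monkey", "tea"))
      val rdd2 = sc.parallelize(Seq("coffe", "monkey", "kitty"))

      // distinct(): duplicates removed -> coffe, monkey, panda, tea
      println(sortedElements(rdd1.distinct()).mkString(", "))
      // union(): duplicates are kept, so 5 + 3 = 8 elements
      println(rdd1.union(rdd2).count())
      // intersection(): elements present in both RDDs -> coffe, monkey
      println(sortedElements(rdd1.intersection(rdd2)).mkString(", "))
      // subtract(): elements of rdd1 not present in rdd2 -> panda, tea
      println(sortedElements(rdd1.subtract(rdd2)).mkString(", "))
    } finally {
      sc.stop()
    }
  }
}
```

If a duplicate-free union is needed, calling distinct() on the result of union() is one option.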
