Spark RDD basic transformation operations -- the filter operation

3. The filter operation

Goal: from an RDD of the natural numbers 1 to 100, put all the prime numbers into a new RDD.
scala> val rddData = sc.parallelize(1 to 100)
rddData: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[7] at parallelize at :39

scala> import scala.util.control.Breaks._
import scala.util.control.Breaks._

scala> val rddData2 = rddData.filter(n =>{
     |  var flag = if (n<2) false else true
     |  breakable{
     |   for(x <- 2 until n){
     |    if(n%x == 0){
     |      flag = false
     |      break
     |     }
     |    }
     |   }
     |   flag
     | })
rddData2: org.apache.spark.rdd.RDD[Int] = MapPartitionsRDD[8] at filter at :44

scala> rddData2.collect
res3: Array[Int] = Array(2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97)

Notes:
import scala.util.control.Breaks._: using break requires an explicit import, because break is not a language keyword in Scala.
rddData.filter: filter takes a predicate; an element is copied into the new RDD only if the predicate returns true for it. Like all transformations, filter is lazy — nothing is computed until an action such as collect is called.
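The breakable/break pattern above works, but Scala's forall already short-circuits on the first failing element, so the same predicate can be written without any Breaks import. The sketch below (the object name PrimeFilterSketch and the sqrt bound are my additions, not from the original post) shows this more idiomatic version; it also checks divisors only up to the square root of n, which does less work per element:

```scala
object PrimeFilterSketch {
  // forall stops at the first divisor found, mirroring the break above.
  // Checking up to sqrt(n) is enough: any factor pair has one member <= sqrt(n).
  def isPrime(n: Int): Boolean =
    n >= 2 && (2 to math.sqrt(n).toInt).forall(n % _ != 0)

  def main(args: Array[String]): Unit = {
    // The same predicate works on a plain Scala collection for local testing;
    // on a cluster it would be sc.parallelize(1 to 100).filter(isPrime).
    val primes = (1 to 100).filter(isPrime)
    println(primes.mkString(", "))
  }
}
```

Because the predicate is a pure function of each element, it behaves identically whether applied to a local Range or to an RDD partition on an executor.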
