Common RDD methods: subtract, intersection, and cartesian

subtract
Return an RDD with the elements from `this` that are not in `other`.
def subtract(other: RDD[T]): RDD[T]

def subtract(other: RDD[T], numPartitions: Int): RDD[T]

def subtract(other: RDD[T], p: Partitioner): RDD[T]
val a = sc.parallelize(1 to 5)

val b = sc.parallelize(1 to 3)

val c = a.subtract(b)

c.collect

 Array[Int] = Array(4, 5)
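
The example above uses inputs without repeated values, so it does not show how duplicates are treated. The following is a minimal sketch, assuming Spark is on the classpath (in spark-shell, `getOrCreate` simply returns the existing `sc`); the names `left`, `right`, and `diff` are illustrative only. It suggests that `subtract` keeps duplicates from `this` and drops every occurrence of a value that appears in `other`. The result is sorted because element order across partitions is not guaranteed.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Reuse the running context if there is one (e.g. in spark-shell),
// otherwise start a local one. The app name is an arbitrary placeholder.
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("subtract-demo").setMaster("local[2]"))

val left  = sc.parallelize(Seq(1, 1, 2, 3, 3))
val right = sc.parallelize(Seq(3))

// Both copies of 3 are removed; both copies of 1 are kept.
val diff = left.subtract(right).collect().sorted
// diff: Array(1, 1, 2)
```

If a set-style difference is wanted, calling `distinct` on the result is one option.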

 

 
intersection
Return the intersection of this RDD and another one. The output will not contain any duplicate elements, even if the input RDDs did.
def intersection(other: RDD[T], numPartitions: Int): RDD[T]

def intersection(other: RDD[T], partitioner: Partitioner)(implicit ord: Ordering[T] = null): RDD[T]

def intersection(other: RDD[T]): RDD[T]
val x = sc.parallelize(1 to 10)

val y = sc.parallelize(2 to 8)

val z = x.intersection(y)

z.collect

 Array[Int] = Array(4, 6, 8, 2, 3, 7, 5)
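
The unordered output above reflects how elements are spread across partitions, so the exact order may differ from run to run. As a minimal sketch of the de-duplication described earlier, again assuming Spark is on the classpath and using illustrative names (`withDupsA`, `withDupsB`, `common`), the snippet below feeds repeated values into both sides and sorts the result for a stable comparison.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Reuse the running context if there is one (e.g. in spark-shell),
// otherwise start a local one. The app name is an arbitrary placeholder.
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("intersection-demo").setMaster("local[2]"))

val withDupsA = sc.parallelize(Seq(1, 1, 2, 2, 3))
val withDupsB = sc.parallelize(Seq(2, 2, 3, 4))

// 2 appears twice on both sides but only once in the result.
val common = withDupsA.intersection(withDupsB).collect().sorted
// common: Array(2, 3)
```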

 

cartesian
Return the Cartesian product of this RDD and another one, that is, the RDD of all pairs of elements (a, b) where a is in `this` and b is in `other`.
def cartesian[U: ClassTag](other: RDD[U]): RDD[(T, U)] 
val x = sc.parallelize(List(1,2,3))

val y = sc.parallelize(List(6,7,8))

x.cartesian(y).collect

 Array[(Int, Int)] = Array((1,6), (1,7), (1,8), (2,6), (3,6), (2,7), (2,8), (3,7), (3,8))
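
Because the signature takes an `RDD[U]`, the two sides do not need to share an element type, and the result holds one pair for every combination of elements. The sketch below is an illustration under the same assumptions as the example above (Spark on the classpath; the names `letters`, `numbers`, and `pairs` are made up for this example). The pairs are sorted only to make the output deterministic.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Reuse the running context if there is one (e.g. in spark-shell),
// otherwise start a local one. The app name is an arbitrary placeholder.
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("cartesian-demo").setMaster("local[2]"))

val letters = sc.parallelize(Seq("a", "b"))   // 2 elements
val numbers = sc.parallelize(Seq(1, 2, 3))    // 3 elements

val pairs = letters.cartesian(numbers)        // RDD[(String, Int)]
val pairCount = pairs.count()                 // 2 * 3 = 6 pairs
val sortedPairs = pairs.collect().sorted
// sortedPairs: Array((a,1), (a,2), (a,3), (b,1), (b,2), (b,3))
```

Since the output size is the product of the two input sizes, `cartesian` can become expensive quickly on large RDDs.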

 
