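Spark's RDD is a distributed, immutable collection of data, and many of its transformations are modeled on functions from Scala's collections framework. It is therefore worth studying Scala's collections in detail.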
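1. List is covariant: if B is a subtype of A, then List[B] is a subtype of List[A], so a List[B] instance can be assigned to a List[A] variable. Not every generic collection is covariant; Array and the mutable collections are invariant.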
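A minimal sketch of what covariance allows. The Animal and Dog classes are hypothetical and exist only for this illustration.

```scala
// Hypothetical class hierarchy, used only for illustration
class Animal { def name: String = "animal" }
class Dog extends Animal { override def name: String = "dog" }

val dogs: List[Dog] = List(new Dog, new Dog)

// Compiles because List is declared covariant (List[+A])
val animals: List[Animal] = dogs

// Prepending an Animal widens the element type to Animal
val mixed: List[Animal] = (new Animal) :: dogs

// Array is invariant, so the assignment in the comment below would be rejected
val dogArray: Array[Dog] = Array(new Dog)
// val animalArray: Array[Animal] = dogArray   // type mismatch
```

Because List is immutable, treating a List[Dog] as a List[Animal] is safe: nothing can later insert a non-Dog into the original list.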
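2. Binding variables with a List pattern (note the val keyword; a, b, and c are three new variables, not keywords)
// Define three variables a, b, c
val List(a, b, c) = List(1, 2, 3)
Now a is 1, b is 2, and c is 3.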
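3. Binding variables with a :: pattern (again with val; a, b, and c are three new variables)
// Define three variables a, b, c
val a :: b :: c :: Nil = List(1, 2, 3)
If the pattern does not end with Nil (val a :: b :: c = ...), then c is bound to a List holding whatever remains after the first and second elements.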
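The two pattern forms can be sketched side by side. The names first, second, rest, and mismatch are illustrative; the last part shows that a pattern whose shape does not fit the list fails at runtime with a MatchError.

```scala
import scala.util.Try

// Without a trailing Nil, the last name captures the remaining tail
val first :: second :: rest = List(1, 2, 3, 4, 5)
// first is 1, second is 2, rest is List(3, 4, 5)

// A fixed-length pattern must match the list length exactly;
// here the pattern expects two elements but the list has three
val mismatch = Try {
  val List(x, y) = List(1, 2, 3)
  x + y
}
// mismatch is a Failure wrapping a scala.MatchError
```

For lists whose length is not known in advance, a match expression with a fallback case is usually the safer choice.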
4. Joining elements to build a List (the :: operator)
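No example accompanies this item, so here is a minimal sketch, assuming it refers to the :: (cons) operator. The variable names are illustrative.

```scala
// :: is right-associative, so this reads as 1 :: (2 :: (3 :: Nil))
val built = 1 :: 2 :: 3 :: Nil

// Prepending creates a new list; the original is left unchanged
val extended = 0 :: built

// :+ appends a single element; on List this takes time linear in the length
val appended = built :+ 4
```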
5. Concatenating several Lists into one List
// In X ::: Y ::: Z, X, Y, and Z are all Lists, and ::: concatenates them
val l = List(3, 2, 1) ::: List(4, 5, 6) ::: List(7, 8, 9)
// l is List(3, 2, 1, 4, 5, 6, 7, 8, 9)
l.foreach(println(_))
6. mkString (three parameters: prefix, element separator, suffix)
val ll = List(1, 2, 3, 5, 6, 7)
// prefix, separator, suffix
println(ll.mkString("String of the list: [", ",", "]"))
// prints: String of the list: [1,2,3,5,6,7]
7. Higher-order functions and other common operations on List: map, foreach, flatMap, filter, partition, find, tabulate, concat
7.1 map
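map returns the list that results from applying the function f to each list element, so the length of the collection does not change.
val l = List(1, 2, 3, 4)
// map: convert each element of the list into a tuple
val m = l.map(elem => (elem * 2, elem * 4))
// l.map((_ * 2, _ * 4)) does not compile: each _ expands into its own function,
// giving a tuple of two functions rather than one function that returns a tuple
m.foreach(println(_))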
7.2 foreach
foreach runs a function on every element of the List for its side effect and returns Unit.
val l = List(1, 2, 3, 4)
def square(x: Int) = { x * x }
l.foreach(elem => println(square(elem)))
7.3 filter
filter returns the list of elements that satisfy the predicate.
val l = List(1, 2, 3, 4)
val evens = l.filter(_ % 2 == 0) // 2, 4
evens.foreach(println(_))
7.3.1 find
The find method is also similar to filter but it returns the first element satisfying a given predicate, rather than all such elements.
val l = List(1, 2, 3, 4)
// find returns an Option[Int] (Some(2) here, None if nothing matches), not a bare element
val found = l.find(_ % 2 == 0)
found.foreach(println(_))
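Since find yields an Option, the caller has to handle the case where nothing matches. A short sketch of two common ways to do that; all names here are illustrative.

```scala
val numbers = List(1, 2, 3, 4)

val firstEven: Option[Int] = numbers.find(_ % 2 == 0) // Some(2)
val firstBig: Option[Int] = numbers.find(_ > 10)      // None

// Pattern matching makes both cases explicit
val described = firstBig match {
  case Some(n) => s"found $n"
  case None    => "nothing found"
}

// getOrElse supplies a fallback value
val orDefault = firstBig.getOrElse(-1)
```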
7.4 partition
The partition method is like filter, but it returns a pair (a two-element tuple) of lists. One list contains all elements for which the predicate is true, while the other list contains all elements for which the predicate is false. It is defined by the equality: xs partition p equals (xs filter p, xs filter (!p(_)))
val l = List(1, 2, 3, 4)
val partitions = l.partition(_ % 2 == 0)
partitions._1.foreach(println) // 2, 4
partitions._2.foreach(println) // 1, 3
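The relationship between partition and filter can be checked directly. A small sketch, with illustrative names, that also destructures the resulting pair:

```scala
val xs = List(1, 2, 3, 4)
val isEven = (n: Int) => n % 2 == 0

// Destructure the pair instead of using _1 and _2
val (evens, odds) = xs.partition(isEven)

// The same result built from two filter passes
val viaFilter = (xs.filter(isEven), xs.filter(!isEven(_)))
```

partition produces both lists in a single traversal, whereas the filter version walks the list twice.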
7.5 flatMap
The flatMap operator is similar to map, but it takes a function returning a list of elements as its right operand. It applies the function to each list element and returns the concatenation of all function results.
In other words, the function passed to flatMap turns each element into a collection.
val listOfList = List(List(1, 2), List(3, 4), List(5, 6))
// flatMap applies the function to every element to get a collection, then concatenates those collections
val flattened = listOfList.flatMap(_.toList)
println(flattened) // List(1, 2, 3, 4, 5, 6)
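A sketch of flatMap on elements that are not already lists, compared with map followed by flatten. The names are illustrative.

```scala
val words = List("spark", "rdd")

// Each word becomes a List[Char]; flatMap concatenates the results
val chars = words.flatMap(w => w.toList)

// map alone keeps the nesting: List(List(s, p, a, r, k), List(r, d, d))
val nested = words.map(_.toList)

// Each number expands to two elements
val expanded = List(1, 2, 3).flatMap(n => List(n, n * 10))
```

Here nested.flatten gives the same list as chars, which is one way to think of flatMap: map, then flatten.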
7.6 tabulate
The tabulate method creates a list whose elements are computed according to a supplied function. Its arguments are just like those of List.fill: the first argument list gives the dimensions of the list to create, and the second describes the elements of the list. The only difference is that instead of the elements being fixed, they are computed from a function:
// tabulate is a method defined on the List companion object, not on instances of List
val tabulatedList = List.tabulate(6)(n => n * n) // 0, 1, 4, 9, 16, 25
tabulatedList.foreach(println)
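Like List.fill, tabulate also accepts more than one dimension. A minimal sketch of the two-dimensional form, where the function receives the row and column indices; the name table is illustrative.

```scala
// 3 rows by 3 columns; each cell is row index times column index
val table = List.tabulate(3, 3)((row, col) => row * col)

// Print one row per line
table.foreach(row => println(row.mkString(" ")))
```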
7.7 fill
The fill method creates a list consisting of zero or more copies of the same element. It takes two parameters: the length of the list to be created, and the element to be repeated. Each parameter is given in a separate list:
val fillList = List.fill(10)("Apple")
fillList.foreach(println) // prints "Apple" 10 times
// With two dimensions, the first argument is the number of inner Lists
// and the second is the number of elements in each inner List
val fillList2 = List.fill(2, 3)("Apple")
fillList2.foreach(elem => elem.foreach(println))
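One detail worth knowing is that the element argument of List.fill is passed by name, so the expression is evaluated once per slot. A small sketch with an illustrative counter:

```scala
var counter = 0

// The block is re-evaluated for each of the three elements
val ids = List.fill(3) { counter += 1; counter }
```

With a constant such as "Apple" this makes no visible difference, but with a side-effecting or random expression every element can differ.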
7.8 concat
List.concat joins any number of lists into a single list.
val concatList = List.concat(List(1, 2, 3), List(8, 9, 10))
concatList.foreach(println) // result: 1, 2, 3, 8, 9, 10
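For comparison, a sketch showing that List.concat, the ::: operator, and ++ produce the same list here. The names are illustrative.

```scala
val left = List(1, 2, 3)
val right = List(8, 9, 10)

val viaConcat = List.concat(left, right)
val viaTripleColon = left ::: right
val viaPlusPlus = left ++ right

// concat takes any number of arguments, including none
val three = List.concat(left, right, List(0))
val empty = List.concat[Int]()
```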
7.9 flatten
The flatten method takes a list of lists and flattens it out to a single list.
val flattenList = List(List(1, 2), List(3, 4))
flattenList.flatten.foreach(println) // 1 to 4
7.10 range
// List.range(start, end) includes start and excludes end
val range = List.range(1, 10)
range.foreach(println) // 1 to 9
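List.range also has a three-argument form that takes a step. A short sketch, with illustrative names; the end bound stays exclusive in both directions.

```scala
// Step of 2: odd numbers below 10
val odds = List.range(1, 10, 2)

// A negative step counts down; 0 is excluded
val countdown = List.range(10, 0, -3)

// An empty range yields an empty list
val none = List.range(5, 5)
```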
7.11 zip
The zip operation takes two lists and forms a list of pairs:
val zip1 = List(1, 2, 3)
val zip2 = List("A", "B", "C")
val zip = zip1.zip(zip2)
zip.foreach(println)
Result:
(1,A)
(2,B)
(3,C)
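When the two lists differ in length, zip stops at the shorter one and silently drops the unmatched elements. A minimal sketch with illustrative names:

```scala
val numbers = List(1, 2, 3)
val letters = List("A", "B")

// The 3 has no partner and is dropped
val pairs = numbers.zip(letters)
```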
7.12 zipWithIndex
A useful special case is to zip a list with its index. This is done most efficiently with the zipWithIndex method, which pairs every element of a list with the position where it appears in the list.
List(1,2,3).zipWithIndex.foreach(println)
Result:
(1,0)
(2,1)
(3,2)
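A sketch of two typical uses of the index: selecting elements by position and looking up the position of a value. The names are illustrative.

```scala
val letters = List("a", "b", "c", "d")
val indexed = letters.zipWithIndex

// Keep only the elements at even positions
val evenPositions = indexed.collect { case (elem, i) if i % 2 == 0 => elem }

// Look up the position of a value; the result is an Option
val positionOfC = indexed.find(_._1 == "c").map(_._2)
```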
7.13 unzip
Any list of tuples can also be changed back to a tuple of lists by using the unzip method.
val unzipped = List((1, "A"), (2, "B"), (3, "C")).unzip
unzipped._1.foreach(println)
unzipped._2.foreach(println)
The result is a two-element tuple whose elements are both Lists: List(1, 2, 3) and List(A, B, C).
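A sketch showing that unzip and zip are inverses of each other for a list of pairs, using destructuring rather than _1 and _2. The names are illustrative.

```scala
val pairs = List((1, "A"), (2, "B"), (3, "C"))

// Bind both lists at once
val (nums, strs) = pairs.unzip

// Zipping the two halves restores the original pairs
val roundTrip = nums.zip(strs)
```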
7.14 List.toArray, Array.toList
Converting a List to an Array, and an Array back to a List
val l = List(1, 2, 3, 4)
val arr: Array[Int] = l.toArray
arr.foreach(println)
val list: List[Int] = arr.toList
list.foreach(println)
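Both conversions copy the elements, so changing the Array afterwards does not affect the List it came from. A minimal sketch with illustrative names:

```scala
val source = List(1, 2, 3)
val copy = source.toArray

// Arrays are mutable; Lists are not
copy(0) = 100

// Converting back gives a new List reflecting the modified array
val back = copy.toList
```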