1: When the variable is a List, _ stands for each element of the List
scala> List((1,2),(5,9)).filter(_._1>1)
res5: List[(Int, Int)] = List((5,9))
scala> List((1,2),(5,9)).filter(_._2>1)
res6: List[(Int, Int)] = List((1,2), (5,9))
scala> List((1,2,3),(5,9.4)).filter(_._2>1)
:24: error: value _2 is not a member of Product with Serializable
List((1,2,3),(5,9.4)).filter(_._2>1)
^
scala> List((1,2,3),(5,9.4,4)).filter(_._2>1)
:24: error: value > is not a member of AnyVal
List((1,2,3),(5,9.4,4)).filter(_._2>1)
^
scala> List((1,2,3),(5,4,4)).filter(_._2>1)
res9: List[(Int, Int, Int)] = List((1,2,3), (5,4,4))
scala> List((1,2,3),(5,4,4)).filter(_._2>3)
res10: List[(Int, Int, Int)] = List((5,4,4))
scala> List(1,2).filter(_._2>3)
:24: error: value _2 is not a member of Int
List(1,2).filter(_._2>3)
^
scala> List(1,2).filter(_._1>3)
:24: error: value _1 is not a member of Int
List(1,2).filter(_._1>3)
^
scala> List(1,2).filter(_>3)
res13: List[Int] = List()
scala> List(1,2).filter(_>1)
res14: List[Int] = List(2)
scala> List((1,2,3),(5,4,4)).filter(_.>(1,2,3))
:24: error: value > is not a member of (Int, Int, Int)
List((1,2,3),(5,4,4)).filter(_.>(1,2,3))
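The placeholder form is only shorthand for an ordinary one-parameter function, so the failures above come from the element type rather than from _ itself. A minimal sketch, assuming Scala 2.12 or later; the object and value names are made up for illustration:

```scala
object UnderscoreFilterDemo {
  val pairs = List((1, 2), (5, 9))

  // `_._1 > 1` is shorthand for an explicit one-parameter lambda.
  val viaPlaceholder = pairs.filter(_._1 > 1)
  val viaLambda      = pairs.filter(t => t._1 > 1)

  // A pattern-matching lambda names the tuple fields instead of using _1 / _2.
  val viaPattern = pairs.filter { case (first, _) => first > 1 }

  // Tuples have no `>` method of their own, but the standard library supplies
  // a lexicographic Ordering for tuples whose fields are themselves ordered.
  val triples = List((1, 2, 3), (5, 4, 4))
  val pivot   = (1, 2, 3)
  val tupleOrdering    = implicitly[Ordering[(Int, Int, Int)]]
  val greaterThanPivot = triples.filter(t => tupleOrdering.gt(t, pivot))
}
```

Here greaterThanPivot keeps only (5,4,4), because (1,2,3) is equal to the pivot rather than greater than it.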
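Another use of _: when two underscores appear, each _ stands for a successive parameter, here the running result and the next element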
scala> val list = List(1,2,3,4,5)
list: List[Int] = List(1, 2, 3, 4, 5)
scala> list.reduceLeft(_ + _) // 1+2=3, 3+3=6, 6+4=10, 10+5=15
res16: Int = 15
scala> list.reduceLeft(_ - _) // 1-2=-1, -1-3=-4, -4-4=-8, -8-5=-13
res17: Int = -13
scala> list.reduceLeft(_ + _ + _)
(does not compile: _ + _ + _ expands to a function of three parameters, while reduceLeft takes a function of two)
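Each _ stands for a different parameter, in order, which is why the three-underscore call above is rejected. A small sketch, with illustrative names, that spells the two-parameter lambda out and uses scanLeft to expose the intermediate values noted next to the reduceLeft calls:

```scala
object ReduceLeftDemo {
  val list = List(1, 2, 3, 4, 5)

  // `_ + _` expands to a function of two parameters:
  // the running result and the next element.
  val sum = list.reduceLeft((acc, x) => acc + x)

  // scanLeft keeps every intermediate result, seeded with the first element.
  val sumSteps  = list.tail.scanLeft(list.head)(_ + _)
  val diffSteps = list.tail.scanLeft(list.head)(_ - _)
}
```

The last entry of sumSteps and diffSteps is the value that reduceLeft returns for the same operator.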
2: The reduce operation
scala> List((1,2),(5,4),(7,8)).reduce((a,b) => (a._1+b._1,a._2+b._2))
res25: (Int, Int) = (13,14)
scala> List((1,2),(5,4)).reduce((a,b) => (a._1+b._1,a._2+b._2))
res26: (Int, Int) = (6,6)
scala> List((1,2,4),(5,4,3)).reduce((a,b) => (a._1+b._1,a._2+b._2))
found : (Int, Int)
required: (Int, Int, Int)
List((1,2,4),(5,4,3)).reduce((a,b) => (a._1+b._1,a._2+b._2))
scala> List((1,2,4),(5,4,3)).reduce((a,b,c) => (a._1+b._1,a._2+b._2))
(does not compile: reduce takes a function of two parameters, so the three-parameter lambda (a,b,c) is rejected)
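Both failures on three-element tuples have the same cause: reduce combines two elements at a time and must return a value of the element type. One possible fix, sketched with illustrative names, is either to combine all three fields or to drop the unwanted field before reducing:

```scala
object TupleReduceDemo {
  val triples = List((1, 2, 4), (5, 4, 3))

  // The result must be an (Int, Int, Int), so all three fields are combined.
  val sumAll = triples.reduce((a, b) => (a._1 + b._1, a._2 + b._2, a._3 + b._3))

  // To sum only the first two fields, project to pairs first, then reduce.
  val sumFirstTwo = triples
    .map(t => (t._1, t._2))
    .reduce((a, b) => (a._1 + b._1, a._2 + b._2))
}
```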
3: For reduceByKey and groupByKey, see this article
https://blog.csdn.net/qq_27717921/article/details/79603881
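As a rough local analogy, the difference between the two can be imitated with plain Scala collections. This is not Spark's API and says nothing about how Spark shuffles data; the object and value names are invented for the sketch:

```scala
object ByKeyDemo {
  val pairs = List(("a", 1), ("b", 2), ("a", 3), ("b", 4))

  // groupByKey-like: collect all values that share a key.
  val grouped: Map[String, List[Int]] =
    pairs.groupBy(_._1).map { case (k, kvs) => (k, kvs.map(_._2)) }

  // reduceByKey-like: merge the values of each key with a binary function.
  val reduced: Map[String, Int] =
    grouped.map { case (k, vs) => (k, vs.reduce(_ + _)) }
}
```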
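4: Pitfalls of flatMap
Besides RDDs, flatMap can also be applied to an Array. This is handy after a reduceByKey on RDD data when the value for each key is an Array.
The following shows one requirement: output the _1 and _2 fields of every element (a tuple) of the Array. Returning the tuple directly from flatMap raises an error; it only works when the tuple is built inside a nested map operation.
flatMap operation: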
scala> val tmp = Array(("dd","hgg",0.5), ("ddsdd","hgeweg",0.5), ("dfd","hgweg",0.55), ("dwd","ahgg",0.5))
tmp: Array[(String, String, Double)] = Array((dd,hgg,0.5), (ddsdd,hgeweg,0.5), (dfd,hgweg,0.55), (dwd,ahgg,0.5))
scala> tmp.flatMap(f => (f._1,f._2))
:32: error: type mismatch;
found : (String, String)
required: scala.collection.GenTraversableOnce[?]
tmp.flatMap(f => (f._1,f._2))
^
scala> tmp.flatMap(f => (f._1))
res52: Array[Char] = Array(d, d, d, d, s, d, d, d, f, d, d, w, d)
scala> val tmp2 = Array(("keyword","title","desc"))
tmp2: Array[(String, String, String)] = Array((keyword,title,desc))
scala> tmp.flatMap(f =>tmp2.map(y => (f._1,f._2,y._1)))
res54: Array[(String, String, String)] = Array((dd,hgg,keyword), (ddsdd,hgeweg,keyword), (dfd,hgweg,keyword), (dwd,ahgg,keyword))
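The type mismatch above arises because flatMap expects each element to be turned into a collection, and a tuple is not one. A minimal sketch of two alternatives to the nested map, using a shortened copy of tmp and illustrative names:

```scala
object FlatMapDemo {
  val tmp = Array(("dd", "hgg", 0.5), ("ddsdd", "hgeweg", 0.5))

  // Wrap the wanted fields in a Seq so flatMap has a collection to flatten.
  val fields: Array[String] = tmp.flatMap(f => Seq(f._1, f._2))

  // If one pair per element is wanted, a plain map is enough.
  val pairs: Array[(String, String)] = tmp.map(f => (f._1, f._2))
}
```

The call tmp.flatMap(f => (f._1)) compiles only because a String can be viewed as a sequence of Char, which is why it produces an Array[Char] rather than the strings themselves.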