Usage notes on _, reduce, groupByKey, reduceByKey, ... in Scala

1: When the value is a List, _ stands for each element of that List

scala> List((1,2),(5,9)).filter(_._1>1)
res5: List[(Int, Int)] = List((5,9))

scala> List((1,2),(5,9)).filter(_._2>1)
res6: List[(Int, Int)] = List((1,2), (5,9))

scala> List((1,2,3),(5,9.4)).filter(_._2>1)
<console>:24: error: value _2 is not a member of Product with Serializable
       List((1,2,3),(5,9.4)).filter(_._2>1)
                                      ^

scala> List((1,2,3),(5,9.4,4)).filter(_._2>1)
<console>:24: error: value > is not a member of AnyVal
       List((1,2,3),(5,9.4,4)).filter(_._2>1)
                                          ^

scala> List((1,2,3),(5,4,4)).filter(_._2>1)
res9: List[(Int, Int, Int)] = List((1,2,3), (5,4,4))

scala> List((1,2,3),(5,4,4)).filter(_._2>3)
res10: List[(Int, Int, Int)] = List((5,4,4))

scala> List(1,2).filter(_._2>3)
<console>:24: error: value _2 is not a member of Int
       List(1,2).filter(_._2>3)
                          ^

scala> List(1,2).filter(_._1>3)
<console>:24: error: value _1 is not a member of Int
       List(1,2).filter(_._1>3)
                          ^

scala> List(1,2).filter(_>3)
res13: List[Int] = List()

scala> List(1,2).filter(_>1)
res14: List[Int] = List(2)

scala> List((1,2,3),(5,4,4)).filter(_.>(1,2,3))
<console>:24: error: value > is not a member of (Int, Int, Int)
       List((1,2,3),(5,4,4)).filter(_.>(1,2,3))
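The `_` in the calls above is shorthand for a one-parameter function literal. The sketch below writes the same predicate three ways; the object name `PlaceholderDemo` and the value names are made up for illustration. It also shows one possible way to compare whole tuples, which the last attempt could not do because tuples have no `>` method of their own: importing `scala.math.Ordering.Implicits._` adds the comparison operators for any type that has an `Ordering`, and tuples of Ints are then compared lexicographically.

```scala
object PlaceholderDemo {
  val triples = List((1, 2, 3), (5, 4, 4))

  // Placeholder form, as used in the REPL session.
  val viaPlaceholder = triples.filter(_._2 > 3)

  // The same predicate with a named parameter.
  val viaNamedParam = triples.filter(t => t._2 > 3)

  // The same predicate with a pattern match on the tuple.
  val viaPattern = triples.filter { case (_, second, _) => second > 3 }

  // Whole-tuple comparison; the import is scoped to this block only.
  val greaterThanFirst = {
    import scala.math.Ordering.Implicits._
    triples.filter(t => t > ((1, 2, 3)))
  }

  def main(args: Array[String]): Unit = {
    println(viaPlaceholder)
    println(viaNamedParam)
    println(viaPattern)
    println(greaterThanFirst)
  }
}
```

All four values should contain only the tuple (5,4,4).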

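_ has another use: when two _ appear in one expression, each one stands for a separate parameter. In reduceLeft(_ + _) the first _ is the result so far and the second _ is the next element.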

scala> val list = List(1,2,3,4,5)
list: List[Int] = List(1, 2, 3, 4, 5)

scala> list.reduceLeft(_ + _)     // 1+2=3, 3+3=6, 6+4=10, 10+5=15
res16: Int = 15

scala> list.reduceLeft(_ - _)     // 1-2=-1, -1-3=-4, -4-4=-8, -8-5=-13
res17: Int = -13

scala> list.reduceLeft(_ + _ + _)
<console>:26: error: missing parameter type for expanded function ((x$1, x$2, x$3) => x$1.$plus(x$2).$plus(x$3))
       list.reduceLeft(_ + _ + _)
                       ^
<console>:26: error: missing parameter type for expanded function ((x$1: <error>, x$2, x$3) => x$1.$plus(x$2).$plus(x$3))
       list.reduceLeft(_ + _ + _)
                           ^
<console>:26: error: missing parameter type for expanded function ((x$1: <error>, x$2: <error>, x$3) => x$1.$plus(x$2).$plus(x$3))
       list.reduceLeft(_ + _ + _)
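The last call fails because three placeholders expand to a three-parameter function, while reduceLeft expects exactly two parameters. A small sketch of the explicit two-parameter form follows, with `scanLeft` used to list the intermediate results; the object name `ReduceLeftDemo` and the value names are illustrative only.

```scala
object ReduceLeftDemo {
  val list = List(1, 2, 3, 4, 5)

  // `_ + _` written out: acc is the running result, x is the next element.
  val sum = list.reduceLeft((acc, x) => acc + x)
  val difference = list.reduceLeft((acc, x) => acc - x)

  // Running results: start from the head and combine with each tail element.
  val runningSums = list.tail.scanLeft(list.head)(_ + _)
  val runningDifferences = list.tail.scanLeft(list.head)(_ - _)

  def main(args: Array[String]): Unit = {
    println(sum)                // 15
    println(difference)         // -13
    println(runningSums)        // List(1, 3, 6, 10, 15)
    println(runningDifferences) // List(1, -1, -4, -8, -13)
  }
}
```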

2: The reduce operation

scala> List((1,2),(5,4),(7,8)).reduce((a,b) => (a._1+b._1,a._2+b._2))
res25: (Int, Int) = (13,14)

scala>  List((1,2),(5,4)).reduce((a,b) => (a._1+b._1,a._2+b._2))
res26: (Int, Int) = (6,6)

 

scala> List((1,2,4),(5,4,3)).reduce((a,b) => (a._1+b._1,a._2+b._2))
<console>:24: error: type mismatch;
 found   : (Int, Int)
 required: (Int, Int, Int)
       List((1,2,4),(5,4,3)).reduce((a,b) => (a._1+b._1,a._2+b._2))

 

scala> List((1,2,4),(5,4,3)).reduce((a,b,c) => (a._1+b._1,a._2+b._2))
<console>:24: error: missing parameter type
       List((1,2,4),(5,4,3)).reduce((a,b,c) => (a._1+b._1,a._2+b._2))
                                     ^
<console>:24: error: missing parameter type
       List((1,2,4),(5,4,3)).reduce((a,b,c) => (a._1+b._1,a._2+b._2))
                                       ^
<console>:24: error: missing parameter type
       List((1,2,4),(5,4,3)).reduce((a,b,c) => (a._1+b._1,a._2+b._2))
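Both failures above come from the shape of the function passed to reduce: it has to take exactly two elements and return a value of the element type. The sketch below shows one version that should compile for 3-tuples, plus an alternative that drops the third field before reducing; the object name `TupleReduceDemo` and the value names are hypothetical.

```scala
object TupleReduceDemo {
  val triples = List((1, 2, 4), (5, 4, 3))

  // Two parameters in, one 3-tuple out: the result type matches the element type.
  val summed = triples.reduce((a, b) => (a._1 + b._1, a._2 + b._2, a._3 + b._3))

  // If only the first two fields matter, project to pairs first, then reduce.
  val summedPairs = triples
    .map(t => (t._1, t._2))
    .reduce((a, b) => (a._1 + b._1, a._2 + b._2))

  def main(args: Array[String]): Unit = {
    println(summed)      // (6,6,7)
    println(summedPairs) // (6,6)
  }
}
```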

3: For reduceByKey and groupByKey, see this article

https://blog.csdn.net/qq_27717921/article/details/79603881
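As a rough illustration of what the two operations compute, here is a sketch on plain Scala collections. It does not use Spark: `groupBy` on a local List only imitates the results, and the object name `ByKeyDemo` and the sample data are assumptions made for this example. On a real RDD the calls would be `groupByKey()` and `reduceByKey(_ + _)`, and reduceByKey can combine values within each partition before the shuffle.

```scala
object ByKeyDemo {
  // Hypothetical sample data: (key, value) pairs.
  val pairs = List(("a", 1), ("b", 2), ("a", 3), ("b", 4), ("c", 5))

  // groupByKey analogue: collect all values that share a key.
  val grouped: Map[String, List[Int]] =
    pairs.groupBy(_._1).map { case (key, kvs) => key -> kvs.map(_._2) }

  // reduceByKey analogue: combine the values of each key with a binary function.
  val reduced: Map[String, Int] =
    grouped.map { case (key, values) => key -> values.reduce(_ + _) }

  def main(args: Array[String]): Unit = {
    println(grouped)
    println(reduced)
  }
}
```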

4: flatMap pitfalls

flatMap works not only on an RDD but also on an Array. This is quite useful after a reduceByKey on RDD data when the reduced value for each key is an Array.

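The example below covers one requirement: getting the _1 and _2 fields of every element (a tuple) in the Array. Returning the tuple directly from flatMap causes an error; it only works when the tuple is produced inside a nested map.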

flatMap operations:
scala> val tmp = Array(("dd","hgg",0.5), ("ddsdd","hgeweg",0.5), ("dfd","hgweg",0.55), ("dwd","ahgg",0.5))
tmp: Array[(String, String, Double)] = Array((dd,hgg,0.5), (ddsdd,hgeweg,0.5), (dfd,hgweg,0.55), (dwd,ahgg,0.5))

scala> tmp.flatMap(f => (f._1,f._2))
<console>:32: error: type mismatch;
 found   : (String, String)
 required: scala.collection.GenTraversableOnce[?]
              tmp.flatMap(f => (f._1,f._2))
                               ^

scala> tmp.flatMap(f => (f._1))
res52: Array[Char] = Array(d, d, d, d, s, d, d, d, f, d, d, w, d)
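The two results above follow from what flatMap expects: the function must return a collection, which is then flattened. A tuple is not a collection, hence the type mismatch, while a String is treated as a sequence of Char, hence the Array[Char]. A minimal sketch of two ways to get at the first two fields, assuming the same `tmp` data; the object name `FlatMapDemo` and the value names are illustrative.

```scala
object FlatMapDemo {
  val tmp = Array(("dd", "hgg", 0.5), ("ddsdd", "hgeweg", 0.5), ("dfd", "hgweg", 0.55), ("dwd", "ahgg", 0.5))

  // Wrap the fields in a Seq so flatMap receives a collection to flatten.
  val fields: Array[String] = tmp.flatMap(f => Seq(f._1, f._2))

  // To keep each (_1, _2) pair together, map is enough; nothing needs flattening.
  val pairs: Array[(String, String)] = tmp.map(f => (f._1, f._2))

  def main(args: Array[String]): Unit = {
    println(fields.mkString(", "))
    pairs.foreach { case (first, second) => println(s"$first $second") }
  }
}
```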

scala> val tmp2 = Array(("keyword","title","desc"))
tmp2: Array[(String, String, String)] = Array((keyword,title,desc))

scala> tmp.flatMap(f =>tmp2.map(y => (f._1,f._2,y._1)))
res54: Array[(String, String, String)] = Array((dd,hgg,keyword), (ddsdd,hgeweg,keyword), (dfd,hgweg,keyword), (dwd,ahgg,keyword))
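The nested flatMap/map call above can also be written as a for-comprehension, which the compiler rewrites into the same flatMap and map calls. A sketch, assuming the same `tmp` and `tmp2` data; the object name `CrossDemo` and the value names are made up for illustration.

```scala
object CrossDemo {
  val tmp = Array(("dd", "hgg", 0.5), ("ddsdd", "hgeweg", 0.5), ("dfd", "hgweg", 0.55), ("dwd", "ahgg", 0.5))
  val tmp2 = Array(("keyword", "title", "desc"))

  // Explicit form, as in the REPL session.
  val nested = tmp.flatMap(f => tmp2.map(y => (f._1, f._2, y._1)))

  // For-comprehension form: one generator per collection.
  val viaFor = for (f <- tmp; y <- tmp2) yield (f._1, f._2, y._1)

  def main(args: Array[String]): Unit = {
    println(nested.mkString(", "))
    println(viaFor.mkString(", "))
  }
}
```

Both values should hold the same four triples, each ending in "keyword".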

 

 
