Scala Parallel Collections

Scala's standard library already ships with parallel collections (but they are still bottlenecked by a single machine ====> hence distributed computing with Spark).

Summing with `reduce` uses `reduceLeft` (folding from the left).
Drawbacks: runtime grows with the number of elements,
and the folding runs on a single thread.
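As a quick check, `reduce` on a sequential `List` behaves exactly like `reduceLeft`. A minimal sketch (object and variable names are illustrative):

```scala
// reduce on a sequential collection folds left-to-right on one thread,
// just like reduceLeft.
object ReduceLeftDemo extends App {
  val lst0 = List(1, 2, 8, 5, 6, 3, 0, 7, 9)
  val viaReduce     = lst0.reduce(_ + _)      // 41
  val viaReduceLeft = lst0.reduceLeft(_ + _)  // 41
  assert(viaReduce == viaReduceLeft)
}
```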

Optimization: use parallel collections (`scala.collection.parallel`) via `.par`.

scala> val lst0 = List(1,2,8,5,6,3,0,7,9)
lst0: List[Int] = List(1, 2, 8, 5, 6, 3, 0, 7, 9)

scala> lst0.reduce(_+_)
res17: Int = 41

scala> lst0.par
res18: scala.collection.parallel.immutable.ParSeq[Int] = ParVector(1, 2, 8, 5, 6, 3, 0, 7, 9)

scala> lst0.par.reduce(_+_)
res19: Int = 41

With `.par`, the call is no longer `reduceLeft`; the work is submitted to a thread pool and computed by multiple threads.
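A sketch of that behavior: `par.reduce` splits the collection into chunks, reduces each chunk on a thread-pool worker, and then combines the partial results. For an associative operation such as `+` the answer matches the sequential reduce (this assumes Scala 2.12.x, where `.par` is part of the standard library; in 2.13+ it lives in the separate scala-parallel-collections module):

```scala
// par.reduce: chunked, multi-threaded reduction.
// For an associative op like + the result equals the sequential one;
// for a non-associative op (e.g. subtraction) it may differ run to run.
object ParReduceDemo extends App {
  val lst0 = List(1, 2, 8, 5, 6, 3, 0, 7, 9)
  assert(lst0.par.reduce(_ + _) == lst0.reduce(_ + _))  // both 41
}
```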


With a parallel collection, the initial value (100 here) is applied once per chunk, i.e. once in each worker thread.
On a multi-core machine the work is split differently from run to run, so the result is not necessarily the same each time.

scala> lst0.par.fold(0)(_+_)
res20: Int = 41

scala> lst0.par.fold(100)(_+_)
res21: Int = 741

scala> lst0.par.fold(100)(_+_)
res22: Int = 441

scala> lst0.par.fold(100)(_+_)
res23: Int = 841

scala> lst0.par.fold(100)(_+_)
res24: Int = 841

scala> lst0.par.fold(100)(_+_)
res25: Int = 841
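The runs above can be summarized in a small sketch. Again this assumes Scala 2.12.x, where `.par` is built in; the bounds below follow from the zero being applied once per chunk:

```scala
// Why fold(100) is unsafe on a parallel collection:
// sequential fold applies the zero exactly once, while parallel fold
// applies it once per chunk, so the result is 41 + 100 * k for some
// 1 <= k <= lst0.size.
object FoldDemo extends App {
  val lst0 = List(1, 2, 8, 5, 6, 3, 0, 7, 9)

  val seqResult = lst0.fold(100)(_ + _)    // always 141: zero applied once
  assert(seqResult == 141)

  val parResult = lst0.par.fold(100)(_ + _)
  assert(parResult >= 141 && parResult <= 100 * lst0.size + 41)
  assert((parResult - 41) % 100 == 0)      // 41 plus a multiple of 100

  // Safe alternative: use the identity element as the zero,
  // then add the offset exactly once.
  assert(lst0.par.fold(0)(_ + _) + 100 == 141)
}
```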
scala> val arr = List(List(1,2,3), List(3,4,5), List(2), List(0))
arr: List[List[Int]] = List(List(1, 2, 3), List(3, 4, 5), List(2), List(0))

scala> arr.aggregate(100)(_+_.sum,_+_)
res27: Int = 120

scala> arr.aggregate(100)(_+_.sum,_+_)
res28: Int = 120

scala> arr.aggregate(100)(_+_.sum,_+_)
res29: Int = 120
// with .par the result is at most 420 (4 chunks × 100 + 20)
scala> arr.par.aggregate(100)(_+_.sum,_+_)
res30: Int = 420
scala> arr.par.aggregate(100)(_+_.sum,_+_)
res31: Int = 420
scala> arr.par.aggregate(100)(_+_.sum,_+_)
res45: Int = 420

scala> arr.par.aggregate(100)(_+_.sum,_+_)
res46: Int = 320

scala> arr.par.aggregate(100)(_+_.sum,_+_)
res47: Int = 420
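The transcript above can be condensed into a sketch. `aggregate(z)(seqop, combop)` folds the elements of each chunk with `seqop` starting from `z`, then merges the per-chunk partial results with `combop`; only `seqop` sees the zero (assumes Scala 2.12.x, where `aggregate` and `.par` are available as shown):

```scala
// aggregate(z)(seqop, combop):
//   seqop  folds a chunk's elements:   (acc, xs) => acc + xs.sum
//   combop merges chunk results:       _ + _
object AggregateDemo extends App {
  val arr = List(List(1, 2, 3), List(3, 4, 5), List(2), List(0))

  // Sequential: the zero 100 is used exactly once.
  val seqResult = arr.aggregate(100)(_ + _.sum, _ + _)
  assert(seqResult == 120)  // 100 + 6 + 12 + 2 + 0

  // Parallel: the zero is used once per chunk, so the result is
  // 100 * k + 20 for some 1 <= k <= 4, i.e. at most 420.
  val parResult = arr.par.aggregate(100)(_ + _.sum, _ + _)
  assert((parResult - 20) % 100 == 0)
  assert(parResult >= 120 && parResult <= 420)
}
```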

