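In the previous section we discussed the basic design of a parallel computation component library and implemented its most basic capability: creating a new thread and submitting a task to it for asynchronous execution. The basic form of the parallel computation type is as follows: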
    import java.util.concurrent._

    object Par {
      type Par[A] = ExecutorService => Future[A]

      def run[A](es: ExecutorService)(pa: Par[A]): Future[A] = pa(es)
                                  //> run: [A](es: java.util.concurrent.ExecutorService)(pa: ch71.Par.Par[A])java.
                                  //| util.concurrent.Future[A]
      def unit[A](a: A): Par[A] = {
        es => new Future[A] {
          def get = a
          def get(t: Long, u: TimeUnit) = get
          def isDone = true
          def isCancelled = false
          def cancel(evenIsRunning: Boolean) = false
        }
      }                           //> unit: [A](a: A)ch71.Par.Par[A]
      def fork[A](pa: Par[A]): Par[A] = {    // note: is there a bug here?
        es => es.submit(new Callable[A] {
          def call: A = run(es)(pa).get
        })
      }
      def async[A](a: => A): Par[A] = fork(unit(a))
    }
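1. unit[A](a: A): Par[A]: we build a Future instance by hand to match the shape of the Par type, so that the result can be read with Future.get. Consider unit(42+1): because the argument is evaluated eagerly, the result 43 has already been computed before unit is entered. We then simply hand that result to Future.get, so it can be read in the same way as a real Future returned by an ExecutorService. In other words, unit is purely a lifting function that changes the representation; it does nothing else.
2. async[A](a: => A): Par[A]: async submits the expression a to a thread other than the main thread. The new thread is supplied by the ExecutorService, so we need not manage it ourselves; this keeps thread management loosely coupled from the parallel computation library. Because async takes its argument by name (lazily evaluated), the expression a can be handed over unevaluated and computed on the other thread.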
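The difference between the eager parameter of unit and the by-name parameter of async can be seen without any threads at all. The following is a minimal sketch; the names EvalOrderSketch, strict, byName and log are hypothetical and exist only for this illustration. It records the order in which things happen for a strict parameter and for a by-name parameter.

```scala
object EvalOrderSketch {
  // Records events in order so the evaluation point becomes visible.
  val log = scala.collection.mutable.ListBuffer.empty[String]

  // Strict parameter (like unit): the argument is evaluated before the body runs.
  def strict[A](a: A): () => A = { log += "strict: entered"; () => a }

  // By-name parameter (like async): the argument is evaluated only when `a` is referenced.
  def byName[A](a: => A): () => A = { log += "byName: entered"; () => a }

  val s = strict { log += "strict: argument evaluated"; 42 + 1 }
  val n = byName { log += "byName: argument evaluated"; 42 + 1 }

  val sv = s() // the value was already computed; nothing is logged here
  val nv = n() // the argument expression is evaluated now
}
```

With the strict parameter the argument is logged before the function is entered; with the by-name parameter it is logged only when the returned thunk is invoked, which is what allows the expression to be evaluated somewhere else later.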
Let's demonstrate with an example:
    val es = Executors.newCachedThreadPool()  // threads are supplied by the JVM; we need not manage them
                                  //> es  : java.util.concurrent.ExecutorService = java.util.concurrent.ThreadPool
                                  //| Executor@19dfb72a[Running, pool size = 0, active threads = 0, queued tasks =
                                  //|  0, completed tasks = 0]
    val a = unit({println(Thread.currentThread.getName); 42+1})
                                  //> main
                                  //| a  : ch71.Par.Par[Int] = <function1>
    val b = async({println(Thread.currentThread.getName); 42+1})
                                  //> main
                                  //| b  : ch71.Par.Par[Int] = <function1>
    run(es)(a).get                //> res0: Int = 43
    run(es)(b).get                //> res1: Int = 43
    es.shutdown()
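    // fork with pa passed by name (pa: => Par[A]): pa is not evaluated until call runs inside the submitted task
    def fork[A](pa: => Par[A]): Par[A] = {
      es => es.submit(new Callable[A] {
        def call: A = run(es)(pa).get
      })
    }                             //> fork: [A](pa: ch71.Par.Par[A])ch71.Par.Par[A]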
    val es = Executors.newCachedThreadPool()  // threads are supplied by the JVM; we need not manage them
                                  //> es  : java.util.concurrent.ExecutorService = java.util.concurrent.ThreadPool
                                  //| Executor@19dfb72a[Running, pool size = 0, active threads = 0, queued tasks =
                                  //|  0, completed tasks = 0]
    val a = unit({println(Thread.currentThread.getName); 42+1})
                                  //> main
                                  //| a  : ch71.Par.Par[Int] = <function1>
    val b = async({println(Thread.currentThread.getName); 42+1})
                                  //> b  : ch71.Par.Par[Int] = <function1>
    run(es)(a).get                //> res0: Int = 43
    run(es)(b).get                //> pool-1-thread-1
                                  //| res1: Int = 43
    es.shutdown()
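To make the effect of a by-name fork checkable rather than something read off the console, here is a small self-contained sketch. The object name LazyForkSketch and the helper threadNameOfAsync are hypothetical; the Par, unit, fork and async definitions are condensed versions of the ones in this article. It returns the name of the calling thread together with the name of the thread that actually evaluated the expression passed to async.

```scala
import java.util.concurrent.{Callable, ExecutorService, Executors, Future, TimeUnit}

object LazyForkSketch {
  type Par[A] = ExecutorService => Future[A]

  // Wraps an already computed value in a completed Future.
  def unit[A](a: A): Par[A] = _ => new Future[A] {
    def get(): A = a
    def get(t: Long, u: TimeUnit): A = a
    def isDone(): Boolean = true
    def isCancelled(): Boolean = false
    def cancel(mayInterrupt: Boolean): Boolean = false
  }

  // pa is by-name, so it is first evaluated inside call(), on a pool thread.
  def fork[A](pa: => Par[A]): Par[A] =
    es => es.submit(new Callable[A] { def call(): A = pa(es).get })

  def async[A](a: => A): Par[A] = fork(unit(a))

  // Returns (name of the calling thread, name of the thread that evaluated the async expression).
  def threadNameOfAsync(): (String, String) = {
    val es = Executors.newCachedThreadPool()
    try {
      val caller = Thread.currentThread.getName
      val worker = async(Thread.currentThread.getName)(es).get
      (caller, worker)
    } finally es.shutdown()
  }
}
```

Calling LazyForkSketch.threadNameOfAsync() should give two different names, because the expression is evaluated by a thread from the pool and not by the caller. If fork took pa strictly, both names would be the same.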
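Asynchronous execution is only the first step toward parallel computation. As the name suggests, parallel computation means splitting a large task into several smaller ones, running them asynchronously at the same time, and then combining their results. The idea can be described in pseudocode: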
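    // pseudocode
    val big10sencondJob = ???                          // a 10-second computation
    val small5sJob1 = split big10sencondJob in half    // split into two 5-second computations
    val small5sJob2 = split big10sencondJob in half    // split into two 5-second computations
    val fa = run small5sJob1     // returns a future immediately, but the 5-second computation has started
    val fb = run small5sJob2     // returns a future immediately, but the 5-second computation has started
    val sum = fa.get + fb.get    // the result is available after waiting about 5 seconds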
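The pseudocode can be turned into something runnable directly on top of ExecutorService. The sketch below is only an illustration under assumptions: slowSum is a hypothetical stand-in for a slow job, and the 200 ms sleep replaces the 5-second computations so the example finishes quickly.

```scala
import java.util.concurrent.{Callable, Executors}

object SplitJobSketch {
  // Hypothetical slow job: sleeps for `millis`, then sums its slice of the data.
  def slowSum(xs: Seq[Int], millis: Long): Int = { Thread.sleep(millis); xs.sum }

  def parallelSum(xs: Vector[Int]): Int = {
    val es = Executors.newFixedThreadPool(2)
    try {
      // Split the big job into two halves.
      val (left, right) = xs.splitAt(xs.length / 2)
      // submit returns a Future immediately; both halves start running right away.
      val fa = es.submit(new Callable[Int] { def call(): Int = slowSum(left, 200) })
      val fb = es.submit(new Callable[Int] { def call(): Int = slowSum(right, 200) })
      // Block until both results are ready, then combine them.
      fa.get + fb.get
    } finally es.shutdown()
  }
}
```

Because both futures are created before either get is called, the two sleeps overlap, so the total wait should be close to that of one half and not the sum of both.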
Let's first start parallel computations in a functional style. Suppose we want to start two computations in parallel:
def map2[A,B,C](pa: Par[A], pb: Par[B])(f: (A,B) => C): Par[C]
    def map2[A,B,C](pa: Par[A], pb: Par[B])(f: (A,B) => C): Par[C] = {
      import TimeUnit.NANOSECONDS
      es => new Future[C] {
        val fa = run(es)(pa)   // pa's own definition decides which thread it runs on; a forked Par runs off the main thread
        val fb = run(es)(pb)
        def get = f(fa.get, fb.get)
        def get(timeOut: Long, timeUnit: TimeUnit) = {
          val start = System.nanoTime
          val a = fa.get(timeOut, timeUnit)   // the wait on fa is bounded by the timeout as well
          val end = System.nanoTime
          // fa.get used up some of the time; subtract it from the timeout left for fb.get
          val b = fb.get(timeOut - timeUnit.convert((end - start), NANOSECONDS), timeUnit)
          f(a,b)
        }
        def isDone = fa.isDone && fb.isDone
        def isCancelled = fa.isCancelled && fb.isCancelled
        def cancel(evenIsRunning: Boolean) = fa.cancel(evenIsRunning) || fb.cancel(evenIsRunning)
      }
    }                             //> map2: [A, B, C](pa: ch71.Par.Par[A], pb: ch71.Par.Par[B])(f: (A, B) => C)ch
                                  //| 71.Par.Par[C]
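The timeout arithmetic inside the timed get can be checked in isolation. The helper below is a hypothetical sketch (the name remaining and the clamp to zero are assumptions, not part of map2 above): it converts the elapsed nanoseconds into the caller's time unit with TimeUnit.convert and subtracts them from the original timeout.

```scala
import java.util.concurrent.TimeUnit

object TimeoutBudgetSketch {
  // Time left for the second wait after `elapsedNanos` were spent on the first one.
  // Clamping at zero is an assumption of this sketch.
  def remaining(timeOut: Long, unit: TimeUnit, elapsedNanos: Long): Long =
    math.max(0L, timeOut - unit.convert(elapsedNanos, TimeUnit.NANOSECONDS))
}
```

For example, with a 2000 ms timeout and 1.5 seconds already spent waiting for the first future, 500 ms remain for the second. Note that convert truncates, so with a coarse unit such as seconds a partial unit of elapsed time is not counted.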
Let's first try computing 41+2 and 33+4 at the same time:
    val es = Executors.newCachedThreadPool()  // threads are supplied by the JVM; we need not manage them
                                  //> es  : java.util.concurrent.ExecutorService = java.util.concurrent.ThreadPoo
                                  //| lExecutor@19dfb72a[Running, pool size = 0, active threads = 0, queued tasks
                                  //|  = 0, completed tasks = 0]
    map2(async({println(Thread.currentThread.getName); 41+2}),
         async({println(Thread.currentThread.getName); 33+4})) {
      (a,b) => {println(Thread.currentThread.getName); a+b}
    }(es).get
                                  //> pool-1-thread-1
                                  //| pool-1-thread-2
                                  //| main
                                  //| res0: Int = 80
    fork { map2(async({println(Thread.currentThread.getName); 41+2}),
                async({println(Thread.currentThread.getName); 33+4})) {
      (a,b) => {println(Thread.currentThread.getName); a+b}
    }}(es).get
                                  //> pool-1-thread-2
                                  //| pool-1-thread-3
                                  //| pool-1-thread-1
                                  //| res0: Int = 80
More than two parallel computations can be implemented on top of map2:
    def map3[A,B,C,D](pa: Par[A], pb: Par[B], pc: Par[C])(f: (A,B,C) => D): Par[D] = {
      map2(pa,map2(pb,pc){(b,c) => (b,c)}){(a,bc) => {
        val (b,c) = bc
        f(a,b,c)
      }}
    }
    def map4[A,B,C,D,E](pa: Par[A], pb: Par[B], pc: Par[C], pd: Par[D])(f: (A,B,C,D) => E): Par[E] = {
      map2(pa,map2(pb,map2(pc,pd){(c,d) => (c,d)}){(b,cd) => (b,cd)}){(a,bcd) => {
        val (b,(c,d)) = bcd
        f(a,b,c,d)
      }}
    }
    def map5[A,B,C,D,E,F](pa: Par[A], pb: Par[B], pc: Par[C], pd: Par[D], pe: Par[E])(f: (A,B,C,D,E) => F): Par[F] = {
      map2(pa,map2(pb,map2(pc,map2(pd,pe){(d,e) => (d,e)}){(c,de) => (c,de)}){(b,cde) => (b,cde)}){(a,bcde) => {
        val (b,(c,(d,e))) = bcde
        f(a,b,c,d,e)
      }}
    }
    // we can run pa, get the list, sort it, and wrap the result back into a Future[List[Int]]
    def sortPar(pa: Par[List[Int]]): Par[List[Int]] = {
      es => {
        val l = run(es)(pa).get
        new Future[List[Int]] {
          def get = l.sorted
          def isDone = true
          def isCancelled = false
          def get(t: Long, u: TimeUnit) = get
          def cancel(e: Boolean) = false
        }
      }
    }
    // this can also be implemented with map2, since map2 can both start parallel computations
    // and operate on the values inside a Par. Here the operation targets a single Par,
    // so we pass unit(()) in place of the second one.
    def sortedPar(pa: Par[List[Int]]): Par[List[Int]] = {
      map2(pa,unit(())){(a,_) => a.sorted}
    }
    // map transforms the value inside a single Par; it can likewise be implemented with map2
    def map[A,B](pa: Par[A])(f: A => B): Par[B] = {
      map2(pa,unit(())){(a,_) => f(a) }
    }
    // then use map to sort a Par[List[Int]]
    def sortParByMap(pa: Par[List[Int]]): Par[List[Int]] = {
      map(pa){_.sorted}
    }
    sortPar(async({println(Thread.currentThread.getName); List(4,1,2,3)}))(es).get
                                  //> pool-1-thread-1
                                  //| res3: List[Int] = List(1, 2, 3, 4)
    sortParByMap(async({println(Thread.currentThread.getName); List(4,1,2,3)}))(es).get
                                  //> pool-1-thread-1
                                  //| res4: List[Int] = List(1, 2, 3, 4)
    // start two parallel computations
    def product[A,B](pa: Par[A], pb: Par[B]): Par[(A,B)] = {
      es => {
        val fa = run(es)(pa)   // start both before blocking on either, so that they can overlap
        val fb = run(es)(pb)
        unit((fa.get, fb.get))(es)
      }
    }                             //> product: [A, B](pa: ch71.Par.Par[A], pb: ch71.Par.Par[B])ch71.Par.Par[(A, B
                                  //| )]
    // process the result of a computation (an alternative definition of map that does not use map2)
    def map[A,B](pa: Par[A])(f: A => B): Par[B] = {
      es => unit(f(run(es)(pa).get))(es)
    }                             //> map: [A, B](pa: ch71.Par.Par[A])(f: A => B)ch71.Par.Par[B]
    // then compose the two into map2
    def map2_pm[A,B,C](pa: Par[A], pb: Par[B])(f: (A,B) => C): Par[C] = {
      map(product(pa, pb)){a => f(a._1, a._2)}
    }                             //> map2_pm: [A, B, C](pa: ch71.Par.Par[A], pb: ch71.Par.Par[B])(f: (A, B) => C
                                  //| )ch71.Par.Par[C]
def asyncF[A,B](f: A => B): A => Par[B] = a => async(f(a)) //> asyncF: [A, B](f: A => B)A => ch71.Par.Par[B]
def parMap[A,B](as: List[A])(f: A => B): Par[List[B]]
    // recursive implementation
    def sequence_r[A](lp: List[Par[A]]): Par[List[A]] = {
      lp match {
        case Nil => unit(List())
        case h::t => map2(h,fork(sequence_r(t))){_ :: _}
      }
    }                             //> sequence_r: [A](lp: List[ch71.Par.Par[A]])ch71.Par.Par[List[A]]
    // with foldLeft: fold over the reversed list so that prepending keeps the original order
    def sequenceByFoldLeft[A](lp: List[Par[A]]): Par[List[A]] = {
      lp.reverse.foldLeft(unit[List[A]](Nil)){(t,h) => map2(h,t){_ :: _}}
    }                             //> sequenceByFoldLeft: [A](lp: List[ch71.Par.Par[A]])ch71.Par.Par[List[A]]
    // with foldRight
    def sequenceByFoldRight[A](lp: List[Par[A]]): Par[List[A]] = {
      lp.foldRight(unit[List[A]](Nil)){(h,t) => map2(h,t){_ :: _}}
    }                             //> sequenceByFoldRight: [A](lp: List[ch71.Par.Par[A]])ch71.Par.Par[List[A]]
    // split an IndexedSeq in half and combine the two halves
    def sequenceBalanced[A](as: IndexedSeq[Par[A]]): Par[IndexedSeq[A]] = {
      if (as.isEmpty) unit(Vector())
      else if (as.length == 1) map(as.head){a => Vector(a)}
      else {
        val (l,r) = as.splitAt(as.length / 2)
        map2(sequenceBalanced(l),sequenceBalanced(r)){_ ++ _}
      }
    }                             //> sequenceBalanced: [A](as: IndexedSeq[ch71.Par.Par[A]])ch71.Par.Par[IndexedS
                                  //| eq[A]]
    def sequence[A](lp: List[Par[A]]): Par[List[A]] = {
      map(sequenceBalanced(lp.toIndexedSeq)){_.toList}
    }
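When a list of Pars is folded with a prepend (_ :: _), the direction of the fold determines the order of the resulting list. The sketch below illustrates this under simplifying assumptions: the object SequenceOrderSketch and the names viaFoldRight, viaFoldLeft and demo are hypothetical, and map2 here is a reduced version without timeout or cancellation support, not the full implementation from this article.

```scala
import java.util.concurrent.{Callable, ExecutorService, Executors, Future, TimeUnit}

object SequenceOrderSketch {
  type Par[A] = ExecutorService => Future[A]

  def unit[A](a: A): Par[A] = _ => new Future[A] {
    def get(): A = a
    def get(t: Long, u: TimeUnit): A = a
    def isDone(): Boolean = true
    def isCancelled(): Boolean = false
    def cancel(mayInterrupt: Boolean): Boolean = false
  }

  def fork[A](pa: => Par[A]): Par[A] =
    es => es.submit(new Callable[A] { def call(): A = pa(es).get })

  // Simplified map2 (assumption): starts both computations, then blocks for both results.
  def map2[A, B, C](pa: Par[A], pb: Par[B])(f: (A, B) => C): Par[C] = es => {
    val fa = pa(es)
    val fb = pb(es)
    unit(f(fa.get, fb.get))(es)
  }

  // Folding from the right and prepending keeps the input order.
  def viaFoldRight[A](lp: List[Par[A]]): Par[List[A]] =
    lp.foldRight(unit(List.empty[A]))((h, t) => map2(h, t)(_ :: _))

  // Folding from the left and prepending reverses the input order.
  def viaFoldLeft[A](lp: List[Par[A]]): Par[List[A]] =
    lp.foldLeft(unit(List.empty[A]))((t, h) => map2(h, t)(_ :: _))

  def demo(): (List[Int], List[Int]) = {
    val es = Executors.newCachedThreadPool()
    try {
      val inputs = List(1, 2, 3, 4, 5).map(i => fork(unit(i * 10)))
      (viaFoldRight(inputs)(es).get, viaFoldLeft(inputs)(es).get)
    } finally es.shutdown()
  }
}
```

The right fold returns the elements in input order, while the plain left fold returns them reversed, so a left-fold version of sequence needs to reverse either its input or its result.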
    def parMap[A,B](as: List[A])(f: A => B): Par[List[B]] = fork {
      val lps = as.map{asyncF(f)}
      sequence(lps)
    }                             //> parMap: [A, B](as: List[A])(f: A => B)ch71.Par.Par[List[B]]
    fork(parMap(List(1,2,3,4,5)){ _ + 10 })(es).get
                                  //> pool-1-thread-1
                                  //| pool-1-thread-2
                                  //| pool-1-thread-3
                                  //| pool-1-thread-4
                                  //| pool-1-thread-5
                                  //| pool-1-thread-6
                                  //| pool-1-thread-8
                                  //| pool-1-thread-7
                                  //| pool-1-thread-9
                                  //| pool-1-thread-10
                                  //| pool-1-thread-14
                                  //| pool-1-thread-12
                                  //| pool-1-thread-15
                                  //| pool-1-thread-11
                                  //| pool-1-thread-13
                                  //| res3: List[Int] = List(11, 12, 13, 14, 15)