As we know, Groovy's collection operation collect runs serially. See org.codehaus.groovy.runtime.DefaultGroovyMethods in the Groovy (1.8.6) source code:
    /**
     * Iterates through this aggregate Object transforming each item into a new value using the
     * <code>transform</code> closure, returning a list of transformed values.
     * Example:
     * <pre class="groovyTestCase">def list = [1, 'a', 1.23, true ]
     * def types = list.collect { it.class }
     * assert types == [Integer, String, BigDecimal, Boolean]</pre>
     *
     * @param self      an aggregate Object with an Iterator returning its items
     * @param transform the closure used to transform each item of the aggregate object
     * @return a List of the transformed values
     * @since 1.0
     */
    public static <T> List<T> collect(Object self, Closure<T> transform) {
        return (List<T>) collect(self, new ArrayList<T>(), transform);
    }
collect ultimately relies on a Java Iterator:
    /**
     * Iterates through this aggregate Object transforming each item into a new value using the <code>transform</code> closure
     * and adding it to the supplied <code>collector</code>.
     *
     * @param self      an aggregate Object with an Iterator returning its items
     * @param collector the Collection to which the transformed values are added
     * @param transform the closure used to transform each item of the aggregate object
     * @return the collector with all transformed values added to it
     * @since 1.0
     */
    public static <T> Collection<T> collect(Object self, Collection<T> collector, Closure<? extends T> transform) {
        for (Iterator iter = InvokerHelper.asIterator(self); iter.hasNext(); ) {
            collector.add(transform.call(iter.next()));
        }
        return collector;
    }
Nothing special happens here: the loop calls transform.call() on one element at a time, so execution is naturally serial.
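The serial behavior is easy to observe from a script. The following is a small sketch (the variable names are only illustrative) that records the thread and the visiting order of each transform call:

```groovy
// Record the name of the thread that runs each transform call.
def threadNames = [1, 2, 3, 4].collect { Thread.currentThread().name }

// Record the order in which the elements are visited.
def order = []
def doubled = [1, 2, 3].collect { order << it; it * 2 }

// All calls ran on the same (calling) thread...
assert threadNames.toSet().size() == 1
// ...and strictly in iteration order.
assert order == [1, 2, 3]
assert doubled == [2, 4, 6]
```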
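How can we add parallel processing to Collection? One approach rests on a simple idea: wrap the original closure in tasks that run on separate threads, and treat the whole iteration as finished only after every task has completed.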
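    import java.util.concurrent.*

    class ParallelFeature {
        static POOL_SIZE = 10

        // Convenience overload with a default timeout of 60 seconds.
        static def collectParallel(collections, block) {
            return collectParallel(collections, 60, block)
        }

        // Returns the transformed values in order, or null if the timeout expires first.
        static def collectParallel(collections, timeout, block) {
            def exec = Executors.newFixedThreadPool(POOL_SIZE)
            def latch = new CountDownLatch(collections.size())
            try {
                // Submit one task per element and keep the futures in iteration order.
                def futures = collections.collect { item ->
                    exec.submit(new Callable() {
                        def call() {
                            try {
                                return block(item)
                            } finally {
                                // Count down even when block throws, so await() does not wait for the full timeout.
                                latch.countDown()
                            }
                        }
                    })
                }
                return latch.await(timeout, TimeUnit.SECONDS) ? futures.collect { it.get() } : null
            } finally {
                // Release the pool threads once the results are in (or the timeout has expired).
                exec.shutdownNow()
            }
        }
    }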
For simplicity, this code does little in the way of exception handling.
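As one possible direction for handling failures and timeouts more explicitly, here is a minimal sketch built on ExecutorService.invokeAll instead of a CountDownLatch. It is not the method from this post: the name collectParallelSafe, its parameters, and the fixed pool size of 10 are assumptions made for illustration.

```groovy
import java.util.concurrent.*

// Hypothetical variant of collectParallel; the name and parameters are illustrative only.
def collectParallelSafe(Collection items, long timeoutSeconds, Closure block) {
    def exec = Executors.newFixedThreadPool(10)
    try {
        // Coerce one closure per element into a Callable.
        def tasks = items.collect { item -> ({ block(item) } as Callable) }
        // invokeAll blocks until all tasks finish or the timeout expires;
        // tasks that did not finish in time are returned as cancelled futures.
        def futures = exec.invokeAll(tasks, timeoutSeconds, TimeUnit.SECONDS)
        if (futures.any { it.cancelled }) {
            return null
        }
        // get() rethrows a task failure wrapped in an ExecutionException.
        return futures.collect { it.get() }
    } finally {
        // Always release the pool threads.
        exec.shutdownNow()
    }
}

assert collectParallelSafe([1, 2, 3], 5) { it * 2 } == [2, 4, 6]
```

With this shape, a failure inside block surfaces to the caller as an ExecutionException whose cause is the original exception, and a timeout yields null, as in the version above.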
In addition, to make the method convenient to call, it has to be injected through Groovy's metaClass before first use.
    java.util.Collection.metaClass.collectParallel = { block ->
        ParallelFeature.collectParallel(delegate, block)
    }
    java.util.Collection.metaClass.collectParallel = { timeout, block ->
        ParallelFeature.collectParallel(delegate, timeout, block)
    }
After that, the collect calls in existing code can be replaced directly:
    files.collectParallel { file -> download(file) }
Simple, isn't it?