Adding Parallel Processing to Groovy Collections

As we know, the collection method collect in Groovy runs serially. See org.codehaus.groovy.runtime.DefaultGroovyMethods in the Groovy 1.8.6 source code:

    /**
     * Iterates through this aggregate Object transforming each item into a new value using the
     * <code>transform</code> closure, returning a list of transformed values.
     * Example:
     * <pre class="groovyTestCase">def list = [1, 'a', 1.23, true ]
     * def types = list.collect { it.class }
     * assert types == [Integer, String, BigDecimal, Boolean]</pre>
     *
     * @param self      an aggregate Object with an Iterator returning its items
     * @param transform the closure used to transform each item of the aggregate object
     * @return a List of the transformed values
     * @since 1.0
     */
    public static <T> List<T> collect(Object self, Closure<T> transform) {
        return (List<T>) collect(self, new ArrayList<T>(), transform);
    }

 
collect ultimately iterates with a plain Java Iterator:

    /**
     * Iterates through this aggregate Object transforming each item into a new value using the <code>transform</code> closure
     * and adding it to the supplied <code>collector</code>.
     *
     * @param self      an aggregate Object with an Iterator returning its items
     * @param collector the Collection to which the transformed values are added
     * @param transform the closure used to transform each item of the aggregate object
     * @return the collector with all transformed values added to it
     * @since 1.0
     */
    public static <T> Collection<T> collect(Object self, Collection<T> collector, Closure<? extends T> transform) {
        for (Iterator iter = InvokerHelper.asIterator(self); iter.hasNext(); ) {
            collector.add(transform.call(iter.next()));
        }
        return collector;
    }

There is nothing special here: transform.call() is simply invoked serially, one element at a time, on the calling thread.
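To make "serial" concrete, here is a minimal Java analog of the collect loop above. The class name SerialCollect is hypothetical, and java.util.function.Function stands in for Groovy's Closure (List.of assumes Java 9+). It records the name of the thread that runs each transform: every call runs on the caller's thread, one after another.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

public class SerialCollect {
    // Hypothetical Java analog of DefaultGroovyMethods.collect(Object, Collection, Closure):
    // each transform runs on the caller's thread, one element after another.
    public static <T, R> List<R> collect(Iterable<T> self, Function<? super T, ? extends R> transform) {
        List<R> collector = new ArrayList<>();
        for (Iterator<T> iter = self.iterator(); iter.hasNext(); ) {
            collector.add(transform.apply(iter.next()));
        }
        return collector;
    }

    public static void main(String[] args) {
        List<String> threads = new ArrayList<>();
        List<Integer> doubled = collect(List.of(1, 2, 3), x -> {
            threads.add(Thread.currentThread().getName()); // record which thread ran this call
            return x * 2;
        });
        System.out.println(doubled); // [2, 4, 6]
        System.out.println(threads); // [main, main, main]
    }
}
```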

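How can we add parallel processing to a Collection? One approach is simple in principle: wrap each call of the original closure in a task that runs on its own thread (here, a thread from a pool), and treat the whole iteration as complete only after all of those tasks have finished.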
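The idea can be sketched in plain Java first. ParallelCollect is a hypothetical helper, not part of Groovy or the JDK; it assumes Java 9+ for List.of and uses java.util.function.Function in place of a closure. Each element becomes a Callable on a fixed thread pool, a CountDownLatch signals when every task has finished, and the list of Futures preserves the input order. As in the Groovy code of this post, a timeout yields null.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

public class ParallelCollect {
    static final int POOL_SIZE = 10; // assumed pool size, matching the Groovy code in this post

    /** Returns the transformed values in input order, or null if not all tasks finish in time. */
    public static <T, R> List<R> collectParallel(List<T> items, long timeoutSeconds,
                                                 Function<? super T, ? extends R> transform)
            throws InterruptedException, ExecutionException {
        ExecutorService exec = Executors.newFixedThreadPool(POOL_SIZE);
        try {
            CountDownLatch latch = new CountDownLatch(items.size());
            List<Future<R>> futures = new ArrayList<>();
            for (T item : items) {
                Callable<R> task = () -> {
                    try {
                        return transform.apply(item); // the wrapped "closure"
                    } finally {
                        latch.countDown();            // signal completion, even on failure
                    }
                };
                futures.add(exec.submit(task));
            }
            if (!latch.await(timeoutSeconds, TimeUnit.SECONDS)) {
                return null;                          // timed out
            }
            List<R> results = new ArrayList<>();
            for (Future<R> f : futures) {
                results.add(f.get());                 // futures are in submission order
            }
            return results;
        } finally {
            exec.shutdownNow();                       // stop the worker threads
        }
    }

    public static void main(String[] args) throws Exception {
        List<Integer> squares = collectParallel(List.of(1, 2, 3, 4), 5, x -> x * x);
        System.out.println(squares); // [1, 4, 9, 16]
    }
}
```

Counting down in a finally block means a failing task still releases the latch. As an alternative design, ExecutorService.invokeAll(tasks, timeout, unit) combines submitting and waiting, and cancels any task that has not finished when the timeout expires.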

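    import java.util.concurrent.*

    class ParallelFeature {
        static POOL_SIZE = 10

        // Overload with a default timeout of 60 seconds
        static def collectParallel(collections, block) {
            return collectParallel(collections, 60, block)
        }

        static def collectParallel(collections, timeout, block) {
            def exec = Executors.newFixedThreadPool(POOL_SIZE)
            def latch = new CountDownLatch(collections.size())
            try {
                // One task per element; the list of Futures keeps the original order
                def futures = collections.collect { item ->
                    exec.submit(new Callable() {
                        def call() {
                            try {
                                return block(item)
                            } finally {
                                latch.countDown()  // count down even if block throws
                            }
                        }
                    })
                }
                // Wait for all tasks; on timeout the result is null
                return latch.await(timeout, TimeUnit.SECONDS) ? futures.collect { it.get() } : null
            } finally {
                exec.shutdownNow()  // stop the worker threads (interrupts tasks still running after a timeout)
            }
        }
    }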

For simplicity, the code does only minimal exception handling.
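What minimal exception handling means in practice can be seen in a small Java sketch (FailureDemo and its failure message are hypothetical): an exception thrown inside a task does not reach the caller directly. It is stored in the Future, and get() rethrows it wrapped in an ExecutionException. Counting down in a finally block keeps a failed task from leaving the latch waiting until the timeout.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class FailureDemo {
    /** Runs one failing task and returns the message of the exception seen by the caller. */
    public static String runFailingTask() throws InterruptedException {
        ExecutorService exec = Executors.newFixedThreadPool(1);
        CountDownLatch latch = new CountDownLatch(1);
        try {
            Callable<Integer> task = () -> {
                try {
                    throw new IllegalStateException("download failed"); // hypothetical failure
                } finally {
                    latch.countDown(); // without this, await() below would wait for the full timeout
                }
            };
            Future<Integer> future = exec.submit(task);
            latch.await(5, TimeUnit.SECONDS);
            try {
                future.get();
                return "no exception";
            } catch (ExecutionException e) {
                // The task's exception was stored in the Future and rethrown, wrapped, by get()
                return e.getCause().getMessage();
            }
        } finally {
            exec.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runFailingTask()); // download failed
    }
}
```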
In addition, to make the method convenient to use, we inject it into Collection through Groovy's metaClass before use:

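    // Methods added to an interface such as Collection take effect only after
    // ExpandoMetaClass has been enabled globally
    ExpandoMetaClass.enableGlobally()

    java.util.Collection.metaClass.collectParallel = { block ->
        ParallelFeature.collectParallel(delegate, block)
    }

    java.util.Collection.metaClass.collectParallel = { timeout, block ->
        ParallelFeature.collectParallel(delegate, timeout, block)
    }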

After that, collectParallel can directly replace collect in existing code:

    files.collectParallel { file ->
        download(file)
    }

Simple, isn't it?
