Parallel computing in Java with Fork/Join and in Python with pp

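      The multicore era has arrived, and software developers can no longer ignore parallel programming. The Fork/Join framework that will be added in JDK 7 is a classic way to handle parallel programming, Python has long had the Parallel Python (pp) module for multicore parallel computation, and languages built for concurrency such as Erlang have become very popular. Below we take one concrete example, computing the sum of all primes below each number in a given array, to get a feel for the characteristics and performance of parallel computing in Java and in Python: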

The Java Fork/Join implementation. It requires jsr166y, which can be downloaded from http://g.oswego.edu/dl/concurrency-interest/

import java.util.Date;
import java.util.concurrent.TimeUnit;

import jsr166y.ForkJoinPool;
import jsr166y.ForkJoinTask;
import jsr166y.RecursiveAction;

import org.junit.Test;

class Prime extends RecursiveAction{
 final long[] array;
 int len;
 
 public Prime(long[] array, int len){
  this.array = array;
  this.len =  len;

 }
    private boolean isPrime(long num){
     if (num <2) return false;
     if (num == 2) return true;
     long max = (long)Math.ceil(Math.sqrt(num));
     long i = 2;
     while (i <= max) {
      if (num % i == 0)
       return false;
      ++i;
     }
     return true;
    }
   
    private long sumPrime(long num){
      long sum = 0;
      for(int i = 2; i < num; i++){
       if(isPrime(i)){
        sum += i;
       }
      }
      return sum;
    }
   
    private boolean hasElement(long[] a,long b){
     for(int i = 0; i < a.length; i++){
      if (a[i]== b) return true;
     }
     return false;
     
    }
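 @Override
 protected void compute() {
  if (len < 0) return;
  // Fork the remaining elements (indices 0..len-1) as one subtask so that
  // another worker can steal it, sum the primes for the current element in
  // this thread, then join. Each element is processed exactly once.
  Prime rest = new Prime(array, len - 1);
  rest.fork();
  System.out.println("Sum of primes below " + array[len] + " is "+sumPrime(array[len]));
  rest.join();
 }
}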

public class TestForkJoinSimple {
 long[] array = {1000000, 1000100,1000200, 1000300, 1000400};
    @Test
    public void testSort() throws Exception {
        Date date1 = new Date();
        ForkJoinTask<Void> sort = new Prime(array, array.length - 1);
        ForkJoinPool fjpool = new ForkJoinPool();
        System.out.println("Starting prime with "+fjpool.getParallelism()+" workers");
        // invoke() blocks until the task and every subtask it forked have finished,
        // so the elapsed time reflects the computation rather than a fixed timeout.
        fjpool.invoke(sort);
        fjpool.shutdown();
        System.out.println(new Date().getTime()-date1.getTime());
    }
}

Output:

Starting prime with 2 workers
Sum of primes below 1000400 is 37582408783
Sum of primes below 1000300 is 37574405939
Sum of primes below 1000100 is 37556402315
Sum of primes below 1000200 is 37566403929
Sum of primes below 1000000 is 37550402023
4043
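
The fork-then-join shape of the Java task can also be sketched in Python 3 using only the standard library. This is a minimal sketch under stated assumptions, not the jsr166y API and not pp: `split`, `parallel_sums`, and the small inputs are hypothetical names and values chosen for illustration, and `ThreadPoolExecutor` stands in for a real fork/join pool. Because of CPython's global interpreter lock, threads show the structure here but will not speed up this CPU-bound work; a process pool would be needed for that.

```python
from concurrent.futures import ThreadPoolExecutor


def is_prime(n):
    """Trial division up to sqrt(n), in the spirit of isPrime above."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def sum_primes(n):
    """Sum of all primes strictly below n."""
    return sum(x for x in range(2, n) if is_prime(x))


def split(lo, hi, threshold=1):
    """Fork step: halve the index range [lo, hi) until chunks are small."""
    if hi - lo <= threshold:
        return [(lo, hi)]
    mid = (lo + hi) // 2
    return split(lo, mid, threshold) + split(mid, hi, threshold)


def parallel_sums(values, workers=2):
    """Run one task per leaf chunk, then join the results in input order."""
    def run_chunk(bounds):
        lo, hi = bounds
        return [sum_primes(v) for v in values[lo:hi]]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() submits every chunk up front and yields results in order,
        # so collecting them is the join step.
        chunks = pool.map(run_chunk, split(0, len(values)))
        return [s for chunk in chunks for s in chunk]


if __name__ == "__main__":
    inputs = [1000, 1100, 1200]  # hypothetical small inputs
    for n, s in zip(inputs, parallel_sums(inputs)):
        print("Sum of primes below", n, "is", s)
```

Splitting the index range first and submitting only the leaf chunks keeps tasks from blocking on their own children, which matters with a fixed-size thread pool that, unlike a fork/join pool, does no work stealing.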

 

The Python pp implementation; for reference see http://www.parallelpython.com/

import math, sys, time
import pp
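def isprime(n):
    """Return True if n is prime, using trial division up to sqrt(n)."""
    if not isinstance(n, int):
        raise TypeError("argument passed to isprime is not of 'int' type")
    if n < 2:
        return False
    if n == 2:
        return True
    # Named limit rather than max so the built-in max() is not shadowed.
    limit = int(math.ceil(math.sqrt(n)))
    i = 2
    while i <= limit:
        if n % i == 0:
            return False
        i += 1
    return True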
def sum_primes(n):
    return sum([x for x in xrange(2,n) if isprime(x)])
print """Usage: python sum_primes.py [ncpus]
    [ncpus] - the number of workers to run in parallel,
    if omitted it will be set to the number of processors in the system
"""

ppservers = ()
if len(sys.argv) > 1:
    ncpus = int(sys.argv[1])
    job_server = pp.Server(ncpus, ppservers=ppservers)
else:
    job_server = pp.Server(ppservers=ppservers)
print "Starting pp with", job_server.get_ncpus(), "workers"

job1 = job_server.submit(sum_primes, (100,), (isprime,), ("math",))
result = job1()
print "Sum of primes below 100 is", result
start_time = time.time()
inputs = (100000, 100100, 100200, 100300, 100400)
jobs = [(input, job_server.submit(sum_primes,(input,), (isprime,), ("math",))) for input in inputs]
for input, job in jobs:
    print "Sum of primes below", input, "is", job()
print "Time elapsed: ", time.time() - start_time, "s"
job_server.print_stats()

Output:

Usage: python sum_primes.py [ncpus]
    [ncpus] - the number of workers to run in parallel,
    if omitted it will be set to the number of processors in the system

Starting pp with 2 workers
Sum of primes below 100 is 1060
Sum of primes below 100000 is 454396537
Sum of primes below 100100 is 454996777
Sum of primes below 100200 is 455898156
Sum of primes below 100300 is 456700218
Sum of primes below 100400 is 457603451
Time elapsed:  1.36199998856 s
Job execution statistics:
 job count | % of all jobs | job time sum | time per job | job server
         6 |        100.00 |       2.2980 |     0.383000 | local
Time elapsed since server creation 1.36199998856
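
One way to sanity-check the sums printed by both programs is to recompute a few of them with an independent method. The sketch below (Python 3) uses a sieve of Eratosthenes instead of trial division. `sum_primes_sieve` is a hypothetical helper, not part of pp or jsr166y, and the values in `reported` are copied from the two outputs above.

```python
def sum_primes_sieve(n):
    """Sum of all primes strictly below n, via a sieve of Eratosthenes."""
    if n < 3:
        return 0
    is_prime = bytearray([1]) * n
    is_prime[0] = is_prime[1] = 0
    i = 2
    while i * i < n:
        if is_prime[i]:
            # Cross out the multiples of i, starting at i*i.
            is_prime[i * i::i] = bytearray(len(range(i * i, n, i)))
        i += 1
    return sum(i for i in range(n) if is_prime[i])


# Values taken from the Python and Java outputs in this post.
reported = {
    100: 1060,
    100000: 454396537,
    1000000: 37550402023,
}

if __name__ == "__main__":
    for n, expected in sorted(reported.items()):
        status = "matches" if sum_primes_sieve(n) == expected else "differs"
        print("below", n, status)
```

The sieve shares no code with either listing, so agreement on these inputs is reasonable evidence that both programs compute the same quantity.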

 

What can we conclude from these results?
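
One caveat when reading the two timings side by side: the Java run uses inputs around 1,000,000 while the Python run uses inputs around 100,000, so the two programs are not doing the same amount of work. The Python 3 sketch below gives a rough feel for how the cost of trial division grows with the input by counting loop iterations. `trial_divisions` and `total_divisions` are hypothetical helpers, the loop bound `i * i <= n` only approximates the `ceil(sqrt(n))` bound used in the listings, and the limits are deliberately much smaller than in either run so that it finishes quickly.

```python
def trial_divisions(n):
    """Count loop iterations of a square-root-bounded trial division of n."""
    count = 0
    i = 2
    while i * i <= n:
        count += 1
        if n % i == 0:
            break  # composite: stop at the first divisor found
        i += 1
    return count


def total_divisions(limit):
    """Total iterations needed to test every x in [2, limit)."""
    return sum(trial_divisions(x) for x in range(2, limit))


if __name__ == "__main__":
    small = total_divisions(1000)
    large = total_divisions(10000)
    print("iterations below 1,000: ", small)
    print("iterations below 10,000:", large)
    print("ratio:", large / small)
```

If the ratio comes out well above 10 for a tenfold larger limit, the work grows faster than linearly, which suggests comparing the two languages on identical inputs before drawing conclusions about relative speed.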
