The shootout compares language performance by running a series of algorithms in each language; see http://shootout.alioth.debian.org/ for details.
It contains many algorithms:
[root@localhost bench]# ls
ackermann       except          harmonic     knucleotide   message      nsievebits   raytracer   reversefile   tcpecho
ary             fannkuch        hash         LICENSE       messagering  objinst      README      robots.txt    tcprequest
binarytrees     fannkuchredux   hash2        license.txt   meteor       partialsums  recursive   sieve         tcpsocket
chameneos       fasta           health       lists         methcall     pidigits     regexdna    spectralnorm  tcpstream
chameneosredux  fastaredux      heapsort     magicsquares  moments      primes       regexmatch  spellcheck    threadring
discovery       fibo            hello        mandelbrot    nbody        process      regexsub    strcat        wc
dispatch        flyweightstate  implicitode  matrix        nestedloop   prodcons     regsub      sumcol        wordfreq
echo            gifdeanim       Include      matrixnorm    nsieve       random       revcomp     takfp
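Nearly every mainstream language is covered (which also makes this a good place to compare syntax across languages).
The official shootout test machine runs Ubuntu with a 3.0 kernel, and the architectures covered are x86 one-core, x86 quad-core, x86-64 one-core, and x86-64 quad-core. The rankings differ slightly from one architecture to another, but across all the results a few patterns stand out: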
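1) Compiled languages are usually faster than virtual-machine languages, but not always. For example, Java 7 beats Go in some tests (Java has a JIT).
2) Functional languages generally land in the middle of the overall ranking: neither very bad nor very good.
3) Fortran Intel is the fastest, faster even than C/C++. I suspect this comes from the Intel compiler behind it; Intel's C compiler is also faster than gcc.
4) The P languages (Perl/Python/Ruby) are usually the slowest. I timed nbody in Go, Java, and Python: Go took 1 min, Python took 46 min, and Java was the quickest at 40 sec.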
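These conclusions are drawn from the overall rankings. In practice many factors affect the results, and different implementations of the same algorithm in the same language also change the ranking.
In the threadring test, Erlang HiPE ranks first, Haskell second, and Go third. The threadring algorithm is actually very simple, so let's look at the Go, Erlang, and Java implementations: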
package main

import (
	"flag"
	"fmt"
	"os"
)

var n = flag.Int("n", 1000, "how many passes")

const Nthread = 503

func f(i int, in <-chan int, out chan<- int) {
	for {
		n := <-in
		if n == 0 {
			fmt.Printf("%d\n", i)
			os.Exit(0)
		}
		out <- n - 1
	}
}

func main() {
	flag.Parse()
	one := make(chan int) // will be input to thread 1
	var in, out chan int = nil, one
	for i := 1; i <= Nthread-1; i++ {
		in, out = out, make(chan int)
		go f(i, in, out)
	}
	go f(Nthread, out, one)
	one <- *n
	<-make(chan int) // hang until ring completes
}
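In Go, each thread is modeled by a goroutine. The 503 threads are linked into a ring and a number (1000 by default, set with -n) is sent to the first thread. Each thread subtracts one from the number it receives and passes it to the next thread, until the number reaches 0; the thread that receives 0 prints its own index and the program exits.
This Go program took 28 sec (time ./8.out -n 50000000).
Note that this Go threadring is not the official shootout version; it is the one shipped with Google's Go implementation. The shootout version is here.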
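Since the number drops by one per hop around a ring of 503 threads, the thread that ends up printing its index can be predicted with modular arithmetic instead of running the ring. The sketch below assumes the numbering used by the Go program above (thread 1 receives the initial value); `finisher` and `simulate` are names made up for this illustration.

```go
package main

import "fmt"

// ringSize matches the Nthread constant in the Go program above.
const ringSize = 503

// finisher predicts which thread (1-based) receives the value 0,
// given that thread 1 receives the initial value n.
func finisher(n int) int {
	return n%ringSize + 1
}

// simulate walks the ring one hop at a time, without goroutines,
// as a cross-check of the closed-form answer.
func simulate(n int) int {
	id := 1
	for ; n > 0; n-- {
		id = id%ringSize + 1 // move to the next thread, wrapping 503 -> 1
	}
	return id
}

func main() {
	for _, n := range []int{0, 1000, 50000000} {
		fmt.Println(n, finisher(n), simulate(n))
	}
	// Prints:
	// 0 1 1
	// 1000 498 498
	// 50000000 292 292
}
```

For 50000000 passes the answer is 292, the same number printed by the Erlang runs shown in this post, so it is a handy sanity check when comparing implementations.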
Now the Erlang threadring:
-module(threadring).
-export([main/1, roundtrip/2]).

-define(RING, 503).

start(Token) ->
   H = lists:foldl(
      fun(Id, Pid) -> spawn(threadring, roundtrip, [Id, Pid]) end,
      self(),
      lists:seq(?RING, 2, -1)),
   H ! Token,
   roundtrip(1, H).

roundtrip(Id, Pid) ->
   receive
      1 ->
         io:fwrite("~b~n", [Id]),
         erlang:halt();
      Token ->
         Pid ! Token - 1,
         roundtrip(Id, Pid)
   end.

main([Arg]) ->
   Token = list_to_integer(Arg),
   start(Token).
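The plain (non-SMP) Erlang threadring took 17 sec, even less than the Go version, while Erlang with SMP enabled took about 1 min. Because of the way threadring is designed, only one process in the run queue is runnable at any moment, so SMP has no parallelism to exploit in this scenario, and lock contention actually makes it slower.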
[root@localhost threadring]# time erl -smp disable -noshell -run threadring main 50000000
292

real    0m17.290s
user    0m0.517s
sys     0m16.768s
[root@localhost threadring]# time erl -noshell -run threadring main 50000000
292

real    1m9.036s
user    0m4.266s
sys     1m4.759s
Here is a Java threadring:
import java.util.LinkedList;
import java.util.Queue;

public class threadring {

    public static void main(String[] args) {
        Node[] ring = new Node[503];
        for (int i=0; i<ring.length; i++) {
            ring[i] = new Node(i+1);
        }
        for (int i=0; i<ring.length; i++) {
            int nextIndex = (ring[i].label % ring.length);
            ring[i].next = ring[nextIndex];
        }
        int nHops = Integer.parseInt(args[0]);
        new Thread(new Consumer()).start();
        ring[0].sendMessage(nHops);
    }

    private static Queue<Node> q = new LinkedList<Node>();

    static class Consumer implements Runnable {
        public void run() {
            while (true) {
                try {
                    Node node;
                    node = q.poll();
                    if (node == null) {
                        //ignore, wait for some element
                        Thread.sleep(100);
                    } else {
                        node.run();
                    }
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
            }
        }
    }

    static class Node implements Runnable {
        private final int label;
        private Node next;
        private int message;

        public Node(int label) {
            this.label = label;
        }

        public void sendMessage(int message) {
            this.message=message;
            q.add(this);
        }

        public void run() {
            // System.out.println("after lock");
            if (message == 0) {
                System.out.println(label);
                System.exit(0);
            } else {
                next.sendMessage(message - 1);
            }
        }
    }
}
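This Java threadring is overly simple: a single Consumer thread executes every Node, much like the Actor pattern, so no real thread handoff takes place and the program finishes very quickly, in about 2 sec.
Since Java has no coroutine-like facility, modeling threadring with real threads makes for a more convincing comparison: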
/**
 * The Computer Language Benchmarks Game
 * http://shootout.alioth.debian.org/
 * contributed by Klaus Friedel
 */

import java.util.concurrent.locks.LockSupport;

public class threadring {
    static final int THREAD_COUNT = 503;

    public static class MessageThread extends Thread {
        MessageThread nextThread;
        volatile Integer message;

        public MessageThread(MessageThread nextThread, int name) {
            super(""+name);
            this.nextThread = nextThread;
        }

        public void run() {
            while(true) nextThread.enqueue(dequeue());
        }

        public void enqueue(Integer hopsRemaining) {
            if(hopsRemaining == 0){
                System.out.println(getName());
                System.exit(0);
            }
            // as only one message populates the ring, it's impossible
            // that queue is not empty
            message = hopsRemaining - 1;
            LockSupport.unpark(this); // work waiting...
        }

        private Integer dequeue(){
            while(message == null){
                LockSupport.park();
            }
            Integer msg = message;
            message = null;
            return msg;
        }
    }

    public static void main(String args[]) throws Exception{
        int hopCount = Integer.parseInt(args[0]);

        MessageThread first = null;
        MessageThread last = null;
        for (int i = THREAD_COUNT; i >= 1 ; i--) {
            first = new MessageThread(first, i);
            if(i == THREAD_COUNT) last = first;
        }
        // close the ring:
        last.nextThread = first;

        // start all Threads
        MessageThread t = first;
        do{
            t.start();
            t = t.nextThread;
        }while(t != first);
        // inject message
        first.enqueue(hopCount);
        first.join(); // wait for System.exit
    }
}
This Java program, by contrast, takes a very long time to run (time java -server threadring 50000000).