Original article:
http://www.erlangatwork.com/2009/03/lies-damned-lies-and-benchmarks.html
Erlang/OTP R13A was released today with a number of major SMP improvements. I've been playing with R13 snapshots for a while and wrote a simple HTTP server to compare the SMP performance on R12 and R13. This server uses {packet, http} to decode requests, increments a counter with a transactional mnesia:read/3 and mnesia:write/1, and responds with the counter's previous value. You'll find the source here.
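The counter step described above can be sketched as follows. This is not the original server's source: the module name counter_sketch, the table name counter, and the key hits are all hypothetical, and the table is assumed to be a RAM-only copy on the local node. It shows only the transactional mnesia:read/3 plus mnesia:write/1 pattern, returning the counter's previous value.

```erlang
-module(counter_sketch).
-export([init/0, increment/0]).

%% Hypothetical record; the original server's schema is not shown in the post.
-record(counter, {name, value = 0}).

%% Start mnesia and create a RAM-only table on the local node (an assumption).
init() ->
    ok = mnesia:start(),
    case mnesia:create_table(counter,
                             [{attributes, record_info(fields, counter)}]) of
        {atomic, ok} -> ok;
        {aborted, {already_exists, counter}} -> ok
    end.

%% Read the counter under a write lock, store value + 1, and return
%% the previous value, all inside a single transaction.
increment() ->
    F = fun() ->
            Prev = case mnesia:read(counter, hits, write) of
                       [#counter{value = V}] -> V;
                       [] -> 0
                   end,
            ok = mnesia:write(#counter{name = hits, value = Prev + 1}),
            Prev
        end,
    {atomic, Result} = mnesia:transaction(F),
    Result.
```

Taking the write lock in the read avoids upgrading a read lock to a write lock inside the transaction, which is a common source of restarts under concurrent load.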
I ran the HTTP server on an x86_64 CentOS 5 machine running Linux 2.6.18-53.el5. The server has two quad-core Intel Xeon E5450 CPUs and 8GB of RAM. Erlang/OTP R12B-5 and R13A were compiled from source and run as erl -pa ebin +SN -s ehttpd start where N indicated the number of schedulers to run.
To get performance numbers I ran ab on another server connected via a 100 Mb/s private VLAN as ab -c N -n 100000 http://10.0.0.32:8889/ where N was the number of concurrent requests. ab was run 3 times for each value of N and the following chart shows the average requests/sec with 4 and 8 schedulers.
[img]schedulers.png [/img]
R13A's SMP improvements include multiple run queues and improved locking. It also supports binding schedulers to specific CPU cores and hardware threads. Binding isn't enabled by default, so the following chart shows the result of setting erlang:system_flag(scheduler_bind_type, thread_no_node_processor_spread) and running with 100 concurrent requests.
[img]requests_sec.png [/img]
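A minimal sketch of inspecting and changing scheduler binding from inside a running node. The module name sched_sketch and both function names are hypothetical. The calls erlang:system_info/1 and erlang:system_flag/2 are real, but binding can fail (for example when the CPU topology is unknown or the platform does not support it), and later OTP releases recommend the +sbt command-line flag over the system_flag call, so the sketch catches errors instead of assuming success.

```erlang
-module(sched_sketch).
-export([info/0, bind/1]).

%% Report how many schedulers exist, how many are online,
%% and the current bind type.
info() ->
    [{schedulers, erlang:system_info(schedulers)},
     {schedulers_online, erlang:system_info(schedulers_online)},
     {bind_type, erlang:system_info(scheduler_bind_type)}].

%% Try to set the bind type, e.g. thread_no_node_processor_spread.
%% Returns {ok, OldType} on success or {error, Reason} if the
%% runtime rejects the request.
bind(Type) ->
    try erlang:system_flag(scheduler_bind_type, Type) of
        Old -> {ok, Old}
    catch
        error:Reason -> {error, Reason}
    end.
```

Calling sched_sketch:info() before and after sched_sketch:bind(thread_no_node_processor_spread) shows whether the setting took effect on a given machine.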
There is a lot missing from these benchmarks: I didn't test kernel polling, and I only generated load from one client machine. The drop between 500 and 1000 concurrent requests on R13A +S8 looks too steep and may be the result of using ab. That said, the SMP optimizations in R13 are looking very promising!
Based on an experiment I ran at ECUG on an 8-core CPU:
[spawn(ring, run,[["100", "10000000000"]]) || _X <- lists:seq(1,1000)].
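The ring module called above is not included in the post. As a rough idea of what such a benchmark does, here is a hypothetical stand-in named ring_sketch. It assumes run/1 takes [Processes, Laps] as strings, which is only a guess from the shape of the call. It builds a ring of N processes (the caller plus N - 1 relays) and sends a token around it M times.

```erlang
-module(ring_sketch).
-export([run/1, run/2]).

%% Assumed argument shape, guessed from the call above:
%% [Processes, Laps], both given as strings.
run([NStr, MStr]) ->
    run(list_to_integer(NStr), list_to_integer(MStr)).

%% Build a ring of N processes (the caller is one of them), pass a
%% token around it M times, then shut the relays down.
%% Returns the number of token sends, N * M.
run(N, M) when N >= 1, M >= 0 ->
    Self = self(),
    %% Each new relay forwards to the previously created process;
    %% the first relay forwards back to the caller, closing the ring.
    Next = lists:foldl(fun(_, To) -> spawn(fun() -> relay(To) end) end,
                       Self, lists:seq(1, N - 1)),
    ok = laps(Next, M),
    Next ! stop,
    %% Wait for the stop message to travel the whole ring.
    receive stop -> ok end,
    N * M.

%% One lap: send the token and wait for it to come back.
laps(_Next, 0) -> ok;
laps(Next, M) ->
    Next ! token,
    receive token -> laps(Next, M - 1) end.

%% A relay forwards every message; on stop it forwards and exits.
relay(To) ->
    receive
        stop  -> To ! stop, ok;
        Token -> To ! Token, relay(To)
    end.
```

With this sketch, the experiment's call would read [spawn(ring_sketch, run, [["100", "10000000000"]]) || _X <- lists:seq(1,1000)]., which starts 1000 independent rings of 100 processes each, enough to keep all schedulers busy with message passing.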
R12B-5:
CPU   User%   Sys%   Wait%   Idle%
  1    21.3   62.4     0.0    16.3
  2    20.9   61.7     0.0    17.4
  3    19.9   63.2     0.0    16.9
  4    18.9   64.2     0.0    16.9
  5    19.9   62.7     0.0    17.4
  6    20.9   63.2     0.0    15.9
  7    19.4   62.7     0.0    17.9
  8    19.4   63.7     0.0    16.9
R13A:
CPU   User%   Sys%   Wait%   Idle%
  1    61.2   31.8     0.0     7.0
  2    64.7   29.9     0.0     5.5
  3    62.7   29.9     0.0     7.5
  4    61.0   32.5     0.0     6.5
  5    62.5   30.5     0.0     7.0
  6    64.2   29.4     0.0     6.5
  7    63.7   29.9     0.0     6.5
  8    65.7   27.9     0.0     6.5
Avg    63.2   30.2     0.0     6.5
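The sys time is spent mostly in futex calls, so the dependence on locks has dropped sharply: sys time falls from about 63% per core on R12B-5 to about 30% on R13A, while user time rises from about 20% to about 63%.
Conclusion:
Speed improved by nearly 2x. The result is really good, yeah!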