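We are gradually migrating our test environment from the Mesos framework to Alibaba Cloud Container Service. Along the way, we measured the network performance of four different modes of access between services. This post describes the test method, the data, and the conclusions.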
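Service containers can reach each other in four different ways:
- docker link: the link mechanism provided by docker, which lets one container access another. The test host is app-link.
- hostname: when orchestrating services, each service is assigned a hostname, and other services can use that hostname to reach it. The test host is app.
- Service discovery: a mechanism that the container service builds on HAProxy for HTTP access and load balancing between services. The test host is app.local.
- SLB: the traditional load-balancing service, which offers HTTP and TCP listener modes. In the tests, the HTTP SLB host is 192.168.1.10 and the TCP SLB host is 192.168.1.40.

We access the HTTP endpoint /ok on the target machine over each of these four paths and measure latency and throughput.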
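The test matrix can be written down as a small script. The following is a minimal sketch, assuming Python is available on the test client. `HOSTS` and `build_commands` are illustrative names, not part of the original setup; the hosts and the httping and ab flags are the ones used in this post.

```python
# Hypothetical helper: maps each access mode to its test host and builds
# the httping / ab argument lists used for the measurements in this post.
HOSTS = {
    "docker link": "app-link",
    "hostname": "app",
    "service discovery": "app.local",
    "HTTP SLB": "192.168.1.10",
    "TCP SLB": "192.168.1.40",
}


def build_commands(host: str, path: str = "/ok") -> dict:
    """Return the latency and throughput command lines for one host."""
    url = f"http://{host}{path}"
    return {
        # 100 requests, 10 ms apart
        "latency": ["httping", "-c100", "-i", "0.01", "-g", url],
        # 10000 keep-alive requests at a concurrency of 10000
        "throughput": ["ab", "-lkc", "10000", "-n", "10000", url],
    }


if __name__ == "__main__":
    for mode, host in HOSTS.items():
        for kind, argv in build_commands(host).items():
            print(f"{mode} ({kind}): {' '.join(argv)}")
```

Each argument list can be passed to `subprocess.run` to execute the corresponding test.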
Install the tools on Ubuntu:
apt-get install httping
apt-get install apache2-utils
httping -c100 -i 0.01 -g 'http://app-link/ok'
...
connected to 172.18.1.3:80 (121 bytes), seq=96 time=0.45 ms
connected to 172.18.1.3:80 (121 bytes), seq=97 time=0.46 ms
connected to 172.18.1.3:80 (121 bytes), seq=98 time=0.42 ms
connected to 172.18.1.3:80 (121 bytes), seq=99 time=1.63 ms
--- http://app-link/ok ping statistics ---
100 connects, 100 ok, 0.00% failed, time 1050ms
round-trip min/avg/max = 0.4/0.5/1.6 ms
httping -c100 -i 0.01 -g 'http://app/ok'
...
connected to 172.18.1.3:80 (121 bytes), seq=96 time=10.80 ms
connected to 172.18.1.3:80 (121 bytes), seq=97 time=0.43 ms
connected to 172.18.1.3:80 (121 bytes), seq=98 time=0.44 ms
connected to 172.18.1.3:80 (121 bytes), seq=99 time=0.46 ms
--- http://app/ok ping statistics ---
100 connects, 100 ok, 0.00% failed, time 1073ms
round-trip min/avg/max = 0.4/0.7/10.8 ms
httping -c100 -i 0.01 -g 'http://app.local/ok'
...
connected to 172.18.1.2:80 (219 bytes), seq=96 time=0.69 ms
connected to 172.18.1.2:80 (219 bytes), seq=97 time=0.67 ms
connected to 172.18.1.2:80 (219 bytes), seq=98 time=0.74 ms
connected to 172.18.1.2:80 (219 bytes), seq=99 time=0.65 ms
--- http://app.local/ok ping statistics ---
100 connects, 100 ok, 0.00% failed, time 1090ms
round-trip min/avg/max = 0.6/0.9/6.0 ms
httping -c100 -i 0.01 -g 'http://192.168.1.10/ok'
...
connected to 192.168.1.10:80 (140 bytes), seq=96 time=1.19 ms
connected to 192.168.1.10:80 (140 bytes), seq=97 time=1.08 ms
connected to 192.168.1.10:80 (140 bytes), seq=98 time=1.15 ms
connected to 192.168.1.10:80 (140 bytes), seq=99 time=1.30 ms
--- http://192.168.1.10/ok ping statistics ---
100 connects, 100 ok, 0.00% failed, time 1123ms
round-trip min/avg/max = 1.0/1.2/2.9 ms
httping -c100 -i 0.01 -g 'http://192.168.1.40/ok'
...
connected to 192.168.1.40:80 (121 bytes), seq=96 time=1.18 ms
connected to 192.168.1.40:80 (121 bytes), seq=97 time=1.25 ms
connected to 192.168.1.40:80 (121 bytes), seq=98 time=1.06 ms
connected to 192.168.1.40:80 (121 bytes), seq=99 time=1.34 ms
--- http://192.168.1.40/ok ping statistics ---
100 connects, 100 ok, 0.00% failed, time 1137ms
round-trip min/avg/max = 1.0/1.3/2.7 ms
Each test sent 100 HEAD requests. The average latencies are listed in the table below:
Access mode | Latency (ms) |
---|---|
docker link | 0.5 |
hostname | 0.7 |
Service discovery | 0.9 |
HTTP SLB | 1.2 |
TCP SLB | 1.3 |
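The averages in the table come from the `round-trip min/avg/max` summary line that httping prints at the end of each run. As a rough sketch, assuming those summary lines have been captured as text, they could be parsed as follows. `parse_httping_summary` and `latency_table` are hypothetical helper names; the summary strings are copied from the runs above.

```python
import re

# Matches the summary line httping prints at the end of a run.
SUMMARY_RE = re.compile(r"round-trip min/avg/max = ([\d.]+)/([\d.]+)/([\d.]+) ms")

# Summary lines copied from the five httping runs in this post.
SUMMARIES = {
    "docker link": "round-trip min/avg/max = 0.4/0.5/1.6 ms",
    "hostname": "round-trip min/avg/max = 0.4/0.7/10.8 ms",
    "Service discovery": "round-trip min/avg/max = 0.6/0.9/6.0 ms",
    "HTTP SLB": "round-trip min/avg/max = 1.0/1.2/2.9 ms",
    "TCP SLB": "round-trip min/avg/max = 1.0/1.3/2.7 ms",
}


def parse_httping_summary(output: str) -> dict:
    """Extract min/avg/max latency in milliseconds from httping output."""
    match = SUMMARY_RE.search(output)
    if match is None:
        raise ValueError("no httping round-trip summary found")
    low, avg, high = (float(group) for group in match.groups())
    return {"min": low, "avg": avg, "max": high}


def latency_table(summaries: dict) -> str:
    """Render the average latencies as a pipe table."""
    rows = ["Access mode | Latency (ms) |", "---|---|"]
    for mode, text in summaries.items():
        rows.append(f"{mode} | {parse_httping_summary(text)['avg']} |")
    return "\n".join(rows)


if __name__ == "__main__":
    print(latency_table(SUMMARIES))
```

Keeping the minimum and maximum alongside the average also makes outliers visible, such as the single 10.8 ms sample in the hostname run.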
ab -lkc 10000 -n 10000 'http://app-link/ok'
Concurrency Level: 10000
Time taken for tests: 0.864 seconds
Complete requests: 10000
Failed requests: 0
Keep-Alive requests: 10000
Total transferred: 2020000 bytes
HTML transferred: 610000 bytes
Requests per second: 11571.74 [#/sec](mean)
Time per request: 864.174 [ms](mean)
Time per request: 0.086 [ms](mean, across all concurrent requests)
Transfer rate: 2282.71 [Kbytes/sec] received
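The three derived figures in an ab report follow from the concurrency level, the number of completed requests, and the total time. Below is a small sketch of that arithmetic, using the docker link run as input. The class name `AbRun` is made up for illustration, and because ab prints the total time rounded to milliseconds, the results only approximately match the report.

```python
from dataclasses import dataclass


@dataclass
class AbRun:
    """Inputs taken from an ab report (illustrative container, not an ab API)."""
    concurrency: int
    requests: int
    seconds: float

    def requests_per_second(self) -> float:
        # "Requests per second": completed requests divided by total time
        return self.requests / self.seconds

    def time_per_request_ms(self) -> float:
        # "Time per request (mean)": scaled by the concurrency level
        return self.concurrency * self.seconds * 1000 / self.requests

    def time_per_request_across_ms(self) -> float:
        # "Time per request (mean, across all concurrent requests)"
        return self.seconds * 1000 / self.requests


if __name__ == "__main__":
    run = AbRun(concurrency=10000, requests=10000, seconds=0.864)
    print(f"{run.requests_per_second():.2f} requests/s")   # ab reports 11571.74
    print(f"{run.time_per_request_ms():.3f} ms")           # ab reports 864.174
    print(f"{run.time_per_request_across_ms():.3f} ms")    # ab reports 0.086
```

Since the concurrency equals the request count in these tests, the mean time per request is essentially the total run time.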
ab -lkc 10000 -n 10000 'http://app/ok'
Concurrency Level: 10000
Time taken for tests: 1.055 seconds
Complete requests: 10000
Failed requests: 0
Keep-Alive requests: 10000
Total transferred: 2020000 bytes
HTML transferred: 610000 bytes
Requests per second: 9476.49 [#/sec](mean)
Time per request: 1055.243 [ms](mean)
Time per request: 0.106 [ms](mean, across all concurrent requests)
Transfer rate: 1869.39 [Kbytes/sec] received
ab -lkc 10000 -n 10000 'http://app.local/ok'
Concurrency Level: 10000
Time taken for tests: 4.276 seconds
Complete requests: 10000
Failed requests: 0
Keep-Alive requests: 10000
Total transferred: 3000000 bytes
HTML transferred: 610000 bytes
Requests per second: 2338.60 [#/sec](mean)
Time per request: 4276.066 [ms](mean)
Time per request: 0.428 [ms](mean, across all concurrent requests)
Transfer rate: 685.14 [Kbytes/sec] received
ab -lkc 10000 -n 10000 'http://192.168.1.10/ok'
Concurrency Level: 10000
Time taken for tests: 6.308 seconds
Complete requests: 10000
Failed requests: 0
Non-2xx responses: 580
Keep-Alive requests: 10000
Total transferred: 2141800 bytes
HTML transferred: 732380 bytes
Requests per second: 1585.41 [#/sec](mean)
Time per request: 6307.517 [ms](mean)
Time per request: 0.631 [ms](mean, across all concurrent requests)
Transfer rate: 331.60 [Kbytes/sec] received
With 10,000 concurrent requests, the SLB's nginx returned a 504 error for 580 of them. The response was as follows:
504 Gateway Time-out
504 Gateway Time-out
The gateway did not receive a timely response from the upstream server or application.
WARNING: Response code not 2xx (504)
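Because ab counts these 504 responses as completed requests, the throughput reported for the HTTP SLB includes them. A quick sketch of how the error share and the throughput of successful responses could be derived from the figures in the report (the function names are illustrative):

```python
def non_2xx_rate(complete: int, non_2xx: int) -> float:
    """Share of completed requests that received a non-2xx response."""
    return non_2xx / complete


def successful_rps(complete: int, non_2xx: int, seconds: float) -> float:
    """Requests per second counting only 2xx responses."""
    return (complete - non_2xx) / seconds


if __name__ == "__main__":
    # Figures from the HTTP SLB run: 10000 complete, 580 non-2xx, 6.308 s
    print(f"error share: {non_2xx_rate(10000, 580):.1%}")
    print(f"successful requests/s: {successful_rps(10000, 580, 6.308):.2f}")
```

On these numbers, 5.8% of the requests failed at the SLB, so the rate of successful responses is somewhat lower than the 1585.41 requests per second that ab reports.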
ab -lkc 10000 -n 10000 'http://192.168.1.40/ok'
Concurrency Level: 10000
Time taken for tests: 1.891 seconds
Complete requests: 10000
Failed requests: 0
Keep-Alive requests: 10000
Total transferred: 2020000 bytes
HTML transferred: 610000 bytes
Requests per second: 5287.14 [#/sec](mean)
Time per request: 1891.383 [ms](mean)
Time per request: 0.189 [ms](mean, across all concurrent requests)
Transfer rate: 1042.97 [Kbytes/sec] received
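With 10,000 concurrent requests, the number of requests handled per second is listed in the table below:
Access mode | Throughput (RPS) |
---|---|
docker link | 11571.74 |
hostname | 9476.49 |
Service discovery | 2338.60 |
HTTP SLB | 1585.41 |
TCP SLB | 5287.14 |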
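To make the gap between the modes easier to see, the throughput figures can be expressed relative to the fastest mode. A minimal sketch using the numbers from the table (the variable and function names are illustrative):

```python
# Requests per second as reported by ab for each access mode.
RPS = {
    "docker link": 11571.74,
    "hostname": 9476.49,
    "Service discovery": 2338.60,
    "HTTP SLB": 1585.41,
    "TCP SLB": 5287.14,
}


def relative_to_best(rps: dict) -> dict:
    """Express each throughput as a fraction of the highest one."""
    best = max(rps.values())
    return {mode: value / best for mode, value in rps.items()}


if __name__ == "__main__":
    for mode, share in relative_to_best(RPS).items():
        print(f"{mode}: {share:.0%} of the best throughput")
```

On these figures, hostname reaches about 82% of the docker link throughput, TCP SLB about 46%, service discovery about 20%, and HTTP SLB about 14%.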
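Although docker link and hostname deliver the best network performance, it is unclear how well they balance load. In our tests we found that the hostname mode does balance load. However, the official help documentation lists hostname access under "access modes without load-balancing capability" while also describing it as "able to provide a certain degree of load balancing". Evidently Alibaba Cloud does not emphasize its load-balancing capability. For production use, load-balancing capability is also a very important criterion.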
Finally, these two questions still need to be confirmed with Alibaba Cloud: