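When running RoCE performance tests, I could not find much complete, usable reference data online.
So I am posting my measured results here for others to compare against.
The systems were not specifically tuned; I only ran tuned-adm and mlnx_tune once.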
Results
System information
Ubuntu 20.04
GenuineIntel Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz N/A
Memory Status Total: 1007.53 GB Free: 898.98 GB
MLNX_OFED_LINUX-23.07-0.5.1.2
ConnectX-5 Device Status on PCI 17:00.0 100 Gb/sec (4X EDR)
perftest
RoCE v1 and v2 showed no significant difference in these tests, so only the v2 data is listed.
ib_atomic_bw 10.10.11.21 -F -n 100000
---------------------------------------------------------------------------------------
Atomic FETCH_AND_ADD BW Test
Dual-port : OFF Device : mlx5_0
Number of qps : 1 Transport type : IB
Connection type : RC Using SRQ : OFF
PCIe relax order: ON
ibv_wr* API : ON
TX depth : 128
CQ Moderation : 100
Mtu : 4096[B]
Link type : Ethernet
GID index : 4
Outstand reads : 16
rdma_cm QPs : OFF
Data ex. method : Ethernet
---------------------------------------------------------------------------------------
local address: LID 0000 QPN 0x0224 PSN 0x5c1839
GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:10:11:11
remote address: LID 0000 QPN 0x1186 PSN 0xbd5c66
GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:10:11:21
---------------------------------------------------------------------------------------
#bytes #iterations BW peak[MB/sec] BW average[MB/sec] MsgRate[Mpps]
8 100000 0.00 11.06 1.449399
------------------------------------------------------------------------------------
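The bandwidth and message-rate columns of the ib_atomic_bw row are two views of the same measurement, and one can be derived from the other. A quick check, assuming that perftest's MB/sec here means 2^20 bytes per second (an assumption inferred from the numbers, not something the output states):

```python
# Cross-check the ib_atomic_bw row: 8-byte operations at 1.449399 Mpps.
MSG_SIZE_BYTES = 8
MSG_RATE_MPPS = 1.449399   # MsgRate[Mpps] column
REPORTED_BW_MB = 11.06     # BW average[MB/sec] column


def bandwidth_mib_per_sec(size_bytes: int, rate_mpps: float) -> float:
    """Bytes per second divided by 2**20 (assumed meaning of 'MB/sec')."""
    return size_bytes * rate_mpps * 1e6 / 2**20


bw = bandwidth_mib_per_sec(MSG_SIZE_BYTES, MSG_RATE_MPPS)
print(f"computed {bw:.2f} MB/sec, reported {REPORTED_BW_MB} MB/sec")
```

Under that assumption the computed value agrees with the reported 11.06 MB/sec. The 0.00 in the BW peak column is copied as-is from the tool output.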
ib_atomic_lat 10.10.11.21 -F -n 100000
---------------------------------------------------------------------------------------
Atomic FETCH_AND_ADD Latency Test
Dual-port : OFF Device : mlx5_0
Number of qps : 1 Transport type : IB
Connection type : RC Using SRQ : OFF
PCIe relax order: ON
ibv_wr* API : ON
TX depth : 1
Mtu : 4096[B]
Link type : Ethernet
GID index : 4
Outstand reads : 16
rdma_cm QPs : OFF
Data ex. method : Ethernet
---------------------------------------------------------------------------------------
local address: LID 0000 QPN 0x0226 PSN 0xf7135e
GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:10:11:11
remote address: LID 0000 QPN 0x1188 PSN 0xeff4d8
GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:10:11:21
---------------------------------------------------------------------------------------
#bytes #iterations t_min[usec] t_max[usec] t_typical[usec] t_avg[usec] t_stdev[usec] 99% percentile[usec] 99.9% percentile[usec]
8 100000 2.65 10.36 2.72 2.73 0.10 2.87 4.04
---------------------------------------------------------------------------------------
ib_write_lat -a 10.10.11.21 -F
---------------------------------------------------------------------------------------
RDMA_Write Latency Test
Dual-port : OFF Device : mlx5_0
Number of qps : 1 Transport type : IB
Connection type : RC Using SRQ : OFF
PCIe relax order: OFF
ibv_wr* API : ON
TX depth : 1
Mtu : 4096[B]
Link type : Ethernet
GID index : 4
Max inline data : 220[B]
rdma_cm QPs : OFF
Data ex. method : Ethernet
---------------------------------------------------------------------------------------
local address: LID 0000 QPN 0x0227 PSN 0x8f570f RKey 0x1813ab VAddr 0x007f0d4e0b2000
GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:10:11:11
remote address: LID 0000 QPN 0x1189 PSN 0x293652 RKey 0x1871f1 VAddr 0x007f23e376c000
GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:10:11:21
---------------------------------------------------------------------------------------
#bytes #iterations t_min[usec] t_max[usec] t_typical[usec] t_avg[usec] t_stdev[usec] 99% percentile[usec] 99.9% percentile[usec]
2 1000 1.49 2.33 1.52 1.52 0.00 1.58 2.33
4 1000 1.39 2.31 1.41 1.41 0.00 1.43 2.31
8 1000 1.39 2.55 1.42 1.42 0.00 1.43 2.55
16 1000 1.39 2.80 1.42 1.42 0.00 1.43 2.80
32 1000 1.42 2.64 1.44 1.44 0.00 1.46 2.64
64 1000 1.43 2.42 1.46 1.47 0.00 1.55 2.42
128 1000 1.47 1.75 1.50 1.50 0.00 1.62 1.75
256 1000 2.22 2.47 2.25 2.26 0.00 2.41 2.47
512 1000 2.36 3.65 2.39 2.41 0.00 2.59 3.65
1024 1000 2.45 2.82 2.49 2.52 0.00 2.70 2.82
2048 1000 2.66 4.68 2.70 2.74 0.00 2.94 4.68
4096 1000 3.09 4.15 3.19 3.21 0.00 3.41 4.15
8192 1000 3.45 3.81 3.56 3.55 0.00 3.69 3.81
16384 1000 4.13 5.93 4.22 4.25 0.00 4.53 5.93
32768 1000 5.45 6.96 5.51 5.56 0.00 5.80 6.96
65536 1000 8.13 9.88 8.20 8.22 0.00 8.42 9.88
131072 1000 13.51 15.79 13.67 13.68 0.00 13.80 15.79
262144 1000 24.19 24.66 24.31 24.34 0.00 24.63 24.66
524288 1000 45.57 47.44 45.61 45.64 0.03 45.99 47.44
1048576 1000 88.37 89.54 89.20 89.12 0.00 89.48 89.54
2097152 1000 174.00 177.11 174.77 174.80 0.04 175.06 177.11
4194304 1000 345.85 346.20 345.99 345.99 0.00 346.14 346.20
8388608 1000 688.12 688.72 688.17 688.22 0.00 688.57 688.72
---------------------------------------------------------------------------------------
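For large messages the write latency grows almost linearly with message size, so the slope between the two largest rows gives an effective streaming rate. A small sketch using the t_avg values from the table above (the helper name is only illustrative):

```python
# (message size in bytes, t_avg in usec) for the two largest ib_write_lat rows
ROW_4M = (4194304, 345.99)
ROW_8M = (8388608, 688.22)


def slope_gbps(row_a, row_b):
    """Effective rate in Gb/sec implied by the latency increase between two rows."""
    extra_bits = (row_b[0] - row_a[0]) * 8
    extra_seconds = (row_b[1] - row_a[1]) * 1e-6
    return extra_bits / extra_seconds / 1e9


rate = slope_gbps(ROW_4M, ROW_8M)
print(f"incremental rate between 4 MiB and 8 MiB: {rate:.2f} Gb/sec")
```

The result is roughly 98 Gb/sec, which suggests that at these sizes latency is dominated by wire time rather than per-message overhead.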
ib_write_bw -a 10.10.11.21 -F --report_gbits
---------------------------------------------------------------------------------------
RDMA_Write BW Test
Dual-port : OFF Device : mlx5_0
Number of qps : 1 Transport type : IB
Connection type : RC Using SRQ : OFF
PCIe relax order: ON
ibv_wr* API : ON
TX depth : 128
CQ Moderation : 100
Mtu : 4096[B]
Link type : Ethernet
GID index : 4
Max inline data : 0[B]
rdma_cm QPs : OFF
Data ex. method : Ethernet
---------------------------------------------------------------------------------------
local address: LID 0000 QPN 0x0229 PSN 0xff1ace RKey 0x203dbd VAddr 0x007f2b168e1000
GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:10:11:11
remote address: LID 0000 QPN 0x118b PSN 0xdeb80e RKey 0x027200 VAddr 0x007f1f13f19000
GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:10:11:21
---------------------------------------------------------------------------------------
#bytes #iterations BW peak[Gb/sec] BW average[Gb/sec] MsgRate[Mpps]
2 5000 0.11 0.11 6.881392
4 5000 0.17 0.17 5.328560
8 5000 0.48 0.48 7.554181
16 5000 0.97 0.97 7.542184
32 5000 1.94 1.94 7.570855
64 5000 3.88 3.88 7.571238
128 5000 7.74 7.74 7.556555
256 5000 15.53 15.50 7.570179
512 5000 30.89 30.84 7.530099
1024 5000 59.02 58.95 7.195514
2048 5000 89.85 89.65 5.471935
4096 5000 97.47 97.40 2.972274
8192 5000 97.75 97.72 1.491165
16384 5000 97.89 97.88 0.746774
32768 5000 97.96 97.96 0.373695
65536 5000 98.01 98.00 0.186923
131072 5000 98.02 98.01 0.093474
262144 5000 98.03 98.03 0.046743
524288 5000 98.03 98.03 0.023373
1048576 5000 98.04 98.04 0.011687
2097152 5000 98.04 98.04 0.005843
4194304 5000 98.04 98.04 0.002922
8388608 5000 98.04 98.04 0.001461
---------------------------------------------------------------------------------------
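The 98.04 Gb/sec plateau is close to what per-packet framing overhead alone would predict on a 100 Gb/sec link. A rough sketch, assuming RoCE v2 over IPv4 with no VLAN tag and counting only the headers of a middle packet of a large message. The header sizes are the standard ones; the no-VLAN and IPv4 choices are assumptions, although the IPv4-mapped GIDs in the output point to IPv4:

```python
LINK_GBPS = 100.0
MTU_PAYLOAD = 4096  # "Mtu : 4096[B]" in the perftest header

# Per-packet bytes on the wire besides the payload (assumed layout, no VLAN tag).
OVERHEAD_BYTES = {
    "preamble + SFD": 8,
    "Ethernet header": 14,
    "IPv4 header": 20,
    "UDP header": 8,
    "InfiniBand BTH": 12,
    "ICRC": 4,
    "Ethernet FCS": 4,
    "inter-frame gap": 12,
}


def goodput_gbps(link_gbps: float, payload: int, overhead: int) -> float:
    """Payload share of the link rate for fixed-size packets."""
    return link_gbps * payload / (payload + overhead)


total_overhead = sum(OVERHEAD_BYTES.values())
ceiling = goodput_gbps(LINK_GBPS, MTU_PAYLOAD, total_overhead)
print(f"overhead {total_overhead} B/packet, ceiling {ceiling:.2f} Gb/sec")
```

With these assumptions the ceiling comes out at about 98.04 Gb/sec, so the large-message results are effectively at line rate. A VLAN tag would add 4 bytes per packet and lower the ceiling slightly.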
ib_read_lat -a 10.10.11.21 -F
---------------------------------------------------------------------------------------
RDMA_Read Latency Test
Dual-port : OFF Device : mlx5_0
Number of qps : 1 Transport type : IB
Connection type : RC Using SRQ : OFF
PCIe relax order: ON
ibv_wr* API : ON
TX depth : 1
Mtu : 4096[B]
Link type : Ethernet
GID index : 4
Outstand reads : 16
rdma_cm QPs : OFF
Data ex. method : Ethernet
---------------------------------------------------------------------------------------
local address: LID 0000 QPN 0x022a PSN 0xd157ea OUT 0x10 RKey 0x203dbd VAddr 0x007f87f9bad000
GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:10:11:11
remote address: LID 0000 QPN 0x118c PSN 0x313e2c OUT 0x10 RKey 0x027200 VAddr 0x007fd732cf8000
GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:10:11:21
---------------------------------------------------------------------------------------
#bytes #iterations t_min[usec] t_max[usec] t_typical[usec] t_avg[usec] t_stdev[usec] 99% percentile[usec] 99.9% percentile[usec]
2 1000 2.66 2.95 2.71 2.72 0.00 2.92 2.95
4 1000 2.67 3.63 2.71 2.72 0.00 2.84 3.63
8 1000 2.72 3.22 2.78 2.79 0.00 3.02 3.22
16 1000 2.84 3.30 2.89 2.93 0.00 3.27 3.30
32 1000 2.84 3.30 2.89 2.93 0.00 3.26 3.30
64 1000 2.84 3.27 2.90 2.93 0.00 3.24 3.27
128 1000 2.87 3.74 2.93 2.93 0.00 3.01 3.74
256 1000 2.93 3.65 3.00 3.01 0.00 3.18 3.65
512 1000 3.03 3.47 3.10 3.13 0.00 3.33 3.47
1024 1000 3.16 3.63 3.23 3.26 0.00 3.44 3.63
2048 1000 3.39 4.03 3.47 3.51 0.00 3.91 4.03
4096 1000 3.93 4.37 4.04 4.05 0.00 4.22 4.37
8192 1000 4.48 4.97 4.63 4.65 0.00 4.88 4.97
16384 1000 5.28 6.10 5.55 5.59 0.00 5.96 6.10
32768 1000 6.66 7.63 6.90 6.97 0.00 7.51 7.63
65536 1000 9.36 12.17 9.76 9.80 0.03 10.59 12.17
131072 1000 14.06 17.44 14.24 14.41 0.29 16.09 17.44
262144 1000 24.75 28.09 24.91 24.95 0.10 25.33 28.09
524288 1000 46.15 54.60 46.31 46.60 1.06 53.16 54.60
1048576 1000 88.93 102.07 89.07 89.39 1.58 100.08 102.07
2097152 1000 174.49 201.24 175.41 175.52 2.04 179.07 201.24
4194304 1000 345.65 347.28 346.62 346.63 0.00 346.97 347.28
8388608 1000 688.47 781.16 688.81 688.93 1.36 689.32 781.16
---------------------------------------------------------------------------------------
ib_read_bw -a 10.10.11.21 -F --report_gbits
---------------------------------------------------------------------------------------
RDMA_Read BW Test
Dual-port : OFF Device : mlx5_0
Number of qps : 1 Transport type : IB
Connection type : RC Using SRQ : OFF
PCIe relax order: ON
ibv_wr* API : ON
TX depth : 128
CQ Moderation : 100
Mtu : 4096[B]
Link type : Ethernet
GID index : 4
Outstand reads : 16
rdma_cm QPs : OFF
Data ex. method : Ethernet
---------------------------------------------------------------------------------------
local address: LID 0000 QPN 0x022b PSN 0x1959d2 OUT 0x10 RKey 0x203dbd VAddr 0x007ff2ed43d000
GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:10:11:11
remote address: LID 0000 QPN 0x118d PSN 0x68625d OUT 0x10 RKey 0x027200 VAddr 0x007fad14218000
GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:10:11:21
---------------------------------------------------------------------------------------
#bytes #iterations BW peak[Gb/sec] BW average[Gb/sec] MsgRate[Mpps]
2 1000 0.069752 0.068694 4.293375
4 1000 0.17 0.17 5.186768
8 1000 0.34 0.34 5.290751
16 1000 0.67 0.67 5.251310
32 1000 1.35 1.34 5.235130
64 1000 2.71 2.70 5.282983
128 1000 5.41 5.41 5.280663
256 1000 10.73 10.71 5.228649
512 1000 21.73 21.69 5.295133
1024 1000 41.26 41.17 5.025470
2048 1000 61.95 61.89 3.777417
4096 1000 83.82 83.80 2.557335
8192 1000 85.08 85.06 1.297944
16384 1000 88.15 88.13 0.672377
32768 1000 97.85 97.84 0.373236
65536 1000 97.94 97.94 0.186813
131072 1000 97.99 97.99 0.093450
262144 1000 98.02 98.02 0.046739
524288 1000 98.02 98.02 0.023370
1048576 1000 98.03 98.03 0.011686
2097152 1000 98.04 98.04 0.005843
4194304 1000 98.04 98.04 0.002922
8388608 1000 98.04 98.04 0.001461
---------------------------------------------------------------------------------------
ib_send_lat -a 10.10.11.21 -F
---------------------------------------------------------------------------------------
Send Latency Test
Dual-port : OFF Device : mlx5_0
Number of qps : 1 Transport type : IB
Connection type : RC Using SRQ : OFF
PCIe relax order: ON
ibv_wr* API : ON
TX depth : 1
Mtu : 4096[B]
Link type : Ethernet
GID index : 4
Max inline data : 236[B]
rdma_cm QPs : OFF
Data ex. method : Ethernet
---------------------------------------------------------------------------------------
local address: LID 0000 QPN 0x022c PSN 0xb7944e
GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:10:11:11
remote address: LID 0000 QPN 0x118e PSN 0x3a106
GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:10:11:21
---------------------------------------------------------------------------------------
#bytes #iterations t_min[usec] t_max[usec] t_typical[usec] t_avg[usec] t_stdev[usec] 99% percentile[usec] 99.9% percentile[usec]
2 1000 1.37 6.00 1.41 1.42 0.06 1.52 6.00
4 1000 1.29 2.84 1.33 1.33 0.00 1.36 2.84
8 1000 1.29 2.28 1.33 1.32 0.00 1.35 2.28
16 1000 1.31 2.28 1.34 1.34 0.00 1.36 2.28
32 1000 1.31 2.29 1.34 1.34 0.00 1.37 2.29
64 1000 1.36 3.16 1.38 1.39 0.00 1.44 3.16
128 1000 1.40 3.67 1.45 1.45 0.00 1.48 3.67
256 1000 2.11 4.55 2.17 2.18 0.03 2.36 4.55
512 1000 2.21 3.36 2.26 2.27 0.00 2.46 3.36
1024 1000 2.32 4.07 2.36 2.38 0.00 2.63 4.07
2048 1000 2.52 4.77 2.58 2.61 0.07 2.76 4.77
4096 1000 2.96 4.73 3.02 3.07 0.04 3.40 4.73
8192 1000 3.32 7.38 3.43 3.44 0.07 3.66 7.38
16384 1000 3.99 13.20 4.05 4.10 0.10 4.33 13.20
32768 1000 5.33 5.71 5.50 5.47 0.00 5.65 5.71
65536 1000 8.00 10.96 8.05 8.11 0.04 8.42 10.96
131072 1000 13.36 14.58 13.49 13.49 0.00 13.74 14.58
262144 1000 24.05 25.55 24.10 24.16 0.00 24.45 25.55
524288 1000 45.45 46.64 45.60 45.62 0.00 46.11 46.64
1048576 1000 88.23 90.04 89.11 89.01 0.00 89.46 90.04
2097152 1000 174.19 178.18 174.74 174.75 0.04 174.98 178.18
4194304 1000 345.46 349.33 345.99 345.99 0.03 346.18 349.33
8388608 1000 687.92 689.08 688.07 688.08 0.00 688.25 689.08
---------------------------------------------------------------------------------------
ib_send_bw -a 10.10.11.21 -F --report_gbits
---------------------------------------------------------------------------------------
Send BW Test
Dual-port : OFF Device : mlx5_0
Number of qps : 1 Transport type : IB
Connection type : RC Using SRQ : OFF
PCIe relax order: ON
ibv_wr* API : ON
TX depth : 128
CQ Moderation : 100
Mtu : 4096[B]
Link type : Ethernet
GID index : 4
Max inline data : 0[B]
rdma_cm QPs : OFF
Data ex. method : Ethernet
---------------------------------------------------------------------------------------
local address: LID 0000 QPN 0x022d PSN 0xcc91a
GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:10:11:11
remote address: LID 0000 QPN 0x118f PSN 0xdc89f
GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:10:11:21
---------------------------------------------------------------------------------------
#bytes #iterations BW peak[Gb/sec] BW average[Gb/sec] MsgRate[Mpps]
2 1000 0.052205 0.051348 3.209238
4 1000 0.17 0.17 5.221836
8 1000 0.34 0.34 5.304961
16 1000 0.68 0.68 5.320089
32 1000 1.37 1.36 5.327195
64 1000 2.71 2.68 5.226214
128 1000 5.44 5.44 5.308179
256 1000 10.89 10.88 5.313214
512 1000 21.73 21.70 5.298613
1024 1000 43.10 43.08 5.258549
2048 1000 79.29 79.22 4.835111
4096 1000 95.93 95.89 2.926339
8192 1000 96.92 96.91 1.478727
16384 1000 97.53 97.52 0.744006
32768 1000 97.77 97.77 0.372949
65536 1000 97.91 97.91 0.186742
131072 1000 97.97 97.97 0.093433
262144 1000 98.00 97.99 0.046727
524288 1000 98.02 98.02 0.023370
1048576 1000 98.03 98.03 0.011686
2097152 1000 98.03 98.03 0.005843
4194304 1000 98.03 98.03 0.002922
8388608 1000 98.04 98.04 0.001461
---------------------------------------------------------------------------------------
OpenMPI
mpirun -np 2 -host gpu01,gpu02 -x UCX_NET_DEVICES=mlx5_0:1 osu_latency
# OSU MPI Latency Test v7.2
# Size Latency (us)
# Datatype: MPI_CHAR.
1 1.35
2 1.33
4 1.33
8 1.33
16 1.33
32 1.39
64 1.50
128 1.51
256 1.82
512 1.82
1024 1.96
2048 2.93
4096 3.67
8192 4.75
16384 6.94
32768 8.96
65536 11.75
131072 16.81
262144 26.37
524288 47.83
1048576 91.22
2097152 176.93
4194304 348.30
mpirun -np 2 -host gpu01,gpu02 -x UCX_NET_DEVICES=mlx5_0:1 osu_bw
# OSU MPI Bandwidth Test v7.2
# Size Bandwidth (MB/s)
# Datatype: MPI_CHAR.
1 5.92
2 11.76
4 23.72
8 47.17
16 94.75
32 190.71
64 357.15
128 726.07
256 1269.35
512 2345.14
1024 4027.80
2048 6507.12
4096 9284.14
8192 10689.40
16384 11354.37
32768 11692.90
65536 11980.76
131072 12118.09
262144 12185.53
524288 12219.85
1048576 12237.13
2097152 12245.03
4194304 12249.79
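To compare the osu_bw result with the perftest tables, which were run with --report_gbits, the MB/s column can be converted to Gb/sec. This sketch assumes OSU counts one MB as 10^6 bytes, which is my understanding of the benchmark but worth confirming against its source:

```python
def mb_per_s_to_gbps(mb_per_s: float) -> float:
    """Convert MB/s (assumed 10**6 bytes) to Gb/sec (10**9 bits)."""
    return mb_per_s * 1e6 * 8 / 1e9


OSU_BW_4M = 12249.79  # osu_bw, 4194304-byte row, mlx5_0:1
gbps = mb_per_s_to_gbps(OSU_BW_4M)
print(f"{OSU_BW_4M} MB/s is about {gbps:.2f} Gb/sec")
```

Under that assumption the MPI-level bandwidth is about 98 Gb/sec, essentially the same as the raw verbs results.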
mpirun -np 2 -host gpu01,gpu02 -x UCX_NET_DEVICES=ens121np0 osu_latency
# OSU MPI Latency Test v7.2
# Size Latency (us)
# Datatype: MPI_CHAR.
1 13.48
2 13.22
4 13.30
8 13.38
16 13.25
32 13.24
64 13.32
128 13.44
256 13.68
512 13.87
1024 14.15
2048 14.82
4096 15.76
8192 53.69
16384 44.04
32768 51.87
65536 67.73
131072 133.40
262144 156.58
524288 237.20
1048576 363.09
2097152 530.48
4194304 858.22
mpirun -np 2 -host gpu01,gpu02 -x UCX_NET_DEVICES=ens121np0 osu_bw
# OSU MPI Bandwidth Test v7.2
# Size Bandwidth (MB/s)
# Datatype: MPI_CHAR.
1 0.66
2 1.71
4 3.72
8 7.06
16 12.35
32 27.62
64 42.68
128 63.58
256 151.93
512 402.99
1024 825.05
2048 1253.38
4096 2052.76
8192 2204.76
16384 2328.53
32768 1932.05
65536 2558.81
131072 4215.51
262144 5911.52
524288 6973.49
1048576 6751.61
2097152 6457.68
4194304 6459.98
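The two pairs of OSU runs differ only in UCX_NET_DEVICES. Presumably ens121np0 is the plain Ethernet interface of the same adapter, in which case UCX would use a TCP transport for it; that is an inference from the device name and is not stated above. A short sketch that puts the two configurations side by side, using values copied from the tables:

```python
# (mlx5_0:1 value, ens121np0 value), copied from the OSU tables above
LATENCY_1B_US = (1.35, 13.48)
LATENCY_4M_US = (348.30, 858.22)
BW_4M_MB_S = (12249.79, 6459.98)

# How many times slower the ens121np0 run is (latency) or faster mlx5_0 is (bandwidth).
lat_small_ratio = LATENCY_1B_US[1] / LATENCY_1B_US[0]
lat_large_ratio = LATENCY_4M_US[1] / LATENCY_4M_US[0]
bw_ratio = BW_4M_MB_S[0] / BW_4M_MB_S[1]

print(f"1-byte latency: {lat_small_ratio:.1f}x higher on ens121np0")
print(f"4 MiB latency:  {lat_large_ratio:.1f}x higher on ens121np0")
print(f"4 MiB bandwidth: {bw_ratio:.1f}x higher on mlx5_0:1")
```

On this setup the RDMA path has roughly a tenth of the small-message latency and a little under twice the large-message bandwidth of the ens121np0 run.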