Mellanox CX-5 RoCE Network Performance Test Data

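When I ran RoCE performance tests, I could not find much complete, usable reference data online.
So I am posting my measured results here for others to use as a baseline for comparison.
The system was not specifically tuned; I only ran tuned-adm and mlnx_tune once.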

Results

[Image: 24c439fceb4249ab723698268ff3086.png]
[Image: 39dc6d008189f46073b721a78c50ad8.png]

System information

Ubuntu 20.04

GenuineIntel Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz N/A

Memory Status Total: 1007.53 GB Free: 898.98 GB

MLNX_OFED_LINUX-23.07-0.5.1.2

ConnectX-5 Device Status on PCI 17:00.0 100 Gb/sec (4X EDR)

perftest

RoCE v1 and v2 showed no significant difference in testing, so only the v2 data is listed.

ib_atomic_bw 10.10.11.21 -F -n 100000
---------------------------------------------------------------------------------------
                    Atomic FETCH_AND_ADD BW Test
 Dual-port       : OFF          Device         : mlx5_0
 Number of qps   : 1            Transport type : IB
 Connection type : RC           Using SRQ      : OFF
 PCIe relax order: ON
 ibv_wr* API     : ON
 TX depth        : 128
 CQ Moderation   : 100
 Mtu             : 4096[B]
 Link type       : Ethernet
 GID index       : 4
 Outstand reads  : 16
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0000 QPN 0x0224 PSN 0x5c1839
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:10:11:11
 remote address: LID 0000 QPN 0x1186 PSN 0xbd5c66
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:10:11:21
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]   MsgRate[Mpps]
 8          100000           0.00               11.06              1.449399
------------------------------------------------------------------------------------
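
A note on units: the 11.06 MB/sec above can be reproduced from the MsgRate column if "MB" is read as 2^20 bytes, and the Gb/sec figures in the --report_gbits tables further down follow from MsgRate if "Gb" is read as 10^9 bits. The sketch below checks both against rows from this post. The function names are my own, and the unit interpretation is an assumption that happens to match the printed numbers.

```python
MIB = 2 ** 20  # assumption: perftest's "MB" means 2**20 bytes


def msg_rate_to_mib_per_sec(msg_rate_mpps: float, msg_bytes: int) -> float:
    """Convert a message rate (millions of messages/s) to MiB/s."""
    return msg_rate_mpps * 1e6 * msg_bytes / MIB


def msg_rate_to_gbit_per_sec(msg_rate_mpps: float, msg_bytes: int) -> float:
    """Convert a message rate (millions of messages/s) to Gb/s (10**9 bits)."""
    return msg_rate_mpps * 1e6 * msg_bytes * 8 / 1e9


# ib_atomic_bw row: 8 bytes at 1.449399 Mpps
atomic_bw = msg_rate_to_mib_per_sec(1.449399, 8)
# ib_write_bw row: 4096 bytes at 2.972274 Mpps
write_bw_4k = msg_rate_to_gbit_per_sec(2.972274, 4096)

print(f"atomic: {atomic_bw:.2f} MB/sec, write 4096B: {write_bw_4k:.2f} Gb/sec")
```

Both values agree with the "BW average" column of the corresponding rows, so for an 8-byte atomic the message rate is the more meaningful number.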

ib_atomic_lat 10.10.11.21 -F -n 100000
---------------------------------------------------------------------------------------
                    Atomic FETCH_AND_ADD Latency Test
 Dual-port       : OFF          Device         : mlx5_0
 Number of qps   : 1            Transport type : IB
 Connection type : RC           Using SRQ      : OFF
 PCIe relax order: ON
 ibv_wr* API     : ON
 TX depth        : 1
 Mtu             : 4096[B]
 Link type       : Ethernet
 GID index       : 4
 Outstand reads  : 16
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0000 QPN 0x0226 PSN 0xf7135e
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:10:11:11
 remote address: LID 0000 QPN 0x1188 PSN 0xeff4d8
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:10:11:21
---------------------------------------------------------------------------------------
 #bytes #iterations    t_min[usec]    t_max[usec]  t_typical[usec]    t_avg[usec]    t_stdev[usec]   99% percentile[usec]   99.9% percentile[usec]
 8       100000          2.65           10.36        2.72              2.73             0.10            2.87                    4.04
---------------------------------------------------------------------------------------


ib_write_lat -a 10.10.11.21 -F
---------------------------------------------------------------------------------------
                    RDMA_Write Latency Test
 Dual-port       : OFF          Device         : mlx5_0
 Number of qps   : 1            Transport type : IB
 Connection type : RC           Using SRQ      : OFF
 PCIe relax order: OFF
 ibv_wr* API     : ON
 TX depth        : 1
 Mtu             : 4096[B]
 Link type       : Ethernet
 GID index       : 4
 Max inline data : 220[B]
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0000 QPN 0x0227 PSN 0x8f570f RKey 0x1813ab VAddr 0x007f0d4e0b2000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:10:11:11
 remote address: LID 0000 QPN 0x1189 PSN 0x293652 RKey 0x1871f1 VAddr 0x007f23e376c000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:10:11:21
---------------------------------------------------------------------------------------
 #bytes #iterations    t_min[usec]    t_max[usec]  t_typical[usec]    t_avg[usec]    t_stdev[usec]   99% percentile[usec]   99.9% percentile[usec]
 2       1000          1.49           2.33         1.52                1.52             0.00            1.58                    2.33
 4       1000          1.39           2.31         1.41                1.41             0.00            1.43                    2.31
 8       1000          1.39           2.55         1.42                1.42             0.00            1.43                    2.55
 16      1000          1.39           2.80         1.42                1.42             0.00            1.43                    2.80
 32      1000          1.42           2.64         1.44                1.44             0.00            1.46                    2.64
 64      1000          1.43           2.42         1.46                1.47             0.00            1.55                    2.42
 128     1000          1.47           1.75         1.50                1.50             0.00            1.62                    1.75
 256     1000          2.22           2.47         2.25                2.26             0.00            2.41                    2.47
 512     1000          2.36           3.65         2.39                2.41             0.00            2.59                    3.65
 1024    1000          2.45           2.82         2.49                2.52             0.00            2.70                    2.82
 2048    1000          2.66           4.68         2.70                2.74             0.00            2.94                    4.68
 4096    1000          3.09           4.15         3.19                3.21             0.00            3.41                    4.15
 8192    1000          3.45           3.81         3.56                3.55             0.00            3.69                    3.81
 16384   1000          4.13           5.93         4.22                4.25             0.00            4.53                    5.93
 32768   1000          5.45           6.96         5.51                5.56             0.00            5.80                    6.96
 65536   1000          8.13           9.88         8.20                8.22             0.00            8.42                    9.88
 131072  1000          13.51          15.79        13.67               13.68            0.00            13.80                   15.79
 262144  1000          24.19          24.66        24.31               24.34            0.00            24.63                   24.66
 524288  1000          45.57          47.44        45.61               45.64            0.03            45.99                   47.44
 1048576 1000          88.37          89.54        89.20               89.12            0.00            89.48                   89.54
 2097152 1000          174.00         177.11       174.77              174.80           0.04            175.06                  177.11
 4194304 1000          345.85         346.20       345.99              345.99           0.00            346.14                  346.20
 8388608 1000          688.12         688.72       688.17              688.22           0.00            688.57                  688.72
---------------------------------------------------------------------------------------
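
For large messages the latency grows almost linearly with size, so the slope between two rows gives a rough effective bandwidth. The sketch below is my own back-of-the-envelope check, not something perftest reports. It uses the t_avg values of the 4194304-byte and 8388608-byte rows above and treats the fixed per-message cost as cancelling out.

```python
def slope_bandwidth_gbps(size_a: int, lat_a_usec: float,
                         size_b: int, lat_b_usec: float) -> float:
    """Effective bandwidth implied by the latency difference of two sizes."""
    extra_bits = (size_b - size_a) * 8
    extra_usec = lat_b_usec - lat_a_usec
    # bits per microsecond equals Mb/s; divide by 1000 to get Gb/s
    return extra_bits / extra_usec / 1e3


# t_avg for 4 MiB and 8 MiB from the ib_write_lat table
bw = slope_bandwidth_gbps(4194304, 345.99, 8388608, 688.22)
print(f"{bw:.2f} Gb/sec")
```

The result is about 98 Gb/sec, in line with the plateau of the ib_write_bw run.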


ib_write_bw -a 10.10.11.21 -F --report_gbits
---------------------------------------------------------------------------------------
                    RDMA_Write BW Test
 Dual-port       : OFF          Device         : mlx5_0
 Number of qps   : 1            Transport type : IB
 Connection type : RC           Using SRQ      : OFF
 PCIe relax order: ON
 ibv_wr* API     : ON
 TX depth        : 128
 CQ Moderation   : 100
 Mtu             : 4096[B]
 Link type       : Ethernet
 GID index       : 4
 Max inline data : 0[B]
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0000 QPN 0x0229 PSN 0xff1ace RKey 0x203dbd VAddr 0x007f2b168e1000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:10:11:11
 remote address: LID 0000 QPN 0x118b PSN 0xdeb80e RKey 0x027200 VAddr 0x007f1f13f19000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:10:11:21
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[Gb/sec]    BW average[Gb/sec]   MsgRate[Mpps]
 2          5000             0.11               0.11               6.881392
 4          5000             0.17               0.17               5.328560
 8          5000             0.48               0.48               7.554181
 16         5000             0.97               0.97               7.542184
 32         5000             1.94               1.94               7.570855
 64         5000             3.88               3.88               7.571238
 128        5000             7.74               7.74               7.556555
 256        5000             15.53              15.50              7.570179
 512        5000             30.89              30.84              7.530099
 1024       5000             59.02              58.95              7.195514
 2048       5000             89.85              89.65              5.471935
 4096       5000             97.47              97.40              2.972274
 8192       5000             97.75              97.72              1.491165
 16384      5000             97.89              97.88              0.746774
 32768      5000             97.96              97.96              0.373695
 65536      5000             98.01              98.00              0.186923
 131072     5000             98.02              98.01              0.093474
 262144     5000             98.03              98.03              0.046743
 524288     5000             98.03              98.03              0.023373
 1048576    5000             98.04              98.04              0.011687
 2097152    5000             98.04              98.04              0.005843
 4194304    5000             98.04              98.04              0.002922
 8388608    5000             98.04              98.04              0.001461
---------------------------------------------------------------------------------------
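
The bandwidth levels off at 98.04 Gb/sec rather than 100. A plausible explanation is per-packet overhead: with a 4096-byte MTU, every RoCE v2 packet also carries Ethernet, IP, UDP and InfiniBand transport framing. The sketch below estimates the payload ceiling. The frame layout is an assumption on my part (untagged Ethernet, IPv4, and ignoring the RETH header that only the first packet of a write carries), not something taken from the test output.

```python
# Assumed per-packet bytes on the wire besides the RDMA payload.
OVERHEAD_BYTES = {
    "preamble_sfd": 8,      # Ethernet preamble + start frame delimiter
    "ethernet_header": 14,  # no VLAN tag assumed
    "ipv4_header": 20,
    "udp_header": 8,
    "bth": 12,              # InfiniBand Base Transport Header
    "icrc": 4,              # InfiniBand invariant CRC
    "fcs": 4,               # Ethernet frame check sequence
    "inter_frame_gap": 12,
}


def roce_v2_goodput_gbps(line_rate_gbps: float, mtu_bytes: int) -> float:
    """Payload rate when every packet carries a full MTU of RDMA payload."""
    overhead = sum(OVERHEAD_BYTES.values())
    return line_rate_gbps * mtu_bytes / (mtu_bytes + overhead)


ceiling = roce_v2_goodput_gbps(100.0, 4096)
print(f"{ceiling:.2f} Gb/sec")
```

Under these assumptions the ceiling works out to the same 98.04 Gb/sec that the table shows, which suggests the link is saturated for messages of 64 KB and above.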


ib_read_lat -a 10.10.11.21 -F
---------------------------------------------------------------------------------------
                    RDMA_Read Latency Test
 Dual-port       : OFF          Device         : mlx5_0
 Number of qps   : 1            Transport type : IB
 Connection type : RC           Using SRQ      : OFF
 PCIe relax order: ON
 ibv_wr* API     : ON
 TX depth        : 1
 Mtu             : 4096[B]
 Link type       : Ethernet
 GID index       : 4
 Outstand reads  : 16
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0000 QPN 0x022a PSN 0xd157ea OUT 0x10 RKey 0x203dbd VAddr 0x007f87f9bad000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:10:11:11
 remote address: LID 0000 QPN 0x118c PSN 0x313e2c OUT 0x10 RKey 0x027200 VAddr 0x007fd732cf8000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:10:11:21
---------------------------------------------------------------------------------------
 #bytes #iterations    t_min[usec]    t_max[usec]  t_typical[usec]    t_avg[usec]    t_stdev[usec]   99% percentile[usec]   99.9% percentile[usec]
 2       1000          2.66           2.95         2.71                2.72             0.00            2.92                    2.95
 4       1000          2.67           3.63         2.71                2.72             0.00            2.84                    3.63
 8       1000          2.72           3.22         2.78                2.79             0.00            3.02                    3.22
 16      1000          2.84           3.30         2.89                2.93             0.00            3.27                    3.30
 32      1000          2.84           3.30         2.89                2.93             0.00            3.26                    3.30
 64      1000          2.84           3.27         2.90                2.93             0.00            3.24                    3.27
 128     1000          2.87           3.74         2.93                2.93             0.00            3.01                    3.74
 256     1000          2.93           3.65         3.00                3.01             0.00            3.18                    3.65
 512     1000          3.03           3.47         3.10                3.13             0.00            3.33                    3.47
 1024    1000          3.16           3.63         3.23                3.26             0.00            3.44                    3.63
 2048    1000          3.39           4.03         3.47                3.51             0.00            3.91                    4.03
 4096    1000          3.93           4.37         4.04                4.05             0.00            4.22                    4.37
 8192    1000          4.48           4.97         4.63                4.65             0.00            4.88                    4.97
 16384   1000          5.28           6.10         5.55                5.59             0.00            5.96                    6.10
 32768   1000          6.66           7.63         6.90                6.97             0.00            7.51                    7.63
 65536   1000          9.36           12.17        9.76                9.80             0.03            10.59                   12.17
 131072  1000          14.06          17.44        14.24               14.41            0.29            16.09                   17.44
 262144  1000          24.75          28.09        24.91               24.95            0.10            25.33                   28.09
 524288  1000          46.15          54.60        46.31               46.60            1.06            53.16                   54.60
 1048576 1000          88.93          102.07       89.07               89.39            1.58            100.08                  102.07
 2097152 1000          174.49         201.24       175.41              175.52           2.04            179.07                  201.24
 4194304 1000          345.65         347.28       346.62              346.63           0.00            346.97                  347.28
 8388608 1000          688.47         781.16       688.81              688.93           1.36            689.32                  781.16
---------------------------------------------------------------------------------------


ib_read_bw -a 10.10.11.21 -F --report_gbits
---------------------------------------------------------------------------------------
                    RDMA_Read BW Test
 Dual-port       : OFF          Device         : mlx5_0
 Number of qps   : 1            Transport type : IB
 Connection type : RC           Using SRQ      : OFF
 PCIe relax order: ON
 ibv_wr* API     : ON
 TX depth        : 128
 CQ Moderation   : 100
 Mtu             : 4096[B]
 Link type       : Ethernet
 GID index       : 4
 Outstand reads  : 16
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0000 QPN 0x022b PSN 0x1959d2 OUT 0x10 RKey 0x203dbd VAddr 0x007ff2ed43d000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:10:11:11
 remote address: LID 0000 QPN 0x118d PSN 0x68625d OUT 0x10 RKey 0x027200 VAddr 0x007fad14218000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:10:11:21
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[Gb/sec]    BW average[Gb/sec]   MsgRate[Mpps]
 2          1000           0.069752            0.068694            4.293375
 4          1000             0.17               0.17               5.186768
 8          1000             0.34               0.34               5.290751
 16         1000             0.67               0.67               5.251310
 32         1000             1.35               1.34               5.235130
 64         1000             2.71               2.70               5.282983
 128        1000             5.41               5.41               5.280663
 256        1000             10.73              10.71              5.228649
 512        1000             21.73              21.69              5.295133
 1024       1000             41.26              41.17              5.025470
 2048       1000             61.95              61.89              3.777417
 4096       1000             83.82              83.80              2.557335
 8192       1000             85.08              85.06              1.297944
 16384      1000             88.15              88.13              0.672377
 32768      1000             97.85              97.84              0.373236
 65536      1000             97.94              97.94              0.186813
 131072     1000             97.99              97.99              0.093450
 262144     1000             98.02              98.02              0.046739
 524288     1000             98.02              98.02              0.023370
 1048576    1000             98.03              98.03              0.011686
 2097152    1000             98.04              98.04              0.005843
 4194304    1000             98.04              98.04              0.002922
 8388608    1000             98.04              98.04              0.001461
---------------------------------------------------------------------------------------


ib_send_lat -a 10.10.11.21 -F
---------------------------------------------------------------------------------------
                    Send Latency Test
 Dual-port       : OFF          Device         : mlx5_0
 Number of qps   : 1            Transport type : IB
 Connection type : RC           Using SRQ      : OFF
 PCIe relax order: ON
 ibv_wr* API     : ON
 TX depth        : 1
 Mtu             : 4096[B]
 Link type       : Ethernet
 GID index       : 4
 Max inline data : 236[B]
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0000 QPN 0x022c PSN 0xb7944e
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:10:11:11
 remote address: LID 0000 QPN 0x118e PSN 0x3a106
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:10:11:21
---------------------------------------------------------------------------------------
 #bytes #iterations    t_min[usec]    t_max[usec]  t_typical[usec]    t_avg[usec]    t_stdev[usec]   99% percentile[usec]   99.9% percentile[usec]
 2       1000          1.37           6.00         1.41                1.42             0.06            1.52                    6.00
 4       1000          1.29           2.84         1.33                1.33             0.00            1.36                    2.84
 8       1000          1.29           2.28         1.33                1.32             0.00            1.35                    2.28
 16      1000          1.31           2.28         1.34                1.34             0.00            1.36                    2.28
 32      1000          1.31           2.29         1.34                1.34             0.00            1.37                    2.29
 64      1000          1.36           3.16         1.38                1.39             0.00            1.44                    3.16
 128     1000          1.40           3.67         1.45                1.45             0.00            1.48                    3.67
 256     1000          2.11           4.55         2.17                2.18             0.03            2.36                    4.55
 512     1000          2.21           3.36         2.26                2.27             0.00            2.46                    3.36
 1024    1000          2.32           4.07         2.36                2.38             0.00            2.63                    4.07
 2048    1000          2.52           4.77         2.58                2.61             0.07            2.76                    4.77
 4096    1000          2.96           4.73         3.02                3.07             0.04            3.40                    4.73
 8192    1000          3.32           7.38         3.43                3.44             0.07            3.66                    7.38
 16384   1000          3.99           13.20        4.05                4.10             0.10            4.33                    13.20
 32768   1000          5.33           5.71         5.50                5.47             0.00            5.65                    5.71
 65536   1000          8.00           10.96        8.05                8.11             0.04            8.42                    10.96
 131072  1000          13.36          14.58        13.49               13.49            0.00            13.74                   14.58
 262144  1000          24.05          25.55        24.10               24.16            0.00            24.45                   25.55
 524288  1000          45.45          46.64        45.60               45.62            0.00            46.11                   46.64
 1048576 1000          88.23          90.04        89.11               89.01            0.00            89.46                   90.04
 2097152 1000          174.19         178.18       174.74              174.75           0.04            174.98                  178.18
 4194304 1000          345.46         349.33       345.99              345.99           0.03            346.18                  349.33
 8388608 1000          687.92         689.08       688.07              688.08           0.00            688.25                  689.08
---------------------------------------------------------------------------------------


ib_send_bw -a 10.10.11.21 -F --report_gbits
---------------------------------------------------------------------------------------
                    Send BW Test
 Dual-port       : OFF          Device         : mlx5_0
 Number of qps   : 1            Transport type : IB
 Connection type : RC           Using SRQ      : OFF
 PCIe relax order: ON
 ibv_wr* API     : ON
 TX depth        : 128
 CQ Moderation   : 100
 Mtu             : 4096[B]
 Link type       : Ethernet
 GID index       : 4
 Max inline data : 0[B]
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0000 QPN 0x022d PSN 0xcc91a
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:10:11:11
 remote address: LID 0000 QPN 0x118f PSN 0xdc89f
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:10:11:21
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[Gb/sec]    BW average[Gb/sec]   MsgRate[Mpps]
 2          1000           0.052205            0.051348            3.209238
 4          1000             0.17               0.17               5.221836
 8          1000             0.34               0.34               5.304961
 16         1000             0.68               0.68               5.320089
 32         1000             1.37               1.36               5.327195
 64         1000             2.71               2.68               5.226214
 128        1000             5.44               5.44               5.308179
 256        1000             10.89              10.88              5.313214
 512        1000             21.73              21.70              5.298613
 1024       1000             43.10              43.08              5.258549
 2048       1000             79.29              79.22              4.835111
 4096       1000             95.93              95.89              2.926339
 8192       1000             96.92              96.91              1.478727
 16384      1000             97.53              97.52              0.744006
 32768      1000             97.77              97.77              0.372949
 65536      1000             97.91              97.91              0.186742
 131072     1000             97.97              97.97              0.093433
 262144     1000             98.00              97.99              0.046727
 524288     1000             98.02              98.02              0.023370
 1048576    1000             98.03              98.03              0.011686
 2097152    1000             98.03              98.03              0.005843
 4194304    1000             98.03              98.03              0.002922
 8388608    1000             98.04              98.04              0.001461
---------------------------------------------------------------------------------------

OpenMPI

mpirun -np 2 -host gpu01,gpu02 -x UCX_NET_DEVICES=mlx5_0:1 osu_latency
# OSU MPI Latency Test v7.2
# Size          Latency (us)
# Datatype: MPI_CHAR.
1                       1.35
2                       1.33
4                       1.33
8                       1.33
16                      1.33
32                      1.39
64                      1.50
128                     1.51
256                     1.82
512                     1.82
1024                    1.96
2048                    2.93
4096                    3.67
8192                    4.75
16384                   6.94
32768                   8.96
65536                  11.75
131072                 16.81
262144                 26.37
524288                 47.83
1048576                91.22
2097152               176.93
4194304               348.30


mpirun -np 2 -host gpu01,gpu02 -x UCX_NET_DEVICES=mlx5_0:1 osu_bw
# OSU MPI Bandwidth Test v7.2
# Size      Bandwidth (MB/s)
# Datatype: MPI_CHAR.
1                       5.92
2                      11.76
4                      23.72
8                      47.17
16                     94.75
32                    190.71
64                    357.15
128                   726.07
256                  1269.35
512                  2345.14
1024                 4027.80
2048                 6507.12
4096                 9284.14
8192                10689.40
16384               11354.37
32768               11692.90
65536               11980.76
131072              12118.09
262144              12185.53
524288              12219.85
1048576             12237.13
2097152             12245.03
4194304             12249.79
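
To compare this with the perftest numbers, the MB/s values need converting to Gb/sec. The sketch below assumes the OSU benchmark counts 1 MB as 10^6 bytes; the helper name is my own.

```python
def osu_mb_per_sec_to_gbps(mb_per_sec: float) -> float:
    """Convert OSU bandwidth (MB/s, assuming 1 MB = 10**6 bytes) to Gb/s."""
    return mb_per_sec * 1e6 * 8 / 1e9


# Peak osu_bw value from the table above (4194304-byte messages)
peak = osu_mb_per_sec_to_gbps(12249.79)
print(f"{peak:.2f} Gb/sec")
```

That is roughly 98 Gb/sec, so MPI over UCX on mlx5_0 gets very close to the raw ib_write_bw result.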


mpirun -np 2 -host gpu01,gpu02 -x UCX_NET_DEVICES=ens121np0 osu_latency
# OSU MPI Latency Test v7.2
# Size          Latency (us)
# Datatype: MPI_CHAR.
1                      13.48
2                      13.22
4                      13.30
8                      13.38
16                     13.25
32                     13.24
64                     13.32
128                    13.44
256                    13.68
512                    13.87
1024                   14.15
2048                   14.82
4096                   15.76
8192                   53.69
16384                  44.04
32768                  51.87
65536                  67.73
131072                133.40
262144                156.58
524288                237.20
1048576               363.09
2097152               530.48
4194304               858.22


mpirun -np 2 -host gpu01,gpu02 -x UCX_NET_DEVICES=ens121np0 osu_bw
# OSU MPI Bandwidth Test v7.2
# Size      Bandwidth (MB/s)
# Datatype: MPI_CHAR.
1                       0.66
2                       1.71
4                       3.72
8                       7.06
16                     12.35
32                     27.62
64                     42.68
128                    63.58
256                   151.93
512                   402.99
1024                  825.05
2048                 1253.38
4096                 2052.76
8192                 2204.76
16384                2328.53
32768                1932.05
65536                2558.81
131072               4215.51
262144               5911.52
524288               6973.49
1048576              6751.61
2097152              6457.68
4194304              6459.98
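
To put the two OSU runs side by side, the sketch below computes a few ratios from the tables above. I am assuming that selecting ens121np0 makes UCX fall back to its TCP transport on the Ethernet interface while mlx5_0:1 uses RDMA; the dictionary keys are just labels I made up.

```python
# Values copied from the osu_latency / osu_bw tables in this post.
rdma = {"lat_1B_us": 1.35, "lat_4MiB_us": 348.30, "bw_4MiB_MBps": 12249.79}
eth = {"lat_1B_us": 13.48, "lat_4MiB_us": 858.22, "bw_4MiB_MBps": 6459.98}

small_lat_ratio = eth["lat_1B_us"] / rdma["lat_1B_us"]
large_lat_ratio = eth["lat_4MiB_us"] / rdma["lat_4MiB_us"]
bw_fraction = eth["bw_4MiB_MBps"] / rdma["bw_4MiB_MBps"]

print(f"1-byte latency:   {small_lat_ratio:.1f}x higher on ens121np0")
print(f"4 MiB latency:    {large_lat_ratio:.1f}x higher on ens121np0")
print(f"4 MiB bandwidth:  {bw_fraction:.0%} of the mlx5_0:1 result")
```

In this setup the small-message latency differs by about an order of magnitude, and the large-message bandwidth on ens121np0 is roughly half of what mlx5_0:1 reaches.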
