Original post: http://kerrigan.sinaapp.com/post-8.html
Author: Roger
1. About pktgen
pktgen is a high-performance packet generator included in the Linux kernel, used mainly for network performance testing. In most cases pktgen is enough to saturate a gigabit NIC, so there is no need to buy expensive hardware traffic generators. Because pktgen runs in kernel space, it can reach very high packet rates without using much in the way of system resources.
pktgen only sends UDP packets (to port 9, the discard port). It is a very low-level tool, generally used to test the performance of network devices rather than anything at the application layer. To benchmark higher-level network applications, use other tools.
Recommended reading:
pktgen homepage: http://tslab.ssvl.kth.se/pktgen/
pktgen paper: www.kernel.org/doc/ols/2005/ols2005v2-pages-19-32.pdf
2. The experiment
Environment:
Summary: Dell R620, 2 x Intel 2.00GHz, 47.2GB / 48GB 1333MHz DDR3
Network: em1 (tg3): Broadcom NetXtreme BCM5720 Gigabit PCIe, d4:ae:52:7c:8d:90, 1Gb/s
Network: em2 (tg3): Broadcom NetXtreme BCM5720 Gigabit PCIe, d4:ae:52:7c:8d:90, no carrier
Network: em3 (tg3): Broadcom NetXtreme BCM5720 Gigabit PCIe, d4:ae:52:7c:8d:92, 1Gb/s
Network: em4 (tg3): Broadcom NetXtreme BCM5720 Gigabit PCIe, d4:ae:52:7c:8d:93, 1Gb/s
We use a single server, with em3 and em4 connected directly to each other with a network cable. This is the most basic setup for learning how to use pktgen.
+-----------+
|   R620    |
|       em3 +----+
|           |    |
|           |    |
|       em4 +----+
+-----------+
2.1 Upgrade the kernel, the NIC driver, and pktgen to the latest versions (not strictly required, but the documentation recommends a kernel of 2.6.38 or later; see the official docs for details).
2.2 Load the pktgen kernel module.
[root@R620 ~]# modprobe pktgen
[root@R620 ~]# modinfo pktgen
filename:       /lib/modules/2.6.39/kernel/net/core/pktgen.ko
version:        2.75
license:        GPL
description:    Packet Generator tool
author:         Robert Olsson
srcversion:     0F4DEF57CF501778B10F44B
depends:
vermagic:       2.6.39 SMP mod_unload modversions
parm:           pg_count_d:Default number of packets to inject (int)
parm:           pg_delay_d:Default delay between packets (nanoseconds) (int)
parm:           pg_clone_skb_d:Default number of copies of the same packet (int)
parm:           debug:Enable debugging of pktgen module (int)
With the module loaded, pktgen shows up both in the /proc filesystem and as a set of kernel threads.
[root@R620 ~]# ls /proc/net/pktgen/
em3          kpktgend_10  kpktgend_13  kpktgend_16  kpktgend_19  kpktgend_21  kpktgend_3  kpktgend_6  kpktgend_9
kpktgend_0   kpktgend_11  kpktgend_14  kpktgend_17  kpktgend_2   kpktgend_22  kpktgend_4  kpktgend_7  pgctrl
kpktgend_1   kpktgend_12  kpktgend_15  kpktgend_18  kpktgend_20  kpktgend_23  kpktgend_5  kpktgend_8  pgrx
[root@R620 ~]# ps ax | grep kpktgend
 3054 ?        S     12:06 [kpktgend_0]
 3055 ?        S      0:07 [kpktgend_1]
 3056 ?        S      0:07 [kpktgend_2]
 3057 ?        S      0:07 [kpktgend_3]
 3058 ?        S      0:07 [kpktgend_4]
 3059 ?        S      0:06 [kpktgend_5]
 3060 ?        S      0:07 [kpktgend_6]
 3061 ?        S      0:07 [kpktgend_7]
 3062 ?        S      0:06 [kpktgend_8]
 3063 ?        S      0:07 [kpktgend_9]
 3064 ?        S      0:07 [kpktgend_10]
 3065 ?        S      0:07 [kpktgend_11]
 3066 ?        S      0:06 [kpktgend_12]
 3067 ?        S      0:06 [kpktgend_13]
 3068 ?        S      0:05 [kpktgend_14]
 3069 ?        S      0:06 [kpktgend_15]
 3070 ?        S      0:06 [kpktgend_16]
 3071 ?        S      0:06 [kpktgend_17]
 3072 ?        S      0:06 [kpktgend_18]
 3073 ?        S      0:07 [kpktgend_19]
 3074 ?        S      0:06 [kpktgend_20]
 3075 ?        S      0:06 [kpktgend_21]
 3076 ?        S      0:07 [kpktgend_22]
 3077 ?        S      0:06 [kpktgend_23]
 6835 pts/0    S+     0:00 grep kpktgend
Each of these kernel threads is bound to one CPU; this server has 24 cores, hence 24 threads.
2.3 Download the sample pktgen configuration files
Configuring pktgen is fairly involved, so the project site provides a rich set of sample configuration files. Download them:
[root@R620 tmp]# mkdir pktgen
[root@R620 tmp]# cd pktgen
[root@R620 pktgen]# lftp -e mirror ftp://robur.slu.se/pub/Linux/net-development/pktgen-testing/
[root@R620 pktgen]# ls
examples  old  pktgen.c  pktgen.c.030511  pktgen-HOWTO.txt  pktgen_paper.pdf  pktgen.sh  TODO
pktgen.sh is the configuration file we will use here; more samples live in the examples directory.
2.4 Adapt the configuration file to our environment
Let's edit pktgen.sh. Since we are using a single server with its NICs cross-connected, the configuration looks like this:
[root@R620 pktgen]# cp pktgen.sh test.sh
[root@R620 pktgen]# vi test.sh
[root@R620 pktgen]# cat test.sh
#!/bin/sh
# pktgen.conf -- Sample configuration for send on two devices on a UP system

#modprobe pktgen

function pgset() {
    local result

    echo $1 > $PGDEV

    result=`cat $PGDEV | fgrep "Result: OK:"`
    if [ "$result" = "" ]; then
        cat $PGDEV | fgrep Result:
    fi
}

function pg() {
    echo inject > $PGDEV
    cat $PGDEV
}

# On UP systems only one thread exists -- so just add devices
# We use eth1, eth2

echo "Adding devices to run".

PGDEV=/proc/net/pktgen/kpktgend_0
pgset "rem_device_all"
pgset "add_device em3"
pgset "max_before_softirq 10000"

# Configure the individual devices
echo "Configuring devices"

PGDEV=/proc/net/pktgen/em3

pgset "clone_skb 10000"
pgset "pkt_size 60"
pgset "src_mac D4:AE:52:7C:8D:92"
pgset "src_min 10.0.0.1"
pgset "src_max 10.255.255.255"
#pgset "dst 10.10.10.3"
pgset "dst_mac D4:AE:52:7C:8D:93"
pgset "count 0"

# Time to run

PGDEV=/proc/net/pktgen/pgctrl

echo "Running... ctrl^C to stop"

pgset "start"

echo "Done"
Note that although the configuration says pgset "pkt_size 60", the packets on the wire will actually be 64 bytes once the 4-byte CRC (FCS) is added. So to test 64-byte throughput, write 60 here, and so on for other sizes.
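As a reminder of the arithmetic, here is a tiny sketch mapping common wire frame sizes to the pkt_size value to configure (the sizes listed are just the usual RFC 2544 test points):

```shell
# pktgen's pkt_size excludes the 4-byte Ethernet FCS/CRC, so the frame on the
# wire is pkt_size + 4. To test N-byte frames, configure pkt_size N-4.
for wire in 64 128 256 512 1024 1518; do
    echo "wire frame ${wire} bytes -> pgset \"pkt_size $((wire - 4))\""
done
```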
This setup sends on em3 and receives on em4. Next, set up the interrupt affinity bindings for em4:
[root@R620 pktgen]# service irqbalance stop
Stopping irqbalance: [FAILED]
[root@R620 pktgen]# echo 10 > /proc/irq/133/smp_affinity
[root@R620 pktgen]# echo 40 > /proc/irq/134/smp_affinity
[root@R620 pktgen]# echo 100 > /proc/irq/135/smp_affinity
[root@R620 pktgen]# echo 400 > /proc/irq/136/smp_affinity
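smp_affinity takes a hexadecimal CPU bitmask, where bit N selects CPU N, so the masks above (10, 40, 100, 400) pin these four em4 interrupts to CPUs 4, 6, 8, and 10. The IRQ numbers 133-136 are specific to this machine; check /proc/interrupts for yours. The mask arithmetic, as a sketch:

```shell
# A CPU bitmask for smp_affinity is simply 1 << cpu, printed in hex.
for cpu in 4 6 8 10; do
    printf 'CPU %-2d -> smp_affinity mask %x\n' "$cpu" $((1 << cpu))
done
# prints masks 10, 40, 100, 400
```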
2.5 Run the test
Run test.sh, then interrupt it after a few seconds. Because we set pgset "count 0", it sends packets indefinitely.
[root@R620 pktgen]# sh test.sh
Adding devices to run.
Configuring devices
Running... ctrl^C to stop
^C
[root@R620 pktgen]#
When the test is done, look at the statistics.
[root@R620 pktgen]# cat /proc/net/pktgen/em3
Params: count 0  min_pkt_size: 60  max_pkt_size: 60
     frags: 0  delay: 0  clone_skb: 10000  ifname: em3
     flows: 0 flowlen: 0
     queue_map_min: 0 queue_map_max: 0
     dst_min:   dst_max:
     src_min: 10.10.10.2  src_max: 10.255.255.255
     src_mac: d4:ae:52:7c:8d:92 dst_mac: d4:ae:52:7c:8d:93
     udp_src_min: 9  udp_src_max: 9  udp_dst_min: 9  udp_dst_max: 9
     src_mac_count: 0  dst_mac_count: 0
     Flags:
Current:
     pkts-sofar: 72155155  errors: 0
     started: 491707033us  stopped: 540275587us idle: 545us
     seq_num: 72155156  cur_dst_mac_offset: 0  cur_src_mac_offset: 0
     cur_saddr: 0x32260a0a  cur_daddr: 0x0
     cur_udp_dst: 9  cur_udp_src: 9
     cur_queue_map: 0
     flows: 0
Result: OK: 48568554(c48568008+d545) usec, 72155155 (60byte,0frags)
  1485635pps 713Mb/sec (713104800bps) errors: 0
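The "Result:" line carries the headline throughput figures. As a quick sketch, the pps number can be pulled out with standard tools (the sample line below is hard-coded from the output above; in practice you would read /proc/net/pktgen/em3 directly):

```shell
# Extract the packets-per-second figure from a pktgen result line.
result='1485635pps 713Mb/sec (713104800bps) errors: 0'
pps=$(echo "$result" | grep -o '[0-9]*pps' | tr -d 'ps')
echo "measured: ${pps} pps"
# prints: measured: 1485635 pps
```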
From this we can see that with 64-byte packets, our send rate is about 1.49 Mpps, which is line rate. Now check the sar statistics:
[root@R620 ~]# sar -n DEV 2
Linux 2.6.39 (R620.co3)    07/11/2012    _x86_64_    (24 CPU)

09:12:11 PM  IFACE  rxpck/s     txpck/s     rxkB/s    txkB/s    rxcmp/s  txcmp/s  rxmcst/s
09:12:13 PM  lo     0.00        0.00        0.00      0.00      0.00     0.00     0.00
09:12:13 PM  em1    15.50       13.50       1.60      4.09      0.00     0.00     0.50
09:12:13 PM  em2    0.00        0.00        0.00      0.00      0.00     0.00     0.00
09:12:13 PM  em3    0.00        1491070.50  0.00      93191.75  0.00     0.00     0.00
09:12:13 PM  em4    1491062.00  0.00        93191.38  0.00      0.00     0.00     0.00
Indeed, on this single machine the send and receive counts match, both at line rate.
In addition, the latest pktgen provides an extra receive-side statistics tool. It must be enabled before the test; em4 is the receiving port:
[root@R620 ~]# echo rx em4 > /proc/net/pktgen/pgrx
After the test, read this file. It breaks down the received packets per CPU core, which is very handy.
[root@R620 pktgen]# cat /proc/net/pktgen/pgrx
RECEPTION STATISTICS
PER-CPU Stats
CPU 0:  Rx packets: 0  Rx bytes: 0
CPU 1:  Rx packets: 0  Rx bytes: 0
CPU 2:  Rx packets: 0  Rx bytes: 0
CPU 3:  Rx packets: 0  Rx bytes: 0
CPU 4:  Rx packets: 1990624  Rx bytes: 119437440
        Work time 5394665 us
        Rate: 368998pps 177Mb/sec (177119342bps)
CPU 5:  Rx packets: 0  Rx bytes: 0
CPU 6:  Rx packets: 2041302  Rx bytes: 122478120
        Work time 5427048 us
        Rate: 376134pps 180Mb/sec (180544738bps)
CPU 7:  Rx packets: 0  Rx bytes: 0
CPU 8:  Rx packets: 2033521  Rx bytes: 122011260
        Work time 5396417 us
        Rate: 376827pps 180Mb/sec (180877437bps)
CPU 9:  Rx packets: 0  Rx bytes: 0
CPU 10: Rx packets: 2010000  Rx bytes: 120600000
        Work time 5376251 us
        Rate: 373866pps 179Mb/sec (179455907bps)
CPU 11: Rx packets: 0  Rx bytes: 0
CPU 12: Rx packets: 0  Rx bytes: 0
CPU 13: Rx packets: 0  Rx bytes: 0
CPU 14: Rx packets: 0  Rx bytes: 0
CPU 15: Rx packets: 0  Rx bytes: 0
CPU 16: Rx packets: 0  Rx bytes: 0
CPU 17: Rx packets: 0  Rx bytes: 0
CPU 18: Rx packets: 0  Rx bytes: 0
CPU 19: Rx packets: 0  Rx bytes: 0
CPU 20: Rx packets: 0  Rx bytes: 0
CPU 21: Rx packets: 0  Rx bytes: 0
CPU 22: Rx packets: 0  Rx bytes: 0
CPU 23: Rx packets: 0  Rx bytes: 0

Global Statistics
Packets Rx: 8075447  Bytes Rx: 484526820
Work time 5427048 us
Rate: 1487999pps 714Mb/sec (714239962bps)
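The global counter at the bottom is just the sum of the per-CPU counters. A quick check, summing the four busy CPUs from the sample above with awk:

```shell
# Sum the per-CPU Rx packet counts (field 5 on each "Rx packets:" line).
awk '/Rx packets:/ { sum += $5 } END { print "total:", sum }' <<'EOF'
CPU 4:  Rx packets: 1990624  Rx bytes: 119437440
CPU 6:  Rx packets: 2041302  Rx bytes: 122478120
CPU 8:  Rx packets: 2033521  Rx bytes: 122011260
CPU 10: Rx packets: 2010000  Rx bytes: 120600000
EOF
# prints: total: 8075447
```

This matches the "Packets Rx: 8075447" figure in the Global Statistics section.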
3. A different test setup
In the real world you never see a server's NICs cabled back to back like this, so let's bring in a second server and connect two NICs between the two machines.
+-----------+     +-----------+
|   R620    |     |   R720    |
|       em3 +-----+ em3       |
|           |     |           |
|           |     |           |
|       em4 +-----+ em4       |
+-----------+     +-----------+
R620 sends from em3; R720 receives on em3 and forwards the packets back out of em4 to R620's em4. This tests forwarding performance.
The configuration file:
[root@R620 pktgen]# cat test.sh
#!/bin/sh
# pktgen.conf -- Sample configuration for send on two devices on a UP system

#modprobe pktgen

function pgset() {
    local result

    echo $1 > $PGDEV

    result=`cat $PGDEV | fgrep "Result: OK:"`
    if [ "$result" = "" ]; then
        cat $PGDEV | fgrep Result:
    fi
}

function pg() {
    echo inject > $PGDEV
    cat $PGDEV
}

# On UP systems only one thread exists -- so just add devices
# We use eth1, eth2

echo "Adding devices to run".

PGDEV=/proc/net/pktgen/kpktgend_0
pgset "rem_device_all"
pgset "add_device em3"
pgset "max_before_softirq 10000"

# Configure the individual devices
echo "Configuring devices"

PGDEV=/proc/net/pktgen/em3

pgset "clone_skb 10000"
pgset "pkt_size 60"
pgset "src_mac D4:AE:52:7C:8D:92"
pgset "src_min 10.10.10.2"
#pgset "src_max 10.255.255.255"
pgset "dst 192.168.0.2"
pgset "dst_mac 24:B6:FD:F4:95:1A"
pgset "count 0"

#

# Time to run

PGDEV=/proc/net/pktgen/pgctrl

echo "Running... ctrl^C to stop"

pgset "start"

echo "Done"
This is the sender-side (R620) configuration: the source MAC and IP belong to our own em3, the destination MAC is the peer's em3, and the destination IP is our own em4, so that R720 forwards the packets back to us.
Start sending on R620, and enable forwarding on R720:
[root@r720xd-a ~]# echo 1 > /proc/sys/net/ipv4/conf/all/forwarding
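For reference, here is a sketch of the rest of the R720-side setup this test implicitly assumes (the 192.168.0.x addressing and the static ARP entry are illustrative, not from the original transcript; adjust to your lab). A static ARP entry for the destination IP can help, since a host being flooded by pktgen traffic may be slow to answer ARP:

```shell
# R720 side (sketch; addresses and device names are assumptions).
echo 1 > /proc/sys/net/ipv4/conf/all/forwarding   # enable IPv4 routing
ip addr add 192.168.0.1/24 dev em4                # em4 faces R620's em4
# Static ARP for the pktgen target IP (R620 em4's MAC), in case ARP
# resolution is unreliable under load:
arp -i em4 -s 192.168.0.2 d4:ae:52:7c:8d:93
```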
Check the results on R720:
[root@r720xd-a ~]# sar -n DEV 10
Linux 2.6.32-220.el6.x86_64 (r720xd-a.co3)    07/11/2012    _x86_64_    (24 CPU)

07:34:46 PM  IFACE  rxpck/s     txpck/s     rxkB/s    txkB/s    rxcmp/s  txcmp/s  rxmcst/s
07:34:56 PM  lo     0.00        0.00        0.00      0.00      0.00     0.00     0.00
07:34:56 PM  em1    2.50        0.60        0.75      0.07      0.00     0.00     0.50
07:34:56 PM  em2    0.00        0.00        0.00      0.00      0.00     0.00     0.00
07:34:56 PM  em3    1488032.20  0.00        93002.01  0.00      0.00     0.00     0.00
07:34:56 PM  em4    0.10        1053511.30  0.11      65844.43  0.00     0.00     0.00
R720 receives at 1.488 Mpps, which is line rate, but forwards only about 1.05 Mpps, a packet loss rate of roughly 30%.
We can also see a lot of frame alignment errors; whether they are the cause of the loss is unclear and needs further investigation.
[root@r720xd-a ~]# sar -n EDEV 10
Linux 2.6.32-220.el6.x86_64 (r720xd-a.co3)    07/11/2012    _x86_64_    (24 CPU)

07:35:45 PM  IFACE  rxerr/s  txerr/s  coll/s  rxdrop/s  txdrop/s  txcarr/s  rxfram/s   rxfifo/s  txfifo/s
07:35:55 PM  lo     0.00     0.00     0.00    0.00      0.00      0.00      0.00       0.00      0.00
07:35:55 PM  em1    0.00     0.00     0.00    0.00      0.00      0.00      0.00       0.00      0.00
07:35:55 PM  em2    0.00     0.00     0.00    0.00      0.00      0.00      0.00       0.00      0.00
07:35:55 PM  em3    0.00     0.00     0.00    1.00      0.00      0.00      16268.00   0.00      0.00
07:35:55 PM  em4    0.00     0.00     0.00    0.00      0.00      0.00      0.00       0.00      0.00
4. A note on line rate
RFC 5180 describes it:
Appendix A. Theoretical Maximum Frame Rates Reference

   This appendix provides the formulas to calculate and the values for
   the theoretical maximum frame rates for two media types: Ethernet and
   SONET.

A.1. Ethernet

   The throughput in frames per second (fps) for various Ethernet
   interface types and for a frame size X can be calculated with the
   following formula:

           Line Rate (bps)
   ------------------------------
   (8bits/byte)*(X+20)bytes/frame

   The 20 bytes in the formula is the sum of the preamble (8 bytes) and
   the inter-frame gap (12 bytes). The throughput for various Ethernet
   interface types and frame sizes:

   Size     10Mb/s    100Mb/s    1000Mb/s     10000Mb/s
   Bytes    pps       pps        pps          pps

   64       14,880    148,809    1,488,095    14,880,952
   128      8,445     84,459     844,594      8,445,945
   256      4,528     45,289     452,898      4,528,985
   512      2,349     23,496     234,962      2,349,624
   1024     1,197     11,973     119,731      1,197,318
   1280     961       9,615      96,153       961,538
   1518     812       8,127      81,274       812,743
   1522     810       8,106      81,063       810,635
   2048     604       6,044      60,444       604,448
   4096     303       3,036      30,369       303,699
   8192     152       1,522      15,221       152,216
   9216     135       1,353      13,534       135,339

   Note: Ethernet's maximum frame rates are subject to variances due to
   clock slop. The listed rates are theoretical maximums, and actual
   tests should account for a +/- 100 ppm tolerance.
The following is from around the web:
Layer-2 line rate refers to switching capacity, measured in Gbps; layer-3 line rate refers to the packet forwarding rate, measured in Mpps.
A gigabit port's packet forwarding rate is 1.488 Mpps (0.1488 Mpps for a 100 Mb port, and so on):
1,000,000,000 / 8 / (64 + 8 + 12) = 1,488,095 pps (minimum frame size of 64 bytes, plus an 8-byte preamble and a 12-byte inter-frame gap)
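The formula is easy to sanity-check against the RFC 5180 table, for example with shell arithmetic (frame sizes chosen arbitrarily from the table):

```shell
# Theoretical max frame rate at 1 Gb/s: line_rate / (8 * (frame + 20)),
# where 20 = 8-byte preamble + 12-byte inter-frame gap (the 4-byte FCS
# is already included in the frame size).
rate=1000000000
for size in 64 128 512 1518; do
    echo "${size} bytes -> $(( rate / (8 * (size + 20)) )) pps"
done
# 64 bytes -> 1488095 pps, matching the table above
```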
5. Open issues
1. When testing with pinned interrupts, which CPUs the interrupts are bound to strongly affects rxfram/s, which in turn has a big impact on the results. I suspect a problem with the BCM5720 NIC; this needs further study.