Over the past couple of days I have been using
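During testing I found that pinging the bond interface's IP consistently lost half of the packets. A capture on the switch showed that both ports connected to the server were forwarding packets, but when I ran tcpdump on the two ports on the server side, only one port saw the ICMP echo requests; tcpdump saw nothing at all on the other port. The packets were clearly reaching the server, so why were they dropped at the link layer instead of being passed up the stack? After most of a day of troubleshooting, I noticed in the ifconfig output that one of the physical ports enslaved to the bond had a MAC address different from the bond interface's: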
[root@localhost ~]# ifconfig
bond0     Link encap:Ethernet HWaddr 00:00:C9:9C:EF:EE
          inet addr:1.1.1.10 Bcast:1.1.1.255 Mask:255.255.255.0
          inet6 addr: fe80::200:c9ff:fe9c:efee/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
          RX packets:26742641 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3252833 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:40470041510 (37.6 GiB) TX bytes:230549054 (219.8 MiB)

p3p1      Link encap:Ethernet HWaddr 00:00:C9:9C:EF:EE
          inet addr:1.1.1.10 Bcast:1.1.1.255 Mask:255.255.255.0
          UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
          RX packets:26742597 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3252156 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:40470034804 (37.6 GiB) TX bytes:230487920 (219.8 MiB)

p3p2      Link encap:Ethernet HWaddr 00:00:C9:9C:EF:F0
          inet addr:1.1.1.10 Bcast:1.1.1.255 Mask:255.255.255.0
          UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
          RX packets:44 errors:0 dropped:0 overruns:0 frame:0
          TX packets:677 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:6706 (6.5 KiB) TX bytes:61134 (59.7 KiB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1 Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING MTU:16436 Metric:1
          RX packets:67 errors:0 dropped:0 overruns:0 frame:0
          TX packets:67 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:236952 (231.3 KiB) TX bytes:236952 (231.3 KiB)

virbr0    Link encap:Ethernet HWaddr 52:54:00:3C:46:42
          inet addr:192.168.122.1 Bcast:192.168.122.255 Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
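p3p1 and p3p2 are the two interfaces enslaved to the bond. The output above shows that p3p2's MAC address differs from the bond's. With this bonding mode, every member port is supposed to use the same MAC address as the bond. That explains why packets arriving on this interface were dropped instead of being passed up the stack: the replies carry the bond interface's MAC as their destination MAC, and when such a frame arrives on p3p2, whose MAC is not the bond's, the interface decides the frame is not addressed to it and drops it. The other interface has the same MAC as the bond, so it handles packets normally.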
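The comparison I did by eye in the ifconfig output can also be scripted. Below is a minimal sketch that reads the addresses from sysfs, assuming a bonding mode such as mode=0 in which all slaves are expected to carry the bond's MAC. The function name check_bond_macs and its optional second argument (an alternative sysfs directory, handy for trying the function out) are my own inventions for illustration, not part of any tool.

```shell
#!/bin/sh
# Sketch: compare each bond slave's current MAC with the bond's MAC.
# check_bond_macs BOND [SYSFS_NET_DIR]
# SYSFS_NET_DIR is a hypothetical override; it defaults to /sys/class/net.
# Returns 0 if all slaves match, 1 on any mismatch, 2 if the bond is unreadable.
check_bond_macs() {
    bond="$1"
    net="${2:-/sys/class/net}"
    bond_mac=$(cat "$net/$bond/address") || return 2
    status=0
    # bonding/slaves holds a space-separated list of enslaved interfaces.
    for slave in $(cat "$net/$bond/bonding/slaves"); do
        slave_mac=$(cat "$net/$slave/address")
        if [ "$slave_mac" = "$bond_mac" ]; then
            echo "$slave OK $slave_mac"
        else
            echo "$slave MISMATCH $slave_mac (bond $bond has $bond_mac)"
            status=1
        fi
    done
    return $status
}
```

Running check_bond_macs bond0 on a machine in the state shown above should flag p3p2, since its address differs from bond0's.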
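So the cause was clear, but how to fix it? I searched Baidu for a long time and found nothing on this problem at all. Eventually I noticed that the interface configuration in 6.1x has an NM_CONTROLLED parameter that earlier versions did not have. This parameter hands the interface over to NetworkManager. If NM_CONTROLLED is set to yes, NetworkManager applies its own configuration at boot and brings the interface up with its actual physical MAC address, which produces the mismatch between the physical port's MAC and the bond's MAC shown above. So I set NM_CONTROLLED to no in the interface configuration, as follows: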
[root@localhost ~]# cat /etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE="bond0"
BOOTPROTO="static"
IPADDR="1.1.1.10"
NETMASK="255.255.255.0"
NM_CONTROLLED="no"    # set to no
MASTER="yes"
ONBOOT=yes
TYPE=Ethernet
USERCTL=no
BONDING_OPTS="mode=0 miimon=100"
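I only pasted the bond0 file above. For completeness, here is a sketch of what a matching slave file might look like. This is an assumption on my part based on the usual ifcfg conventions, not a copy of my actual file; the path and values are illustrative, and p3p2 would get the same file with its own DEVICE name.

```shell
# Assumed path: /etc/sysconfig/network-scripts/ifcfg-p3p1
# Illustrative slave configuration; repeat for p3p2 with DEVICE="p3p2".
DEVICE="p3p1"
BOOTPROTO="none"
ONBOOT="yes"
MASTER="bond0"        # the bond this port is enslaved to
SLAVE="yes"
USERCTL="no"
NM_CONTROLLED="no"    # keep NetworkManager away from the slave as well
```

The point of the sketch is the last line: the slave ports need NM_CONTROLLED set to no just like the bond itself.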
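Also stop the NetworkManager service with service NetworkManager stop, or keep it from starting at boot with chkconfig NetworkManager off. Finally, reboot the server. After the reboot everything worked: the network was reachable and no packets were lost. Here is the interface configuration afterwards: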
[root@localhost ~]# ifconfig
bond0     Link encap:Ethernet HWaddr 00:00:C9:9C:EF:EE
          inet addr:1.1.1.10 Bcast:1.1.1.255 Mask:255.255.255.0
          inet6 addr: fe80::200:c9ff:fe9c:efee/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
          RX packets:4253943211 errors:0 dropped:0 overruns:0 frame:0
          TX packets:972081955 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:6419129996692 (5.8 TiB) TX bytes:109750327622 (102.2 GiB)

p3p1      Link encap:Ethernet HWaddr 00:00:C9:9C:EF:EE
          UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
          RX packets:63483797 errors:0 dropped:0 overruns:0 frame:0
          TX packets:485936418 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:89065238360 (82.9 GiB) TX bytes:54867009228 (51.0 GiB)

p3p2      Link encap:Ethernet HWaddr 00:00:C9:9C:EF:EE
          UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
          RX packets:4190459414 errors:0 dropped:0 overruns:0 frame:0
          TX packets:486145537 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:6330064758332 (5.7 TiB) TX bytes:54883318394 (51.1 GiB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1 Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING MTU:16436 Metric:1
          RX packets:75 errors:0 dropped:0 overruns:0 frame:0
          TX packets:75 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:237352 (231.7 KiB) TX bytes:237352 (231.7 KiB)

virbr0    Link encap:Ethernet HWaddr 52:54:00:3C:46:42
          inet addr:192.168.122.1 Bcast:192.168.122.255 Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
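As the output above shows, every interface in the bond now has the same MAC address as the bond itself. The problem was finally solved, and I took some time to write it up in case it helps anyone who runs into the same issue.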
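To double-check that no interface file is still left under NetworkManager's control, something like the following sketch could be used. It lists the ifcfg files that do not explicitly set NM_CONTROLLED to no. The function name list_nm_controlled is hypothetical, the directory argument defaults to the path used earlier in this post, and treating a missing NM_CONTROLLED line as "still managed" is my own cautious assumption.

```shell
#!/bin/sh
# Sketch: print ifcfg files that do not explicitly set NM_CONTROLLED to "no".
# list_nm_controlled [DIR]   (DIR defaults to /etc/sysconfig/network-scripts)
list_nm_controlled() {
    dir="${1:-/etc/sysconfig/network-scripts}"
    for f in "$dir"/ifcfg-*; do
        [ -f "$f" ] || continue
        # The loopback file is not relevant to bonding; skip it.
        case "$f" in */ifcfg-lo) continue ;; esac
        # Accept NM_CONTROLLED=no with or without double quotes, any letter
        # case, optionally followed by a trailing comment.
        if ! grep -Eiq '^[[:space:]]*NM_CONTROLLED="?no"?[[:space:]]*(#.*)?$' "$f"; then
            echo "$f"
        fi
    done
}
```

Calling list_nm_controlled with no arguments checks the default directory; an empty result means every file it looked at sets NM_CONTROLLED to no.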
All of the above is just my own way of solving this problem. If you repost it, please credit the source.