MTU Problems Encountered in OpenStack (by quqi99)
These past couple of days I took over a bug: on an OpenStack deployment, commands that produce a lot of output would hang when run over SSH between two VMs hosted on two different physical compute nodes. The reporter even suspected an MTU problem caused by jumbo frames, since the two physical hosts are connected through a 10G fibre switch. After more than a day of researching, analyzing, searching, experimenting and discussing MTU, the issue was finally resolved with the second method from the links below:
http://openstack.redhat.com/forum/discussion/comment/1565
https://review.openstack.org/#/c/27937/16/etc/neutron/plugins/openvswitch/ovs_neutron_plugin.ini
Pasted below:
So I've been playing with OpenStack Quantum/Neutron for a while now in a flat/bridged networking configuration across a six node cluster on top of CentOS 6.4. Over that time I've had two problems related to MTU sizes, one on the host and the other inside VMs.
The problem on the host is related to the taps/tunnels/bridges that OpenVSwitch constructs between the integration bridge br-int and your OVS bridge interface associated with a physical NIC, in my case br-em1, in docs sometimes referred to as br-ex. This problem manifests itself as dropped packets to the target VM/instance, for example commands failing to finish when executed inside an SSH session to the VM: simply try to edit a large text file, and because the packets are at the maximum MTU they are dropped and you don't see the file in your terminal. This problem is already documented in this launchpad bug. It is caused by the extra VLAN headers that OpenVSwitch adds to route the packets internally; these headers make the packets too big for the MTU limits of the OpenVSwitch tunnels, so they are dropped. My workaround is to configure the quantum dhcp agent to set a lower MTU for VM instances using dnsmasq DHCP options. This can be done by specifying a dnsmasq config file in the dhcp agent config (/etc/neutron/dhcp_agent.ini): simply add/uncomment the line dnsmasq_config_file=/var/lib/neutron/dnsmasq.conf, and the specified file should contain the single line dhcp-option=26,1454 to allow leeway for the VLAN headers to be attached and let the packet through OpenVSwitch out onto the network. Also don't forget to make the quantum user the owner of the dnsmasq config file. My suggestion/question on this is: would there be any harm in setting the default MTU for these tunnels to the maximum value that OpenVSwitch supports, to avoid such issues in the future? I can think of situations where people will want to start using jumbo frames inside instances and this could flummox them.
The second issue I have had is again related to MTUs and the above limits, but from a viewpoint inside the VM and a NIC capability known as generic segmentation offload (GSO). This can manifest itself as very slow scp file copies between VMs, or failing VNC sessions to VMs. The problem is the virtio_net kernel module driver creating packets way too big for the above OpenVSwitch tunnel limits. The solution is to disable gso altogether in virtio_net. To check whether TCP segmentation offload is enabled, run ethtool -k eth0, where eth0 is the NIC in question inside the VM. To disable it instantly, use ethtool -K eth0 tso off; to disable it permanently, add the line options virtio_net gso=0 to /etc/modprobe.conf (RHEL5/CentOS5 etc.) or to a new file /etc/modprobe.d/virtio_net.conf (RHEL6/CentOS6 etc.).
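For reference, a minimal sketch of those guest-side steps (eth0 is the NIC named in the post; adjust for your own VM):
# inside the VM: show current offload settings
ethtool -k eth0
# disable TCP segmentation offload immediately (does not survive a reboot)
sudo ethtool -K eth0 tso off
# disable gso in virtio_net permanently (RHEL6/CentOS6-style module option)
echo "options virtio_net gso=0" | sudo tee /etc/modprobe.d/virtio_net.conf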
Checking the PMTU (the interface MTU itself can be changed with: sudo ip link set eth0 mtu 1500):
1, ping -c 2 -s 1472 -M do 192.168.99.1 # 20-byte IP header + 8-byte ICMP header, so 1472 + 28 = 1500
2, tracepath 192.168.99.1
3, traceroute --mtu 192.168.99.1
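As a quick illustration of how the ping probe works (192.168.99.1 is just the example gateway above): with -M do the kernel sets the DF bit, so the probe fails as soon as payload + 28 bytes of headers exceeds the path MTU, and you can shrink -s until it gets through:
ping -c 2 -s 1472 -M do 192.168.99.1   # 1472 + 28 = 1500, fits a standard 1500-byte MTU
ping -c 2 -s 1473 -M do 192.168.99.1   # 1501 bytes, should fail ("message too long" or 100% loss)
tracepath 192.168.99.1                 # prints the discovered pmtu hop by hop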
Set the MTU on the qvbXXX interfaces attached to the qbrXXX Linux bridges and on the VMs' tapXXX interfaces:
http://img.kuqin.com/upimg/allimg/140525/11305334V-5.png
VLAN="122"
for qbridge in `ovs-vsctl show |grep -A1 "tag: ${VLAN}" |grep 'Interface' |cut -d '"' -f 2 |grep 'qvo'`; do
bridge=`echo $qbridge | sed -s 's/^qvo/qbr/'`
for interface in `brctl show $bridge | awk '{print $NF}' | grep -v interfaces`; do
ifconfig $interface mtu $MTU
done
done
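To confirm the change took effect, something like the following should show mtu 1454 on the affected interfaces (the qvb/tap names will differ per VM):
ip link show | grep -E 'qvb|tap'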
2014-10-23:
Hit this problem again today: a remote machine could ping the VM but could not ssh into it (ssh hangs with no output). It was the same MTU problem; logging into the VM and running sudo ip link set eth0 mtu 1454 fixed it. Reference: https://ask.openstack.org/en/question/30502/can-ping-vm-instance-but-cant-ssh-ssh-command-halts-with-no-output/
Another way to improve performance is to leave the VMs alone and instead raise the MTU of the physical interfaces along the GRE path to 1546; remember to configure the matching MTU on the physical switch as well. See: http://techbackground.blogspot.com/2013/06/path-mtu-discovery-
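A minimal sketch of that approach, assuming eth1 is the physical interface carrying the GRE traffic (the interface name and the Debian/Ubuntu-style persistence are my assumptions, not from the linked post):
# raise the MTU on the GRE-carrying physical NIC so the encapsulation
# overhead still leaves room for a full 1500-byte packet from the VM
sudo ip link set eth1 mtu 1546
# to persist on Debian/Ubuntu, add "mtu 1546" to the iface stanza in
# /etc/network/interfaces:
#   iface eth1 inet manual
#       mtu 1546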