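Dell and HP servers in production have, one after another, had their NICs go down unexpectedly, taking the servers offline. This document pulls together the available information and our own tracing of the NIC failures into an analysis and a set of handling recommendations.
I. Problem analysis and summary
The Dell PE610 uses a Broadcom 5709C NIC. For the network instability seen on Linux under heavy network I/O, see the Red Hat KB document https://access.redhat.com/kb/docs/DOC-26837 (attached; the Red Hat documentation currently requires a user name and password to access).
That document points to the erratum that fixes the bug for this NIC: https://rhn.redhat.com/errata/RHSA-2010-0398.html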
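1. NIC interrupt modes, how they differ, and how the operating system chooses one
1) How NIC interrupt modes evolved: INTx, MSI, MSI-X
Operating systems currently recognize three types of interrupt: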
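• Legacy interrupts (INTx): legacy or fixed interrupts are interrupts that use older bus technologies. With these technologies, interrupts are signaled over one or more external pins wired "out-of-band", that is, separately from the main lines of the bus. Newer bus technologies such as PCI Express maintain software compatibility by emulating legacy interrupts through in-band mechanisms. The host OS treats these emulated interrupts as legacy interrupts.
• Message-signaled interrupts (MSI): instead of using pins, MSIs are in-band messages and can target addresses in the host bridge. (For more on host bridges, see PCI Local Bus.) An MSI can carry data along with the interrupt message. Each MSI is unshared, so an MSI assigned to a device is guaranteed to be unique within the system. A PCI function can request up to 32 MSI messages.
• Extended message-signaled interrupts (MSI-X): an enhanced version of MSI with the following added advantages:
  - Supports 2,048 messages rather than 32
  - Supports an independent message address and message data for each message
  - Supports per-message masking
  - Gives more flexibility when software allocates fewer vectors than the hardware requests. Software can reuse the same MSI-X address and data in multiple MSI-X slots.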
2) Differences between MSI and MSI-X
The Broadcom NIC manual contains this passage:
MSI Version. This is the Message Signaled Interrupts (MSI) version being used. The option MSI corresponds to the PCI 2.2 specification that supports 32 messages and a single MSI address value. The option MSI-X corresponds to the PCI 3.0 specification that supports 2,048 messages and an independent message address for each message.
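In practice this means MSI makes poor use of multi-core CPUs: all of the NIC's interrupts land on a single CPU, and setting CPU affinity does not spread them, whereas MSI-X can share the interrupts across several CPUs automatically.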
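The difference can be seen by adding up a NIC's interrupt counts per CPU from /proc/interrupts. The following is a minimal sketch, assuming the usual layout of that file (an optional "NN:" label, one counter column per CPU, then the interrupt type and a handler name such as eth0 or eth0-1). nic_irq_per_cpu is a hypothetical helper name, not part of any existing tool.

```shell
# Sum a NIC's interrupt counts per CPU across all of its vectors.
# $1 = interface name, $2 = interrupts file (defaults to /proc/interrupts).
nic_irq_per_cpu() {
    iface=$1
    file=${2:-/proc/interrupts}
    awk -v ifc="$iface" '
        # The last field is the handler name, e.g. eth0 or eth0-1.
        $NF == ifc || index($NF, ifc "-") == 1 {
            # Skip the leading "NN:" IRQ label when it is present.
            start = ($1 ~ /:$/) ? 2 : 1
            # Counter columns run until the first non-numeric field.
            for (i = start; i <= NF && $i ~ /^[0-9]+$/; i++) {
                cpu = i - start
                sum[cpu] += $i
                if (cpu + 1 > ncpu) ncpu = cpu + 1
            }
        }
        END {
            for (c = 0; c < ncpu; c++) printf "CPU%d %d\n", c, sum[c]
        }' "$file"
}
# Example: nic_irq_per_cpu eth0
```

With several MSI-X vectors the totals would be expected to spread over more than one CPU; with a single MSI vector nearly everything should sit in one column.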
3) How Linux chooses the NIC interrupt mode
By default, the driver enables MSI if it is supported by the kernel. It runs an interrupt test during initialization to determine if MSI is working. If the test passes, the driver enables MSI. Otherwise, it uses legacy INTx mode.
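In other words, the driver turns MSI on by default whenever the kernel supports it: it tests MSI during initialization, uses it if the test passes, and falls back to INTx mode only if it does not.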
4) How to check which interrupt mode the NIC is using
Run cat /proc/interrupts. Lines like the following show the NIC's interrupt mode:
11265195 211176 PCI-MSI-X eth0-0
54549 7408668 PCI-MSI-X eth0-1
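To avoid reading the whole file by eye, the interrupt type for one interface can be filtered out. This is a sketch only: nic_irq_modes is a hypothetical helper, and it assumes the type column contains one of the strings MSI, IO-APIC or XT-PIC, as it does on the kernels discussed here.

```shell
# Print the distinct interrupt types used by a NIC's vectors.
# $1 = interface name, $2 = interrupts file (defaults to /proc/interrupts).
nic_irq_modes() {
    iface=$1
    file=${2:-/proc/interrupts}
    awk -v ifc="$iface" '
        # The last field is the handler name, e.g. eth0 or eth0-1.
        $NF == ifc || index($NF, ifc "-") == 1 {
            for (i = 1; i < NF; i++)
                if ($i ~ /MSI|IO-APIC|XT-PIC/) { print $i; break }
        }' "$file" | sort -u
}
# Example: nic_irq_modes eth0
```

For the sample lines above this would report PCI-MSI-X for eth0.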
5) How to check the NIC driver version
ethtool -i eth0
driver: bnx2
version: 1.9.3
firmware-version: 5.2.2 NCSI 2.0.6
bus-info: 0000:10:00.0
modinfo bnx2
filename: /lib/modules/2.6.18-164.el5PAE/kernel/drivers/net/bnx2.ko
version: 1.9.3
license: GPL
description: Broadcom NetXtreme II BCM5706/5708/5709/5716 Driver
author: Michael Chan <[email protected]>
srcversion: D151EAED8C1037CA480DE9A
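When many servers have to be checked, the driver name and version can be pulled out of the ethtool -i output in one line per host. A small sketch, assuming the "key: value" format shown above; driver_version is a hypothetical helper name.

```shell
# Print "<driver> <version>" from `ethtool -i` output read on stdin.
driver_version() {
    awk -F': *' '
        $1 == "driver"  { d = $2 }
        $1 == "version" { v = $2 }
        END { print d, v }'
}
# Example: ethtool -i eth0 | driver_version
```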
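2. According to the erratum, the NIC interrupt problem was already fixed on 2010-05-06 by a kernel update (kernel-2.6.18-194.3.1 or later). The same update also fixes a second bnx2 problem that is unrelated to MSI-X.
The relevant parts of the Red Hat erratum are quoted below.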
Important: kernel security and bug fix update
Advisory: RHSA-2010:0398-1
Type: Security Advisory
Severity: Important
Issued on: 2010-05-06
Last updated on: 2010-05-06
Affected Products: Red Hat Enterprise Linux (v. 5 server), Red Hat Enterprise Linux Desktop (v. 5 client)
OVAL: com.redhat.rhsa-20100398.xml
CVEs (cve.mitre.org): CVE-2010-0307, CVE-2010-0410, CVE-2010-0730, CVE-2010-1085, CVE-2010-1086
Details
Updated kernel packages that fix multiple security issues and several bugs
are now available for Red Hat Enterprise Linux 5.
The Red Hat Security Response Team has rated this update as having
important security impact. Common Vulnerability Scoring System (CVSS) base
scores, which give detailed severity ratings, are available for each
vulnerability from the CVE links in the References section.
The kernel packages contain the Linux kernel, the core of any Linux
operating system.
...
This update fixes the following security issues:
* in certain circumstances, under heavy load, certain network interface
cards using the bnx2 driver and configured to use MSI-X, could stop
processing interrupts and then network connectivity would cease.
(BZ#587799)
* cnic parts resets could cause a deadlock when the bnx2 device was
enslaved in a bonding device and that device had an associated VLAN.
(BZ#581148)
...
Users should upgrade to these updated packages, which contain backported
patches to correct these issues. The system must be rebooted for this
update to take effect.
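Whether a given server already runs a kernel at or above the fixed release can be checked from its uname -r string. The sketch below assumes RHEL 5 style release strings of the form 2.6.18-<release>.el5..., and the threshold 2.6.18-194.3.1 is taken from the erratum discussion above. kernel_has_bnx2_fix is a hypothetical helper name.

```shell
# Succeed if a kernel release string is at or above 2.6.18-194.3.1.
# $1 = release string such as the output of `uname -r`.
kernel_has_bnx2_fix() {
    echo "$1" | awk -F'[-.]' '
        {
            # Compare the first six dot/dash separated fields numerically;
            # a missing or non-numeric field (e.g. "el5") counts as 0.
            split("2 6 18 194 3 1", want, " ")
            for (i = 1; i <= 6; i++) {
                have = ($i ~ /^[0-9]+$/) ? $i + 0 : 0
                if (have > want[i] + 0) exit 0
                if (have < want[i] + 0) exit 1
            }
            exit 0
        }'
}
# Example: kernel_has_bnx2_fix "$(uname -r)" && echo "kernel includes the fix"
```

Because the update only takes effect after a reboot, the running kernel reported by uname -r is the one that matters, not the newest package installed.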
3. The patch posted on 2010-04-27 by the author of the Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet driver is quoted below to show exactly which part of the driver was changed.
bnx2: Fix lost MSI-X problem on 5709 NICs
Submitter: Michael Chan (the author of this NIC driver, as modinfo bnx2 shows)
Date: 2010-04-27 21:28:09
Comments
Michael Chan - 2010-04-27 21:28:09
It has been reported that under certain heavy traffic conditions in MSI-X
mode, the driver can lose an MSI-X vector causing all packets in the
associated rx/tx ring pair to be dropped. The problem is caused by
the chip dropping the write to unmask the MSI-X vector by the kernel
(when migrating the IRQ for example).
This can be prevented by increasing the GRC timeout value for these
register read and write operations.
Thanks to Dell for helping us debug this problem.
Signed-off-by: Michael Chan <[email protected]>
---
drivers/net/bnx2.c | 6 +++++-
1 files changed, 5 insertions(+), 1 deletions(-)
David Miller - 2010-04-27 21:38:25
Applied to net-2.6
Patch
diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index a257bab..4c1e51e 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -4759,8 +4759,12 @@ bnx2_reset_chip(struct bnx2 *bp, u32 reset_code)
 		rc = bnx2_alloc_bad_rbuf(bp);
 	}
 
-	if (bp->flags & BNX2_FLAG_USING_MSIX)
+	if (bp->flags & BNX2_FLAG_USING_MSIX) {
 		bnx2_setup_msix_tbl(bp);
+		/* Prevent MSIX table reads and write from timing out */
+		REG_WR(bp, BNX2_MISC_ECO_HW_CTL,
+			BNX2_MISC_ECO_HW_CTL_LARGE_GRC_TMOUT_EN);
+	}
 
 	return rc;
 }
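4. Checking the NIC driver source in the Red Hat 5.4 kernel confirms that it does not include this patch.
5. The 2.0.18c NIC driver that production servers were previously upgraded to does not include the patch either, as confirmed from its source, so upgrading the driver to 2.0.18c has no effect.
6. The latest bug-fix kernel that Red Hat has released for the RHEL 5 series is kernel-PAE-2.6.18-194.32.1.el5.i686.rpm. Its source confirms that the patch is included.
7. Broadcom's website now offers driver bnx2-2.0.23b for the BCM5709 series. Its source confirms that Broadcom has merged the patch into the released driver.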
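The source checks in items 4 to 7 come down to looking for the register flag that the patch writes. A minimal sketch, assuming the unpacked driver source is available locally; bnx2_source_has_fix is a hypothetical helper, and the path in the example comment is a placeholder.

```shell
# Succeed if a bnx2.c source file contains the GRC timeout fix.
# $1 = path to bnx2.c
bnx2_source_has_fix() {
    # The patch adds a REG_WR() that sets this flag in bnx2_reset_chip().
    grep -q 'BNX2_MISC_ECO_HW_CTL_LARGE_GRC_TMOUT_EN' "$1"
}
# Example: bnx2_source_has_fix /path/to/drivers/net/bnx2.c && echo "patched"
```

The check should be run against bnx2.c itself: a header file may define the flag even when the driver never sets it.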
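II. Recommendations for production
1. Disabling MSI is not recommended as a fix. As explained under interrupt modes in Part I, with MSI disabled all NIC interrupts land on a single CPU. If you do want to disable it, load the module with the disable_msi=1 parameter and add the parameter to the system configuration file.
2. If conditions allow (a kernel upgrade needs a reboot to take effect), upgrade only the kernel to fix the NIC problem, because the kernel update fixes many other system bugs at the same time. Since only the kernel changes, the production environment (java, python, gcc and so on) stays as it is. Do not run a full system upgrade straight from the CentOS repositories: that would replace python and the rest of the environment and could disrupt production services. Use the yum repository provided by operations, an internal repository set up specifically for this kernel upgrade.
3. If a kernel upgrade is not possible, build the latest driver published on Broadcom's website and update the NIC driver to bnx2-2.0.23b.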
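For item 1, should MSI have to be disabled despite the recommendation, the module option can be made persistent along these lines. This is a sketch under assumptions: the text above does not name the configuration file, so /etc/modprobe.conf (the usual location on RHEL 5) is an assumption, and set_bnx2_disable_msi is a hypothetical helper name.

```shell
# Append the bnx2 disable_msi option to a modprobe config file, once.
# $1 = config file (assumed to be /etc/modprobe.conf on RHEL 5)
set_bnx2_disable_msi() {
    conf=$1
    line='options bnx2 disable_msi=1'
    # Add the line only if an identical one is not already present.
    grep -qxF "$line" "$conf" 2>/dev/null || echo "$line" >> "$conf"
}
# Example: set_bnx2_disable_msi /etc/modprobe.conf
```

The option only takes effect the next time the bnx2 module is loaded.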
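III. Step-by-step procedure for production
1. To disable MSI, see the attached DOC-26837 for the exact parameter settings. Disabling it is not recommended.
2. Upgrade the kernel with yum
1) Operations will provide yum repositories for the main operating system versions used in production. Put the repo file they provide in /etc/yum.repos.d/.
For example: cp rhel-server-5.4-i386.repo /etc/yum.repos.d/
2) Back up the files that were previously in /etc/yum.repos.d/ and move them out of that directory.
3) yum clean all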
yum update kernel
Loaded plugins: fastestmirror
Determining fastest mirrors
Cluster | 1.1 kB 00:00
ClusterStorage | 1.1 kB 00:00
Server | 1.1 kB 00:00
Server/primary | 818 kB 00:00
Server 2293/2293
VT | 1.1 kB 00:00
Setting up Update Process
Resolving Dependencies
--> Running transaction check
---> Package kernel-PAE.i686 0:2.6.18-194.32.1.el5 set to be installed
--> Finished Dependency Resolution
Dependencies Resolved
===========================================================================
Package Arch Version Repository Size
===========================================================================
Installing:
kernel-PAE i686 2.6.18-194.32.1.el5 Server 17 M
Transaction Summary
===========================================================================
Install 1 Package(s)
Update 0 Package(s)
Remove 0 Package(s)
Total download size: 17 M
Is this ok [y/N]: y
Downloading Packages:
kernel-PAE-2.6.18-194.32.1.el5.i686.rpm | 17 MB 00:00
Running rpm_check_debug
Running Transaction Test
Finished Transaction Test
Transaction Test Succeeded
Running Transaction
Installing : kernel-PAE 1/1
Installed:
kernel-PAE.i686 0:2.6.18-194.32.1.el5
Complete!
4) cat /boot/grub/grub.conf
default=0
timeout=5
splashimage=(hd0,0)/boot/grub/splash.xpm.gz
hiddenmenu
title CentOS (2.6.18-194.32.1.el5PAE)
root (hd0,0)
kernel /boot/vmlinuz-2.6.18-194.32.1.el5PAE ro root=LABEL=/1 rhgb quiet
initrd /boot/initrd-2.6.18-194.32.1.el5PAE.img
title CentOS (2.6.18-164.el5PAE)
root (hd0,0)
kernel /boot/vmlinuz-2.6.18-164.el5PAE ro root=LABEL=/1 rhgb quiet
initrd /boot/initrd-2.6.18-164.el5PAE.img
Check that default points to the title entry of the newest kernel.
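The default entry can also be resolved to its title automatically instead of counting entries by hand. A minimal sketch, assuming a GRUB legacy grub.conf in the format shown above, where default=N is a zero-based index into the title lines. grub_default_title is a hypothetical helper name.

```shell
# Print the title of the entry that grub.conf boots by default.
# $1 = path to grub.conf (defaults to /boot/grub/grub.conf)
grub_default_title() {
    awk '
        /^default=/ { v = $0; sub(/^default=/, "", v); want = v + 0 }
        /^[ \t]*title[ \t]/ {
            line = $0
            sub(/^[ \t]*title[ \t]+/, "", line)
            title[n++] = line
        }
        # A missing default= line leaves want at 0, the first entry.
        END { if (want < n) print title[want] }
    ' "${1:-/boot/grub/grub.conf}"
}
# Example: grub_default_title | grep -q '194.32.1' && echo "new kernel is the default"
```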
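3. Download and install the latest driver published on Broadcom's website
1) wget -c http://zh-cn.broadcom.com/docs/driver_download/NXII/linux-6.2.23.zip
2) Build and install
A. Recommended: install from the src package
Unzip the archive and locate linux-6.2.23.zip\Server\Linux\Driver\netxtreme2-6.2.23-1.src.rpm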
rpm -ivh netxtreme2-<version>.src.rpm
cd /usr/src/redhat
rpmbuild -bb SPECS/netxtreme2.spec
The built RPM is at RPMS/<arch>/netxtreme2-<version>.<arch>.rpm
Install the one that matches your architecture, for example:
rpm -ivh RPMS/i386/netxtreme2-<version>.i386.rpm
B. Build and install from the tar.gz package
tar xvzf netxtreme2-<version>.tar.gz
make
make install
3) Reload the NIC module
rmmod bnx2;modprobe bnx2
Note that reloading the module interrupts the network for a few seconds. Normally no existing connections are dropped.
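After the reload it is worth confirming that the link has come back before moving on to the next server. A small sketch, assuming the "Link detected: yes" line that ethtool prints for an interface with an active link; link_is_up is a hypothetical helper name.

```shell
# Succeed if `ethtool <iface>` output read on stdin reports an active link.
link_is_up() {
    grep -q 'Link detected: yes'
}
# Example: ethtool eth0 | link_is_up && echo "eth0 link is up"
```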
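Notes:
1. Upgrading the kernel is the recommended way to fix the problem.
2. Disabling MSI is not recommended.
3. If you are not building the latest driver from Broadcom's website, you have to build from source, and you must apply the patch to bnx2.c before building and installing.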