Broadcom 5719/5720 NICs using tg3 driver become unresponsive and stop traffic in vSphere (2035701)

Symptoms

  • When a system uses the tg3 driver with 1 Gb NICs, the vmkernel/messages logs contain entries indicating that the NetQueue feature is enabled in the driver:

    T20:45:09.053Z cpu14:2091)<6>tg3 : vmnic3: RX NetQ allocated on 1
    T20:45:09.053Z cpu14:2091)<6>tg3 : vmnic3: NetQ set RX Filter: 1 [00:50:56:7f:96:94 0]
    T20:45:44.054Z cpu7:2091)<6>tg3 : vmnic3: NetQ remove RX filter: 1
    T20:45:44.054Z cpu7:2091)<6>tg3 : vmnic3: Free NetQ RX Queue: 1

  • This issue occurs when Broadcom BCM5719 and BCM5720 NICs are used in the system.

  • One or more NICs in the system stop functioning or responding, causing partial or full loss of network connectivity to virtual machines or to any other type of VMkernel networking (vMotion, management, NFS, iSCSI, etc.).

  • The NICs impacted do not appear to be receiving CDP (Cisco Discovery Protocol) information from the upstream physical switch.
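
To check a saved log for the NetQueue messages shown above, a filter along the following lines may help. This is a minimal sketch and not part of the original procedure: the function name tg3_netq_lines is hypothetical, and the pattern is derived only from the sample messages in this article.

```shell
# Hypothetical helper: read log text on stdin and print only the tg3 lines
# that mention NetQ, matching the form of the sample messages above.
tg3_netq_lines() {
    grep -E 'tg3 ?: vmnic[0-9]+: .*NetQ'
}
```

For example, run tg3_netq_lines < /var/log/vmkernel.log (the log path is an assumption and differs between releases). If nothing is printed, the log contains no messages of this form.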

Resolution

This issue has been resolved for ESXi 5.0/5.1 in version 3.129d.v50.1 of the async tg3 driver released by Broadcom. You can download the ESXi 5.0/5.1 async driver from the VMware Download Center. For more information on updating this driver, see Installing async drivers on ESXi 5.x (2005205).

For ESXi 5.5, you can download the ESXi 5.5 async driver from the VMware Download Center.

For ESXi/ESX 4.x, this issue has been resolved in async tg3 driver version 3.129d.v40.1. You can download the driver from the VMware Download Center. For more information on updating this driver in ESX/ESXi 4.x, see Installing async drivers on ESXi 4.x and ESX 4.x (1032936).

Note: Contact your hardware vendor and confirm compatibility before upgrading to the driver versions mentioned in this article.

To work around this issue, disable the NetQueue feature.

Note: NetQueue can only be disabled on third-party async versions of the tg3 driver. Inbox drivers are now included with ESXi 5.0 Update 2 and ESXi 5.1 and do not include the NetQueue feature. To see the various async and inbox driver versions for the Broadcom 5719/5720 adapter, refer to the VMware Hardware Compatibility Guide.

NetQueue provides no performance benefit on 1 Gb NICs. The feature spreads the network load across multiple CPUs, and a single CPU can already handle approximately 3 Gbps of network load, which is more than a 1 Gb NIC can deliver.

Therefore, if there are no 10 Gb NICs on the host, you can disable NetQueue for the host using these commands:
  • On ESXi 5.x, use this command:

    # esxcli system settings kernel set -s netNetqueueEnabled -v FALSE
    # reboot

  • On ESXi/ESX 4.x, first verify the existing settings on the tg3 driver with one of these commands:

    # esxcfg-module -q | grep -E "^tg"

    or

    # esxcfg-advcfg -j netNetqueueEnabled
    netNetqueueEnabled = TRUE


If there are 10 Gb NICs on the host in addition to the tg3 NICs, then only disable NetQueue for the tg3 driver.

To disable NetQueue on ESXi/ESX for the tg3 driver, run this command:

# esxcfg-module -s force_netq=0,0,0,0 tg3

To disable NetQueue for the host, run this command:

# esxcfg-advcfg -k FALSE netNetqueueEnabled

To enable NetQueue for the host, run this command:

# esxcfg-advcfg -k TRUE netNetqueueEnabled

Note: The number of zeroes (0) in the force_netq parameter array must match the number of tg3 devices on your system. For example, the force_netq=0,0,0,0 command above applies if you have 4 tg3 NICs, which you can verify using the esxcfg-nics --list command.
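
Because the list length must match the NIC count, the value can be generated rather than typed by hand. The following is a minimal sketch under stated assumptions: the helper names force_netq_value and count_tg3_lines are hypothetical, and the count assumes that each tg3 NIC appears on exactly one line of the esxcfg-nics --list output containing the word tg3.

```shell
# Hypothetical helper: print a comma-separated list with one entry per NIC.
#   $1 = number of tg3 NICs
#   $2 = value per NIC (0 disables NetQueue, 1 enables it; default 0)
force_netq_value() {
    count=$1
    value=${2:-0}
    result=""
    i=0
    while [ "$i" -lt "$count" ]; do
        # Add a separating comma only when the list is not empty.
        result="${result:+$result,}$value"
        i=$((i + 1))
    done
    printf '%s\n' "$result"
}

# Hypothetical helper: count the lines on stdin that contain the word "tg3".
# Intended for the output of `esxcfg-nics --list` (assumed: one line per NIC).
count_tg3_lines() {
    grep -c -w 'tg3'
}
```

With four tg3 NICs, force_netq_value 4 prints 0,0,0,0, which can then be passed to esxcfg-module as force_netq=0,0,0,0. Check the generated value against the NIC list before applying it.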

To revert the change or to enable NetQueue for the tg3 driver, run this command:

# esxcfg-module -s force_netq=1,1,1,1 tg3

After installing the new drivers, you can enable NetQueue for the driver using the following command. The new drivers do not support the force_netq parameter, so clearing the module options is the only way to enable NetQueue.

# esxcfg-module -s "" tg3

After the changes are complete, reboot the host.

Additional Information

You can also verify this issue by unloading and reloading the tg3 driver using these commands:
  • To unload the driver:

    # vmkload_mod -u tg3

  • To reload the driver:

    # vmkload_mod tg3


To check if the setting is configured:

  1. View the contents of the esx.conf file by running this command:

    # cat /etc/vmware/esx.conf

  2. At the end of this file, ensure that you see an entry similar to:

    /vmkernel/module/tg3/options = "force_netq=0,0,0,0"
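
Rather than reading the whole file, the entry can be searched for directly. This is a minimal sketch: the function name tg3_module_options is hypothetical, and the optional path argument exists only so the check can be tried against a copy of the file.

```shell
# Hypothetical helper: print the tg3 module options entry from esx.conf.
#   $1 = path to the configuration file (defaults to /etc/vmware/esx.conf)
# Exits with a non-zero status when no such entry is present.
tg3_module_options() {
    grep '^/vmkernel/module/tg3/options' "${1:-/etc/vmware/esx.conf}"
}
```

If the function prints nothing, no options are currently stored for the tg3 module in that file.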


To verify the current NetQueue status after it is disabled, run this command:

# esxcli system settings kernel list | grep -i netqueue
netNetqueueEnabled Bool Enable/Disable NetQueue support. FALSE FALSE TRUE

In this output, Bool is the setting type, and the last three values are these columns:

Configured = FALSE
Runtime = FALSE
Default = TRUE
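
The three trailing values can also be pulled out of the listing with a small filter. This is a minimal sketch: the function name netqueue_state is hypothetical, and the labels configured, runtime, and default assume that the last three columns of the listing are Configured, Runtime, and Default, in that order.

```shell
# Hypothetical helper: read `esxcli system settings kernel list` output on
# stdin and print the last three fields of the netNetqueueEnabled row.
# Assumption: those fields are the Configured, Runtime, and Default columns.
netqueue_state() {
    awk '$1 == "netNetqueueEnabled" {
        print "configured=" $(NF-2), "runtime=" $(NF-1), "default=" $NF
    }'
}
```

For example, run esxcli system settings kernel list | netqueue_state on the host.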


You can also use the vSphere Client to make the configuration change:
  1. Click the host in vCenter Server.

  2. Click Configuration.

  3. Under Software, click Advanced Settings.

  4. Expand VMkernel in the list and click Boot.

  5. Scroll down to the setting named VMkernel.Boot.netNetqueueEnabled and deselect it to disable.

  6. Reboot the host.

In ESXi 5.0 and later, if you can identify the specific NIC that is malfunctioning, you can resolve the issue by forcing the link state down and then bringing it back up at the OS level. For example, to reset vmnic1, run these commands:

    # localcli network nic down -n vmnic1
    # localcli network nic up -n vmnic1

This has an advantage over unloading and reloading the driver because it affects only one NIC at a time rather than all NICs using the tg3 driver.
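
The two link-state commands can be wrapped so that the NIC name is checked before anything runs. This is a minimal sketch, not part of the original procedure: the function name reset_nic_link and the NIC_CLI override variable are hypothetical, and the name check simply assumes that NIC names begin with vmnic followed by a digit.

```shell
# Hypothetical helper: force one NIC's link down and then back up.
#   $1 = NIC name (for example vmnic1)
# NIC_CLI can be set to another command (for example "echo") to preview
# what would be run instead of calling localcli.
reset_nic_link() {
    nic=$1
    cli=${NIC_CLI:-localcli}
    case "$nic" in
        vmnic[0-9]*) ;;
        *) echo "usage: reset_nic_link vmnicN" >&2; return 1 ;;
    esac
    # Stop if the link cannot be taken down, so "up" is not issued blindly.
    "$cli" network nic down -n "$nic" || return 1
    "$cli" network nic up -n "$nic"
}
```

Running NIC_CLI=echo reset_nic_link vmnic1 prints the two commands without executing them, which is a safe way to confirm the target before resetting a NIC that carries traffic.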

