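Symptom
The NIC kept resetting itself over and over, which cut off network communication for every service running on the Proxmox host.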
Errors
/var/log/kern.log shows the following errors, repeating in a loop as the adapter resets again and again:
May 3 17:43:41 pve kernel: [409387.721072] vmbr0: port 1(eno1) entered blocking state
May 3 17:44:09 pve kernel: [409416.200046] vmbr0: port 1(eno1) entered disabled state
May 3 17:44:17 pve kernel: [409423.956276] vmbr0: port 1(eno1) entered blocking state
May 3 17:44:19 pve kernel: [409425.959886] e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
May 3 17:44:19 pve kernel: [409425.959886] TDH <0>
May 3 17:44:19 pve kernel: [409425.959886] TDT <9>
May 3 17:44:19 pve kernel: [409425.959886] next_to_use <9>
May 3 17:44:19 pve kernel: [409425.959886] next_to_clean <0>
May 3 17:44:19 pve kernel: [409425.959886] buffer_info[next_to_clean]:
May 3 17:44:19 pve kernel: [409425.959886] time_stamp <10618bd00>
May 3 17:44:19 pve kernel: [409425.959886] next_to_watch <0>
May 3 17:44:19 pve kernel: [409425.959886] jiffies <10618be88>
May 3 17:44:19 pve kernel: [409425.959886] next_to_watch.status <0>
May 3 17:44:19 pve kernel: [409425.959886] MAC Status <40080083>
May 3 17:44:19 pve kernel: [409425.959886] PHY Status <796d>
May 3 17:44:19 pve kernel: [409425.959886] PHY 1000BASE-T Status <3800>
May 3 17:44:19 pve kernel: [409425.959886] PHY Extended Status <3000>
May 3 17:44:19 pve kernel: [409425.959886] PCI Status <10>
May 3 17:44:23 pve kernel: [409429.991735] e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
May 3 17:44:23 pve kernel: [409429.991735] TDH <0>
May 3 17:44:23 pve kernel: [409429.991735] TDT <9>
May 3 17:44:23 pve kernel: [409429.991735] next_to_use <9>
May 3 17:44:23 pve kernel: [409429.991735] next_to_clean <0>
May 3 17:44:23 pve kernel: [409429.991735] buffer_info[next_to_clean]:
May 3 17:44:23 pve kernel: [409429.991735] time_stamp <10618bd00>
May 3 17:44:23 pve kernel: [409429.991735] next_to_watch <0>
May 3 17:44:23 pve kernel: [409429.991735] jiffies <10618c278>
May 3 17:44:23 pve kernel: [409429.991735] next_to_watch.status <0>
May 3 17:44:23 pve kernel: [409429.991735] MAC Status <40080083>
May 3 17:44:23 pve kernel: [409429.991735] PHY Status <796d>
May 3 17:44:23 pve kernel: [409429.991735] PHY 1000BASE-T Status <3800>
May 3 17:44:23 pve kernel: [409429.991735] PHY Extended Status <3000>
May 3 17:44:23 pve kernel: [409429.991735] PCI Status <10>
May 3 17:44:25 pve kernel: [409432.007628] e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
May 3 17:44:25 pve kernel: [409432.007628] TDH <0>
May 3 17:44:25 pve kernel: [409432.007628] TDT <9>
May 3 17:44:25 pve kernel: [409432.007628] next_to_use <9>
May 3 17:44:25 pve kernel: [409432.007628] next_to_clean <0>
May 3 17:44:25 pve kernel: [409432.007628] buffer_info[next_to_clean]:
May 3 17:44:25 pve kernel: [409432.007628] time_stamp <10618bd00>
May 3 17:44:25 pve kernel: [409432.007628] next_to_watch <0>
May 3 17:44:25 pve kernel: [409432.007628] jiffies <10618c470>
May 3 17:44:25 pve kernel: [409432.007628] next_to_watch.status <0>
May 3 17:44:25 pve kernel: [409432.007628] MAC Status <40080083>
May 3 17:44:25 pve kernel: [409432.007628] PHY Status <796d>
May 3 17:44:25 pve kernel: [409432.007628] PHY 1000BASE-T Status <3800>
May 3 17:44:25 pve kernel: [409432.007628] PHY Extended Status <3000>
May 3 17:44:25 pve kernel: [409432.007628] PCI Status <10>
May 3 17:44:50 pve kernel: [409456.326913] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
May 3 17:44:50 pve kernel: [409456.326969] vmbr0: port 1(eno1) entered forwarding state
May 3 17:44:52 pve kernel: [409458.346577] e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
May 3 17:44:52 pve kernel: [409458.346577] TDH <0>
May 3 17:44:52 pve kernel: [409458.346577] TDT <3>
May 3 17:44:52 pve kernel: [409458.346577] next_to_use <3>
May 3 17:44:52 pve kernel: [409458.346577] next_to_clean <0>
May 3 17:44:52 pve kernel: [409458.346577] buffer_info[next_to_clean]:
May 3 17:44:52 pve kernel: [409458.346577] time_stamp <10618dc50>
May 3 17:44:52 pve kernel: [409458.346577] next_to_watch <0>
May 3 17:44:52 pve kernel: [409458.346577] jiffies <10618de29>
May 3 17:44:52 pve kernel: [409458.346577] next_to_watch.status <0>
May 3 17:44:52 pve kernel: [409458.346577] MAC Status <40080083>
May 3 17:44:52 pve kernel: [409458.346577] PHY Status <796d>
May 3 17:44:52 pve kernel: [409458.346577] PHY 1000BASE-T Status <3800>
May 3 17:44:52 pve kernel: [409458.346577] PHY Extended Status <3000>
May 3 17:44:52 pve kernel: [409458.346577] PCI Status <10>
May 3 17:45:16 pve kernel: [409482.437698] e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
May 3 17:45:16 pve kernel: [409482.437698] TDH <0>
May 3 17:45:16 pve kernel: [409482.437698] TDT <8>
May 3 17:45:16 pve kernel: [409482.437698] next_to_use <8>
May 3 17:45:16 pve kernel: [409482.437698] next_to_clean <0>
May 3 17:45:16 pve kernel: [409482.437698] buffer_info[next_to_clean]:
May 3 17:45:16 pve kernel: [409482.437698] time_stamp <10618ee00>
May 3 17:45:16 pve kernel: [409482.437698] next_to_watch <0>
May 3 17:45:16 pve kernel: [409482.437698] jiffies <10618f5b0>
May 3 17:45:16 pve kernel: [409482.437698] next_to_watch.status <0>
May 3 17:45:16 pve kernel: [409482.437698] MAC Status <40080083>
May 3 17:45:16 pve kernel: [409482.437698] PHY Status <796d>
May 3 17:45:16 pve kernel: [409482.437698] PHY 1000BASE-T Status <3800>
May 3 17:45:16 pve kernel: [409482.437698] PHY Extended Status <3000>
May 3 17:45:16 pve kernel: [409482.437698] PCI Status <10>
May 3 17:45:18 pve kernel: [409484.453566] e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
May 3 17:45:18 pve kernel: [409484.453566] TDH <0>
May 3 17:45:18 pve kernel: [409484.453566] TDT <8>
May 3 17:45:18 pve kernel: [409484.453566] next_to_use <8>
May 3 17:45:18 pve kernel: [409484.453566] next_to_clean <0>
May 3 17:45:18 pve kernel: [409484.453566] buffer_info[next_to_clean]:
May 3 17:45:18 pve kernel: [409484.453566] time_stamp <10618ee00>
May 3 17:45:18 pve kernel: [409484.453566] next_to_watch <0>
May 3 17:45:18 pve kernel: [409484.453566] jiffies <10618f7a8>
May 3 17:45:18 pve kernel: [409484.453566] next_to_watch.status <0>
May 3 17:45:18 pve kernel: [409484.453566] MAC Status <40080083>
May 3 17:45:18 pve kernel: [409484.453566] PHY Status <796d>
May 3 17:45:18 pve kernel: [409484.453566] PHY 1000BASE-T Status <3800>
May 3 17:45:18 pve kernel: [409484.453566] PHY Extended Status <3000>
May 3 17:45:18 pve kernel: [409484.453566] PCI Status <10>
May 3 17:45:18 pve kernel: [409485.061421] vmbr0: port 1(eno1) entered disabled state
May 3 17:45:26 pve kernel: [409492.477592] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
May 3 17:45:26 pve kernel: [409492.477648] vmbr0: port 1(eno1) entered forwarding state
May 3 17:45:28 pve kernel: [409494.501256] e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
May 3 17:45:28 pve kernel: [409494.501256] TDH <0>
May 3 17:45:28 pve kernel: [409494.501256] TDT <1>
May 3 17:45:28 pve kernel: [409494.501256] next_to_use <1>
May 3 17:45:28 pve kernel: [409494.501256] next_to_clean <0>
May 3 17:45:28 pve kernel: [409494.501256] buffer_info[next_to_clean]:
May 3 17:45:28 pve kernel: [409494.501256] time_stamp <106190000>
May 3 17:45:28 pve kernel: [409494.501256] next_to_watch <0>
May 3 17:45:28 pve kernel: [409494.501256] jiffies <106190178>
May 3 17:45:28 pve kernel: [409494.501256] next_to_watch.status <0>
May 3 17:45:28 pve kernel: [409494.501256] MAC Status <40080083>
May 3 17:45:28 pve kernel: [409494.501256] PHY Status <796d>
May 3 17:45:28 pve kernel: [409494.501256] PHY 1000BASE-T Status <3800>
May 3 17:45:28 pve kernel: [409494.501256] PHY Extended Status <3000>
May 3 17:45:28 pve kernel: [409494.501256] PCI Status <10>
May 3 17:45:30 pve kernel: [409496.517174] e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
May 3 17:45:30 pve kernel: [409496.517174] TDH <0>
May 3 17:45:30 pve kernel: [409496.517174] TDT <1>
May 3 17:45:30 pve kernel: [409496.517174] next_to_use <1>
May 3 17:45:30 pve kernel: [409496.517174] next_to_clean <0>
May 3 17:45:30 pve kernel: [409496.517174] buffer_info[next_to_clean]:
May 3 17:45:30 pve kernel: [409496.517174] time_stamp <106190000>
May 3 17:45:30 pve kernel: [409496.517174] next_to_watch <0>
May 3 17:45:30 pve kernel: [409496.517174] jiffies <106190370>
May 3 17:45:30 pve kernel: [409496.517174] next_to_watch.status <0>
May 3 17:45:30 pve kernel: [409496.517174] MAC Status <40080083>
May 3 17:45:30 pve kernel: [409496.517174] PHY Status <796d>
May 3 17:45:30 pve kernel: [409496.517174] PHY 1000BASE-T Status <3800>
May 3 17:45:30 pve kernel: [409496.517174] PHY Extended Status <3000>
May 3 17:45:30 pve kernel: [409496.517174] PCI Status <10>
May 3 17:45:31 pve kernel: [409498.116850] e1000e 0000:00:1f.6 eno1: Reset adapter unexpectedly
May 3 17:45:31 pve kernel: [409498.116929] vmbr0: port 1(eno1) entered disabled state
May 3 17:45:38 pve kernel: [409504.505067] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
May 3 17:45:42 pve kernel: [409508.548719] e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
May 3 17:45:42 pve kernel: [409508.548719] TDH <0>
May 3 17:45:42 pve kernel: [409508.548719] TDT <8>
May 3 17:45:42 pve kernel: [409508.548719] next_to_use <8>
May 3 17:45:42 pve kernel: [409508.548719] next_to_clean <0>
May 3 17:45:42 pve kernel: [409508.548719] buffer_info[next_to_clean]:
May 3 17:45:42 pve kernel: [409508.548719] time_stamp <106190b80>
May 3 17:45:42 pve kernel: [409508.548719] next_to_watch <0>
May 3 17:45:42 pve kernel: [409508.548719] jiffies <106190f30>
May 3 17:45:42 pve kernel: [409508.548719] next_to_watch.status <0>
May 3 17:45:42 pve kernel: [409508.548719] MAC Status <40080083>
May 3 17:45:42 pve kernel: [409508.548719] PHY Status <796d>
May 3 17:45:42 pve kernel: [409508.548719] PHY 1000BASE-T Status <3800>
May 3 17:45:42 pve kernel: [409508.548719] PHY Extended Status <3000>
May 3 17:45:42 pve kernel: [409508.548719] PCI Status <10>
May 3 17:45:44 pve kernel: [409510.564636] e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
May 3 17:45:44 pve kernel: [409510.564636] TDH <0>
May 3 17:45:44 pve kernel: [409510.564636] TDT <8>
May 3 17:45:44 pve kernel: [409510.564636] next_to_use <8>
May 3 17:45:44 pve kernel: [409510.564636] next_to_clean <0>
May 3 17:45:44 pve kernel: [409510.564636] buffer_info[next_to_clean]:
May 3 17:45:44 pve kernel: [409510.564636] time_stamp <106190b80>
May 3 17:45:44 pve kernel: [409510.564636] next_to_watch <0>
May 3 17:45:44 pve kernel: [409510.564636] jiffies <106191128>
May 3 17:45:44 pve kernel: [409510.564636] next_to_watch.status <0>
May 3 17:45:44 pve kernel: [409510.564636] MAC Status <40080083>
May 3 17:45:44 pve kernel: [409510.564636] PHY Status <796d>
May 3 17:45:44 pve kernel: [409510.564636] PHY 1000BASE-T Status <3800>
May 3 17:45:44 pve kernel: [409510.564636] PHY Extended Status <3000>
May 3 17:45:44 pve kernel: [409510.564636] PCI Status <10>
May 3 17:45:46 pve kernel: [409512.580469] e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
May 3 17:45:46 pve kernel: [409512.580469] TDH <0>
May 3 17:45:46 pve kernel: [409512.580469] TDT <8>
May 3 17:45:46 pve kernel: [409512.580469] next_to_use <8>
May 3 17:45:46 pve kernel: [409512.580469] next_to_clean <0>
May 3 17:45:46 pve kernel: [409512.580469] buffer_info[next_to_clean]:
May 3 17:45:46 pve kernel: [409512.580469] time_stamp <106190b80>
May 3 17:45:46 pve kernel: [409512.580469] next_to_watch <0>
May 3 17:45:46 pve kernel: [409512.580469] jiffies <106191320>
May 3 17:45:46 pve kernel: [409512.580469] next_to_watch.status <0>
May 3 17:45:46 pve kernel: [409512.580469] MAC Status <40080083>
May 3 17:45:46 pve kernel: [409512.580469] PHY Status <796d>
May 3 17:45:46 pve kernel: [409512.580469] PHY 1000BASE-T Status <3800>
May 3 17:45:46 pve kernel: [409512.580469] PHY Extended Status <3000>
May 3 17:45:46 pve kernel: [409512.580469] PCI Status <10>
May 3 17:46:24 pve kernel: [409551.106819] e1000e 0000:00:1f.6 eno1: Reset adapter unexpectedly
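In each dump, TDH (the transmit descriptor head, advanced by the NIC as it sends packets) stays at 0, while TDT and next_to_use (the tail, advanced by the driver as it queues packets) are non-zero. Meanwhile time_stamp stays fixed and jiffies keeps growing. In other words, packets are being queued but the hardware is not consuming them. The minimal Python sketch below pulls these fields out of kern.log and converts jiffies minus time_stamp into seconds. It assumes a kernel built with CONFIG_HZ=250, which fits this log: the first two dumps are 1008 jiffies and about 4.03 s apart. The regexes are written against the log format shown above and are illustrative rather than exhaustive; parse_hangs and stall_seconds are hypothetical names.

```python
import re

# Assumption: the kernel runs with CONFIG_HZ=250. This matches the log above:
# the first two dumps are 1008 jiffies apart and ~4.03 s apart (1008 / 250).
HZ = 250

# Start of a dump, e.g. "[409425.959886] e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:"
HANG_RE = re.compile(r"\[\s*(\d+\.\d+)\] e1000e \S+ (\S+): Detected Hardware Unit Hang")
# The fields of interest inside a dump, e.g. "[409425.959886] TDH <0>"
FIELD_RE = re.compile(
    r"\]\s+(TDH|TDT|next_to_use|next_to_clean|time_stamp|jiffies)\s+<([0-9a-f]+)>"
)


def parse_hangs(lines):
    """Group kern.log lines into one dict per 'Detected Hardware Unit Hang' dump."""
    hangs = []
    current = None
    for line in lines:
        start = HANG_RE.search(line)
        if start:
            current = {"uptime": float(start.group(1)), "iface": start.group(2)}
            hangs.append(current)
            continue
        field = FIELD_RE.search(line)
        if field and current is not None:
            # e1000e prints every value in this dump as hex
            current[field.group(1)] = int(field.group(2), 16)
    return hangs


def stall_seconds(hang):
    """How long the descriptor at next_to_clean has been waiting for the NIC."""
    return (hang["jiffies"] - hang["time_stamp"]) / HZ
```

For example, iterating over `parse_hangs(open("/var/log/kern.log", errors="replace"))` and printing `h["TDH"]`, `h["TDT"]` and `stall_seconds(h)` lists every hang; for the first dump above the stall is 392 jiffies, about 1.57 s.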
Environment
Hardware: NUC8i5BEH
Software:
root@pve:~# pveversion -v
proxmox-ve: 6.1-2 (running kernel: 5.3.18-2-pve)
pve-manager: 6.1-7 (running version: 6.1-7/13e58d5e)
pve-kernel-helper: 6.1-6
pve-kernel-5.3: 6.1-5
pve-kernel-5.3.18-2-pve: 5.3.18-2
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.3-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.15-pve1
libpve-access-control: 6.0-6
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.0-13
libpve-guest-common-perl: 3.0-3
libpve-http-server-perl: 3.0-4
libpve-storage-perl: 6.1-5
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 3.2.1-1
lxcfs: 3.0.3-pve60
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.1-3
pve-cluster: 6.1-4
pve-container: 3.0-21
pve-docs: 6.1-6
pve-edk2-firmware: 2.20200229-1
pve-firewall: 4.0-10
pve-firmware: 3.0-6
pve-ha-manager: 3.0-8
pve-i18n: 2.0-4
pve-qemu-kvm: 4.1.1-3
pve-xtermjs: 4.3.0-1
qemu-server: 6.1-6
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.3-pve1
Cause
A problem in the Intel e1000e driver (the forum thread title says e1000), see: https://forum.proxmox.com/threads/e1000-driver-hang.58284/
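The log above shows that eno1 on this NUC is driven by e1000e. To check which driver backs a NIC on another machine, `ethtool -i eno1` prints it on its `driver:` line. The sketch below does the same in Python by resolving the sysfs symlink /sys/class/net/<iface>/device/driver. The function name nic_driver is hypothetical, and the sysfs root is a parameter only so the function can be tried against a fake directory tree.

```python
import os


def nic_driver(iface, sysfs_net="/sys/class/net"):
    """Return the kernel driver bound to a NIC (e.g. 'e1000e'), or None.

    /sys/class/net/<iface>/device/driver is a symlink into
    /sys/bus/pci/drivers/<driver>, so its resolved basename is the driver name.
    """
    link = os.path.join(sysfs_net, iface, "device", "driver")
    if not os.path.exists(link):
        # virtual interfaces such as the vmbr0 bridge have no backing device
        return None
    return os.path.basename(os.path.realpath(link))
```

On the host in this post, `nic_driver("eno1")` would be expected to return "e1000e", and `nic_driver("vmbr0")` returns None because the bridge has no physical device behind it.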
Solution (reportedly at some performance cost, since packet segmentation is no longer offloaded to the NIC):
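Edit /etc/network/interfaces:
vim /etc/network/interfaces
At the end of the iface vmbr0 inet static stanza, add the following line, which turns off TCP segmentation offload (tso) and generic segmentation offload (gso) on eno1 every time vmbr0 is brought up:
post-up ethtool -K eno1 tso off gso off
For example: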
auto lo
iface lo inet loopback

iface eno1 inet manual

auto vmbr0
iface vmbr0 inet static
        address 192.168.1.117
        gateway 192.168.1.1
        bridge-ports eno1
        bridge-stp off
        bridge-fd 0
        post-up ethtool -K eno1 tso off gso off
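Then type :wq to save and exit. The post-up command runs the next time vmbr0 is brought up, for example after a reboot.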
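Once vmbr0 has come back up, `ethtool -k eno1` (lowercase -k shows the settings, uppercase -K changes them) should list tcp-segmentation-offload and generic-segmentation-offload as off. Below is a small Python sketch that runs that command and checks both features. parse_features, tso_gso_off and check_offloads are hypothetical helper names, and the sketch assumes ethtool is on PATH and prints one `name: value` pair per line.

```python
import subprocess


def parse_features(text):
    """Map feature name -> 'on'/'off' from `ethtool -k` output."""
    features = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition(":")
        # skip the "Features for eno1:" header, which has no value
        if sep and value.strip():
            # drop suffixes such as "[fixed]" and keep only on/off
            features[name] = value.split()[0]
    return features


def tso_gso_off(features):
    """True if both offloads disabled by the post-up line are off."""
    return (features.get("tcp-segmentation-offload") == "off"
            and features.get("generic-segmentation-offload") == "off")


def check_offloads(iface="eno1"):
    """Run `ethtool -k IFACE` and report whether TSO and GSO are off."""
    out = subprocess.run(["ethtool", "-k", iface],
                         capture_output=True, text=True, check=True).stdout
    return tso_gso_off(parse_features(out))
```

On the Proxmox host, `check_offloads("eno1")` should return True once the post-up line has run; False means the stanza was not applied, for instance because vmbr0 has not been brought up again since the edit.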