This article is also published on my personal blog: https://evine.win.
Preface
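I recently noticed that whenever the number of connections on my PVE host grows, two cores show much higher utilization than all the others. After some research, the cause turned out to be that my Realtek RTL8125 2.5GbE NICs load the r8169 driver by default, which does not enable features such as NIC multi-queue. As a result, the softirqs of each NIC can only be handled by one fixed core, for both receiving and sending data.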
Check which driver the NICs have loaded:
## Find the PCI addresses of the NICs
$ lspci | grep RTL8125
22:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller (rev 05)
2a:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller (rev 05)
## Check which driver each NIC has loaded
$ lspci -s 22:00.0 -k # the NIC at 22:00.0
22:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller (rev 05)
Subsystem: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller
Kernel driver in use: r8169
Kernel modules: r8169
$ lspci -s 2a:00.0 -k # the NIC at 2a:00.0
2a:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller (rev 05)
Subsystem: Micro-Star International Co., Ltd. [MSI] RTL8125 2.5GbE Controller
Kernel driver in use: r8169
Kernel modules: r8169
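With several NICs, running `lspci -s ... -k` once per card gets tedious. The sketch below is one possible shortcut: a small helper (the name `nic_drivers` is made up for this example) that reads `lspci -k` output on stdin and prints the slot and the driver in use for every Ethernet controller. It assumes the usual `lspci -k` layout, where each device line starts with the slot and contains "Ethernet controller".

```shell
# Hypothetical helper: read `lspci -k` output on stdin and print
# "<slot> <driver>" for every Ethernet controller that has a driver bound.
nic_drivers() {
    awk '
        # A device line starts with the slot, e.g. "22:00.0 Ethernet controller: ..."
        /^[0-9a-f]/ { slot = ""; if (/Ethernet controller/) slot = $1 }
        # Detail lines belong to the most recent device line
        /Kernel driver in use:/ && slot != "" { print slot, $NF }
    '
}

# Usage:
#   lspci -k | nic_drivers
```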
Check the per-core interrupt counts of the NICs:
$ cat /proc/interrupts | grep -P 'eth|CPU0' # my NICs are named eth0 and eth1
CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7 CPU8 CPU9 CPU10 CPU11 CPU12 CPU13 CPU14 CPU15
51: 0 0 0 0 0 0 0 0 0 631894756 0 0 0 0 0 0 IR-PCI-MSIX-0000:22:00.0 eth0
59: 0 0 0 0 0 0 0 0 0 0 0 932824696 0 0 0 0 IR-PCI-MSIX-0000:2a:00.0 eth1
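Reading a 16-column table by eye is error-prone, so a small awk filter can point out the hot core directly. This is only a sketch: `busiest_cpu` is a made-up name, it expects the `CPU0 CPU1 ...` header as the first input line, and it treats the last field of each row as the interrupt name.

```shell
# Hypothetical helper: read /proc/interrupts-style text on stdin. For each
# IRQ row whose last field matches the regex in $1, print
# "<name> <busiest CPU> <share of that row's interrupts>".
busiest_cpu() {
    awk -v pat="$1" '
        NR == 1 { ncpu = NF; next }            # header row: CPU0 CPU1 ...
        $NF ~ pat {
            total = 0; max = 0; cpu = 0
            for (i = 2; i <= ncpu + 1; i++) {  # field 1 is the IRQ number
                total += $i
                if ($i + 0 > max) { max = $i + 0; cpu = i - 2 }
            }
            if (total > 0)
                printf "%s CPU%d %.0f%%\n", $NF, cpu, 100 * max / total
        }
    '
}

# Usage:
#   busiest_cpu '^eth' < /proc/interrupts
```

A share close to 100% on one CPU corresponds to the single-core pattern shown above.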
You can also use this command to see whether software-interrupt time (%soft) is heavily concentrated on particular cores.
$ mpstat -P ALL 1 5
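If `mpstat` (from the sysstat package) is not installed, the kernel's own counters in `/proc/softirqs` give a similar picture. The helper below is a sketch with a made-up name, `net_softirqs`. It prints the cumulative NET_RX and NET_TX softirq counts per CPU. These are totals since boot, not percentages, so compare two samples taken a few seconds apart to see the current rate.

```shell
# Hypothetical helper: read /proc/softirqs-style text on stdin and print the
# cumulative NET_RX / NET_TX softirq counts for each CPU.
net_softirqs() {
    awk '
        NR == 1 { for (i = 1; i <= NF; i++) cpu[i] = $i; n = NF; next }
        $1 == "NET_RX:" { for (i = 1; i <= n; i++) rx[i] = $(i + 1) }
        $1 == "NET_TX:" { for (i = 1; i <= n; i++) tx[i] = $(i + 1) }
        END { for (i = 1; i <= n; i++) print cpu[i], "rx=" rx[i], "tx=" tx[i] }
    '
}

# Usage:
#   net_softirqs < /proc/softirqs
```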
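With many connections, these two cores can even reach 80% utilization while the other cores stay below 20%. So it is worth switching the NIC from r8169, the Linux default driver, to Realtek's official driver.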
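Using the official driver
Someone has already packaged the official driver as a DKMS deb package: realtek-r8125-dkms. That package does not turn on NIC multi-queue, though, so I forked it, enabled TX multi-queue and RSS, disabled ASPM, and published the result at https://github.com/devome/realtek-r8125-dkms , ready to use as-is. Download the latest deb file from its Releases, then install it with the steps below: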
## Update the kernel and install dependencies
$ apt update
$ apt upgrade
$ apt install -y dkms pve-headers # on PVE 8.0.4+ it is recommended to replace pve-headers with proxmox-default-headers
## Install the headers for the installed kernels
$ headers=$(dpkg -l | awk '/^ii.+kernel-[0-9]+\.[0-9]+\.[0-9]/{gsub(/-signed/, ""); gsub(/kernel/, "headers"); print $2}' | tr "\n" " ")
$ eval apt install -y $headers
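## Install the deb package you just downloaded. This only installs the driver for the kernel the system is currently running and the newest kernel just installed (which may be the same kernel)
$ dpkg -i realtek-r8125-dkms_*.deb # if the wildcard matches several files, give the exact filename instead
## To install the driver for kernels that are neither the running kernel nor the newest kernel just installed, specify them manually
## See which kernels already have the driver installed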
$ dkms status
## List all kernel versions (kernel_version) and find the ones that do not have the driver yet
$ dpkg -l | awk '/^ii.+kernel-[0-9]+\.[0-9]+\.[0-9]/{gsub(/proxmox-kernel-|pve-kernel-|-signed/, ""); print $2}'
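## Specify the driver version (shown in the deb filename) and the kernel version (from the output of the previous command) by hand. In zsh, Tab completes both. For example: dkms install realtek-r8125/9.011.01 -k 6.2.16-5-pve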
$ dkms install realtek-r8125/ -k
## Remove the headers that are no longer needed
$ eval apt-mark auto $headers
$ apt autopurge
## Blacklist the r8169 driver
$ echo "blacklist r8169" >> /etc/modprobe.d/dkms.conf
## Update GRUB and the initramfs, then reboot
$ update-grub
$ update-initramfs -u -k all
$ reboot
## Check the loaded driver again; it is now r8125
$ lspci -s 22:00.0 -k
22:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller (rev 05)
Subsystem: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller
Kernel driver in use: r8125
Kernel modules: r8169, r8125
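For the manual step earlier in this procedure (kernels that are neither the running one nor the newest one), comparing `dkms status` against the kernel list by eye is easy to get wrong. The sketch below automates that comparison. The function name `kernels_missing_driver` is made up, and it assumes `dkms status` prints lines shaped like `module/version, kernel, arch: installed` (or the older `module, version, kernel, arch: installed`). Check the output on your system before relying on it.

```shell
# Hypothetical helper.
#   $1: text printed by `dkms status`
#   $2: kernel versions, one per line
# Prints every kernel from $2 that has no "installed" entry in $1.
kernels_missing_driver() {
    # The kernel version is the second-to-last ", "-separated field (assumed format)
    installed=$(printf '%s\n' "$1" |
        awk -F', ' '/installed/ && NF >= 3 { print $(NF - 1) }')
    printf '%s\n' "$2" | while read -r k; do
        [ -n "$k" ] || continue
        printf '%s\n' "$installed" | grep -qxF -- "$k" || printf '%s\n' "$k"
    done
}

# Usage (the dpkg/awk pipeline is the kernel-listing command from the steps above):
#   kernels_missing_driver "$(dkms status)" \
#       "$(dpkg -l | awk '/^ii.+kernel-[0-9]+\.[0-9]+\.[0-9]/{gsub(/proxmox-kernel-|pve-kernel-|-signed/, ""); print $2}')"
```

Each kernel it prints can then be passed to `dkms install ... -k` as described in the steps above.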
Checking the results
## Check the interrupt counts
$ cat /proc/interrupts | grep -P 'eth|CPU0' # my NICs are named eth0 and eth1
CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7 CPU8 CPU9 CPU10 CPU11 CPU12 CPU13 CPU14 CPU15
99: 0 0 0 144263633 0 631894756 0 0 0 297 0 3128 0 0 0 0 IR-PCI-MSIX-0000:22:00.0 0-edge eth0-0
101: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSIX-0000:22:00.0 1-edge eth0-1
102: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSIX-0000:22:00.0 2-edge eth0-2
103: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSIX-0000:22:00.0 3-edge eth0-3
104: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSIX-0000:22:00.0 4-edge eth0-4
105: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSIX-0000:22:00.0 5-edge eth0-5
106: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSIX-0000:22:00.0 6-edge eth0-6
107: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSIX-0000:22:00.0 7-edge eth0-7
108: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSIX-0000:22:00.0 8-edge eth0-8
109: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSIX-0000:22:00.0 9-edge eth0-9
110: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSIX-0000:22:00.0 10-edge eth0-10
111: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSIX-0000:22:00.0 11-edge eth0-11
112: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSIX-0000:22:00.0 12-edge eth0-12
113: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSIX-0000:22:00.0 13-edge eth0-13
114: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSIX-0000:22:00.0 14-edge eth0-14
115: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSIX-0000:22:00.0 15-edge eth0-15
116: 0 0 1733 0 440636622 5579613 0 161245 17862881 0 0 1031 0 0 9500904 0 IR-PCI-MSIX-0000:22:00.0 16-edge eth0-16
117: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSIX-0000:22:00.0 17-edge eth0-17
118: 3365897 0 1182441 0 124570 308349 2351042 235318334 0 0 0 0 230800986 1358 0 0 IR-PCI-MSIX-0000:22:00.0 18-edge eth0-18
119: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSIX-0000:22:00.0 19-edge eth0-19
120: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSIX-0000:22:00.0 20-edge eth0-20
121: 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSIX-0000:22:00.0 21-edge eth0-21
122: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSIX-0000:22:00.0 22-edge eth0-22
123: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSIX-0000:22:00.0 23-edge eth0-23
124: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSIX-0000:22:00.0 24-edge eth0-24
125: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSIX-0000:22:00.0 25-edge eth0-25
126: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSIX-0000:22:00.0 26-edge eth0-26
127: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSIX-0000:22:00.0 27-edge eth0-27
128: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSIX-0000:22:00.0 28-edge eth0-28
129: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSIX-0000:22:00.0 29-edge eth0-29
130: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSIX-0000:22:00.0 30-edge eth0-30
131: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSIX-0000:22:00.0 31-edge eth0-31
133: 0 156031397 0 0 0 147300933 215 514317 0 0 0 1456987057 0 0 0 422 IR-PCI-MSIX-0000:2a:00.0 0-edge eth1-0
134: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSIX-0000:2a:00.0 1-edge eth1-1
136: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSIX-0000:2a:00.0 2-edge eth1-2
137: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSIX-0000:2a:00.0 3-edge eth1-3
138: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSIX-0000:2a:00.0 4-edge eth1-4
139: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSIX-0000:2a:00.0 5-edge eth1-5
140: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSIX-0000:2a:00.0 6-edge eth1-6
141: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSIX-0000:2a:00.0 7-edge eth1-7
142: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSIX-0000:2a:00.0 8-edge eth1-8
143: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSIX-0000:2a:00.0 9-edge eth1-9
144: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSIX-0000:2a:00.0 10-edge eth1-10
145: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSIX-0000:2a:00.0 11-edge eth1-11
146: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSIX-0000:2a:00.0 12-edge eth1-12
147: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSIX-0000:2a:00.0 13-edge eth1-13
148: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSIX-0000:2a:00.0 14-edge eth1-14
149: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSIX-0000:2a:00.0 15-edge eth1-15
150: 2196 115680935 162017341 0 0 3 459917 567139230 25305 106494 2371 5 82635683 0 1254 632401889 IR-PCI-MSIX-0000:2a:00.0 16-edge eth1-16
151: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSIX-0000:2a:00.0 17-edge eth1-17
152: 1557 4043 309835387 0 0 293 1044744 0 0 473416721 0 0 0 4 162948447 615115873 IR-PCI-MSIX-0000:2a:00.0 18-edge eth1-18
153: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSIX-0000:2a:00.0 19-edge eth1-19
154: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSIX-0000:2a:00.0 20-edge eth1-20
155: 1 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSIX-0000:2a:00.0 21-edge eth1-21
156: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSIX-0000:2a:00.0 22-edge eth1-22
157: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSIX-0000:2a:00.0 23-edge eth1-23
158: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSIX-0000:2a:00.0 24-edge eth1-24
159: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSIX-0000:2a:00.0 25-edge eth1-25
160: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSIX-0000:2a:00.0 26-edge eth1-26
161: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSIX-0000:2a:00.0 27-edge eth1-27
162: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSIX-0000:2a:00.0 28-edge eth1-28
163: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSIX-0000:2a:00.0 29-edge eth1-29
164: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSIX-0000:2a:00.0 30-edge eth1-30
165: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSIX-0000:2a:00.0 31-edge eth1-31
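Each NIC now uses multiple cores to handle its interrupts, which is a good result. According to this post, the 9.011.01 driver has 1 RX queue and 2 TX queues, which would be consistent with nearly all interrupts in the output above landing on three vectors per NIC (-0, -16 and -18).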
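Most of the 32 vectors per NIC never fire, so it helps to filter the table down to the ones that do. A minimal sketch, with `vector_totals` as a made-up helper name: it sums each matching row of `/proc/interrupts` across all CPUs and prints only the vectors with a non-zero total.

```shell
# Hypothetical helper: read /proc/interrupts-style text on stdin. For each
# IRQ row whose last field matches the regex in $1, print "<name> <total>"
# if the vector has fired at least once.
vector_totals() {
    awk -v pat="$1" '
        NR == 1 { ncpu = NF; next }            # header row: CPU0 CPU1 ...
        $NF ~ pat {
            total = 0
            for (i = 2; i <= ncpu + 1; i++) total += $i
            if (total > 0) printf "%s %.0f\n", $NF, total
        }
    '
}

# Usage:
#   vector_totals '^eth0-' < /proc/interrupts
```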
Next, check whether utilization is now relatively balanced across the CPU cores:
$ mpstat -P ALL 1 3
Linux 6.2.16-6-pve (pve) 08/15/23 _x86_64_ (16 CPU)
20:55:54 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
20:55:55 all 1.44 0.06 1.57 0.06 0.00 0.38 0.00 4.27 0.00 92.21
20:55:55 0 0.98 0.00 1.96 0.00 0.00 0.00 0.00 5.88 0.00 91.18
20:55:55 1 0.00 0.00 1.01 0.00 0.00 0.00 0.00 4.04 0.00 94.95
20:55:55 2 0.00 0.00 0.99 0.00 0.00 0.00 0.00 8.91 0.00 90.10
20:55:55 3 1.01 0.00 0.00 0.00 0.00 1.01 0.00 5.05 0.00 92.93
20:55:55 4 2.02 0.00 0.00 0.00 0.00 1.01 0.00 5.05 0.00 91.92
20:55:55 5 4.00 0.00 4.00 0.00 0.00 1.00 0.00 1.00 0.00 90.00
20:55:55 6 1.98 0.99 1.98 0.00 0.00 0.99 0.00 3.96 0.00 90.10
20:55:55 7 1.98 0.00 1.98 0.99 0.00 0.00 0.00 3.96 0.00 91.09
20:55:55 8 1.02 0.00 2.04 0.00 0.00 0.00 0.00 2.04 0.00 94.90
20:55:55 9 0.00 0.00 1.03 0.00 0.00 0.00 0.00 4.12 0.00 94.85
20:55:55 10 4.12 0.00 1.03 0.00 0.00 0.00 0.00 3.09 0.00 91.75
20:55:55 11 1.01 0.00 1.01 0.00 0.00 1.01 0.00 7.07 0.00 89.90
20:55:55 12 1.02 0.00 2.04 0.00 0.00 1.02 0.00 2.04 0.00 93.88
20:55:55 13 3.00 0.00 1.00 0.00 0.00 0.00 0.00 3.00 0.00 93.00
20:55:55 14 0.00 0.00 2.02 0.00 0.00 0.00 0.00 3.03 0.00 94.95
20:55:55 15 0.98 0.00 2.94 0.00 0.00 0.00 0.00 5.88 0.00 90.20
20:55:55 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
20:55:56 all 1.14 0.00 1.21 0.51 0.00 0.44 0.00 2.35 0.00 94.35
20:55:56 0 1.02 0.00 3.06 0.00 0.00 0.00 0.00 2.04 0.00 93.88
20:55:56 1 1.02 0.00 0.00 1.02 0.00 1.02 0.00 1.02 0.00 95.92
20:55:56 2 0.99 0.00 1.98 0.00 0.00 0.00 0.00 3.96 0.00 93.07
20:55:56 3 1.98 0.00 1.98 0.00 0.00 0.99 0.00 3.96 0.00 91.09
20:55:56 4 3.06 0.00 1.02 0.00 0.00 0.00 0.00 2.04 0.00 93.88
20:55:56 5 0.00 0.00 2.97 0.00 0.00 1.98 0.00 4.95 0.00 90.10
20:55:56 6 2.06 0.00 1.03 0.00 0.00 0.00 0.00 0.00 0.00 96.91
20:55:56 7 1.02 0.00 0.00 3.06 0.00 1.02 0.00 2.04 0.00 92.86
20:55:56 8 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.04 0.00 98.96
20:55:56 9 1.02 0.00 1.02 0.00 0.00 1.02 0.00 2.04 0.00 94.90
20:55:56 10 0.00 0.00 1.01 1.01 0.00 0.00 0.00 3.03 0.00 94.95
20:55:56 11 2.04 0.00 1.02 0.00 0.00 0.00 0.00 2.04 0.00 94.90
20:55:56 12 1.03 0.00 1.03 1.03 0.00 1.03 0.00 2.06 0.00 93.81
20:55:56 13 1.01 0.00 1.01 0.00 0.00 0.00 0.00 2.02 0.00 95.96
20:55:56 14 1.01 0.00 0.00 2.02 0.00 0.00 0.00 3.03 0.00 93.94
20:55:56 15 1.02 0.00 2.04 0.00 0.00 0.00 0.00 2.04 0.00 94.90
20:55:56 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
20:55:57 all 1.95 0.00 1.01 0.19 0.00 0.38 0.00 2.83 0.00 93.65
20:55:57 0 0.99 0.00 0.00 0.00 0.00 0.99 0.00 3.96 0.00 94.06
20:55:57 1 0.00 0.00 1.98 0.99 0.00 0.00 0.00 2.97 0.00 94.06
20:55:57 2 0.00 0.00 2.04 0.00 0.00 0.00 0.00 3.06 0.00 94.90
20:55:57 3 2.00 0.00 0.00 0.00 0.00 1.00 0.00 3.00 0.00 94.00
20:55:57 4 1.01 0.00 2.02 0.00 0.00 0.00 0.00 2.02 0.00 94.95
20:55:57 5 2.02 0.00 3.03 0.00 0.00 0.00 0.00 1.01 0.00 93.94
20:55:57 6 2.00 0.00 0.00 0.00 0.00 1.00 0.00 2.00 0.00 95.00
20:55:57 7 10.10 0.00 0.00 1.01 0.00 0.00 0.00 2.02 0.00 86.87
20:55:57 8 0.00 0.00 1.01 0.00 0.00 0.00 0.00 5.05 0.00 93.94
20:55:57 9 1.01 0.00 0.00 0.00 0.00 2.02 0.00 3.03 0.00 93.94
20:55:57 10 3.03 0.00 1.01 0.00 0.00 0.00 0.00 4.04 0.00 91.92
20:55:57 11 4.00 0.00 2.00 0.00 0.00 0.00 0.00 2.00 0.00 92.00
20:55:57 12 3.92 0.00 0.98 0.00 0.00 0.98 0.00 3.92 0.00 90.20
20:55:57 13 0.00 0.00 1.01 1.01 0.00 0.00 0.00 3.03 0.00 94.95
20:55:57 14 0.00 0.00 1.03 0.00 0.00 0.00 0.00 1.03 0.00 97.94
20:55:57 15 1.01 0.00 0.00 0.00 0.00 0.00 0.00 3.03 0.00 95.96
Average: CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
Average: all 1.51 0.02 1.26 0.25 0.00 0.40 0.00 3.15 0.00 93.40
Average: 0 1.00 0.00 1.66 0.00 0.00 0.33 0.00 3.99 0.00 93.02
Average: 1 0.34 0.00 1.01 0.67 0.00 0.34 0.00 2.68 0.00 94.97
Average: 2 0.33 0.00 1.67 0.00 0.00 0.00 0.00 5.33 0.00 92.67
Average: 3 1.67 0.00 0.67 0.00 0.00 1.00 0.00 4.00 0.00 92.67
Average: 4 2.03 0.00 1.01 0.00 0.00 0.34 0.00 3.04 0.00 93.58
Average: 5 2.00 0.00 3.33 0.00 0.00 1.00 0.00 2.33 0.00 91.33
Average: 6 2.01 0.34 1.01 0.00 0.00 0.67 0.00 2.01 0.00 93.96
Average: 7 4.36 0.00 0.67 1.68 0.00 0.34 0.00 2.68 0.00 90.27
Average: 8 0.34 0.00 1.02 0.00 0.00 0.00 0.00 2.73 0.00 95.90
Average: 9 0.68 0.00 0.68 0.00 0.00 1.02 0.00 3.06 0.00 94.56
Average: 10 2.37 0.00 1.02 0.34 0.00 0.00 0.00 3.39 0.00 92.88
Average: 11 2.36 0.00 1.35 0.00 0.00 0.34 0.00 3.70 0.00 92.26
Average: 12 2.02 0.00 1.35 0.34 0.00 1.01 0.00 2.69 0.00 92.59
Average: 13 1.34 0.00 1.01 0.34 0.00 0.00 0.00 2.68 0.00 94.63
Average: 14 0.34 0.00 1.02 0.68 0.00 0.00 0.00 2.37 0.00 95.59
Average: 15 1.00 0.00 1.67 0.00 0.00 0.00 0.00 3.68 0.00 93.65
The difference in softirq load between cores is clearly no longer as large as before.
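To put a number on "balanced", the `Average:` rows of the `mpstat` output can be reduced to the smallest and largest per-core %soft. The helper below is a sketch under a few assumptions: `soft_spread` is a made-up name, the input is `mpstat -P ALL` output in a locale that uses `.` as the decimal separator, and the column is located by its `%soft` header.

```shell
# Hypothetical helper: read `mpstat -P ALL <interval> <count>` output on
# stdin and print the minimum and maximum per-core average %soft.
soft_spread() {
    awk '
        $1 == "Average:" && $2 == "CPU" {          # header row of the averages
            for (i = 1; i <= NF; i++) if ($i == "%soft") col = i
            next
        }
        $1 == "Average:" && $2 ~ /^[0-9]+$/ && col {   # per-core rows only, skip "all"
            v = $col + 0
            if (!seen || v < min) min = v
            if (!seen || v > max) max = v
            seen = 1
        }
        END { if (seen) printf "min=%.2f max=%.2f\n", min, max }
    '
}

# Usage:
#   LC_ALL=C mpstat -P ALL 1 3 | soft_spread
```

Applied to the averages above, this should report min=0.00 and max=1.02, the lowest and highest per-core %soft values in that table.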