Troubleshooting a high system CPU usage issue on Linux/Solaris

A Linux server has high %system CPU usage, following are steps to find theroot cause of the issue and how to resolve it.

Vmstat show %systemCPU usage is high.
# vmstat 2

procs-----------memory---------- ---swap-- -----io---- --system-- -----cpu------

r b swpd free buff cache si so bi bo in cs us sy id wa st


1 0 0 191420 8688 35780 0 0 0 0 1006 31 1 496 0 0

1 0 0 124468 9208 98020 0 0 15626 2074 1195 188 0 76 0 24 0

0 1 0 110716 9316 110996 0 0 3268 4144 1366 84 0 94 0 7 0

0 3 0 97048 9416 122272 0 0 2818 11855 1314 109 1 80 0 20 0

0 4 0 80476 9544 137888 0 0 3908 2786 1272 172 0 54 0 46 0

2 1 0 72860 9612 145848 0 0 1930 0 1193 141 0 42 0 58 0

0 1 0 74300 9620 145860 0 0 0 6 1208 67 0 38 0 62 0

0 0 0 75680 9620 145860 0 0 0 6929 1364 101 0 70 624 0


Let’s run mpstat to show more detailed CPU usage,it showed CPU was busy withinterruptions.

# mpstat 2

Linux 2.6.18-92.el5(centos-ks) 01/14/2010


02:03:50 AM CPU %user %nice %sys %iowait %irq %soft %steal %idle intr/s

02:04:04 AM all 1.33 0.00 41.78 0.00 0.44 3.56 0.00 52.89 1015.56

02:04:06 AM all 0.00 0.00 8.04 38.69 29.6523.62 0.00 0.00 1326.63

02:04:08 AM all 0.00 0.00 8.70 30.43 27.5428.50 0.00 4.83 1327.54

02:04:10 AM all 0.00 0.00 5.47 46.77 27.36 20.40 0.00 0.00 1280.10

02:04:12 AM all 0.50 0.00 6.47 63.18 19.40 10.45 0.00 0.00 1183.08

02:04:14 AM all 1.01 0.00 6.53 62.31 21.11 9.05 0.00 0.00 1190.95

02:04:16 AM all 0.00 0.00 8.04 26.63 43.72 21.61 0.00 0.00 1365.83

02:04:18 AM all 0.00 0.00 1.50 0.00 0.00 0.50 0.00 98.00 1006.50

Use sar to findout which interrupt number was culprit. #9 was the highest excluding system interrupt #0.
# sar -IXALL 2 10

02:07:10 AM INTR intr/s

02:07:12 AM 0 992.57

02:07:12 AM 1 0.00

02:07:12 AM 2 0.00

02:07:12 AM 3 0.00

02:07:12 AM 4 0.00

02:07:12 AM 5 0.00

02:07:12 AM 6 0.00

02:07:12 AM 7 0.00

02:07:12 AM 8 0.00

02:07:12 AM 9 350.50



[ Solarisequivalent command]

Solaris# intrstat 2


device | cpu0 %tim cpu1 %tim

-------------+------------------------------

bge#0 | 0 0.0 128 0.6

cpqary3#0 | 0 0.0 14 0.0

# cat /proc/interrupts

CPU0

0: 702980 XT-PIC timer

1: 439 XT-PIC i8042

2: 0 XT-PIC cascade

6: 2 XT-PIC floppy

8: 1 XT-PIC rtc

9: 14464 XT-PIC acpi, eth2

11: 12 XT-PIC eth0

12: 400 XT-PIC i8042

14: 6091 XT-PIC ide0

15: 22 XT-PIC ide1

NMI: 0

LOC: 700623

ERR: 0

MIS: 0

[ OpenSolaris equivalent command ]

Solaris#echo::interrupts | mdb �Ck

Native Solaris hasto search the interrupt from output of prtconf -v

Solution:
When the card transmits or receives a frame, the
system must be notified of the event. If the card interrupts the system for each transmitted and received frame,the result is a high degree ofprocessor overhead. To prevent that, Gigabit Ethernet provides a feature calledInterrupt Coalescence. Effective use of this feature can reduce system overhead and improve performance.


Interrupt Coalescence essentially means that the card interrupts the system after sending or receiving batch offrames.


you can enable adaptive moderation ( Adaptive RX: off TX: off) to let system choose value automatically or set individual values manually.


A interrupt is generated by the card to the host when either frame counteror timer counter is met. Values 0 means disabled.


RX for example:

Timer counter in microseconds: rx-usecs/rx-usecs-irq

Frames counter:rx-frames/rx-frames-irq



# A sample outputwith default values.

# ethtool -c eth1

Coalesce parametersfor eth1:

Adaptive RX:off TX: off

stats-block-usecs:999936

sample-interval: 0

pkt-rate-low: 0

pkt-rate-high: 0


rx-usecs: 18

rx-frames: 6

rx-usecs-irq: 18

rx-frames-irq: 6


tx-usecs: 80

tx-frames: 20

tx-usecs-irq: 80

tx-frames-irq: 20


rx-usecs-low: 0

rx-frame-low: 0

tx-usecs-low: 0

tx-frame-low: 0


rx-usecs-high: 0

rx-frame-high: 0

tx-usecs-high: 0

tx-frame-high: 0

<>

[ Solaris equivalent command]

Varying on driver,Find out the driver's capability.

Solaris#ndd -get/dev/e1000g0 \? | egrep ‘interrupt |intr’

The Value shoud beset in driver conf file:

Solaris#/platform/`uname-m`/kernel/drv/*.conf

Alternative Workaround:

I couldn't config Interrupt Coalescence because virtual machine NIC didn'tsupport it. but as workaround, Increasing mtu can also decrease interrupt,ifconfig eth2 mtu 9000 resolved the issue. It needs to set on both hosts peer,if they are not directly connected, make sure the switch supports jumbo frames.

You don't need to care Interrupt Coalescence if your CPU resource is abundant, But for high load NFS/CIFS/ISCSI/ NAS servers, it is very useful.



你可能感兴趣的:(linux,server,System,following)