A Brief Look at CPU Overload on Cisco 4500 Series Switches
I: Causes of CPU overload
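CPU overload on Cisco 4500 series switches has many possible causes. The most common is an excess of abnormal packets on the network: the core switch CPU is kept busy processing and forwarding them, and ends up running overloaded. At our company, three situations have produced such packet floods: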
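1: Viruses (excessive ARP or DHCP protocol packets);
2: A Layer 2 network cabled into a loop (broadcast storm);
3: Improper use of test software (software that can continuously send broadcast or multicast packets).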
II: Commands useful during troubleshooting
1: show processes cpu
2: show platform health
3: show platform cpu packet statistics
4: debug platform packet all receive buffer
5: show platform cpu packet buffered
6: show interfaces | include L2|line|broadcast
7: show interfaces <interface> counters
8: show interface | include line|\/sec
9: monitor session 1 source cpu
   monitor session 1 destination interface <interface>
III: General handling approach
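The handling steps differ slightly depending on which of the causes above is responsible. When a virus or test software is generating the abnormal packets, we need to find the source and block it. When the network has been cabled into a loop, we need to locate the port or ports involved in the loop and deal with them.
Recently, improper use of test software in Plant 5 overloaded the CPU of the core 4500 switch. The handling process for that case is outlined below: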
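1: After receiving the overload alarm from the core 4500, log in to the switch and run show processes cpu to check the utilization of each process. The Cat4k Mgmt processes show excessive utilization (Cat4k Mgmt LoPri alone accounts for 74.56% over five seconds); these are the platform's core processes.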
F5-4506-DOWN# show processes cpu
CPU utilization for five seconds: 87%/1%; one minute: 85%; five minutes: 85%
 PID Runtime(ms)    Invoked  uSecs    5Sec    1Min    5Min TTY Process
   1     3643880   17411824    209   0.00%   0.00%   0.00%   0 Chunk Manager
   2       10624    1740948      6   0.00%   0.00%   0.00%   0 Load Meter
   3           0          1      0   0.00%   0.00%   0.00%   0 Deferred Events
   4           0          1      0   0.00%   0.00%   0.00%   0 CEF IPC Backgrou
   5    16913420    1579962  10704   0.00%   0.20%   0.18%   0 Check heaps
   6          28        143    195   0.00%   0.00%   0.00%   0 Pool Manager
   7           0          2      0   0.00%   0.00%   0.00%   0 Timers
------------- Output suppressed--------------------------
  33      316292    8740292     36   0.00%   0.00%   0.00%   0 Per-Second Jobs
  34     4661080     277884  16773   0.00%   0.06%   0.05%   0 Per-minute Jobs
  35   868816112 1758382511    494   6.85%   7.29%   7.30%   0 Cat4k Mgmt HiPri
  36   721412156  357038431   2020  74.56%  68.72%  68.57%   0 Cat4k Mgmt LoPri
  37      212616   10593419     20   0.00%   0.00%   0.00%   0 Galios Reschedul
  38           8         69    115   0.00%   0.00%   0.00%   0 IOS ACL Helper
  39           0          2      0   0.00%   0.00%   0.00%   0 NAM Manager
---------------- Output suppressed--------------------------
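When this output is saved to a file, picking out the busy processes can be scripted. The following is only a minimal sketch, assuming the row layout shown above (PID, Runtime, Invoked, uSecs, three percentages, TTY, process name); the function name top_processes and the 5% threshold are illustrative choices, not part of any Cisco tool. The sample row for PID 35 is written with its Runtime(ms) and Invoked values as separate columns.

```python
import re

# One process row of `show processes cpu`:
# PID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
ROW = re.compile(
    r"^\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+"
    r"([\d.]+)%\s+([\d.]+)%\s+([\d.]+)%\s+(\d+)\s+(.+?)\s*$"
)


def top_processes(output, threshold=5.0):
    """Return (process, five_second_pct) pairs at or above threshold, busiest first."""
    rows = []
    for line in output.splitlines():
        match = ROW.match(line)
        if match and float(match.group(5)) >= threshold:
            rows.append((match.group(9), float(match.group(5))))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Rows copied from the capture above.
CPU_SAMPLE = """\
34 4661080 277884 16773 0.00% 0.06% 0.05% 0 Per-minute Jobs
35 868816112 1758382511 494 6.85% 7.29% 7.30% 0 Cat4k Mgmt HiPri
36 721412156 357038431 2020 74.56% 68.72% 68.57% 0 Cat4k Mgmt LoPri
37 212616 10593419 20 0.00% 0.00% 0.00% 0 Galios Reschedul
"""

print(top_processes(CPU_SAMPLE))
# [('Cat4k Mgmt LoPri', 74.56), ('Cat4k Mgmt HiPri', 6.85)]
```

Header lines and "Output suppressed" markers do not match the pattern and are simply skipped.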
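2: Run show platform health to see which platform-specific process is consuming the CPU. K2CpuMan Review has the highest utilization (91.31% actual against a 30% target), and forwarding packets through the CPU relies on this process. At this point the picture starts to become clear: a large volume of packets is causing the trouble.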
F5-4506-DOWN# show platform health
                       %CPU   %CPU  RunTimeMax    Priority  Average %CPU      Total
                     Target Actual Target Actual   Fg   Bg 5Sec  Min Hour       CPU
Lj-poll                1.00   0.01      2      0  100  500    0    0    0     13:45
GalChassisVp-review    3.00   0.20     10     16  100  500    0    0    0     88:44
S2w-JobEventSchedule  10.00   0.57     10      7  100  500    1    0    0    404:22
Stub-JobEventSchedul  10.00   0.00     10      0  100  500    0    0    0      0:00
StatValueMan Update    1.00   0.09      1      0  100  500    0    0    0     91:33
Pim-review             0.10   0.00      1      0  100  500    0    0    0      4:46
Ebm-host-review        1.00   0.00      8      4  100  500    0    0    0     14:01
Ebm-port-review        0.10   0.00      1      0  100  500    0    0    0      0:20
Protocol-aging-revie   0.20   0.00      2      0  100  500    0    0    0      0:01
Acl-Flattener          1.00   0.00     10      5  100  500    0    0    0      0:04
KxAclPathMan create/   1.00   0.00     10      5  100  500    0    0    0      0:21
KxAclPathMan update    2.00   0.00     10      6  100  500    0    0    0      0:05
KxAclPathMan reprogr   1.00   0.00      2      1  100  500    0    0    0      0:00
TagMan-InformMtegRev   1.00   0.00      5      0  100  500    0    0    0      0:00
TagMan-RecreateMtegR   1.00   0.00     10     14  100  500    0    0    0      0:18
K2CpuMan Review       30.00  91.31     30     92  100  500  128  119   84  13039:02
K2AccelPacketMan: Tx  10.00   2.30     20      0  100  500    2    2    2   1345:30
K2AccelPacketMan: Au   0.10   0.00      0      0  100  500    0    0    0      0:00
-------------- Output suppressed--------------------------
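The comparison made in this step, actual %CPU against target %CPU, can be sketched in a few lines. This is an illustration only: it assumes every data row ends with exactly ten numeric fields after the process name, as in the capture above, and the function name over_target is made up for the example.

```python
def over_target(output):
    """Return (process, target_pct, actual_pct) for rows whose actual exceeds target."""
    hits = []
    for line in output.splitlines():
        # Split from the right so process names containing spaces stay intact.
        parts = line.rsplit(None, 10)
        if len(parts) != 11:
            continue
        try:
            target, actual = float(parts[1]), float(parts[2])
        except ValueError:
            continue  # header or separator line
        if actual > target:
            hits.append((parts[0].strip(), target, actual))
    return hits


# Rows copied from the capture above.
HEALTH_SAMPLE = """\
Target Actual Target Actual Fg Bg 5Sec Min Hour CPU
Lj-poll 1.00 0.01 2 0 100 500 0 0 0 13:45
S2w-JobEventSchedule 10.00 0.57 10 7 100 500 1 0 0 404:22
K2CpuMan Review 30.00 91.31 30 92 100 500 128 119 84 13039:02
K2AccelPacketMan: Tx 10.00 2.30 20 0 100 500 2 2 2 1345:30
"""

print(over_target(HEALTH_SAMPLE))
# [('K2CpuMan Review', 30.0, 91.31)]
```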
3: Run show platform cpu packet statistics. The output shows that the L2/L3Control queue holds a comparatively large number of packets that need CPU processing.
F5-4506-DOWN# sho platform cpu packet statistics
Packets Dropped In Hardware By CPU Subport (txQueueNotAvail)
------------- Output suppressed--------------------------
Packets Received by Packet Queue
Queue                  Total           5 sec avg 1 min avg 5 min avg 1 hour avg
---------------------- --------------- --------- --------- --------- ----------
Esmp                         559394854       101        85        65         55
L2/L3Control                 241541699       916       820       631        173
Host Learning                  9303858         0         0         0          0
L3 Fwd High                       1535         0         0         0          0
L3 Fwd Medium                    19512         0         0         0          0
L3 Fwd Low                     3953395         0         0         0          0
L2 Fwd High                          7         0         0         0          0
------------- Output suppressed--------------------------
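Ranking the CPU queues by their current rate makes the busy one stand out. The sketch below assumes each data row ends with five integer columns (total, then the 5 sec, 1 min, 5 min and 1 hour averages) as in the capture above; busiest_queues and min_rate are names chosen for illustration.

```python
def busiest_queues(output, min_rate=1):
    """Return (queue, five_sec_avg, one_hour_avg) tuples, highest 5-second rate first."""
    queues = []
    for line in output.splitlines():
        # The queue name may contain spaces, so peel five numbers off the right.
        parts = line.rsplit(None, 5)
        if len(parts) != 6:
            continue
        try:
            _total, five_sec, _one_min, _five_min, one_hour = (int(p) for p in parts[1:])
        except ValueError:
            continue  # header, separator, or section title
        if five_sec >= min_rate:
            queues.append((parts[0].strip(), five_sec, one_hour))
    return sorted(queues, key=lambda q: q[1], reverse=True)


# Rows copied from the capture above.
QUEUE_SAMPLE = """\
Queue Total 5 sec avg 1 min avg 5 min avg 1 hour avg
Esmp 559394854 101 85 65 55
L2/L3Control 241541699 916 820 631 173
Host Learning 9303858 0 0 0 0
L3 Fwd Low 3953395 0 0 0 0
"""

print(busiest_queues(QUEUE_SAMPLE))
# [('L2/L3Control', 916, 173), ('Esmp', 101, 55)]
```

Returning the 1-hour average next to the 5-second average also shows that L2/L3Control is running well above its longer-term rate (916 against 173).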
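4: Knowing that abnormal packets are the cause, we can set up port mirroring and capture traffic with Ethereal to analyze them, which yields the packet type, IP address, MAC address and related information. Tracing downstream layer by layer then locates the source of the abnormal packets. Finally, block the source and watch the CPU utilization. In this case, the packet information was obtained with commands built into the Cisco device itself: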
F5-4506-DOWN# debug platform packet all receive buffer
platform packet debugging is on
F5-4506-DOWN#sho platform cpu packet buffered
Total Received Packets Buffered: 1024
-------------------------------------
Index 0:
100 days 18:19:59:900721 - RxVlan: 517, RxPort: Gi4/47
Priority: Normal, Tag: Dot1Q Tag, Event: Input Acl, Flags: 0x40, Size: 1362
Eth: Src 00-E0-4C-B1-7F-4D Dst 01-00-5E-00-00-01 Type/Len 0x0800
Ip: ver:4 len:20 tos:0 totLen:1344 id:62005 fragOffset:0 ttl:1 proto:udp
src: 192.168.1.100 dst: 224.0.0.1 firstFragment lastFragment
Remaining data:
0: 0x4 0x9C 0x4 0xD2 0x5 0x2C 0x58 0x82 0x47 0x0
10: 0x45 0x1E 0x8A 0xDD 0xC2 0x72 0xA5 0xAA 0x1F 0xD4
20: 0x29 0x41 0x1C 0x2 0x2B 0x1A 0x8 0x1F 0x3E 0x0
Index 1:
100 days 18:19:59:901497 - RxVlan: 517, RxPort: Gi4/47
Priority: Normal, Tag: Dot1Q Tag, Event: Input Acl, Flags: 0x40, Size: 1362
Eth: Src 00-E0-4C-B1-7F-4D Dst 01-00-5E-00-00-01 Type/Len 0x0800
Ip: ver:4 len:20 tos:0 totLen:1344 id:62006 fragOffset:0 ttl:1 proto:udp
src: 192.168.1.100 dst: 224.0.0.1 firstFragment lastFragment
Remaining data:
0: 0x4 0x9C 0x4 0xD2 0x5 0x2C 0xB3 0x1A 0x47 0x0
10: 0x45 0x15 0xC7 0xD8 0x4F 0x2E 0x11 0x72 0x4E 0xF8
20: 0x43 0xA 0x29 0x23 0x48 0x20 0xFD 0xA0 0x3 0xFF
Index 2:
100 days 18:19:59:902274 - RxVlan: 517, RxPort: Gi4/47
Priority: Normal, Tag: Dot1Q Tag, Event: Input Acl, Flags: 0x40, Size: 1362
Eth: Src 00-E0-4C-B1-7F-4D Dst 01-00-5E-00-00-01 Type/Len 0x0800
Ip: ver:4 len:20 tos:0 totLen:1344 id:62007 fragOffset:0 ttl:1 proto:udp
src: 192.168.1.100 dst: 224.0.0.1 firstFragment lastFragment
Remaining data:
0: 0x4 0x9C 0x4 0xD2 0x5 0x2C 0x71 0x68 0x47 0x0
10: 0x45 0x1C 0xB7 0xF9 0x7D 0xBA 0x9F 0x2F 0xBA 0xEB
20: 0x26 0xC2 0xEA 0xA3 0x7E 0x5D 0x0 0x58 0x8 0x0
------------- Output suppressed--------------------------
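With up to 1024 buffered packets, counting them per source is quicker than reading entries one by one. The following is a rough sketch, assuming each entry carries the RxVlan/RxPort line, the Eth line and the src/dst line in the order shown above; entries without an IP src line (non-IP frames) are not counted. The name top_talkers is hypothetical.

```python
import re
from collections import Counter

RX = re.compile(r"RxVlan:\s*(\d+),\s*RxPort:\s*(\S+)")
ETH = re.compile(r"Eth:\s*Src\s+(\S+)\s+Dst\s+(\S+)")
IP = re.compile(r"src:\s*(\S+)\s+dst:\s*(\S+)")


def top_talkers(output):
    """Count buffered packets per (vlan, port, source MAC, source IP), most frequent first."""
    counts = Counter()
    vlan = port = mac = None
    for line in output.splitlines():
        rx = RX.search(line)
        eth = ETH.search(line)
        ip = IP.search(line)
        if rx:
            vlan, port = int(rx.group(1)), rx.group(2)
        elif eth:
            mac = eth.group(1)
        elif ip:
            # The src/dst line is the last of the three, so count the packet here.
            counts[(vlan, port, mac, ip.group(1))] += 1
    return counts.most_common()


# Header lines of the three entries shown above.
BUFFER_SAMPLE = """\
100 days 18:19:59:900721 - RxVlan: 517, RxPort: Gi4/47
Eth: Src 00-E0-4C-B1-7F-4D Dst 01-00-5E-00-00-01 Type/Len 0x0800
src: 192.168.1.100 dst: 224.0.0.1 firstFragment lastFragment
100 days 18:19:59:901497 - RxVlan: 517, RxPort: Gi4/47
Eth: Src 00-E0-4C-B1-7F-4D Dst 01-00-5E-00-00-01 Type/Len 0x0800
src: 192.168.1.100 dst: 224.0.0.1 firstFragment lastFragment
100 days 18:19:59:902274 - RxVlan: 517, RxPort: Gi4/47
Eth: Src 00-E0-4C-B1-7F-4D Dst 01-00-5E-00-00-01 Type/Len 0x0800
src: 192.168.1.100 dst: 224.0.0.1 firstFragment lastFragment
"""

print(top_talkers(BUFFER_SAMPLE))
# [((517, 'Gi4/47', '00-E0-4C-B1-7F-4D', '192.168.1.100'), 3)]
```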
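From the information above, we can conclude that a large volume of multicast packets originates in VLAN 517 and reaches 4506-Down through port Gi4/47, with source IP 192.168.1.100 and source MAC 00-E0-4C-B1-7F-4D. From here we can continue tracing downstream to find the network access point of this source, confirm it on site, and isolate it. If necessary, the traffic can be dropped directly on 4506-UP.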
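Two details of the captured packets can be cross-checked with a short calculation. First, an IPv4 multicast group maps to an Ethernet address made of the prefix 01-00-5E plus the low 23 bits of the group, which is why dst 224.0.0.1 appears with Dst MAC 01-00-5E-00-00-01. Second, assuming the "Remaining data" bytes start directly after the 20-byte IP header (an assumption, though the decoded length is consistent with it), the first six bytes are the UDP source port, destination port and length. The function names below are illustrative.

```python
import ipaddress
import struct


def multicast_mac(group):
    """Map an IPv4 multicast group to its Ethernet MAC (01-00-5E + low 23 bits)."""
    addr = ipaddress.IPv4Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not an IPv4 multicast address")
    low23 = int(addr) & 0x7FFFFF
    octets = [0x01, 0x00, 0x5E, (low23 >> 16) & 0xFF, (low23 >> 8) & 0xFF, low23 & 0xFF]
    return "-".join(f"{octet:02X}" for octet in octets)


def udp_header(first_bytes):
    """Decode (source port, destination port, UDP length) from the first 6 bytes."""
    return struct.unpack("!HHH", bytes(first_bytes[:6]))


print(multicast_mac("224.0.0.1"))
# 01-00-5E-00-00-01

# First six "Remaining data" bytes of Index 0 in the capture.
print(udp_header([0x04, 0x9C, 0x04, 0xD2, 0x05, 0x2C]))
# (1180, 1234, 1324)
```

The decoded UDP length of 1324 equals totLen 1344 minus the 20-byte IP header, so under that assumption every captured packet is a UDP datagram from port 1180 to port 1234. That kind of detail can help identify the offending test program on the host.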