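Today at work a colleague reported that one of our systems was logging errors. The operations staff were looking into it and said there was no visible impact so far, but the system kept generating new error entries. Because the system runs production applications and we were worried about a hidden risk, we still wanted the cause tracked down, so I logged in to investigate. The OS level and the error log look like this: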
# oslevel -s
6100-04-00-0000
# errpt
IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION
CB4A951F 0819092720 I S SRC SOFTWARE PROGRAM ERROR
CB4A951F 0819092620 I S SRC SOFTWARE PROGRAM ERROR
CB4A951F 0819092520 I S SRC SOFTWARE PROGRAM ERROR
CB4A951F 0819092420 I S SRC SOFTWARE PROGRAM ERROR
CB4A951F 0819092320 I S SRC SOFTWARE PROGRAM ERROR
CB4A951F 0819092220 I S SRC SOFTWARE PROGRAM ERROR
CB4A951F 0819092120 I S SRC SOFTWARE PROGRAM ERROR
CB4A951F 0819092020 I S SRC SOFTWARE PROGRAM ERROR
CB4A951F 0819091920 I S SRC SOFTWARE PROGRAM ERROR
CB4A951F 0819091820 I S SRC SOFTWARE PROGRAM ERROR
CB4A951F 0819091720 I S SRC SOFTWARE PROGRAM ERROR
...
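--Notes:
1) IDENTIFIER: the error identifier.
2) TIMESTAMP: when the error was logged, in the format mmddHHMMyy (month, day, hour, minute, year).
3) T: error type. P - Permanent; T - Temporary; I - Informational.
4) C: error class. H - Hardware; S - Software; O - Operator notice (initiated with the errlogger command); U - Undetermined.
5) RESOURCE_NAME: name of the resource that detected the error; this is not necessarily the component that is actually at fault.
6) DESCRIPTION: short description of the error, taken from the error template repository.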
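To make the TIMESTAMP column easier to read, here is a small sketch of a shell function that decodes the mmddHHMMyy format described in the notes. The function name errpt_ts is hypothetical (it is not an AIX command), and it assumes the two-digit year falls in 2000-2099.

```shell
# Hypothetical helper (not an AIX command): turn an errpt summary
# TIMESTAMP (mmddHHMMyy) into "20yy-mm-dd HH:MM".
# Assumes the two-digit year is in the 2000s.
errpt_ts() {
    ts=$1
    # Require exactly ten digits, otherwise report an error.
    case $ts in
        [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]) ;;
        *) echo "errpt_ts: bad timestamp '$ts'" >&2; return 1 ;;
    esac
    # Field positions: mm=1-2, dd=3-4, HH=5-6, MM=7-8, yy=9-10
    printf '%s\n' "$ts" | awk '{
        printf "20%s-%s-%s %s:%s\n", substr($0, 9, 2), substr($0, 1, 2),
            substr($0, 3, 2), substr($0, 5, 2), substr($0, 7, 2)
    }'
}
```

For example, errpt_ts 0819092720 prints 2020-08-19 09:27, the newest entry in the listing above.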
As you can see, the same error is logged once every minute. Next, look at the detailed information for this error.
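The once-a-minute pattern can also be checked mechanically rather than by eye. The sketch below, a hypothetical function errpt_gaps, reads errpt summary output on standard input and prints the gap in minutes between consecutive entries with a given IDENTIFIER. It assumes all entries fall within the same month, since it only combines day, hour and minute without calendar arithmetic.

```shell
# Hypothetical helper: read errpt summary output on stdin and print the
# gap in minutes between consecutive entries with IDENTIFIER $1.
# errpt lists the newest entry first, so gap = previous - current.
# Assumes all entries are in the same month (no calendar arithmetic).
errpt_gaps() {
    awk -v id="$1" '
        $1 == id && length($2) == 10 && $2 ~ /^[0-9]+$/ {
            # minutes = dd*1440 + HH*60 + MM
            t = substr($2, 3, 2) * 1440 + substr($2, 5, 2) * 60 + substr($2, 7, 2)
            if (have) print prev - t
            prev = t
            have = 1
        }'
}
```

On the affected host it could be run as errpt | errpt_gaps CB4A951F | sort -n | uniq -c; a result dominated by gaps of 1 matches the pattern seen above.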
# errpt -aj CB4A951F |more
---------------------------------------------------------------------------
LABEL: SRC_RSTRT
IDENTIFIER: CB4A951F
Date/Time: Wed Aug 19 10:58:40 BEIDT 2020
Sequence Number: 2215882
Machine Id: xxxxxxxxxxx
Node Id: localhost
Class: S
Type: INFO
WPAR: Global
Resource Name: SRC
Description
SOFTWARE PROGRAM ERROR
Probable Causes
APPLICATION PROGRAM
Failure Causes
SOFTWARE PROGRAM
Recommended Actions
VERIFY SUBSYSTEM RESTARTED AUTOMATICALLY
Detail Data
SYMPTOM CODE
19968
SOFTWARE ERROR CODE
-9035
ERROR CODE
0
DETECTING MODULE
'srchevn.c'@line:'234'
FAILING MODULE
sendmail
...
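This shows that the error is related to the system's sendmail. The label SRC_RSTRT, the failing module sendmail and the recommended action (VERIFY SUBSYSTEM RESTARTED AUTOMATICALLY) indicate that the System Resource Controller (SRC) keeps restarting the sendmail subsystem, which most likely produces one entry per minute. After checking with the colleagues concerned, we confirmed that the applications do not currently use sendmail, so I decided to stop it.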
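When errpt -a returns many entries, the label and failing module can be pulled out in one pass. This is a sketch that assumes the layout shown above, where the value of FAILING MODULE sits on the first non-blank line after the heading; the function name errpt_failing_modules is hypothetical.

```shell
# Hypothetical helper: read `errpt -a` detail output on stdin and print
# "LABEL FAILING_MODULE" for each entry that has a FAILING MODULE field.
# Assumes the module name is on the first non-blank line after the heading.
errpt_failing_modules() {
    awk '
        /^LABEL:/           { label = $2 }      # e.g. SRC_RSTRT
        /^FAILING MODULE/   { want = 1; next }  # value follows on a later line
        want && NF          { print label, $1; want = 0 }
    '
}
```

For example, errpt -aj CB4A951F | errpt_failing_modules | sort | uniq -c counts how many entries name each failing module.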
# date
Wed Aug 19 10:58:20 BEIDT 2020
# stopsrc -s sendmail
# date
Wed Aug 19 11:18:50 BEIDT 2020
# errpt
IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION
CB4A951F 0819105820 I S SRC SOFTWARE PROGRAM ERROR
CB4A951F 0819105720 I S SRC SOFTWARE PROGRAM ERROR
CB4A951F 0819105620 I S SRC SOFTWARE PROGRAM ERROR
CB4A951F 0819105520 I S SRC SOFTWARE PROGRAM ERROR
CB4A951F 0819105520 I S SRC SOFTWARE PROGRAM ERROR
CB4A951F 0819105420 I S SRC SOFTWARE PROGRAM ERROR
CB4A951F 0819105320 I S SRC SOFTWARE PROGRAM ERROR
CB4A951F 0819105220 I S SRC SOFTWARE PROGRAM ERROR
CB4A951F 0819105120 I S SRC SOFTWARE PROGRAM ERROR
CB4A951F 0819105020 I S SRC SOFTWARE PROGRAM ERROR
CB4A951F 0819104920 I S SRC SOFTWARE PROGRAM ERROR
CB4A951F 0819104820 I S SRC SOFTWARE PROGRAM ERROR
CB4A951F 0819104720 I S SRC SOFTWARE PROGRAM ERROR
...
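As you can see, the error has not been logged again since sendmail was stopped: the newest entry is from 10:58, the minute sendmail was stopped, and none has appeared by 11:18. The problem is resolved. To make sure sendmail does not start again after a system reboot, I made the following change.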
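To confirm that no new entries appeared after sendmail was stopped, the summary can also be filtered by time. The sketch below (hypothetical function errpt_since) prints entries with the given IDENTIFIER whose timestamp is strictly later than a cutoff in the same mmddHHMMyy format. It reorders each timestamp to yymmddHHMM before comparing, so the comparison also works across a year boundary.

```shell
# Hypothetical helper: read errpt summary output on stdin and print the
# entries with IDENTIFIER $1 that are strictly newer than cutoff $2
# (mmddHHMMyy). Timestamps are compared as yymmddHHMM strings.
errpt_since() {
    awk -v id="$1" -v cut="$2" '
        function key(ts) { return substr(ts, 9, 2) substr(ts, 1, 8) }
        $1 == id && key($2) > key(cut) { print }
    '
}
```

Here, errpt | errpt_since CB4A951F 0819105820 printing nothing would mean no entry was logged after the 10:58 minute in which sendmail was stopped.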
# vi /etc/rc.tcpip
--Comment out the following two lines, then save and exit.
# qpi=30m # 30 minute interval
# start /usr/lib/sendmail "$src_running" "-bd -q${qpi}"
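The same edit can be scripted instead of done in vi. This is a minimal sketch, assuming the two lines start at the beginning of the line exactly as shown above; the function name disable_sendmail_rc is hypothetical. It keeps a backup copy as rc.tcpip.bak and writes the result back into the original file, so the file's permissions are preserved. It avoids sed -i, which AIX sed may not support.

```shell
# Hypothetical helper: comment out the sendmail lines in rc.tcpip.
# $1 = file to edit (defaults to /etc/rc.tcpip).
# Assumes the lines begin in column 1 as "qpi=" and
# "start /usr/lib/sendmail ".
disable_sendmail_rc() {
    f=${1:-/etc/rc.tcpip}
    # Keep a backup with the original mode and timestamps.
    cp -p "$f" "$f.bak" || return 1
    # Rewrite into the existing file so its permissions stay unchanged;
    # restore the backup if sed fails.
    if ! sed -e 's/^qpi=/#qpi=/' \
             -e 's|^start /usr/lib/sendmail |#start /usr/lib/sendmail |' \
             "$f.bak" > "$f"; then
        cp -p "$f.bak" "$f"
        return 1
    fi
}
```

After running it, grep sendmail /etc/rc.tcpip should show both lines commented out.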
With that, the problem was resolved.