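I am not an Oracle DBA and I don't know Oracle. When the alert fired, the ops team refused to touch it and said it was the DBA's job. In their eyes MySQL, Oracle, Sybase, Redis and MongoDB all belong to "the DBA" and have nothing to do with them...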
1. Open nrpe.cfg, find the check_oracle_rman_backup_problems check, and run its command by hand:
cat /usr/local/nagios/etc/nrpe.cfg
![](https://s1.51cto.com/images/blog/201803/09/6ac77908871d3a4587a289d7f718f8a4.png?x-oss-process=image/watermark,size_16,text_QDUxQ1RP5Y2a5a6i,color_FFFFFF,t_100,g_se,x_10,y_10,shadow_90,type_ZmFuZ3poZW5naGVpdGk=)
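Reading through the file by eye works, but the lookup can also be scripted. Below is a minimal sketch, assuming the usual NRPE `command[<name>]=<command line>` syntax. The sample config text is made up for illustration (the plugin path and the `--connect` value are hypothetical, not copied from the real nrpe.cfg), and `find_nrpe_command` is my own helper name.

```python
import re

# NRPE command definitions look like: command[<name>]=<command line>
COMMAND_RE = re.compile(r"^\s*command\[(?P<name>[^\]]+)\]\s*=\s*(?P<cmd>.+?)\s*$")


def find_nrpe_command(cfg_text, name):
    """Return the command line defined for `name`, or None if it is absent."""
    for line in cfg_text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # skip commented-out definitions
        match = COMMAND_RE.match(line)
        if match and match.group("name") == name:
            return match.group("cmd")
    return None


# Hypothetical sample; the real file is /usr/local/nagios/etc/nrpe.cfg.
SAMPLE_CFG = """\
# example only
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_oracle_rman_backup_problems]=/usr/local/nagios/libexec/check_oracle_health --connect=SID --mode=rman-backup-problems
"""

if __name__ == "__main__":
    print(find_nrpe_command(SAMPLE_CFG, "check_oracle_rman_backup_problems"))
```

Against the real file you would pass `open("/usr/local/nagios/etc/nrpe.cfg").read()` instead of `SAMPLE_CFG`, then run the returned command line in a shell to reproduce the alert.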
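2. The check turns out to be done by the check_oracle_health script (written in Perl), so open it and see how it gets the value it monitors.
Searching for rman-backup-problems turns up its entry in the @mode array.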
It also leads to the code below. The SQL in it is what we were really looking for: it is the query behind the RMAN backup status check.
elsif ($params{mode} =~ /server::instance::rman::backup::problems/) {
  $self->{rman_backup_problems} = $self->{handle}->fetchrow_array(q{
      SELECT COUNT(*) FROM v$rman_status
      WHERE
        operation = 'BACKUP'
      AND
        status != 'COMPLETED'
      AND
        status != 'RUNNING'
      AND
        start_time > sysdate-3
  });
} elsif ($params{mode} =~ /server::instance::rman::backup::problems/) {
  $self->add_nagios(
      $self->check_thresholds($self->{rman_backup_problems}, 1, 2),
      sprintf "rman had %d problems during the last 3 days",
      $self->{rman_backup_problems});
  $self->add_perfdata(sprintf "rman_backup_problems=%d;%d;%d",
      $self->{rman_backup_problems},
      $self->{warningrange}, $self->{criticalrange});
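To make the logic of the two Perl branches easier to follow, here is a rough Python sketch of what they do: count the recent BACKUP rows that are neither COMPLETED nor RUNNING, then compare the count with the default thresholds 1 and 2. The rows are invented sample data, not output from a real v$rman_status. The threshold behaviour (a bare number N alerts only when the value is greater than N) is my assumption based on the usual Nagios plugin convention, not something confirmed from the plugin source shown here.

```python
from datetime import datetime, timedelta

OK, WARNING, CRITICAL = 0, 1, 2


def count_backup_problems(rows, now, days=3):
    """Count BACKUP operations started within the last `days` days that
    are neither COMPLETED nor RUNNING (the same filter as the SQL)."""
    cutoff = now - timedelta(days=days)
    return sum(
        1
        for row in rows
        if row["operation"] == "BACKUP"
        and row["status"] not in ("COMPLETED", "RUNNING")
        and row["start_time"] > cutoff
    )


def check_thresholds(value, warning=1, critical=2):
    """ASSUMPTION: a bare threshold N alerts when value > N."""
    if value > critical:
        return CRITICAL
    if value > warning:
        return WARNING
    return OK


# Invented sample rows standing in for v$rman_status.
NOW = datetime(2018, 3, 6, 9, 0, 0)
SAMPLE_ROWS = [
    {"operation": "BACKUP", "status": "FAILED", "start_time": NOW - timedelta(hours=8)},
    {"operation": "BACKUP", "status": "COMPLETED", "start_time": NOW - timedelta(days=1)},
    {"operation": "BACKUP", "status": "RUNNING", "start_time": NOW - timedelta(hours=1)},
    {"operation": "DELETE", "status": "FAILED", "start_time": NOW - timedelta(hours=8)},
    {"operation": "BACKUP", "status": "FAILED", "start_time": NOW - timedelta(days=5)},
    {"operation": "BACKUP", "status": "COMPLETED WITH WARNINGS", "start_time": NOW - timedelta(days=2)},
]

if __name__ == "__main__":
    problems = count_backup_problems(SAMPLE_ROWS, NOW)
    print("rman had %d problems during the last 3 days" % problems)
    print("state:", check_thresholds(problems))
```

One thing the sketch makes visible: because the SQL compares status with exact strings, a backup that ended in any state other than plain COMPLETED still counts as a problem, and a failed job keeps the alert alive until it falls out of the three-day window.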
So the alert comes from the RMAN backup. Next, run the SQL by hand and check the backup log, which shows the following error:
Deleting the following obsolete backups and copies:
Type Key Completion Time Filename/Handle


Control File Copy 69 2017-12-20 11:22:41 /data/ora11g/product/11.2.0/db_1/dbs/snapcf_oradb2.f
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of delete command on ORA_DISK_1 channel at 03/06/2018 01:15:28
ORA-19606: Cannot copy or restore to snapshot control file
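When the backup log is long, it helps to pull out only the error codes. The sketch below is one way to do that in Python, assuming each error line starts with an `RMAN-nnnnn:` or `ORA-nnnnn:` prefix as in the log above. It skips the two banner codes (RMAN-00571 and RMAN-00569) that only frame the error stack. The function name `extract_errors` is my own.

```python
import re

ERROR_RE = re.compile(r"^(?P<code>(?:RMAN|ORA)-\d{5}): (?P<message>.*)$")

# Codes that only draw the "ERROR MESSAGE STACK FOLLOWS" banner.
BANNER_CODES = {"RMAN-00569", "RMAN-00571"}


def extract_errors(log_text):
    """Return (code, message) pairs for the real errors in an RMAN log."""
    errors = []
    for line in log_text.splitlines():
        match = ERROR_RE.match(line.strip())
        if match and match.group("code") not in BANNER_CODES:
            errors.append((match.group("code"), match.group("message").strip()))
    return errors


# The error stack copied from the backup log above.
SAMPLE_LOG = """\
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of delete command on ORA_DISK_1 channel at 03/06/2018 01:15:28
ORA-19606: Cannot copy or restore to snapshot control file
"""

if __name__ == "__main__":
    for code, message in extract_errors(SAMPLE_LOG):
        print(code, "-", message)
```

For this log only RMAN-03009 and ORA-19606 are left, and ORA-19606 is the one worth searching for.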
Once the error is known, the fix is easy. After some searching online, it boils down to these RMAN commands:
CONFIGURE SNAPSHOT CONTROLFILE NAME TO '/data/ora11g/product/11.2.0/db_1/dbs/snapcf_oradb2.f_bak';
crosscheck controlfilecopy '/data/ora11g/product/11.2.0/db_1/dbs/snapcf_oradb2.f';
delete expired controlfilecopy '/data/ora11g/product/11.2.0/db_1/dbs/snapcf_oradb2.f';
CONFIGURE SNAPSHOT CONTROLFILE NAME TO '/data/ora11g/product/11.2.0/db_1/dbs/snapcf_oradb2.f';
CONFIGURE SNAPSHOT CONTROLFILE NAME clear;
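As I read it, the sequence above temporarily points the snapshot control file name at a different path so that the stale control file copy record can be crosschecked and deleted, then puts the name back. If the same fix is needed on another instance, the commands can be generated from the path. This is only a sketch: the function name and the `temp_suffix` parameter are my own, and the code just builds the text without invoking rman.

```python
def snapshot_cleanup_commands(snapshot_path, temp_suffix="_bak"):
    """Build the RMAN command sequence used above for a given
    snapshot control file path. Returns a list of command strings."""
    if "'" in snapshot_path:
        # The path is wrapped in single quotes, so a quote would break the command.
        raise ValueError("path must not contain a single quote")
    temp_path = snapshot_path + temp_suffix
    return [
        f"CONFIGURE SNAPSHOT CONTROLFILE NAME TO '{temp_path}';",
        f"crosscheck controlfilecopy '{snapshot_path}';",
        f"delete expired controlfilecopy '{snapshot_path}';",
        f"CONFIGURE SNAPSHOT CONTROLFILE NAME TO '{snapshot_path}';",
        "CONFIGURE SNAPSHOT CONTROLFILE NAME clear;",
    ]


if __name__ == "__main__":
    path = "/data/ora11g/product/11.2.0/db_1/dbs/snapcf_oradb2.f"
    print("\n".join(snapshot_cleanup_commands(path)))
```

Review the printed commands before pasting them into an RMAN session, since `delete expired` removes records from the RMAN repository.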
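Summary: to follow this you need to be able to read object-oriented Perl. Here `package xxx` plays the role of a class declaration, and the `new` function is the usual constructor. I don't think not knowing something is anything to be afraid of, because you can always go and learn it. I picked up a bit of Perl along the way, so it was worth it.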