DBA如何定制自动化巡检工具

一、逐步实现自动化的基本步骤

1. 手动巡检

最初阶段,多数DBA习惯于将常用的SQL语句总结记录下来,在日常巡检的时候,会在数据库中执行常用的SQL语句,然后以抓屏方式将结果复制出来,然后逐条检查分析。这种方法只适合于管理少量数据库,一旦数据库数量增加,工作量会非常大。

2. 脚本巡检

升级阶段,具备脚本语言开发功底的DBA会逐步将常用的SQL语句编制为固定SQL脚本,然后通过操作系统可执行的脚本语言(例如:KSH、BASH),通过Shell脚本去调用SQL脚本,并把执行结果写入日志文件,后期通过日志文件去进行检查分析。这种方法可管理较多数据库,但是需要频繁切换到不同的数据库服务器,容易误操作,尤其是操作系统平台不统一的场景。

3. 自动化巡检

最终阶段,具备Python、Expect等语言开发功底的DBA会逐步将常前面的成果物积淀下来,然后通过Python脚本的方式去自动调用各个系统的Shell脚本,从而批量完成所有数据库的巡检过程,并将所有数据库的巡检日志传输至集中位置。再通过文本过滤工具(例如:AWK)或者VBA对日志内容进行过滤筛选,最后对过滤后的内容进行最终分析。这种方法可以通过集中管理服务器对所有数据库服务器进行批量操作,并且消除了操作系统平台不一致带来的问题。

二、需要获取的数据库诊断信息

1. 基本信息

基本信息根据不同的数据库环境会有所差异,但是有些最基本的信息是必须的。如数据库名称、实例名称、唯一标识信息、系统版本、地址信息、数据库创建时间、数据库启动时间、数据库版本信息、数据库补丁信息、字符集、实例状态、数据库归档模式等。以下脚本(for Oracle)供参考:

SELECT name AS db_name,
       dbid,
       created,
       platform_name,
       db_unique_name,
       log_mode,
       guard_status,
       force_logging,
       flashback_on
  FROM v$database;
  
SELECT instance_name,
       inst_id,
       host_name,
       TO_CHAR(startup_time, 'YYYY/MM/DD HH24:MI:SS') AS startup_time,
       status,
       SYSDATE - STARTUP_TIME AS Running_Time
  FROM gv$instance
 ORDER BY inst_id;

2. 存储信息

存储信息包括逻辑存储信息和物理存储信息,具体包含数据库的表空间、数据文件、磁盘三个维度。表空间需要收集表空间名称,类型,区管理方式,区初始大小,段的管理方式,总大小,剩余大小,剩余百分比;数据文件需要收集对应的表空间,是否自动扩展,当前值,最大值等;磁盘信息需要收集磁盘组的序号、名称、冗余类型、总大小、剩余大小、使用频繁的数据大小、镜像需要的空间、可用于镜像的大小、offline的磁盘数量等。以下脚本(for Oracle)供参考:


SELECT c.tablespace_name,
       c.contents,
       c.extent_management ext_mgmt,
       c.allocation_type alloc_type,
       c.initial_extent / 1024 ext_kb,
       c.segment_space_management SSM,
       nvl(a.total_gb, 0) total_gb,
       decode(c.contents, 'TEMPORARY', nvl(a.total_gb, 0) - c.block_size * b.used_blocks / 1073741824, nvl(b.free_gb, 0)) free_gb,
       100 *  decode(c.contents, 'TEMPORARY', nvl(a.total_gb, 0) - c.block_size * b.used_blocks / 1073741824, nvl(b.free_gb, 0)) / nvl(a.total_gb, 1) free_pct
  FROM (SELECT tablespace_name, nvl(sum(bytes) / 1073741824, 0) total_gb
          FROM dba_data_files
         GROUP BY tablespace_name
         UNION all
        SELECT tablespace_name, nvl(sum(bytes) / 1073741824, 0) total_gb
          FROM dba_temp_files
         GROUP BY tablespace_name) a,
       (SELECT tablespace_name,
               nvl(sum(bytes) / 1073741824, 0) free_gb,
               0 used_blocks
          FROM dba_free_space
         GROUP BY tablespace_name
         UNION all
        SELECT tablespace_name,
               0 free_gb,
               nvl(sum(used_blocks), 0) used_blocks
          FROM gv$sort_segment
         GROUP BY tablespace_name) b,
       dba_tablespaces c
 WHERE c.tablespace_name = b.tablespace_name(+)
   AND c.tablespace_name = a.tablespace_name(+)
 ORDER BY c.contents, free_pct, c.tablespace_name;

 SELECT file_id,
       file_name,
       tablespace_name,
       autoextensible,
       bytes / 1073741824 as current_gb,
       maxbytes / 1073741824 as max_gb
  FROM dba_data_files
 UNION
SELECT file_id,
       file_name,
       tablespace_name as ts_name,
       autoextensible,
       bytes / 1073741824 as cur_gb,
       maxbytes / 1073741824 as max_gb
  FROM dba_temp_files
 ORDER BY tablespace_name, file_id, file_name;

 SELECT group_number,
       name,
       type,
       total_mb,
       free_mb,
       hot_used_mb,
       required_mirror_free_mb,
       usable_file_mb,
       offline_disks 
  FROM v$asm_diskgroup_stat;

3. 日志信息

日志信息主要是对联机日志、快速恢复日志相关动态信息的统计分析。具体包括联机日志的组号、线程号、序列号、文件路径、大小、状态、归档状态、组内成员数量;归档日志产生日期,产生的大小,和文件数等;每天日志切换的量以及切换的频率。以下脚本(for Oracle)供参考:

SELECT t2.member,
       t1.group#,
       t1.thread#,
       t1.sequence#,
       t1.bytes / 1024 / 1024 AS SIZE_MB,
       t1.status,
       t1.archived,
       t1.members
  FROM v$log t1, v$logfile t2
 WHERE t1.group# = t2.group#
 ORDER BY thread#, group#;

 SELECT lpad(to_char(first_time, 'yyyymmdd'), 12) "DATE",
       trunc(sum(blocks * block_size) / 1024 / 1024) "SIZE(MB)",
       count(*)
  FROM v$archived_log
 WHERE first_time > sysdate - 6
   AND creator = 'ARCH'
 GROUP BY lpad(to_char(first_time, 'yyyymmdd'), 12)
 ORDER BY 1;

 SELECT SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH:MI:SS'),1,5) Day,
       SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'00',1,0)) H00,
       SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'01',1,0)) H01, 
       SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'02',1,0)) H02,
       SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'03',1,0)) H03,
       SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'04',1,0)) H04,
       SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'05',1,0)) H05,
       SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'06',1,0)) H06,
       SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'07',1,0)) H07,
       SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'08',1,0)) H08,
       SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'09',1,0)) H09,
       SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'10',1,0)) H10,
       SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'11',1,0)) H11, 
       SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'12',1,0)) H12,
       SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'13',1,0)) H13, 
       SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'14',1,0)) H14,
       SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'15',1,0)) H15, 
       SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'16',1,0)) H16, 
       SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'17',1,0)) H17, 
       SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'18',1,0)) H18, 
       SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'19',1,0)) H19, 
       SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'20',1,0)) H20, 
       SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'21',1,0)) H21,
       SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'22',1,0)) H22 , 
       SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'23',1,0)) H23, 
       COUNT(*) TOTAL 
  FROM v$log_history  a  
 WHERE first_time>=to_char(sysdate-7)
 GROUP BY SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH:MI:SS'),1,5) 
 ORDER BY SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH:MI:SS'),1,5) DESC;

 SELECT thread#, 
       sequence#, 
       to_char(first_time, 'MM/DD/RR HH24:MI:SS')
  FROM v$log_history
 WHERE thread# = 1
   AND first_time > sysdate - 1
 ORDER BY first_time DESC;

SELECT thread#, 
       sequence#, 
       to_char(first_time, 'MM/DD/RR HH24:MI:SS')
  FROM v$log_history
 WHERE thread# = 2
   AND first_time > sysdate - 1
 ORDER BY first_time DESC;

4. 备份信息

备份信息主要是查看数据库备份任务的执行情况。主要包括备份任务的会话信息标识,备份内容,开始时间,结束时间,消耗时间。以下脚本(for Oracle)供参考:


SELECT session_key,
       start_time,
       end_time,
       status,
       time_taken_display tt
  FROM v$rman_backup_job_details
 WHERE start_time > sysdate -1
 ORDER BY session_key;

5. 性能信息

数据库性能诊断分析的基本目标分两个方面,一方面要看数据库本身跟性能有关的资源使用情况是否正常,另外一方面就是要抓取异常的会话、事件、SQL等。因此,这部分内容首先要收集数据库缓存使用情况,然后要收集异常会话以及相关进程的资源参数,24小时内CPU等待最长的事件,执行解析最多的SQL语句等。以下脚本(for Oracle)供参考:

SELECT free_space,
       avg_free_size,
       used_space,
       avg_used_size,
       request_failures,
       last_failure_size 
  FROM v$shared_pool_reserved;

  SELECT *
  FROM (SELECT t.sid,
               t.serial#,
               trunc(sysdate - logon_time) AS online_time,
               t.PROGRAM,
               t.status,
               t.LOGON_TIME,
               t.sql_id,
               t.prev_sql_id,
               t.event
          FROM gv$session t
         WHERE t.type <> 'BACKGROUND' AND program is not null
         ORDER BY logon_time)
 WHERE rownum <= 30;

 SELECT * 
  FROM gv$resource_limit  
 WHERE trim(limit_value) != 'UNLIMITED';

 SELECT t.*, s.sid, s.serial#, s.machine, s.program, s.osuser
  FROM (SELECT c.USERNAME, a.event, to_char(a.cnt) as seconds, a.sql_id, dbms_lob.substr(b.sql_fulltext, 100, 1) sqltext
          FROM (SELECT rownum rn, t.*
                  FROM (SELECT decode(s.session_state, 'WAITING', s.event, 'Cpu + Wait For Cpu') Event, s.sql_id, s.user_id, count(*) cnt
                          FROM v$active_session_history s
                         WHERE sample_time > sysdate - 1
                         GROUP BY s.user_id, decode(s.session_state, 'WAITING', s.event, 'Cpu + Wait For Cpu'), s.sql_id
                         ORDER BY cnt DESC) t
                 WHERE rownum < 20) a, v$sqlarea b, dba_users c
         WHERE a.sql_id = b.sql_id AND a.user_id = c.user_id
         ORDER BY cnt DESC) t, v$session s
 WHERE t.sql_id = s.sql_id(+);

 SELECT *
  FROM (SELECT sql_id, sql_text, s.executions, s.last_load_time, s.first_load_time, s.disk_reads, s.buffer_gets, s.parse_calls
          FROM v$sql s
      ORDER BY s.executions DESC)
 WHERE rownum <= 10;

 SELECT *
  FROM (SELECT sql_id, sql_text, s.EXECUTIONS, s.LAST_LOAD_TIME, s.FIRST_LOAD_TIME, s.DISK_READS, s.BUFFER_GETS, s.PARSE_CALLS
          FROM v$sql s
      ORDER BY s.PARSE_CALLS DESC)
 WHERE rownum <= 10;

6. 容灾信息

所谓容灾信息就是指数据库有容灾配置模式的场景,比如Oracle的Data Guard。如果有相关的配置,则需要检查主备库的同步是否异常,主要通过归档日志的同步信息来判断分析。这个时候需要获取归档目标名称、状态、数据库当前模式、目的地路径等静态信息,需要获取每个节点应用日志和归档日志的最大号,SCN最大值和最小值等相关信息。以下脚本(for Oracle)供参考:


SELECT dest_name,
       status,
       database_mode,
       destination  
  FROM v$archive_dest_status 
 WHERE dest_id in ('1','2');

 SELECT m.thread#, 
       m.sequence#, 
       first_change#, 
       next_change#
  FROM v$log_history m,
       (SELECT thread#, max(sequence#) as sequence#
          FROM v$log_history
      GROUP BY thread#) t
 WHERE m.thread# = t.thread#
   AND m.sequence# = t.sequence#;

   SELECT UNIQUE thread# AS thread, 
       MAX(sequence#) OVER (PARTITION BY thread#) AS last 
  FROM v$archived_log;

三、实现自动化脚本集成

1. 将基本SQL文转换成可用SQL脚本

通过手动执行SQL命令的方式,查询结果可以展现在屏幕上。但是以脚本后台模式执行的时候,就需要将查询结果格式化之后输入到结果文件当中。可以采用文本文件或者HTML文件。采用文件文件的时候,需要对输出结果进行美观易读方面的格式化,想再升级为HTML文件时,SQL脚本中要加HTML头。例如以下是格式化之前和之后的对比:

PROMPT 

XX公司数据库巡检报告

PROMPT

数据库基本信息 COLUMN log_mode FOR a12 COLUMN guard_status FOR a10 COLUMN force_logging FOR a15 COLUMN flashback_on FOR a15 COLUMN db_unique_name FOR a10 COLUMN platform_name FOR a20 SELECT name AS db_name, dbid, created, platform_name, db_unique_name, log_mode, guard_status, force_logging, flashback_on FROM v$database;

2. 通过Shell脚本调用SQL脚本

如果单纯通过Shell脚本完成对SQL脚本的调用还是有些单薄。因此Shell脚本的设计需要完成两方面的功能,一方面需要收集操作系统相关的诊断信息,另外一方面要定义SQL脚本执行的初始化参数,完成对SQL脚本的调用执行。对于操作系统级别的诊断信息收集,无非是在Shell当中调用操作系统命令,如:uptime、vmstat、free、df、sar、iostat;对于SQL脚本的调用,需要完善相关初始化信息,如位置信息(脚本位置、输入文件位置、数据库安装目录等相关信息),时间信息(执行日期、时间),连接必要信息(用户、数据库名称、标识等)。以下调用过程脚本片段供参考:


function _dbinfo(){
cd $PWDDIRECTORY
su - oracle -c "cd $LOGDIR/db;export ORACLE_SID=$1;sqlplus -s /nolog <

3. 通过Python脚本实现集中自动化调用

对于这个步骤实现的功能有两个重点,一个是完成集中化的优势,一个是完成自动化交互的功能。集中化的优势需要在循环语句当中完成。自动化交互的功能本来就是Python之类脚本语言的优势。只是我们在调用的时候需要考虑的初始化参数的定义和确认。以下调用过程脚本片段供参考:


def ssh_cmd(ip, user, passwd, cmd):
    ret = -1
    ssh = pexpect.spawn('ssh %s@%s "%s"' % (user,ip, cmd),timeout=120)
    try:
        i = ssh.expect(['password:', 'continue connecting (yes/no)?'], timeout=5)
        if i == 0 :
            ssh.sendline(passwd)
        elif i == 1:
            ssh.sendline('yes\n')
            ssh.expect('password: ')
            ssh.sendline(passwd)
        ssh.sendline(cmd)
        r = ssh.read()
    print (r)
        ret = r
    except pexpect.EOF:
        print ("EOF")
        ssh.close()
        ret = -1
    except pexpect.TIMEOUT:
    print ("TIMEOUT")
    ssh.close()
    ret = -2
    return ret

4. 工具化产品化打造过程

大部分数据库管理员设计脚本工具都是为了便于自己的日常管理工作,基本上不会考虑到工具的可维护性、健壮性、规范化以及未来的扩展性。如果想让自己辛苦设计出来的脚本最终可以转化为自动化运维工具,那么还要考虑到以下三个方面的问题:

(1). 遵循开发者规范,将脚本的设计趋向于标准化、模块化。例如脚本注释标准化,利用函数模块化设计,变量定义规范化。

(2). 每一个执行步骤要考虑到执行前的确认步骤,执行后的反馈步骤,执行中的异常处理场景。

(3). 所有参数输入尽量采用变量化设计,所有常量以文件方式隔离出脚本本身。

四、参考脚本

-- -----------------------------------------------------------------------------------
-- File Name    : dailyhealthycheck_withadg_pdb.sql
-- Author       : [email protected]
-- Description  : Daily Healthy Checking for Oracle Rac Database.
-- Requirements : Access to the V$ views.
-- Call Syntax  : by script "healthyCheck_forLinux.sh" or "healthyCheck1_forAIX.sh"
-- Last Modified: 23/05/2017
-- -----------------------------------------------------------------------------------
SET MARKUP HTML ON SPOOL ON ENTMAP OFF PREFORMAT OFF
SET TERM OFF
SET HEADING ON
SET VERIFY OFF
SET FEEDBACK OFF
SET LINE 55555
SET PAGES 999999
SET LONG 999999999
SET LONGCHUNKSIZE 999999

-- -----------------------------------------------------------------------------------
-- SECTION: 巡检脚本初始化
-- -----------------------------------------------------------------------------------
COLUMN dbid new_value spool_dbid
COLUMN inst_num new_value spool_inst_num
SELECT dbid 
  FROM v$database 
 WHERE rownum = 1;

SELECT instance_number AS inst_num 
  FROM v$instance 
 WHERE rownum = 1;

COLUMN spoolfile_name new_value spoolfile
SELECT 'spool_'||(SELECT instance_name FROM v$instance WHERE rownum=1)||'_'||TO_CHAR(SYSDATE,'yy-mm-dd_hh24.mi')||'_daily' AS spoolfile_name 
  FROM dual;
spool &&spoolfile..html


PROMPT 

XXX数据库日巡检报告

-- ----------------------------------------------------------------------------------- -- SECTION: 数据库基本信息 -- ----------------------------------------------------------------------------------- PROMPT

数据库基本信息汇总

/* 数据库标识、数据库名、创建日期、平台名称、唯一名称、归档模式等信息。 */ PROMPT

数据库状态 COLUMN log_mode FOR a12 COLUMN guard_status FOR a10 COLUMN force_logging FOR a15 COLUMN flashback_on FOR a15 COLUMN db_unique_name FOR a10 COLUMN platform_name FOR a20 SELECT name AS db_name, dbid, created, platform_name, db_unique_name, log_mode, guard_status, force_logging, flashback_on FROM v$database; /* 实例的序号、名称、主机名、启动时间、状态、运行时间等。 */ PROMPT

数据库实例基本信息 CLEAR COLUMNS SET LINE 200 COLUMN host_name FOR A50 SELECT instance_name, inst_id, host_name, TO_CHAR(startup_time, 'YYYY/MM/DD HH24:MI:SS') AS startup_time, status, SYSDATE - STARTUP_TIME AS Running_Time FROM gv$instance ORDER BY inst_id; -- ----------------------------------------------------------------------------------- -- SECTION: 联机重做日志信息 -- ----------------------------------------------------------------------------------- PROMPT

redo信息

/* 联机日志的组号、线程号、序列号、文件路径、大小、状态、归档状态、组内成员数量。 */ PROMPT

数据库联机日志 CLEAR COLUMNS SET LINE 200 SET PAGES 1000 COL status FOR a30 COL member FOR a45 SELECT t2.member, t1.group#, t1.thread#, t1.sequence#, t1.bytes / 1024 / 1024 AS SIZE_MB, t1.status, t1.archived, t1.members FROM v$log t1, v$logfile t2 WHERE t1.group# = t2.group# ORDER BY thread#, group#; /* (即可分析6天的波度,又可分析24小时内,可很容易看出异常情况)。 */ PROMPT

最近7天中每天日志切换的量 SELECT SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH:MI:SS'),1,5) Day, SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'00',1,0)) H00, SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'01',1,0)) H01, SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'02',1,0)) H02, SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'03',1,0)) H03, SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'04',1,0)) H04, SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'05',1,0)) H05, SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'06',1,0)) H06, SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'07',1,0)) H07, SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'08',1,0)) H08, SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'09',1,0)) H09, SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'10',1,0)) H10, SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'11',1,0)) H11, SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'12',1,0)) H12, SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'13',1,0)) H13, SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'14',1,0)) H14, SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'15',1,0)) H15, SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'16',1,0)) H16, SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'17',1,0)) H17, SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'18',1,0)) H18, SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'19',1,0)) H19, SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'20',1,0)) H20, SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'21',1,0)) H21, SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'22',1,0)) H22 , SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'23',1,0)) H23, COUNT(*) TOTAL FROM v$log_history a WHERE first_time>=to_char(sysdate-7) GROUP BY SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH:MI:SS'),1,5) ORDER BY SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH:MI:SS'),1,5) DESC; /* 日志切换频率分析(注意观察各行里first_time之间的时间差异会不会很短,很短就是切换过频繁)。 */ PROMPT

日志切换频率分析 SET LINE 200 SET PAGES 1000 SELECT thread#, sequence#, to_char(first_time, 'MM/DD/RR HH24:MI:SS') FROM v$log_history WHERE thread# = 1 AND first_time > sysdate - 1 ORDER BY first_time DESC; SELECT thread#, sequence#, to_char(first_time, 'MM/DD/RR HH24:MI:SS') FROM v$log_history WHERE thread# = 2 AND first_time > sysdate - 1 ORDER BY first_time DESC; -- ----------------------------------------------------------------------------------- -- SECTION: 归档日志信息 -- ----------------------------------------------------------------------------------- PROMPT

归档日志信息

/* 归档日志产生日期,产生的大小,和文件数等。 */ PROMPT

查询归档产生状况 SELECT lpad(to_char(first_time, 'yyyymmdd'), 12) "DATE", trunc(sum(blocks * block_size) / 1024 / 1024) "SIZE(MB)", count(*) FROM v$archived_log WHERE first_time > sysdate - 6 AND creator = 'ARCH' GROUP BY lpad(to_char(first_time, 'yyyymmdd'), 12) ORDER BY 1; -- ----------------------------------------------------------------------------------- -- SECTION: 空间使用 -- ----------------------------------------------------------------------------------- PROMPT

空间信息

/* 表空间名称,类型,区管理方式,区初始大小,段的管理方式,总大小,剩余大小,剩余百分比。 */ PROMPT

数据库表空间信息 CLEAR COLUMNS COLUMN tablespace_name FOR a20 COLUMN contents FOR a9 COLUMN ext_mgmt FOR a12 COLUMN alloc_type FOR a9 COLUMN ext_kb FOR 9999999 COLUMN ssm FOR a10 COLUMN total_gb FOR 99999999.99 COLUMN free_gb FOR 99999999.99 COLUMN free_pct FOR 999.99 SET PAGES 100 SELECT c.tablespace_name, c.contents, c.extent_management ext_mgmt, c.allocation_type alloc_type, c.initial_extent / 1024 ext_kb, c.segment_space_management SSM, nvl(a.total_gb, 0) total_gb, decode(c.contents, 'TEMPORARY', nvl(a.total_gb, 0) - c.block_size * b.used_blocks / 1073741824, nvl(b.free_gb, 0)) free_gb, 100 * decode(c.contents, 'TEMPORARY', nvl(a.total_gb, 0) - c.block_size * b.used_blocks / 1073741824, nvl(b.free_gb, 0)) / nvl(a.total_gb, 1) free_pct FROM (SELECT tablespace_name, nvl(sum(bytes) / 1073741824, 0) total_gb FROM dba_data_files GROUP BY tablespace_name UNION all SELECT tablespace_name, nvl(sum(bytes) / 1073741824, 0) total_gb FROM dba_temp_files GROUP BY tablespace_name) a, (SELECT tablespace_name, nvl(sum(bytes) / 1073741824, 0) free_gb, 0 used_blocks FROM dba_free_space GROUP BY tablespace_name UNION all SELECT tablespace_name, 0 free_gb, nvl(sum(used_blocks), 0) used_blocks FROM gv$sort_segment GROUP BY tablespace_name) b, dba_tablespaces c WHERE c.tablespace_name = b.tablespace_name(+) AND c.tablespace_name = a.tablespace_name(+) ORDER BY c.contents, free_pct, c.tablespace_name; PROMPT

数据库表空间与数据文件 /* 表空间文件对应的表空间,是否自动扩展,当前值,最大值。 */ SET LINE 200 SET PAGES 100 COLUMN file_name FOR A65 COLUMN tablespace_name FOR A30 SELECT file_id, file_name, tablespace_name, autoextensible, bytes / 1073741824 as current_gb, maxbytes / 1073741824 as max_gb FROM dba_data_files UNION SELECT file_id, file_name, tablespace_name as ts_name, autoextensible, bytes / 1073741824 as cur_gb, maxbytes / 1073741824 as max_gb FROM dba_temp_files ORDER BY tablespace_name, file_id, file_name; -- ----------------------------------------------------------------------------------- -- SECTION: 存储信息 -- ----------------------------------------------------------------------------------- PROMPT

ASM信息

/* 磁盘组的序号、名称、冗余类型、总大小、剩余大小、使用频繁的数据大小、镜像需要的空间、可用于镜像的大小、offline的磁盘数量。 */ PROMPT

查询ASM磁盘组信息 CLEAR COLUMNS COLUMN group_number FOR 9999999999 COLUMN name FOR A20 COLUMN type FOR A20 COLUMN total_mb FOR 9999999999 COLUMN free_mb FOR 9999999999 COLUMN hot_used_mb FOR 9999999999 COLUMN required_mirror_free_mb FOR 9999999999 COLUMN usable_file_mb FOR 9999999999 COLUMN offline_disks FOR 9999999999 SELECT group_number, name, type, total_mb, free_mb, hot_used_mb, required_mirror_free_mb, usable_file_mb, offline_disks FROM v$asm_diskgroup_stat; /* 磁盘组对应磁盘的信息。 */ PROMPT

查询ASM磁盘组存储信息 CLEAR COLUMNS COLUMN g_number FOR 9999 COLUMN d_number FOR 9999 COLUMN g_name FOR A8 COLUMN d_name FOR A15 COLUMN g_total_mb FOR 99999999 COLUMN d_total_mb FOR 99999999 COLUMN path FOR A30 COLUMN h_status FOR A8 COLUMN g_free_mb FOR 99999999 COLUMN d_free_mb FOR 99999999 SELECT g.group_number g_number, g.name g_name, g.total_mb g_total_mb, g.free_mb g_free_mb, g.state g_state, d.disk_number d_number, d.name d_name, d.path, d.total_mb d_total_mb, d.free_mb d_free_mb, d.header_status h_status, d.mode_status m_status FROM v$asm_diskgroup_stat g, v$asm_disk_stat d WHERE g.group_number = d.group_number ORDER BY g.group_number, d.disk_number; -- ----------------------------------------------------------------------------------- -- SECTION: 闪回区信息 -- ----------------------------------------------------------------------------------- PROMPT

闪回信息

/* 文件类型、空间使用百分比、可回收百分比、文件的数量。 */ PROMPT

闪回区空间使用状况 CLEAR COLUMNS SET LINE 100 SELECT * FROM v$flash_recovery_area_usage; -- ----------------------------------------------------------------------------------- -- SECTION: 备份信息 -- ----------------------------------------------------------------------------------- PROMPT

数据库备份信息

/* 会话信息标识,备份内容,开始时间,结束时间,消耗时间。 */ PROMPT

查询备份信息 CLEAR COLUMNS SET PAGES 1000 COLUMN time_taken_display FOR a10 COLUMN start_time FOR a20 COLUMN end_time FOR a20 COLUMN status FOR a15 COLUMN tt FOR a10 SELECT session_key, start_time, end_time, status, time_taken_display tt FROM v$rman_backup_job_details WHERE start_time > sysdate -1 ORDER BY session_key; -- ----------------------------------------------------------------------------------- -- SECTION: 会话信息 -- ----------------------------------------------------------------------------------- PROMPT

session情况

/* 查询登录异常的会话。 */ PROMPT

登录时间最长的会话 SET LINE 500 CLEAR COLUMNS COLUMN spid for a15 COLUMN sql_id for a20 COLUMN program for a40 COLUMN event for a35 SELECT * FROM (SELECT t.sid, t.serial#, trunc(sysdate - logon_time) AS online_time, t.PROGRAM, t.status, t.LOGON_TIME, t.sql_id, t.prev_sql_id, t.event FROM gv$session t WHERE t.type <> 'BACKGROUND' AND program is not null ORDER BY logon_time) WHERE rownum <= 30; PROMPT

查看进程资源限制 /* 资源所在的实例,资源的名称,当前大小,上次启动以来的最大值,初始大小 */ CLEAR COLUMNS SET PAGES 100 SET LINE 200 COLUMN resource_name FOR a30 COLUMN current_utilization FOR 9999999 COLUMN max_utilization FOR 9999999 COLUMN initial_allocation FOR a18 COLUMN limit_value FOR a15 SELECT * FROM gv$resource_limit WHERE trim(limit_value) != 'UNLIMITED'; -- ----------------------------------------------------------------------------------- -- SECTION: 等待事件 -- ----------------------------------------------------------------------------------- prompt 24小时内CPU等待最长的事件 CLEAR COLUMNS SET PAGES 100 SET LINE 800 COLUMN user_id for a45 COLUMN SQL_ID for a30 COLUMN Event for a45 COLUMN sql_id for a45 COLUMN user_id for a45 COLUMN cnt for a45 COLUMN machine for a45 COLUMN program for a45 SELECT t.*, s.sid, s.serial#, s.machine, s.program, s.osuser FROM (SELECT c.USERNAME, a.event, to_char(a.cnt) as seconds, a.sql_id, dbms_lob.substr(b.sql_fulltext, 100, 1) sqltext FROM (SELECT rownum rn, t.* FROM (SELECT decode(s.session_state, 'WAITING', s.event, 'Cpu + Wait For Cpu') Event, s.sql_id, s.user_id, count(*) cnt FROM v$active_session_history s WHERE sample_time > sysdate - 1 GROUP BY s.user_id, decode(s.session_state, 'WAITING', s.event, 'Cpu + Wait For Cpu'), s.sql_id ORDER BY cnt DESC) t WHERE rownum < 20) a, v$sqlarea b, dba_users c WHERE a.sql_id = b.sql_id AND a.user_id = c.user_id ORDER BY cnt DESC) t, v$session s WHERE t.sql_id = s.sql_id(+); -- ----------------------------------------------------------------------------------- -- SECTION: 内存信息 -- ----------------------------------------------------------------------------------- PROMPT

04031报错

PROMPT

04031报错(出错时,检查shared_pool_reserved池是否有报错) SELECT free_space, avg_free_size, used_space, avg_used_size, request_failures, last_failure_size FROM v$shared_pool_reserved; -- ----------------------------------------------------------------------------------- -- SECTION: SQL信息 -- ----------------------------------------------------------------------------------- PROMPT

当前SQL

PROMPT

执行次数最多 SET LINE 300 SELECT * FROM (SELECT sql_id, sql_text, s.executions, s.last_load_time, s.first_load_time, s.disk_reads, s.buffer_gets, s.parse_calls FROM v$sql s ORDER BY s.executions DESC) WHERE rownum <= 10; PROMPT

解析次数最多 SET LINE 300 SELECT * FROM (SELECT sql_id, sql_text, s.EXECUTIONS, s.LAST_LOAD_TIME, s.FIRST_LOAD_TIME, s.DISK_READS, s.BUFFER_GETS, s.PARSE_CALLS FROM v$sql s ORDER BY s.PARSE_CALLS DESC) WHERE rownum <= 10; -- ----------------------------------------------------------------------------------- -- SECTION: ADG -- ----------------------------------------------------------------------------------- PROMPT

ADG信息

/* 归档目标名称、状态、数据库当前模式、目的地路径 */ PROMPT

归档目的地状态 column dest_name FOR a20 column status FOR a10 column database_mode FOR a20 column destination FOR a20 SELECT dest_name, status, database_mode, destination FROM v$archive_dest_status WHERE dest_id in ('1','2'); /* 每个节点,应用日志的最大号,和scn最大值和最小值 */ PROMPT

查询最大的应用日志号 CLEAR COLUMNS SET LINE 150 SELECT m.thread#, m.sequence#, first_change#, next_change# FROM v$log_history m, (SELECT thread#, max(sequence#) as sequence# FROM v$log_history GROUP BY thread#) t WHERE m.thread# = t.thread# AND m.sequence# = t.sequence#; /* 查询每个节点最大的归档日志号 */ PROMPT

查询最大的归档日志号 SET LINE 100 SELECT UNIQUE thread# AS thread, MAX(sequence#) OVER (PARTITION BY thread#) AS last FROM v$archived_log; /* PROMPT

查询GAP 备库缺少日志的起始号码 clear columns set line 200 SELECT THREAD#,LOW_SEQUENCE#,HIGH_SEQUENCE# FROM V$ARCHIVE_GAP; */ PROMPT
Back to Top PROMPT

End of Report

PROMPT

SPOOL OFF exit; #!/bin/bash #----------------------------------------------------------------------------------------------------+ # [email protected] | #----------------------------------------------------------------------------------------------------| # COPYRIGHT (C) 1998-2015 HAIZDL ALL RIGHTS RESERVED. | #----------------------------------------------------------------------------------------------------| # DATABASE : ORACLE | # FILE : healthyCheck1_forLinux.sh | # CLASS : DATABASE ADMINISTRATION | # PURPOSE : COLLECTING LOGS FOR ORACLE DATABASE DAILY CHECKING. | # NOTE : AS WITH ANY CODE, ENSURE TO TEST THIS SCRIPT IN A DEVELOPMENT | # ENVIRONMENT BEFORE ATTEMPTING TO RUN IT IN PRODUCTION. | #----------------------------------------------------------------------------------------------------+ #---------------------------------------------- #CHECK JOB OPERATOR USER INFO. #---------------------------------------------- function _userVerify(){ USER=`whoami` if [ "$USER" != "root" ]; then echo "PLS USE ROOT USER FOR THIS JOB." echo "USAGE: $0" exit 1 fi } #---------------------------------------------- #BASIC VARIABLE. #---------------------------------------------- HOST=`hostname` PWDDIRECTORY=/tmp/ora DATESTRING=`date '+%Y%m%d'` LOGDIR=$PWDDIRECTORY/$DATESTRING$HOST ASMSID=+ASM1 #LOG DIRECTORY PARAMETER OINSTALLDIRECTORY=/home/u01/app RDBMSDIRECTORY=$OINSTALLDIRECTORY/oracle/diag/rdbms GRIDLOGDIRECTORY=$OINSTALLDIRECTORY/11.2.0.4/grid/log GRIDDIRECTORY=$OINSTALLDIRECTORY/grid ALERTOFCLUSTER=$GRIDLOGDIRECTORY/$HOST/alert$HOST.log ALERTOFLISTENER=$GRIDDIRECTORY/diag/tnslsnr/$HOST/listener/trace/listener.log ALERTOFLISTENERSCAN=$GRIDLOGDIRECTORY/diag/tnslsnr/$HOST/listener_scan1/trace/listener_scan1.log ALERTOFASM=$GRIDDIRECTORY/diag/asm/+asm/$ASMSID/trace/alert_$ASMSID.log OPATCHORACLEDIR=$OINSTALLDIRECTORY/oracle/product/11.2.0.4/dbhome_1/OPatch OPATCHGRIDDIR=$OINSTALLDIRECTORY/11.2.0.4/grid/OPatch #DIAGNOSTIC FILES LOGOFOSINFO=$LOGDIR/os/info_os_`date '+%y%m%d%H%M'` LOGOFCLUSTER=$LOGDIR/cluster/info_cluster_`date '+%y%m%d%H%M'` DBFILE=$LOGDIR/dbs.tmp #---------------------------------------------- #CREATE LOG INFO DIRECTORY. #---------------------------------------------- function _initialization(){ if [ -d $LOGDIR ]; then echo "INIT: LOG DIRECTORY EXISTS, DELETING." rm -rf $LOGDIR fi echo "INIT: LOG DIRECTORY CREATING." mkdir $LOGDIR mkdir $LOGDIR/os mkdir $LOGDIR/cluster mkdir $LOGDIR/db chmod -R 777 $LOGDIR if [ $? -ne 0 ]; then echo "INIT: LOG DIRECTORY CREATING ERROR!" exit 1 fi } #----------------------------------------------------------------------------------------------------| # 1. CLUSTER LOG. | #----------------------------------------------------------------------------------------------------| function _clusterlog(){ cd $PWDDIRECTORY if [ -e $ALERTOFCLUSTER ]; then ALERTOFCLUSTERNUM=$(($(awk '{print NR}' $ALERTOFCLUSTER |tail -n1) - $(cat -n $ALERTOFCLUSTER| grep "`date -d last-day +%Y-%m-%d|cut -c 1-10`" | awk 'BEGIN {min=99999999; max=-1;} {if(min>$1) min=$1; if(max<$1) max=$1;} END {printf("%d", min);}') +1)) if [ $ALERTOFCLUSTERNUM -gt 0 ]; then tail -$ALERTOFCLUSTERNUM $ALERTOFCLUSTER >>$LOGDIR/cluster/alert$HOST_`date '+%y%m%d'`.log if [ $? -ne 0 ]; then echo " JOB-1-1: CLUSTER ALERT LOG COPY ERROR!" exit 1 fi fi fi if [ -e $ALERTOFLISTENER ]; then ALERTOFLISTENERNUM=$(($(awk '{print NR}' $ALERTOFLISTENER |tail -n1) - $(cat -n $ALERTOFLISTENER| grep "`date -d last-day "+%d-%^b-%Y"`" | awk 'BEGIN {min=9999999; max=-1;} {if(min>$1) min=$1; if(max<$1) max=$1;} END {printf("%d", min);}') +1)) if [ $ALERTOFLISTENERNUM -gt 0 ]; then tail -$ALERTOFLISTENERNUM $ALERTOFLISTENER >>$LOGDIR/cluster/alert_listener_$HOST_`date '+%y%m%d'`.log if [ $? -ne 0 ]; then echo " JOB-1-2: CLUSTER ALERT LOG COPY ERROR!" exit 1 fi fi fi if [ -e $ALERTOFLISTENERSCAN ]; then ALERTOFLISTENERSCANNUM=$(($(awk '{print NR}' $ALERTOFLISTENERSCAN |tail -n1) - $(cat -n $ALERTOFLISTENERSCAN| grep "`date -d last-day "+%d-%^b-%Y"`" | awk 'BEGIN {min=9999999; max=-1;} {if(min>$1) min=$1; if(max<$1) max=$1;} END {printf("%d", min);}') +1)) if [ $ALERTOFLISTENERSCANNUM -gt 0 ]; then tail -$ALERTOFLISTENERSCANNUM $ALERTOFLISTENERSCAN >>$LOGDIR/cluster/alert_listener_scan1_`date '+%y%m%d'`.log if [ $? -ne 0 ]; then echo " JOB-1-3: CLUSTER ALERT LOG COPY ERROR!" exit 1 fi fi fi if [ -e $ALERTOFASM ]; then ALERTOFASMNUM=$(($(awk '{print NR}' $ALERTOFASM |tail -n1) - $(cat -n $ALERTOFASM| grep "`date -d last-day|cut -c 1-8``date -d last-day +%d`" | awk 'BEGIN {min=9999999; max=-1;} {if(min>$1) min=$1; if(max<$1) max=$1;} END {printf("%d", min);}') +1)) if [ $ALERTOFASMNUM -gt 0 ]; then tail -$ALERTOFASMNUM $ALERTOFASM >>$LOGDIR/cluster/alert_$ASMSID_`date '+%y%m%d'`.log if [ $? -ne 0 ]; then echo " JOB-1-4: CLUSTER ALERT LOG COPY ERROR!" exit 1 fi fi fi echo " " >>$LOGOFCLUSTER echo " " >>$LOGOFCLUSTER } #----------------------------------------------------------------------------------------------------| # 2. OS INFO. | #----------------------------------------------------------------------------------------------------| function _osinfo(){ echo "---------------------------------------------------------------" >>$LOGOFOSINFO echo "- UPTIME:" >>$LOGOFOSINFO echo "---------------------------------------------------------------" >>$LOGOFOSINFO uptime >>$LOGOFOSINFO echo " " >>$LOGOFOSINFO echo "---------------------------------------------------------------" >>$LOGOFOSINFO echo "- VMSTAT:" >>$LOGOFOSINFO echo "---------------------------------------------------------------" >>$LOGOFOSINFO vmstat 1 5 >>$LOGOFOSINFO echo " " >>$LOGOFOSINFO echo "---------------------------------------------------------------" >>$LOGOFOSINFO echo "- POWERMT:" >>$LOGOFOSINFO echo "---------------------------------------------------------------" >>$LOGOFOSINFO powermt display dev=all >>$LOGOFOSINFO echo " " >>$LOGOFOSINFO echo "---------------------------------------------------------------" >>$LOGOFOSINFO echo "- FREE:" >>$LOGOFOSINFO echo "---------------------------------------------------------------" >>$LOGOFOSINFO free -m >>$LOGOFOSINFO echo " " >>$LOGOFOSINFO echo "---------------------------------------------------------------" >>$LOGOFOSINFO echo "- DF:" >>$LOGOFOSINFO echo "---------------------------------------------------------------" >>$LOGOFOSINFO df -h >>$LOGOFOSINFO echo " " >>$LOGOFOSINFO echo "---------------------------------------------------------------" >>$LOGOFOSINFO echo "- SAR-PAGE:" >>$LOGOFOSINFO echo "---------------------------------------------------------------" >>$LOGOFOSINFO sar -B 1 5 >>$LOGOFOSINFO echo " " >>$LOGOFOSINFO echo "---------------------------------------------------------------" >>$LOGOFOSINFO echo "- SAR-CPU:" >>$LOGOFOSINFO echo "---------------------------------------------------------------" >>$LOGOFOSINFO sar -u 1 5 >>$LOGOFOSINFO echo " " >>$LOGOFOSINFO echo "---------------------------------------------------------------" >>$LOGOFOSINFO echo "- IOSTAT:" >>$LOGOFOSINFO echo "---------------------------------------------------------------" >>$LOGOFOSINFO iostat -mx 1 5 | grep power >>$LOGOFOSINFO echo " " >>$LOGOFOSINFO echo "---------------------------------------------------------------" >>$LOGOFOSINFO echo "- SAR-NETWORK:" >>$LOGOFOSINFO echo "---------------------------------------------------------------" >>$LOGOFOSINFO sar -n DEV 1 5 | grep bond >>$LOGOFOSINFO echo " " >>$LOGOFOSINFO if [ $? -ne 0 ]; then echo " JOB-2-1: OS INFO ERROR!" exit 1 fi LOGOFSYS=/var/log/messages if [ -e $LOGOFSYS ]; then cp $LOGOFSYS $LOGDIR/os if [ $? -ne 0 ]; then echo " JOB-2-2: OS INFO ERROR!" exit 1 fi fi } #----------------------------------------------------------------------------------------------------| # 3. CLUSTER INFO. | #----------------------------------------------------------------------------------------------------| function _clusterinfo(){ echo "---------------------------------------------------------------" >>$LOGOFCLUSTER echo "- 查看集群资源状态:" >>$LOGOFCLUSTER echo "---------------------------------------------------------------" >>$LOGOFCLUSTER echo " " >>$LOGOFCLUSTER su - grid -c "crsctl status res -t " >>$LOGOFCLUSTER echo " " >>$LOGOFCLUSTER echo "---------------------------------------------------------------" >>$LOGOFCLUSTER echo "- 查看集群服务状态:" >>$LOGOFCLUSTER echo "---------------------------------------------------------------" >>$LOGOFCLUSTER echo " " >>$LOGOFCLUSTER su - grid -c "crsctl check cluster -all" >>$LOGOFCLUSTER echo " " >>$LOGOFCLUSTER echo "---------------------------------------------------------------" >>$LOGOFCLUSTER echo "- 查看集群监听状态:" >>$LOGOFCLUSTER echo "---------------------------------------------------------------" >>$LOGOFCLUSTER echo " " >>$LOGOFCLUSTER echo "#1 本地监听 ">>$LOGOFCLUSTER su - grid -c "lsnrctl status" >>$LOGOFCLUSTER echo "#2 SCAN监听 " >>$LOGOFCLUSTER su - grid -c "lsnrctl status LISTENER_SCAN1" >>$LOGOFCLUSTER echo " " >>$LOGOFCLUSTER echo "---------------------------------------------------------------" >>$LOGOFCLUSTER echo "- 查看磁盘组信息及状态 ">>$LOGOFCLUSTER echo "---------------------------------------------------------------" >>$LOGOFCLUSTER echo "- #1 ">>$LOGOFCLUSTER su - grid -c "asmcmd lsdg" >>$LOGOFCLUSTER echo "---------------------------------------------------------------" >>$LOGOFCLUSTER echo "- 查看OCR信息 ">>$LOGOFCLUSTER echo "---------------------------------------------------------------" >>$LOGOFCLUSTER echo "- #1 心跳盘 ">>$LOGOFCLUSTER su - grid -c "crsctl query css votedisk" >>$LOGOFCLUSTER echo "- #2 一致性检查 " >>$LOGOFCLUSTER su - grid -c "ocrcheck" >>$LOGOFCLUSTER echo " " >>$LOGOFCLUSTER } #----------------------------------------------------------------------------------------------------| # 4. DB INFO. | #----------------------------------------------------------------------------------------------------| function _dbinfo(){ cd $PWDDIRECTORY su - oracle -c "cd $LOGDIR/db;export ORACLE_SID=$1;sqlplus -s /nolog <$1) min=$1; if(max<$1) max=$1;} END {printf("%d", min);}') +1)) if [ $ALERTOFDBNUM -gt 0 ]; then tail -$ALERTOFDBNUM $ALERTOFDB >>$LOGDIR/db/alert_$1_`date '+%y%m%d_%H:%M:%S'`.log if [ $? -ne 0 ]; then echo " JOB-5: DB LOG ERROR!" exit 1 fi fi fi } #----------------------------------------------------------------------------------------------------| # 6. RMAN INFO. | #----------------------------------------------------------------------------------------------------| function _rmaninfo(){ cd $PWDDIRECTORY LOGOFRMAN=$LOGDIR/rman_$1-`date '+%y%m%d_%H:%M:%S'`.log su - oracle -c "export ORACLE_SID=$1;rman target / <$LOGOFRMAN show all; list backup of database summary; exit; EOF" if [ $? -ne 0 ]; then echo "JOB-6: RMAN INFO ERROR!" exit 1 fi } #----------------------------------------------------------------------------------------------------| # 7. AWR. | #----------------------------------------------------------------------------------------------------| function _awrinfo(){ cd $PWDDIRECTORY su - oracle -c "cd $LOGDIR/db;mkdir ./$1AWR;cd $1AWR;export ORACLE_SID=$1;sqlplus -silent /nolog < $LOGDIR/html.tmp cat $LOGDIR/html.tmp | while read FNAME do sed -i 's/GBK/UTF8/g' $FNAME done rm -rf $LOGDIR/html.tmp rm -f *$HOST.tar.gz if [ -d $DATESTRING$HOST ]; then tar -czvf $DATESTRING-$HOST.tar.gz ./$DATESTRING$HOST rm -rf ./$DATESTRING$HOST else echo " JOB-8: TAR ERROR." fi } #----------------------------------------------------------------------------------------------------| # MAIN(). | #----------------------------------------------------------------------------------------------------| echo "=========================================================" echo "= JOB-NAME: $0" echo "=========================================================" _userVerify _initialization echo "----------------------------------------------------" echo "- JOB-1: CLUSTER LOG." echo "----------------------------------------------------" _clusterlog echo "----------------------------------------------------" echo "- JOB-2: OS INFO." echo "----------------------------------------------------" _osinfo echo "----------------------------------------------------" echo "- JOB-3: CLUSTER INFO." echo "----------------------------------------------------" _clusterinfo echo "----------------------------------------------------" echo "- JOB-(4-7): DB JOBs." echo "----------------------------------------------------" ps -ef | grep ora_pmon | grep -v grep | awk '{print $8}' | awk -F "_" '{print $3}' >$DBFILE if [ -e $DBFILE ]; then cat $DBFILE | while read DBSID do echo "- # JOB-4: DB INFO." _dbinfo $DBSID echo "- # JOB-5: DB LOG." _dblog $DBSID done else echo " JOB-4-7: NO DBSID FILES ERROR." fi rm -rf $DBFILE echo "----------------------------------------------------" echo "- JOB-8: LOG TAR." echo "----------------------------------------------------" _logtar

#!/bin/bash#----------------------------------------------------------------------------------------------------+#                      [email protected]                                                                |#----------------------------------------------------------------------------------------------------|#      COPYRIGHT (C) 1998-2015 HAIZDL ALL RIGHTS RESERVED.                                         |#----------------------------------------------------------------------------------------------------|# DATABASE : ORACLE                                                                                  |# FILE     : healthyCheck1_forLinux.sh                                                                          |# CLASS    : DATABASE ADMINISTRATION                                                                 |# PURPOSE  : COLLECTING LOGS FOR ORACLE DATABASE DAILY CHECKING.                                     |# NOTE     : AS WITH ANY CODE, ENSURE TO TEST THIS SCRIPT IN A DEVELOPMENT                           |#            ENVIRONMENT BEFORE ATTEMPTING TO RUN IT IN PRODUCTION.                                  |#----------------------------------------------------------------------------------------------------+#----------------------------------------------#CHECK JOB OPERATOR USER INFO.#----------------------------------------------function _userVerify(){    USER=`whoami`    if [ "$USER" != "root" ]; then        echo "PLS USE ROOT USER FOR THIS JOB."        echo "USAGE: $0"        exit 1    fi}#----------------------------------------------#BASIC VARIABLE.#----------------------------------------------HOST=`hostname`PWDDIRECTORY=/tmp/oraDATESTRING=`date '+%Y%m%d'`LOGDIR=$PWDDIRECTORY/$DATESTRING$HOSTASMSID=+ASM1#LOG DIRECTORY PARAMETEROINSTALLDIRECTORY=/home/u01/appRDBMSDIRECTORY=$OINSTALLDIRECTORY/oracle/diag/rdbmsGRIDLOGDIRECTORY=$OINSTALLDIRECTORY/11.2.0.4/grid/logGRIDDIRECTORY=$OINSTALLDIRECTORY/gridALERTOFCLUSTER=$GRIDLOGDIRECTORY/$HOST/alert$HOST.logALERTOFLISTENER=$GRIDDIRECTORY/diag/tnslsnr/$HOST/listener/trace/listener.logALERTOFLISTENERSCAN=$GRIDLOGDIRECTORY/diag/tnslsnr/$HOST/listener_scan1/trace/listener_scan1.logALERTOFASM=$GRIDDIRECTORY/diag/asm/+asm/$ASMSID/trace/alert_$ASMSID.logOPATCHORACLEDIR=$OINSTALLDIRECTORY/oracle/product/11.2.0.4/dbhome_1/OPatchOPATCHGRIDDIR=$OINSTALLDIRECTORY/11.2.0.4/grid/OPatch#DIAGNOSTIC FILESLOGOFOSINFO=$LOGDIR/os/info_os_`date '+%y%m%d%H%M'`LOGOFCLUSTER=$LOGDIR/cluster/info_cluster_`date '+%y%m%d%H%M'`DBFILE=$LOGDIR/dbs.tmp#----------------------------------------------#CREATE LOG INFO DIRECTORY.#----------------------------------------------function _initialization(){    if [ -d $LOGDIR ]; then            echo "INIT: LOG DIRECTORY EXISTS, DELETING."            rm -rf $LOGDIR    fi    echo "INIT: LOG DIRECTORY CREATING."    mkdir $LOGDIR    mkdir $LOGDIR/os    mkdir $LOGDIR/cluster    mkdir $LOGDIR/db    chmod -R 777 $LOGDIR    if [ $? -ne 0 ]; then            echo "INIT: LOG DIRECTORY CREATING ERROR!"            exit 1    fi}#----------------------------------------------------------------------------------------------------|#      1. CLUSTER LOG.                                                                               |#----------------------------------------------------------------------------------------------------|function _clusterlog(){    cd $PWDDIRECTORY    if [ -e $ALERTOFCLUSTER ]; then            ALERTOFCLUSTERNUM=$(($(awk '{print NR}' $ALERTOFCLUSTER |tail -n1) - $(cat -n $ALERTOFCLUSTER| grep "`date -d last-day +%Y-%m-%d|cut -c 1-10`" | awk 'BEGIN {min=99999999; max=-1;} {if(min>$1) min=$1; if(max<$1) max=$1;} END {printf("%d", min);}') +1))            if [ $ALERTOFCLUSTERNUM -gt 0 ]; then                tail -$ALERTOFCLUSTERNUM $ALERTOFCLUSTER >>$LOGDIR/cluster/alert$HOST_`date '+%y%m%d'`.log                if [ $? -ne 0 ]; then                    echo "   JOB-1-1: CLUSTER ALERT LOG COPY ERROR!"                    exit 1                fi            fi    fi    if [ -e $ALERTOFLISTENER ]; then            ALERTOFLISTENERNUM=$(($(awk '{print NR}' $ALERTOFLISTENER |tail -n1) - $(cat -n $ALERTOFLISTENER| grep "`date -d last-day "+%d-%^b-%Y"`" | awk 'BEGIN {min=9999999; max=-1;} {if(min>$1) min=$1; if(max<$1) max=$1;} END {printf("%d", min);}') +1))            if [ $ALERTOFLISTENERNUM -gt 0 ]; then                tail -$ALERTOFLISTENERNUM $ALERTOFLISTENER >>$LOGDIR/cluster/alert_listener_$HOST_`date '+%y%m%d'`.log                if [ $? -ne 0 ]; then                    echo "   JOB-1-2: CLUSTER ALERT LOG COPY ERROR!"                    exit 1                fi            fi    fi    if [ -e $ALERTOFLISTENERSCAN ]; then            ALERTOFLISTENERSCANNUM=$(($(awk '{print NR}' $ALERTOFLISTENERSCAN |tail -n1) - $(cat -n $ALERTOFLISTENERSCAN| grep "`date -d last-day "+%d-%^b-%Y"`" | awk 'BEGIN {min=9999999; max=-1;} {if(min>$1) min=$1; if(max<$1) max=$1;} END {printf("%d", min);}') +1))            if [ $ALERTOFLISTENERSCANNUM -gt 0 ]; then                tail -$ALERTOFLISTENERSCANNUM $ALERTOFLISTENERSCAN >>$LOGDIR/cluster/alert_listener_scan1_`date '+%y%m%d'`.log                if [ $? -ne 0 ]; then                    echo "   JOB-1-3: CLUSTER ALERT LOG COPY ERROR!"                    exit 1                fi            fi    fi    if [ -e $ALERTOFASM ]; then            ALERTOFASMNUM=$(($(awk '{print NR}' $ALERTOFASM |tail -n1) - $(cat -n $ALERTOFASM| grep "`date -d last-day|cut -c 1-8``date -d last-day +%d`" | awk 'BEGIN {min=9999999; max=-1;} {if(min>$1) min=$1; if(max<$1) max=$1;} END {printf("%d", min);}') +1))            if [ $ALERTOFASMNUM -gt 0 ]; then                tail -$ALERTOFASMNUM $ALERTOFASM >>$LOGDIR/cluster/alert_$ASMSID_`date '+%y%m%d'`.log                if [ $? -ne 0 ]; then                    echo "   JOB-1-4: CLUSTER ALERT LOG COPY ERROR!"                    exit 1                fi            fi    fi    echo "                                                               " >>$LOGOFCLUSTER    echo "                                                               " >>$LOGOFCLUSTER}#----------------------------------------------------------------------------------------------------|#      2. OS INFO.                                                                                   |#----------------------------------------------------------------------------------------------------|function _osinfo(){    echo "---------------------------------------------------------------" >>$LOGOFOSINFO    echo "- UPTIME:"                                                       >>$LOGOFOSINFO    echo "---------------------------------------------------------------" >>$LOGOFOSINFO    uptime                                                                 >>$LOGOFOSINFO    echo "                                                               " >>$LOGOFOSINFO    echo "---------------------------------------------------------------" >>$LOGOFOSINFO    echo "- VMSTAT:"                                                       >>$LOGOFOSINFO    echo "---------------------------------------------------------------" >>$LOGOFOSINFO    vmstat 1 5                                                             >>$LOGOFOSINFO    echo "                                                               " >>$LOGOFOSINFO     echo "---------------------------------------------------------------" >>$LOGOFOSINFO    echo "- POWERMT:"                                                      >>$LOGOFOSINFO    echo "---------------------------------------------------------------" >>$LOGOFOSINFO    powermt display dev=all                                                >>$LOGOFOSINFO    echo "                                                               " >>$LOGOFOSINFO     echo "---------------------------------------------------------------" >>$LOGOFOSINFO    echo "- FREE:"                                                         >>$LOGOFOSINFO    echo "---------------------------------------------------------------" >>$LOGOFOSINFO    free -m                                                                >>$LOGOFOSINFO    echo "                                                               " >>$LOGOFOSINFO     echo "---------------------------------------------------------------" >>$LOGOFOSINFO    echo "- DF:"                                                           >>$LOGOFOSINFO    echo "---------------------------------------------------------------" >>$LOGOFOSINFO    df -h                                                                  >>$LOGOFOSINFO    echo "                                                               " >>$LOGOFOSINFO    echo "---------------------------------------------------------------" >>$LOGOFOSINFO    echo "- SAR-PAGE:"                                                     >>$LOGOFOSINFO    echo "---------------------------------------------------------------" >>$LOGOFOSINFO    sar -B 1 5                                                             >>$LOGOFOSINFO    echo "                                                               " >>$LOGOFOSINFO     echo "---------------------------------------------------------------" >>$LOGOFOSINFO    echo "- SAR-CPU:"                                                      >>$LOGOFOSINFO    echo "---------------------------------------------------------------" >>$LOGOFOSINFO    sar -u 1 5                                                             >>$LOGOFOSINFO    echo "                                                               " >>$LOGOFOSINFO     echo "---------------------------------------------------------------" >>$LOGOFOSINFO    echo "- IOSTAT:"                                                       >>$LOGOFOSINFO    echo "---------------------------------------------------------------" >>$LOGOFOSINFO    iostat -mx 1 5 | grep power                                            >>$LOGOFOSINFO    echo "                                                               " >>$LOGOFOSINFO     echo "---------------------------------------------------------------" >>$LOGOFOSINFO    echo "- SAR-NETWORK:"                                                  >>$LOGOFOSINFO    echo "---------------------------------------------------------------" >>$LOGOFOSINFO    sar -n DEV 1 5 | grep bond                                             >>$LOGOFOSINFO    echo "                                                               " >>$LOGOFOSINFO     if [ $? -ne 0 ]; then        echo "   JOB-2-1: OS INFO ERROR!"        exit 1    fi    LOGOFSYS=/var/log/messages    if [ -e $LOGOFSYS ]; then            cp $LOGOFSYS $LOGDIR/os            if [ $? -ne 0 ]; then                echo "   JOB-2-2: OS INFO ERROR!"                exit 1            fi    fi}#----------------------------------------------------------------------------------------------------|#      3. CLUSTER INFO.                                                                              |#----------------------------------------------------------------------------------------------------|function _clusterinfo(){    echo "---------------------------------------------------------------" >>$LOGOFCLUSTER    echo "- 查看集群资源状态:"                                             >>$LOGOFCLUSTER    echo "---------------------------------------------------------------" >>$LOGOFCLUSTER    echo "                                                               " >>$LOGOFCLUSTER    su - grid -c "crsctl status res -t "                                   >>$LOGOFCLUSTER    echo "                                                               " >>$LOGOFCLUSTER    echo "---------------------------------------------------------------" >>$LOGOFCLUSTER    echo "- 查看集群服务状态:"                                             >>$LOGOFCLUSTER    echo "---------------------------------------------------------------" >>$LOGOFCLUSTER    echo "                                                               " >>$LOGOFCLUSTER    su - grid -c "crsctl check cluster -all"                               >>$LOGOFCLUSTER    echo "                                                               " >>$LOGOFCLUSTER    echo "---------------------------------------------------------------" >>$LOGOFCLUSTER    echo "- 查看集群监听状态:"                                             >>$LOGOFCLUSTER    echo "---------------------------------------------------------------" >>$LOGOFCLUSTER    echo "                                                               " >>$LOGOFCLUSTER    echo "#1 本地监听                                                     ">>$LOGOFCLUSTER    su - grid -c "lsnrctl status"                                          >>$LOGOFCLUSTER    echo "#2 SCAN监听                                                    " >>$LOGOFCLUSTER    su - grid -c "lsnrctl status LISTENER_SCAN1"                           >>$LOGOFCLUSTER    echo "                                                               " >>$LOGOFCLUSTER    echo "---------------------------------------------------------------" >>$LOGOFCLUSTER    echo "- 查看磁盘组信息及状态                                          ">>$LOGOFCLUSTER    echo "---------------------------------------------------------------" >>$LOGOFCLUSTER    echo "- #1                                                            ">>$LOGOFCLUSTER    su - grid -c "asmcmd lsdg"                                             >>$LOGOFCLUSTER    echo "---------------------------------------------------------------" >>$LOGOFCLUSTER    echo "- 查看OCR信息                                                   ">>$LOGOFCLUSTER    echo "---------------------------------------------------------------" >>$LOGOFCLUSTER    echo "- #1 心跳盘                                                     ">>$LOGOFCLUSTER    su - grid -c "crsctl query css votedisk"                               >>$LOGOFCLUSTER    echo "- #2 一致性检查                                                " >>$LOGOFCLUSTER    su - grid -c "ocrcheck"                                                >>$LOGOFCLUSTER    echo "                                                               " >>$LOGOFCLUSTER}#----------------------------------------------------------------------------------------------------|#      4. DB INFO.                                                                                   |#----------------------------------------------------------------------------------------------------|function _dbinfo(){cd $PWDDIRECTORYsu - oracle -c "cd $LOGDIR/db;export ORACLE_SID=$1;sqlplus -s /nolog <$1) min=$1; if(max<$1) max=$1;} END {printf("%d", min);}') +1))        if [ $ALERTOFDBNUM -gt 0 ]; then            tail -$ALERTOFDBNUM $ALERTOFDB >>$LOGDIR/db/alert_$1_`date '+%y%m%d_%H:%M:%S'`.log            if [ $? -ne 0 ]; then                echo "   JOB-5: DB LOG ERROR!"                exit 1            fi        fi    fi}#----------------------------------------------------------------------------------------------------|#      6. RMAN INFO.                                                                                 |#----------------------------------------------------------------------------------------------------|function _rmaninfo(){cd $PWDDIRECTORYLOGOFRMAN=$LOGDIR/rman_$1-`date '+%y%m%d_%H:%M:%S'`.logsu - oracle -c "export ORACLE_SID=$1;rman target / <$LOGOFRMANshow all;list backup of database summary;exit;EOF"    if [ $? -ne 0 ]; then        echo "JOB-6: RMAN INFO ERROR!"        exit 1    fi}#----------------------------------------------------------------------------------------------------|#      7. AWR.                                                                                       |#----------------------------------------------------------------------------------------------------|function _awrinfo(){cd $PWDDIRECTORYsu - oracle -c "cd $LOGDIR/db;mkdir ./$1AWR;cd $1AWR;export ORACLE_SID=$1;sqlplus -silent /nolog < $LOGDIR/html.tmp cat $LOGDIR/html.tmp | while read FNAMEdo    sed -i 's/GBK/UTF8/g' $FNAMEdonerm -rf $LOGDIR/html.tmprm -f *$HOST.tar.gz     if [ -d $DATESTRING$HOST ]; then        tar -czvf $DATESTRING-$HOST.tar.gz ./$DATESTRING$HOST        rm -rf ./$DATESTRING$HOST    else        echo "   JOB-8: TAR ERROR."    fi}#----------------------------------------------------------------------------------------------------|#      MAIN().                                                                                       |#----------------------------------------------------------------------------------------------------|echo "========================================================="echo "= JOB-NAME: $0"echo "========================================================="_userVerify_initializationecho "----------------------------------------------------"echo "- JOB-1: CLUSTER LOG."echo "----------------------------------------------------"_clusterlogecho "----------------------------------------------------"echo "- JOB-2: OS INFO."echo "----------------------------------------------------"_osinfoecho "----------------------------------------------------"echo "- JOB-3: CLUSTER INFO."echo "----------------------------------------------------"_clusterinfoecho "----------------------------------------------------"echo "- JOB-(4-7): DB JOBs."echo "----------------------------------------------------"ps -ef | grep ora_pmon | grep -v grep | awk '{print $8}' | awk -F "_" '{print $3}' >$DBFILEif [ -e $DBFILE ]; then    cat $DBFILE | while read DBSID       do            echo "- # JOB-4: DB INFO."            _dbinfo $DBSID            echo "- # JOB-5: DB LOG."            _dblog $DBSID       doneelse    echo "   JOB-4-7: NO DBSID FILES ERROR."firm -rf $DBFILEecho "----------------------------------------------------"echo "- JOB-8: LOG TAR."echo "----------------------------------------------------"_logtar

你可能感兴趣的:(oracle,dba,自动化,数据库)