通过操作系统命令toptopas glance等查看top进程号,确认是系统进程还是oracle应用进程,查询当前top进程执行的操作和sql语句进行分析。
根据进程号获取正在执行的sql
SELECT a.osuser, a.username,b.address,b.hash_value, b.sql_text fromv$session a, v$sqltext b, v$process p
where p.spid = &spid
and p.addr = a.paddr
and a.STATUS = 'ACTIVE'
and a.sql_address =b.address
order by address, piece;
数据库无法连接,一般可能是如下原因造成:
(1)数据库宕了
(2)监听异常
(3)数据库挂起
(4)归档目录满
(5)数据库或应用主机的网卡出现问题不能正常工作
(6)应用主机到数据库主机的网络出现问题。
1、数据库宕了
立即启动数据库。
2、监听异常
此时一般体现为:
监听进程占用CPU资源大;
监听日志异常。
此时,立即重启监听,监听重启一般能在1分钟之内完成。
3、数据库挂起
立即重启数据库。
4、归档目录满
(1)在没有部署OGG数据同步的情况下,立即清理归档日志文件。
(2)如果部署了OGG数据同步,查看OGG正在读取的归档日志文件,立即
清理OGG不再需要的日志文件。
5、数据库或应用主机的网卡出现问题不能正常工作。
立即联系主机工程师处理。
6、应用主机到数据库主机的网络出现问题。
立即联系网络维护人员查看。
1、进入/tmp目录下,看是否产生了crsctl.xxxxx文件
如果有的话,看文件内容,一般会提示OCR无法访问,或者心跳IP无法
正常绑定等信息。
2、如果/tmp目录下没有crsctl.xxxxx文件
此时查看ocssd.log文件,看是否能从中得到有价值的信息。
可能的问题:网络心跳不通。
3、/tmp目录无crsctl.xxxxx且日志中没有报错信息,只有停CRS时的日志信
息。
此时可能是RAC两个节点对并发裸设备的访问有问题,此时考虑:
(1)停掉两个节点的CRS。
(2)两个节点先同时去激活并发VG,然后再激活VG。
(3)重新启动CRS。
分析$GRID_HOME/log/nodename目录下的日志文件,看是否能从中找出无法启动的原因。
常见问题:
1、心跳IP不同。
2、ASM实例无法启动。
对CRS的故障诊断和分析,参加本文档中RAC部分的MOS文档.
应急处理步骤:
(1)找到占用CPU资源大的sql或者模块,然后停掉此应用模块。
(2)如果属于由于种种原因引起的数据库hang住情况,立即重启数据库,此时重启需要约15分钟时间。
重要说明:
如果重启数据库的话,会有如下负面影响:
(1)要kill掉所有连接到数据库中的会话,所有会话都会回滚。
(2)立即重启的话,不能获取并保留分析数据库挂起原因的信息,在后续分析问题时,没有足够信息用于分析问题产生的根本原因。
一般正常重启的话,都需要手动获取用于分析数据库重启原因的信息,以便编写分析报告,但是在最长情况下,获取日志信息可能就要40分钟时间。此时一般做systemstate dump,且如果是rac情况的话,需要2个节点都做,且需要做2次或以上。
常规处理步骤,分如下几种情况处理:
(1)所有业务模块都慢。
(2)部分业务模块慢。
(3)数据库hang住。
此时首先查看系统资源,看是否属于CPU资源使用率100%的问题,如果是,参考本章“CPU使用率高的问题”解决办法。如果系统资源正常,那很可能是数据库hang住了,此时参考数据库Hang部分。
分析运行慢的模块的sql语句:
(1)看是否是新上的sql。
(2)看执行计划是否高效。
(3)优化运行慢的模块的sql语句。
应急处理方式:重启数据库。
常规处理方式:
(1)分析alert日志,看是否能从alert日志中,可以很快找到引起问题的原
因。
(2)做3级别的hanganalyze,先做一次,然后隔一分钟以后再做一次。
并分析hanganalyze生成的trace文件,看是否可以找到引起数据库hang
住的会话的信息。
(3)做systemstate dump
此时生成systemstatedump的时间会比较长,尤其是在会话数量较多的情
况下。且生成dump文件的大小较大,在G级别以上。在生成一次以
后,过一分钟再收集一次,另外如果是RAC,那么两个节点都需要收
集。
对hang做dump请参考“对数据库HANG做DUMP一章”。
此问题,没有应急办法,只能按如下步骤处理:
1、对于10g及以上版本,看是否可以通过闪回进行恢复。
2、查看测试环境数据库,看其中是否有需要的数据。
3、使用备份进行恢复,此方法一般花费时间较长。
1. 停止监听
2. 做一个检查点操作
SQL> alter system checkpoint;
3. 杀掉所有LOCAL=NO的操作系统进程
AIX、HP-UX、Linux、Solaris:
$ ps -ef|grep $ORACLE_SID| grep LOCAL=NO | grep -v grep |awk '{print$2}'|xargs -i kill -9 {}
Windows:
SQL> select 'orakill ' ||
(select value from v$parameter where name = 'instance_name') || ' '||p.spid
from v$process p, v$bgprocess bp
where p.ADDR = bp.PADDR(+)
and bp.PADDR is null
and p.SPID is not null;
在命令行执行:
C:\> orakill db1 7642
C:\> orakill db1 7644
4. 停止数据库
SQL> shutdown immediate
-- 9i需要设置_sum_debug_mode
SQL> alter session set "_smu_debug_mode" = 4;
alter session set nls_date_format='YYYY-MM-DD HH24:MI:SS';
column local_trna_id format a20
column global_tran_id format a25
SELECT LOCAL_TRAN_ID, GLOBAL_TRAN_ID, FAIL_TIME,STATE, MIXED
FROM DBA_2PC_PENDING;
LOCAL_TRAN_ID GLOBAL_TRAN_ID FAIL_TIME STATE MIX
-------------- ------------------------- -------------------- ---------------- ---
12.29.103137 TAXIS.9572b613.12.29.103137 30-aug-2011 10:09:11 collecting no
SQL> commit force '12.29.103137';
Commit complete.
SQL> EXECUTE DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('12.29.103137');
PL/SQL procedure successfully completed.
SQL> commit; --清理每个分布式事务都需要commit;
1. 相关参数
PARALLEL参数考虑
可以设置成物理CPU(不是逻辑CPU)数的两倍数目,然后调整
对于DataPump Export,PARALLEL参数必须要小于等于dump files数
对于DataPump Import,PARALLEL不要比dump文件数大很多,可以大一些。这个参数也指定了导入时创建索引的并行度。
PARALLEL只允许在企业版使用。
nohup expdp system/manager schemas=kdjm DIRECTORY=DUMP_FILES PARALLEL=3dumpfile=expCASES_%U.dmp logfile=nnsiexp2008_12_28.log &
通配符 %U,它指示文件将按需要创建,格式将为expCASES_nn.dmp,其中nn 从 01 开始,然后按需要向上增加
-- 监控长事务
set linesize 120
column opname heading 'Operation' format a25
column target heading 'Target' format a15
column pct heading 'Percent' format 999
column es heading 'Elapsed|Seconds' format 999999
column tr heading 'Time|Remaining|Seconds' format 99999
column program format a30
column machine format a16
select L.sid ssid,
substr(opname,1,25) opname,
target,
trunc((sofar/totalwork)*100)pct,
to_char(60*sofar*8192/(24*60*(last_update_time-start_time))/1024/1024/60,
'9999.0') Rate,
round(elapsed_seconds/60, 2)es,
round(time_remaining/60, 2)tr,
program,
machine
from v$session_longops L, v$sessions
where time_remaining > 0 and l.sid = s.sid
order by start_time;
在遇到坏块的时,一般应按以下的流程来处理:
1 如果坏块的对象是索引,重建索引
2 使用备份来进行恢复
3 使用10231事件,或者DBMS_REPAIR.SKIP_CORRUPT_BLOCKS过程,让oracle跳过坏块,然后用exp导出表和使用CREATE TABLE AS创建新表。
4 尝试使用SQL脚本将完好的数据复制到一个新表中,或者用EXP配合QUERY参数导出完好的数据。
5 手工修改坏块。
有两种情况是不能使用事件10231和DBMS_REPAIR.SKIP_CORRUPT_BLOCKS来跳过坏块的:
1 硬件问题造成OS层不能读取数据。
2 表中的非数据块,或者说是元数据块。比如段头,ExtentMap块。这种坏块是不能跳过的。
3 在表中存在有其他异常的块,从单个块来看都没有损坏,checksum值也是正确的,但是有的块在段内却是有问题的。比如在段的高水位下存在未格式化的块,查询这样的表时,会报ORA-8103错误;如果块的object id与段在数据字典里的data object id不相符,则会报ORA-1401错误。
Oracle数据文件的坏块,可分为物理坏块和逻辑坏块。物理坏块(也称为介质坏块),指的是块格式本身是坏的,块内的数据没有任何意义。而逻辑坏块,指的是块内的数据在逻辑上存在问题。比如说索引块的索引值没有按从小到大排列。物理坏块一般是由于内存问题、OS问题、IO子系统问题和硬件问题,逻辑坏块是由于ORACLEBUG等原因引起。
对数据库中的坏块进行验证。
RMAN> backup validate database;
恢复一个数据文件上的多个坏块
RMAN> blockrecover datafile 14 block 56,107,276,517;
检验后我们查V$DATABASE_BLOCK_CORRUPTION
SQL> select * from v$database_block_corruption;
FILE# BLOCK# BLOCKS CORRUPTION_CHANGE# CORRUPTIO
---------- ---------- ---------- ------------------ ---------
14 276 1 0 CHECKSUM
14 517 1 0 CHECKSUM
14 107 1 0 CHECKSUM
14 56 1 0 CHECKSUM
还可以通过blockrecovercorruption list进行块的恢复,这是在大量块损坏时或全部块损坏时使用,
前提是先执行backupvalidate database,在V$DATABASE_BLOCK_CORRUPTION里有对应的坏块的列表。
RMAN> blockrecover corruption list;
declare
v_hhint sys.sqlprof_attr;
cl_sql_text clob;
begin
select SQL_TEXT
into cl_sql_text
from v$sql
where sql_id = '1304vvhkfctzq'
and rownum < 2;
v_hhint :=sys.sqlprof_attr('BEGIN_OUTLINE_DATA',
'IGNORE_OPTIM_EMBEDDED_HINTS',
'OPTIMIZER_FEATURES_ENABLE(''10.2.0.4'')',
'ALL_ROWS',
'OUTLINE_LEAF(@"SEL$1")',
'INDEX_RS_ASC(@"SEL$1" "T2"@"SEL$1"("T2"."STATUS"))',
'END_OUTLINE_DATA');
dbms_sqltune.import_sql_profile(cl_sql_text,
v_hhint,
'SQLPROFILE_1304vvhkfctzq',
force_match => true,
replace => true);
end;
/
SELECT name, created,category,sql_Text from dba_sql_profiles ORDER BY created DESC;
BEGIN
DBMS_SQLTUNE.DROP_SQL_PROFILE(name => 'PROFILE_NAME');
END;
/
获取OUTLINEHINT的信息
SELECT sql_attr.attr_val outline_hints
FROM dba_sql_profiles sql_profiles, sys.SQLPROF$ATTR sql_attr
WHERE sql_profiles.signature = sql_attr.signature
AND sql_profiles.name = 'SQLPROFILE_1304vvhkfctzq'
ORDER BY sql_attr.attr# ASC;
In order to execute Sql Tuning Adviser API, one must begranted with “ADVISER” role.
grant adviser to
– for a specific statement from AWR
SET SERVEROUTPUT ON
-- Tuning task created for specific a statement from the AWR.
DECLARE
l_sql_tune_task_id VARCHAR2(100);
BEGIN
l_sql_tune_task_id := DBMS_SQLTUNE.create_tuning_task (
begin_snap => 764,
end_snap => 938,
sql_id => '19v5guvsgcd1v',
scope => DBMS_SQLTUNE.scope_comprehensive,
time_limit => 60,
task_name => '19v5guvsgcd1v_AWR_tuning_task',
description=> 'Tuning task for statement 19v5guvsgcd1v in AWR.');
DBMS_OUTPUT.put_line('l_sql_tune_task_id: ' || l_sql_tune_task_id);
END;
– or for a specific statement from Shared LibraryCache
DECLARE
l_sql_tune_task_id VARCHAR2(100);
BEGIN
l_sql_tune_task_id := DBMS_SQLTUNE.create_tuning_task (
sql_id => 'fhahkc71k304u',
scope => DBMS_SQLTUNE.scope_comprehensive,
time_limit => 60,
task_name => 'fhahkc71k304u_tuning_task',
description=> 'Tuning task for statement fhahkc71k304u.');
DBMS_OUTPUT.put_line('l_sql_tune_task_id:' || l_sql_tune_task_id);
END;
/
-- Interrupt and resume a tuning task.
EXEC DBMS_SQLTUNE.interrupt_tuning_task (task_name =>'emp_dept_tuning_task');
EXEC DBMS_SQLTUNE.resume_tuning_task (task_name =>'emp_dept_tuning_task');
-- Cancel a tuning task.
EXEC DBMS_SQLTUNE.cancel_tuning_task (task_name =>'emp_dept_tuning_task');
-- Reset a tuning task allowing it to bere-executed.
EXEC DBMS_SQLTUNE.reset_tuning_task (task_name => 'emp_dept_tuning_task');
execute the tuning task
EXEC DBMS_SQLTUNE.execute_tuning_task(task_name =>'fhahkc71k304u_AWR_tuning_task');
report tuning task findings
SET LONG 100000;
SET PAGESIZE 1000
SET LINESIZE 300
SELECT DBMS_SQLTUNE.report_tuning_task('fhahkc71k304u_AWR_tuning_task') ASrecommendations FROM dual;
SET PAGESIZE 24
accept recommendations
execute dbms_sqltune.accept_sql_profile(task_name=>'fhahkc71k304u_AWR_tuning_task', replace => TRUE);
useful views
DBA_ADVISOR_TASKS
DBA_ADVISOR_FINDINGS
DBA_ADVISOR_RECOMMENDATIONS
DBA_ADVISOR_RATIONALE
DBA_SQLTUNE_STATISTICS
DBA_SQLTUNE_BINDS
DBA_SQLTUNE_PLANS
DBA_SQLSET
DBA_SQLSET_BINDS
DBA_SQLSET_STATEMENTS
DBA_SQLSET_REFERENCES
DBA_SQL_PROFILES
V$SQL
V$SQLAREA
V$ACTIVE_SESSION_HISTORY
exec dbms_stats.gather_table_stats(
ownname => 'SYS',
tabname => 'T2',
estimate_percent => 100,
cascade => true,
method_opt => 'for all columns size 1' );
select * from table(dbms_xplan.display);
select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));
select * from table(dbms_xplan.display_awr('a7tgurqg403wp', null, null,'ALL'));
显示执行计划的outline
select * from table(dbms_xplan.display_cursor('sql_id', null, 'outline'));
select *
from
table(
dbms_xplan.display(
null,
null,
'typicalalias -rows -bytes -cost'
)
)
;
declare
srec dbms_stats.statrec;
m_distcnt number;
m_density number;
m_nullcnt number;
m_avgclen number;
n_array dbms_stats.numarray;
begin
dbms_stats.get_column_stats(
ownname => user,
tabname => 't1',
colname => 'n1',
distcnt => m_distcnt,
density => m_density,
nullcnt => m_nullcnt,
srec => srec,
avgclen => m_avgclen
);
srec.bkvals := null;
srec.novals := dbms_stats.numarray(
utl_raw.cast_to_number(srec.minval),
utl_raw.cast_to_number(srec.maxval)
);
srec.epc := 2;
dbms_stats.prepare_column_values(srec, srec.novals);
m_density := 1/m_distcnt;
dbms_stats.set_column_stats(
ownname => user,
tabname => 't1',
colname => 'n1',
distcnt => m_distcnt,
density => m_density,
nullcnt => m_nullcnt,
srec => srec,
avgclen => m_avgclen
);
exception
when others then
raise; -- should handle div/0
end;
/
一般使用altertable table_name move tablespace,然后rebuild索引,因为move会导致rowid改变,从而原来的索引失效。
查看哪些表在空间上存在问题,可以查看dba_tables视图的avg_space列,这列显示的是数据库块的平均空闲空间大小,我们在建表的时候可以通过制定PCTFREE参数来设置每个数据块中保留空间的百分比,这部分空间用于因更新块内的行而导致的增长,默认值是10%,也就是819 bytes左右(块大小为8KB)。如果avg_space大于2KB,可能重整空间会有意义,这个值太大说明空间利用率较低,块空闲较多。我们调整空间的主要目的是降低高水位线(HIGHWATER MARK),使扫描的表块数变小,从而提高效率。
alter table table_name enable row movement;
alter table table_name shrink space cascade;
alter table table_name move disable row movement;
lsnrctl set log_status off
lsnrctl set log_status on
$ cat /etc/oraInst.loc
inventory_loc=/oracle/oraInventory
inst_group=oinstall
$ cd $ORACLE_HOME/oui/bin
$ ./runInstaller -silent -attachHome ORACLE_HOME=$ORACLE_HOMEORACLE_HOME_NAME="OraDb11g_home1"
或者(/oracle/oraInventory目录可以mv掉,新建个目录)
export ORACLE_HOME=/oracle/product/crs/10.2.0
cd $ORACLE_HOME/oui/bin
./attachHome.sh
删除基于DBLINK的物化视图的时候要确认DBLINK的存在,否则会造成物化视图注册信息无法清除。
删除物化视图并不意味着要删除物化视图日志。因为物化视图日志可以同时支持多个物化视图的刷新,如果物化视图日志被删除,那么所有基于这个物化视图日志的物化视图无法再进行快速刷新。
在ONCOMMIT刷新模式下, 如果基表的DML很频繁, 会造成刷新很频繁, 这可能会造成DROP物化视图的语句一直挂起。
1.使数据文件离线
alter database datafile 'old_path' offline
2.移动数据文件
RMAN> copy datafile 'old_path' to 'new_path';
3 重命名数据文件
SQL>alter tablespace tbs_name rename datafile ' old_path ' to 'new_path';
4 恢复数据文件
RMAN> recover datafile 'new_path';
5 使数据文件在线
alter database datafile 'new_path' online;
修改(需重启监听)
$ORACLE_HOME/network/admin/sqlnet.ora :
tcp.validnode_checking=yes
tcp.invited_nodes=(localhost, 本机ip, 应用服务器ip,管理机ip等)
重启监听:lsnrctlstop;lsnrctl start。
When do SQLNET.ORA changes take effect ? (Doc ID 562589.1)
How to Centralize TNSNAMES.ORA, LISTENER.ORA And SQLNET.ORA files (Doc ID362761.1)
Configure Multiple DB Instances To Share SQLNET.ORA Without GivingWALLET_LOCATION For EUS (Doc ID 405682.1)
Parameter OUTBOUND_CONNECT_TIMEOUT in SQLNET.ORA (Doc ID 519391.1)
主要针对ORA-00600 2662
MOS 30681.1文章如下
Doc ID: Note:30681.1
Subject: EVENT: ADJUST_SCN - QuickReference
Type: REFERENCE
Status: PUBLISHED
Content Type: TEXT/PLAIN
Creation Date: 20-OCT-1997
Last Revision Date: 04-AUG-2000
Language: USAENG
ADJUST_SCN Event
~~~~~~~~~~~~~~~~
*** WARNING ***
This event should only ever beused under the guidance
of an experienced Oracle analyst.
If an SCN is ahead of the currentdatabase SCN, this indicates
some form of database corruption.The database should be rebuilt
after bumping the SCN.
****************
The ADJUST_SCN event is useful insome recovery situations where the
current SCN needs to beincremented by a large value to ensure it
is ahead of the highest SCN inthe database. This is typically
required if either:
a. An ORA-600 [2662] error issignalled against database blocks
or
b. ORA-1555 errors keepoccuring after forcing the database open
or ORA-604 / ORA-1555 errorsoccur during database open.
(Note: If startup reportsORA-704 & ORA-1555 errors together
then the ADJUST_SCNevent cannot be used to bump the
SCN as the error isoccuring during bootstrap.
Repeatedstartup/shutdown attempts may help if the SCN
mismatch is small)
or
c. If a database has beenforced open used _ALLOW_RESETLOGS_CORRUPTION
(See
The ADJUST_SCN event acts asdescribed below.
**NOTE: You can check that theADJUST_SCN event has fired as it
should write a message to the alert log inthe form
"Debugging event used to advance scn to%s".
If this message is NOT present in the alertlog the event
has probably not fired.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If the database will NOT open:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Take a backup.
You can use event 10015 to trigger anADJUST_SCN on database open:
startup mount;
alter session set events '10015trace name adjust_scn level 1';
(NB: You can only useIMMEDIATE here on an OPEN database. If the
database is only mounted use the 10015trigger to adjust SCN,
otherwise you get ORA 600 [2251], [65535],[4294967295] )
alter database open;
If you get an ORA 600:2256shutdown, use a higher level and reopen.
Do *NOT* set this event ininit.ora or the instance will crash as soon
as SMON or PMON try to do anyclean up. Always use it with the
"alter session"command.
~~~~~~~~~~~~~~~~~~~~~~~~~~
If the database *IS* OPEN:
~~~~~~~~~~~~~~~~~~~~~~~~~~
You can increase the SCN thus:
alter session set events'IMMEDIATE trace name ADJUST_SCN level 1';
LEVEL: Level 1 is usually sufficient - it raises theSCN to 1 billion
(1024*1024*1024)
Level 2 raises it to 2 billion etc...
If you try to raise the SCN to a level LESSTHAN or EQUAL to its
current setting you will get
Ie: The event steps the SCN to knownlevels. You cannot use
the same level twice.
Calculating a Level from 600errors:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To get a LEVEL for ADJUST_SCN:
a) Determine the TARGET scn:
ora-600 [2662] See
ora-600 [2256] See
b)Multiply the TARGET wrap number by 4. This will give you the level
to use in the adjust_scn to get the correctwrap number.
c) Next, add the followingvalue to the level to get the desired base
value as well :
Add to Level Base
~~~~~~~~~~~~ ~~~~~~~~~~~~
0 0
1 1073741824
2 2147483648
3 3221225472
在Oracle 10g之前,一般采用使用这个event的方式来增大SCN.
ORA-00600: internal error code, arguments: [2662], [0], [547743994], [0],[898092653], [8388617], [], []
这个报错参数的含义在metalink中如此描述的:
Arg [a] Current SCN WRAP
Arg [b] Current SCN BASE
Arg [c] dependent SCN WRAP
为了存储更大的SCN值,当SCN BASE到足够大并开始重置的时候,SCN WRAP将加1。
Arg [d] dependent SCN BASE
Arg [e] Where present this is the DBA where the dependent SCN came from.
也就是Arg [d]的值是从哪个block中找到的,通常是一个data block address。
通过这几个参数根据一定的规则可以计算出我们需要的level。计算规则如下:
1. Arg [c] *4得出一个数值,假设为V_Wrap
2. 如果Arg[d]=0,则V_Wrap值为需要的level
Arg [d] < 1073741824,V_Wrap+1为需要的level
Arg [d] < 2147483648,V_Wrap+2为需要的level
Arg [d] < 3221225472,V_Wrap+3为需要的level
计算方法的例子
ORA-00600: internal error code, arguments: [2662], [0], [2179133], [8656],[70114056], [33855201], [], []
Arg [C] *4 = 8656 * 4 = 34624
Arg [d] = 70114056 < 1073741824
所以level =34624 + 1 = 34625
因此在这个例子中我们应该执行
alter session set events 'IMMEDIATE trace name ADJUST_SCN level 34625';
但是随着oraclescn的patch更新后,11g后续的版本有些已经不能使用这种方式推进scn了.
11.2.0.4,12.0.1.0默认屏蔽;
11.2.0.2.5屏蔽
这个参数可以将SCN提高1G的倍数,即1024*1024*1024.
ALTER system SET "_minimum_giga_scn" = 1 scope = spfile;
或者在pfile里加入
但是随着oraclescn的patch更新后,11g后续的版本有些已经不能使用这种方式推进scn了.
11.2.0.4,12.0.1.0默认屏蔽;
dump函数联合ue修改
http://blog.mchz.com.cn/?p=9009
http://www.askmaclean.com/archives/advanced-diagnostic-using-oradebug-dumpvar.html
http://www.dbform.com/html/2006/208.html
Master Note: Overview for SCN issues (Doc ID 1503937.1)
http://www.killdb.com/2013/01/21/about-controlfile-structure.html
CREATE TABLE year_interval_partition_table
(id number,time_col date) partition by range(time_col)
INTERVAL (NUMTOYMINTERVAL(1,'year')
(PARTITION part1 VALUES LESS THAN (TO_DATE ('2013-06-01', 'YYYY-MM-DD')),
PARTITION part2 VALUES LESS THAN(TO_DATE ('2014-06-01', 'YYYY-MM-DD'))
)
create table month_interval_partition_table (id number,time_col date)partition by range(time_col)
interval (numtoyminterval(1,'month'))
(
partition p_month_1 values less than(to_date('2014-01-01','yyyy-mm-dd'))
);
begin
for i in 0..11 loop
insert intoMONTH_INTERVAL_PARTITION_TABLEvalues(i,add_months(to_date('2014-01-01','yyyy-mm-dd'),i));
end loop;
commit;
end;
/
create table day_interval_partition_table (id number,time_col date)partition by range(time_col)
interval (numtodsinterval(1,'day'))
(
partition p_day_1 values less than(to_date('2014-01-01','yyyy-mm-dd'))
);
begin
for i in 1..12 loop
insert intoDAY_INTERVAL_PARTITION_TABLE values(i,trunc(to_date('2014-01-01','yyyy-mm-dd')+i));
end loop;
commit;
end;
/
删除前一天的分区
declare
min_par varchar2(30);
max_par varchar2(30);
str_sql varchar2(500);
begin
select min(partition_name) into min_par from user_tab_partitions wheretable_name='T_PARTITION' and partition_name!='P_DAY_1';
select max(partition_name) into max_par from user_tab_partitions wheretable_name='T_PARTITION' and partition_name!='P_DAY_1';
if min_par <> max_par then
str_sql:='alter table T_partition drop partition '||min_par;
execute immediate str_sql;
end if;
end;
/
select table_name,partition_name from user_tab_partitions wheretable_name=’TABLE_NAME’;
创建
BEGIN
DBMS_SCHEDULER.CREATE_JOB (
job_name => 'INSERT_STAT_TBL',
job_type => 'STORED_PROCEDURE',
job_action => 'INSERT_TABLESPACE_STAT',
start_date => sysdate,
repeat_interval => 'FREQ=DAILY;INTERVAL=1');
END;
/
执行
exec dbms_scheduler.run_job('INSERT_STAT_TBL');
在有些情况下需要去trace某些用户的行为,这时候就需要一个登陆触发器
例如下面的脚本
CREATE OR REPLACE TRIGGER SYS.set_trace
AFTER LOGON ON DATABASE
WHEN (USER like '&USERNAME')
DECLARE
lcommand varchar(200);
BEGIN
EXECUTE IMMEDIATE 'alter sessionset statistics_level=ALL';
EXECUTE IMMEDIATE 'alter sessionset max_dump_file_size=UNLIMITED';
EXECUTE IMMEDIATE 'alter sessionset events ''10046 trace name context forever, level 12''';
END set_trace;
/
需要授权给用户
grant alter session to
create tablespace perfstat datafile 'datafile_path' size 500M extentmanagement local;
执行创建脚本,创建过程中会提示输入default_tablespace和temporary_tablespace的内容,若有错误可以查看相应生成的.lis文件.
SQL> @spcreate
若需要重建,可以运行spdrop.sql脚本来删除这些对象.然后重新运行spcreate.sql
SQL> @spdrop.sql
SQL> @spcreate
level 0: 一般性能统计.包括等待事件、系统事件、系统统计、回滚段统计、行缓存、SGA、会话、锁、缓冲池统计等等.
level 5: 增加SQL语句.除了包括level0的所有内容,还包括SQL语句的收集,收集结果记录在stats$sql_summary中,快照的默认级别.
level 6:包括SQL的执行计划,关注SQL的性能
level 7: 增加了段级统计信息(Segmentlevel Statistics),
level 10: 增加子锁存统计.包括level5的所有内容.并且还会将附加的子锁存存入stats$lathc_children中.在使用这个级别时需要慎重,建议在Oracle support的指导下进行.
execute statspack.snap(i_snap_level=>0,i_modify_parameter=>’true’);
如果你只是想本次改变收集级别,可以忽略i_modify_parameter参数.
execute statspack.snap(i_snap_level=>0);
默认级别
exec statspack.snap
exec statspack.snap(i_snap_level => 6, i_modify_parameter => 'true')
如果你只是想本次改变收集级别,可以忽略i_modify_parameter参数.
exec statspack.snap(i_snap_level => 6)
带注释的snapshot
execute statspack.snap(i_snap_level=>7, I_UCOMMENT=>'performanceissue');
exec statspack.modify_statspack_parameter(i_snap_level => 6);
SQL> @spauto
脚本中有关时间间隔的设置内容,可以修改spauto.sql其内容来更改执行间隔,默认间隔为一小时.
dbms_job.submit(:jobno, 'statspack.snap;',trunc(sysdate+1/24,'HH'),'trunc(SYSDATE+1/24,''HH'')', TRUE, :instno);
这个job任务定义了收集数据的时间间隔:
一天有24个小时,1440分钟,那么:
1/24 HH每小时一次
1/48 MI每半小时一次
1/144 MI每十分钟一次
1/288 MI每五分钟一次
SQL> @spreport.sql
第一种方法:删除stats$snapshot数据表中的相应数据,其他表中的数据会相应的级连删除:
SQL> select max(snap_id) from stats$snapshot;
SQL> delete from stats$snapshot where snap_id <= &snap_id;
第二种方法:使用自带的脚本sptrunc.sql
SQL> @sptrunc
使用spuexp.par文件exp保存相应数据
spuexp.par内容:
file=spuexp.dmp log=spuexp.log compress=y grants=y indexes=y rows=yconstraints=y wner=PERFSTAT consistent=y
导出语句:
exp userid=perfstat/perfstat parfile=spuexp.par
SQL> @sprepsql.sql
快照门限只应用于stats$sql_summary表中获取的SQL语句.
因为每一个快照都会收集很多数据,每一行都代表获取快照时数据库中的一个SQL语句,所以stats$sql_summary很快就会成为Statspack中最大的表.
门限存储在stats$statspack_parameter表中:
executions_th这是SQL语句执行的数量(默认值是100)
disk_reads_tn这是SQL语句执行的磁盘读入数量(默认值是1000)
parse_calls_th这是SQL语句执行的解析调用的数量(默认值是1000)
buffer_gets_th这是SQL语句执行的缓冲区获取的数量(默认值是10000)
任何一个门限值超过以上参数就会产生一条记录.
通过调用statspack.modify_statspack_parameter函数改变门限的默认值:
exec statspack.modify_statspack_parameter(i_buffer_gets_th=>100000,i_disk_reads_th=>100000);
select * from dba_hist_wr_control;
如将收集间隔时间改为30 分钟一次。并且保留5天时间(注:单位都是为分钟):
exec dbms_workload_repository.modify_snapshot_settings(interval=>30,retention=>5*24*60);
把interval设为0则关闭自动捕捉快照
exec DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT();
select * from sys.wrh$_active_session_history;
exec WORKLOAD_REPOSITORY.DROP_SNAPSHOT_RANGE(low_snap_id => 22,high_snap_id => 32, dbid => 3310949047);
在10.2.0.3的环境下执行这个命令并不会删除快照,这是个bug,bug信息为MOS 8622802。
如果打算删除指定数据库的所有快照,执行下面的语句
select dbid, status, count(*) from wrm$_snapshot group by dbid, status;
exec dbms_swrf_internal.unregister_database(dbid);
select dbid,baseline_name,start_snap_id,end_snap_id from dba_hist_baseline;
exec dbms_workload_repository.create_baseline (56,59,'baseline_name');
exec DBMS_WORKLOAD_REPOSITORY.DROP_BASELINE(baseline_name =>'baseline_name', cascade => FALSE);
$ORACLE_HOME/rdbms/admin/awrextr.sql --导出
拷贝导出的awr数据到目标数据库的指定DIRECTORY
$ORACLE_HOME/rdbms/admin/awrload.sql --导入
导入需要注意几点,输入DIRECTORY的时候注意大小写,另外输入导入的文件名时只输入前缀,不要输入完整的文件名。
导入的awr保存期限超过了100年,所以不会被自动删除
产生整个数据库的AWR报告,运行脚本awrrpt.sql。
@$ORACLE_HOME/rdbms/admin/awrrpt.sql
产生某个实例的AWR报告,运行脚本awrrpti.sql。
@$ORACLE_HOME/rdbms/admin/awrrpti.sql
产生某条SQL语句的AWR报告,运行脚本awrsqrpt.sql。
ORACLE 11GR2中,可以将多个实例生成的AWR汇总到一个文件
@$ORACLE_HOME/rdbms/admin/awrsqrpt.sql
select dbid from v$database
set linesize 200
col BEGIN_INTERVAL_TIME for a40
col END_INTERVAL_TIME for a40
col event for a40
select snap_id,begin_interval_time,end_interval_time,snap_level from dba_hist_snapshot;
select max(snap_id) from dba_hist_snapshot;
col event for a40
select case wait_rank
when 1 then
inst_id
end "Inst Num",
case wait_rank
when 1 then
snap_id
end "Snap Id",
case wait_rank
when 1 then
begin_snap
end "Begin Snap",
case wait_rank
when 1 then
end_snap
end "End Snap",
event_name "Event",
total_waits "Waits",
time_waited"Time(s)",
round((time_waited /total_waits) * 1000) "Avg wait(ms)",
round((time_waited / db_time)* 100, 2) "% DB time",
substr(wait_class, 1, 15)"Wait Class"
from (select inst_id,
snap_id,
to_char(begin_snap,'DD-MM-YY hh24:mi:ss') begin_snap,
to_char(end_snap,'hh24:mi:ss') end_snap,
event_name,
wait_class,
total_waits,
time_waited,
dense_rank()over(partition by inst_id, snap_id order by time_waited desc) - 1 wait_rank,
max(time_waited)over(partition by inst_id, snap_id) db_time
from (selects.instance_number inst_id,
s.snap_id,
s.begin_interval_time begin_snap,
s.end_interval_time end_snap,
event_name,
wait_class,
total_waits -lag(total_waits, 1, total_waits) over(partition by s.startup_time,s.instance_number, stats.event_name order by s.snap_id) total_waits,
time_waited -lag(time_waited, 1, time_waited) over(partition by s.startup_time,s.instance_number, stats.event_name order by s.snap_id) time_waited,
min(s.snap_id)over(partition by s.startup_time, s.instance_number, stats.event_name)min_snap_id
from (select dbid,
instance_number,
snap_id,
event_name,
wait_class,
total_waitstotal_waits,
round(time_waited_micro / 1000000, 2) time_waited
fromdba_hist_system_event
wherewait_class not in ('Idle', 'System I/O')
union all
select dbid,
instance_number,
snap_id,
stat_name event_name,
nullwait_class,
nulltotal_waits,
round(value / 1000000, 2) time_waited
from dba_hist_sys_time_model
wherestat_name in ('DB CPU', 'DB time')) stats,
dba_hist_snapshot s
wherestats.instance_number = s.instance_number
and stats.snap_id= s.snap_id
and stats.dbid =s.dbid
and s.dbid =520444635
ands.instance_number = 1
and stats.snap_idbetween 29930 and 29943)
where snap_id >min_snap_id
and nvl(total_waits, 1) > 0)
where event_name != 'DB time'
and wait_rank <= 5
order by inst_id, snap_id;
select dbid from v$database
col event for a40
set linesize 200
col BEGIN_INTERVAL_TIME for a40
col END_INTERVAL_TIME for a40
col event for a40
select case wait_rank
when 1 then
inst_id
end "Inst Num",
case wait_rank
when 1 then
snap_id
end "Snap Id",
case wait_rank
when 1 then
begin_snap
end "Begin Snap",
case wait_rank
when 1 then
end_snap
end "End Snap",
event_name "Event",
total_waits "Waits",
time_waited"Time(s)",
round((time_waited /total_waits) * 1000) "Avg wait(ms)",
round((time_waited / db_time)* 100, 2) "% DB time",
substr(wait_class, 1, 15)"Wait Class"
from (select inst_id,
snap_id,
to_char(begin_snap,'DD-MM-YY hh24:mi:ss') begin_snap,
to_char(end_snap,'hh24:mi:ss') end_snap,
event_name,
wait_class,
total_waits,
time_waited,
dense_rank() over(partition by inst_id,snap_id order by time_waited desc) - 1 wait_rank,
max(time_waited)over(partition by inst_id, snap_id) db_time
from (selects.instance_number inst_id,
s.snap_id,
s.begin_interval_time begin_snap,
s.end_interval_time end_snap,
event_name,
wait_class,
total_waits -lag(total_waits, 1, total_waits) over(partition by s.startup_time,s.instance_number, stats.event_name order by s.snap_id) total_waits,
time_waited - lag(time_waited, 1,time_waited) over(partition by s.startup_time, s.instance_number,stats.event_name order by s.snap_id) time_waited,
min(s.snap_id)over(partition by s.startup_time, s.instance_number, stats.event_name)min_snap_id
from (select dbid,
instance_number,
snap_id,
event_name,
wait_class,
total_waits_fg total_waits,
round(time_waited_micro_fg / 1000000, 2) time_waited
fromdba_hist_system_event
wherewait_class not in ('Idle', 'System I/O')
union all
select dbid,
instance_number,
snap_id,
stat_name event_name,
nullwait_class,
nulltotal_waits,
round(value / 1000000, 2) time_waited
fromdba_hist_sys_time_model
wherestat_name in ('DB CPU', 'DB time')) stats,
dba_hist_snapshot s
wherestats.instance_number = s.instance_number
and stats.snap_id= s.snap_id
and stats.dbid =s.dbid
and s.dbid =546108817
and s.instance_number = 1
and stats.snap_idbetween 66 and 69)
where snap_id >min_snap_id
and nvl(total_waits, 1)> 0)
where event_name != 'DB time'
and wait_rank <= 5
order by inst_id, snap_id;
在某些情况下,比如解决oraclescn问题,解决某些bug,需要为数据库打补丁,其中既包括DB的补丁,也包括CRS、GI的补丁。在打补丁时需要注意如下事项:
(1)确认补丁打上以后解决问题,也就是说确保“对症下药”。
(2)仔细查看补丁的readme,并严格按照readme中的步骤进行操作。
(3)使用最新版本的opatch。
(4)对于RAC,在打补丁时,尽量分节点打,也就是说,尽量使用:
opatch apply -local 方式打补丁。
(5)对于10.2.0.3及以上版本的数据库,如果打的补丁是PSU,则需要为监听配置COST。
对于数据库的大版本升级,根据是否跨平台、停机时间长短、数据量大小,可以根据实际的情况,采用同步软件、RMAN增量备份恢复、逻辑导出等多种方式进行迁移,但是最好是先进行测试。
对于小版本的升级,首先查看升级说明README,然后先在测试环境下进行测试,完成测试后,对生产环境进行备份,备份既包括数据库数据的备份,还包括软件的备份(CRS、DB),当各项工作完成后,确认各项工作都准备好后,在生产环境下进行升级。
Different Upgrade Methods For Upgrading Your Database (Doc ID 419550.1)
Master Note For Oracle Database Upgrades and Migrations (Doc ID 1152016.1)
Complete Checklist for Manual Upgrades to 10gR2 (Doc ID 316889.1)
Best Practices to Minimize Downtime During Upgrade (Doc ID 455744.1)
Complete Checklist for Manual Upgrades to 11gR2 (Doc ID 837570.1)
Complete Checklist for Manual Upgrades to Oracle Database 12c Release 1(12.1) (Doc ID 1503653.1)
Complete checklist for out-of-place manual upgrade from previous 11.2.0.Nversion to the latest 11.2.0.N patchset. (Doc ID 1276368.1)
Complete Checklist for Upgrading to Oracle Database 12c Release 1 usingDBUA (Doc ID 1516557.1)
Complete Checklist to Upgrade the Database to 11gR2 using DBUA (Doc ID870814.1)
Database Server Upgrade Downgrade Compatibility Matrix (Doc ID 551141.1)
How to estimate the time required to upgrade a database (Doc ID 739485.1)
Oracle Database Upgrade Path Reference List (Doc ID 730365.1)
Upgrade Advisor Database from 9.2 to 11.2 (Doc ID 264.1)
Upgrade Advisor Database from 10.2 to 11.2 (Doc ID 251.1)
Oracle Clusterware的命令集可以分为4种
节点层:osnodes
网络层:oifcfg
集群层:crsctl,ocrcheck,ocrdump,ocrconfig
应用层:srvctl,onsctl,crs_stat
emctl stop dbconsole
srvctl stop instance -d raw -i raw1
srvctl stop instance -d raw -i raw2
srvctl stop asm -n raw1
srvctl stop asm -n raw2
srvctl stop nodeapps -n raw1
srvctl stop nodeapps -n raw2
srvctl start nodeapps -n raw1
srvctl start nodeapps -n raw2
srvctl start asm -n raw1
srvctl start asm -n raw2
srvctl start instance -d raw -i raw2
srvctl start instance -d orcl -i raw1
emctl start dbconsole
-- 10.2、11.1
crsctl stop crs # stops Oracleclusterware
crsctl start crs # startsOracle Clusterware
crsctl enable crs # enablesOracle Clusterware
crsctl disable crs # disablesOracle Clusterware
-- 11.2
crsctl stop cluster –all
crsctl stop cluster -n
1. 命令汇总
-- 查看节点
$CRS_HOME/bin/olsnodes
-- 查看网络
$CRS_HOME/bin/oifcfg getif
-- 检查CRS
$CRS_HOME/bin/crsctl check crs
-- 检查VIP
srvctl config nodeapps -n
-- 11gR2查看CRS状态
crsctl stat res -t -init
crsctl status resource -t
-- 检查CRS版本
crsctl query crs activeversion
-- 启动CRS排它模式
crsctl start crs -excl -nocrs 11.2.0.2
crsctl start crs -excl 11.2.0.1
-- 直接启动CRSD服务
crsctl start res ora.crsd -init
-- 注册资源
crs_unregister ora.node1.ASM1.asm
crs_stat -p ora.node1.ASM1.asm > ora.node1.ASM1.asm.cap
crs_register ora.node1.ASM1.asm
-- 11gR2
crsctl stat res ora.scan1.vip -p > /tmp/1.txt
crsctl add resource ora.n2_scan1.vip -type ora.scan_vip.type -file /tmp/1.txt
-- 查看资源未启用的资源
crsctl stat res -w 'ENABLED != 0'
-- 查看当前OCR
ocrcheck
-- 查看OCR备份
ocrconfig -showbackup
-- 查看Votingdisk
crsctl query css votedisk
-- 备份OCR,路径:Grid_home/cdata/cluster_name,
ocrconfig -export ocr.bak
ocrconfig -manualbackup
-- 转储OCR
ocrdump -backupfile backup_file_name
-- OCR先加后删
ocrconfig -add new_OCR_location
ocrconfig -delete current_OCR_location
-- 移动OCR位置,OCR必须要2份以上
ocrconfig -repair -replace current_OCR_location -replacementtarget_OCR_location
-- 查看OLR
ocrcheck -local
-- 以文件模式查看OLR
ocrdump -local -stdout
-- 备份与恢复
ocrconfig -local -export file_name
ocrconfig -local -import file_name
ocrconfig –local –manualbackup
-- 查看OLR备份内容
ocrdump -local -backupfile olr_backup_file_name
-- 修改OLR备份路径
ocrconfig -local -backuploc new_olr_backup_path
a.设置ASM Compatibility为11.2.0.0
b.检查Clusterware升级完成
crsctl query crs activeversion
c.使用ASMCA来在所有节点启动ASM
d.使用ASMCA来创建放置OCR的DG
e.往DG中添加OCR(确保Clusterware 正在运行)
ocrconfig -add +new_disk_group
f.删除原始的OCR
ocrconfig -deleteold_storage_location
a.查看节点:olsnodes
b.停止CRS
crsctl stop crs
crsctl stop crs -f
c. 以排它模式起CRS
crsctl start crs -excl
检查crsd是否起,如果起了的话,停止
crsctl stop resource ora.crsd -init
d. 导入OCR
ocrconfig -import file_name
ocrcheck
e.停止CRS crsctl stop crs -f
f.正常启动CRS crsctl start crs
g.检查 cluvfycomp ocr -n all -verbose
# crsctl stop crs
# ocrconfig -local -restore file_name
# ocrcheck -local
# crsctl start crs
$ cluvfy comp olr
$ crsctl start crs -excl
$ crsctl query css votedisk
-- ----- ----------------- --------- ---------
## STATE File Universal Id File Name Disk group
1. ONLINE 7c54856e98474f61bf349401e7c9fb95 (/dev/sdb1)[DATA]
$ crsctl replace votedisk +asm_disk_group
$ crsctl delete css votedisk FUID
# crsctl stop crs
# crsctl start crs
Placement of Voting disk and OCR Files in Oracle RAC 10g and 11gR1 (Doc ID293819.1)
OCR / Vote disk Maintenance Operations: (ADD/REMOVE/REPLACE/MOVE) (Doc ID428681.1)
$GRID_HOME/bin/srvctl config scan
SCAN name: onocrsst-scan, Network: 1/113.52.185.0/255.255.255.0/bge0
SCAN VIP name: scan1, IP: /onocrsst-scan/113.52.185.230
# DB配置初始化参数
*.remote_listener='onocrsst-scan:1521'
$ nslookup oncndb-situat-scanip
Server: 10.16.26.33
Address: 10.16.26.33#53
Name: oncndb-situat-scanip.sit.shanghaionstar.com
Address: 113.52.185.230
当停止SCAN IP所在节点的CRS时,SCAN IP自动漂到另一个节点
也可以手工发出命令:
srvctl relocate scan -i
srvctl relocate scan_listener -i
srvctl modify scan_listener -p 3521
srvctl stop scan_listener
srvctl start scan_listener
srvctl modify scan_listener -p "TCP:3521/TCP:1521"
srvctl stop scan_listener
srvctl start scan_listener
-- 校验OCR的一致性
cluvfy comp ocr -n all -verbose
-- 校验gpnp的一致性 11gR2
cluvfy comp gpnp -n all -verbose
1.oradebug ipc
sqlplus / as sysdba
oradebug ipc
oradebug tracefile_name;
2. select * from x$skgxpia;
3. select * from x$ksxpia;
4. select * from v$cluster_interconnects;
5. SELECT * FROM V$CONFIGURED_INTERCONNECTS;
-- 查看连接分布
set linesize 120
set pagesize 999
column service_name format a20
column username format a20
SELECT inst_id, TYPE,service_name, Count(*)
FROM gv$session GROUP BYinst_id, TYPE,service_name;
INST_ID TYPE SERVICE_NAME COUNT(*)
---------- ---------- -------------------- ----------
1 BACKGROUNDSYS$BACKGROUND 29
2 BACKGROUNDSYS$BACKGROUND 28
2 USER SYS$USERS 46
1 USER SYS$USERS 48
1 USER onpdlcl 51
2 USER onpdlcl 65
SELECT inst_id, TYPE, service_name, username, Count(*)
FROM gv$session GROUP BY inst_id, TYPE,service_name, username
order by 1,2,3,4;
$ srvctl relocate service -h
$ srvctl add service -h
$ srvctl add service -d orcl -s szcg2 -r orcl2 -a orcl1
$ srvctl start service -d orcl -s szcg2
$ srvctl stop service -d orcl -s szcg2
$ srvctl status service -d RPTS
$ srvctl relocate service -d RPTS -s RPTS3 -i RPTS2 -t RPTS3
SET PAGESIZE 60 COLSEP '|' NUMWIDTH 8 LINESIZE 132 VERIFY OFF FEEDBACK OFF
COLUMN service_name FORMAT A20 TRUNCATED HEADING 'Service'
COLUMN begin_time HEADING 'Begin Time' FORMAT A10
COLUMN end_time HEADING 'End Time' FORMAT A10
COLUMN instance_name HEADING 'Instance' FORMAT A10
COLUMN service_time HEADING 'Service Time|mSec/Call' FORMAT 999999999
COLUMN throughput HEADING 'Calls/sec'FORMAT 99.99
BREAK ON service_name SKIP 1
SELECT
service_name
, TO_CHAR(begin_time, 'HH:MI:SS')begin_time
, TO_CHAR(end_time, 'HH:MI:SS')end_time
, instance_name
, elapsedpercall service_time
, callspersec throughput
FROM
gv$instance i
, gv$active_services s
, gv$servicemetric m
WHERE s.inst_id = m.inst_id
AND s.name_hash =m.service_name_hash
AND i.inst_id = m.inst_id
AND m.group_id = 10
ORDER BY
service_name
, i.inst_id
, begin_time ;
| | | |Service Time|
Service |Begin Time|EndTime |Instance | mSec/Call|Calls/sec
--------------------|----------|----------|----------|------------|---------
SYS$BACKGROUND |09:52:39 |09:52:44 |tea1 | 0| .00
|09:52:39 |09:52:44 |tea2 | 0| .00
SYS$USERS |09:52:39 |09:52:44 |tea1 | 0| .00
|09:52:39 |09:52:44 |tea2 | 0| .00
dp_test |09:52:39 |09:52:44 |tea2 | 0| .00
tea |09:52:39 |09:52:44 |tea1 | 0| .00
|09:52:39 |09:52:44 |tea2 | 0| .00
-- 停止服务
exec dbms_service.stop_service('SYS$SYS.SCHEDULER$_EVENT_QUEUE.INAS','inas1');
-- 启动服务
exec dbms_service.start_service('SYS.SCHEDULER$_EVENT_QUEUE','inas1');
-- 查看组件模块
[oracle@ora1 bin]$
-- 设置组件模块(root用户)
[root@ora1 bin]#
-- 日志级别(一般设置2即可)
level 0 = turn off
level 2 = default
level 3 = verbose
level 4 = super verbose
crsctl get css misscount
crsctl set css misscount 60
crsctl get css disktimeout -- VD的超时时间,如果超时,VD置为offline
crsctl get css reboottime --CSS被驱逐后,等待的时间
$ crsctl set css diagwait 13
$ crsctl get css diagwait
11g不需要设置
Top 11 Things to do NOW to Stabilize your RAC Cluster Environment (Doc ID1344678.1)
Pre-11.2: Node Fails to Reboot after Node Eviction or CRS Can't Rejoin theCluster after Node reboot as diagwait has Wrong Value (Doc ID 1277538.1)
Using Diagwait as a diagnostic to get more information for diagnosingOracle Clusterware Node evictions (Doc ID 559365.1)
Pre-11.2 CRS Not Starting After Reboot as diagwait was Changed to WrongValue (Doc ID 1481477.1)
1.查询ocssd.log:不一定会有
grep -i "master node" ocssd.log | tail -1
2.查询crsd.log文件:
grep MASTER crsd.log | tail -1
3.查看OCR备份,OCR备份在master节点上
ocrconfig -showbackup
4.查询V$GES_RESOURCE视图
方法一:
[root@tea2 ~]# su - grid
[grid@tea2 ~]$ cd $ORACLE_HOME/bin
[grid@tea2 bin]$ ./cemutlo -n
tea-cluster
方法二:
[root@rac1 ~]# cd /oracle/app/crs/bin
[root@rac1 bin]# ./ocrdump
[root@rac1 bin]# vi OCRDUMPFILE
crsctl modify res 'ora.oc4j' -attr "ENABLED=0"
crsctl modify res 'ora.oc4j' -attr AUTO_START=never
export SRVM_TRACE=true
[grid@tea1 admin]$ cat listener.ora
LISTENER_SCAN1=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN1)))) # line added by Agent
LISTENER=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER)))) # line added by Agent
SID_LIST_LISTENER =(SID_LIST=(SID_DESC=(SID_NAME=newtea1)(GLOBAL_DBNAME=newtea_DGMGRL)(ORACLE_HOME=/oracle/app/11.2.0.4/db)))
ENABLE_GLOBAL_DYNAMIC_ENDPOINT_LISTENER=ON # line added by Agent
ENABLE_GLOBAL_DYNAMIC_ENDPOINT_LISTENER_SCAN1=ON # line added by Agent
[grid@tea2 admin]$ cat listener.ora
LISTENER_SCAN1=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN1)))) # line added by Agent
LISTENER=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER)))) # line added by Agent
SID_LIST_LISTENER=(SID_LIST=(SID_DESC=(SID_NAME=newtea2)(GLOBAL_DBNAME=newtea_DGMGRL)(ORACLE_HOME=/oracle/app/11.2.0.4/db)))
ENABLE_GLOBAL_DYNAMIC_ENDPOINT_LISTENER=ON # line added by Agent
ENABLE_GLOBAL_DYNAMIC_ENDPOINT_LISTENER_SCAN1=ON # line added by Agent
当在LISTENER_SCAN1上配置了GLOBAL_NAME后,自动在每个节点的LISTENER上有相应的服务名
[grid@tea2 admin]$ lsnrctl status LISTENER_SCAN1
......
Service "newtea_DGMGRL" has 3 instance(s).
Instance "newtea1",status READY, has 1 handler(s) for this service...
Instance "newtea2",status UNKNOWN, has 1 handler(s) for this service...
Instance "newtea2",status READY, has 1 handler(s) for this service...
The command completed successfully
[grid@tea2 admin]$ lsnrctl status LISTENER
......
Service "newtea_DGMGRL" has 1 instance(s).
Instance "newtea2",status READY, has 1 handler(s) for this service...
The command completed successfully
[grid@tea1 admin]$ lsnrctl status LISTENER
......
Service "newtea_DGMGRL" has 1 instance(s).
Instance "newtea1",status READY, has 1 handler(s) for this service...
The command completed successfully
节点被驱逐一般是因为心跳超时所致,此时需要从crs日志、css日志、alert日志、操作系统日志等入手进行分析。同时,可以参考metalink上的如下文档:
1、Top 5 issues for InstanceEviction[文章 ID 1374110.1]
2、Frequent Instance Eviction in9i or Node Eviction in 10g/11g[文章 ID461662.1]
3、Node reboot or eviction: Howto check if your private interconnect CRS can transmit network heartbe[文章 ID 1445075.1]
4、11gR2: LMON received aninstance eviction notification from instance n[文章 ID 1440892.1]
5、Exadata: Instance evictionwith IPC sendtimeout messages.[文章 ID1404576.1]
6、11gR2 GI Node May not Join theCluster After Private Network is Functional After Eviction due to Pr[文章 ID 1479380.1]
Oracle 10g rac(修改所有ip)
1.备份OCR
2.停止oracle相关进程(root执行)
./srvctl stop database -d database_name
./srvctl stop asm -n node_name
./srvctl stop asm -n node_name
./srvctl stop nodeapps -n node_name
./srvctl stop nodeapps -n node_name
/etc/init.d/init.crs stop 所有的节点
备份修改/etc/hosts文件
修改网卡ip地址,重启网络生效
启动crs,但是不启动asm database和nodeapp
-- 查看当前配置:root
./oifcfg getif -global
eth0 10.223.10.0 global public
eth1 192.168.1.0 global cluster_interconnect
-- 删除当前配置 root执行 下同
./oifcfg delif -global eth0
./oifcfg delif -global eth1
./oifcfg getif
-- 重新添加root
./oifcfg setif -global eth0/10.223.10.0:public
./oifcfg setif -global eth1/192.223.1.0:cluster_interconnect
./oifcfg getif
eth0 10.223.10.0 global public
eth1 192.168.1.0 global cluster_interconnect
3.3 修改VIProot
./srvctl modify nodeapps -n raw1 -A 10.223.10.100/255.255.255.0/eth0
./srvctl modify nodeapps -n raw2 -A 10.223.10.101/255.255.255.0/eth0
修改listener.ora和tnsnames.ora
如果你使用了ocfs,修改ocfs配置文件(/etc/ocfs/cluster.conf),验证修改后是否可用。
启动asmdatabase nodeapp
How to Modify Private Network Information in Oracle Clusterware (Doc ID283684.1)
How to Validate Network and Name Resolution Setup for the Clusterware andRAC (Doc ID 1054902.1)
How to Modify Public Network Information including VIP in OracleClusterware (Doc ID 276434.1)
第一步安装执行dbca,选择Oracle Real ApplicationCluster database,点”Next”
第二步:选择Services Management,点”Next”
第三步:添加service name,需设定不同的InstanceRole
(1)点击”Add”,添加service name,在填出框输入servicename,名称可自定,本例使用cwvpd,然后点”OK”
(2)默认情况下,新创建servicename的Instance Role是相同的,都为Preferred,这里需调整一个实例为Preferred,一个为Available;接着设置TAF Policy为Basic,然后点”Finish”
上述两种方式各有优劣,前者建立连接的开销相对较小,但failover时会产生延迟,而后者正好与前者相反。
(3)在弹出框点”OK”
(4)在弹出框点”Yes”
Oracle RAC 客户端故障转移(failover),当采用TAF方式时,对于已经建立连接的客户端,在连接的实例或节点出现故障时,客户端无需再次发出连接请求,仍然可以继续之前的数据库操作,此称之为透明故障转移。
TAF就是说对于那些已经成功连接到特定实例的客户端,如果该实例或节点异常宕机,客户端会自动重新发出到剩余实例的连接请求。使得客户端感觉不到它所连接的实例或节点已经出现故障,这个就称之为透明转移。但其间的活动事务将被回滚。
使用Server-side TAF配置,可采用本文档中推荐的参数设置。
步骤如下:
1)调用dbms_service.modify_service修改failover设置
Begin
Dbms_Service.Modify_Service(
Service_Name => 'cwvpd', --根据实际情况定义服务名
Failover_Method => Dbms_Service.Failover_Method_Basic,
Failover_Type =>Dbms_Service.Failover_Type_Select,
Failover_Retries => 180,
Failover_Delay => 5);
End;
2)检查修改是否生效
SELECT NETWORK_NAME, FAILOVER_METHOD, FAILOVER_TYPE, FAILOVER_RETRIES,FAILOVER_DELAY FROM DBA_SERVICES WHERE NAME = 'CWVPD';
查看节点state是否为ONLINE状态
检查状态,命令参考如下:
检查配置,包括Instance Role和TAF策略,命令如下:
set linesize 120
set pagesize 999
column service_name format a15
select INST_ID, service_name, FAILOVER_TYPE, FAILOVER_METHOD, FAILED_OVER
from gv$session where type = 'USER';
ALTER DATABASE ADD LOGFILE THREAD 1 GROUP 5('+DGIDT2PLF/idt2plf/onlinelog/redo05_1.log','+DGIDT2PLF/idt2plf/onlinelog/redo05_2.log')SIZE 500M;
ALTER DATABASE ADD LOGFILE THREAD 1 GROUP 6('+DGIDT2PLF/idt2plf/onlinelog/redo06_1.log','+DGIDT2PLF/idt2plf/onlinelog/redo06_2.log')SIZE 500M;
ALTER DATABASE ADD LOGFILE THREAD 2 GROUP7('+DGIDT2PLF/idt2plf/onlinelog/redo07_1.log','+DGIDT2PLF/idt2plf/onlinelog/redo07_2.log') SIZE 500M;
ALTER DATABASE ADD LOGFILE THREAD 2 GROUP8('+DGIDT2PLF/idt2plf/onlinelog/redo08_1.log','+DGIDT2PLF/idt2plf/onlinelog/redo08_2.log') SIZE 500M;
alter database drop logfile group 1;
alter database drop logfile group 2;
alter database drop logfile group 3;
alter database drop logfile group 4;
步骤一:检查参数
一个节点操作:
SQL> show parameter control_files
SQL> show parameter spfile
步骤二:关闭数据库
两个节点操作:
SQL> shutdown immediate(两个节点)
步骤三:启动数据库到nomount状态
一个节点:
SQL> startup nomount
步骤四:恢复控制文件到默认路径
一个节点:
使用RMAN恢复控制文件。
ORACLE$ rman target /
RMAN> restore controlfile from '+DATA/yyjc/controlfile/xxxx.xxx.xx';
(登录ASM查看恢复出的控制文件名,修改参数需要用到此文件名)
步骤五:修改参数文件
一个节点:
修改SPFILE增加控制文件,并重启实例
alter system setcontrol_files='+DATABASE_DG/yjdb/controlfile/current.260.817470477' , '文件名' scope=spfile sid='*';
步骤六:关闭数据库
两个节点:
SQL> shutdown immediate
步骤七:启动数据库
两个节点操作:
SQL> startup
步骤八:检查数据库
一个节点操作:
检查并确认
SQL> show parameter control_files
SQL> show parameter spfile
SQL> select * from v$controlfile;
看到两个控制文件,则正常。
select kj.*, le.le_Addr from (
select kjblname, kjblname2, kjblowner, kjblmaster, kjbllockp,
substr ( kjblname2, instr(kjblname2,',')+1, instr(kjblname2,',',1,2)-instr(kjblname2,',',1,1)-1)/65536 fl,
substr ( kjblname2, 1, instr(kjblname2,',')-1) blk
from x$kjbl
) kj, x$le le
where le.le_kjbl = kj.kjbllockp
order by le.le_addr;
oradebug lkdebug -m pkey object_id
Adding a Node to a 10g RAC Cluster (Doc ID 270512.1)
How To Rebuild a Corrupted CRS Home (Doc ID 1296093.1)
Top 3 RAC Database Instance Hang Issues (Doc ID 1389520.1)
Top 5 RAC Instance Crash Issues (Doc ID 1375405.1)
Top 5 issues for Instance Eviction (Doc ID 1374110.1)
Node reboot or eviction How to check if your private interconnect CRS cantransmit network heartbeats (Doc ID 1445075.1)
Oracle Clusterware Cannot Start on all Nodes Network communication withnode NAME missing for 90% of timeout interval (Doc ID 1507482.1)
RAC and Oracle Clusterware Best Practices and Starter Kit (HP-UX) (Doc ID811303.1)
RAC and Oracle Clusterware Best Practices and Starter Kit (Linux) (Doc ID811306.1)
RAC and Oracle Clusterware Best Practices and Starter Kit (AIX) (Doc ID811293.1)
Data Collecting for Troubleshooting Oracle Clusterware (CRS or GI) And RealApplication Cluster (RAC) Issues (Doc ID 289690.1)
Adding a Node to a 10g RAC Cluster (Doc ID 270512.1)
addnode.sh The virtual host name vipfor node nodenameis already in use. (Doc ID 782849.1)
Troubleshoot Grid Infrastructure Startup Issues (Doc ID 1050908.1)
CRS Diagnostic Data Gathering A Summary of Common tools and their Usage(Doc ID 783456.1)
CSS Timeout Computation in Oracle Clusterware (Doc ID 294430.1)
Failover Issues and Limitations [Connect-time failover and TAF] (Doc ID97926.1)
Instance Service Registration to Listeners (Doc ID 69546.1)
CRS 10g Diagnostic Collection Guide (Doc ID 272332.1)
CRS 10gR2 11gR1 11gR2 Diagnostic Collection Guide (Doc ID 330358.1)
Data Gathering for Instance Evictions in a RAC environment (ORA-29740) (DocID 412884.1)
Resolving Instance Evictions on Windows Platforms (Doc ID 297498.1)
Troubleshooting 11.2 Clusterware Node Evictions (Reboots) (Doc ID1050693.1)
Troubleshooting ORA-29740 in a RAC Environment (Doc ID 219361.1)
Steps To Shutdown(stop)/Startup(start) The CRS, OHAS, ASM, RDBMS & ACFSServices on RAC 11.2 Configuration. (Doc ID 1355977.1)
How To Validate ASM Instances And Diskgroups On A RAC Cluster (When CRSDoes Not Start). (Doc ID 1609127.1)
外部冗余
CREATE DISKGROUP DATA EXTERNAL REDUNDANCY DISK
'c:/asmdisks/_file_disk1',
'c:/asmdisks/_file_disk2',
'c:/asmdisks/_file_disk3',
'c:/asmdisks/_file_disk4';
Normal冗余
CREATE DISKGROUP DATA_DW01 NORMAL REDUNDANCY
FAILGROUP dw01cel01 DISK
'/oracle/disks/_asm_disk1',
'/oracle/disks/_asm_disk2',
'/oracle/disks/_asm_disk3',
'/oracle/disks/_asm_disk4'
FAILGROUP dw01cel02 DISK
'/oracle/disks/_asm_disk5',
'/oracle/disks/_asm_disk6',
'/oracle/disks/_asm_disk7',
'/oracle/disks/_asm_disk8';
-- 11gR2参数
attribute
'AU_SIZE'='4M',
'CELL.SMART_SCAN_CAPABLE'='TRUE',
'compatible.rdbms'='11.2.0.2',
'compatible.asm'='11.2.0.2'
alter diskgroup data add disk '/dev/raw/raw4';
SQL> select disk_number,state,name from v$asm_disk;
DISK_NUMBER STATE NAME
----------- ---------------- --------------------
2 NORMAL DATA_0002
1 NORMAL DATA_0001
3 NORMAL DATA_0003
0 NORMAL DATA_0000
alter diskgroup data drop disk DATA_0001;
在drop磁盘时,会自动迁移数据,所以drop不是瞬间完成
alter diskgroup data drop disk DATA_0001 rebalance power 5;
alter diskgroup data drop disk DATA_0001 force;
注意:force是直接删除磁盘,如果diskgroup的normal或high,ASM拿另一份数据再生成冗余。不使用force 是先平衡数据,再删除磁盘。
放弃删除磁盘
ALTER DISKGROUP
select * from v$asm_operation;
GROUP_NUMBER OPERATION STATE POWER ACTUAL SOFAR EST_WORK EST_RATE EST_MINUTES
------------ ---------- -------- ---------- ---------- -------------------- ---------- -----------
1 REBAL RUN 1 1 268 515 296 0
-- POWER值为0到11,0为停止,11为最快
ALTER DISKGROUP data REBALANCE POWER 5;
alter diskgroup DATA_DW01 rebalance;
alter diskgroup DATA_DW01 check all norepair;
alter diskgroup DATA_DW01 check all repair;
在相应的ASM的alert日志中会发现:
Thu Oct 10 00:48:50 CST 2013
SQL> alter diskgroup DATA_DW01 check all repair
Thu Oct 10 00:48:50 CST 2013
NOTE: starting check of diskgroup DATA_DW01
WARNING: disk DATA_DW01_0001, changing DD used AUs from 981 to 979
WARNING: disk DATA_DW01_0002, changing DD used AUs from 500 to 490
WARNING: disk DATA_DW01_0004, changing DD used AUs from 479 to 491
SUCCESS: check of diskgroup DATA_DW01 found no errors
-- ASM使用OMF自管理,添加数据文件更简单
建议添加的时候最好还是制定全路径,免得误操作添加到文件系统中。
create tablespace test datafile ‘+DATA/tea/datafile/system.264.773951505’size 20M;
Linux2.6内核,RDBMS在10.2.0.2版本之后支持直接访问块设备
但OUI(10.1.0和10.2.0)在装CRS时不支持直接访问块设备,必须要在11.1.0.6.0版本之后
参考Linux2.6 Kernel Deprecation of Raw Devices [ID 357492.1]
asm_diskstring不允许设置成/dev/或/dev/*,这会引起扫描所有设备,参考如下:
root.sh Hangs or Reboots Node on Non First Node if ASM Discovery Stringasm_diskstring is Set to /dev/ or /dev/* [ID 1356567.1]
在11.2.0.2AIX 5.3上设置挂起。
查看ACFS挂载点
select FS_NAME, VOL_DEVICE, TOTAL_MB , FREE_MB from V$ASM_ACFSVOLUMES;
1.11gR2 RAC
ASMCMD> orapwusr --modify --password sys
2.11gR2 SI
ASMCMD> passwd sys
读取磁盘头
$ kfed read /dev/raw/raw1234 text=raw1234.out
从10.2.0.5开始,Oracle备份磁盘头信息
$ kfed read /dev/raw/raw1234 blknum=510 text=raw1234.out -> 一般备份的磁盘头放在510块上
$ kfed repair /dev/raw/raw1234 ->grid用户操作
SQL> alter diskgroup data_dg mount;
ORACLE提供了相关工具叫做asmiostat用来监控磁盘的IO性能。
具体请查阅文章
ASMIOSTAT Script to collect iostats for ASM disks (Doc ID 437996.1)
How to Prepare Storage for ASM (Doc ID 452924.1)
Master Note for Automatic Storage Management (ASM) (Doc ID 1187723.1)
ASM Technical Best Practices For 10g and 11gR1 Release (Doc ID 265633.1)
ASM Concepts Quick Overview (Doc ID 1086199.1)
How To Validate ASM Diskgroup Consistency/State After ASM ReclamationUtility (ASRU) Execution Aborted. (Doc ID 1668673.1)
ASM Technical Best Practices For 10g and 11gR1 Release (Doc ID 265633.1)
Information Center: Oracle Automatic Storage Management (Doc ID 1472204.2)
Information Center: Install and Configure Oracle Automatic StorageManagement (Doc ID 1522675.2)
ASMCMD - ASM command line utility (Doc ID 332180.1)
ASM Technical Best Practices For 10g and 11gR1 Release (Doc ID 265633.1)
-- 在生产库停止DataGuard操作
SQL> show parameter log_archive_dest
SQL> alter system set log_archive_dest_state_2=defer;
-- 在生产库开启DataGuard操作:
SQL> alter system set log_archive_dest_state_2=enable;
主库:
SELECT THREAD# "Thread", SEQUENCE# "Last SequenceGenerated"
FROM V$ARCHIVED_LOG
WHERE (THREAD#, FIRST_TIME) IN
(SELECT THREAD#,MAX(FIRST_TIME) FROM V$ARCHIVED_LOG GROUP BY THREAD#)
ORDER BY 1;
备库:
SELECT ARCH.THREAD# "Thread",
ARCH.SEQUENCE# "LastSequence Received",
APPL.SEQUENCE# "LastSequence Applied",
(ARCH.SEQUENCE# -APPL.SEQUENCE#) "Difference"
FROM (SELECT THREAD#, SEQUENCE#
FROM V$ARCHIVED_LOG
WHERE (THREAD#, FIRST_TIME)IN
(SELECT THREAD#,MAX(FIRST_TIME)
FROM V$ARCHIVED_LOG
GROUP BY THREAD#))ARCH,
(SELECT THREAD#, SEQUENCE#
FROM V$LOG_HISTORY
WHERE (THREAD#, FIRST_TIME)IN
(SELECT THREAD#,MAX(FIRST_TIME)
FROM V$LOG_HISTORY
GROUP BY THREAD#))APPL
WHERE ARCH.THREAD# = APPL.THREAD#
ORDER BY 1;
在备库查看应用事件:
SELECT EVENT_TIME,EVENT,XIDUSN, XIDSLT, XIDSQN FROM DBA_LOGSTDBY_EVENTSorder by 1 desc;
可跳过某些引起阻塞的DDL或DML应用,然后手工执行这些应用:
alter database stop logical standby apply;
exec dbms_logstdby.skip_transaction(14,21,517969);
ALTER DATABASE START LOGICAL STANDBY APPLY IMMEDIATE;
在备份库跳过特定的DML或DDL操作
alter database stop logical standby apply;
begin
execute dbms_logstdby.skip (stmt =>'DML',
schema_name => 'BANPING',
object_name => 'TABLENAME',
proc_name => null);
end;
alter database start logical standby apply;
alter database stop logical standby apply;
execute dbms_logstdby.unskip('DML','BANPING','TABLENAME');
exec dbms_logstdby.instantiate_table('BANPING','TABLENAME','dblink_name');
alter database start logical standby apply;
注意这里建立的DBLINKdblink_name必须是public的
select * from V$LOGSTDBY_STATE;
select * from V$LOGSTDBY_PROGRESS;
SQL> ALTER DATABASE REGISTER LOGICAL LOGFILE'/stb_arch/1_143313_640266118.dbf';
Database altered.
1. 应用日志
--开启应用
SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT;
--停止应用
SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
2. 实时应用
前提是Standby数据库端配置了Standby Redologs。
-- 物理Standby
SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT USINGCURRENT LOGFILE;
-- 逻辑Standby
SQL> ALTER DATABASE START LOGICAL STANDBY APPLY IMMEDIATE;
3. 延迟应用
-- delay单位是分钟,归档会正常发送,standby应用归档延迟
SQL> alter database recover managed standby database DELAY 60 disconnectfrom session;
4. 执行不完全应用
SQL> alter database recover managed standby database cancel;
SQL> ALTER DATABASE RECOVER AUTOMATIC STANDBY DATABASE until time'2008-02-25 17
:25:53';
-- 进程状态
select process,status from v$managed_standby;
RFS- Remote file server(负责接收远程日志文件)
MRP0- Detached recovery server process(负责应用恢复接收的日志
MR(fg)- Foreground recovery session·
SQL> desc v$archive_gap;
Name Null? Type
------------------------------------------------- ----------------------------
THREAD# NUMBER
LOW_SEQUENCE# NUMBER
HIGH_SEQUENCE# NUMBER
查看日志传送情况:
SQL> set linesize 120
SQL> format dest_name format a25
SQL> format error format a50
SQL> select dest_name,status,error from v$archive_dest;
查看备库的状态,保护模式,归档日志路径状态
select dest_name,status,database_mode,recovery_mode,protection_mode fromv$archive_dest_status;
查看归档应用情况:
SQL> select first_time,applied,sequence#,status,next_time fromv$archived_log;
1. 使用rm删除部分归档文件
$ ls -lt --查看文件,按时间排序
$ rm -rf arch1_4601* --删除部分归档日志
2. 手工切换重做日志,让系统继续运行
SQL> alter system switch logfile;
3. 处理RMAN中手工删除的归档
-- 无catalog库手工删除归档的处理
$ rman target /
RMAN> crosscheck archivelog all; --检查归档
RMAN> delete noprompt expired archivelog all; --删除所有过期归档
-- 有catalog库手工删除归档的处理
RMAN> crosscheck copy of archivelog all
RMAN> crosscheck archivelog all
RMAN> resync catalog
RMAN> delete force obsolete;
RMAN> delete expired archivelog all ;
有时侯碰到归档进程无法归档,可能IO有问题,需要重启归档进程
停止归档
alter system archive log stop;
kill掉归档进程(这个时侯注意下是否可以正常杀掉归档进程)
重启归档
alter system archive log start
1. 查看归档备份情况:
RMAN>list backup of archivelog time between "to_date('2011-11-08 18:00:00','yyyy-mm-ddhh24:mi:ss')" and "to_date('2011-11-09 11:00','yyyy-mm-dd hh24:mi:ss')";
RMAN>list archivelog all --查看保存在归档目录下未备份的归档日志。
RMAN>list backup of archivelog all --查看是已被RMAN 备份的归档日志。
2. 归档status说明
STATUS=AAvailable, 表示归档日志文件有效
STATUS=UUnavailable,表示归档日志文件无效
STATUS=DDeleted, 表示归档日志已经被删除
STATUS=XeXpired, 表示归档日志被用户使用操作系统的命令删除了
3. 删除指定sequence之前的归档
deletenoprompt archivelog until sequence 46975 thread1; -- 46975会删除 删除thread1的
4. 删除归档(通配符)
deletenoprompt archivelog like '/home/arch/1_20725%.dbf';
5. 删除指定目录的归档
deletenoprompt archivelog until sequence 315 like '/u01/app/oracle/arch/db1/%';
6. 备份指定目录归档(只删除已备份的归档,其它目录归档不删除)
backuparchivelog like '/u01/app/oracle/arch/db1/%' format'/u01/app/oracle/arch/rman/ARC_%d_%T_%s_%p' delete input;
1. 查看当前时间点前1天的数据
select* from t1 AS OF TIMESTAMP (SYSTIMESTAMP - INTERVAL '1' DAY);
2. 查看当前时间点前10秒的数据
select* from t1 as of timestamp (systimestamp - interval'10'second);
3. 查看某个时间点的数据
select*from njcomm.staff AS OF TIMESTAMP to_timestamp('2014-03-1100:00:00','yyyy-mm-dd hh24:mi:ss');
4. 查看在该SCN下的数据
select* from t1 as of SCN 10771468800439
5. 查看闪回查询的事务(10g)
selectq.start_scn, q.commit_scn, q.logon_user, q.undo_change#, q.undo_sql,q.operation
fromflashback_transaction_query q
whereq.table_owner = 'SCOTT';
6. 多版本查询(10g)
set linesize160
columnVERSIONS_STARTTIME format a25
columnversions_endtime format a25
altersession set nls_date_format='YYYY-MM-DD HH24:MI:SS';
SELECTversions_startscn,
versions_starttime,
versions_endscn,versions_endtime,
versions_xid, versions_operation,
DEPTNO,
DNAME,
LOC
FROM dept VERSIONS BETWEENTIMESTAMP MINVALUE AND MAXVALUE;
7. 闪回表
flashbacktable test to before drop rename to test3;
指定序列号
RMAN>run {
setarchivelog destination to ‘/tmp’;
restorearchivelog from logseq=60 until logseq=70 thread 1;
}
列出某个归档的备份信息
listbackup of archivelog sequence 18884 thread 2;
利用dbms_backup_restore包来恢复归档
DECLARE
v_devvarchar2(50);
v_doneboolean:=false;
BEGIN
v_dev:=sys.dbms_backup_restore.deviceAllocate(type=>'sbt_tape',ident=>'t1',params=>'ENV=(TDPO_OPTFILE=/usr/tivoli/tsm/client/oracle/bin64/tdpo.opt)');
dbms_backup_restore.RestoreSetArchivedLog(destination=>'/arch');
dbms_backup_restore.RestoreArchivedLog(thread=>2,sequence=>18884);
dbms_backup_restore.RestoreBackupPiece(done=> v_done,handle => 'xxx_archlog_backup
sys.dbms_backup_restore.deviceDeallocate;
END;
recoverdatabase skip tablespace GGISSTAGING_DATA, GGISSTAGING_INDEX until scn20076662789;
run
{
allocatechannel c1 type disk;
allocatechannel c2 type disk;
copydatafile '/oradata/anqing/users01.dbf' to'/oradata/anqing2/users01.dbf';
copydatafile '/oradata/anqing/sysaux01.dbf' to'/oradata/anqing2/sysaux01.dbf';
copydatafile '/oradata/anqing/undotbs01.dbf' to'/oradata/anqing2/undotbs01.dbf';
copydatafile '/oradata/anqing/system01.dbf' to'/oradata/anqing2/system01.dbf';
releasechannel c2;
releasechannel c1;
}
如果是10g
RMAN>switch datafile 1 to copy;
$ rmantarget / log rman.log trace rman.trc debug
Space issue in FlashRecovery Area( FRA ) [ID 829755.1]
-- 查询闪回区使用情况
Selectfile_type, percent_space_used as used,
percent_space_reclaimableas reclaimable,
number_of_filesas "number"
fromv$flash_recovery_area_usage;
columnname format a20
columnSPACE_USED format 999999999
columnSPACE_RECLAIMABLE heading "Space|Reclaimable"
columnNUMBER_OF_FILES heading "Number|Of|Files"
SELECT name,SPACE_LIMIT/1024/1024 as SPACE_LIMIT,
SPACE_USED / 1024/1024 as SPACE_USED,
SPACE_RECLAIMABLE, NUMBER_OF_FILES
FROM V$RECOVERY_FILE_DEST;
Number
Space Of
NAME SPACE_LIMIT SPACE_USEDReclaimable Files
------------------------------- ---------- ----------- ----------
/oradata/arch_tmp 409600 12227 0 73
11.2.0.3数据库环境,使用rman进行归档日志备份,想实现:
(1)每天备份归档日志,备份完并不删除归档日志
(2)归档日志备份成功一次之后,下次再备份的时候rman就自动不会再次备份这个归档日志
RMAN>backup archivelog all not backed up;
使用该命令如果遇到 满足 not backed up xx times 的归档才会备份,否则即便该归档仍在DISK上未被删除也不会重复备份, 避免了重复备份带来的问题,也无需每次备份均删除磁盘上的归档文件。
gcbuffer busy是RAC数据库中常见的等待事件,11g开始gc buffer busy分为gc buffer busy acquire和gc buffer busy release。
gcbuffer busy acquire是当session#1尝试请求访问远程实例(remote instance) buffer,但是在session#1之前已经有相同实例上另外一个session#2请求访问了相同的buffer,并且没有完成,那么session#1等待gc buffer busy acquire。
gc buffer busy release是在session#1之前已经有远程实例的session#2请求访问了相同的buffer,并且没有完成,那么session#1等待gc buffer busy release。
在AWR中Segments by Global CacheBuffer Busy 记录了访问频繁的gcbuffer.
解决方法可以根据热点块的类型采取不同的解决方法,比如采取分区表,分区索引,反向index等等。这点与单机数据库中的buffer busy waits类似。
低效SQL语句会导致不必要的buffer被请求访问,增加了buffer busy的机会。在AWR中可以找到TOP SQL。解决方法可以优化SQL语句减少buffer访问。这点与单机数据库中的buffer busy waits类似。
RAC数据库,同一数据在不同数据库实例上被请求访问。
如果应用程序可以实现,那么我们建议不同的应用功能/模块数据分布在不同的数据库实例上被访问,避免同一数据被多个实例交叉访问,可以减少buffer的争用,避免gc等待。
对于10g可以使用hash分区索引,对于9i数据库,只能使用反转键索引,但是反转键不支持range scan.
To findthe offending index:
Findthe data block addresses (dba) of the splitting index from the V$SESSION_WAITview. Two different dbas are given plus the level of the index block:
P1 :rootdba: The root of the index
P2 :level: The level of the index where the block is being split
P3 :childdba: The actual block of the index being split
SELECTsid, event, p1, p2, p3 FROM v$session_wait
Findthe physical location of the splitting index byusing the DBMS_UTILITY package. Two functions will help zero in on the physicallocation of the index block using the rootdba value from step 1:
DATA_BLOCK_ADDRESS_FILE:Returns the file number of the dba
DATA_BLOCK_ADDRESS_BLOCK:Returns the block number the dba
SELECT
DBMS_UTILITY.DATA_BLOCK_ADDRESS_FILE(
DBMS_UTILITY.DATA_BLOCK_ADDRESS_BLOCK(
FROMdual;
Findthe offending index object from DBA_EXTENTSusing the FILE_ID and BLOCK_ID values determined from step 2:
SELECTowner, segment_name
FROMdba_extents
WHEREfile_id =
AND
对数据库做hanganalyze对诊断数据库hang是很有帮助的
RAC数据库收集hanganalyze的命令:
SQL>conn / as sysdba
SQL>oradebug setmypid
SQL>oradebug unlimit
SQL>oradebug -g all hanganalyze 3
DIAG进程的dia0 trace也会记录hang信息
有些情况dba_hist_active_sess_history也会记录hang信息
Bug 13787307 : HANG IN RAC WITH 'GC CURRENT REQUEST'<='GC BUFFER BUSY ACQUIRE' SIGNATURE |
当一个session试图去访问buffer cache中的一个块但是因为buffer正忙的时候,buffer busy wait就发生了。换言之,就是说另一个session正在修改这个块,块的内容处在变化状态。
为保证读的会话可以获取到coherent image of the block(coherent image可以包括块的全部改变或者没有改变),oracle的session会给block header标记一个flag,让other users知道这个change发生了,等待全部的change完成。
这个等待事件主要发生在下面两种情况:
1. Anothersession is reading the block into the buffer - this specific case has beensplit out into a "read by other session" wait event in 10g and higher。
2. Anothersession holds the buffer in an incompatible mode to our request。
下面是产生等待事件的例子
1)Hot block issue, such as thefirst block on the free list of a table, with high concurrent inserts. Allusers will insert into that block at the same time, until it fills up, thenusers start inserting into the next free block on the list, and so on。
2)Multiple users running aninefficient SQL statement performing a full table scan on the same large tableat the same time. One user will read the block off disk, and the other userswill wait on buffer busy waits (or read by other session in 10g and higher) forthe physical I/O to complete
SELECTs.sql_hash_value, sw.p1 file#, sw.p2 block#, sw.p3 reason
FROMgv$session_wait sw, gv$session s
WHEREsw.event = 'buffer busy waits'
ANDsw.sid = s.sid;
SELECTowner , segment_name , segment_type
FROMdba_extents
WHEREfile_id = &FileNo
AND&BlockNo BETWEEN block_id AND (block_id + blocks-1);
SELECTowner, segment_name, file_id, block_id starting_block_id, block_id + blocksending_block_id, blocks
FROMdba_extents
WHEREfile_id = &file_num AND ( block_id <= &block_id AND (&block_id< (block_id + blocks)))
colowner format a10
colobject_name format a20
coltsname format a10
colvalue format 99999
SELECT*
FROM (
SELECT owner, object_name, subobject_name, object_type, tablespace_name TSNAME,value
FROM gv$segment_statistics
WHERE statistic_name='buffer busy waits'
ORDER BY value DESC)
WHEREROWNUM < 11;
How to Identify Which Segment is Associated with Buffer BusyWaits (Doc ID 413931.1)
Bydefault INITRANS value for table is 1 and for index is 2. When too manyconcurrent DML transactions are competing for the same data block weobserve this wait event - " enq: TX - allocate ITL entry"
Toreduce enq: TX - allocate ITL entry" wait events, We need to follow thebelow steps.
A)
1)Depending on the amount of transactions in the table we need to alter the valueof INITRANS.
altertable
等级 |
说明 |
1 |
Include PGA |
2 |
Include SGA |
4 |
Include UGA |
8 |
Include indirect memory dumps |
ALTER SESSION SET EVENTS 'immediate trace name library_cache level level';
ORADEBUGDUMP LIBRARY_CACHE level
等级包括如下
等级 |
说明 |
1 |
Dump library cache statistics |
2 |
Include hash table histogram |
3 |
Include dump of object handles |
4 |
Include dump of object structures (heap 0) |
例如
ALTER SESSION SET EVENTS'immediate trace name library_cache level 1';
ALTER SESSION SET EVENTS 'immediate trace name row_cache level level';
ORADEBUG DUMP ROW_CACHE level
等级对照表
Level |
Description |
1 |
Dump row cache statistics |
2 |
Include hash table histogram |
8 |
Include dump of object structures |
例如
ALTER SESSION SET EVENTS 'immediate trace name row_cache level1';
To dump buffer headers and buffer contents for buffers currently in thecache
ALTER SESSION SET EVENTS 'immediate trace name buffers level level ';
where level is one of the following
Level |
Description |
1 |
Buffer headers only |
2 |
Level 1 + block headers |
3 |
Level 2 + block contents |
4 |
Buffer headers only + hash chain |
5 |
Level 1 + block headers + hash chain |
6 |
Level 2 + block contents + hash chain |
8 |
Buffer headers only + hash chain + users/waiters |
9 |
Level 1 + block headers + hash chain + users/waiters |
10 |
Level 2 + block contents + hash chain + users/waiters |
To dump the top-level heap in a memory area
ALTER SESSION SET EVENTS 'immediate trace name heapdump level level ';
level 包括
Level |
Description |
1 |
PGA summary |
2 |
SGA summary |
4 |
UGA summary |
8 |
Callheap (Current) |
16 |
Callheap (User) |
32 |
Large pool |
64 |
Streams pool |
128 |
Java pool |
1025 |
PGA with contents |
2050 |
SGA with contents |
4100 |
UGA with contents |
8200 |
Callheap with contents (Current) |
16400 |
Callheap with contents (User) |
32800 |
Large pool with contents |
65600 |
Streams pool with contents |
131200 |
Java pool with contents |
To dump theprocess state use:
ALTER SESSION SET EVENTS 'immediatetrace name processstate level level ';
ORADEBUG DUMP PROCESSSTATE level;
For example
ALTER SESSION SET EVENTS 'immediate trace name processstate level10';
system state dump 包含每个进程的进程状态。
State objects 被保存在SGA中。
Oracle推荐用system dump去诊断下面的问题:
To dump the system state use
ALTER SESSION SET EVENTS 'immediate trace name systemstate level level';
例如
ALTER SESSION SET EVENTS 'immediatetrace name systemstate level 10';
or
ORADEBUG DUMP SYSTEMSTATE level
A system state dump can be triggered by an error, for example thefollowing init.ora parameter
System state dump可以被一个ORA错误触发,比如在pfile里如下的ORA-00060错误参数,在数据库出现ORA-00060错误时即会触发system dump。
event = "60 trace namesystemstate level 10"
Error stack描述了进程的当前状态信息,包括了当前SQL语句的信息和当前进程的进程状态信息。
Oracle推荐用error stack诊断下面的问题:
Error stack dumps can also be triggered by an error .
To dump an error stack use
ALTER SESSION SET EVENTS'immediate trace name errorstack level level';
ORADEBUG DUMP ERRORSTACK level;
Level包含如下
Level |
Description |
0 |
Error stack only |
1 |
Error stack and function call stack |
2 |
As level 1 plus the process state |
3 |
As level 2 plus the context area |
To dump the current enqueue states use
ALTER SESSION SET EVENTS'immediate trace name enqueues level level';
ORADEBUG DUMP ENQUEUES level;
Level |
Description |
1 |
Dump a summary of active resources and locks, the resource free list and the hash table |
2 |
Include a dump of resource structures |
3 |
Include a dump of lock structures |
To dump the current latch status use
ALTER SESSION SET EVENTS'immediate trace name latches level level';
ORADEBUG DUMP LATCHES level
Level |
Description |
1 |
Latches |
2 |
Include statistics |
To dump a block
ALTER SYSTEM DUMP DATAFILE absolute_file_numberBLOCK block_number;
To dump a range of blocks
ALTER SYSTEM DUMP DATAFILE absolute_file_number
BLOCK MIN minimum_block_number
BLOCK MAX maximum_block_number;
The DATAFILE clause can specify an absolute file number of a datafilename. If the DATAFILE clause specifies a datafile name, the blocks can also bedumped from a closed database e.g.
ALTER SYSTEM DUMP DATAFILE 'file_name'BLOCK block_number;
Normally a symbolic block dump is output. However, this may not bepossible if the block has become corrupt. It is possible to output the blockdump in hexadecimal.
To dump a block in hexadecimal, enable event 10289
ALTER SESSION SET EVENTS '10289trace name context forever, level 1';
dump the block(s) using one of the above commands and then disable 10289again using
ALTER SESSION SET EVENTS'10289 trace name context off';
An index tree can be dumped using
ALTER SESSION SET EVENTS'immediate trace name treedump level object_id';
ORADEBUG DUMP TREEDUMP object_id;
where object_id is the object number of the index (in DBA_OBJECTS)
The tree dump includes
To dump an undo segment header use the command function
ALTER SYSTEM DUMP UNDO_HEADER'segment_name';
A list of undo segment IDs and names can be obtained using
SELECT segment_id,segment_name
FROM dba_rollback_segs
ORDER BY segment_id;
Dump undo块跟dump普通的数据文件块一样。
To dump all the undo written for a specific transaction, first identifythe transaction ID using
SELECT xidusn, xidslot, xidsqn FROM v$transaction;
Dump the undo using the transaction ID
ALTER SYSTEM DUMP UNDO BLOCK 'segment_name' XID xidusn xidslot xidsqn;
To dump a redo log file use
ALTERSYSTEM DUMP LOGFILE 'FileName';
-- 10g后使用
sqlplus-prelim / as sysdba
oradebugsetmypid
oradebugunlimit;
oradebugdump systemstate 10
参考How to Capture a SystemstateDump Using GDB Utility on Linux [ID 374569.1]
当Oracle系统hang住 ,无法使用一切方法登录时 (包括 sqlplus -prelim / as sysdba),我们可以使用gdb调试工具来对 Oracle做系统 dump ,通过 系统 dump信息 判断 具体hang的原因 。
ps -ef| grep orcl (LOCAL=NO)
oracle3846 1 0 10:00 ? 00:00:00 orcl (LOCAL=NO)
oracle5116 1 0 10:15 ? 00:00:00 orcl (LOCAL=NO)
oracle9121 1 0 11:00 ? 00:00:00 orcl
Example:process id 3846
$ gdb$ORACLE_HOME/bin/oracle 3846
(outputof this command will return to the screen)
.....
Readingsymbols from /lib....done
LoadingSymbols for /lib....
.....
(gdb)print ksudss(10)
参考Taking Systemstate Dumps whenYou cannot Connect to Oracle [ID 121779.1]
(saki)% ps -ef |grep sqlplus
osupport 78526 154096 0 12:11:05 pts/1 0:00 sqlplus scott/tiger
osupport 94130 84332 1 12:11:20 pts/3 0:00 grep sqlplus
(saki)% ps -ef |grep 78526
osupport 28348 78526 0 12:11:05 - 0:00 oracles734 (DESCRIPTION=(LOCAL
osupport 78526 154096 0 12:11:05 pts/1 0:00 sqlplus scott/tiger
osupport 94132 84332 1 12:11:38 pts/3 0:00 grep 78526
(saki)% dbx -a 28348
Waitingto attach to process 28348 ...
Successfullyattached to oracle.
warning:Directory containing oracle could not be determined.
Apply'use' command to initialize source path.
Type'help' for help.
readingsymbolic information ...
stoppedin read at 0xd016fdf0
0xd016fdf0(read+0x114) 80410014 lwz r2,0x14(r1)
(dbx)print ksudss(10)
(dbx)detach -- 退出
在user_dump_dest目录中会生成相关进程ID的trc文件
(saki)% ls -lrt *28348*
-rw-r----- 1 osupport dba 46922 Oct 10 12:12 ora_28348.trc
参考UsingHP-UX Debugger GDB To Produce System State Dump [ID 273324.1]
pstack
pstack
说明:数据库或主机停机,需要提前关闭OGG。数据库或主机启动后,需要手动开启OGG。
1、 启动OGG
Oracle用户下
$cd /ogghome
$./ggsci
GGSCI>startmgr
GGSCI>start*
2、 关闭OGG
Oracle用户下
$cd /ogghome
$./ggsci
GGSCI>stop*
GGSCI>stopmgr!
如果不能正常停则:
GGSCI>sendsextr01 forcestop
GGSCI>sendspump01 forcestop
GGSCI>stopmgr!
1、 检查进程状态
Oracle用户下
$cd /ogghome
$./ggsci
GGSCI> info all
Program Status Group Lag Time Since Chkpt
MANAGER RUNNING
EXTRACT RUNNING SEXTR01 00:00:00 00:00:05
EXTRACT RUNNING SPUMP01 00:00:00 00:00:00
REPLICAT RUNNING TREPLVA 00:00:00 00:00:00
说明:其中manager为OGG的管理进程,EXTRAC为OGG的抽取进程或者投递进程,REPLICAT为OGG的投递进程,确保所有进程都是RUNNING状态。具体的进程状态可使用以下命令查看:
GGSCI> info SEXTR01
GGSCI> info SPUMP01
…..
2、 检查进程的统计信息
GGSCI>stats <进程名称>,<时间频度>,table GGSCI>stats <进程名称>,total列出自进程启动以来处理的所有记录数。 GGSCI>stats <进程名称>,daily, table 其中<进程名称>分别是SEXTR01和SPUMP01, GGSCI> view param SEXTR01 --可以得到抽取的表名 GGSCI> stats sextr01, total --列出抽取进程启动以来处理的所有记录 GGSCI> stats sextr01, daily --列出抽取进程启动当天处理的所有记录 GGSCI> stats sextr01, daily, table --列出抽取进程启动当天某张表处理的所有记录 GGSCI> stats SPUMP01, total --列出投递进程启动以来处理的所有记录 GGSCI> stats SPUMP01, daily --列出投递进程启动当天处理的所有记录 GGSCI> stats SPUMP01, daily, table --列出投递进程启动当天某张表处理的所有记录 3、 查看进程的参数和报告 GGSCI> view param sextr01 --查看抽取进程参数文件配置 GGSCI> edit param sextr01 --编辑抽取进程参数文件 GGSCI> view paramSPUMP01 --查看投递进程参数文件配置 GGSCI> edit param SPUMP01 --编辑投递进程参数文件 GGSCI> view report sextr01 --查看抽取进程的报告 GGSCI> view report SPUMP01 --查看投递进程的报告 当进程出现故障时,可通过查看进程的报告来确定故障原因。 序号 复制说明 OGG部署地址 管理端口 动态端口 进程说明 1 基建管控系统复制 7950 7950-7990 SEXTR01(抽取进程)、SPUMP01(投递进程) 2 营销稽查监控复制 7950 7950-7990 3 财务管控系统复制 7950 7950-7990 序号 复制说明 OGG部署地址 管理端口 动态端口 进程说明 1 基建管控系统复制 7950 7950-7990 TREPLVA(复制进程) 2 营销稽查监控复制 7950 7950-7990 TREPLVB(复制进程) 3 财务管控系统复制 7950 7950-7990 TREPLVC(复制进程) 4 省(市)上传 7950 7950-7990 EXTBAM(抽取进程)、PUMPBAM(投递进程) 5 总部下发 7950 7950-7990 REPBAMXX(复制进程) 如果是某些表由于各种原因造成两边数据不一致,需要重新进行同步,但实际业务始终24小时可用,不能提供时间窗口,则可以参照以下步骤(因较为复杂,使用需谨慎,此处以scott.emp表为例说明。): 1) 确认ext/dpe/rep进程均无较大延迟,否则等待追平再执行操作; 2) 停止目标端的rep进程; 注意:步骤3-5为将源端数据通过expdp/impdp导入到目标端,客户也可以选择其它初始化方式,比如PL/SQL导入语插入。 3) 在源端获得当前的scn号。例如: select dbms_flashback.get_system_change_numberfrom dual; 以下以获得的scn号为1176681为例 4) 在源端使用exp导出所需重新初始化的表或者几张表数据,并且指定到刚才记下的scn号。例如: expdp ogguser/ogguser_123directory=ogg dumpfile=nanhai.dmp tables=scott.emp grants=n statistics=nonetriggers=n compress=n FLASHBACK_SCN=1176681 --以scott.emp为例 5) 通过ftp传输到目标端; 6) 在目标库删除该表 droptable scott.emp; 7) 在目标端,使用imp导入数据; nohupimpdp ogguser/ogguser_123 directory=ogg dumpfile=nanhai.dmp remap_schema=scott:ods_yyjc_bufrepmap_tablespace=源端表空间名字:odsview 注:需要转换相应用户与表空间名字,可以向DBA求助。 8) 对该表添加updatetime字段 Altertable scott.emp add updatetime default sysdate; 9) 编辑目标端对应的rep参数文件,在其map里面加入一个过滤条件,只对这些重新初始化的表应用指定scn号之后的记录(一定要注意不要修改本次初始化之外的其它表,会造成数据丢失!): map scott.emp,target ods_yyjc_buf.emp, filter ( @GETENV ("TRANSACTION","CSN") > 1176681 ) ; 10) 确认参数无误后,重新启动目标端的rep进程; 使用info repxx或者lag repxx直到该进程追上,停止该进程去掉filter重启后即可进入正常复制。 删除表只需要编辑抽取进程、投递进程、复制进程参数文件,将要删除的表从参数文件都删除即可。 1. 增加表前先查询源数据库,确认表是存在的。 2. 在源端为新增表添加附加日志(以SG_BAM.TEST为例): cd/ogghome ./ggsci GGSCI >dblogin userid ogguser, password ogguser_123 GGSCI> add trandata SG_BAM.TEST --新增表 GGSCI> info trandata SG_BAM.TEST --查看是否已添加成功,成功为enable 3. 停止复制进程。 4. 修改抽取进程参数(以extbam为例),添加新表: GGSCI >edit param extbam --相当于vi /ogghome/dirprm/extbam.prm EXTRACTextbam SETENV(NLS_LANG=AMERICAN_AMERICA.ZHS16GBK) USERIDogguser@yyjc,PASSWORD AACAAAAAAAAAAAHANGWARBMCFHHDCFMF,ENCRYPTKEY default DYNAMICRESOLUTION tranlogoptionsaltarchivelogdest primary instance yyjc1 /arch1, altarchivelogdest instanceyyjc2/arch2 reportat 1:59 reportrolloverat 2:00 Gettruncates warnlongtrans3h, checkinterval 10m TRANLOGOPTIONSRAWDEVICEOFFSET 0 EXTTRAIL./dirdat/SG_BAM/BA TABLESG_BAM.BAM_PROCESSINST; TABLESG_BAM.BAM_ACTIVITYINST; TABLESG_BAM.BAM_PROCINST_VIOLATION_DETAIL; TABLESG_BAM.BAM_ACTINST_VIOLATION_DETAIL; TABLESG_BAM.WZ_CGSQ_H; TABLESG_BAM.WZ_HT_H; TABLESG_BAM.WZ_CGDD_H; TABLESG_BAM.WZ_DHYS_H; TABLESG_BAM.WZ_FKSQ_H; TABLESG_BAM.WZ_ZLWT_H; TABLESG_BAM.WZ_Q_CCPSTM; TABLESG_BAM.WZ_Q_HTGLTM; TABLESG_BAM.WZ_Q_LCJC; TABLESG_BAM.WZ_Q_XQCGTM; TABLESG_BAM.DWM_TRANSMISSIONPRO_H; TABLE SG_BAM.DWM_SINGLEPRO; TABLESG_BAM.DWM_CONTRACTINFO; TABLESG_BAM.DWM_TRANSMISSIONPRO_PLAN; TABLESG_BAM.DWM_TRANSMISSIONPRO_SAFE; TABLESG_BAM.TEST; --比如添加此行,注意要加分号 5. 重启抽取、投递进程。 GGSCI >stop extbam GGSCI> start extbam GGSCI> info all 确认抽取和投递进程都处于running状态。 6. 在源端获得当前的scn号。进行表的初始化(如果新增表无数据,则跳过此步)。 selectdbms_flashback.get_system_change_number from dual; 以下以获得的scn号为1176681为例 在源端使用expdp导出所需重新初始化的表或者几张表数据,并且指定到刚才记下的scn号,例如: expdp ogguser/ogguser_123directory=ogg dumpfile=nanhai.dmp tables=scott.test grants=n statistics=nonetriggers=n compress=n FLASHBACK_SCN=1176681 --以scott.test为例 在目标端,使用impdp导入数据。 7. 修改目标端的复制进程 编辑目标端对应的rep参数文件,在其map里面加入一个过滤条件,只对这些重新初始化的表应用指定scn号之后的记录(一定要注意不要修改本次初始化之外的其它表,会造成数据丢失!): map scott.emp, target ods_yyjc_buf.test, filter ( @GETENV ("TRANSACTION", "CSN") > 1176681 ) ; 如果本次新增表没有初始化数据,则无需上面加亮总部。 总部启动复制进程,使用info repxx或者lag repxx直到该进程追上,停止该进程去掉filter重启后即可进入正常复制。 如果源系统中涉及运监流程数据接入的表的结构进行变更,需将变更语句及变更信息及时通知运监人员,运监OGG运维人员需及时进行调整。 在表结构变更之前确保源端抽取进程停止,表结构变更完成之后,表级别附加日志需要重新添加。重新添加表级别附加日志操作如下,以test.t表为例: $cd/ogghome $./ggsci GGSCI>dblogin userid ogguser, password ogguser GGSCI>delete trandata test.t GGSCI>add trandata test.t 如果采用异构转换方式进程数据同步的系统,还需在源端重新生成异构文件,并传送至目标端,覆盖原先的异构文件,启动复制进程。 待目标端相应设置都完成之后源端可以启动抽取进程。 注意:尽量避免主键列的变动,如果有主键列的变动请将相关变动与ogg运维人员商量实施。 1. 主/备机节点停止HACMP服务 #smitty clstop 2. 主/备机修改en0 IP #smitty chinet 3. 主/备机编辑/etc/hosts 4. 修改HACMP配置 #smitty hacmp->Extended Configuration-> Extended Topology Configuration->ConfigureHACMP Communication Interfaces/Devices->Change/Show CommunicationInterfaces/Devices 依次选择主/备机节点的en0/en0,不做修改,直接回车即可,自动会更新boot1IP地址 5. 显示网络接口配置 #/usr/es/sbin/cluster/utilities/cllsif 1. 主/备机节点停止HACMP服务 #smitty clstop 2. 主/备机编辑/etc/hosts 3. 修改HACMP配置 #smitty hacmp->Initialization and Standard Configuration-> ConfigureResources to Make Highly Available-> Configure Service IPLabels/Addresses->Change/Show a Service IP Label/Address * IPLabel/Address nodesvc * NewIP Label/Address [nodesvc] 这里再写一遍 *Network Name [net_ether_01] 4. 显示网络接口配置 #/usr/es/sbin/cluster/utilities/cllsif 5. 验证后启动HACMP #smitty hacmp-> Extended Configuration-> Verification andSynchronization odmgetHACMPcluster odmdelete-o HACMPcluster 1. 内存参数配置: 内存被分为两类,一种为工作区,用于存放进程数据、堆栈、核心Kernal数据以及共享内存,工作区的数据如果需要换页,只会交换到paging space。 另一类为持久存储区,主要是文件数据在内存中的缓冲,当持久存储区的数据需要换页,则会交换到其所归属的文件。持久存储区的页又被分成Client pages和Non-clientpages,其中,None-client page只能缓冲Journaled File System (JFS)文件系统的文件数据,而Clientpage缓冲所有其它类型文件系统的数据,如:NFS。 vmo -p-o lru_file_repage=0 #设置该参数为0,使得内存中文件页面数大于minperm后,只偷取文件页面 #默认为1 vmo -p-o minperm%=5 #当文件页面占用的实际内存的百分比低于minperm%时, # 页面替换算法将同时替换文件页面及计算页面 vmo -p-o maxperm%=90 #当文件页面占用的实际内存的百分比高于maxperm%时, # 页面替换算法仅替换文件页面 #在AIX5.3 ML1前尚未采用lru_file_repage参数, # 因此需要将maxperm设置得很低,避免计算页面被偷取 vmo -p-o maxclient%=90 #jfs2与nfs最大文件系统cache使用 一般当使用裸设备时,设置minperm%=5,lru_file_repage=0,maxperm%可以使用默认值。 2. 查看内存参数配置 # vmo-L minperm% -L maxperm% -L maxclient% -L lru_file_repage NAME CUR DEF BOOT MIN MAX UNIT TYPE DEPENDENCIES -------------------------------------------------------------------------------- lru_file_repage 0 1 0 0 1 boolean D -------------------------------------------------------------------------------- maxclient% 90 80 90 1 100 % memory D maxperm% minperm% -------------------------------------------------------------------------------- maxperm% 90 80 90 1 100 % memory D minperm% maxclient% -------------------------------------------------------------------------------- minperm% 5 20 5 1 100 % memory D maxperm% maxclient% -------------------------------------------------------------------------------- 3. 网络参数配置: /usr/sbin/no-r -o ipqmaxlen=512 /usr/sbin/no-p -o sb_max=1310720 /usr/sbin/no-p -o udp_sendspace=65536 /usr/sbin/no-p -o udp_recvspace=655360 /usr/sbin/no-p -o tcp_sendspace=65536 /usr/sbin/no-p -o tcp_recvspace=65536 /usr/sbin/no-p -o rfc1323=1 4. 查看配置的参数: # cat/etc/tunables/nextboot ...... vmo: maxclient% = "90" maxperm% = "90" minperm% = "5" lru_file_repage = "0" no: rfc1323 = "1" tcp_recvspace = "65536" tcp_sendspace = "65536" udp_recvspace = "655360" udp_sendspace = "65536" sb_max = "1310720" ipqmaxlen = "512" 5. 恢复默认配置 注意:会恢复所有参数 # vmo-p -D 1. 创建文件系统 # crfs-v jfs2 -g rootvg -a size=20480M -m /oracle -A yes 2. 增加swap大小 # chps-s 30 hd6 增加的是pp个数,如pp size为256m,增加30个就是增加7268M 3. 增加/tmp大小,前提要所有卷组有空闲空间 # chfs-a size=+'1792M' /tmp 4. 未建vg情况下,查看pv大小 #bootinfo -s updisk0 307200 5. pv管理 chdev-l hdisk2 -a pv=clear chdev-l hdisk2 -a pv=yes 6. 创建卷组(Scalablevg) /usr/sbin/mkvg-S -y'oradatavg' -f -P128 -s256 hdisk2 -P:vg中最大可以有的pp数:128*1024=131072个 -s:在每个物理分区中设置兆字节数 -t:缺省每个pv可以有1016 个物理分区,如果1016 * pp.size < pv盘大小, 那就需要-t来指定pv中,每物理卷的物理分区数的限制, 这会影响加入到vg中的pv数,MAXPVS/factor 注意:Scalable VG的 max pps per pv不是1016,比这个大很多。 MAXPVS 普通VG为32,大VG为128,Scable VG为1024 #/usr/sbin/mkvg -S -y'bakvg' -f -P128 -s256 updisk0 # crfs-v jfs2 -g bakvg -a size=20480M -m /backup -A yes # chown-R oracle:oinstall /backup 7. 向卷组中添加PV extendvgoradatavg hdisk8 extendvg-f oradatavg hdisk8 fsck /dev/fslv00 -- 先检测,发现问题,再使用-p fsck –p/dev/fslv00 # lsattr-El aio0 # mkdev-l aio0 aio0Available #chdev-P -l aio0 -a autoconfig='available' aio0changed 通过如下命令可以查询系统当前aioserver的个数,如果已经达到或者接近maxservers,则需要考虑增加该参数。 #pstat-a | grep aios | wc -l # truss-faeio /tmp/truss.out -p 962704 # cat/tmp/truss.out | egrep -v "times|_poll" # tail-f /tmp/truss.out | egrep -v "times|_poll" 1. 获取系统中的HBA卡; # lsdev-C | grep fcs fcs0 Available 00-08 FC Adapter fcs1 Available 05-08 FC Adapter 2. 获取HBA属性 # lscfg-vl fcs0 以下就是WWN号 NetworkAddress.............10000000C97E5FFA #cfgmgr -v errpt-d H -T PERM -- 硬件错误 errpt-aj XXXXXX | more -- 查看某个错误的详细信息 /usr/sbin/instfix-a -ivk IV09580 /bin/oslevel–s /usr/sbin/instfix-i | grep ML #vmstat 2 10 id CPU 空闲时间或无I/O等待时间的百分比; wa CPUI/O 等待时间的百分比; r 运行队列中的线程数;运行队列中等待的线程数越多,系统性能受到的影响越大。 如果 id 和wa 的值持续为接近0的值,sample明CPU此时处于繁忙状态。 # tprof-x sleep 30 此命令运行30秒钟,生成sleep.prof文件 # netpmon-o /tmp/netpmon.out -O cpu -v; sleep 30; trcstop 此命令运行30 秒钟,并在/tmp目录下生成文件 netpmon.out。 CPUTime 为进程使用CPU的时间总值 CPU%对应其百分比 NetworkCPU% 为进程中与网络有关的代码所占用的CPU百分比。 # filemon-u -O all -o /tmp/fmon.out; sleep 30; trcstop filemon命令用于查看哪些文件/逻辑卷/硬盘处于繁忙状态。 1. 最为繁忙的虚拟内存段(segment)、逻辑卷和物理卷; 2. 对页空间(paging space)的读写次数,确认硬盘I/O 是由于应用程序操作还是频繁的换页操作; 3. 最为活跃的文件或逻辑卷,如果它们存放在繁忙的物理卷上,可以考虑将数据移至相对空闲 的硬盘上,这样有助于提高系统性能。最为繁忙的段的报sample里列示出最繁忙的文件所对应的文件系统和i节点。文件系统的安装点(mount point)及文件的i节点(inode)可与命令ncheck一起使用,来找出相对应的文件。这个报sample可用来判断该I/O操作是针对文件系统、JFS Log 还是系统页空间的。 通过检查字段"reads"和"read sequences"的值,可以判断该操作是顺序读取还是随机读取。当"read sequences"接近"reads"时,对该文件的访问则以随机读取方式居多。 # svmon-Pu 10 | more 这条命令显示出使用内存最多的前十个进程,并针对每个进程给出了一个详细列 在每个进程的列sample中,找出Type为"work"及Description为"private"的相应行,查看Pgspace显示的使用页面数(每个页面大小为4096字节),该数值为此段(segment)在虚拟内存空间中占用的工作页面(working pages)数,即占用的页空间中的页面数。如果Pgspace值持续增长,系统则有可能存在内存泄漏(memoryleak)。当应用程序不能正常释放内存时会出现内存泄漏的问题。 1. 创建逻辑卷 以mb为单位,此处为100G # lvcreate-L 102400 -n oradatalv1 /dev/vg04 条带化处理 # i: 跨多少个PV # I: 条带大小,单位KB DB一般为64KB或128KB,随机读的更小 # lvcreate–i 2 –I 64 –n lvdata –L 1000 /dev/vgdata 2. 创建文件系统 # newfs-F vxfs -o largefiles /dev/vg04/roradatalv1 3. mount文件系统 # mount/dev/vg04/oradatalv1 /vg04/webuser 4. 在/etc/fstab中添加条目 /dev/vg04/oradatalv1/vg04/webuser vxfs rw,suid,largefiles,delaylog,datainlog 0 2 5. 删除逻辑卷 # lvremove/dev/vg04/oradatalv1 6. 文件系统扩容 umount/db lvextend–L 8000 /dev/vg03/dblv # 护展到8000M extendfs/dev/vg03/dblv mount/db 1. 创建VG mkdir /dev/vg01 mknod/dev/vg01/group c 64 0x010000 vgcreate-s 16 vg01 /dev/dsk/cxtxdx 2. 查看卷组、卷组所包括的逻辑卷以及该卷组所包括的物理磁盘 vgdisplay-v vg00 3. 查看硬盘大小信息 diskinfo/dev/rdsk/c2t5d0 4. 导出卷组 vgchange-a n vg01 vgexport-m vg01.map /dev/vg01 -- 本机移动 #vgchange -a n /dev/vg01 # vgexport/dev/vg01 移动硬盘位置 #vgscan -v # mkdir/dev/vg01 # mknod/dev/vg01/group c 64 0x040000 #vgimport /dev/vg01 /dev/dsk/c1t0d0 #vgchange –a y /dev/vg01 -- 跨机移动 #vgchange -a n /dev/vg_planning -- 先备份 # vgexport-p -s -m /tmp/vg_exp_bak/vg_plan.map /dev/vg_planning # vgexport/dev/vg_planning #ioscan -fun -C disk # mkdir/dev/vg_planning # mknod/dev/vg_planning/group c 64 0x040000 #vgimport -v -m plan_map /dev/vg_planning #vgchange -a y /dev/vg_planning -m参数作用:map文件到底有什么用处?什么情况下需要它?如果你export的时候没有生成map文件,当你import的时候lv的名字就变成了default的 lvol1、lvol2....原来的名字将丢失。在配置MC的时候,我们在一台主机上配置好vg,lv,fs之后,export到其他的节点上即可。 -s参数作用:export导出的map文件信息是vgcfgbackup导出信息的一部分,vgcfgbackup所导出的包含了更多的卷组信息的二进制文件。 /var/adm/syslog/syslog.log 1. 查看集群和包状态 jsalarm1[/]#cmviewcl-v #cmruncl 启动集群cluster #cmhaltcl 关闭集群cluster #cmrunnode #cmhaltnode-f #cmrunpkg #cmhaltpkg #cmviewcl-v 查看cluster和cluster里节点的信息 #cmmodpkg-e #cmmonpkg-e -n 人为切换。有两种办法来实现 方法1: 使用cmhaltnode 命令来停止运行有程序包的节点。 cmhaltnode-f nodename 这样,在停止该节点的同时,应用程序包会自动迁移到另外一个节点上。 方法2: 使用cmhaltpkg 来启停应用程序包 1) 用 cmhaltpkg 命令来停止需要迁移的程序包 cmhaltpkgpkgname 2) 用cmrunpkg 来重新在别的节点上运行程序包 cmrunpkg-n nodename pkgname 3) 用 cmmodpkg 修改切换属性 cmmodpkg-e pkgname 1. 查看特定补丁包 /usr/sbin/swlist-l patch | grep 33498 /usr/sbin/swlist-l bundle | grep "GOLD" 2. 查看替代补丁包 /usr/sbin/swlist-l patch -a supersedes | grep -in 31097 1. 查看硬件整体情况 /opt/ignite/bin/print_manifest 2. 查看内存 /usr/sbin/swapinfo-atm Mb Mb Mb PCT START/ Mb TYPE AVAIL USED FREE USED LIMIT RESERVE PRI NAME dev 4096 832 3264 20% 0 - 1 /dev/vg00/lvol2 dev 36864 1071 35793 3% 0 - 1 /dev/vg00/lvswap reserve - 36115 -36115 memory 75619 30980 44639 41% total 116579 68998 47581 59% - 0 - 1. 查看内核参数 11.11命令: hajf2#[/]kmtune |egrep "shmmni|shmseg|msgmnb|msgmni|msgtql" msgmnb 4194304 Y 4194304 msgmni 32020 - 32020 msgtql 20000 - 20000 shmmni 8192 - 8192 shmseg 3072 Y 3072 11.23,11.31命令: $/usr/sbin/kctune /usr/sbin/sysdef $/usr/sbin/kctune -v nfile Tunable nfile Description Maximum number of file descriptors(system-wide) Module fs_filedscrp CurrentValue 0 [Default] Valueat Next Boot 0 [Default] Valueat Last Boot 0 DefaultValue 0 Constraints nfile == 0 or nfile >= 2048 nfile <= 2147483647 CanChange Immediately or at NextBoot 文件系统缓存大小,在11.23版本中参数为: dbc_max_pct dbc_min_pct 在11.11版本中参数为: filecache_max filecache_min 2. 查看下次启动是否有内核参数变动 11.31命令: /usr/sbin/kconfig-D # sam KernelConfiguration -> Configurable Parameters -> 选中相应的参数,在Action菜单中 ModifyConfigurable Parameter... 全部修改完毕后,在Action菜单中 ProcessNew Kernel 系统将会重启 /usr/sbin/kctune-h –B nproc=7000 以下是调整参数为默认值 kctunenproc= kctune nproc=default shutdown-y -h 0 关闭系统,然后关闭电源 shutdown-y -r 关闭系统后重新启动 -- 识别新硬盘 # ioscan-fCdisk ipcs -a或ipc 显示当前系统中共享内存段、信号量集、消息队列的使用情况; ipcs -m显示共享内存段的使用情况; ipcs -s显示信号量集的使用情况; ipcs -q显示消息队列的使用情况; ipcrm-s semid 删除对应的信号量集 ipcrm-m shmid 删除对应的共享内存段 ipcrm-q msqid 删除对应的消息队列 # ./kmeminfo-arena|more -- 内存信息 # ./kmeminfo-u -- 进程内存使用信息 # pmap18174 -- 查看进程使用的库文件 1、查看tusc是否安装在默认目录 引用 # whichtusc 2、下载tusc http://hpux.connect.org.uk/hppd/hpux/Sysadmin/tusc-8.0/ 3、上传至服务器,并将其解压 引用 #gunzip tusc-8.0-ia64-11.23.depot.gz # lltusc-8.0-ia64-11.23.depot* -rw-r–r– 1 root sys 983040 Jun 13 15:55tusc-8.0-ia64-11.23.depot 4、对于depot文件有两种安装方法推荐用第一种 1)使用swinstall安装 引用 #swinstall-s rx2600:/tmp/tusc-8.0-ia64-11.23.depot 2)直接解压安装 引用 #tarxvf /tmp/tusc-8.0-ia64-11.23.depot 5、安装完成之后即可调用tusc进行进程调度跟踪 引用 #/tmp/tusc/tusc-RUN/usr/local/bin/tusc-aefo /tmp/tusc.out sqlplus “/as sysdba” (1) #/usr/sbin/setprivgrp dba MLOCK (2) #vi /etc/privgroup dbaMLOCK RTSCHED RTPRIO 查看是否启用异步IO # fuser/dev/async 1. pv管理 #pvcreate /dev/sda #pvdisplay # pvremove/dev/sdb1 2. vg管理 #vgcreate oravg /dev/sda #vgcreate -s 32m oravg /dev/sda --创建指定extent大小的的vg #vgdisplay #vgremove oravg # vgchange-a n vg01 LV的LV Status为“NOT available” #vgchange -a y vg01 LV的LV Status为“available” 3. vg扩容 # vgextendVolGroup00 /dev/sdb -- 将新pv添加到vg中 #vgreduce VolGroup00 /dev/sdb --从vg中去除掉pv 4. lv管理 #lvcreate -l 25599 -n lvoradata01 oravg --创建lv #lvdisplay #lvremove -f /dev/oravg/lvoradata01 --删除lv # lvchange-an /dev/mapper/VolGroup01-LogVol00 --lv置为不激活 # lvcreate-m 1 -l 2 -n lv_01_control02 vg01 --lv镜像 # lvconvert-m 1 vg01/lv_01_control01 --非镜像的LV转为镜像 条带化: -i:此处写lv用到的pv的数量,不能超过所在vg的pv数量,一般设置与vg的pv个数相同 -I:条带单元大小,单位Kb -L:lv的大小,默认为Mb,可带单位G,M,K -l:小写L,分配给lv的PE个数,对应VG中的PE,在上条vgdisplay的输出中可看到VG中一共有1533个PE。 -n:自定义lv的名字,默认从lvol0开始往下排。 # lvcreate-i 3 -I 64 -l 2 -n lv_01_control03 vg01 查看LV # lvmlvscan ACTIVE '/dev/VGExaDb/LVDbSys1' [30.00 GB]inherit ACTIVE '/dev/VGExaDb/LVDbSwap1' [24.00 GB]inherit ACTIVE '/dev/VGExaDb/LVDbOra1' [100.00 GB]inherit ACTIVE '/dev/VGExaDb/LVDbSys2' [30.00 GB]inherit 5. 文件系统管理 #mkfs.ext3 /dev/oravg/lvoradata01 --格式化文件系统 1. 跟踪1: [oracle@sybase~]$ ps -ef | grep dbwr oracle 3442 3373 0 13:30 pts/6 00:00:00 grep dbwr [oracle@sybase~]$ strace -p 3295 2>&1|grep -v time Process3295 attached - interrupt to quit 在另一个会话 SQL>alter diskgroup data mount; Diskgroupaltered. strace的跟踪结果: [oracle@sybase~]$ strace -p 3295 2>&1|grep -v time Process3295 attached - interrupt to quit semctl(360448,15, SETVAL, 0x1) = 0 statfs("/dev/raw/raw4",{f_type=0x1021994, f_bsize=4096, f_blocks=128360, f_bfree=128330,f_bavail=128330, f_files=128360, f_ffree=128079, f_fsid={0, 0}, f_namelen=255,f_frsize=4096}) = 0 open("/dev/raw/raw4",O_RDWR|O_NONBLOCK|O_SYNC) = 15 fcntl(15,F_SETFD, FD_CLOEXEC) = 0 io_submit(46912530558976,1, {{0x2aaaac6962b8, 0, 1, 0, 15}}) = 1 io_getevents(46912530558976,1, 1024, {{0x2aaaac6962b8, 0x2aaaac6962b8, 4096, 0}}, {1, 40000000}) = 1 io_submit(46912530558976,1, {{0x2aaaac6962b8, 0, 1, 0, 15}}) = 1 io_getevents(46912530558976,1, 1024, {{0x2aaaac6962b8, 0x2aaaac6962b8, 4096, 0}}, {1, 40000000}) = 1 open("/proc/3289/stat",O_RDONLY) = 20 read(20,"3289 (oracle) S 1 3289 3289 0 -1"..., 999) = 221 [oracle@sybasefd]$ ll 总计 0 lr-x------1 oracle oinstall 64 01-31 13:26 0 -> /dev/null lr-x------1 oracle oinstall 64 01-31 13:26 1 -> /dev/null lr-x------1 oracle oinstall 64 01-31 13:26 10 -> /dev/zero lr-x------1 oracle oinstall 64 01-31 13:26 11 -> /dev/zero lr-x------1 oracle oinstall 64 01-31 13:26 12 ->/oracle/app/oracle/product/10.2.0/db_1/rdbms/mesg/oraus.msb lrwx------1 oracle oinstall 64 01-31 13:26 13 ->/oracle/app/oracle/product/10.2.0/db_1/dbs/hc_+ASM.dat lrwx------1 oracle oinstall 64 01-31 13:26 14 ->/oracle/app/oracle/product/10.2.0/db_1/dbs/lk+ASM lrwx------1 oracle oinstall 64 01-31 13:26 15 -> /dev/raw/raw4 “2>&1”的意思 在shell中,文件描述符通常是:STDIN,STDOUT,STDERR,即:0,1,2, 在输出过程中产生的错误信息也放在了STDOUT,即:1中 2. 跟踪2: [oracle@owb~]$ strace -cp 5561 Process5561 attached - interrupt to quit Process5561 detached %time seconds usecs/call calls errors syscall ----------------- ----------- --------- --------- ---------------- 41.83 0.001021 204 5 read 20.81 0.000508 508 1 shmdt 14.83 0.000362 91 4 write 6.39 0.000156 4 41 getrusage 6.19 0.000151 14 11 close 4.71 0.000115 38 3 munmap 3.48 0.000085 4 20 times 1.27 0.000031 2 18 rt_sigaction 0.33 0.000008 8 1 1 open 0.16 0.000004 2 2 rt_sigprocmask ----------------- ----------- --------- --------- ---------------- 100.00 0.002441 106 1 total [root@sybase~]# /usr/sbin/sestatus -v SELinuxstatus: disabled [root@sybase~]# service iptables status 防火墙已停 [oracle@sybasebdump]$ pstack 3866 #1 0x0000000003d8b862 in sskgpwwait () #2 0x0000000003d6e2ba in skgpwwait () #3 0x00000000007acfc7 in kslges () -- 获取latch的过程 #4 0x00000000007ac930 in kslgetl () #5 0x00000000023d2e07 in kqrpre1 () #6 0x00000000023d2c8b in kqrpre () #7 0x00000000030a3386 in seqcre () #8 0x00000000030e62af in opiexe () #9 0x0000000003692c7e in opiosq0 () #100x00000000032270e4 in kpooprx () #110x00000000032252a7 in kpoal8 () #120x00000000013b2a60 in opiodr () #130x0000000003c3043a in ttcpip () #140x00000000013adf94 in opitsk () #150x00000000013b0f3c in opiino () #160x00000000013b2a60 in opiodr () #170x00000000013a4148 in opidrv () #180x0000000001f9a7de in sou2o () #190x000000000072d26b in opimai_real () #200x000000000072d1bc in main () [oracle@sybase ~]$ gdb $ORACLE_HOME/bin/oracle break bt --打印堆栈 clear infobreak -- 查看断点 frame 1 -- 切换到1号帧,也可以使用up或down命令 infolocals -- 在当前帧中打印局部变量 infoframe -- 获取有关帧的进一步信息,如它的地址和程序语言 /proc/sys/net/ipv4/ -- 网络参数目录 /proc/sys/vm --虚拟内存目录 vm.lower_zone_protection= 400 -- 保护低内存 --分配内存时优先考虑从highmemory分配 vm.min_free_kbytes= 204800 -- 保留的内存 200M --值越高,内核越早开始回收内存,空闲内存越高 vm.swappiness = 40 --设置内存不倾向用swap,默认是60 echo 1> /proc/sys/vm/drop_caches -- 可以暂时清空cache和buffer echo"0" > /proc/sys/vm/oom-kill --关闭oom-kill grep"CONFIG_MAGIC_SYSRQ" /boot/config-`uname -r` -- 验证系统是否支持sysrq kernel.sysrq= 1 --启动sysrq echo"b"> /proc/sysrq-trigger --直接重启系统 Alt +PrintScreen + [Command] -- 在终端上 安全重启的方法如下: R-E-I-S-U-B,注意不要一气呵成地按下这六个键 R - 把键盘设置为 ASCII 模式 R – 1 秒 E 会向除 init 以外所有进程发送可捕获的 SIGTERM 信号 E – 30 秒 I - 向除 init 以外所有进程发送 SIGKILL 信号 I – 10 秒 S - 磁盘缓冲区同步 S– 5 秒 U - 重新挂载为只读模式 U– 5 秒 B - 立即重启系统 T - 打印进程列表 W - 打印 CPU 信息 free–lm iostat-x -d -m 2 iostat-x -d -m 2 /dev/sda* ls -l--color=never # 取消颜色闪烁 #partprobe # kernel重新读取分区信息,避免重启系统 #netstat -ant|awk '$6 ~ /[A-Z]$/{a[$6]++}END{for(i in a)print i,a[i]}' LISTEN1 ESTABLISHED2 Foreign1 -- 检查网络流量 sar -nDEV 4 100 IFACE:LAN接口 rxpck/s:每秒钟接收的数据包 txpck/s:每秒钟发送的数据包 rxbyt/s:每秒钟接收的字节数 txbyt/s:每秒钟发送的字节数 raw –qa #sosreport 1、查看内存槽数、那个槽位插了内存,大小是多少 #dmidecode|grep -P -A5 "Memory\s+Device"|grep Size|grep -v Range 2、查看最大支持内存数 #dmidecode|grep -P 'Maximum\s+Capacity' 3、查看槽位上内存的速率,没插就是unknown。 # dmidecode|grep-A16 "Memory Device"|grep 'Speed' pkginfo-l --显示长格式 pkginfo -- 显示短格式 pkginfo-i SUNWarc SUNWbtoolSUNWlibC pkgchk-l -p /usr/bin/showrev --查看文件属于哪个包 pkgchk-l SUNWexplo|grep "^Pathname" --查看包中的所有文件 --Solaris 11 使用以下命令在系统信息库中为软件包编写目录并更新搜索索引 # pkgset-publisher -G '*' -g /media/SOL11REPO_FULL/repo solaris 使用在线repo # pkgset-publisher -Pe -O http://pkg.oracle.com/solaris/release solaris 检查repo # pkgpublisher PUBLISHER TYPE STATUS URI solaris origin online file:///media/SOL11REPO_FULL/repo/ -G'*' 删除solaris 发布者的所有现有源。 -g 将新建的本地系统信息库的 URI 添加为 solaris 发布者的新源 安装桌面 # pkginstall solaris-desktop showrev -p prstat-J 例:监控某个进程 prstat-Lp 21501 dispadmin-l -- 识别硬件 # devfsadm -- 查看硬盘 #iostat –En -- 查看内核 #isainfo sparcv9sparc -- 查看周边设备 prtconf --查看内核模式 #isainfo -b 64 -- 查看CPU cd/usr/pl*/sun4u/sbin/prtdiag -v |more -- 查看内存 #prtconf | grep 'Memory' Memorysize: 32768 Megabytes # shutdown-i5 -g0 -y 可以查看进程处理的记录数。
列出当天以来处理的有关gg.test表的所有记录数。
需要通过查看抽取进程参数文件来得到,如下:
1.3 进程说明
1.3.1业务系统侧部署说明
1.3.2数据中心侧部署说明
单表重新同步方案
新增或者删除表方案
1.1 删除表
1.2 新增表
源系统表结构变更说明
主机操作系统
AIX操作系统
HACMP管理
修改boot IP
修改Service IP
删除HACMP配置
AIX 5.3内核参数设置
AIX5.3文件系统
检测修复文件系统
AIO
truss跟踪
硬件查看
识别硬件
错误查看
查看补丁
资源检测
HP-UX操作系统
HP-UX 文件系统
卷组管理
操作系统日志
SC/MC管理
补丁管理
查看系统硬件
内核参数
调整内核参数
关机命令
识别硬件
共享内存段管理
查看内存情况
安装TUSC
异步IO设置
LINUX
1.1. Linux文件系统
1.2. Strace跟踪
1.3. 检查SeLinux状态
1.4. 查看堆栈
1.5. 内核参数:
1.6. sysrq 指令
1.7. 内存管理
1.8. 检查IO
1.9. 操作技巧
1.10. 检查网络状态
1.11. 祼设备管理
1.12. 搜集硬件报错
Solaris
1.13. 软件包管理
1.14. 补丁管理
1.15. 监控CPU
1.16. 系统调度类
1.17. 识别硬件
1.18. 关机断电
你可能感兴趣的:(Oracle,oracle)