After the restart, the database came up normally on both nodes, but the bad block in the data file was still there:
SQL> run
1* select name,status from v$datafile
NAME STATUS
-------------------- --------------------
/dev/rlv_system_8g SYSTEM
/dev/rlv_undot11_8g ONLINE
/dev/rlv_sysaux_8g ONLINE
/dev/rlv_user_8g ONLINE
/dev/rlv_undot12_8g ONLINE
/dev/rlv_raw37_16g RECOVER
On zhyw2, I ran media recovery to try to repair the damaged file:
SQL> recover datafile '/dev/rlv_raw37_16g';
ORA-00279: change 11318004822236 generated at 08/13/2010 16:42:39 needed for
thread 2
ORA-00289: suggestion : /arch2/bsp1922_2_229_713969898.arc
ORA-00280: change 11318004822236 for thread 2 is in sequence #229
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
auto
ORA-00279: change 11321028506146 generated at 08/17/2010 09:42:36 needed for
thread 2
ORA-00289: suggestion : /arch2/bsp1922_2_230_713969898.arc
ORA-00280: change 11321028506146 for thread 2 is in sequence #230
ORA-00278: log file '/arch2/bsp1922_2_229_713969898.arc' no longer needed for
this recovery
Log applied.
Media recovery complete.
The data file was successfully recovered. At that point it was offline, so I typed the following command to bring it back online:
alter database datafile '/dev/rlv_raw37_16g' online;
The customer also mentioned that the temporary tablespace kept reporting out-of-space errors and asked me to look into it while we had this downtime. I started by checking whether volume group oravg7 had any free logical volumes:
[root@zhyw2]#lsvg -l oravg7
oravg7:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
lv_raw93_16g raw 512 512 1 open/syncd N/A
lv_raw94_16g raw 512 512 1 open/syncd N/A
lv_raw95_16g raw 512 512 1 open/syncd N/A
lv_raw96_16g raw 512 512 1 open/syncd N/A
lv_raw97_16g raw 512 512 1 closed/syncd N/A
I decided to use the LV lv_raw97_16g as a new file for the temp tablespace. I checked the permissions on the corresponding devices on both nodes; they were fine:
[root@zhyw2]#cd /dev/
[root@zhyw2]#ls -l *raw97*
brw-rw---- 1 root system 106, 5 Mar 26 05:43 lv_raw97_16g
crw-rw---- 1 oracle dba 106, 5 Mar 26 05:43 rlv_raw97_16g
[root@zhyw2]#
[root@zhyw1]#ls -l *raw97*
brw-rw---- 1 root system 106, 5 Mar 16 14:49 lv_raw97_16g
crw-rw-r-- 1 oracle oinstall 106, 5 Mar 16 14:49 rlv_raw97_16g
Then I added the tempfile with an Oracle command:
alter tablespace temp add tempfile '/dev/rlv_raw97_16g' size 15872m;
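The 15872m figure deliberately stays below the raw LV's capacity. A quick sanity check, assuming 512 LPs of 32 MB each for a "16g" LV (the LP size is my assumption; it does not appear in the lsvg output):

```python
# Check that the tempfile fits inside the raw LV with headroom to spare.
# Assumed geometry: 512 LPs x 32 MB per LP for a "16g" LV.
lp_count, lp_size_mb = 512, 32
lv_mb = lp_count * lp_size_mb        # 16384 MB of total LV capacity
tempfile_mb = 15872                  # size passed to ADD TEMPFILE above
print(lv_mb - tempfile_mb)           # 512 MB of margin left on the device
assert tempfile_mb < lv_mb
```

The 512 MB margin absorbs the raw-device header and any rounding, so the tempfile can never outgrow the LV.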
With all the work done, I waited for Huang and his team to bring the application back up and confirm it was healthy, then stayed on site a while longer. Everything remained normal.
Finally, I thought, I could go home and get a good night's sleep.
A little after 4 PM, while I was dozing, Ding called again: "Cheng, sorry to bother you, but our production database is in trouble again. The business side says queries are very slow. Could you come back and take another look?" Well, nothing ever goes smoothly. What would be waiting for me this time? (to be continued)
Back on site, Huang from the application vendor was already waiting for me. He told me the database was behaving abnormally: there were no error messages, but any sort operation became very slow, and some application queries were failing frequently.
Huang strongly suspected the EMC array still had bad blocks, in particular under the LVs backing the temp tablespace. I checked the current I/O utilization on both nodes, which looked fairly normal, and then generated an AWR report for each node.
The key figures:
zhyw2:
Hit ratios:
Buffer Nowait %: 99.95 Redo NoWait %: 100.00
Buffer Hit %: 99.90 In-memory Sort %: 100.00
Library Hit %: 79.74 Soft Parse %: 65.09
Execute to Parse %: 60.45 Latch Hit %: 98.06
Parse CPU to Parse Elapsd %: 8.98 % Non-Parse CPU: 99.68
Wait events:
Top 5 Timed Events
Event Waits Time(s) Avg Wait(ms) % Total Call Time Wait Class
CPU time 1,335,611 68.5
enq: CF - contention 503,175 239,024 475 12.3 Other
enq: TS - contention 388,571 175,924 453 9.0 Other
library cache lock 74,915 31,273 417 1.6 Concurrency
gc cr multi block request 162,634,284 29,927 0 1.5 Cluster
zhyw1:
Buffer Nowait %: 100.00 Redo NoWait %: 100.00
Buffer Hit %: 99.87 In-memory Sort %: 100.00
Library Hit %: 80.86 Soft Parse %: 61.23
Execute to Parse %: 64.13 Latch Hit %: 98.09
Parse CPU to Parse Elapsd %: 12.10 % Non-Parse CPU: 99.56
Wait events:
Top 5 Timed Events
Event Waits Time(s) Avg Wait(ms) % Total Call Time Wait Class
CPU time 942,235 40.9
enq: CF - contention 847,042 404,685 478 17.6 Other
enq: DX - contention 71,048 208,067 2,929 9.0 Other
enq: TS - contention 140,442 63,747 454 2.8 Other
inactive transaction branch 35,518 34,682 976 1.5 Other
Two things stood out: the library cache hit ratio was low on both instances, and enqueue (lock) waits dominated the global wait events.
I checked SGA_MAX_SIZE and SGA_TARGET on both nodes; both were currently 16G.
Given that each RAC node has 72G of memory, typically more than 40% of it idle, there was room to give the SGA a larger share.
My suspicion was that recent business growth had pushed some of the system's original resources into a bottleneck, hence the slow access.
But since the EMC DMX1500 had already failed once, storage could also be affecting system I/O, so I first asked the EMC engineer to check the disks allocated to this RAC system for bad blocks.
The following command lists the disk information:
# ./symdev list pd
Symmetrix ID: (redacted)
Device Name Directors Device
--------------------------- ------------- -------------------------------------
Cap
Sym Physical SA :P DA :IT Config Attribute Sts (MB)
--------------------------- ------------- -------------------------------------
0022 /dev/rhdisk3 08A:1 01A:C0 2-Way Mir N/Grp'd VCM WD 3
0029 /dev/rhdiskpower0 08A:1 16B:D1 2-Way Mir N/Grp'd RW 3
002A /dev/rhdiskpower1 08A:1 01A:C2 2-Way Mir N/Grp'd RW 3
0059 /dev/rhdiskpower64 09A:1 16B:C2 2-Way Mir N/Grp'd RW 3
005A /dev/rhdiskpower65 09A:1 01C:C2 2-Way Mir N/Grp'd RW 3
0084 /dev/rhdiskpower2 08A:1 16A:D0 2-Way Mir N/Grp'd RW 1024
0085 /dev/rhdiskpower3 08A:1 16B:D3 2-Way Mir N/Grp'd RW 1024
013A /dev/rhdiskpower4 08A:1 16B:CE RDF1+Mir Grp'd (M) RW 49140
013E /dev/rhdiskpower5 08A:1 01C:D5 RDF1+Mir Grp'd (M) RW 49140
0142 /dev/rhdiskpower6 08A:1 16C:D6 RDF1+Mir Grp'd (M) RW 49140
0146 /dev/rhdiskpower7 08A:1 01C:D7 RDF1+Mir Grp'd (M) RW 49140
014A /dev/rhdiskpower8 08A:1 16C:D8 RDF1+Mir Grp'd (M) RW 49140
014E /dev/rhdiskpower9 08A:1 16A:C3 RDF1+Mir Grp'd (M) RW 49140
0152 /dev/rhdiskpower10 08A:1 01A:C2 RDF1+Mir Grp'd (M) RW 49140
0156 /dev/rhdiskpower11 08A:1 16A:C1 RDF1+Mir Grp'd (M) RW 49140
015A /dev/rhdiskpower12 08A:1 01A:D9 RDF1+Mir Grp'd (M) RW 49140
015E /dev/rhdiskpower13 08A:1 16A:D8 RDF1+Mir Grp'd (M) RW 49140
0162 /dev/rhdiskpower14 08A:1 01A:DB RDF1+Mir Grp'd (M) RW 49140
0166 /dev/rhdiskpower15 08A:1 16A:DA RDF1+Mir Grp'd (M) RW 49140
016A /dev/rhdiskpower16 08A:1 01A:D5 RDF1+Mir Grp'd (M) RW 49140
016E /dev/rhdiskpower17 08A:1 16A:D4 RDF1+Mir Grp'd (M) RW 49140
0172 /dev/rhdiskpower18 08A:1 01A:D7 RDF1+Mir Grp'd (M) RW 49140
0176 /dev/rhdiskpower19 08A:1 16A:D6 RDF1+Mir Grp'd (M) RW 49140
017A /dev/rhdiskpower20 08A:1 16C:C3 RDF1+Mir Grp'd (M) RW 49140
017E /dev/rhdiskpower21 08A:1 01C:C2 RDF1+Mir Grp'd (M) RW 49140
0182 /dev/rhdiskpower22 08A:1 16C:C1 RDF1+Mir Grp'd (M) RW 49140
0186 /dev/rhdiskpower23 08A:1 01C:C0 RDF1+Mir Grp'd (M) RW 49140
018A /dev/rhdiskpower24 08A:1 16C:D0 RDF1+Mir Grp'd (M) RW 49140
018E /dev/rhdiskpower25 08A:1 16C:DA RDF1+Mir Grp'd (M) RW 49140
0192 /dev/rhdiskpower26 08A:1 01C:D9 RDF1+Mir Grp'd (M) RW 49140
0196 /dev/rhdiskpower27 08A:1 16C:DC RDF1+Mir Grp'd (M) RW 49140
019A /dev/rhdiskpower28 08A:1 01C:DB RDF1+Mir Grp'd (M) RW 49140
019E /dev/rhdiskpower29 08A:1 16B:CE RDF1+Mir Grp'd (M) RW 49140
01A2 /dev/rhdiskpower30 08A:1 01C:D5 RDF1+Mir Grp'd (M) RW 49140
01A6 /dev/rhdiskpower31 08A:1 16C:D6 RDF1+Mir Grp'd (M) RW 49140
01AA /dev/rhdiskpower32 08A:1 01C:D7 RDF1+Mir Grp'd (M) RW 49140
01AE /dev/rhdiskpower33 08A:1 16C:D8 RDF1+Mir Grp'd (M) RW 49140
01B2 /dev/rhdiskpower34 08A:1 16A:C3 RDF1+Mir Grp'd (M) RW 49140
01B6 /dev/rhdiskpower35 08A:1 01A:C2 RDF1+Mir Grp'd (M) RW 49140
01BA /dev/rhdiskpower36 08A:1 16A:C1 RDF1+Mir Grp'd (M) RW 49140
01BE /dev/rhdiskpower37 08A:1 01A:D9 RDF1+Mir Grp'd (M) RW 49140
01C2 /dev/rhdiskpower38 08A:1 16A:D8 RDF1+Mir Grp'd (M) RW 49140
01C6 /dev/rhdiskpower39 08A:1 01A:DB RDF1+Mir Grp'd (M) RW 49140
01CA /dev/rhdiskpower40 08A:1 16A:DA RDF1+Mir Grp'd (M) RW 49140
01CE /dev/rhdiskpower41 08A:1 01A:D5 RDF1+Mir Grp'd (M) RW 49140
01D2 /dev/rhdiskpower42 08A:1 16A:D4 RDF1+Mir Grp'd (M) RW 49140
01D6 /dev/rhdiskpower43 08A:1 01A:D7 RDF1+Mir Grp'd (M) RW 49140
01DA /dev/rhdiskpower44 08A:1 16A:D6 RDF1+Mir Grp'd (M) RW 49140
01DE /dev/rhdiskpower45 08A:1 16C:C3 RDF1+Mir Grp'd (M) RW 49140
01E2 /dev/rhdiskpower46 08A:1 01C:C2 RDF1+Mir Grp'd (M) RW 49140
01E6 /dev/rhdiskpower47 08A:1 16C:C1 RDF1+Mir Grp'd (M) RW 49140
01EA /dev/rhdiskpower48 08A:1 01C:C0 RDF1+Mir Grp'd (M) RW 49140
01EE /dev/rhdiskpower49 08A:1 16C:D0 RDF1+Mir Grp'd (M) RW 49140
01F2 /dev/rhdiskpower50 08A:1 16C:DA RDF1+Mir Grp'd (M) RW 49140
01F6 /dev/rhdiskpower51 08A:1 01C:D9 RDF1+Mir Grp'd (M) RW 49140
01FA /dev/rhdiskpower52 08A:1 16C:DC RDF1+Mir Grp'd (M) RW 49140
01FE /dev/rhdiskpower53 08A:1 01C:DB RDF1+Mir Grp'd (M) RW 49140
0202 /dev/rhdiskpower54 08A:1 16B:CE RDF1+Mir Grp'd (M) RW 49140
0206 /dev/rhdiskpower55 08A:1 01C:D5 RDF1+Mir Grp'd (M) RW 49140
020A /dev/rhdiskpower56 08A:1 16C:D6 RDF1+Mir Grp'd (M) RW 49140
020E /dev/rhdiskpower57 08A:1 01C:D7 RDF1+Mir Grp'd (M) RW 49140
0212 /dev/rhdiskpower58 08A:1 16C:D8 RDF1+Mir Grp'd (M) RW 49140
0216 /dev/rhdiskpower59 08A:1 16A:C3 RDF1+Mir Grp'd (M) RW 49140
021A /dev/rhdiskpower60 08A:1 01A:C2 RDF1+Mir Grp'd (M) RW 49140
021E /dev/rhdiskpower61 08A:1 16A:C1 RDF1+Mir Grp'd (M) RW 49140
0222 /dev/rhdiskpower62 08A:1 01A:D9 RDF1+Mir Grp'd (M) RW 49140
0226 /dev/rhdiskpower63 08A:1 16A:D8 RDF1+Mir Grp'd (M) RW 49140
I asked Wang from EMC to check the disks with IDs 0022 through 0226 for bad blocks.
Meanwhile, an SRDF relationship from the DMX 1500 to the DMX950 had already been set up for this database but had never been synchronized.
To keep the business data safe, the underlying replication needed to be started, so I suggested Wang bring SRDF up.
The EMC engineer first ran the following checks:
# ./symcfg disc
This operation may take up to a few minutes. Please be patient...
[root@zhyw1]#./symdg list
D E V I C E G R O U P S
                                              Number of
Name            Type   Valid  Symmetrix ID   Devs  GKs  BCVs  VDEVs  TGTs
zhyw_1500_950   RDF1   Yes    (redacted)       60    0     0      0     0
[root@zhyw1]#./symrdf -g zhyw_1500_950 query |more
Device Group (DG) Name : zhyw_1500_950
DG's Type : RDF1
DG's Symmetrix ID : (redacted) (Microcode Version: 5773)
Remote Symmetrix ID : 000290301387 (Microcode Version: 5773)
RDF (RA) Group Number : 2 (01)
Source (R1) View Target (R2) View MODES
-------------------------------- ------------------------ ----- ------------
ST LI ST
Standard A N A
Logical T R1 Inv R2 Inv K T R1 Inv R2 Inv RDF Pair
Device Dev E Tracks Tracks S Dev E Tracks Tracks MDA STATE
-------------------------------- -- ------------------------ ----- ------------
DEV001 013A RW 0 786240 NR 02EB WD 0 786240 C.D Suspended
DEV002 013E RW 0 786240 NR 02EF WD 0 786240 C.D Suspended
DEV003 0142 RW 0 786240 NR 02F3 WD 0 786240 C.D Suspended
DEV004 0146 RW 0 786240 NR 02F7 WD 0 786240 C.D Suspended
DEV005 014A RW 0 786240 NR 02FB WD 0 786240 C.D Suspended
DEV006 014E RW 0 786240 NR 02FF WD 0 786240 C.D Suspended
DEV007 0152 RW 0 786240 NR 0303 WD 0 786240 C.D Suspended
DEV008 0156 RW 0 786240 NR 0307 WD 0 786240 C.D Suspended
DEV009 015A RW 0 786240 NR 030B WD 0 786240 C.D Suspended
DEV010 015E RW 0 786240 NR 030F WD 0 786240 C.D Suspended
DEV011 0162 RW 0 786240 NR 0313 WD 0 786240 C.D Suspended
DEV012 0166 RW 0 786240 NR 0317 WD 0 786240 C.D Suspended
DEV013 016A RW 0 786240 NR 031B WD 0 786240 C.D Suspended
DEV014 016E RW 0 786240 NR 031F WD 0 786240 C.D Suspended
DEV015 0172 RW 0 786240 NR 0323 WD 0 786240 C.D Suspended
DEV016 0176 RW 0 786240 NR 0327 WD 0 786240 C.D Suspended
DEV017 017A RW 0 786240 NR 032B WD 0 786240 C.D Suspended
DEV018 017E RW 0 786240 NR 032F WD 0 786240 C.D Suspended
DEV019 0182 RW 0 786240 NR 0333 WD 0 786240 C.D Suspended
DEV020 0186 RW 0 786240 NR 0337 WD 0 786240 C.D Suspended
DEV021 018A RW 0 786240 NR 033B WD 0 786240 C.D Suspended
DEV022 018E RW 0 786240 NR 033F WD 0 786240 C.D Suspended
DEV023 0192 RW 0 786240 NR 0343 WD 0 786240 C.D Suspended
DEV024 0196 RW 0 786240 NR 0347 WD 0 786240 C.D Suspended
DEV025 019A RW 0 786240 NR 034B WD 0 786240 C.D Suspended
DEV026 019E RW 0 786240 NR 034F WD 0 786240 C.D Suspended
DEV027 01A2 RW 0 786240 NR 0353 WD 0 786240 C.D Suspended
DEV028 01A6 RW 0 786240 NR 0357 WD 0 786240 C.D Suspended
DEV029 01AA RW 0 786240 NR 035B WD 0 786240 C.D Suspended
DEV030 01AE RW 0 786240 NR 035F WD 0 786240 C.D Suspended
DEV031 01B2 RW 0 786240 NR 0363 WD 0 786240 C.D Suspended
DEV032 01B6 RW 0 786240 NR 0367 WD 0 786240 C.D Suspended
DEV033 01BA RW 0 786240 NR 036B WD 0 786240 C.D Suspended
DEV034 01BE RW 0 786240 NR 036F WD 0 786240 C.D Suspended
DEV035 01C2 RW 0 786240 NR 0373 WD 0 786240 C.D Suspended
DEV036 01C6 RW 0 786240 NR 0377 WD 0 786240 C.D Suspended
DEV037 01CA RW 0 786240 NR 037B WD 0 786240 C.D Suspended
DEV038 01CE RW 0 786240 NR 037F WD 0 786240 C.D Suspended
DEV039 01D2 RW 0 786240 NR 0383 WD 0 786240 C.D Suspended
DEV040 01D6 RW 0 786240 NR 0387 WD 0 786240 C.D Suspended
DEV041 01DA RW 0 786240 NR 038B WD 0 786240 C.D Suspended
DEV042 01DE RW 0 786240 NR 038F WD 0 786240 C.D Suspended
DEV043 01E2 RW 0 786240 NR 0393 WD 0 786240 C.D Suspended
DEV044 01E6 RW 0 786240 NR 0397 WD 0 786240 C.D Suspended
DEV045 01EA RW 0 786240 NR 039B WD 0 786240 C.D Suspended
DEV046 01EE RW 0 786240 NR 039F WD 0 786240 C.D Suspended
DEV047 01F2 RW 0 786240 NR 03A3 WD 0 786240 C.D Suspended
DEV048 01F6 RW 0 786240 NR 03A7 WD 0 786240 C.D Suspended
DEV049 01FA RW 0 786240 NR 03AB WD 0 786240 C.D Suspended
DEV050 01FE RW 0 786240 NR 03AF WD 0 786240 C.D Suspended
DEV051 0202 RW 0 786240 NR 03B3 WD 0 786240 C.D Suspended
DEV052 0206 RW 0 786240 NR 03B7 WD 0 786240 C.D Suspended
DEV053 020A RW 0 786240 NR 03BB WD 0 786240 C.D Suspended
DEV054 020E RW 0 786240 NR 03BF WD 0 786240 C.D Suspended
DEV055 0212 RW 0 786240 NR 03C3 WD 0 786240 C.D Suspended
DEV056 0216 RW 0 786240 NR 03C7 WD 0 786240 C.D Suspended
DEV057 021A RW 0 786240 NR 03CB WD 0 786240 C.D Suspended
DEV058 021E RW 0 786240 NR 03CF WD 0 786240 C.D Suspended
DEV059 0222 RW 0 786240 NR 03D3 WD 0 786240 C.D Suspended
DEV060 0226 RW 0 786240 NR 03D7 WD 0 786240 C.D Suspended
Total -------- -------- -------- --------
Track(s) 0 47174400 0 47174400
MB(s) 0 2948400 0 2948400
Legend for MODES:
M(ode of Operation): A = Async, S = Sync, E = Semi-sync, C = Adaptive Copy
D(omino) : X = Enabled, . = Disabled
A(daptive Copy) : D = Disk Mode, W = WP Mode, . = ACp off
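The totals in this query are internally consistent if you assume a 64 KB Symmetrix track (an assumption on my part, but it matches the 49140 MB capacity shown by symdev list):

```python
# Cross-check the symrdf totals: 60 devices, 786240 invalid R2 tracks each.
devices, tracks_per_dev, track_kb = 60, 786240, 64

total_tracks = devices * tracks_per_dev
total_mb = total_tracks * track_kb // 1024
per_dev_mb = tracks_per_dev * track_kb // 1024

print(total_tracks)   # matches the 47174400 Track(s) total
print(total_mb)       # matches the 2948400 MB(s) total
print(per_dev_mb)     # matches the 49140 Cap (MB) per device
```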
The following command starts synchronization from the source array to the target:
[root@zhyw1]#./symrdf -g zhyw_1500_950 resume
Execute an RDF 'Resume' operation for device
group 'zhyw_1500_950' (y/[n]) ? y
An RDF 'Resume' operation execution is
in progress for device group 'zhyw_1500_950'. Please wait...
Merge device track tables between source and target.......Started.
Devices: 015A-0179 in (3435,002)......................... Merged.
Devices: 013A-0159 in (3435,002)......................... Merged.
Devices: 019A-01B9 in (3435,002)......................... Merged.
Devices: 017A-0199 in (3435,002)......................... Merged.
Devices: 01BA-01D9 in (3435,002)......................... Merged.
Devices: 01DA-01F9 in (3435,002)......................... Merged.
Devices: 021A-0229 in (3435,002)......................... Merged.
Devices: 01FA-0219 in (3435,002)......................... Merged.
Merge device track tables between source and target.......Done.
Resume RDF link(s)........................................Started.
Resume RDF link(s)........................................Done.
The RDF 'Resume' operation successfully executed for
device group 'zhyw_1500_950'.
We monitored the SRDF synchronization with:
[root@zhyw1]#./symrdf -g zhyw_1500_950 query -i 5
Once the rate stabilized, data was flowing from source to target at roughly 280MB/s.
Since the full synchronization would take about 3 hours, I decided to lie down on the sofa for a while.
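The 3-hour figure follows directly from the volume to copy and the observed rate:

```python
# Rough ETA: total invalid MB (from the symrdf query) / steady-state rate.
total_mb = 2948400       # MB(s) total reported by symrdf query
rate_mb_s = 280          # observed steady-state sync rate
hours = total_mb / rate_mb_s / 3600
print(round(hours, 1))   # about 2.9 hours
```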
When Ding came into the monitoring room and found me on the sofa, he suggested I move to their break room. Fair enough, I thought: better to recharge properly.
I was woken from my nap by Xu from the application team: the sync was almost done, but toward the end it had slowed to a crawl.
The SRDF sync was indeed crawling:
[root@zhyw1]#./symrdf -g zhyw_1500_950 query -i 5
DEV001 013A RW 0 0 RW 02EB WD 0 0 S.. Synchronized
DEV002 013E RW 0 0 RW 02EF WD 0 0 S.. Synchronized
DEV003 0142 RW 0 0 RW 02F3 WD 0 0 S.. Synchronized
DEV004 0146 RW 0 0 RW 02F7 WD 0 0 S.. Synchronized
DEV005 014A RW 0 31569 RW 02FB WD 0 31569 S.. SyncInProg
DEV006 014E RW 0 0 RW 02FF WD 0 0 S.. Synchronized
DEV007 0152 RW 0 0 RW 0303 WD 0 0 S.. Synchronized
DEV008 0156 RW 0 0 RW 0307 WD 0 0 S.. Synchronized
DEV009 015A RW 0 0 RW 030B WD 0 0 S.. Synchronized
DEV010 015E RW 0 0 RW 030F WD 0 0 S.. Synchronized
DEV011 0162 RW 0 0 RW 0313 WD 0 0 S.. Synchronized
DEV012 0166 RW 0 0 RW 0317 WD 0 0 S.. Synchronized
DEV013 016A RW 0 0 RW 031B WD 0 0 S.. Synchronized
DEV014 016E RW 0 0 RW 031F WD 0 0 S.. Synchronized
DEV015 0172 RW 0 0 RW 0323 WD 0 0 S.. Synchronized
DEV016 0176 RW 0 0 RW 0327 WD 0 0 S.. Synchronized
DEV017 017A RW 0 35641 RW 032B WD 0 35641 S.. SyncInProg
DEV018 017E RW 0 0 RW 032F WD 0 0 S.. Synchronized
DEV019 0182 RW 0 0 RW 0333 WD 0 0 S.. Synchronized
DEV020 0186 RW 0 0 RW 0337 WD 0 0 S.. Synchronized
DEV021 018A RW 0 0 RW 033B WD 0 0 S.. Synchronized
DEV022 018E RW 0 0 RW 033F WD 0 0 S.. Synchronized
DEV023 0192 RW 0 0 RW 0343 WD 0 0 S.. Synchronized
DEV024 0196 RW 0 0 RW 0347 WD 0 0 S.. Synchronized
DEV025 019A RW 0 0 RW 034B WD 0 0 S.. Synchronized
DEV026 019E RW 0 0 RW 034F WD 0 0 S.. Synchronized
DEV027 01A2 RW 0 0 RW 0353 WD 0 0 S.. Synchronized
DEV028 01A6 RW 0 0 RW 0357 WD 0 0 S.. Synchronized
DEV029 01AA RW 0 0 RW 035B WD 0 0 S.. Synchronized
DEV030 01AE RW 0 33371 RW 035F WD 0 33363 S.. SyncInProg
DEV031 01B2 RW 0 0 RW 0363 WD 0 0 S.. Synchronized
DEV032 01B6 RW 0 0 RW 0367 WD 0 0 S.. Synchronized
DEV033 01BA RW 0 0 RW 036B WD 0 0 S.. Synchronized
DEV034 01BE RW 0 0 RW 036F WD 0 0 S.. Synchronized
DEV035 01C2 RW 0 0 RW 0373 WD 0 0 S.. Synchronized
DEV036 01C6 RW 0 0 RW 0377 WD 0 0 S.. Synchronized
DEV037 01CA RW 0 0 RW 037B WD 0 0 S.. Synchronized
DEV038 01CE RW 0 0 RW 037F WD 0 0 S.. Synchronized
DEV039 01D2 RW 0 0 RW 0383 WD 0 0 S.. Synchronized
DEV040 01D6 RW 0 0 RW 0387 WD 0 0 S.. Synchronized
DEV041 01DA RW 0 0 RW 038B WD 0 0 S.. Synchronized
DEV042 01DE RW 0 30881 RW 038F WD 0 30848 S.. SyncInProg
DEV043 01E2 RW 0 0 RW 0393 WD 0 0 S.. Synchronized
DEV044 01E6 RW 0 0 RW 0397 WD 0 0 S.. Synchronized
DEV045 01EA RW 0 0 RW 039B WD 0 0 S.. Synchronized
DEV046 01EE RW 0 0 RW 039F WD 0 0 S.. Synchronized
DEV047 01F2 RW 0 0 RW 03A3 WD 0 0 S.. Synchronized
DEV048 01F6 RW 0 0 RW 03A7 WD 0 0 S.. Synchronized
DEV049 01FA RW 0 0 RW 03AB WD 0 0 S.. Synchronized
DEV050 01FE RW 0 0 RW 03AF WD 0 0 S.. Synchronized
DEV051 0202 RW 0 0 RW 03B3 WD 0 0 S.. Synchronized
DEV052 0206 RW 0 0 RW 03B7 WD 0 0 S.. Synchronized
DEV053 020A RW 0 0 RW 03BB WD 0 0 S.. Synchronized
DEV054 020E RW 0 0 RW 03BF WD 0 0 S.. Synchronized
DEV055 0212 RW 0 30301 RW 03C3 WD 0 30301 S.. SyncInProg
DEV056 0216 RW 0 0 RW 03C7 WD 0 0 S.. Synchronized
DEV057 021A RW 0 0 RW 03CB WD 0 0 S.. Synchronized
DEV058 021E RW 0 0 RW 03CF WD 0 0 S.. Synchronized
DEV059 0222 RW 0 0 RW 03D3 WD 0 0 S.. Synchronized
DEV060 0226 RW 0 0 RW 03D7 WD 0 0 S.. Synchronized
Total -------- -------- -------- --------
Track(s) 0 161763 0 161722
MB(s) 0.0 10110.2 0.0 10107.6
Synchronization rate : 0.0 MB/S
Estimated time to completion : 3 days, 02:53:24
Legend for MODES:
M(ode of Operation): A = Async, S = Sync, E = Semi-sync, C = Adaptive Copy
D(omino) : X = Enabled, . = Disabled
A(daptive Copy) : D = Disk Mode, W = WP Mode, . = ACp off
The synchronization had essentially hung, with only a few devices left unfinished.
The EMC engineer's explanation was that fewer disk channels were now available, so the sync had slowed down.
I figured that stopping I/O against the source array should speed things up, so I asked Huang whether we could shut down the database application now. No problem, he said: the approved downtime ran until 5 AM, so there was still time.
I quickly shut down the database application.
Sure enough, with the application stopped, the sync rate climbed back up to 38M/s. We were all excited.
Soon only one DEV remained, but strangely its synchronization never completed.
Faced with such a slow rate, the EMC engineers had no explanation either, and they went straight off to examine the array.
A while later, Zhang from EMC confirmed that the following disk had bad blocks:
DEV017 017A RW 0 1 RW 032B WD 0 1 S.. SyncInProg
I looked it up: Symmetrix device 017A maps to hdiskpower20,
017A /dev/rhdiskpower20 09B:1 16C:C3 RDF1+Mir N/Grp'd (M) RW 49140
and that disk belongs to oravg7, the very volume group holding the temp tablespace.
So Huang's initial suspicion had been right!
The disk had to be repaired, but how to repair it deserved careful thought.
First I checked which database files this disk affected.
The disk-to-physical-device mapping comes from:
./symdev list pd
017A /dev/rhdiskpower20 09B:1 16C:C3 RDF1+Mir N/Grp'd (M) RW 49140
The pdisk-to-VG mapping comes from lspv:
# lspv
hdiskpower16 00c450b57c60a7cf oravg7 concurrent
hdiskpower17 00c450b57c56d256 oravg7 concurrent
hdiskpower18 00c450b57c5a2e83 oravg7 concurrent
hdiskpower19 00c450b57c5ee285 oravg7 concurrent
hdiskpower20 00c450b57c530a23 oravg7 concurrent
hdiskpower21 00c450b57c57d497 oravg7 concurrent
hdiskpower22 00c450b57c5bb0e0 oravg7 concurrent
hdiskpower23 00c450b57c5f9003 oravg7 concurrent
And the LVs in this VG:
# lsvg -l oravg7
oravg7:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
lv_raw93_16g raw 512 512 1 open/syncd N/A
lv_raw94_16g raw 512 512 1 open/syncd N/A
lv_raw95_16g raw 512 512 1 open/syncd N/A
lv_raw96_16g raw 512 512 1 open/syncd N/A
lv_raw97_16g raw 512 512 1 open/syncd N/A
lv_raw98_16g raw 512 512 1 closed/syncd N/A
lv_raw99_16g raw 512 512 1 closed/syncd N/A
lv_raw100_16g raw 512 512 1 closed/syncd N/A
lv_raw101_16g raw 512 512 1 closed/syncd N/A
lv_raw102_16g raw 512 512 1 closed/syncd N/A
lv_raw104_16g raw 512 512 1 closed/syncd N/A
lv_raw105_16g raw 512 512 1 closed/syncd N/A
lv_raw106_16g raw 512 512 1 closed/syncd N/A
lv_raw107_16g raw 512 512 1 closed/syncd N/A
lv_raw108_16g raw 512 512 1 closed/syncd N/A
lv_raw109_16g raw 512 512 1 closed/syncd N/A
lv_raw110_16g raw 512 512 2 closed/syncd N/A
lv_raw111_16g raw 512 512 2 closed/syncd N/A
lv_raw112_16g raw 512 512 2 closed/syncd N/A
lv_raw113_16g raw 512 512 2 closed/syncd N/A
lv_raw114_16g raw 512 512 2 closed/syncd N/A
lv_raw115_16g raw 512 512 2 closed/syncd N/A
lv_raw116_16g raw 512 512 2 closed/syncd N/A
At the database level, these LVs map to the tablespaces BSPLOG, temp, and SRADM_TBS, so those are the tablespaces affected:
SQL> run
1* select a.name,b.name from v$tablespace a ,v$datafile b where a.ts#=b.ts#
BSPLOG /dev/rlv_raw94_16g
BSPLOG /dev/rlv_raw93_16g
BSPLOG /dev/rlv_raw95_16g
SRADM_TBS /dev/rlv_raw96_16g
SQL> select name from v$tempfile;
NAME
--------------------------------------------------------------------------------
/dev/rlv_temp_8g
/dev/rlv_raw97_16g
Wang explained that the BSPLOG tablespace held very little data and was easy to restore; SRADM_TBS belonged to an old Streams replication setup and could be ignored; and the last LV backed a temp file, which was also repairable.
That gave me the confidence to proceed. The recovery plan:
1> Take an exp backup of the users on the BSPLOG tablespace.
2> Move the temporary tablespace onto the healthy oravg3 and oravg6, then stop the database.
3> Have the EMC engineers repair the underlying disk fracture.
4> Start the database; recover if anything is wrong.
5> Recreate the BSPLOG tablespace and its users, and imp the data from step 1 back in.
All right, time to get to work.
1》 First, create the LVs for the new temporary tablespace and set their ownership:
mklv -y 'lv_temp2_14g' -T O -w n -s n -r n -t raw oravg3 448
mklv -y 'lv_temp3_14g' -T O -w n -s n -r n -t raw oravg6 448
mklv -y 'lv_templinshi_14g' -T O -w n -s n -r n -t raw oravg2 448
# chown oracle:oinstall /dev/rlv_temp2_14g
# chown oracle:oinstall /dev/rlv_temp3_14g
[root@zhyw2]#chown oracle:oinstall /dev/rlv_templinshi_1
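The 448-LP figure explains the 13824m tempfile size used later in this step: assuming 32 MB per LP (my assumption; the LP size is not shown in these listings), a "14g" LV holds 14336 MB, and 13824m keeps the same 512 MB margin as before:

```python
# mklv ... 448  ->  448 LPs; assumed LP size 32 MB.
lp_count, lp_size_mb = 448, 32
lv_mb = lp_count * lp_size_mb     # 14336 MB for each "14g" LV
tempfile_mb = lv_mb - 512         # margin kept on the raw device
print(tempfile_mb)                # 13824, the size used in ADD TEMPFILE
```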
2》 Create a bridge tablespace temp1, then use it to rebuild temp:
SQL> create temporary tablespace temp1
2 tempfile '/dev/rlv_templinshi_1' size 500m reuse
3 autoextend on next 100m maxsize unlimited
4 extent management local uniform size 1m;
Tablespace created.
SQL> alter database default temporary tablespace temp1;
Database altered.
SQL> drop tablespace temp including contents and datafiles;
Tablespace dropped.
SQL> create temporary tablespace temp
2 tempfile '/dev/rlv_temp_8g' size 8000m reuse
3 autoextend on next 100m maxsize unlimited
4 extent management local uniform size 1m;
Tablespace created.
SQL> alter tablespace temp add tempfile '/dev/rlv_temp2_14g' size 13824m;
Tablespace altered.
SQL> alter tablespace temp add tempfile '/dev/rlv_temp3_14g' size 13824m;
Tablespace altered.
alter database default temporary tablespace temp;
drop tablespace temp1 including contents and datafiles;
3》 Back up giaplog, the user that owns the BSPLOG tablespace:
nohup exp giaplog/password file='/tmp/exp/giaplog.dmp' owner=giaplog log='/tmp/exp/giaplog.log' &
4》 The EMC engineers repair the fault.
5》 Start the database, stop the listener, recreate the BSPLOG tablespace and its user, and import the data back.
6》 Gather statistics on the schema:
SQL> execute dbms_utility.analyze_schema('GIAPLOG','COMPUTE');
PL/SQL procedure successfully completed
Executed in 6.9 seconds