Troubleshooting ORA-01591

 

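I got to the office this morning and heard from a colleague that a table was locked. A quick try confirmed it: every row whose value in a certain field was 1111111 was locked, and even a SELECT failed. The error was ORA-01591. I opened Knowledge Xpert in TOAD, but its description was brief, saying only that the lock was caused by a failed distributed transaction. I asked my colleague, who said that yesterday one procedure calling another stored procedure had raised an error, and that the latter inserts some data into a database through the transparent gateway.
My first thought was to open OEM, which turned out to be a big disappointment: the lock view showed no related object being locked at all, which was a little frustrating. I then checked the sessions. The user had 5 sessions, all INACTIVE, so I killed them all without further thought. Nothing changed, and still no lock showed up. I logged in to the host remotely and found everything I checked there to be normal; the transparent gateway process had not hung either (I had previously seen TG4SQL hang at around 25% CPU even with no workload).
Then it occurred to me to look at alert.log. After a careful search, I finally found this: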

Wed Nov 17 00:00:04 2004
Errors in file d:\oracle\admin\xdcj\udump\xdcj_j006_3020.trc:
ORA-12012: error on auto execute of job 82
ORA-01591: lock held by in-doubt distributed transaction 6.5.887985
ORA-06512: at line 6

This is exactly where the error surfaced. Tracing back further:

Tue Nov 16 17:35:04 2004
Error 28500 trapped in 2PC on transaction 6.5.887985. Cleaning up.
Error stack returned to user:
ORA-02054: transaction 6.5.887985 in-doubt
ORA-28500: connection from ORACLE to a non-Oracle system returned this message:
[Transparent gateway for MSSQL]
ORA-02063: preceding 2 lines from ZSMOS_CRM
Tue Nov 16 17:35:04 2004
DISTRIB TRAN QDCJ.US.ORACLE.COM.5ae32328.6.5.887985
is local tran 6.5.887985
insert pending prepared tran, scn=6606197672830
Tue Nov 16 17:35:07 2004
Errors in file d:\oracle\admin\xdcj\bdump\xdcj_reco_3024.trc:
ORA-28500: connection from ORACLE to a non-Oracle system returned this message:
[Transparent gateway for MSSQL][Microsoft][ODBC SQL Server Driver][SQL Server]Login failed for user 'RECOVER'.
ORA-02063: preceding 2 lines from ZSMOS_CRM

Tue Nov 16 17:35:12 2004
Errors in file d:\oracle\admin\xdcj\bdump\xdcj_reco_3024.trc:
ORA-28500: connection from ORACLE to a non-Oracle system returned this message:
[Transparent gateway for MSSQL][Microsoft][ODBC SQL Server Driver][SQL Server]Login failed for user 'RECOVER'.
ORA-02063: preceding 2 lines from ZSMOS_CRM

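So this is where it happened. It looks like a remote transaction failed yesterday afternoon without returning cleanly, which left the distributed transaction in doubt and the rows locked. I had finally found the more specific error, ORA-02054. A lookup in TOAD said to wait for the transaction or commit it, but how exactly? I opened the official documentation, searched for related material, and found the following in the Administrator's Guide: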
Discovering Problems with a Two-Phase Commit
The user application that commits a distributed transaction is informed of a problem by one of the following error messages:

ORA-02050: transaction ID rolled back,
           some remote dbs may be in-doubt
ORA-02051: transaction ID committed,
           some remote dbs may be in-doubt
ORA-02054: transaction ID in-doubt

A robust application should save information about a transaction if it receives any of the above errors. This information can be used later if manual distributed transaction recovery is desired.

No action is required by the administrator of any node that has one or more in-doubt distributed transactions due to a network or system failure. The automatic recovery features of Oracle transparently complete any in-doubt transaction so that the same outcome occurs on all nodes of a session tree after the network or system failure is resolved.

In extended outages, however, you can force the commit or rollback of a transaction to release any locked data. Applications must account for such possibilities.

Determining Whether to Perform a Manual Override
Override a specific in-doubt transaction manually only when one of the following situations exists:

The in-doubt transaction locks data that is required by other transactions. This situation occurs when the ORA-01591 error message interferes with user transactions.
An in-doubt transaction prevents the extents of a rollback segment from being used by other transactions. The first portion of an in-doubt distributed transaction's local transaction ID corresponds to the ID of the rollback segment, as listed by the data dictionary views DBA_2PC_PENDING and DBA_ROLLBACK_SEGS.
The failure preventing the two-phase commit phases to complete cannot be corrected in an acceptable time period. Examples of such cases include a telecommunication network that has been damaged or a damaged database that requires a long recovery time.

Normally, you should make a decision to locally force an in-doubt distributed transaction in consultation with administrators at other locations. A wrong decision can lead to database inconsistencies that can be difficult to trace and that you must manually correct.

If the conditions above do not apply, always allow the automatic recovery features of Oracle to complete the transaction. If any of the above criteria are met, however, consider a local override of the in-doubt transaction.
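
The relationship between a local transaction ID and its rollback segment, mentioned in the second condition, could be checked with a query along these lines. This is a minimal sketch, not part of the quoted documentation: it assumes the first dotted component of LOCAL_TRAN_ID equals SEGMENT_ID in DBA_ROLLBACK_SEGS, as the passage describes, and that the session is allowed to read both DBA views.

```sql
-- Sketch: map each in-doubt transaction to the rollback segment it occupies.
-- Assumes the first dotted part of LOCAL_TRAN_ID is the rollback segment ID.
SELECT p.local_tran_id,
       p.state,
       r.segment_name,
       r.status AS segment_status
FROM   dba_2pc_pending p
JOIN   dba_rollback_segs r
       ON r.segment_id =
          TO_NUMBER(SUBSTR(p.local_tran_id, 1, INSTR(p.local_tran_id, '.') - 1));
```

For the transaction in this incident, 6.5.887985, the leading 6 would be the segment ID to look for.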

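The advice is much the same as what TOAD said. Oracle's repeated attempts to log in to SQL Server afterwards were automatic recovery at work, but they never succeeded. Checking the DBA_2PC_PENDING view did indeed show traces of this transaction. So what should be done?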
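
A check of this kind might look like the following. It is only a sketch of the sort of query involved, not the exact statement that was run; the transaction ID is the one from the alert.log above, and the second query against DBA_2PC_NEIGHBORS is an optional extra that shows which database link the transaction went out through.

```sql
-- Sketch: inspect the in-doubt transaction reported in alert.log.
SELECT local_tran_id,
       global_tran_id,
       state,
       mixed,
       advice,
       fail_time,
       retry_time
FROM   dba_2pc_pending
WHERE  local_tran_id = '6.5.887985';

-- Optional: see which remote database (link) took part in the transaction.
SELECT local_tran_id,
       in_out,
       database,
       dbuser_owner,
       interface
FROM   dba_2pc_neighbors
WHERE  local_tran_id = '6.5.887985';
```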

Manually Committing an In-Doubt Transaction
Before attempting to commit the transaction, ensure that you have the proper privileges. Note the following requirements:

If the transaction was committed by...    Then you must have this privilege...
You                                       FORCE TRANSACTION
Another user                              FORCE ANY TRANSACTION
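
Whether the current session holds either privilege can be checked before attempting the override. The sketch below is an illustration rather than part of the quoted documentation; SESSION_PRIVS is a standard dictionary view, while recovery_admin is a hypothetical user name.

```sql
-- Sketch: list which of the two privileges the current session has.
SELECT privilege
FROM   session_privs
WHERE  privilege IN ('FORCE TRANSACTION', 'FORCE ANY TRANSACTION');

-- If neither row comes back, an administrator could grant one.
-- recovery_admin is a hypothetical user name used only for illustration.
GRANT FORCE ANY TRANSACTION TO recovery_admin;
```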


Committing Using Only the Transaction ID
The following SQL statement commits an in-doubt transaction:

COMMIT FORCE 'transaction_id';

The variable transaction_id is the identifier of the transaction as specified in either the LOCAL_TRAN_ID or GLOBAL_TRAN_ID columns of the DBA_2PC_PENDING data dictionary view.

For example, assume that you query DBA_2PC_PENDING and determine that LOCAL_TRAN_ID for a distributed transaction is 1.45.13.

You then issue the following SQL statement to force the commit of this in-doubt transaction:

COMMIT FORCE '1.45.13';

Committing Using an SCN
Optionally, you can specify the SCN for the transaction when forcing a transaction to commit. This feature allows you to commit an in-doubt transaction with the SCN assigned when it was committed at other nodes.

Consequently, you maintain the synchronized commit time of the distributed transaction even if there is a failure. Specify an SCN only when you can determine the SCN of the same transaction already committed at another node.

For example, assume you want to manually commit a transaction with the following global transaction ID:

SALES.ACME.COM.55d1c563.1.93.29

First, query the DBA_2PC_PENDING view of a remote database also involved with the transaction in question. Note the SCN used for the commit of the transaction at that node. Specify the SCN when committing the transaction at the local node. For example, if the SCN is 829381993, issue:

COMMIT FORCE 'SALES.ACME.COM.55d1c563.1.93.29', 829381993;

See Also:
Oracle9i SQL Reference for more information about using the COMMIT statement
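
The remote lookup described in that example could be sketched as below. It is an illustration only: sales_remote is a hypothetical database link to the other Oracle node, and the same query can equally be run while connected to that node directly. It applies only when the remote side is an Oracle database; in this incident the remote side was SQL Server behind the gateway, so there was no remote DBA_2PC_PENDING to consult.

```sql
-- Sketch: read the commit SCN recorded at another Oracle node.
-- sales_remote is a hypothetical database link name.
SELECT global_tran_id,
       state,
       commit#
FROM   dba_2pc_pending@sales_remote
WHERE  global_tran_id = 'SALES.ACME.COM.55d1c563.1.93.29';
```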


Manually Rolling Back an In-Doubt Transaction
Before attempting to roll back the in-doubt distributed transaction, ensure that you have the proper privileges. Note the following requirements:

If the transaction was committed by...    Then you must have this privilege...
You                                       FORCE TRANSACTION
Another user                              FORCE ANY TRANSACTION

The following SQL statement rolls back an in-doubt transaction:

ROLLBACK FORCE 'transaction_id';

The variable transaction_id is the identifier of the transaction as specified in either the LOCAL_TRAN_ID or GLOBAL_TRAN_ID columns of the DBA_2PC_PENDING data dictionary view.

For example, to roll back the in-doubt transaction with the local transaction ID of 2.9.4, use the following statement:

ROLLBACK FORCE '2.9.4';

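So I logged in to the database and ran:
COMMIT FORCE '6.5.887985';
Checking DBA_2PC_PENDING afterwards showed that the state had changed to 'forced commit', and a SELECT on the affected rows of the table worked normally. With that, the problem was solved.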
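
The follow-up check could look like the sketch below. The first query is the kind of verification described above. The PL/SQL block is an optional extra that was not part of the original fix: it assumes the forced entry lingers in DBA_2PC_PENDING because the remote side can never complete recovery, and it uses DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY to remove it. Purging should only be considered once it is certain that automatic recovery will not finish the job.

```sql
-- Sketch: confirm the outcome of the forced commit.
SELECT local_tran_id,
       state,
       force_time
FROM   dba_2pc_pending
WHERE  local_tran_id = '6.5.887985';

-- Optional and only an assumption about cleanup: remove the leftover entry
-- if the remote database will never be able to resolve it.
BEGIN
  DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('6.5.887985');
END;
/
COMMIT;
```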
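Overall, a direct INSERT ... TABLENAME@SQLDBLK is quite risky: if the call does not return normally, trouble follows. Oracle's documentation recommends using a package or stored procedure instead. I advised my colleague to switch to that approach, and it has since passed testing.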
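
What such a wrapper might look like is sketched below. It is not the procedure my colleague wrote, and every name in it except the SQLDBLK link is hypothetical (push_row_to_remote, crm_log, id, note). The idea shown is simply to keep the remote insert in its own short transaction with explicit error handling, so that a gateway failure is rolled back and reported instead of being buried inside a larger job.

```sql
-- Sketch under stated assumptions: wrap the remote insert in a procedure.
-- crm_log, id and note are hypothetical names for the SQL Server table and
-- columns; SQLDBLK is the database link mentioned above.
CREATE OR REPLACE PROCEDURE push_row_to_remote (
  p_id   IN NUMBER,
  p_note IN VARCHAR2
) AS
BEGIN
  INSERT INTO crm_log@SQLDBLK (id, note)
  VALUES (p_id, p_note);

  -- Commit the remote work on its own, separately from other local changes.
  COMMIT;
EXCEPTION
  WHEN OTHERS THEN
    -- Undo whatever was done and pass the error on to the caller.
    ROLLBACK;
    RAISE;
END push_row_to_remote;
/
```

Because the procedure commits on its own, it should be called at a point where the caller has no uncommitted local changes that must stay open.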
