本文将模拟一个Deadlock detected并从数据库SQL和告警日志分析定位相关死锁对象。
之后将展示一个生产环境中的死锁的处理过程。
[oracle@gtlions ~]$ sqlplus gtlions/gtlions
SQL*Plus: Release 10.2.0.5.0 - Production on Mon May 7 14:07:03 2012
Copyright (c) 1982, 2010, Oracle. All Rights Reserved.
Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL> create table tm_lock_t1(id int);
Table created.
SQL> create table tm_lock_t2(id int);
Table created.
SQL> insert into tm_lock_t1 values(1);
1 row created.
SQL> insert into tm_lock_t2 values(2);
1 row created.
SQL> commit;
Commit complete.
在会话一执行:
SQL> select * from v$mystat where rownum=1;
SID STATISTIC# VALUE
---------- ---------- ----------
148 0 1
SQL> update tm_lock_t1 set id=1+1;
1 row updated.
在会话二执行:
SQL> select * from v$mystat where rownum=1;
SID STATISTIC# VALUE
---------- ---------- ----------
147 0 1
SQL> update tm_lock_t2 set id=2+1;
1 row updated.
返回在会话一执行:
SQL> update tm_lock_t2 set id=3+1;
这个时候就会产生死锁,我们在会话三查看相关信息:
SQL> set pagesize 150;
SQL> set linesize 150;
Select * From dba_blockers;
HOLDING_SESSION
---------------
147
SQL> Select * From v$lock Where Sid=147;
ADDR KADDR SID TY ID1 ID2 LMODE REQUEST CTIME BLOCK
---------------- ---------------- ---------- -- ---------- ---------- ---------- ---------- ---------- ----------
00000000697A8D70 00000000697A8D98 147 TM 51800 0 3 0 63 0
000000006846DAE0 000000006846DB18 147 TX 196636 225 6 0 63 1 #可以看到147阻塞了其他会话,其他会话数=1,也就是我们测试的会话148。
我们来看下产生死锁的语句的对象信息,首先看下TM锁的P1和P2的含义:
SQL> Select PARAMETER1,PARAMETER2 From v$event_name Where Name Like '%TM%'
2 ;
PARAMETER1 PARAMETER2
---------------------------------------------------------------- ----------------------------------------------------------------
name|mode object #
name|mode QT_OBJ#
果不然是我们的TM_LOCK_T2,可以看到P2是对象ID
SQL> select object_name from dba_objects where object_id=51800;
OBJECT_NAME
--------------------------------------------------------------------------------------------------------------------------------
TM_LOCK_T2
#我们在看看相关TX锁的信息,首先看下TX锁的P1和P2的含义:
SQL> Select PARAMETER1,PARAMETER2 From v$event_name Where Name ='enq: TX - row lock contention';
PARAMETER1 PARAMETER2
---------------------------------------------------------------- ----------------------------------------------------------------
name|mode usn<<16 | slot
#其相关计算关系如下:
SQL> select XIDUSN,XIDSLOT,XIDSQN from V$TRANSACTION where XIDSQN=225;
XIDUSN XIDSLOT XIDSQN
---------- ---------- ----------
3 28 225
SQL> Select 3*power(2,16)+28,196636 From dual;
3*POWER(2,16)+28 196636
---------------- ----------
196636 196636
注意,一直到这里还是没有出现死锁,我们说的死锁是会立即返回说明产生了死锁了的,并中断命令,上述只能说是锁等待,这个一定要注意,如果要再现一个死锁现象,你只需要在会话二再执行如下:
SQL> update tm_lock_t1 set id=1+1;
紧接着,你可以在会话一看到如下的信息,说明发生了死锁:
SQL> update tm_lock_t2 set id=3+1;
update tm_lock_t2 set id=3+1
*
ERROR at line 1:
ORA-00060: deadlock detected while waiting for resource
这个时候你可以去看下告警日志:
[oracle@gtlions bdump]$ tail -f alert_gt10g.log
QMNC started with pid=16, OS id=5866
Mon May 07 13:51:31 CST 2012
Completed: ALTER DATABASE OPEN
Mon May 07 13:53:00 CST 2012
CREATE TABLESPACE gtlions DATAFILE '/u01/oracle/10g/oradata/gt10g/gtlions01.ora' SIZE 10M AUTOEXTEND ON NEXT 5M MAXSIZE 100M
Mon May 07 13:53:02 CST 2012
Completed: CREATE TABLESPACE gtlions DATAFILE '/u01/oracle/10g/oradata/gt10g/gtlions01.ora' SIZE 10M AUTOEXTEND ON NEXT 5M MAXSIZE 100M
Mon May 07 13:53:02 CST 2012
CREATE TEMPORARY TABLESPACE gtlionstmp TEMPFILE '/u01/oracle/10g/oradata/gt10g/gtlionstmp01.ora' SIZE 10M AUTOEXTEND ON NEXT 5M MAXSIZE 100M
Completed: CREATE TEMPORARY TABLESPACE gtlionstmp TEMPFILE '/u01/oracle/10g/oradata/gt10g/gtlionstmp01.ora' SIZE 10M AUTOEXTEND ON NEXT 5M MAXSIZE 100M
Mon May 07 14:26:36 CST 2012
ORA-00060: Deadlock detected. More info in file /u01/oracle/10g/admin/gt10g/udump/gt10g_ora_6087.trc.
告警日志留下了发生死锁的信息,我们可以看下具体的trc日志:
[oracle@gtlions bdump]$ more /u01/oracle/10g/admin/gt10g/udump/gt10g_ora_6087.trc
*** 2012-05-07 14:26:36.072
*** ACTION NAME:() 2012-05-07 14:26:36.072
*** MODULE NAME:(SQL*Plus) 2012-05-07 14:26:36.072
*** SERVICE NAME:(SYS$USERS) 2012-05-07 14:26:36.072
*** SESSION ID:(148.8) 2012-05-07 14:26:36.072 #错误会话的sid=148
DEADLOCK DETECTED ( ORA-00060 )
[Transaction Deadlock]
The following deadlock is not an ORACLE error. It is a
deadlock due to user error in the design of an application
or from issuing incorrect ad-hoc SQL. The following
information may aid in determining the deadlock:
Deadlock graph:
---------Blocker(s)-------- ---------Waiter(s)---------
Resource Name process session holds waits process session holds waits
TX-0001001c-00000109 15 148 X 19 147 X
TX-0003001c-000000e1 19 147 X 15 148 X
session 148: DID 0001-000F-00000011 session 147: DID 0001-0013-00000004
session 147: DID 0001-0013-00000004 session 148: DID 0001-000F-00000011
Rows waited on:
Session 147: obj - rowid = 0000CA57 - AAAMpXAAFAAAAAUAAA #对象ID
(dictionary objn - 51799, file - 5, block - 20, slot - 0) #对象ID、文件ID、块ID
Session 148: obj - rowid = 0000CA58 - AAAMpYAAFAAAAAcAAA #对象ID
(dictionary objn - 51800, file - 5, block - 28, slot - 0) #对象ID、文件ID、块ID
Information on the OTHER waiting sessions: #产生死锁会话的sid=147
Session 147:
sid: 147 ser: 92 audsid: 60013 user: 58/GTLIONS
flags: (0x41) USR/- flags_idl: (0x1) BSY/-/-/-/-/-
flags2: (0x8)
pid: 19 O/S info: user: oracle, term: pts/4, ospid: 6088
如果我们需要根据告警信息来定位呢?也很简单,如上,我们可以换算一下相关的信息:
SQL> Select dbms_rowid.rowid_object('AAAMpXAAFAAAAAUAAA'),dbms_rowid.rowid_object('AAAMpYAAFAAAAAcAAA') From dual;
DBMS_ROWID.ROWID_OBJECT('AAAMPXAAFAAAAAUAAA') DBMS_ROWID.ROWID_OBJECT('AAAMPYAAFAAAAACAAA')
--------------------------------------------- ---------------------------------------------
51799 51800
SQL> Select dbms_rowid.rowid_block_number('AAAMpXAAFAAAAAUAAA') From dual;
DBMS_ROWID.ROWID_BLOCK_NUMBER('AAAMPXAAFAAAAAUAAA')
---------------------------------------------------
20
SQL> Select to_number('0000CA57','xxxxxxxx'),to_number('0000CA58','xxxxxxxx') From dual ;
TO_NUMBER('0000CA57','XXXXXXXX') TO_NUMBER('0000CA58','XXXXXXXX')
-------------------------------- --------------------------------
51799 51800
一次在生产环境中的日常巡检中,在告警日志中发现了大量如下信息:
Global Enqueue Services Deadlock detected. More info in file
/opt/oracle/product/admin/..../.../_ora_6087.trc.
Global Enqueue Services Deadlock detected. More info in file
/opt/oracle/product/admin/..../.../_ora_60xx.trc.
Global Enqueue Services Deadlock detected. More info in file
/opt/oracle/product/admin/..../.../_ora_6xxx.trc.
首先询问了开发人员,开是否有正在执行的相关未提交的SQL语句,说是没有,于是继续查看trc文件,发现如下信息:
user session for deadlock lock 0xc00000023b0d4d08
pid=179 serial=50640 audsid=19934825 user: 59/RBT
O/S info: user: root, term: unknown, ospid: 1234, machine: tomcat108
program: JDBC Thin Client
application name: JDBC Thin Client, hash value=2546894660
Current SQL Statement:
delete from tab1 where ID=:1
上面可以看到产生死锁的会话来源和语句。
继续往下看:
----------resource 0xc00000020bc3ecc0----------------------
resname : [0x225f6][0x0],[TM]
这里说明了相关的资源的信息:
0x225f6是相关的对象ID,其实这次死锁的产生原因是由于上述的tab1有个键被别的表tab2创建为外键,但是在tab2没有创建相关的索引造成的,可以使用以下语句快速定位相关主外键的信息:
select uc.constraint_name, uc.owner child_owner, uc.table_name child_table, ucc.column_name child_column
,ucp.owner parent_owner, ucp.table_name parent_table, ucp.column_name parent_column
from dba_constraints uc, dba_cons_columns ucc, dba_cons_columns ucp
where uc.owner = 'TRSDATA'
and uc.table_name = 'TABLEB'
and ucc.constraint_name = uc.constraint_name
and ucp.constraint_name = uc.r_constraint_name
在本例中,在tab2上创建下外键的索引就解决了问题,在实际操作过程中,由于业务在不间断的运行,因此创建索引的时候加上了online参数。
-The End-