http://yong321.freeshell.org/computer/deadlocks.txt
There're two common deadlocks. ORA-60, enqueue deadlocks; ORA-4020, library
cache deadlocks.
********** ORA-60 **********
Reproduction:
This is very easy to reproduce. Just follow this procedure:
In sessionA, update tb set col = 'x' where pk = 'aa'
In sessionB, update tb set col = 'x' where pk = 'bb'
In sessionA, update tb set col = 'x' where pk = 'bb'
In sessionB, update tb set col = 'x' where pk = 'aa'
If neither session commits, 3 seconds after the last statement shown above is
issued, sessionA will throw error ORA-60, with the corresponding deadlock graph
generated in the udump trace file:
---------Blocker(s)-------- ---------Waiter(s)---------
Resource Name process session holds waits process session holds waits
TX-00010017-000000ba 8 7 X 9 8 X
TX-00020012-000000af 9 8 X 8 7 X
session 7: DID 0001-0008-00000002 session 8: DID 0001-0009-00000002
session 8: DID 0001-0009-00000002 session 7: DID 0001-0008-00000002
Rows waited on:
Session 8: obj - rowid = 00000CBB - AAAAzjAADAAAAFDAAA
Session 7: obj - rowid = 00000CBB - AAAAzjAADAAAAFDAAB
Interpretation:
Resource here simply means lock as in v$resource (technically it should be the
locked resource). TX stands for transaction, usually row lock. The numbers that
follow are v$lock.id1, v$lock.id2, respectively, in hex radix for TX enqueue.
E.g., 20012[hex] is 131090; ba[hex] is 175. Anjo Kolk's Wait Event paper
(Description of Oracle7 Wait Events and Enqueues, used to be at
www.dbatoolbox.com/WP2001/dbamisc/events.pdf) tells us that id1 is "undo
segment number << 16 | slot". Indeed we find v$transaction.xidusn = 2 and
x$transaction.xidslot = 18. So 10[bin] left shifted 16 bits ORed with 18 equals
131090. id2 should be "sequence"; v$transaction.xidsqn = 175. Process is
Oracle process ID (not to be confused with the OS level process ID in UNIX or
thread ID in Windows); it is v$process.pid where v$process.addr matches
v$session.paddr. Session is v$session.sid. X means it's holding or waiting for
an exclusive lock. DID is deadlock ID (see Note:6145177.8); the first number is
always 1; second number is Oracle process ID; I don't know the third number.
For sessions waiting for row locks, additional
information about the rows is given. CBB[hex] is dba_objects.object_id from
which you'll know the table. Then the rowid is given so that you can quickly
find the row by SELECT * FROM thetable WHERE ROWID = 'rowid'. If the "Rows
waited on" is followed by "no rows", then it's waiting on a table lock, not a
row lock, like in this case:
---------Blocker(s)-------- ---------Waiter(s)---------
Resource Name process session holds waits process session holds waits
TX-00060004-00003155 41 45 X 15 26 S
TX-0012000b-00002802 15 26 X 41 45 S
session 45: DID 0001-0029-00000002 session 26: DID 0001-000F-00000002
session 26: DID 0001-000F-00000002 session 45: DID 0001-0029-00000002
Rows waited on:
Session 26: no row
Session 45: no row
Following the deadlock graph is process state dump, which shows state object
(SO) hierarchy in properly indented form, with the root being a process SO.
Remedy:
If the "Session <SID>" line gives row IDs, simply wait and rerun the
application later. And consider application design to see if you have two
programs updating the same table but in different order. [note1]
If the Session line says "no row", the error could be triggered when multiple
sessions try to update the same block while the free space for the block is
already too small for Oracle to automatically increase ITLs (interested
transaction lists), regardless the value of MAXTRANS. You can try rebuilding
the table with a higher INITRANS (initial ITL entries) and setting a higher
value of PCTFREE. Beginning with 8i, the easiest way to do this is ALTER TABLE
yourtable MOVE STORAGE (INITRANS xxx).
Another scenario which does happen often is that you don't have indexes on
child table foreign key columns and multiple sessions are updating the parent
table primary key columns or deleting rows from the parent table. Both sessions
will attempt to acquire a shared table lock on the child table and if they
already hold row locks (on different rows), a deadlock results. In this case,
you may build indexes on foreign keys in the child table. Numerous sites
provide scripts to find foreign keys that are missing indexes
(osi.oracle.com/~tkyte/unindex/ and www.jlcomp.demon.co.uk/faq/fk_ind.html). If
you don't do this (possibly for legitimate reason such as child table DML
performance), make sure you never or rarely delete rows from the parent table
(updating parent table primary keys is very rare). Or as Steve Adams suggested,
ALTER TABLE childtable DISABLE TABLE LOCK.
Further reading: Metalink Note:62365.1
********** ORA-4020 **********
Reproduction:
SQL> create procedure p1 as begin p2; end;
2 /
Warning: Procedure created with compilation errors.
SQL> create procedure p2 as begin p1; end;
2 /
Warning: Procedure created with compilation errors.
SQL> alter procedure p1 compile;
alter procedure p1 compile
*
ERROR at line 1:
ORA-04020: deadlock detected while trying to lock object YONG.P1
The above case is modified from an example in Connor McDonald's "Mastering
Oracle PL/SQL: Practical Solutions". Obviously, you can't have mutual library
cache locks in each other's code.
More commonly, this error is thrown due to invalid objects especially triggers,
or due to Oracle bugs. E.g., Jonathan Lewis in "Practical Oracle8i" reported
that query throws ORA-4020 if partitioned IOT maintenance is going on. An
ORA-4020 deadlock graph looks like the following:
ORA-04020: deadlock detected while trying to lock object YONG.P1
--------------------------------------------------------
object waiting waiting blocking blocking
handle session lock mode session lock mode
-------- -------- -------- ---- -------- -------- ----
797992F8 7AA06C48 79FA5554 X 7AA06C48 79F8B3D0 X
--------------------------------------------------------
Note that in this case, the waiting session is the same as the blocking session
because they have the same session address.
Remedy:
If the deadlock is caused by the same session that experiences the deadlock,
make sure you don't have any invalid objects on the table reported in the
deadlock trace file, such as invalid triggers. If you do, manually compile
them. With this deadlock, it's helpful to get more insight by tracing the event
4020 as follows: alter session set events '4020 trace name processstate
forever, level 10'; or if you don't know or can't access the session, put
event='4020 trace name processstate forever, level 10' in init.ora file and
bounce the database. The trace file thus created when the deadlock occurs next
time may have strings like "status=INVL". Look at the corresponding object name
after "name=". If you do experience ORA-4020 and it happens fairly often, search
on Metalink or open a tar.
********** Deadlocks in RAC **********
There's no error number reported but both ORA-60 and ORA-4020 are possible. In
alert.log, we only see "Global Enqueue Services Deadlock detected. More info in
filen /home/oracle/admin/<SID>/bdump/<SID>_lmd0_<pid>.trc." ["filen" is not
typo; it means "file name".]
LMD trace file has:
user session for deadlock lock 0x7553ab14
pid=33 serial=37657 audsid=6450105 user: 246/<LinuxUsername>
O/S info: user: <WindowsUsername>, term: MISFBB10-54, ospid: 4148:5156, machine: <NTdomain>\<PCname>
program: sqlplus.exe
application name: SQL*Plus, hash value=3669949024
Current SQL Statement:
update test set a = :"SYS_B_0" where a = :"SYS_B_1"
Global Wait-For-Graph(WFG) at ddTS[0.28] :
BLOCKED 0x7553ab14 5 [0xf001d][0x8353],[TX] [2162689,7995] 0
BLOCKER 0x69cabb5c 5 [0xf001d][0x8353],[TX] [1769474,4478] 1
BLOCKED 0x69cabc30 5 [0x70014][0xe908],[TX] [1769474,4478] 1
BLOCKER 0x7e1424e8 5 [0x70014][0xe908],[TX] [2162689,7995] 0
That was Oracle 9.2.0.7.0 on Linux accessed from a Windows PC. According to
Note:262226.1, the first process in instance 0 (on the first line) requested for
a TX lock in mode 5 or exclusive [note2] and was blocked by the second process in
instance 1 (on the second line), which requested for another TX lock (third
line) and was blocked by the first process (fourth line). The note interprets
the numbers after the lock name, e.g. [2162689,7995], as process ID. But if
there're two numbers in square brackets, those two numbers are transaction_id0
and transaction_id1 of v$ges_blocking_enqueue, respectively, or equivalently
the same columns in v$dlm_locks. (But I don't know what the transaction number
means; it doesn't seem to be concatenation of xidusn.xidslot.xidsqn of
v$transaction.) In addition, the hex numbers following "BLOCKED" or "BLOCKER"
are handles (lock pointers) of v$ges_blocking_enqueue, and the two hex numbers
after lock mode are id1 and id2 in v$lock (refer to
http://yong321.freeshell.org/oranotes/LocksWhere2FindMoreInfo.txt on how to
find their meanings).
The above trace shows the SQL involved in the deadlock. But in some cases, the
SQL is missing,[note3] as in this example:
Global Wait-For-Graph(WFG) at ddTS[0.15] :
BLOCKED 0x70d17058 5 [0xf4dd2a46][0x57e4077],[LB] 30268 0
BLOCKER 0x70d16594 5 [0xf4dd2a46][0x57e4077],[LB] 30268 0
This deadlock was caused by B-namespace library cache lock requested and
blocked by itself. The process PID is 30268 (a single number, not in brackets).
Unfortunately it's missing the SQL. Note:262226.1 acknowledges that it would be
more helpful to add the SQL to the WFG trace. In the meantime, if the deadlock
can be reproduced with one instance (as in this case because the last column,
instance number, is the same number), then do so by setting cluster_database
to false; a deadlock graph in a non-cluster database is much more useful than a
global WFG, which is created even if all involved processes are on one node.
Yong Huang
[email protected]
________________________
[note1]
It's often suggested that applications should be written in a way to ensure the
same order in running DMLs on the table. This means ProgramI should update
tblX then tblY and ProgramII should also update them in this order. But
there're many other cases than this rule should apply. Consider this case:
if conditionA, update tblX
if conditionB, update tblY
if conditionC, update tblX
if conditionD, update tblY
If this single program (no ProgramII) is run by multiple processes, depending
on the conditions met in each process, a deadlock could occur.
[note2]
According to
http://www.rachelp.nl/index_kb.php?menu=articles&actie=show&id=15
these mode numbers start with 0. So mode 5 here means exclusive, which would
be mode 6 in non-RAC lock mode.
[note3]
According to Dusan Bolek (message 10 of
http://groups.google.com/group/comp.databases.oracle.server/browse_frm/thread/a1e9aacfc4e16e25/),
the SQL is present if the deadlock occurs between instances and missing if
on the same instance.
ORA-12829
ORA-99
12345678901234567890123456789012345678901234567890123456789012345678901234567890