先说说什么是死锁(Deadlock),关于死锁的定义google、baidu可以轻易查到,我也不想引经据典,我用一个简单故事来说明一下死锁。
话说一个风和日丽的下午,小明和小强打架,由于情节严重被老师批评教育不说还得放学后写悔过书,大家知道写悔过书要纸和笔,恰巧在刚才那场大战中小明将小 强全部作业本和草稿纸撕碎,而小强将小明所有笔摔个稀巴烂,现在两人势如水火,别说借东西,话都不想说,于是就这么等啊等,等到老师来看他们的悔过书写得 怎么样,结果当然是“还没开始写”,于是老师说:“小明你把纸借给小强,让小强写”。
哈哈整个故事有点牵强,但这就是一个经典死锁场景,如果老师不出来调停,他两就只能这样等下去,俗语叫“等死”了。
说回正题,Oracle 特殊的锁管理模式使发生死锁的几率大大减少,但是,要相信“一切皆有可能”,以后几篇博客对会分析导致 ORA-00060 的各种场景和处理方法。
今天说说第一个场景,也是网上能找到最多的场景,我都不太好意思在标题上加个“[原]”标志了。
还是使用经典的 scott demo 吧:
开两个会话,看看各自的sid:
1
|
select sid from v$mystat where rownum=1;
|
我这里两个会话的 sid 分别是 126 和128 。看看示例数据:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
|
scott$mydb@test02 SQL> set pagesize 50 ;
scott$mydb@test02 SQL> select * from emp ;
EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO
---------- ---------- --------- ---------- ------------------- ---------- ---------- ----------
7369 SMITH CLERK 7902 1980-12-17 00:00:00 800 20
7499 ALLEN SALESMAN 7698 1981-02-20 00:00:00 1600 300 30
7521 WARD SALESMAN 7698 1981-02-22 00:00:00 1250 500 30
7566 JONES MANAGER 7839 1981-04-02 00:00:00 2975 20
7654 MARTIN SALESMAN 7698 1981-09-28 00:00:00 1250 1400 30
7698 BLAKE MANAGER 7839 1981-05-01 00:00:00 2850 30
7782 CLARK MANAGER 7839 1981-06-09 00:00:00 2450 10
7788 SCOTT ANALYST 7566 1987-04-19 00:00:00 3000 20
7839 KING PRESIDENT 1981-11-17 00:00:00 5000 10
7844 TURNER SALESMAN 7698 1981-09-08 00:00:00 1500 0 30
7876 ADAMS CLERK 7788 1987-05-23 00:00:00 1100 20
7900 JAMES CLERK 7698 1981-12-03 00:00:00 950 30
7902 FORD ANALYST 7566 1981-12-03 00:00:00 3000 20
7934 MILLER CLERK 7782 1982-01-23 00:00:00 1300 10
|
我想为工资(SAL)最低的两个人(ENAME:Smith、James EMPNO:7369、7900)加工资1元(我承认这次工资的涨幅最不上CPI),而另一位毫不知情的 Manager Blake 也想给他们加1元的工资,我在 sid 为 126 进行操作,而 Blake 在 sid 为 128 的会话中操作,执行顺序如下表:
+--------------------------------------------+--------------------------------------------+ | Session 1 (sid=126) | Session 2 (sid=128) | +--------------------------------------------+--------------------------------------------+ | update emp set sal=sal+1 where empno=7369; | | +--------------------------------------------+--------------------------------------------+ | | update emp set sal=sal+1 where empno=7900; | +--------------------------------------------+--------------------------------------------+ | update emp set sal=sal+1 where empno=7900; | | +--------------------------------------------+--------------------------------------------+ | | update emp set sal=sal+1 where empno=7369; | +--------------------------------------------+--------------------------------------------+ | ORA-00060: deadlock detected | | | while waiting for resource | | +--------------------------------------------+--------------------------------------------+
这样我们就成功地触发了一个ORA-00060,从aler文件中可以看到一条类似如下的信息:
1
|
ORA-00060: Deadlock detected. More info in file /u01/app/admin/mydb/udump/mydb_ora_7531.trc.
|
我们看看 trc 文件,其中最有用的一部分是 Deadlock graph :
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
|
[Transaction Deadlock]
Current SQL statement for this session:
update emp set sal=sal+1 where empno=7900
The following deadlock is not an ORACLE error. It is a
deadlock due to user error in the design of an application
or from issuing incorrect ad-hoc SQL. The following
information may aid in determining the deadlock:
Deadlock graph:
---------Blocker(s)-------- ---------Waiter(s)---------
Resource Name process session holds waits process session holds waits
TX-00010013-00005f96 32 126 X 34 128 X
TX-00030001-000065ed 34 128 X 32 126 X
session 126: DID 0001-0020-00003C79 session 128: DID 0001-0022-0000772F
session 128: DID 0001-0022-0000772F session 126: DID 0001-0020-00003C79
Rows waited on:
Session 128: obj - rowid = 0000C7CF - AAAMfPAAEAAAAAgAAA
(dictionary objn - 51151, file - 4, block - 32, slot - 0)
Session 126: obj - rowid = 0000C7CF - AAAMfPAAEAAAAAgAAL
(dictionary objn - 51151, file - 4, block - 32, slot - 11)
Information on the OTHER waiting sessions:
Session 128:
pid=34 serial=31980 audsid=319634 user: 54/SCOTT
O/S info: user: oracle, term: pts/8, ospid: 22771, machine: test02
program: sqlplus@test02 (TNS V1-V3)
application name: SQL*Plus, hash value=3669949024
Current SQL Statement:
update emp set sal=sal+1 where empno=7369
End of information on OTHER waiting sessions.
|
大家可以看到引起死锁的语句(一个巴掌拍不响,一般死锁都要2条或以上的语句才能引起死锁),还有注意产生的object,这里的objn是51151,正是emp表:
1
2
3
4
5
|
scott$mydb@test02 SQL> select object_id,object_name from user_objects where object_name='EMP';
OBJECT_ID OBJECT_NAME
---------- ------------------------------
51151 EMP
|
如果八卦一点,还可以看看引起死锁的相关者都在等什么而“等死”了:
1
2
3
4
5
|
Rows waited on:
Session 128: obj - rowid = 0000C7CF - AAAMfPAAEAAAAAgAAA
(dictionary objn - 51151, file - 4, block - 32, slot - 0)
Session 126: obj - rowid = 0000C7CF - AAAMfPAAEAAAAAgAAL
(dictionary objn - 51151, file - 4, block - 32, slot - 11)
|
session 128 (也就是session 2,sid=128) 等着要ROWID=AAAMfPAAEAAAAAgAAA的行锁,而session 126 (也就是session 1,sid=126)等着要ROWID=AAAMfPAAEAAAAAgAAL的行,验证一下:
1
2
3
4
5
6
|
scott$mydb@test02 SQL> select rowid,empno from emp where empno in (7369,7900) ;
ROWID EMPNO
------------------ ----------
AAAMfPAAEAAAAAgAAA 7369
AAAMfPAAEAAAAAgAAL 7900
|
对照一下上面的表格,注意一下 update 语句的顺序,你就明白了。
大家可以类比一下之前所说的故事,假设empno为 7369 和 7900 的行是纸和笔,session 1 和 session 2是小明和小强,最后老师Oracle跑出来调停。
解决死锁的方法之一是给资源编号,然后按照固定的顺序进行访问,简单来说,就是先改编号小的再改编号大的(翻过来亦然),如下表:
+--------------------------------------------+--------------------------------------------+ | Session 1 (sid=126) | Session 2 (sid=128) | +--------------------------------------------+--------------------------------------------+ | update emp set sal=sal+1 where empno=7369; | | +--------------------------------------------+--------------------------------------------+ | | update emp set sal=sal+1 where empno=7369; | | | Waiting.... | +--------------------------------------------+--------------------------------------------+ | update emp set sal=sal+1 where empno=7900; | | +--------------------------------------------+--------------------------------------------+ | commit/rollback; | | +--------------------------------------------+--------------------------------------------+ | | update emp set sal=sal+1 where empno=7900; | +--------------------------------------------+--------------------------------------------+
大家可以看到加了两次工资,而且session 2 被 session 1 阻塞了。对于用户感受不好,如果session 1 一直结束事务(commit/rollback),session 2 只能一直等啊等,比deadlock后,oracle出面调停还要糟糕,那怎么办呢?
可以 select … for update nowait 语句测试一下需要更改的行是否被锁定,如果没有被锁定那这个语句会马上给这行加锁,如果该已经被锁定那就马上返回 ORA-00054: resource busy and acquire with NOWAIT specified ,如下表所示:
+--------------------------------------------+--------------------------------------------+ | Session 1 (sid=126) | Session 2 (sid=128) | +--------------------------------------------+--------------------------------------------+ | select * from emp where empno in(7369,7900)| | | for update nowait ; | | +--------------------------------------------+--------------------------------------------+ | | select * from emp where empno in(7369,7900)| | | for update nowait ; | | | ORA-00054: resource busy and acquire | | | with NOWAIT specified | +--------------------------------------------+--------------------------------------------+ | update emp set sal=sal+1 where empno=7369; | | +--------------------------------------------+--------------------------------------------+ | update emp set sal=sal+1 where empno=7900; | | +--------------------------------------------+--------------------------------------------+
有人可能会说,这样做我的程序改动太大了,毕竟要引入一个select … for update nowait 和 ORA-00054 的判断,有没有更好的办法呢?
有,更经典的处理死锁的算法——“鸵鸟算法”,简单来说就是“不管”,反正Oracle最终会出来调停的。
来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/27042095/viewspace-759451/,如需转载,请注明出处,否则将追究法律责任。
转载于:http://blog.itpub.net/27042095/viewspace-759451/