select for update语句造成ORA-00060 deadlock死锁问题分析

测试环境出现问题

今天 ,收到项目组通知说发生了死锁,让我查一下死锁出现的原因

首先 ,登录数据库,查看trace日志所在路径

  1. SYS @ cams > show parameter dump ;

  2. NAME TYPE VALUE

  3. ------------------------------------ ----------- ------------------------------

  4. background_core_dump      string  partial

  5. background_dump_dest      string / u01 / app / oracle / diag / rdbms / cams / cams / trace

  6. core_dump_dest      string / u01 / app / oracle / diag / rdbms / cams / cams / cdump

  7. max_dump_file_size      string  unlimited

  8. shadow_core_dump      string  PARTIAL

  9. user_dump_dest      string / u01 / app / oracle / diag / rdbms / cams / cams / trace

查看alert 日志,搜索 00060

  1. [ oracle@db trace ] $ cd / u01 / app / oracle / diag / rdbms / cams / cams / trace /

  2. [ oracle@db trace ] $ vi alert_cams . log

找到 报错trace日志如下:

  1. ORA-00060 : Deadlock detected . More info in file / u01 / app / oracle / diag / rdbms / cams / cams / trace / cams_ora_12850 . trc .

  2. Tue Sep 12 13 : 28 : 31 2017

  3. ORA-00060 : Deadlock detected . More info in file / u01 / app / oracle / diag / rdbms / cams / cams / trace / cams_ora_12854 . trc .

  4. Tue Sep 12 13 : 28 : 36 2017

  5. ORA-00060 : Deadlock detected . More info in file / u01 / app / oracle / diag / rdbms / cams / cams / trace / cams_ora_12822 . trc .

  6. Tue Sep 12 13 : 31 : 12 2017

打开 其中一个 trace文件,部分内容如下

  1. Trace file / u01 / app / oracle / diag / rdbms / cams / cams / trace / cams_ora_12850 . trc

  2. Oracle Database 11g Enterprise Edition Release 11 . 2 . . 4 . 0 - 64bit Production

  3. With the Partitioning , OLAP , Data Mining and Real Application Testing options

  4. ORACLE_HOME = / u01 / app / oracle / product / 11 . 2 . / db_1

  5. System name : Linux

  6. Node name : db

  7. Release : 2 . 6 . 32-431 . el6 . x86_64

  8. Version : # 1 SMP Sun Nov 10 22 : 19 : 54 EST 2013

  9. Machine : x86_64

  10. VM name : VMWare Version : 6

  11. Instance name : cams

  12. Redo thread mounted by this instance : 1

  13. Oracle process number : 293

  14. Unix process pid : 12850 , image : oracle@db

  15. * * * 2017-09-12 13 : 28 : 28 . 401

  16. * * * SESSION ID : ( 1139 . 47765 ) 2017-09-12 13 : 28 : 28 . 401

  17. * * * CLIENT ID : ( ) 2017-09-12 13 : 28 : 28 . 401

  18. * * * SERVICE NAME : ( SYS$USERS ) 2017-09-12 13 : 28 : 28 . 401

  19. * * * MODULE NAME : ( JDBC Thin Client ) 2017-09-12 13 : 28 : 28 . 401

  20. * * * ACTION NAME : ( ) 2017-09-12 13 : 28 : 28 . 401

  21.  

  22.  

  23. * * * 2017-09-12 13 : 28 : 28 . 401

  24. DEADLOCK DETECTED ( ORA-00060 )

  25.  

  26. [ Transaction Deadlock ]

  27.   这里指出死锁不是Oracle自身的错误,是用户的设计造成的问题

  28. The following deadlock is not an ORACLE error . It is a

  29. deadlock due to user error in the design of an application

  30. or from issuing incorrect ad-hoc SQL . The following

  31. information may aid in determining the deadlock :

  32.  

  33. 这个死锁图表明两个session各持有一份资源,等待对方持有的资源

  34. Deadlock graph :

  35.                        ---------Blocker ( s ) - - - - - - - - ---------Waiter ( s ) - - - - - - - - -

  36. Resource Name process session holds waits  process session holds waits

  37. TX-00160018-00000adc       293    1139     X            384      65           X

  38. TX-0004000f-000064d6       384      65     X            293    1139           X

  39.  

  40. session 1139 : DID 0001-0125-000074A5 session 65 : DID 0001-0180-000037EC

  41. session 65 : DID 0001-0180-000037EC session 1139 : DID 0001-0125-000074A5

  42.  

  43. Rows waited on :

  44.   Session 1139 : obj - rowid = 00021C7B - AAAhx7AAHAAARMrAAB

  45.    ( dictionary objn - 138363 , file - 7 , block - 70443 , slot - 1 )

  46.   Session 65 : obj - rowid = 00021C7B - AAAhx7AAHAAARMtAAD

  47.    ( dictionary objn - 138363 , file - 7 , block - 70445 , slot - 3 )

  48.  

  49. ----- Information for the OTHER waiting sessions -----

  50. Session 65 :

  51.   sid : 65 ser : 27869 audsid : 3026187 user : 111 / CAMS_CORE

  52.     flags : ( 0x45 ) USR / - flags_idl : ( 0x1 ) BSY / - / - / - / - / -

  53.     flags2 : ( 0x40009 ) - / - / INC

  54.   pid : 384 O / S info : user : oracle , term : UNKNOWN , ospid : 12854

  55.     image : oracle@db

  56.   client details :

  57.     O / S info : user : cams , term : unknown , ospid : 1234

  58.     machine : yy program : JDBC Thin Client

  59. application name : JDBC Thin Client , hash value = 2546894660

  60.   current SQL :

  61. 导致死锁的SQL语句1

  62.   select

  63.  

  64. SEQ_NO , ACO_AC , AC_DTE , CCY , BAL , FRZ_AMT ,

  65. CRE_TLR , CRE_DTE ,

  66. CRE_BR , CRE_BK , UPD_TLR , UPD_DTE , UPD_BR , UPD_BK , TSL_AMT , VPS_BAL

  67.  

  68. FROM CD_ACBALANCE

  69.  WHERE  ACO_AC = : 1

  70. FOR UPDATE

  71.  

  72. ----- End of information for the OTHER waiting sessions -----

  73.  

  74. Information for THIS session :

  75.  

  76. ----- Current SQL Statement for this session ( sql_id = cfy88pmyts0fn ) -----

  77. 导致死锁的SQL语句2

  78. select

  79.  

  80. SEQ_NO , ACO_AC , AC_DTE , CCY , BAL , FRZ_AMT ,

  81. CRE_TLR , CRE_DTE ,

  82. CRE_BR , CRE_BK , UPD_TLR , UPD_DTE , UPD_BR , UPD_BK , TSL_AMT , VPS_BAL

  83.  

  84. FROM CD_ACBALANCE

  85.  WHERE  ACO_AC = : 1

  86. FOR UPDATE

  87. = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =

这里可以明确的是 select for update语句造成了死锁。但是考虑到环境为测试环境,没开启归档,不能通过logminer挖掘故障期间的SQL事物,而且开发人员已经通过kill session的方式处理了死锁,所以需要在别的环境中重现该问题,并进行分析。

重现死锁问题

首先连上数据库,启用HR用户

  1. [ oracle@prod ~ ] $ sqlplus / as sysdba

  2. SQL * Plus : Release 11 . 2 . . 4 . 0 Production on Wed Sep 13 03 : 14 : 17 2017

  3. Copyright ( c ) 1982 , 2013 , Oracle . All rights reserved .

  4. Connected to an idle instance .

  5. SYS@PROD > startup ;

  6. ORACLE instance started .

  7. Total System Global Area 1202556928 bytes

  8. Fixed Size 2252704 bytes

  9. Variable Size 771752032 bytes

  10. Database Buffers 419430400 bytes

  11. Redo Buffers 9121792 bytes

  12. Database mounted .

  13. Database opened .

  14. SYS@PROD > conn hr / hr

  15. ERROR :

  16. ORA-28000 : the account is locked

  17. Warning : You are no longer connected to ORACLE .

  18. @ > conn / as sysdba

  19. Connected .

  20. SYS@PROD > alter user hr identified by hr account unlock ;

  21. User altered .

  22. SYS@PROD > conn hr / hr

  23. Connected .

选中COUNTRIES表作为测试对象

  1. HR@PROD > desc countries ;

  2.   Name Null ? Type

  3.  ----------------------------------------- -------- ----------------------------

  4.  COUNTRY_ID NOT NULL CHAR ( 2 )

  5.  COUNTRY_NAME     VARCHAR2 ( 40 )

  6.  REGION_ID     NUMBER

  7. HR@PROD > select count ( * ) from countries ;

  8.   COUNT ( * )

  9. ----------

  10. 25

  11. HR@PROD > select count ( distinct ( COUNTRY_ID ) ) from countries ;

  12. COUNT ( DISTINCT ( COUNTRY_ID ) )

  13. ---------------------------

  14.  25


经过验证,COUNTRIES表中的COUNTRY_ID字段是唯一的。

用COUNTRIES表新建一个test表,在test表中做测试

  1. HR@PROD > create table test as select * from countries ;

  2. Table created .

  3. HR@PROD > desc test

  4.   Name Null ? Type

  5.  ----------------------------------------- -------- ----------------------------

  6.  COUNTRY_ID NOT NULL CHAR ( 2 )

  7.  COUNTRY_NAME     VARCHAR2 ( 40 )

  8.  REGION_ID     NUMBER

  9. HR@PROD > select country_id , country_name , region_id from countries ;

  10. CO COUNTRY_NAME      REGION_ID

  11. -- ---------------------------------------- ----------

  12. AR Argentina      2

  13. AU Australia      3

  14. BE Belgium      1

  15. BR Brazil      2

  16. CA Canada      2

  17. CH Switzerland      1

  18. CN China      3

  19. DE Germany      1

  20. DK Denmark      1

  21. EG Egypt      4

  22. FR France      1

  23. CO COUNTRY_NAME      REGION_ID

  24. -- ---------------------------------------- ----------

  25. IL Israel      4

  26. IN India      3

  27. IT Italy      1

  28. JP Japan      3

  29. KW Kuwait      4

  30. ML Malaysia      3

  31. MX Mexico      2

  32. NG Nigeria      4

  33. NL Netherlands      1

  34. SG Singapore      3

  35. UK United Kingdom      1

  36. CO COUNTRY_NAME      REGION_ID

  37. -- ---------------------------------------- ----------

  38. US United States of America      2

  39. ZM Zambia      4

  40. ZW Zimbabwe      4

  41. 25 rows selected .


情形1:

两个select for update语句想要同时锁定一条语句。

Session1:

  1. HR@PROD > select country_id , country_name , region_id from test where country_id = 'AU' for update ;

  2. CO COUNTRY_NAME      REGION_ID

  3. -- ---------------------------------------- ----------

  4. AU Australia      3


Sessions2:

  1. HR@PROD > select country_id , country_name , region_id from test where country_id = 'AU' for update ;


这时,session2会一直处于锁等待状态,而不会出现死锁。当把第一个事物commit或者rollback之后,第二个事物会继续执行。

Session1:

  1. HR@PROD > rollback ;

  2. Rollback complete .


Session2:

  1. HR@PROD > select country_id , country_name , region_id from test where country_id = 'AU' for update ;

  2. CO COUNTRY_NAME      REGION_ID

  3. -- ---------------------------------------- ----------

  4. AU Australia


显然,两个select for update语句想要同时锁定一条语句,并不会出现死锁,而会出现锁等待的现象。

情形2:

第一个select for update事物锁定A row后,想要再锁定B row;第二个select for update事物锁定B row后,想要锁定A row。

Session1:

  1. HR@PROD > select country_id , country_name , region_id from test where country_id = 'AU' for update ;

  2. CO COUNTRY_NAME      REGION_ID

  3. -- ---------------------------------------- ----------

  4. AU Australia      3


Session 2:

  1. HR@PROD > select country_id , country_name , region_id from test where country_id = 'UK' for update ;

  2. CO COUNTRY_NAME      REGION_ID

  3. -- ---------------------------------------- ----------

  4. UK United Kingdom      1


如果Session1想要继续持有Session2正在持有的row:

Session1:

  1. HR@PROD > select country_id , country_name , region_id from test where country_id = 'UK' for update ;


这时,会出现锁等待的现场,和上一场景类似。如果Session2也想持有Session1正在持有的row:

Session2:

  1. HR@PROD > select country_id , country_name , region_id from test where country_id = 'AU' for update ;


这时,Session2还处于等待状态,但是Session1出现死锁:

Session1:

  1. HR@PROD > select country_id , country_name , region_id from test where country_id = 'UK' for update ;

  2. select country_id , country_name , region_id from test where country_id = 'UK' for update

  3. *

  4. ERROR at line 1 :

  5. ORA-00060 : deadlock detected while waiting for resource


这时,死锁的现象已经重现,而且,场景2从头到尾只使用了一个select for update语句,只是换了参数而已。

定位死锁语句

通过检查数据库表,能够检查出是哪一条语句被死锁,产生死锁的机器是哪一台。

1)用dba用户执行以下语句

  1. select c . owner , c . object_name , c . object_type , b . sid ,

  2. b . serial# , b . lockwait , b . status , b . osuser , b . machine , b . process , b . program

  3. from v$locked_object a ,

  4. v$ session b ,

  5. dba_objects c

  6. where b . sid = a . session_id

  7. and a . object_id = c . object_id ;


如果有输出的结果,则说明有死锁,且能看到死锁的机器是哪一台。字段说明:

OWNER:死锁语句所用的数据库用户,这里是HR用户。

OBJECT_NAME:产生死锁的对象,这里是TEST表。

OBJECT_TYPE:产生死锁的对象类型,这里是TABLE。

SID:SESSION标识,常用于连接 其它

SERIAL#: SID有可能会重复,当两个session的SID重复时,SERIAL#用来区别session

LOCKWAIT: 可以通过这个字段查询出当前正在等待的锁的相关信息 ,如果有内容表示被死锁或者有锁等待事件。

STATUS: 用来判断session状态。 ACTIVE :正执行 SQL语句 INACTIVE :等待操作。 KILLED :被标注为杀死。

OSUSER:客户端操作系统用户名。

MACHINE:客户端操作系统的机器名。

PROCESS:客户端进程的ID。

PROGRAM:客户端执行的应用程序。

2)用dba用户执行以下语句,可以查看到被死锁的语句。

  1. select sql_text from v$sql where ( address , hash_value ) in

  2. ( select sql_address , sql_hash_value from v$ session where sid in

  3. ( select session_id from v$locked_object ) ) ;


或者


  1. select sql_text from v$sql where ( address , hash_value ) in

  2. ( select sql_address , sql_hash_value from v$ session where lockwait is not null ) ;


这里查出来出现死锁的语句和之前的测试结果一致。

3) 分析trace日志文件

死锁发生时,可以再trace日志文件中找到如下语句:

  1. ORA-00060 : Deadlock detected . More info in file / u01 / app / oracle / diag / rdbms / prod / PROD / trace / PROD_ora_4852 . trc .


打开trace文件,可以找到死锁的语句:

  1. ----- Information for the OTHER waiting sessions -----

  2. Session 31 :

  3.   sid : 31 ser : 25 audsid : 110280 user : 84 / HR

  4.     flags : ( 0x45 ) USR / - flags_idl : ( 0x1 ) BSY / - / - / - / - / -

  5.     flags2 : ( 0x40009 ) - / - / INC

  6.   pid : 23 O / S info : user : oracle , term : UNKNOWN , ospid : 4853

  7.     image : oracle@prod ( TNS V1-V3 )

  8.   client details :

  9.     O / S info : user : oracle , term : pts / 1 , ospid : 3149

  10.     machine : prod program : sqlplus@prod ( TNS V1-V3 )

  11.     application name : SQL * Plus , hash value = 3669949024

  12.   current SQL :

  13.   select country_id , country_name , region_id from test where country_id = 'AU' for update

  14.  

  15. ----- End of information for the OTHER waiting sessions -----

  16.  

  17. Information for THIS session :

  18.  

  19. ----- Current SQL Statement for this session ( sql_id = 6n5kjs2twrwwq ) -----

  20. select country_id , country_name , region_id from test where country_id = 'UK' for update

  21. = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =


解决死锁问题

方法一(杀进程)

1)查找死锁的进程:


  1. select sid , serial# , username , command , lockwait , osuser from v$ session where lockwait is not null ;


2)kill掉这个死锁的进程:


  1. alter system kill session 'sid, serial#' ;


这里执行语句为:alter system kill session '29,69';

原先造成死锁的进程被killed。

select for update语句造成ORA-00060 deadlock死锁问题分析_第1张图片

3) 如果还不能解决,使用杀系统进程的方式处理:

这里为了测试,再次重现了死锁,并使用杀进程方式进行处理。


  1. select p . spid from v$ session s , v$process p where s . sid = XXX and s . paddr = p . addr ;


其中,XXX使用第一步查出来的SID参数替代

这里查出来的进程号为:4257


  1. [ oracle@prod ~ ] $ ps -ef | grep 4257

  2. oracle     4257   3149  0 07 : 08 ? 00 : 00 : 00 oraclePROD ( DESCRIPTION = ( LOCAL = YES ) ( ADDRESS = ( PROTOCOL = beq ) ) )

  3. oracle     4321   4000  0 07 : 21 pts / 4    00 : 00 : 00 grep 4257

  4. [ oracle@prod ~ ] $ kill -9 4257

  5. [ oracle@prod ~ ] $ ps -ef | grep 4257

  6. oracle     4327   4000  0 07 : 21 pts / 4    00 : 00 : 00 grep 4257


杀完进程之后,造成死锁的进程被杀死

select for update语句造成ORA-00060 deadlock死锁问题分析_第2张图片

查不到死锁进程

方法二(重启库)


  1. SYS@PROD > shutdown immediate ;

  2. Database closed .

  3. Database dismounted .

  4. ORACLE instance shut down .

  5. SYS@PROD > startup ;

  6. ORACLE instance started .

  7. Total System Global Area 1202556928 bytes

  8. Fixed Size     2252704 bytes

  9. Variable Size   771752032 bytes

  10. Database Buffers   419430400 bytes

  11. Redo Buffers     9121792 bytes

  12. Database mounted .

  13. Database opened .


查看死锁进程:

方法三(commit || rollback)

Session1:


  1. HR@PROD > select country_id , country_name , region_id from test where country_id = 'AU' for update ;

  2. CO COUNTRY_NAME      REGION_ID

  3. -- ---------------------------------------- ----------

  4. AU Australia      3


Session2:


  1. HR@PROD > select country_id , country_name , region_id from test where country_id = 'UK' for update ;

  2. CO COUNTRY_NAME      REGION_ID

  3. -- ---------------------------------------- ----------

  4. UK United Kingdom      1


Session1:


  1. HR@PROD > select country_id , country_name , region_id from test where country_id = 'UK' for update ;

  2. Waiting……


Session2:


  1. HR@PROD > select country_id , country_name , region_id from test where country_id = 'AU' for update ;

  2. Waiting……


Session1:


  1. HR@PROD > select country_id , country_name , region_id from test where country_id = 'UK' for update ;

  2. select country_id , country_name , region_id from test where country_id = 'UK' for update

  3.                                                *

  4. ERROR at line 1 :

  5. ORA-00060 : deadlock detected while waiting for resource


select for update语句造成ORA-00060 deadlock死锁问题分析_第3张图片

这里Session1出现死锁,只要执行commit或者rollback就可以解除死锁,只不过事务中第一个SQL执行成功,第二个SQL执行失败。

Session1:


  1. HR@PROD > commit ;

  2. Commit complete .


Session2:


  1. HR@PROD > select country_id , country_name , region_id from test where country_id = 'AU' for update ;

  2. CO COUNTRY_NAME      REGION_ID

  3. -- ---------------------------------------- ----------

  4. AU Australia      3


此时,死锁状态解除:

问题总结

最后,通过与开发人员交流,得出的结论是出现该问题的原因不是程序设计的问题,而是在开发环境中,有人通过debug模式连到服务器上进行代码调试,有人通过客户端的形式访问服务器上的应用,当两者同时调试某一功能时(通过select for update的语句进行一张表中的数据访问),在特殊的情况下,出现了死锁的情况。

当然,在开发环境中比较容易出现这类情况,如果需要避免,就需要使用该开发环境的开发人员与测试人员做好沟通了。对于容易出现锁表的功能,可以要求测试人员在测试环境中测试,除特殊情况,尽量减少在开发环境中测试的次数,以免锁表影响开发测试进度。

来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/31394774/viewspace-2144941/,如需转载,请注明出处,否则将追究法律责任。

转载于:http://blog.itpub.net/31394774/viewspace-2144941/

你可能感兴趣的:(select for update语句造成ORA-00060 deadlock死锁问题分析)