今天 ,收到项目组通知说发生了死锁,让我查一下死锁出现的原因 。
首先 ,登录数据库,查看trace日志所在路径
SYS @ cams > show parameter dump ;
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
background_core_dump string partial
background_dump_dest string / u01 / app / oracle / diag / rdbms / cams / cams / trace
core_dump_dest string / u01 / app / oracle / diag / rdbms / cams / cams / cdump
max_dump_file_size string unlimited
shadow_core_dump string PARTIAL
user_dump_dest string / u01 / app / oracle / diag / rdbms / cams / cams / trace
查看alert 日志,搜索 00060
[ oracle@db trace ] $ cd / u01 / app / oracle / diag / rdbms / cams / cams / trace /
[ oracle@db trace ] $ vi alert_cams . log
找到 报错trace日志如下:
ORA-00060 : Deadlock detected . More info in file / u01 / app / oracle / diag / rdbms / cams / cams / trace / cams_ora_12850 . trc .
Tue Sep 12 13 : 28 : 31 2017
ORA-00060 : Deadlock detected . More info in file / u01 / app / oracle / diag / rdbms / cams / cams / trace / cams_ora_12854 . trc .
Tue Sep 12 13 : 28 : 36 2017
ORA-00060 : Deadlock detected . More info in file / u01 / app / oracle / diag / rdbms / cams / cams / trace / cams_ora_12822 . trc .
Tue Sep 12 13 : 31 : 12 2017
打开 其中一个 trace文件,部分内容如下 :
Trace file / u01 / app / oracle / diag / rdbms / cams / cams / trace / cams_ora_12850 . trc
Oracle Database 11g Enterprise Edition Release 11 . 2 . . 4 . 0 - 64bit Production
With the Partitioning , OLAP , Data Mining and Real Application Testing options
ORACLE_HOME = / u01 / app / oracle / product / 11 . 2 . / db_1
System name : Linux
Node name : db
Release : 2 . 6 . 32-431 . el6 . x86_64
Version : # 1 SMP Sun Nov 10 22 : 19 : 54 EST 2013
Machine : x86_64
VM name : VMWare Version : 6
Instance name : cams
Redo thread mounted by this instance : 1
Oracle process number : 293
Unix process pid : 12850 , image : oracle@db
* * * 2017-09-12 13 : 28 : 28 . 401
* * * SESSION ID : ( 1139 . 47765 ) 2017-09-12 13 : 28 : 28 . 401
* * * CLIENT ID : ( ) 2017-09-12 13 : 28 : 28 . 401
* * * SERVICE NAME : ( SYS$USERS ) 2017-09-12 13 : 28 : 28 . 401
* * * MODULE NAME : ( JDBC Thin Client ) 2017-09-12 13 : 28 : 28 . 401
* * * ACTION NAME : ( ) 2017-09-12 13 : 28 : 28 . 401
* * * 2017-09-12 13 : 28 : 28 . 401
DEADLOCK DETECTED ( ORA-00060 )
[ Transaction Deadlock ]
这里指出死锁不是Oracle自身的错误,是用户的设计造成的问题
The following deadlock is not an ORACLE error . It is a
deadlock due to user error in the design of an application
or from issuing incorrect ad-hoc SQL . The following
information may aid in determining the deadlock :
这个死锁图表明两个session各持有一份资源,等待对方持有的资源
Deadlock graph :
---------Blocker ( s ) - - - - - - - - ---------Waiter ( s ) - - - - - - - - -
Resource Name process session holds waits process session holds waits
TX-00160018-00000adc 293 1139 X 384 65 X
TX-0004000f-000064d6 384 65 X 293 1139 X
session 1139 : DID 0001-0125-000074A5 session 65 : DID 0001-0180-000037EC
session 65 : DID 0001-0180-000037EC session 1139 : DID 0001-0125-000074A5
Rows waited on :
Session 1139 : obj - rowid = 00021C7B - AAAhx7AAHAAARMrAAB
( dictionary objn - 138363 , file - 7 , block - 70443 , slot - 1 )
Session 65 : obj - rowid = 00021C7B - AAAhx7AAHAAARMtAAD
( dictionary objn - 138363 , file - 7 , block - 70445 , slot - 3 )
----- Information for the OTHER waiting sessions -----
Session 65 :
sid : 65 ser : 27869 audsid : 3026187 user : 111 / CAMS_CORE
flags : ( 0x45 ) USR / - flags_idl : ( 0x1 ) BSY / - / - / - / - / -
flags2 : ( 0x40009 ) - / - / INC
pid : 384 O / S info : user : oracle , term : UNKNOWN , ospid : 12854
image : oracle@db
client details :
O / S info : user : cams , term : unknown , ospid : 1234
machine : yy program : JDBC Thin Client
application name : JDBC Thin Client , hash value = 2546894660
current SQL :
导致死锁的SQL语句1
select
SEQ_NO , ACO_AC , AC_DTE , CCY , BAL , FRZ_AMT ,
CRE_TLR , CRE_DTE ,
CRE_BR , CRE_BK , UPD_TLR , UPD_DTE , UPD_BR , UPD_BK , TSL_AMT , VPS_BAL
FROM CD_ACBALANCE
WHERE ACO_AC = : 1
FOR UPDATE
----- End of information for the OTHER waiting sessions -----
Information for THIS session :
----- Current SQL Statement for this session ( sql_id = cfy88pmyts0fn ) -----
导致死锁的SQL语句2
select
SEQ_NO , ACO_AC , AC_DTE , CCY , BAL , FRZ_AMT ,
CRE_TLR , CRE_DTE ,
CRE_BR , CRE_BK , UPD_TLR , UPD_DTE , UPD_BR , UPD_BK , TSL_AMT , VPS_BAL
FROM CD_ACBALANCE
WHERE ACO_AC = : 1
FOR UPDATE
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
这里可以明确的是 select for update语句造成了死锁。但是考虑到环境为测试环境,没开启归档,不能通过logminer挖掘故障期间的SQL事物,而且开发人员已经通过kill session的方式处理了死锁,所以需要在别的环境中重现该问题,并进行分析。
首先连上数据库,启用HR用户
[ oracle@prod ~ ] $ sqlplus / as sysdba
SQL * Plus : Release 11 . 2 . . 4 . 0 Production on Wed Sep 13 03 : 14 : 17 2017
Copyright ( c ) 1982 , 2013 , Oracle . All rights reserved .
Connected to an idle instance .
SYS@PROD > startup ;
ORACLE instance started .
Total System Global Area 1202556928 bytes
Fixed Size 2252704 bytes
Variable Size 771752032 bytes
Database Buffers 419430400 bytes
Redo Buffers 9121792 bytes
Database mounted .
Database opened .
SYS@PROD > conn hr / hr
ERROR :
ORA-28000 : the account is locked
Warning : You are no longer connected to ORACLE .
@ > conn / as sysdba
Connected .
SYS@PROD > alter user hr identified by hr account unlock ;
User altered .
SYS@PROD > conn hr / hr
Connected .
选中COUNTRIES表作为测试对象
HR@PROD > desc countries ;
Name Null ? Type
----------------------------------------- -------- ----------------------------
COUNTRY_ID NOT NULL CHAR ( 2 )
COUNTRY_NAME VARCHAR2 ( 40 )
REGION_ID NUMBER
HR@PROD > select count ( * ) from countries ;
COUNT ( * )
----------
25
HR@PROD > select count ( distinct ( COUNTRY_ID ) ) from countries ;
COUNT ( DISTINCT ( COUNTRY_ID ) )
---------------------------
25
经过验证,COUNTRIES表中的COUNTRY_ID字段是唯一的。
用COUNTRIES表新建一个test表,在test表中做测试
HR@PROD > create table test as select * from countries ;
Table created .
HR@PROD > desc test
Name Null ? Type
----------------------------------------- -------- ----------------------------
COUNTRY_ID NOT NULL CHAR ( 2 )
COUNTRY_NAME VARCHAR2 ( 40 )
REGION_ID NUMBER
HR@PROD > select country_id , country_name , region_id from countries ;
CO COUNTRY_NAME REGION_ID
-- ---------------------------------------- ----------
AR Argentina 2
AU Australia 3
BE Belgium 1
BR Brazil 2
CA Canada 2
CH Switzerland 1
CN China 3
DE Germany 1
DK Denmark 1
EG Egypt 4
FR France 1
CO COUNTRY_NAME REGION_ID
-- ---------------------------------------- ----------
IL Israel 4
IN India 3
IT Italy 1
JP Japan 3
KW Kuwait 4
ML Malaysia 3
MX Mexico 2
NG Nigeria 4
NL Netherlands 1
SG Singapore 3
UK United Kingdom 1
CO COUNTRY_NAME REGION_ID
-- ---------------------------------------- ----------
US United States of America 2
ZM Zambia 4
ZW Zimbabwe 4
25 rows selected .
情形1:
两个select for update语句想要同时锁定一条语句。
Session1:
HR@PROD > select country_id , country_name , region_id from test where country_id = 'AU' for update ;
CO COUNTRY_NAME REGION_ID
-- ---------------------------------------- ----------
AU Australia 3
Sessions2:
HR@PROD > select country_id , country_name , region_id from test where country_id = 'AU' for update ;
这时,session2会一直处于锁等待状态,而不会出现死锁。当把第一个事物commit或者rollback之后,第二个事物会继续执行。
Session1:
HR@PROD > rollback ;
Rollback complete .
Session2:
HR@PROD > select country_id , country_name , region_id from test where country_id = 'AU' for update ;
CO COUNTRY_NAME REGION_ID
-- ---------------------------------------- ----------
AU Australia
显然,两个select for update语句想要同时锁定一条语句,并不会出现死锁,而会出现锁等待的现象。
情形2:
第一个select for update事物锁定A row后,想要再锁定B row;第二个select for update事物锁定B row后,想要锁定A row。
Session1:
HR@PROD > select country_id , country_name , region_id from test where country_id = 'AU' for update ;
CO COUNTRY_NAME REGION_ID
-- ---------------------------------------- ----------
AU Australia 3
Session 2:
HR@PROD > select country_id , country_name , region_id from test where country_id = 'UK' for update ;
CO COUNTRY_NAME REGION_ID
-- ---------------------------------------- ----------
UK United Kingdom 1
如果Session1想要继续持有Session2正在持有的row:
Session1:
HR@PROD > select country_id , country_name , region_id from test where country_id = 'UK' for update ;
这时,会出现锁等待的现场,和上一场景类似。如果Session2也想持有Session1正在持有的row:
Session2:
HR@PROD > select country_id , country_name , region_id from test where country_id = 'AU' for update ;
这时,Session2还处于等待状态,但是Session1出现死锁:
Session1:
HR@PROD > select country_id , country_name , region_id from test where country_id = 'UK' for update ;
select country_id , country_name , region_id from test where country_id = 'UK' for update
*
ERROR at line 1 :
ORA-00060 : deadlock detected while waiting for resource
这时,死锁的现象已经重现,而且,场景2从头到尾只使用了一个select for update语句,只是换了参数而已。
通过检查数据库表,能够检查出是哪一条语句被死锁,产生死锁的机器是哪一台。
1)用dba用户执行以下语句
select c . owner , c . object_name , c . object_type , b . sid ,
b . serial# , b . lockwait , b . status , b . osuser , b . machine , b . process , b . program
from v$locked_object a ,
v$ session b ,
dba_objects c
where b . sid = a . session_id
and a . object_id = c . object_id ;
如果有输出的结果,则说明有死锁,且能看到死锁的机器是哪一台。字段说明:
OWNER:死锁语句所用的数据库用户,这里是HR用户。
OBJECT_NAME:产生死锁的对象,这里是TEST表。
OBJECT_TYPE:产生死锁的对象类型,这里是TABLE。
SID:SESSION标识,常用于连接 其它 列 。
SERIAL#: SID有可能会重复,当两个session的SID重复时,SERIAL#用来区别session
LOCKWAIT: 可以通过这个字段查询出当前正在等待的锁的相关信息 ,如果有内容表示被死锁或者有锁等待事件。
STATUS: 用来判断session状态。 ACTIVE :正执行 SQL语句 。 INACTIVE :等待操作。 KILLED :被标注为杀死。
OSUSER:客户端操作系统用户名。
MACHINE:客户端操作系统的机器名。
PROCESS:客户端进程的ID。
PROGRAM:客户端执行的应用程序。
2)用dba用户执行以下语句,可以查看到被死锁的语句。
select sql_text from v$sql where ( address , hash_value ) in
( select sql_address , sql_hash_value from v$ session where sid in
( select session_id from v$locked_object ) ) ;
或者
select sql_text from v$sql where ( address , hash_value ) in
( select sql_address , sql_hash_value from v$ session where lockwait is not null ) ;
这里查出来出现死锁的语句和之前的测试结果一致。
3) 分析trace日志文件
死锁发生时,可以再trace日志文件中找到如下语句:
ORA-00060 : Deadlock detected . More info in file / u01 / app / oracle / diag / rdbms / prod / PROD / trace / PROD_ora_4852 . trc .
打开trace文件,可以找到死锁的语句:
----- Information for the OTHER waiting sessions -----
Session 31 :
sid : 31 ser : 25 audsid : 110280 user : 84 / HR
flags : ( 0x45 ) USR / - flags_idl : ( 0x1 ) BSY / - / - / - / - / -
flags2 : ( 0x40009 ) - / - / INC
pid : 23 O / S info : user : oracle , term : UNKNOWN , ospid : 4853
image : oracle@prod ( TNS V1-V3 )
client details :
O / S info : user : oracle , term : pts / 1 , ospid : 3149
machine : prod program : sqlplus@prod ( TNS V1-V3 )
application name : SQL * Plus , hash value = 3669949024
current SQL :
select country_id , country_name , region_id from test where country_id = 'AU' for update
----- End of information for the OTHER waiting sessions -----
Information for THIS session :
----- Current SQL Statement for this session ( sql_id = 6n5kjs2twrwwq ) -----
select country_id , country_name , region_id from test where country_id = 'UK' for update
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
1)查找死锁的进程:
select sid , serial# , username , command , lockwait , osuser from v$ session where lockwait is not null ;
2)kill掉这个死锁的进程:
alter system kill session 'sid, serial#' ;
这里执行语句为:alter system kill session '29,69';
原先造成死锁的进程被killed。
3) 如果还不能解决,使用杀系统进程的方式处理:
这里为了测试,再次重现了死锁,并使用杀进程方式进行处理。
select p . spid from v$ session s , v$process p where s . sid = XXX and s . paddr = p . addr ;
其中,XXX使用第一步查出来的SID参数替代
这里查出来的进程号为:4257
[ oracle@prod ~ ] $ ps -ef | grep 4257
oracle 4257 3149 0 07 : 08 ? 00 : 00 : 00 oraclePROD ( DESCRIPTION = ( LOCAL = YES ) ( ADDRESS = ( PROTOCOL = beq ) ) )
oracle 4321 4000 0 07 : 21 pts / 4 00 : 00 : 00 grep 4257
[ oracle@prod ~ ] $ kill -9 4257
[ oracle@prod ~ ] $ ps -ef | grep 4257
oracle 4327 4000 0 07 : 21 pts / 4 00 : 00 : 00 grep 4257
杀完进程之后,造成死锁的进程被杀死
查不到死锁进程
SYS@PROD > shutdown immediate ;
Database closed .
Database dismounted .
ORACLE instance shut down .
SYS@PROD > startup ;
ORACLE instance started .
Total System Global Area 1202556928 bytes
Fixed Size 2252704 bytes
Variable Size 771752032 bytes
Database Buffers 419430400 bytes
Redo Buffers 9121792 bytes
Database mounted .
Database opened .
查看死锁进程:
Session1:
HR@PROD > select country_id , country_name , region_id from test where country_id = 'AU' for update ;
CO COUNTRY_NAME REGION_ID
-- ---------------------------------------- ----------
AU Australia 3
Session2:
HR@PROD > select country_id , country_name , region_id from test where country_id = 'UK' for update ;
CO COUNTRY_NAME REGION_ID
-- ---------------------------------------- ----------
UK United Kingdom 1
Session1:
HR@PROD > select country_id , country_name , region_id from test where country_id = 'UK' for update ;
Waiting……
Session2:
HR@PROD > select country_id , country_name , region_id from test where country_id = 'AU' for update ;
Waiting……
Session1:
HR@PROD > select country_id , country_name , region_id from test where country_id = 'UK' for update ;
select country_id , country_name , region_id from test where country_id = 'UK' for update
*
ERROR at line 1 :
ORA-00060 : deadlock detected while waiting for resource
这里Session1出现死锁,只要执行commit或者rollback就可以解除死锁,只不过事务中第一个SQL执行成功,第二个SQL执行失败。
Session1:
HR@PROD > commit ;
Commit complete .
Session2:
HR@PROD > select country_id , country_name , region_id from test where country_id = 'AU' for update ;
CO COUNTRY_NAME REGION_ID
-- ---------------------------------------- ----------
AU Australia 3
此时,死锁状态解除:
最后,通过与开发人员交流,得出的结论是出现该问题的原因不是程序设计的问题,而是在开发环境中,有人通过debug模式连到服务器上进行代码调试,有人通过客户端的形式访问服务器上的应用,当两者同时调试某一功能时(通过select for update的语句进行一张表中的数据访问),在特殊的情况下,出现了死锁的情况。
当然,在开发环境中比较容易出现这类情况,如果需要避免,就需要使用该开发环境的开发人员与测试人员做好沟通了。对于容易出现锁表的功能,可以要求测试人员在测试环境中测试,除特殊情况,尽量减少在开发环境中测试的次数,以免锁表影响开发测试进度。
来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/31394774/viewspace-2144941/,如需转载,请注明出处,否则将追究法律责任。
转载于:http://blog.itpub.net/31394774/viewspace-2144941/