数据库死锁严重引发中间件连接池满故障诊断

1、故障现象

      前台系统应用无法登陆,weblogic服务器应用程序的运行状态显示为overload,线程连接池满。

2、故障原因分析

       根据上述故障现象,分析基础可以确定为是Weblogic有过多的连接连到数据库,因为会话一直保持未释放,将连接池占满后,导致新的连接无法请求到连接池。在此关键是分析为什么会有大量的会话占满连接池而不释放。

3、问题分析过程

3.1 session数超过1000

 

Snap Id

Snap Time

Sessions

Cursors/Session

Begin Snap:

38194

2014/12/10 8:00

1010

13.8

End Snap:

38196

2014/12/10 10:00

1004

14.4

Elapsed:

 

119.65 (mins)

 

 

DB Time:

 

20,686.23 (mins)

 

 

3.2 每秒执行次数超过15000多次

  

Per Second

Per Transaction

Redo size:

18,144,642.95

459,254.26

Logical reads:

199,155.23

5,040.77

Block changes:

100,209.94

2,536.39

Physical reads:

5,241.51

132.67

Physical writes:

2,182.48

55.24

User calls:

16,134.45

408.37

Parses:

177.18

4.48

Hard parses:

69.02

1.75

Sorts:

106.97

2.71

Logons:

0.58

0.01

Executes:

15,955.55

403.85

Transactions:

39.51

 

 

3.3 94%的等待时间都是花在“enq: TX -row lock contention”

Event

Waits

Time(s)

Avg Wait(ms)

% Total Call Time

Wait Class

enq: TX - row lock contention

2,419,702

1,174,344

485

94.6

Application

CPU time

 

28,795

 

2.3

 

db file sequential read

2,507,054

13,011

5

1

User I/O

db file scattered read

2,358,449

11,234

5

0.9

User I/O

db file parallel write

475,780

7,760

16

0.6

System I/O

 

3.4 消耗时间最长的SQL语句

Elapsed Time (s)

CPU Time (s)

Executions

Elap per Exec (s)

% Total DB Time

SQL Id

SQL Text

744,943

51

226

3296.21

60.02

2d0msz9tv9z5c

DELETE FROM PUB_XXLINE WHERE U...

429,987

29

483

890.24

34.64

1yucnh7p2mjdk

DELETE FROM PUB_XXLINE WHERE S...

2d0msz9tv9z5c文本如下:

  DELETE FROM PUB_XXLINE WHERE USER_ID=:1

1yucnh7p2mjdk文本如下:

  DELETE FROM PUB_XXLINE WHERE SESSION_ID=:1

以上两条SQL语句产生严重行争用,并且在会话级产生死锁

3.5 数据库内验证会话死锁

(1)查看被阻塞的sid:800会话是被哪个会话所阻塞

select sid,serial#,sql_id,status,blocking_session from v$sessionwhere sid=800;

sid

serial#

sql_id

status

blocking_session

800

28777

2d0msz9tv9z5c

ACTIVE

347

从上面看来,sid:800执行的SQL语句的ID是2d0msz9tv9z5c,它被session id 347所阻塞

(2)查看session id 347是否被阻塞

select sid,serial#,sql_id,status,blocking_session from v$sessionwhere sid=347;

sid

serial#

sql_id

status

blocking_session

347

15367

2d0msz9tv9z5c

ACTIVE

800

Session 347执行的语句的ID也是2d0msz9tv9z5c,它被sid 800所阻塞,在这里已经形成死锁。

3.6 数据库日志中发现大量死锁信息

Wed Dec 10 09:55:05 2014

Global Enqueue Services Deadlock detected. More info in file

 /oracle/admin/XXTH/bdump/XXth1_lmd0_1541770.trc.

Wed Dec 10 09:55:36 2014

Thread 1 advanced to log sequence 493665 (LGWR switch)

 Current log# 16 seq# 493665 mem# 0: +REDOGROUP/XXth/onlinelog/group_16.263.837169415

Wed Dec 10 09:56:04 2014

Thread 1 advanced to log sequence 493666 (LGWR switch)

 Current log# 14 seq# 493666 mem# 0: +REDOGROUP/XXth/onlinelog/group_14.261.837169357

Wed Dec 10 09:56:05 2014

Global Enqueue Services Deadlock detected. More info in file

 /oracle/admin/XXTH/bdump/XXth1_lmd0_1541770.trc.

 

4、问题原因诊断总结

       语句DELETEFROM PUB_XXLINE WHERE USER_ID=:1每次操作的数据行有多行,多个会话同时执行该语句,在数据库内形成死锁,并且玥塞单次删除一行数据的语句DELETEFROM PUB_XXLINE WHERE SESSION_ID=:1从而使Weblogic(中间件)的连接池消耗殆尽。无法新增连接。

 

 

本文作者:黎俊杰(网名:踩点),从事”系统架构、操作系统、存储设备、数据库、中间件、应用程序“六个层面系统性的性能优化工作

欢迎加入 系统性能优化专业群,共同探讨性能优化技术。群号:258187244

你可能感兴趣的:(ORACLE,性能调优,ORACLE维护管理)