首页 > 代码库 > 数据库死锁严重引发中间件连接池满故障诊断
数据库死锁严重引发中间件连接池满故障诊断
1、故障现象
前台系统应用无法登陆,weblogic服务器应用程序的运行状态显示为overload,线程连接池满。
2、故障原因分析
根据上述故障现象,分析基础可以确定为是Weblogic有过多的连接连到数据库,因为会话一直保持未释放,将连接池占满后,导致新的连接无法请求到连接池。在此关键是分析为什么会有大量的会话占满连接池而不释放。
3、问题分析过程
3.1 session数超过1000
| Snap Id | Snap Time | Sessions | Cursors/Session |
Begin Snap: | 38194 | 2014/12/10 8:00 | 1010 | 13.8 |
End Snap: | 38196 | 2014/12/10 10:00 | 1004 | 14.4 |
Elapsed: |
| 119.65 (mins) |
|
|
DB Time: |
| 20,686.23 (mins) |
|
|
3.2 每秒执行次数超过15000多次
| Per Second | Per Transaction |
Redo size: | 18,144,642.95 | 459,254.26 |
Logical reads: | 199,155.23 | 5,040.77 |
Block changes: | 100,209.94 | 2,536.39 |
Physical reads: | 5,241.51 | 132.67 |
Physical writes: | 2,182.48 | 55.24 |
User calls: | 16,134.45 | 408.37 |
Parses: | 177.18 | 4.48 |
Hard parses: | 69.02 | 1.75 |
Sorts: | 106.97 | 2.71 |
Logons: | 0.58 | 0.01 |
Executes: | 15,955.55 | 403.85 |
Transactions: | 39.51 |
|
3.3 94%的等待时间都是花在“enq: TX -row lock contention”
Event | Waits | Time(s) | Avg Wait(ms) | % Total Call Time | Wait Class |
enq: TX - row lock contention | 2,419,702 | 1,174,344 | 485 | 94.6 | Application |
CPU time |
| 28,795 |
| 2.3 |
|
db file sequential read | 2,507,054 | 13,011 | 5 | 1 | User I/O |
db file scattered read | 2,358,449 | 11,234 | 5 | 0.9 | User I/O |
db file parallel write | 475,780 | 7,760 | 16 | 0.6 | System I/O |
3.4 消耗时间最长的SQL语句
Elapsed Time (s) | CPU Time (s) | Executions | Elap per Exec (s) | % Total DB Time | SQL Id | SQL Text |
744,943 | 51 | 226 | 3296.21 | 60.02 | 2d0msz9tv9z5c | DELETE FROM PUB_XXLINE WHERE U... |
429,987 | 29 | 483 | 890.24 | 34.64 | 1yucnh7p2mjdk | DELETE FROM PUB_XXLINE WHERE S... |
2d0msz9tv9z5c文本如下:
DELETE FROM PUB_XXLINE WHERE USER_ID=:1
1yucnh7p2mjdk文本如下:
DELETE FROM PUB_XXLINE WHERE SESSION_ID=:1
以上两条SQL语句产生严重行争用,并且在会话级产生死锁
3.5 数据库内验证会话死锁
(1)查看被阻塞的sid:800会话是被哪个会话所阻塞
select sid,serial#,sql_id,status,blocking_session from v$sessionwhere sid=800;
sid | serial# | sql_id | status | blocking_session |
800 | 28777 | 2d0msz9tv9z5c | ACTIVE | 347 |
从上面看来,sid:800执行的SQL语句的ID是2d0msz9tv9z5c,它被session id 347所阻塞
(2)查看session id 347是否被阻塞
select sid,serial#,sql_id,status,blocking_session from v$sessionwhere sid=347;
sid | serial# | sql_id | status | blocking_session |
347 | 15367 | 2d0msz9tv9z5c | ACTIVE | 800 |
Session 347执行的语句的ID也是2d0msz9tv9z5c,它被sid 800所阻塞,在这里已经形成死锁。
3.6 数据库日志中发现大量死锁信息
Wed Dec 10 09:55:05 2014 Global Enqueue Services Deadlock detected. More info in file /oracle/admin/XXTH/bdump/XXth1_lmd0_1541770.trc. Wed Dec 10 09:55:36 2014 Thread 1 advanced to log sequence 493665 (LGWR switch) Current log# 16 seq# 493665 mem# 0: +REDOGROUP/XXth/onlinelog/group_16.263.837169415 Wed Dec 10 09:56:04 2014 Thread 1 advanced to log sequence 493666 (LGWR switch) Current log# 14 seq# 493666 mem# 0: +REDOGROUP/XXth/onlinelog/group_14.261.837169357 Wed Dec 10 09:56:05 2014 Global Enqueue Services Deadlock detected. More info in file /oracle/admin/XXTH/bdump/XXth1_lmd0_1541770.trc. |
4、问题原因诊断总结
语句DELETEFROM PUB_XXLINE WHERE USER_ID=:1每次操作的数据行有多行,多个会话同时执行该语句,在数据库内形成死锁,并且玥塞单次删除一行数据的语句DELETEFROM PUB_XXLINE WHERE SESSION_ID=:1从而使Weblogic(中间件)的连接池消耗殆尽。无法新增连接。
本文作者:黎俊杰(网名:踩点),从事”系统架构、操作系统、存储设备、数据库、中间件、应用程序“六个层面系统性的性能优化工作
欢迎加入 系统性能优化专业群,共同探讨性能优化技术。群号:258187244
数据库死锁严重引发中间件连接池满故障诊断