首页 > 代码库 > ORACLE 11g 由新特性引发lsnrctl hang住卡死迷雾的详细剖析历程
ORACLE 11g 由新特性引发lsnrctl hang住卡死迷雾的详细剖析历程
1、问题描述
同事说卡住了,连接oracle数据库很慢,需要很久,连上了做一个简单的查询也非常慢,感觉像是hang主了一般。
2、分析oracle服务器负载
一开始登录进去,查看oracle服务器,负载很低,服务器毫无压力,感觉不是服务器卡的问题了:
[root@pldb236 data]# w 19:59:47 up 122 days, 4:32, 4 users, load average: 0.65, 0.71, 0.59 USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT root tty1 :0 17Jun16 122days 5:00 5:00 /usr/bin/Xorg :0 -nr -verbose -audit 4 -auth /var/run/gdm/auth-for-gdm-YoRXlJ/database -nolisten tcp vt1 root pts/0 192.168.120.154 12:49 4:42m 0.04s 0.04s -bash root pts/1 192.168.120.154 12:49 0.00s 0.13s 0.05s w root pts/2 192.168.120.154 13:41 6:09m 0.06s 0.02s -bash [root@pldb236 data]# |
3、分析oracle服务器lsnrctl监听
既然不是oracle服务器负载的问题,那么就再看看lsnrctl监听状态,比较慢,而且看到最后有异常信息:
[oracle@pldb236 admin]$ lsnrctl status
LSNRCTL for Linux: Version 11.2.0.1.0 - Production on 17-OCT-2016 13:27:51
Copyright (c) 1991, 2009, Oracle. All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=10.10.10.104)(PORT=1521)))
# 然后卡住了,卡住了,hang主了,没有任何动弹,需要很久很久才能释放出来。
# 再看下tnsping下服务,也比较慢,发现有错误问题出现,如下所示 [oracle@pldb236 admin]$ tnsping PD236
TNS Ping Utility for Linux: Version 11.2.0.1.0 - Production on 17-OCT-2016 13:21:24
Copyright (c) 1997, 2009, Oracle. All rights reserved.
Used parameter files:
Used TNSNAMES adapter to resolve the alias Attempting to contact (DESCRIPTION = (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST = 10.10.10.104)(PORT = 1521))) (CONNECT_DATA = http://www.mamicode.com/(SERVICE_NAME = powerdes))) TNS-12547: TNS:lost contact [oracle@pldb236 admin]$
|
4、分析监听lsnrctl的trace跟踪文件
进去lsnrctl管理界面看trace文件目录:
(1)进去lsnrctl管理命令行 [oracle@pldb236 oradata]$ lsnrctl
LSNRCTL for Linux: Version 11.2.0.1.0 - Production on 17-OCT-2016 20:36:34
Copyright (c) 1991, 2009, Oracle. All rights reserved.
Welcome to LSNRCTL, type "help" for information.
LSNRCTL> (2)查看命令 LSNRCTL> show The following operations are available after show An asterisk (*) denotes a modifier or extended command:
rawmode displaymode rules trc_file trc_directory trc_level log_file log_directory log_status current_listener inbound_connect_timeout startup_waittime snmp_visible save_config_on_stop dynamic_registration enable_global_dynamic_endpoint oracle_home pid
(3)查看跟踪文件已经文件目录 LSNRCTL> show trc_file Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=10.10.10.104)(PORT=1521))) LISTENER parameter "trc_file" set to ora_2994_139957275965184.trc The command completed successfully LSNRCTL> LSNRCTL> show trc_directory Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=10.10.10.104)(PORT=1521))) LISTENER parameter "trc_directory" set to /oracle/app/oracle/diag/tnslsnr/pldb236/listener/trace The command completed successfully LSNRCTL> |
然后去看后台的日志,看到显示正常,没有发现路径错误之类的问题:
[root@pldb236 ~]# tail -f /oracle/app/oracle/diag/tnslsnr/pldb236/listener/trace/ora_2994_139957275965184.trc 2016-10-17 13:42:38.443831 : nsbrfr:nsbfs at 0x25d94c0, data at 0x25aa940. 2016-10-17 13:42:38.443843 : nsbrfr:nsbfs at 0x256f3b0, data at 0x2576610. 2016-10-17 13:42:38.443855 : nsbrfr:nsbfs at 0x25a4ac0, data at 0x25af850. 2016-10-17 13:42:38.443867 : nsbrfr:nsbfs at 0x25a4b70, data at 0x25b52c0. 2016-10-17 13:42:38.443880 : nsbrfr:nsbfs at 0x25b1fb0, data at 0x25b8a10. 2016-10-17 13:42:38.443891 : nsbrfr:nsbfs at 0x2568f10, data at 0x2566670. 2016-10-17 13:42:38.443903 : nsbrfr:nsbfs at 0x256afe0, data at 0x256b090. 2016-10-17 13:42:38.443916 : nsbrfr:nsbfs at 0x250fd00, data at 0x2568fc0. 2016-10-17 13:42:38.443937 : nlse_term_audit:entry 2016-10-17 13:42:38.443951 : nlse_term_audit:exit
|
5、出绝招lsnrctl重启
绝招无效,还是很慢,甚至lsnrctl stop以及lsnrctl start都非常慢。
# stop很慢,卡住了,至少需要50秒才能完成 LSNRCTL> stop Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=10.10.10.104)(PORT=1521))) ……卡呀卡……卡到外婆家 The command completed successfully LSNRCTL> start # start会快一些,快到10秒到20秒就完成了。 Starting /oracle/app/oracle/product/11.2.0/dbhome_1/bin/tnslsnr: please wait...
TNSLSNR for Linux: Version 11.2.0.1.0 - Production System parameter file is /oracle/app/oracle/product/11.2.0/dbhome_1/network/admin/listener.ora Log messages written to /oracle/app/oracle/diag/tnslsnr/pldb236/listener/alert/log.xml Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.10.10.104)(PORT=1521))) Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC0)))
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=10.10.10.104)(PORT=1521))) STATUS of the LISTENER ------------------------ Alias LISTENER Version TNSLSNR for Linux: Version 11.2.0.1.0 - Production Start Date 17-OCT-2016 20:46:58 Uptime 0 days 0 hr. 0 min. 0 sec Trace Level off Security ON: Local OS Authentication SNMP OFF Listener Parameter File /oracle/app/oracle/product/11.2.0/dbhome_1/network/admin/listener.ora Listener Log File /oracle/app/oracle/diag/tnslsnr/pldb236/listener/alert/log.xml Listening Endpoints Summary... (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.10.10.104)(PORT=1521))) (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC0))) Services Summary... Service "powerdes" has 1 instance(s). Instance "powerdes", status UNKNOWN, has 1 handler(s) for this service... The command completed successfully LSNRCTL>
PS:重启之后,远程连接oracle实例还是非常慢的,所以重启也没有啥效果 |
6、开启lsnrctl的trace 16分析
尝试开启更高一级别的trace16,看看,这回stop快了许多,但是远程连接还是非常慢的,开启过程如下:
LSNRCTL> trace 16 Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=10.10.10.104)(PORT=1521))) Opened trace file: /oracle/app/oracle/diag/tnslsnr/pldb236/listener/trace/ora_14662_140432957417216.trc The command completed successfully LSNRCTL> PS:然后去查看这个日志,也没有发现异常信息,看来此途径无效了。 |
7、迷雾中的灵光一闪
各种地方都考虑检查到了,问题还是没有解决,仔细想来,这种情况也就是做了rman恢复之后的事情,在没有做rman恢复之前,都是正常的;
Rman恢复、rman恢复、rman恢复、rman恢复……
对了,rman恢复的是2015年的数据,那么数据库用户名密码也是2015年的,2016年数据库用户密码修改过了,想起了rman恢复是恢复了所有的东西包括环境变量博包括系统参数等等。那么意味着现在应用连接这个数据库所使用的用户名密码是连接不上的,连接不上就会报错啊,但是alert日志后台没有看到,那会是啥问题呢,突然脑中亮光一闪,oracle11g新特性密码验证延迟啊,会卡住数据库,用户连接不上去了。
赶紧去屏蔽密码延迟验证:
SQL> ALTER SYSTEM SET EVENT = ‘28401 TRACE NAME CONTEXT FOREVER, LEVEL 1‘ SCOPE = SPFILE;
System altered.
SQL> create pfile from spfile;
File created.
SQL> shutdown immediate; Database closed. Database dismounted. ORACLE instance shut down. SQL> startup ORACLE instance started.
Total System Global Area 6680915968 bytes Fixed Size 2213936 bytes Variable Size 4898949072 bytes Database Buffers 1744830464 bytes Redo Buffers 34922496 bytes Database mounted. Database opened. SQL>
|
然后皆大欢喜,连接数据库也不卡了,上去查询个数据也不卡了,一切都很快速了。因此分析是就是oracle11g的新特性密码验证延迟引发lsnrctl hang住了,以前也遇到类似的由于oracle11g新特性引发的问题,当时花了漫长的时间去分析,去求证,而且详情参见我以前整理得博客blog记录,里面针对oracle11g特性有比较深入的探讨,blog连接地址为:http://blog.csdn.net/mchdba/article/details/51794443
另外trace参考文章地址:http://blog.itpub.net/17203031/viewspace-713587/
ORACLE 11g 由新特性引发lsnrctl hang住卡死迷雾的详细剖析历程