首页 > 代码库 > ORACLE 11g 由新特性引发lsnrctl hang住卡死迷雾的详细剖析历程

ORACLE 11g 由新特性引发lsnrctl hang住卡死迷雾的详细剖析历程

 

 

1、问题描述

同事说卡住了,连接oracle数据库很慢,需要很久,连上了做一个简单的查询也非常慢,感觉像是hang主了一般。

 

 

2、分析oracle服务器负载

一开始登录进去,查看oracle服务器,负载很低,服务器毫无压力,感觉不是服务器卡的问题了:

[root@pldb236 data]# w

 19:59:47 up 122 days,  4:32,  4 users,  load average: 0.65, 0.71, 0.59

USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU WHAT

root     tty1     :0               17Jun16 122days  5:00   5:00  /usr/bin/Xorg :0 -nr -verbose -audit 4 -auth /var/run/gdm/auth-for-gdm-YoRXlJ/database -nolisten tcp vt1

root     pts/0    192.168.120.154  12:49    4:42m  0.04s  0.04s -bash

root     pts/1    192.168.120.154  12:49    0.00s  0.13s  0.05s w

root     pts/2    192.168.120.154  13:41    6:09m  0.06s  0.02s -bash

[root@pldb236 data]#

 

 

 

 

3、分析oracle服务器lsnrctl监听

既然不是oracle服务器负载的问题,那么就再看看lsnrctl监听状态,比较慢,而且看到最后有异常信息:

[oracle@pldb236 admin]$ lsnrctl status

 

LSNRCTL for Linux: Version 11.2.0.1.0 - Production on 17-OCT-2016 13:27:51

 

Copyright (c) 1991, 2009, Oracle.  All rights reserved.

 

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=10.10.10.104)(PORT=1521)))

 

# 然后卡住了,卡住了,hang主了,没有任何动弹,需要很久很久才能释放出来。

 

 

# 再看下tnsping下服务,也比较慢,发现有错误问题出现,如下所示

[oracle@pldb236 admin]$ tnsping PD236

 

TNS Ping Utility for Linux: Version 11.2.0.1.0 - Production on 17-OCT-2016 13:21:24

 

Copyright (c) 1997, 2009, Oracle.  All rights reserved.

 

Used parameter files:

 

 

Used TNSNAMES adapter to resolve the alias

Attempting to contact (DESCRIPTION = (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST = 10.10.10.104)(PORT = 1521))) (CONNECT_DATA = http://www.mamicode.com/(SERVICE_NAME = powerdes)))

TNS-12547: TNS:lost contact

[oracle@pldb236 admin]$

 

 

 

 

 

 

 

 

4、分析监听lsnrctltrace跟踪文件

进去lsnrctl管理界面看trace文件目录:

1)进去lsnrctl管理命令行

[oracle@pldb236 oradata]$ lsnrctl

 

LSNRCTL for Linux: Version 11.2.0.1.0 - Production on 17-OCT-2016 20:36:34

 

Copyright (c) 1991, 2009, Oracle.  All rights reserved.

 

Welcome to LSNRCTL, type "help" for information.

 

LSNRCTL>

2)查看命令

LSNRCTL> show

The following operations are available after show

An asterisk (*) denotes a modifier or extended command:

 

rawmode                            displaymode                       

rules                              trc_file                          

trc_directory                      trc_level                         

log_file                           log_directory                     

log_status                         current_listener                  

inbound_connect_timeout            startup_waittime                  

snmp_visible                       save_config_on_stop               

dynamic_registration               enable_global_dynamic_endpoint    

oracle_home                        pid                               

 

3)查看跟踪文件已经文件目录

LSNRCTL> show trc_file

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=10.10.10.104)(PORT=1521)))

LISTENER parameter "trc_file" set to ora_2994_139957275965184.trc

The command completed successfully

LSNRCTL>

LSNRCTL> show trc_directory

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=10.10.10.104)(PORT=1521)))

LISTENER parameter "trc_directory" set to /oracle/app/oracle/diag/tnslsnr/pldb236/listener/trace

The command completed successfully

LSNRCTL>

 

 

 

然后去看后台的日志,看到显示正常,没有发现路径错误之类的问题:

[root@pldb236 ~]# tail -f /oracle/app/oracle/diag/tnslsnr/pldb236/listener/trace/ora_2994_139957275965184.trc

2016-10-17 13:42:38.443831 : nsbrfr:nsbfs at 0x25d94c0, data at 0x25aa940.

2016-10-17 13:42:38.443843 : nsbrfr:nsbfs at 0x256f3b0, data at 0x2576610.

2016-10-17 13:42:38.443855 : nsbrfr:nsbfs at 0x25a4ac0, data at 0x25af850.

2016-10-17 13:42:38.443867 : nsbrfr:nsbfs at 0x25a4b70, data at 0x25b52c0.

2016-10-17 13:42:38.443880 : nsbrfr:nsbfs at 0x25b1fb0, data at 0x25b8a10.

2016-10-17 13:42:38.443891 : nsbrfr:nsbfs at 0x2568f10, data at 0x2566670.

2016-10-17 13:42:38.443903 : nsbrfr:nsbfs at 0x256afe0, data at 0x256b090.

2016-10-17 13:42:38.443916 : nsbrfr:nsbfs at 0x250fd00, data at 0x2568fc0.

2016-10-17 13:42:38.443937 : nlse_term_audit:entry

2016-10-17 13:42:38.443951 : nlse_term_audit:exit

 

 

5、出绝招lsnrctl重启

绝招无效,还是很慢,甚至lsnrctl stop以及lsnrctl start都非常慢。

# stop很慢,卡住了,至少需要50秒才能完成

LSNRCTL> stop

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=10.10.10.104)(PORT=1521)))

……卡呀卡……卡到外婆家

The command completed successfully

LSNRCTL> start   # start会快一些,快到10秒到20秒就完成了。

Starting /oracle/app/oracle/product/11.2.0/dbhome_1/bin/tnslsnr: please wait...

 

TNSLSNR for Linux: Version 11.2.0.1.0 - Production

System parameter file is /oracle/app/oracle/product/11.2.0/dbhome_1/network/admin/listener.ora

Log messages written to /oracle/app/oracle/diag/tnslsnr/pldb236/listener/alert/log.xml

Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.10.10.104)(PORT=1521)))

Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC0)))

 

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=10.10.10.104)(PORT=1521)))

STATUS of the LISTENER

------------------------

Alias                     LISTENER

Version                   TNSLSNR for Linux: Version 11.2.0.1.0 - Production

Start Date                17-OCT-2016 20:46:58

Uptime                    0 days 0 hr. 0 min. 0 sec

Trace Level               off

Security                  ON: Local OS Authentication

SNMP                      OFF

Listener Parameter File   /oracle/app/oracle/product/11.2.0/dbhome_1/network/admin/listener.ora

Listener Log File         /oracle/app/oracle/diag/tnslsnr/pldb236/listener/alert/log.xml

Listening Endpoints Summary...

  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.10.10.104)(PORT=1521)))

  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC0)))

Services Summary...

Service "powerdes" has 1 instance(s).

  Instance "powerdes", status UNKNOWN, has 1 handler(s) for this service...

The command completed successfully

LSNRCTL>

 

 

PS:重启之后,远程连接oracle实例还是非常慢的,所以重启也没有啥效果

 

 

 

6、开启lsnrctltrace 16分析

尝试开启更高一级别的trace16,看看,这回stop快了许多,但是远程连接还是非常慢的,开启过程如下:

LSNRCTL> trace 16

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=10.10.10.104)(PORT=1521)))

Opened trace file: /oracle/app/oracle/diag/tnslsnr/pldb236/listener/trace/ora_14662_140432957417216.trc

The command completed successfully

LSNRCTL>

PS:然后去查看这个日志,也没有发现异常信息,看来此途径无效了。

 

 

 

7、迷雾中的灵光一闪

各种地方都考虑检查到了,问题还是没有解决,仔细想来,这种情况也就是做了rman恢复之后的事情,在没有做rman恢复之前,都是正常的;

 

Rman恢复、rman恢复、rman恢复、rman恢复……

 

对了,rman恢复的是2015年的数据,那么数据库用户名密码也是2015年的,2016年数据库用户密码修改过了,想起了rman恢复是恢复了所有的东西包括环境变量博包括系统参数等等。那么意味着现在应用连接这个数据库所使用的用户名密码是连接不上的,连接不上就会报错啊,但是alert日志后台没有看到,那会是啥问题呢,突然脑中亮光一闪,oracle11g新特性密码验证延迟啊,会卡住数据库,用户连接不上去了。

 

赶紧去屏蔽密码延迟验证:

SQL> ALTER SYSTEM SET EVENT = ‘28401 TRACE NAME CONTEXT FOREVER, LEVEL 1‘ SCOPE = SPFILE;

 

System altered.

 

SQL> create pfile from spfile;

 

File created.

 

SQL> shutdown immediate;

Database closed.

Database dismounted.

ORACLE instance shut down.

SQL> startup

ORACLE instance started.

 

Total System Global Area 6680915968 bytes

Fixed Size              2213936 bytes

Variable Size              4898949072 bytes

Database Buffers    1744830464 bytes

Redo Buffers                34922496 bytes

Database mounted.

Database opened.

SQL>

 

 

然后皆大欢喜,连接数据库也不卡了,上去查询个数据也不卡了,一切都很快速了。因此分析是就是oracle11g的新特性密码验证延迟引发lsnrctl hang住了,以前也遇到类似的由于oracle11g新特性引发的问题,当时花了漫长的时间去分析,去求证,而且详情参见我以前整理得博客blog记录,里面针对oracle11g特性有比较深入的探讨,blog连接地址为:http://blog.csdn.net/mchdba/article/details/51794443

 

 

 

另外trace参考文章地址:http://blog.itpub.net/17203031/viewspace-713587/

 

ORACLE 11g 由新特性引发lsnrctl hang住卡死迷雾的详细剖析历程