
Some Notes on the ONS Service in Oracle 10g RAC

1. What is ONS

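     ONS (Oracle Notification Service) is the foundation on which Oracle Clusterware implements the FAN event push model.
     In the traditional model, a client has to poll the server periodically to determine the server's state, which is essentially a PULL model. Oracle 10g introduced a new PUSH mechanism, FAN (Fast Application Notification): when certain events occur on the server side, the server proactively notifies the clients of the change, so that clients learn about server-side changes as early as possible. This mechanism relies on ONS.
ONS is normally managed and configured with the onsctl command. Before onsctl can be used, the ONS service has to be configured.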
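
The push idea can be illustrated with a small in-process sketch. This is not the ONS API: `EventPublisher`, `subscribe`, and `publish` are hypothetical names, and the event fields are made up for illustration. The point is only that the subscriber registers once and is then called by the publisher, instead of polling.

```python
from typing import Callable, Dict, List

# Hypothetical illustration of a push model; none of these names are ONS APIs.
Event = Dict[str, str]


class EventPublisher:
    """Keeps a list of subscriber callbacks and pushes every event to them."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[Event], None]] = []

    def subscribe(self, callback: Callable[[Event], None]) -> None:
        self._subscribers.append(callback)

    def publish(self, event: Event) -> int:
        """Push the event to every subscriber; return how many were notified."""
        for callback in self._subscribers:
            callback(event)
        return len(self._subscribers)


received: List[Event] = []
publisher = EventPublisher()
publisher.subscribe(received.append)

# The "server" side announces a state change; the client does no polling.
notified = publisher.publish({"service": "racdb", "status": "down"})
```

In a pull model the client would instead loop and query the server at some interval, and would only notice the change on its next query.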

 

 

2. ONS configuration

      Note that in a RAC environment the ONS under $CRS_HOME is used, not the one under $ORACLE_HOME.
The configuration file is $CRS_HOME/opmn/conf/ons.config.

 

[root@rac3 conf]# pwd
/opt/ora10g/product/10.2.0/crs_1/opmn/conf
[root@rac3 conf]# ls
ons.config
[root@rac3 conf]# cat ons.config
localport=6100
remoteport=6200
loglevel=3
useocr=on

 

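The parameters in this file are explained below:

<1>localport: the local listening port. "Local" here specifically means the loopback address 127.0.0.1; this port is used to communicate with clients running on the same host.
<2>remoteport: the remote listening port, that is, the port on the host's IP addresses other than 127.0.0.1; it is used to communicate with remote clients.
<3>loglevel: Oracle allows the ONS process to be traced, with the log written to a local file. This parameter defines the log level of the ONS process, from 1 to 9; the default is 3.
<4>logfile: used together with loglevel, this parameter defines the location of the ONS log file; the default is $CRS_HOME/opmn/logs/opmn.log.
<5>nodes and useocr: together, these two parameters determine which nodes' ONS daemons the local ONS daemon communicates with.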
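
As a rough sketch of how such a file could be read programmatically, the function below parses `key=value` lines and checks that localport and remoteport are present, since those two are mandatory. The function name `parse_ons_config` is hypothetical, and skipping blank lines and lines starting with `#` is an assumption about the file format rather than documented behaviour.

```python
from typing import Dict

# Default stated in the text: loglevel defaults to 3.
DEFAULTS = {"loglevel": "3"}
REQUIRED = ("localport", "remoteport")


def parse_ons_config(text: str) -> Dict[str, str]:
    """Parse key=value lines as found in ons.config into a dict."""
    settings = dict(DEFAULTS)
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Assumption: blank lines and '#' comments, if any, can be skipped.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip().lower()] = value.strip()
    missing = [name for name in REQUIRED if name not in settings]
    if missing:
        raise ValueError("missing required parameter(s): " + ", ".join(missing))
    return settings


sample = "localport=6100\nremoteport=6200\nloglevel=3\nuseocr=on\n"
config = parse_ons_config(sample)
```

All values are kept as strings here; converting the ports to integers is left to the caller.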

 

Of these parameters, localport and remoteport are mandatory. The netstat command can be used to compare how the two ports are used:

[root@rac3 bin]# netstat -ano|grep 6100
tcp        0      0 127.0.0.1:6100              0.0.0.0:*                   LISTEN      off (0.00/0/0)
tcp        0      0 127.0.0.1:6100              127.0.0.1:32852             ESTABLISHED off (0.00/0/0)
tcp        0      0 127.0.0.1:32840             127.0.0.1:6100              ESTABLISHED keepalive (7063.32/0/0)
tcp        0      0 127.0.0.1:32852             127.0.0.1:6100              ESTABLISHED keepalive (7188.42/0/0)
tcp        0      0 127.0.0.1:6100              127.0.0.1:32840             ESTABLISHED off (0.00/0/0)
udp        0      0 192.168.2.103:61008         0.0.0.0:*                               off (0.00/0/0)
[root@rac3 bin]# netstat -ano|grep 6200
tcp        0      0 0.0.0.0:6200                0.0.0.0:*                   LISTEN      off (0.00/0/0)
tcp        0      0 192.168.1.103:32836         192.168.1.104:6200          ESTABLISHED off (0.00/0/0)

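The comparison shows that Oracle listens on port 6100 at the address 127.0.0.1, and on port 6200 at 0.0.0.0 (the wildcard address, that is, all of the host's addresses). This matches the configuration in /opt/ora10g/product/10.2.0/crs_1/opmn/conf/ons.config.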
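
The same comparison can be made by script. The sketch below assumes the Linux `netstat -ano` column layout shown above (protocol, two queue counters, local address, foreign address, state); `listening_sockets` is a hypothetical helper name. It matches the port exactly, so the UDP line for port 61008 that `grep 6100` also picked up is ignored.

```python
from typing import List, Tuple


def listening_sockets(netstat_output: str, port: int) -> List[Tuple[str, int]]:
    """Return (address, port) pairs in LISTEN state for the given TCP port."""
    result = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected columns: proto, recv-q, send-q, local, foreign, state, ...
        if len(fields) < 6 or fields[0] != "tcp" or fields[5] != "LISTEN":
            continue
        address, _, port_text = fields[3].rpartition(":")
        if port_text.isdigit() and int(port_text) == port:
            result.append((address, port))
    return result


sample = """\
tcp        0      0 127.0.0.1:6100              0.0.0.0:*                   LISTEN      off (0.00/0/0)
tcp        0      0 127.0.0.1:6100              127.0.0.1:32852             ESTABLISHED off (0.00/0/0)
udp        0      0 192.168.2.103:61008         0.0.0.0:*                               off (0.00/0/0)
tcp        0      0 0.0.0.0:6200                0.0.0.0:*                   LISTEN      off (0.00/0/0)
"""
local_listeners = listening_sockets(sample, 6100)
remote_listeners = listening_sockets(sample, 6200)
```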

 

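The useocr parameter also deserves attention. Its value is ON or OFF. If useocr is ON, the information about the remote nodes that ONS communicates with is stored in the OCR; if it is OFF, that information is taken from the nodes parameter.
   Format of the nodes value: hostname/ip:port[,hostname/ip:port]  For example: nodes=dbs:6200,dbp:6200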
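
To make the nodes format concrete, here is a minimal sketch that splits such a value into (host, port) pairs. `parse_nodes` is a hypothetical helper, not part of any Oracle tool, and rejecting entries without a numeric port is a choice made for this example.

```python
from typing import List, Tuple


def parse_nodes(value: str) -> List[Tuple[str, int]]:
    """Split 'host:port[,host:port]' into a list of (host, port) tuples."""
    peers = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        host, sep, port_text = item.rpartition(":")
        if not sep or not host or not port_text.isdigit():
            raise ValueError("expected host:port, got %r" % item)
        peers.append((host, int(port_text)))
    return peers


peers = parse_nodes("dbs:6200,dbp:6200")
```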
  
When useocr is ON, the remote node information is stored in the OCR under the key DATABASE.ONS_HOSTS.


   This key can be dumped:

[root@rac3 bin]# ./ocrdump -xml /home/oracle/ons_info.xml -keyname DATABASE.ONS_HOSTS
[root@rac3 bin]# cat /home/oracle/ons_info.xml
<OCRDUMP>
<TIMESTAMP>01/28/2015 10:46:35</TIMESTAMP>
<COMMAND>./ocrdump.bin -xml /home/oracle/ons_info.xml -keyname DATABASE.ONS_HOSTS </COMMAND>
<KEY>
<NAME>DATABASE.ONS_HOSTS</NAME>
<VALUE_TYPE>UNDEF</VALUE_TYPE>
<VALUE><![CDATA[]]></VALUE>
<USER_PERMISSION>PROCR_ALL_ACCESS</USER_PERMISSION>
<GROUP_PERMISSION>PROCR_READ</GROUP_PERMISSION>
<OTHER_PERMISSION>PROCR_READ</OTHER_PERMISSION>
<USER_NAME>oracle</USER_NAME>
<GROUP_NAME>oinstall</GROUP_NAME>
<KEY>
<NAME>DATABASE.ONS_HOSTS.rac3</NAME>   -- node
<VALUE_TYPE>ORATEXT</VALUE_TYPE>
<VALUE><![CDATA[rac3]]></VALUE>
<USER_PERMISSION>PROCR_ALL_ACCESS</USER_PERMISSION>
<GROUP_PERMISSION>PROCR_READ</GROUP_PERMISSION>
<OTHER_PERMISSION>PROCR_READ</OTHER_PERMISSION>
<USER_NAME>oracle</USER_NAME>
<GROUP_NAME>oinstall</GROUP_NAME>
<KEY>
<NAME>DATABASE.ONS_HOSTS.rac3.PORT</NAME>   -- port for this node
<VALUE_TYPE>ORATEXT</VALUE_TYPE>
<VALUE><![CDATA[6200]]></VALUE>
<USER_PERMISSION>PROCR_ALL_ACCESS</USER_PERMISSION>
<GROUP_PERMISSION>PROCR_READ</GROUP_PERMISSION>
<OTHER_PERMISSION>PROCR_READ</OTHER_PERMISSION>
<USER_NAME>oracle</USER_NAME>
<GROUP_NAME>oinstall</GROUP_NAME>
</KEY>
</KEY>
<KEY>
<NAME>DATABASE.ONS_HOSTS.rac4</NAME>    -- node
<VALUE_TYPE>ORATEXT</VALUE_TYPE>
<VALUE><![CDATA[rac4]]></VALUE>
<USER_PERMISSION>PROCR_ALL_ACCESS</USER_PERMISSION>
<GROUP_PERMISSION>PROCR_READ</GROUP_PERMISSION>
<OTHER_PERMISSION>PROCR_READ</OTHER_PERMISSION>
<USER_NAME>oracle</USER_NAME>
<GROUP_NAME>oinstall</GROUP_NAME>
<KEY>
<NAME>DATABASE.ONS_HOSTS.rac4.PORT</NAME>   -- port
<VALUE_TYPE>ORATEXT</VALUE_TYPE>
<VALUE><![CDATA[6200]]></VALUE>
<USER_PERMISSION>PROCR_ALL_ACCESS</USER_PERMISSION>
<GROUP_PERMISSION>PROCR_READ</GROUP_PERMISSION>
<OTHER_PERMISSION>PROCR_READ</OTHER_PERMISSION>
<USER_NAME>oracle</USER_NAME>
<GROUP_NAME>oinstall</GROUP_NAME>
</KEY>
</KEY>
</KEY>
</OCRDUMP>
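
Reading the node and port list out of such a dump by eye is tedious, so a small parser may help. The sketch below uses Python's standard `xml.etree.ElementTree` on a trimmed-down sample modelled on the dump (permission and owner elements omitted, no inline annotations); `ons_hosts_from_dump` is a hypothetical helper name. It assumes the `.PORT` value holds one or more whitespace-separated port numbers.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

PREFIX = "DATABASE.ONS_HOSTS."


def ons_hosts_from_dump(xml_text: str) -> Dict[str, List[int]]:
    """Map each node name to the ports stored under its .PORT key."""
    root = ET.fromstring(xml_text)
    hosts: Dict[str, List[int]] = {}
    for key in root.iter("KEY"):
        # findtext only looks at direct children, so nested keys do not interfere.
        name = key.findtext("NAME", default="")
        if name.startswith(PREFIX) and name.endswith(".PORT"):
            node = name[len(PREFIX):-len(".PORT")]
            value = key.findtext("VALUE", default="")
            hosts[node] = [int(port) for port in value.split()]
    return hosts


# Trimmed sample: only the elements the parser needs are kept.
sample = """<OCRDUMP>
<KEY><NAME>DATABASE.ONS_HOSTS</NAME><VALUE><![CDATA[]]></VALUE>
<KEY><NAME>DATABASE.ONS_HOSTS.rac3</NAME><VALUE><![CDATA[rac3]]></VALUE>
<KEY><NAME>DATABASE.ONS_HOSTS.rac3.PORT</NAME><VALUE><![CDATA[6200]]></VALUE></KEY>
</KEY>
<KEY><NAME>DATABASE.ONS_HOSTS.rac4</NAME><VALUE><![CDATA[rac4]]></VALUE>
<KEY><NAME>DATABASE.ONS_HOSTS.rac4.PORT</NAME><VALUE><![CDATA[6200]]></VALUE></KEY>
</KEY>
</KEY>
</OCRDUMP>"""
hosts = ons_hosts_from_dump(sample)
```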

 

 

3. Configuring ONS

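ONS can be configured by editing the ONS configuration file directly (when useocr=OFF). If the ONS node communication settings are stored in the OCR (when useocr=ON), use the racgons command as root to configure them.

Note: racgons must be run as root. If it is run as the oracle user, it prints no error message, but it does not change any configuration either.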
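
Because the failure is silent, a wrapper script may want to check for root before calling racgons. The following is a minimal sketch under that assumption: `racgons_args` and `require_root` are hypothetical helpers, and only the two subcommands used in this article (add_config and remove_config) are accepted.

```python
import os
from typing import List, Optional, Sequence, Tuple


def racgons_args(action: str, peers: Sequence[Tuple[str, int]]) -> List[str]:
    """Build the argument list for './racgons <action> host:port ...'."""
    if action not in ("add_config", "remove_config"):
        raise ValueError("unsupported action: %s" % action)
    if not peers:
        raise ValueError("at least one host:port pair is required")
    return ["./racgons", action] + ["%s:%d" % (host, port) for host, port in peers]


def require_root(euid: Optional[int] = None) -> None:
    """Refuse to continue unless the effective user id is 0 (root)."""
    if euid is None:
        euid = os.geteuid()
    if euid != 0:
        raise PermissionError("racgons must be run as root; otherwise nothing changes")


args = racgons_args("add_config", [("rac3", 6300), ("rac4", 6300)])
# A real script would call require_root() here and then pass args to subprocess.run.
```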

 

--- Adding configuration:

[root@rac3 bin]# ./racgons add_config rac3:6300 rac4:6300
[root@rac3 bin]# ./ocrdump -xml /home/oracle/ons_info2.xml -keyname DATABASE.ONS_HOSTS
[root@rac3 bin]# cat /home/oracle/ons_info2.xml
<OCRDUMP>
<TIMESTAMP>01/28/2015 10:56:30</TIMESTAMP>
<COMMAND>./ocrdump.bin -xml /home/oracle/ons_info2.xml -keyname DATABASE.ONS_HOSTS </COMMAND>
<KEY>
<NAME>DATABASE.ONS_HOSTS</NAME>
<VALUE_TYPE>UNDEF</VALUE_TYPE>
<VALUE><![CDATA[]]></VALUE>
<USER_PERMISSION>PROCR_ALL_ACCESS</USER_PERMISSION>
<GROUP_PERMISSION>PROCR_READ</GROUP_PERMISSION>
<OTHER_PERMISSION>PROCR_READ</OTHER_PERMISSION>
<USER_NAME>oracle</USER_NAME>
<GROUP_NAME>oinstall</GROUP_NAME>
<KEY>
<NAME>DATABASE.ONS_HOSTS.rac3</NAME>
<VALUE_TYPE>ORATEXT</VALUE_TYPE>
<VALUE><![CDATA[rac3]]></VALUE>
<USER_PERMISSION>PROCR_ALL_ACCESS</USER_PERMISSION>
<GROUP_PERMISSION>PROCR_READ</GROUP_PERMISSION>
<OTHER_PERMISSION>PROCR_READ</OTHER_PERMISSION>
<USER_NAME>oracle</USER_NAME>
<GROUP_NAME>oinstall</GROUP_NAME>
<KEY>
<NAME>DATABASE.ONS_HOSTS.rac3.PORT</NAME>
<VALUE_TYPE>ORATEXT</VALUE_TYPE>
<VALUE><![CDATA[6200 6300]]></VALUE>  -- port 6300 has been added
<USER_PERMISSION>PROCR_ALL_ACCESS</USER_PERMISSION>
<GROUP_PERMISSION>PROCR_READ</GROUP_PERMISSION>
<OTHER_PERMISSION>PROCR_READ</OTHER_PERMISSION>
<USER_NAME>oracle</USER_NAME>
<GROUP_NAME>oinstall</GROUP_NAME>
</KEY>
</KEY>
<KEY>
<NAME>DATABASE.ONS_HOSTS.rac4</NAME>
<VALUE_TYPE>ORATEXT</VALUE_TYPE>
<VALUE><![CDATA[rac4]]></VALUE>
<USER_PERMISSION>PROCR_ALL_ACCESS</USER_PERMISSION>
<GROUP_PERMISSION>PROCR_READ</GROUP_PERMISSION>
<OTHER_PERMISSION>PROCR_READ</OTHER_PERMISSION>
<USER_NAME>oracle</USER_NAME>
<GROUP_NAME>oinstall</GROUP_NAME>
<KEY>
<NAME>DATABASE.ONS_HOSTS.rac4.PORT</NAME>
<VALUE_TYPE>ORATEXT</VALUE_TYPE>
<VALUE><![CDATA[6200 6300]]></VALUE>  -- port 6300 has been added
<USER_PERMISSION>PROCR_ALL_ACCESS</USER_PERMISSION>
<GROUP_PERMISSION>PROCR_READ</GROUP_PERMISSION>
<OTHER_PERMISSION>PROCR_READ</OTHER_PERMISSION>
<USER_NAME>oracle</USER_NAME>
<GROUP_NAME>oinstall</GROUP_NAME>
</KEY>
</KEY>
</KEY>
</OCRDUMP>

 

 

---- Removing configuration

 

[root@rac3 bin]# ./racgons remove_config rac3:6300 rac4:6300
racgons: Existing key value on rac3 = 6200 6300.
racgons: rac3:6300 removed from OCR.
racgons: Existing key value on rac4 = 6200 6300.
racgons: rac4:6300 removed from OCR.
[root@rac3 bin]# ./ocrdump -xml /home/oracle/ons_info3.xml -keyname DATABASE.ONS_HOSTS
[root@rac3 bin]# cat /home/oracle/ons_info3.xml
<OCRDUMP>
<TIMESTAMP>01/28/2015 11:01:13</TIMESTAMP>
<COMMAND>./ocrdump.bin -xml /home/oracle/ons_info3.xml -keyname DATABASE.ONS_HOSTS </COMMAND>
<KEY>
<NAME>DATABASE.ONS_HOSTS</NAME>
<VALUE_TYPE>UNDEF</VALUE_TYPE>
<VALUE><![CDATA[]]></VALUE>
<USER_PERMISSION>PROCR_ALL_ACCESS</USER_PERMISSION>
<GROUP_PERMISSION>PROCR_READ</GROUP_PERMISSION>
<OTHER_PERMISSION>PROCR_READ</OTHER_PERMISSION>
<USER_NAME>oracle</USER_NAME>
<GROUP_NAME>oinstall</GROUP_NAME>
<KEY>
<NAME>DATABASE.ONS_HOSTS.rac3</NAME>
<VALUE_TYPE>ORATEXT</VALUE_TYPE>
<VALUE><![CDATA[rac3]]></VALUE>
<USER_PERMISSION>PROCR_ALL_ACCESS</USER_PERMISSION>
<GROUP_PERMISSION>PROCR_READ</GROUP_PERMISSION>
<OTHER_PERMISSION>PROCR_READ</OTHER_PERMISSION>
<USER_NAME>oracle</USER_NAME>
<GROUP_NAME>oinstall</GROUP_NAME>
<KEY>
<NAME>DATABASE.ONS_HOSTS.rac3.PORT</NAME>
<VALUE_TYPE>ORATEXT</VALUE_TYPE>
<VALUE><![CDATA[6200 ]]></VALUE>     -- port 6300 has been removed
<USER_PERMISSION>PROCR_ALL_ACCESS</USER_PERMISSION>
<GROUP_PERMISSION>PROCR_READ</GROUP_PERMISSION>
<OTHER_PERMISSION>PROCR_READ</OTHER_PERMISSION>
<USER_NAME>oracle</USER_NAME>
<GROUP_NAME>oinstall</GROUP_NAME>
</KEY>
</KEY>
<KEY>
<NAME>DATABASE.ONS_HOSTS.rac4</NAME>
<VALUE_TYPE>ORATEXT</VALUE_TYPE>
<VALUE><![CDATA[rac4]]></VALUE>
<USER_PERMISSION>PROCR_ALL_ACCESS</USER_PERMISSION>
<GROUP_PERMISSION>PROCR_READ</GROUP_PERMISSION>
<OTHER_PERMISSION>PROCR_READ</OTHER_PERMISSION>
<USER_NAME>oracle</USER_NAME>
<GROUP_NAME>oinstall</GROUP_NAME>
<KEY>
<NAME>DATABASE.ONS_HOSTS.rac4.PORT</NAME>
<VALUE_TYPE>ORATEXT</VALUE_TYPE>
<VALUE><![CDATA[6200 ]]></VALUE>   -- port 6300 has been removed
<USER_PERMISSION>PROCR_ALL_ACCESS</USER_PERMISSION>
<GROUP_PERMISSION>PROCR_READ</GROUP_PERMISSION>
<OTHER_PERMISSION>PROCR_READ</OTHER_PERMISSION>
<USER_NAME>oracle</USER_NAME>
<GROUP_NAME>oinstall</GROUP_NAME>
</KEY>
</KEY>
</KEY>
</OCRDUMP>
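
Comparing two dumps by hand is error-prone. As a sketch, the per-node port lists from a dump taken before and a dump taken after a change can be diffed as below. `port_changes` is a hypothetical helper, and the two snapshots are typed in by hand from the values shown in this section rather than parsed from the files.

```python
from typing import Dict, List, Tuple

PortMap = Dict[str, List[int]]


def port_changes(before: PortMap, after: PortMap) -> Dict[str, Tuple[List[int], List[int]]]:
    """For each node, return (added_ports, removed_ports) between two snapshots."""
    changes = {}
    for node in sorted(set(before) | set(after)):
        old, new = set(before.get(node, [])), set(after.get(node, []))
        added, removed = sorted(new - old), sorted(old - new)
        if added or removed:
            changes[node] = (added, removed)
    return changes


# Snapshots taken from the dumps after add_config and after remove_config.
after_add = {"rac3": [6200, 6300], "rac4": [6200, 6300]}
after_remove = {"rac3": [6200], "rac4": [6200]}
changes = port_changes(after_add, after_remove)
```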

 

 


4. The onsctl command

The onsctl command can start, stop, and debug ONS, and reload its configuration file. Its usage is as follows:

 

[root@rac3 bin]# ./onsctl -help
usage: ./onsctl start|stop|ping|reconfig|debug
start                            - Start opmn only.
stop                             - Stop ons daemon
ping                             - Test to see if ons daemon is running
debug                            - Display debug information for the ons daemon
reconfig                         - Reload the ons configuration
help                             - Print a short syntax description (this).
detailed                         - Print a verbose syntax description.

 

Note: a running ONS process does not necessarily mean that ONS is working properly; use the ping command to confirm.

 

<1>Check the process status at the OS level

 

 [root@rac3 bin]# ps -ef|grep ons |grep -v grep
oracle   27813     1  0 10:31 ?        00:00:00 /opt/ora10g/product/10.2.0/crs_1/opmn/bin/ons -d
oracle   27814 27813  0 10:31 ?        00:00:00 /opt/ora10g/product/10.2.0/crs_1/opmn/bin/ons -d

The output shows that the ONS processes are running.
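
A plain `grep ons` also matches any unrelated command line that merely contains the letters "ons". The sketch below is one way to be stricter: it assumes the usual eight-column `ps -ef` layout and only accepts processes whose executable is named exactly `ons`. `ons_pids` is a hypothetical helper, and the third sample line is invented to show a false match being filtered out.

```python
from typing import List


def ons_pids(ps_output: str) -> List[int]:
    """Return PIDs of processes whose executable name is exactly 'ons'."""
    pids = []
    for line in ps_output.splitlines():
        fields = line.split(None, 7)  # UID PID PPID C STIME TTY TIME CMD
        if len(fields) < 8 or not fields[1].isdigit():
            continue
        command = fields[7].split()
        if command and command[0].rsplit("/", 1)[-1] == "ons":
            pids.append(int(fields[1]))
    return pids


sample = """\
oracle   27813     1  0 10:31 ?        00:00:00 /opt/ora10g/product/10.2.0/crs_1/opmn/bin/ons -d
oracle   27814 27813  0 10:31 ?        00:00:00 /opt/ora10g/product/10.2.0/crs_1/opmn/bin/ons -d
oracle   30001     1  0 10:35 ?        00:00:00 /usr/bin/console-helper
"""
pids = ons_pids(sample)
```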

 

<2>Confirm the ONS service status

 [root@rac3 bin]# ./onsctl ping
Number of onsconfiguration retrieved, numcfg = 2
onscfg[0]
   {node = rac3, port = 6200}
Adding remote host rac3:6200
onscfg[1]
   {node = rac4, port = 6200}
Adding remote host rac4:6200
ons is running ...

The output confirms that the ONS service is running.
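
For monitoring scripts, the ping output can be reduced to a boolean plus the list of configured peers. The sketch below relies only on the two status strings and the `{node = ..., port = ...}` lines that appear in the output shown in this article; `parse_onsctl_ping` is a hypothetical helper, and other onsctl versions may print something different.

```python
import re
from typing import List, Tuple


def parse_onsctl_ping(output: str) -> Tuple[bool, List[Tuple[str, int]]]:
    """Return (is_running, configured_peers) from 'onsctl ping' output."""
    peers = [(node, int(port)) for node, port in
             re.findall(r"\{node = (\S+), port = (\d+)\}", output)]
    # "ons is running" is not a substring of "ons is not running".
    return "ons is running" in output, peers


sample = """\
Number of onsconfiguration retrieved, numcfg = 2
onscfg[0]
   {node = rac3, port = 6200}
Adding remote host rac3:6200
onscfg[1]
   {node = rac4, port = 6200}
Adding remote host rac4:6200
ons is running ...
"""
running, peers = parse_onsctl_ping(sample)
```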

 

<3>Stop the ONS service

 [root@rac3 bin]# ./onsctl stop
onsctl: shutting down ons daemon ...
Number of onsconfiguration retrieved, numcfg = 2
onscfg[0]
   {node = rac3, port = 6200}
Adding remote host rac3:6200
onscfg[1]
   {node = rac4, port = 6200}
Adding remote host rac4:6200
[root@rac3 bin]# 
[root@rac3 bin]# ./onsctl ping
Number of onsconfiguration retrieved, numcfg = 2
onscfg[0]
   {node = rac3, port = 6200}
Adding remote host rac3:6200
onscfg[1]
   {node = rac4, port = 6200}
Adding remote host rac4:6200
ons is not running ...        --- this confirms that the stop succeeded

 

 

<4>Start the ONS service

[root@rac3 bin]# ./onsctl start
Number of onsconfiguration retrieved, numcfg = 2
onscfg[0]
   {node = rac3, port = 6200}
Adding remote host rac3:6200
onscfg[1]
   {node = rac4, port = 6200}
Adding remote host rac4:6200
Number of onsconfiguration retrieved, numcfg = 2
onscfg[0]
   {node = rac3, port = 6200}
Adding remote host rac3:6200
onscfg[1]
   {node = rac4, port = 6200}
Adding remote host rac4:6200
onsctl: ons started        -- started successfully
[root@rac3 bin]# 
[root@rac3 bin]# ./onsctl ping
Number of onsconfiguration retrieved, numcfg = 2
onscfg[0]
   {node = rac3, port = 6200}
Adding remote host rac3:6200
onscfg[1]
   {node = rac4, port = 6200}
Adding remote host rac4:6200
ons is running ...       -- this confirms that the start succeeded
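
The stop, start, and ping sequence can be wrapped in a small script. This is only a sketch: `run_onsctl` assumes that `./onsctl` is in the current directory, as in the sessions shown here, and `restart_and_verify` and `fake_onsctl` are hypothetical names. The command runner is passed in as a parameter so the logic can be tried without an Oracle installation.

```python
import subprocess
from typing import Callable, List

Runner = Callable[[str], str]


def run_onsctl(subcommand: str) -> str:
    """Run './onsctl <subcommand>' and return its combined output."""
    completed = subprocess.run(["./onsctl", subcommand], stdout=subprocess.PIPE,
                               stderr=subprocess.STDOUT, universal_newlines=True)
    return completed.stdout


def restart_and_verify(run: Runner = run_onsctl) -> bool:
    """Stop ONS, start it again, then confirm with ping that it is running."""
    run("stop")
    run("start")
    return "ons is running" in run("ping")


# Stand-in runner for a dry run; it records the calls instead of executing them.
calls: List[str] = []


def fake_onsctl(subcommand: str) -> str:
    calls.append(subcommand)
    return "ons is running ...\n" if subcommand == "ping" else ""


ok = restart_and_verify(fake_onsctl)
```

On a real node the call would simply be `restart_and_verify()`, run from the directory that contains onsctl.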

 

<5>Use the debug option to view detailed information

[root@rac3 bin]# ./onsctl debug
Number of onsconfiguration retrieved, numcfg = 2
onscfg[0]
   {node = rac3, port = 6200}
Adding remote host rac3:6200
onscfg[1]
   {node = rac4, port = 6200}
Adding remote host rac4:6200
HTTP/1.1 200 OK
Content-Length: 1355
Content-Type: text/html
Response: 
======== ONS ========
Listeners:
 NAME    BIND ADDRESS   PORT   FLAGS   SOCKET
------- --------------- ----- -------- ------
Local   127.000.000.001  6100 00000142      7
Remote  192.168.001.103  6200 00000101      8
Request     No listener
Server connections:          ----- the most useful part of this command is that it shows all connections
    ID           IP        PORT    FLAGS    SENDQ     WORKER   BUSY  SUBS
---------- --------------- ----- -------- ---------- -------- ------ -----
         1 192.168.001.104  6200 00010005          0               1     0
Client connections:
    ID           IP        PORT    FLAGS    SENDQ     WORKER   BUSY  SUBS
---------- --------------- ----- -------- ---------- -------- ------ -----
Pending connections:
    ID           IP        PORT    FLAGS    SENDQ     WORKER   BUSY  SUBS
---------- --------------- ----- -------- ---------- -------- ------ -----
         0 127.000.000.001  6100 00000812          0               1     0
         0 127.000.000.001  6100 00000812          0               1     0
         0 127.000.000.001  6100 00020812          0               1     0
Worker Ticket: 0/0, Idle: 360
   THREAD   FLAGS
  -------- --------
  f7f86ba0 00000012
  f6dd1ba0 00000012
  f63d0ba0 00000012
Resources:
  Notifications:
    Received: 0, in Receive Q: 0, Processed: 0, in Process Q: 0
  Pools:
    Message: 24/25 (1), Link: 25/25 (1), Subscription: 0/0 (0)
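
Since the connection list is the most useful part of this output, it can be worth extracting. The sketch below pulls the rows under "Server connections:" and converts the zero-padded addresses into the usual dotted form. It assumes the layout shown above; `normalize_ip` and `server_connections` are hypothetical helper names.

```python
from typing import List, Tuple


def normalize_ip(padded: str) -> str:
    """Turn a zero-padded address such as 192.168.001.104 into 192.168.1.104."""
    return ".".join(str(int(octet)) for octet in padded.split("."))


def server_connections(debug_output: str) -> List[Tuple[str, int]]:
    """Return (ip, port) for each row listed under 'Server connections:'."""
    rows, in_section = [], False
    for line in debug_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("Server connections:"):
            in_section = True
            continue
        if in_section and stripped.endswith("connections:"):
            break  # the next section (Client connections) has started
        fields = stripped.split()
        # Data rows start with a numeric ID, then the IP, then the port.
        if in_section and len(fields) >= 3 and fields[0].isdigit() and fields[2].isdigit():
            rows.append((normalize_ip(fields[1]), int(fields[2])))
    return rows


sample = """\
Server connections:
    ID           IP        PORT    FLAGS    SENDQ     WORKER   BUSY  SUBS
---------- --------------- ----- -------- ---------- -------- ------ -----
         1 192.168.001.104  6200 00010005          0               1     0
Client connections:
    ID           IP        PORT    FLAGS    SENDQ     WORKER   BUSY  SUBS
---------- --------------- ----- -------- ---------- -------- ------ -----
Pending connections:
         0 127.000.000.001  6100 00000812          0               1     0
"""
connections = server_connections(sample)
```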

 


##===========================================================

Further notes:

After the ONS configuration tests above, crs_stat -t showed that ONS would not start on one node of the cluster.

 

[oracle@rac3 ~]$ crs_stat -t
Name           Type           Target    State     Host        
------------------------------------------------------------
ora....SM1.asm application    ONLINE    ONLINE    rac3        
ora....C3.lsnr application    ONLINE    ONLINE    rac3        
ora.rac3.gsd   application    ONLINE    ONLINE    rac3        
ora.rac3.ons   application    ONLINE    OFFLINE               
ora.rac3.vip   application    ONLINE    ONLINE    rac3        
ora....SM2.asm application    ONLINE    ONLINE    rac4        
ora....C4.lsnr application    ONLINE    ONLINE    rac4        
ora.rac4.gsd   application    ONLINE    ONLINE    rac4        
ora.rac4.ons   application    ONLINE    ONLINE    rac4        
ora.rac4.vip   application    ONLINE    ONLINE    rac4        
ora.racdb.db   application    ONLINE    ONLINE    rac4        
ora....b1.inst application    ONLINE    ONLINE    rac3        
ora....b2.inst application    ONLINE    ONLINE    rac4
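
In a longer listing it is easy to overlook a single OFFLINE row. A sketch for picking out resources whose Target is ONLINE but whose State is not is shown below. It assumes the column order of the listing above and that every data row has the type `application`, as it does here; `offline_resources` is a hypothetical helper.

```python
from typing import List


def offline_resources(crs_stat_output: str) -> List[str]:
    """Return names of resources whose Target is ONLINE but State is not."""
    names = []
    for line in crs_stat_output.splitlines():
        fields = line.split()
        # Data rows: Name Type Target State [Host]; Host is empty when offline.
        if len(fields) >= 4 and fields[1] == "application":
            target, state = fields[2], fields[3]
            if target == "ONLINE" and state != "ONLINE":
                names.append(fields[0])
    return names


sample = """\
Name           Type           Target    State     Host
------------------------------------------------------------
ora.rac3.gsd   application    ONLINE    ONLINE    rac3
ora.rac3.ons   application    ONLINE    OFFLINE
ora.rac3.vip   application    ONLINE    ONLINE    rac3
ora.rac4.ons   application    ONLINE    ONLINE    rac4
"""
problems = offline_resources(sample)
```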

 


-- Check the log

[oracle@rac3 racg]$ tail -f ora.rac3.ons.log 
..........................................
RCV: Permission denied
Communication error with the OPMN server local port.
Check the OPMN log files
RCV: Permission denied
Communication error with the OPMN server loca
2015-01-28 13:34:25.867: [    RACG][2540408064] [29681][2540408064][ora.rac3.ons]: l port.
Check the OPMN log files
RCV: Permission denied    ----- permission denied is reported over and over
Communication error with the OPMN server local port.
Check the OPMN log files
Number of onsconfiguration retrieved, numcfg = 2
onscfg[0]
   {node = rac3, port = 6200}
Adding remote host rac3:6200
o
2015-01-28 13:34:25.867: [    RACG][2540408064] [29681][2540408064][ora.rac3.ons]: nscfg[1]
   {node = rac4, port = 6200}
Adding remote host rac4:6200
onsctl: ons failed to start   -- so ons fails to start, yet onsctl ping shows ons as running
2015-01-28 13:34:26.077: [    RACG][2540408064] [29681][2540408064][ora.rac3.ons]: RCV: Permission denied
Communication error with the OPMN server local port.
Check the OPMN log files

 

-- However, confirming that the ONS service has started

[root@rac3 bin]# ./onsctl ping
Number of onsconfiguration retrieved, numcfg = 2
onscfg[0]
   {node = rac3, port = 6200}
Adding remote host rac3:6200
onscfg[1]
   {node = rac4, port = 6
2015-01-28 13:34:26.077: [    RACG][2540408064] [29681][2540408064][ora.rac3.ons]: 200}
Adding remote host rac4:6200
ons is not running ...

 

Running ./onsctl stop and then ./onsctl start again also stops and starts ONS normally, but the log still shows only failures to start.

 


-- Starting the resource on its own

[oracle@rac3 ~]$ crs_start ora.rac3.ons
Attempting to start `ora.rac1.ons` on member `rac3`
Start of `ora.rac3.ons` on member `rac3` failed.
rac4 : CRS-1019: Resource ora.rac3.ons (application) cannot run on rac4


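I checked the permissions on the ONS configuration and found no problem. After rebooting the virtual machines, ONS started normally on both nodes and the problem was resolved.
 My current suspicion is either a permission problem that my checks missed or a hung ONS process: a new process could be started, but the log kept showing the error messages.
(In general, temporarily stopping and starting the ONS resource has little impact on the system, because the resource is mainly related to load balancing and failover.)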

 

[oracle@rac3 ~]$ crs_stat -t
Name           Type           Target    State     Host        
------------------------------------------------------------
ora....SM1.asm application    ONLINE    ONLINE    rac3        
ora....C3.lsnr application    ONLINE    ONLINE    rac3        
ora.rac3.gsd   application    ONLINE    ONLINE    rac3        
ora.rac3.ons   application    ONLINE    ONLINE    rac3        
ora.rac3.vip   application    ONLINE    ONLINE    rac3        
ora....SM2.asm application    ONLINE    ONLINE    rac4        
ora....C4.lsnr application    ONLINE    ONLINE    rac4        
ora.rac4.gsd   application    ONLINE    ONLINE    rac4        
ora.rac4.ons   application    ONLINE    ONLINE    rac4        
ora.rac4.vip   application    ONLINE    ONLINE    rac4        
ora.racdb.db   application    ONLINE    ONLINE    rac4        
ora....b1.inst application    ONLINE    ONLINE    rac3        
ora....b2.inst application    ONLINE    ONLINE    rac4

A thread on itpub about a similar problem: http://www.itpub.net/thread-1283253-1-1.html

 ps -ef|grep ons

 

 

Acknowledgement: this document draws on <<大话Oracle RAC>> by 张晓明.

 
