Re: Oracle 12c agent troubleshooting (EM_PING_NOTIF_RESPONSE: BACKOFF::180000)

  • From: Kellyn Pot'vin <kellyn.potvin@xxxxxxxxx>
  • To: "development@xxxxxxxxxxxxxxxxx" <development@xxxxxxxxxxxxxxxxx>, "oracle-l@xxxxxxxxxxxxx" <oracle-l@xxxxxxxxxxxxx>
  • Date: Thu, 27 Oct 2011 08:18:37 -0700 (PDT)

Hi Martin!
Is there any info in the diagnostic directory?
$AGENT_HOME/agent_inst/diag/ofm/emagent/emagent/incident/...
Did you change from the default ping interval?

If not, can you set it to match your other server's? (It should show in 
gcagent.log.)

Setting the Ping Interval

In Release 2.1, the Management Server is designed to ping all targets on a 
pre-defined interval to monitor the state of all managed targets.
To manage the interval between pings, you can use the following property in the 
omsconfig.properties file to set the ping interval:
oms.vdp.ping_interval=<integer; time in minutes>  
Note that the interval set determines the interval at which the Management 
Server tests for node up/down, regardless of what you set in the event which 
contains a node up/down test.
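
To compare the ping-related warnings between the two hosts, something along these lines might help. It is only a sketch under assumptions: the log file is passed as an argument because its exact location is not given in this thread, the function names ping_lines and ping_summary are made up, the patterns are simply the WARN messages quoted in the original post below, and it assumes each message sits on a single line in the real file rather than being wrapped as it is in the mail.

```shell
#!/bin/sh
# ping_lines: print the ping-related WARN lines from an agent log.
# $1 is the path to gcagent.log (location is an assumption; pass your own).
ping_lines() {
    grep -E 'WARN - (improper ping interval|Ping protocol error)' "$1"
}

# ping_summary: count each distinct message, ignoring the timestamp and
# thread id, so the output of two hosts can be compared side by side.
ping_summary() {
    ping_lines "$1" | sed -E 's/^.*WARN - //' | sort | uniq -c
}
```

Running ping_summary against gcagent.log on the working agent and on the failing one, then comparing the two outputs, should show quickly whether only the failing host logs these warnings.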
 
Kellyn Pot'Vin
Sr. Database Administrator and Developer
DBAKevlar.com


________________________________
From: Martin Bach <development@xxxxxxxxxxxxxxxxx>
To: "oracle-l@xxxxxxxxxxxxx" <oracle-l@xxxxxxxxxxxxx>
Sent: Thursday, October 27, 2011 8:59 AM
Subject: Oracle 12c agent troubleshooting (EM_PING_NOTIF_RESPONSE: 
BACKOFF::180000)

Good afternoon!

It's been a busy day on the mailing list, and maybe I can benefit from 
this a little :) Before I begin I have to admit that I'm not the best 
agent troubleshooter, and 12.1 hasn't made that easier.

I have two agents deployed on a 2-node cluster; both have worked in the 
past. After a reboot, both stopped functioning. Now I have this:

[oracle@rac11203node1 log]$ emctl status agent
Oracle Enterprise Manager 12c Cloud Control 12.1.0.1.0
Copyright (c) 1996, 2011 Oracle Corporation. All rights reserved.
---------------------------------------------------------------
Agent Version : 12.1.0.1.0
OMS Version : (unknown)
Protocol Version : 12.1.0.1.0
Agent Home : /u01/app/oracle/product/agent_inst
Agent Binaries : /u01/app/oracle/product/core/12.1.0.1.0
Agent Process ID : 13270
Parent Process ID : 13215
Agent URL : https://rac11203node1.localdomain:3872/emd/main/
Repository URL : https://oem12oms.localdomain:4901/empbs/upload
Started at : 2011-10-26 18:30:17
Started by user : oracle
Last Reload : (none)
Last successful upload : (none)
Last attempted upload : (none)
Total Megabytes of XML files uploaded so far : 0
Number of XML files pending upload : 1,858
Size of XML files pending upload(MB) : 8.05
Available disk space on upload filesystem : 49.16%
Collection Status : Collections enabled
Last attempted heartbeat to OMS : 2011-10-27 15:42:47
Last successful heartbeat to OMS : (none)

---------------------------------------------------------------
Agent is Running and Ready

The settings are correct; I have verified them against another agent 
that is uploading and otherwise fine.

I have also secured the agent, and 
$AGENT_BASE/agent_inst/sysman/log/secure.log as well as the emctl secure 
agent commands reported normal, successful operation.

Still the stubborn thing doesn't want to talk to the OMS - in the agent 
overview page both agents are listed as "unavailable", but not blocked. 
When I force an upload, I get this:

[oracle@rac11203node1 log]$ emctl upload
Oracle Enterprise Manager 12c Cloud Control 12.1.0.1.0
Copyright (c) 1996, 2011 Oracle Corporation. All rights reserved.
---------------------------------------------------------------
EMD upload error:full upload has failed: uploadXMLFiles skipped :: OMS 
version not checked yet. If this issue persists check trace files for 
ping to OMS related errors. (OMS_DOWN)

However, it's not down: I can reach it from another agent (which happens 
to be on the same box as the OMS):

[oracle@oem12oms 12.1.0.1.0]$ $ORACLE_HOME/bin/emctl status agent
Oracle Enterprise Manager 12c Cloud Control 12.1.0.1.0
Copyright (c) 1996, 2011 Oracle Corporation. All rights reserved.
---------------------------------------------------------------
Agent Version : 12.1.0.1.0
OMS Version : 12.1.0.1.0
Protocol Version : 12.1.0.1.0
Agent Home : /u01/gc12.1/agent/agent_inst
Agent Binaries : /u01/gc12.1/agent/core/12.1.0.1.0
Agent Process ID : 2964
Parent Process ID : 2910
Agent URL : https://oem12oms.localdomain:3872/emd/main/
Repository URL : https://oem12oms.localdomain:4901/empbs/upload
Started at : 2011-10-15 21:00:37
Started by user : oracle
Last Reload : (none)
Last successful upload : 2011-10-27 15:46:38
Last attempted upload : 2011-10-27 15:46:38
Total Megabytes of XML files uploaded so far : 137.79
Number of XML files pending upload : 0
Size of XML files pending upload(MB) : 0
Available disk space on upload filesystem : 50.78%
Collection Status : Collections enabled
Last attempted heartbeat to OMS : 2011-10-27 15:48:34
Last successful heartbeat to OMS : 2011-10-27 15:48:34

---------------------------------------------------------------
Agent is Running and Ready

And no, the firewall is turned off and I can connect to the upload from 
any machine in the network:

[oracle@rac11203node1 log]$ wget --no-check-certificate 
https://oem12oms.localdomain:4901/empbs/upload
--2011-10-27 15:55:46-- https://oem12oms.localdomain:4901/empbs/upload
Resolving oem12oms.localdomain... 192.168.99.28
Connecting to oem12oms.localdomain|192.168.99.28|:4901... connected.
WARNING: cannot verify oem12oms.localdomain’s certificate, issued by 
‘/O=EnterpriseManager on oem12oms.localdomain/OU=EnterpriseManager on 
oem12oms.localdomain/L=EnterpriseManager on 
oem12oms.localdomain/ST=CA/C=US/CN=oem12oms.localdomain’:
Self-signed certificate encountered.
HTTP request sent, awaiting response... 200 OK
Length: 314 [text/html]
Saving to: ‘upload.1’

100%[======================================>] 314 --.-K/s in 0s

2011-10-27 15:55:46 (5.19 MB/s) - ‘upload.1’ saved [314/314]

The agent complains about this in gcagent.log:

2011-10-27 15:56:08,947 [37:3F09CD9C] WARN - improper ping interval 
(EM_PING_NOTIF_RESPONSE: BACKOFF::180000)
2011-10-27 15:56:18,471 [167:E3E93C4C] WARN - improper ping interval 
(EM_PING_NOTIF_RESPONSE: BACKOFF::180000)
2011-10-27 15:56:18,472 [167:E3E93C4C] WARN - Ping protocol error
o.s.gcagent.ping.PingProtocolException [OMS sent an invalid response: 
"BACKOFF::180000"]

At least someone in Oracle has some humour when it comes to this :) For 
those who read all of this: have you seen that before? Any pointers 
appreciated.
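
For anyone wanting to check whether the OMS always answers with the same BACKOFF value, the gcagent.log warnings above could be tallied with a small sketch like this one. The function name backoff_values is made up, the log path is whatever you pass in, and it relies on grep supporting -o (GNU and BSD grep do). The thread does not say what unit the number is in, so the value is printed raw.

```shell
#!/bin/sh
# backoff_values: list each distinct BACKOFF value found in the given log,
# prefixed by the number of times it occurs.
# $1 is the path to gcagent.log.
backoff_values() {
    grep -oE 'BACKOFF::[0-9]+' "$1" | sed 's/^BACKOFF:://' | sort -n | uniq -c
}
```

A single value repeated throughout the log would suggest the OMS is consistently sending the same response to this agent.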

Martin
--
http://www.linkedin.com/in/martincarstenbach
http://martincarstenbach.wordpress.com
--
//www.freelists.org/webpage/oracle-l
