RE: OEM GC High Availability and Downtime

  • From: "Storey, Robert (DCSO)" <RStorey@xxxxxxxxxxxxxxxxxx>
  • To: "Guillermo Alan Bort" <cicciuxdba@xxxxxxxxx>
  • Date: Thu, 24 Jun 2010 10:54:46 -0500

I'm still implementing my GC, but nothing at the scale you are. Local,
with only a few databases.


One of the things we knocked around in class, and I'm sure others will
chip in, is that while your GC may be worldwide in its reach, you only
have one repository.  You can have multiple OMSs to load-balance the
collection, but all that data flows to one repository.  So, to me, that
takes extended-distance RAC out of the question.

 

My first thought is that you would have distributed OMS's to do your
collections from the multitude of agents, and then those OMS's would
dump back to your repository.

 

Again, just kinda shooting off the top of my head.  At a given location,
you could have redundant OMSs (I believe there is an Active-Active
setup) to ensure that you can take one box down to patch.

 

Then, at "home" you have the repository on a RAC so that it is HA.

 

So, if you have 4 geographic sites around the world, call them Site A
through Site D, and at each site you have redundant OMSs monitoring
that Site's targets, then when you need to patch a Site, you take down
one side of the Active-Active pair to patch, bring it back up, and then
take down the other.
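For reference, the rolling patch on one side of the pair is roughly this sequence. This is a sketch from memory using the standard 10g emctl commands, so verify against your release; the OMS home and patch step are placeholders:

```shell
# On the OMS node being patched (one side of the Active-Active pair):
emctl stop oms        # stop this OMS; the surviving OMS keeps collecting
# ... apply the patch to this OMS Oracle Home ...
emctl start oms       # bring the patched OMS back up
emctl status oms      # confirm it is up before patching the other side
```

Repeat the same sequence on the second node once the first is confirmed healthy.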

 


 

So, to me, this handles any planned downtime.  You just annotate that
downtime for a site with a Grid Control blackout, and no stats are
expected during that window.
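A minimal sketch of the agent-side blackout, assuming the standard emctl blackout subcommands (the blackout name is made up):

```shell
# On each agent host at the affected Site:
emctl start blackout SiteA_patch_window   # blackout all targets on this agent
emctl status blackout                     # verify the blackout is active
# ... do the planned maintenance ...
emctl stop blackout SiteA_patch_window    # end the blackout when done
```

With no target list, the agent blackout covers everything that agent monitors, which is usually what you want for a whole-Site maintenance window.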

 

I don't think you can ever do away with all outages. An emergency outage
at a Site might take down your network.  You might lose the network from
the Sites to the OMR.  But I think that type of setup would give you the
best HA you could get.

 

List?  Am I way off base?

 

From: alanbort@xxxxxxxxx [mailto:alanbort@xxxxxxxxx] On Behalf Of
Guillermo Alan Bort
Sent: Thursday, June 24, 2010 10:45 AM
To: Storey, Robert (DCSO)
Cc: oracle-l-freelists
Subject: Re: OEM GC High Availability and Downtime

 

Extended-distance RAC is supported only for relatively close locations,
and we are deploying GC around the world. Data Guard for the repository
is being considered; however, that only safeguards against emergency
outages... what about planned outages? Is there a way to avoid them
(other than never applying a patch, of course)?


Alan.-



On Thu, Jun 24, 2010 at 12:21 PM, Storey, Robert (DCSO)
<RStorey@xxxxxxxxxxxxxxxxxx> wrote:

What about using RAC or Data Guard to provide redundancy for the
repository?

 

Not sure there is anything you can do for the agents, but if you have
multiple OMSs talking to a repository covered by Data Guard, you could
switch over to the standby repository.
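A rough sketch of that repository switchover, assuming a broker-managed Data Guard configuration (dgmgrl syntax; the database names and credentials are placeholders):

```shell
# Broker-managed switchover of the repository database:
dgmgrl sys/password@emrep_primary <<'EOF'
SHOW CONFIGURATION;
SWITCHOVER TO emrep_standby;
EOF
# Afterwards, each OMS must be repointed (or restarted) so it
# connects to the new primary; how that is done depends on your
# GC version and how the repository connect string is configured.
```

Note the switchover itself is brief, but the OMSs will not collect while they cannot reach the repository, so this still shows up as a short monitoring gap.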

 

Or, if you just want constant uptime, why not implement RAC for the
repository?

 

Just first thoughts to pop in my head.

 

 

From: oracle-l-bounce@xxxxxxxxxxxxx
[mailto:oracle-l-bounce@xxxxxxxxxxxxx] On Behalf Of Guillermo Alan Bort
Sent: Thursday, June 24, 2010 9:55 AM
To: oracle-l-freelists
Subject: OEM GC High Availability and Downtime

 

List,

   We are implementing GC to monitor over 2,500 databases; however,
there is concern about possible Grid downtime, during which the
databases will not be monitored.

   I can think of two possible scenarios:

1) Planned outage: in order to apply a patch, the entire OEM needs to be
shut down. 
2) Emergency outage: Media Failure, Hardware Failure, Power Failure,
Oracle Bug, etc.

   The HA strategy is having several OMSs across different locations
(three, actually); however, I am concerned about the repository.

    It is very unlikely that an emergency outage will affect all three
OMSs, but it might affect the repository... and a planned outage will,
in several situations, require a full shutdown of all OMSs, Agents, and
the Repository.

    My question is: have you ever dealt with this issue? How did you
solve it? Did you just accept the outage? Did you find an alternative?

TIA
Alan.-

 
