Untested, completely theoretical notion: it may actually take longer with the
other instance down. Is it possible that the application will tolerate the
second instance being up but in restricted mode, so that only “DBA” authority
can connect?
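For clarity, a minimal sketch of what I mean by restricted mode (the syntax is
standard Oracle; whether the application tolerates it is the open question):

    -- Start the second instance so that only users holding the
    -- RESTRICTED SESSION privilege (effectively DBAs) can connect:
    SQL> STARTUP RESTRICT;

    -- Or, if the instance is already open:
    SQL> ALTER SYSTEM ENABLE RESTRICTED SESSION;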
Since I don’t have the code I can only guess, but it’s possible that only a
memory-to-memory ping is needed for instances that are up, whilst probing the
down instance’s undo and/or redo is required. That might take long enough for
hash waits to pile up.
But first up with a bullet is JL’s suggestion of badly configured sequences,
which dovetails nicely with a vendor lacking sufficient understanding of Oracle
to support multiple instances being up.
And a question: what is the purpose of being RAC in this case? If you are
thinking rapid fail-over, I’d suggest you consider changing your configuration
to standby recovery, either roll-your-own or Data Guard. With the second
instance normally down, I’d like your odds that a complete recovery failover to
the standby is either faster than, or negligibly slower than, RAC, and it
eliminates all the RAC overheads for multi-instance coordination. As JL pointed
out, some of the RACTAX™ applies even when only one instance is up. RAC is
wonderful if you really need it, but YPDNR.
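For illustration, with a broker-managed physical standby the switchover itself
is a one-liner; the standby name STBY below is just a placeholder:

    DGMGRL> CONNECT sys@primary
    DGMGRL> SWITCHOVER TO 'STBY';
    -- or, if the primary is actually gone:
    DGMGRL> FAILOVER TO 'STBY';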
mwf
From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx] On Behalf Of Krishnaprasad Yadav
Sent: Monday, October 04, 2021 6:50 AM
To: Jonathan Lewis
Cc: Oracle L
Subject: Re: Event : latch: ges resource hash list
Hi Jonathan,
Thanks for your mail. I understand the above points, and will try to drive the
investigation in the direction you have mentioned.
Regards,
Krishna
On Mon, 4 Oct 2021 at 16:13, Krishnaprasad Yadav <chrishna0007@xxxxxxxxx> wrote:
Hi Jonathan,
It's a 2-node RAC system, and only one instance is running; the other one is
down.
Regards,
Krishna
On Mon, 4 Oct 2021 at 15:18, Jonathan Lewis <jlewisoracle@xxxxxxxxx> wrote:
GES is the global enqueue service (which isn't about buffer cache), so it looks
as if you are doing something that requires coordination of some locking event.
(And the code path is followed regardless of how many instances are up.)
I would take a couple of snapshots of v$enqueue_stat over a short period of
time to see if any specific enqueue is being acquired very frequently; but some
global enqueue gets don't get recorded in that view - so it may show nothing
interesting. And I would do the same (snapshots) with v$rowcache to see if any
of the dictionary cache objects were subject to a high rate of access. Either
of these might give you some clue about what's going on.
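Something like this rough sketch would do: take one pass, wait 30 seconds or
so, take another, and diff the numbers by eye (column names are as I remember
them from 12.2, so check your version):

    select eq_type, total_req#, total_wait#, cum_wait_time
    from   v$enqueue_stat
    order by total_req# desc;

    -- ... wait ~30 seconds, rerun, and compare the deltas ...

    select parameter, gets, getmisses, dlm_requests, dlm_conflicts
    from   v$rowcache
    order by gets desc;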
Historic issues:
sequences being accessed very frequently and declared with NOCACHE (or a very
small CACHE) or with ORDER - see the query sketch after this list.
Some bugs relating to tablespace handling, undo handling, and VPD that result
in massive overload on dc_tablespaces, dc_users, dc_objects,
dc_rollback_segments (though I can't remember if any of them were still around
in 12.2).
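As a first pass for the sequence issue (a sketch using the standard dictionary
view; pick your own cache threshold):

    -- Sequences declared NOCACHE (cache_size = 0), with a small cache,
    -- or with ORDER are the usual suspects in RAC:
    select sequence_owner, sequence_name, cache_size, order_flag
    from   dba_sequences
    where  cache_size < 20
    or     order_flag = 'Y';

    -- If one stands out and the application can tolerate gaps:
    -- alter sequence some_owner.some_seq cache 1000 noorder;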
Regards
Jonathan Lewis
On Mon, 4 Oct 2021 at 10:23, Krishnaprasad Yadav <chrishna0007@xxxxxxxxx> wrote:
Hi Experts,
There is a situation that is causing the event latch: ges resource hash list
in the database. CRS/RDBMS is version 12.2 on Solaris.
The DB is a 2-node RAC, but due to application compatibility node 2 always
remains down. However, on node 1 we see a lot of queries waiting for latch:
ges resource hash list (no specific query, but all of them).
On node 2 the complete CRS stack is down, so I am not sure why this event is
popping up on node 1.
In parallel, CPU on node 1 also remains high, above 80% most of the time.
Any light on this event will be helpful.
Regards,
Krishna