What Mark Powell and Tim Gorman wrote are sensible policies.
Call this a “shot-in-the-dark.”: Depending on how you moved the data, you
*could* find yourself in a period where all the data is block mastered on one
node. IF .23 seconds is the new average for the chronic load generator, they
could be the average of about .4something on one node and .05or_so on the other
node.
A similar difference could also easily result from cached or not cached with
respect to the data as well as “I’m in pga, needing undo rollback, so I do it
every time.” THAT can be cured at least temporarily by forcing the non-direct
read and cleanout. IF the need for the undo was an artifact of the move, doing
it once may be enough.
A similar difference could also easily result from a small change in the client
to server network access, especially if arraysize is small (or inadvertently
different in the client configuration of old prod versus new prod.)
Okay, several shots in the dark. All trivial to check.
Likewise, you could have similar issues apart from the chronic load increase
with the was 20 minutes, now is 1 hour job. A diagnostic trivial to do is set
the session to force parallel local and then run the query on each node. The
“bad” one may or may not show up as an increase in gcc traffic that is
significant. Or this might not help. It is an easy test to do to rule in or out
a big chunk of the possible solution spaces.
IF this happens to be a lucky shot at your problem, still listen to Powell and
Gorman. Also, JL’s scratchpad had a decent laundry list of “what the heck
changed” that you can look at in parallel with measuring the old versus new
profile to avoid guessing at all and SEE the actual difference.
Good luck,
mwf
From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx] On ;
Behalf Of paresh.gandhi@xxxxxxxxx
Sent: Friday, August 17, 2018 12:48 PM
To: oracle-l@xxxxxxxxxxxxx
Subject: Re: SQL performance issue
Dear all,
Could someone shed some light on this please?
Thanks
Paresh
On Tue, 14 Aug 2018, 15:48 Paresh Gandhi, <paresh.gandhi@xxxxxxxxx> wrote:
Hello,
I am stuck against a performance issue that was reported recently.
The issue I am currently looking at is about a job that takes 1 hour as opposed
to 20 minutes in the past. This is happening since we switched over from
primary to secondary site. Both sites are two-node rac databases. On primary
site this used to take 20 mins. We also ran this in pre-prod environment and it
takes roughly 20 mins.
I had taken a look at the AWR report from both pre-prod and PROD databases.
There is no single query that is taking long. However, there is a query that
runs for 13,000 times and is taking 0.23 seconds per execution as opposed to
0.05 against pre-prod. This explains the additional time taken in PROD (i.e.
new PROD). Please note that the plan hash value for the SQL_ID in both
environments is exactly the same.
Top wait events in PROD suggests high DB CPU and cluster related wait events.
So I suspect there is something down to getting blocks from the resource master
instance for the blocks in question. Please could someone help?
Database version is 11.2.0.4.
All the details are listed in the attached files.
I would greatly appreciate if you could share your views with me.
Many thanks
Paresh