Solaris T5220 server problem

  • From: Wolfson Larry - lwolfs <lawrence.wolfson@xxxxxxxxxx>
  • To: "oracle-l@xxxxxxxxxxxxx" <oracle-l@xxxxxxxxxxxxx>
  • Date: Thu, 28 Apr 2011 00:15:15 +0000

Hello!
Finally convinced the client that the long-running code wasn't a database,
application, or network problem.

Noticed that one of my queries, which usually runs in a tenth of a second
elapsed time, was taking about 8 seconds on the production server.
The server has 8G and 32 CPUs, with both 10.2.0.4 prod & test (separate
ORACLE_HOMEs) on the same box.

Wanted the Unix admin to run some type of DTrace. I had already run truss a
number of times.
Didn't get that, but the SA found echo was running about 30-60 times longer on
this server than on dozens of others we manage (most not T5220s).
They ran GUDS, which didn't help, and then the support person came up with this
from a buddy he reached out to.
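
For anyone who wants to repeat the echo comparison on their own boxes, here is
a minimal sketch. It assumes Python 3 is installed on the hosts (not something
we checked here), and the function names time_command and summarize are just
illustrative. It times how long it takes to spawn a trivial command many times,
so the same script can be run on a suspect server and on a healthy one.

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd, runs=50):
    """Return wall-clock durations (seconds) for spawning cmd `runs` times."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # Discard output so terminal I/O does not skew the timing.
        subprocess.run(cmd, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=True)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Reduce a list of durations to a few comparable numbers."""
    return {
        "runs": len(durations),
        "median_s": statistics.median(durations),
        "max_s": max(durations),
    }


if __name__ == "__main__":
    # Default to echo, as in the SA's test; any command can be passed instead.
    cmd = sys.argv[1:] or ["echo", "hello"]
    stats = summarize(time_command(cmd))
    print(f"{' '.join(cmd)}: median {stats['median_s']:.4f}s, "
          f"max {stats['max_s']:.4f}s over {stats['runs']} runs")
```

The median is used rather than the mean so one slow outlier doesn't dominate
the comparison between servers.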


He suggested turning page coalescing off, which we have found to be beneficial
in many performance escalations.  This is something you can do on the fly and,
if it's found to have a desirable effect, it can be permanently set in
/etc/system. There are no known downsides to doing this in the real world.



Once this is enabled, could your DBAs run some test jobs which can be
compared against timings for the same jobs when the test DB is down?
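
One way to make that comparison concrete is sketched below, again assuming
Python 3. The function name compare_timings and the numbers in the example are
hypothetical placeholders, not measurements from this server. It takes two sets
of elapsed times for the same job and reports the medians and their ratio.

```python
import statistics


def compare_timings(before, after):
    """Compare two lists of elapsed times (seconds) for the same job."""
    median_before = statistics.median(before)
    median_after = statistics.median(after)
    return {
        "median_before_s": median_before,
        "median_after_s": median_after,
        # Ratio > 1 means the job got faster in the "after" condition.
        "speedup": median_before / median_after,
    }


if __name__ == "__main__":
    # Hypothetical elapsed times, for illustration only.
    before_change = [8.0, 8.0, 8.0]
    after_change = [2.0, 2.0, 2.0]
    result = compare_timings(before_change, after_change)
    print(f"median before: {result['median_before_s']:.2f}s, "
          f"after: {result['median_after_s']:.2f}s, "
          f"speedup: {result['speedup']:.1f}x")
```

Running each job several times per condition matters more than the arithmetic,
since a single run can easily be skewed by whatever else is on the box.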



Here are the dirty details from previous communications on the topic:

quote --->

Large pages are not a problem. It is finding or coalescing them when none is
available that needs improvement. The LPOOB feature is designed to improve
application out-of-box performance. A number of LPOOB fixes have already been
integrated in Sol10 U4, and more are planned for U5 and U6.



It is wiser to disable coalescing than to disable LPOOB. If you don't want page
coalescing, then set the following tunables dynamically or in the /etc/system
file.

And
What I didn't mention before is that the page coalescing issue is specifically
mentioned for the Niagara family of CPUs, which is what this T5220 is running,
on systems running Java applications and Oracle databases (the Oracle part
being pertinent here).  Still not saying that it's definitively going to
resolve the problems, but it's worth trying based on the system type, Oracle,
and symptoms.

This is a dynamic change.  The support person says we can easily toggle this
back with no service interruption.
The client is not buying that, and I was just wondering what experience anyone
else has had with T5220s?

Support said they did this mostly for SAP, and while we run a number of SAP
systems, none are on this server, which I would categorize as relatively
lightly loaded.
Prod is far busier during the nightly batch window.  Scheduled stats run well
prior to that, for 3-13 minutes.

Server and database have been up close to 2 years, and they just noticed these
processes running longer about 6 weeks ago.
They put a new release in TEST but claim the problem started just prior to
that.  Not refuting that.

Thanks for any ideas, suggestions, experiences.

  Larry