Re: Query runs fine on InstanceA but slow on InstanceB...

  • From: Cary Millsap <cary.millsap@xxxxxxxxxxxx>
  • To: "dmann99@xxxxxxxxx" <dmann99@xxxxxxxxx>
  • Date: Fri, 6 Jan 2012 17:13:56 -0600

+1 on Kellyn's advice to use 10046 data.
If it were my problem, here's specifically what I would do:

1. Isolate a business task?call it T?to use as a control. Sounds like
you've already done that.

2. Trace (dbms_monitor.session_trace_enable(null,null,true,true)) a run of
T on instance A.

3. Trace a run of T on instance B.

4. Run mrskew (a utility in the Method R
Tools<http://method-r.com/software/mrtools>product) on each trace
file. The output will resemble this:

$ mrskew file.trc
CALL-NAME                       DURATION       %   CALLS      MEAN
MIN       MAX
-----------------------------  ---------  ------  ------  --------
 --------  --------
SQL*Net message from client    20.742551   64.8%   2,773  0.007480
 0.000316  0.451740
db file sequential read         8.052732   25.2%     508  0.015852
 0.000238  0.254242
EXEC                            1.717730    5.4%   1,612  0.001066
 0.000000  0.243963
log file sync                   0.453328    1.4%     156  0.002906
 0.000429  0.101154
PARSE                           0.281959    0.9%   1,446  0.000195
 0.000000  0.044992
cursor: pin S wait on X         0.261647    0.8%      24  0.010902
 0.010309  0.011166
SQL*Net more data from client   0.168530    0.5%      67  0.002515
 0.000801  0.106081
FETCH                           0.118982    0.4%   2,236  0.000053
 0.000000  0.008999
row cache lock                  0.085146    0.3%   1,030  0.000083
 0.000028  0.000383
latch: shared pool              0.049684    0.2%      11  0.004517
 0.000745  0.007201
6 others                        0.055326    0.2%   3,015  0.000018
 0.000000  0.018224
-----------------------------  ---------  ------  ------  --------
 --------  --------
TOTAL (16)                     31.987615  100.0%  12,878  0.002484
 0.000000  0.451740

5. Compare the two profiles side-by-side. The comparison will tell you
which type of database call or OS call is consuming more of your time.

6. If there's no one or two call types that really show the difference,
then group by something other than call name. For example, here's what
grouping by SQL hash value will look like:

$ mrskew *.trc --group='$hv'
'$hv'         DURATION       %   CALLS      MEAN       MIN       MAX
-----------  ---------  ------  ------  --------  --------  --------
#0            5.109908   16.0%     688  0.007427  0.000000  0.443144
364923679     3.090265    9.7%     447  0.006913  0.000000  0.412162
3150898423    1.392148    4.4%     140  0.009944  0.000238  0.097124
3129187332    0.990504    3.1%     140  0.007075  0.000000  0.238923
864012087     0.988646    3.1%      49  0.020176  0.000312  0.126767
2888850659    0.949462    3.0%      73  0.013006  0.000044  0.133055
2680677334    0.912911    2.9%     106  0.008612  0.000000  0.249404
3902328072    0.883581    2.8%     423  0.002089  0.000000  0.451740
4013634959    0.833338    2.6%     102  0.008170  0.000000  0.354018
1471759920    0.757588    2.4%     164  0.004619  0.000000  0.218535
117 others   16.079264   50.3%  10,546  0.001525  0.000000  0.310893
-----------  ---------  ------  ------  --------  --------  --------
TOTAL (127)  31.987615  100.0%  12,878  0.002484  0.000000  0.451740

7. Repeat the process of selecting a dimension to group by (e.g., call
name, hash value) and comparing side-by-side until you find the categories
where your time consumption is different. Once you learn that, then think
about how the code path works that's beneath what you've found. For
example, if all the difference is attributable to just one SQL statement,
then have your plan hash values (plh) changed? Are your read call latencies
on one instance much worse than the other?

Rather than trying to check everything under the sun that *could* be your
difference, this method will show you in just a step or two what it *is* that's
causing the difference.


Cary Millsap
Method R Corporation
http://method-r.com


On Fri, Jan 6, 2012 at 11:20 AM, Kellyn Pot'vin <kellyn.potvin@xxxxxxxxx>wrote:

> Hi Dave,
> I think you may have what you need already at your fingertips with trace
> files if you are sure this process is the clearly the culprit and not the
> whole environment.  As the query creates a more involved trace when you
> produce a 10053 trace file, why not start with a 10046 trace file, level
> 12, (bind variables and waits, so if there is any bind peeking issues,
> etc., they will show) which can then be run through TKPROF to give a clear
> output of what wait differences there are?  Method R's tools, such as Mr
> Trace also an excellent tool for deciphering waits that may not be caught
> by TKPROF and some of the tools are quite cost effective, such as the
> plug-in for Oracle Developer.
>
> This might give you the higher level information you need and if it does
> not answer your question, you can then run the 10053 trace and drill down
> to the specific waits listed in the 10046 to help limit what data you
> inspect in the larger trace file.
>
> Hope this helps,
>
>
> Kellyn Pot'Vin
> Sr. Database Administrator and Developer
> DBAKevlar.com
>
>
> ________________________________
>  From: "Taylor, Chris David" <ChrisDavid.Taylor@xxxxxxxxxxxxxxx>
> To: "'dmann99@xxxxxxxxx'" <dmann99@xxxxxxxxx>
> Cc: "'oracle-l@xxxxxxxxxxxxx'" <oracle-l@xxxxxxxxxxxxx>
> Sent: Friday, January 6, 2012 8:41 AM
> Subject: RE: Query runs fine on InstanceA but slow on InstanceB...
>
> David,
>
> I was thinking several different options as I read your email - however,
> the more I read, each option quickly disappeared as you covered the
> strategies I would have used (minus SQLT which I haven't used).
>
> So, I'd be interested in an update on this if you find a solution that
> works well.
>
> One thing you didn't cover, is are these instances on the same server? Is
> it a vm or physical? Is the query running the same way in both instances -
> same app, same workstation (or whatever)?  Have you exported stats from one
> instance to the other to see if that resolves it?
>
> Often it doesn't take a *significant* difference in stats (or parameters)
> between instances to really change performance.
>
> I'd probably start by exporting stats for the affected tables (schema
> stats if possible) from Instance A to Instance B.  If that resolves it,
> then you know it's stats related. (I *think*)
>
> Chris Taylor
> Sr. Oracle DBA
> Ingram Barge Company
> Nashville, TN 37205
>
> "Quality is never an accident; it is always the result of intelligent
> effort."
> -- John Ruskin (English Writer 1819-1900)
>
> CONFIDENTIALITY NOTICE: This e-mail and any attachments are confidential
> and may also be privileged. If you are not the named recipient, please
> notify the sender immediately and delete the contents of this message
> without disclosing the contents to anyone, using them for any purpose, or
> storing or copying the information on any medium.
>
>
> -----Original Message-----
> From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx]
> On Behalf Of David Mann
> Sent: Friday, January 06, 2012 8:42 AM
> To: oracle-l@xxxxxxxxxxxxx
> Subject: Query runs fine on InstanceA but slow on InstanceB...
>
> I need some advice. I seem to be asked more and more the dreaded "why does
> my query run faster on InstanceA and slower on InstanceB?"
>
> I provide all kinds of query tuning help to our developers, testers, and
> product support. Mostly this covers a surgical strike of 1 SQL statement. I
> investigate the issue, find a couple of ways to improve the query, document
> my improvement and provide my recommendation ...
> and then move on.
>
> This A vs B seems to take me about 4X the time of a regular tuning
> request. I am now investigating the performance of a query on InstanceA, a
> query on InstanceB, and using my ever evolving checklist of why execution
> plans can differ to drill down into each reason (instance params, data
> volume, system stats, object stats, etc) and look for significant
> differences. I can usually find a way to improve the 
> Query@InstanceBperformance but I am not always successful finding a root 
> cause for the
> difference. Plus I usually want to know this as much as my customer so I
> know what to look out for when dealing with future tuning issues with their
> apps.
>
> I have my own checklist to guide my investigation and a bunch of custom
> queries to check data volume, system statistics, object statistics (basic
> table statistics right now, would like to expand to cover histograms, etc).
> I don't think I am covering all the bases since my checklist still gets
> changed/expanded after every tuning engagement.
>
> SQLT probably covers everything I look for (and more) has been a good tool
> to get an idea of the environment surrounding the execution of a query on a
> system. but it is a little cumbersome at times (I can be called on to help
> with any of 600 DB instances - I end up installing it pretty much every
> time I need it somewhere to make sure I have the latest/greatest version).
> I had high hopes but limited success with SQLTCompare. I have only gotten
> it to compare successfully maybe 2 out of the 6 times I used it.
>
>
> I have also had some success with 10053 traces for smaller queries.
> But at least for me these quickly get out of hand so I usually only use
> them as a last resort.
>
> So my question is how do you approach something like this? Is there a
> 'best' method or tool for investigating and drilling down to a root cause
> of the same statement against 2 difference DB instances? Are there any book
> chapters, blog posts, or other resources you have seen covering this type
> of investigation? Is it time for me to get busy writing my own tool? :)
>
> --
> Dave Mann
> www.brainio.us
> www.ba6.us - Database Stuff - http://www.ba6.us/rss.xml
> --
> //www.freelists.org/webpage/oracle-l
>
>
>
>
> --
> //www.freelists.org/webpage/oracle-l
>
> --
> //www.freelists.org/webpage/oracle-l
>
>
>

--
//www.freelists.org/webpage/oracle-l


Other related posts: