RE: Performance Tool Question (CONFIO DBFlash ) ...

  • From: "Bruce McCartney" <bruce.mccartney@xxxxxxxxxxxxxxxxx>
  • To: <sbootsma@xxxxxxxxxxxxx>, <oracle-l@xxxxxxxxxxxxx>
  • Date: Wed, 16 Nov 2005 20:37:52 -0700

Hi Sam,
It depends.  With statistical sampling you will introduce error.  The
magnitude of the error depends on your workload and on how long you sample
for.  I have used a similar product developed by Precise Software (now
Veritas), which attached directly to the SGA to read the X$ tables (which
are just memory structures anyway) on a statistical sampling cycle of
anywhere from 1/s to 999/s.  The key is that it has to run over lots of
samples to reduce the probability of significant statistical error.  You
know this from polls that report opinions to within +/- some percentage:
that margin is achieved by managing the sample size.  With Precise, the
overhead was very low, which allowed us to collect and save a week's worth
of detail data.  I found in the field that 3/s sampling was good enough for
us to find problematic statements in practice.  Cary Millsap covers the
problems associated with scoping/sampling in his book Optimizing Oracle
Performance, and also explains a superior method for actually resolving
performance problems: profiling where the time went via extended tracing.
That method also suffers from measurement resolution and quantization error
(pp. 155-170).  Cary argues that the error is not significant; I agree
completely for the extended tracing method, and I have not seen it be a huge
issue with sampling either.
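
To put rough numbers on the poll analogy, here is a minimal Python sketch of
how the margin of error shrinks as the number of samples grows.  It assumes
each sample is an independent yes/no observation of "is the session waiting
on this event right now", which real back-to-back samples only approximate.
The function names and the 20% figure are made up for illustration; this is
not how Precise or DBFlash compute anything.

```python
import math
import random


def margin_of_error(p, samples_per_sec, seconds, z=1.96):
    """Approximate 95% margin of error for an estimated time fraction p.

    Assumes every sample is an independent yes/no observation, which real
    consecutive samples of a database session only approximate.
    """
    n = samples_per_sec * seconds
    return z * math.sqrt(p * (1.0 - p) / n)


def simulate_estimate(true_fraction, n_samples, seed=0):
    """Estimate the waiting fraction from n_samples simulated observations."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_samples) if rng.random() < true_fraction)
    return hits / n_samples


if __name__ == "__main__":
    week = 7 * 24 * 3600
    # Hypothetical event that accounts for 20% of a session's time.
    for rate in (1, 3, 10):
        print(rate, "samples/s for an hour:", margin_of_error(0.2, rate, 3600))
    print("3 samples/s for a week:", margin_of_error(0.2, 3, week))
    print("simulated estimate:", simulate_estimate(0.2, 3 * 3600))
```

The margin falls with the square root of the sample count, so four times as
many samples only halves it.  That is why the length of the sampling window
matters at least as much as the sampling rate.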

 
One thing to be aware of is the effect of 'select'ing every second and the
way it influences the performance of the system you are measuring (usually
called the observer effect; see also the anthropic principle primer at
http://www.anthropic-principle.com/primer.html).  The thing I liked about
the direct memory attach approach is that you minimize your intrusion on the
'system performance'.  I was able to fence off the CPU and memory used by
the collector, because sampling memory through the direct attach needs no
Oracle connection, no SGA use, and no buffer cache use.  You may want to try
to quantify the impact of a SQL-based sampling approach.
 
 
Hope that helps...
----------------------------------------------------------------------------

Bruce McCartney |DBIS |*403 615 3350 | bruce.mccartney@xxxxxxxxxxxxxxxxx

 


  _____  

From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx]
On Behalf Of Sam Bootsma
Sent: November 16, 2005 2:20 PM
To: oracle-l@xxxxxxxxxxxxx
Subject: Performance Tool Question (CONFIO DBFlash ) ...



I have been reviewing the white papers for the DBFlash product from CONFIO.
I am impressed, but I do have one reservation.  

 

DBFlash works by running a SQL statement (or group of statements) against X$
tables on the monitored database once every second.  The data is pulled
across the network to a repository on a separate database server and
database instance and analyzed.  A gui client can then access the repository
and tell you which SQL statements are waiting the most, and what wait events
the SQL statements are waiting on.  It can also do this for database users,
OS users, programs, and a few more.

 

My concern has to do with the frequency of polling (once every second).
Oracle records waits in microseconds, and there are 1 million microseconds
in a second.  So a wait can last 10,000 microseconds and not be picked up by
the software.  In fact, I would think that most waits would not be picked up
by the software, because most waits probably start after one snapshot and
finish before the start of the next snapshot.  
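
The arithmetic behind this concern can be sketched in a few lines of Python.
The model is an assumption, not a description of DBFlash: samples are taken
at a fixed period, and each wait starts at a random offset relative to the
sample ticks.  The function names and the workload figures are hypothetical.

```python
def catch_probability(wait_us, period_us=1_000_000):
    """Chance that one wait of wait_us microseconds overlaps a sample tick.

    Assumes the wait starts at a uniformly random offset within the period.
    """
    return min(1.0, wait_us / period_us)


def expected_hits(n_waits, wait_us, period_us=1_000_000):
    """Expected number of samples that land inside n_waits such waits."""
    return n_waits * catch_probability(wait_us, period_us)


if __name__ == "__main__":
    # A single 10,000 microsecond wait is missed about 99% of the time.
    print("one 10 ms wait:", catch_probability(10_000))
    # 50,000 such waits add up to 500 seconds of waiting, so roughly
    # 500 samples are expected to land on them.
    print("50,000 waits of 10 ms:", expected_hits(50_000, 10_000))
    # A rare event with little total time barely registers.
    print("20 waits of 10 ms:", expected_hits(20, 10_000))
```

Under this model any individual short wait is usually missed, but the
expected number of samples that land on an event equals its total wait time
divided by the sampling period.  The sample counts therefore rank events by
time spent, while events with little total wait time may not show up at all.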

 

I posed this question to CONFIO, and this is the response from their DBA:

 

1) With wait event tuning, the events occurring more frequently will be
caught by DBFlash.  We do statistical sampling, which by definition will
miss some things.  However, the problems, i.e. the wait events happening
more frequently or waiting more time, will be caught by DBFlash.  In other
words, DBFlash will be able to find your problematic waits, which is what
you want.

 

What do you guys think?  Is the integrity of the performance data
questionable because of the "long" delays between polling?  Or is the
response from CONFIO valid?

 

Thanks!

 

 

Sam Bootsma

George Brown College

sbootsma@xxxxxxxxxxxxx

416-415-5000 x4933

 
