Re: index with very high LIO

  • From: Jeremy Schneider <jeremy.schneider@xxxxxxxxxxxxxx>
  • To: Greg Rahn <greg@xxxxxxxxxxxxxxxxxx>
  • Date: Mon, 21 Nov 2011 13:46:57 -0600

Ok, here are a few useful figures.  I think I'm finally doing my math
right.  For this customer:
  35 MB/s throughput (parallel direct path)
  6 ms response times
  400GB tables
time to full-scan the table:
  400GB x 1024 = 409600 MB
  409600 MB / 35 = 11703 sec
  11703 / 60 / 60 = 3.25 hrs
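For anyone who wants to sanity-check the arithmetic, here's the full-scan calculation as a quick sketch (same numbers as above):

```python
# Time to full-scan a 400 GB table at 35 MB/s (parallel direct path)
table_gb = 400
throughput_mb_s = 35

table_mb = table_gb * 1024                 # 409600 MB
scan_sec = table_mb / throughput_mb_s      # ~11703 sec
scan_hrs = scan_sec / 60 / 60              # ~3.25 hrs
print(round(scan_sec), round(scan_hrs, 2))  # 11703 3.25
```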

how many rows in 3.25 hrs with range scan (about 1.2 blocks per row)?
  6 ms / 1000 = 0.006 sec/block
  1.2 blocks/row * 0.006 = 0.0072 sec/row
  11703 sec / 0.0072 = about 1.6 million rows in 3.25 hrs

how many rows in 3.25 hrs with index join (about 4 LIOs per row)?
  4 blocks/row * 0.006 = 0.024 sec/row
  11703 sec / 0.024 = about 500,000 rows in 3.25 hrs
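And the break-even row counts for the two indexed paths, using the same 3.25-hour budget and 6 ms per block read:

```python
# Rows retrievable in the full-scan time budget via indexed access
scan_sec = 11703       # full-scan time from above
ms_per_block = 6       # single-block read response time

def rows_in_budget(blocks_per_row):
    sec_per_row = blocks_per_row * ms_per_block / 1000
    return scan_sec / sec_per_row

print(round(rows_in_budget(1.2)))  # range scan (~1.2 blocks/row): ~1.6 million
print(round(rows_in_budget(4.0)))  # index join (~4 LIOs/row):     ~500,000
```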

I guess there's also a case where you can hash join using the index, but
I'm skipping that one for now...

However there's another factor skewing things - it's only the rows *from
disk* that count - and this customer has a HUGE buffer cache; we're
seeing over 90% BCHR on some of these million-row index-based executions.
The direct path FTS will never benefit from the buffer cache.  So
accounting for this cache effect, the breakpoints become 5 million rows
for the second table and 16 million for the first.  This lines up with
testing we've done.  It seems that the index path is still most efficient
for the first table.  And quite conveniently, as soon as the CBO got the
cardinality estimates right, it picked the index on the first table and
the FTS on the second.  :)
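To show where those breakpoints come from - this is my assumption of the model, treating cached LIOs as essentially free so that only the ~10% of block visits missing the cache pay the 6 ms disk read:

```python
# Cache-adjusted break-even: with ~90% BCHR, only ~10% of block
# visits hit disk (assumes cached reads cost ~nothing vs 6 ms I/O)
scan_sec = 11703       # full-scan budget from above
ms_per_block = 6
miss_ratio = 0.10      # 1 - BCHR

def breakeven_rows(blocks_per_row):
    sec_per_row = blocks_per_row * miss_ratio * ms_per_block / 1000
    return scan_sec / sec_per_row

print(round(breakeven_rows(4.0) / 1e6, 1))  # index join:  ~4.9M  ("5 million")
print(round(breakeven_rows(1.2) / 1e6, 1))  # range scan: ~16.3M ("16 million")
```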

Regarding their lack of hardware, it's worth pointing out that this is the
first and only query I'm aware of to get millions of rows on this system.
Also, I believe that the data volume has been growing over the past few
months - so it's only recently that this solo query moved close to the 10
million row range.  It's a *very* busy transactional system - so total
throughput isn't a major point.  And 6ms is decent response time.

My initial 30MB/s *was* a rough estimate from elapsed time and data
sizes... I was only looking at trace files to verify the measurement...
also the first test ran in the middle of the night and I wasn't at the
console.  :)  I reran some small tests today.  I had trouble with iptraf;
it seemed to give me numbers that were clearly wrong... but I did get
byte-counts from ifconfig and verified about 30-35 MB/s max throughput on
the adapter during a parallel run.
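The ifconfig approach boils down to sampling the interface byte counter twice and dividing by the interval. A tiny sketch of that arithmetic (the sample values below are made up for illustration):

```python
# Throughput from two interface byte-counter samples, e.g. the RX bytes
# line from ifconfig or /sys/class/net/<iface>/statistics/rx_bytes.
# Timestamps and counter values here are hypothetical.
t0, bytes0 = 0.0, 1_000_000_000
t1, bytes1 = 10.0, 1_350_000_000   # sampled 10 seconds later

mb_per_sec = (bytes1 - bytes0) / (t1 - t0) / (1024 * 1024)
print(round(mb_per_sec, 1))  # 33.4 MB/s
```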

(hrishy - right there were three ways to get the throughput estimates!)


On Fri, Nov 18, 2011 at 6:12 PM, Greg Rahn <greg@xxxxxxxxxxxxxxxxxx> wrote:

> The big problem I see here is simply that the lack of hardware
> infrastructure is requiring what I refer to as "unnatural acts of
> engineering" in order to do this extract.  While this system may not
> require 10GB/s of scan bandwidth, less than 100MB/s is really
> insufficient for any production system these days -- especially one
> that requires multi-hundred-GB scans.  To put it in perspective, one
> can get more than 100MB/s from the SSD drive in a laptop.
> I'd think that partitioning would be a big help here to reduce the
> data required off disk (thinking txns & time series data), but you'd
> have to rebuild two 400GB tables and their indexes over that slow
> storage connection to get there.
> BTW, I'm not sure why you need trace files from PX servers to
> calculate IO bandwidth - just monitor the network tput or just get a
> rough number from elapsed time and the amount of data read from disk.

+1 312-725-9249

Jeremy Schneider

