RE: index with very high LIO

  • From: "Mark W. Farnham" <mwf@xxxxxxxx>
  • To: <greg@xxxxxxxxxxxxxxxxxx>, "'Jeremy Schneider'" <jeremy.schneider@xxxxxxxxxxxxxx>
  • Date: Tue, 22 Nov 2011 10:15:47 -0500

It is a pity that there seems to be no way to use an index to grab block
segments of a table to fts (and yes, then refilter the rows).

True, in the polar case of a lousy cluster factor an index might have one
lonely row in each block, or worse for the idea of segmented FTS, one row
per 1 megabyte (or 4 megabyte) chunk of rows.

But even with a lousy cluster factor, all the desired rows by the index
might be (now polar good case) in a few megabytes of contiguous blocks of
FTS scan order. I suppose this would be an adventure to build. You'd need to
evaluate a higher order cluster by physical segment chunk analysis for a
specific key or set of keys on the fly to see whether that was likely to be
better than just scanning the whole thing. Setting off scans in parallel one
per contiguous chunk paired with some number of scans picking up a list of
onesie-twosie blocks might FTS some quite small fraction of the whole table,
having read a small part of the index (but not using the index entries to
yank single table blocks).

Just a thought. Might be more effort than value, but I think in the long
haul everything just gets too big for a full FTS, even in parallel. Heh, the
software effort would have been more defensible 20 years ago.



-----Original Message-----
From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx]
On Behalf Of Greg Rahn
Sent: Monday, November 21, 2011 3:17 PM
To: Jeremy Schneider
Cc: oracle-l@xxxxxxxxxxxxx
Subject: Re: index with very high LIO

- could you leverage a parallel NLJ to engage more CPUs on the problem to
reduce the run time? I'm assuming that if you have such a high BCHR it must
be a CPU bound query then, right? How many CPUs do you have? ( I guess that
depends on how fast you need this to run and how much resources are
- do they use RMAN or back this up via filer snaps?
- what did you change to get different CBO cardinality estimates?  Stats?
- any chance of partitioning helping out?

Can you put a number on HUGE [buffer cache]?  My HUGE may be different than
your HUGE.  :D

I do agree with you that if you can only get 35MB/s then this system can't
really leverage PX FTS the way a more modern system could (like a 2 socket
Xeon with 1-2 GB/s).

On Mon, Nov 21, 2011 at 11:46 AM, Jeremy Schneider
<jeremy.schneider@xxxxxxxxxxxxxx> wrote:
> Ok, here are a few useful figures.  I think I'm finally doing my math 
> right.  For this customer:
>   35 MB/s throughput (parallel direct path)
>   6 ms response times
>   400GB tables
> time to full-scan the table:
>   400GB x 1024 = 409600 MB
>   409600 MB / 35 = 11703 sec
>   11703 / 60 / 60 = 3.25 hrs
> how many rows in 3.25 hrs with range scan (about 1.2 blocks per row)?
>   6 ms / 1000 = 0.006 sec/block
>   1.2 blocks/row * 0.006 = 0.0072 sec/row
>   11703 sec / 0.0072 = about 1.6 million rows in 3.25 hrs
> how many rows in 3.25 hrs with index join (about 4 LIOs per row)?
>   4 blocks/row * 0.006 = 0.024 sec/row
>   11703 sec / 0.024 = about 500,000 rows in 3.25 hrs
> I guess there's also a case where you can hash join using the index, 
> but i'm skipping that one for now...
> ==========
> However there's another factor skewing things - it's only the rows 
> *from
> disk* that count - and this customer has a HUGE buffer cache, and were 
> seeing over 90% BCHR on some of these million-row index-based executions.
> The direct path FTS will never benefit from the buffer cache.  So 
> accounting for this cache effect, the breakpoints become 5 million for 
> the second table and 16 million for the first.  This lines up with 
> testing we've done.  It seems that index path is still most efficient 
> for the first table.  And quite conveniently, as soon as the CBO got 
> cardinality estimates right, it picked the index on the first table 
> and the FTS on the second.  :)
> Regarding their lack hardware, it's worth pointing out that this is 
> the first and only query I'm aware of to get millions of rows on this
> Also, I believe that the data volume has been growing over the past 
> few months - so it's only recently that this solo query moved close to 
> the 10 million row range.  It's a *very* busy transactional system - 
> so total throughput isn't a major point.  And 6ms is decent response time.
> My initial 30MB/s *was* a rough estimate from elapsed time and data
> I was only looking at trace files to verify the measurement... also 
> the first test ran in the middle of the night and I wasn't at the 
> console.  :) I reran some small tests today.  Had trouble with iptraf; 
> seemed to give me numbers that were clearly wrong... but I did get 
> byte-counts from ifconfig and verified about 30-35 MB/s max throughput 
> on the adapter during a parallel run.

Greg Rahn


Other related posts: