Re: index with very high LIO

  • From: Greg Rahn <greg@xxxxxxxxxxxxxxxxxx>
  • To: Jeremy Schneider <jeremy.schneider@xxxxxxxxxxxxxx>
  • Date: Thu, 17 Nov 2011 21:32:45 -0800

So a few more comments:
- Even if you had a perfect clustering factor, I still think a PX + FTS
+ HJ is a much better solution for this type/size of problem.  Would
you really want to fetch 7 million rows one rowid at a time?
- 20MB/s is about 1/10 of what a single process (serial execution) can
do on a modern processor -- for CPU-light operations like count(*)
with FTS, you should see ~200MB/s.  Worth investigating why yours is
so far off that mark.
- On a modern system (say any 5400-5600 Xeon system) this query should
not run 10 hours; I'd wager it should take less than 10 minutes using
PX + FTS + HJ.
- Is BT1.LAST_MODIFIED_DATETIME the range partition key on this table
(it probably should be if it's a frequent filter)?  Daily partitions
might work quite nicely here.
- Your query filters on November dates of LAST_MODIFIED_DATETIME, yet
your stats are from 06-aug-2011.  It probably would not hurt to
refresh those - they are over 3 months old.
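To put numbers on the rowid-at-a-time point above, here is a back-of-envelope sketch.  The 5ms single-block read time, the 7 million rows, the 400GB segment, and the ~200MB/s per-process scan rate come from this thread; the degree of parallelism is an assumption for illustration only.

```python
# Rough cost comparison: nested-loops rowid fetches vs. parallel full scan.
# Figures marked "assumed" are not from the thread.

ROWS = 7_000_000                 # rows to fetch (from the thread)
SINGLE_BLOCK_READ_S = 0.005      # ~5 ms per single-block read (from the thread)

TABLE_GB = 400                   # heap table size (from the thread)
SCAN_MB_S_PER_PROCESS = 200      # what one process should sustain (from the thread)
PARALLEL_DEGREE = 8              # assumed DOP, purely illustrative

# Worst case for nested loops: one physical single-block read per row.
nl_hours = ROWS * SINGLE_BLOCK_READ_S / 3600

# Parallel full scan: total MB divided by aggregate scan rate.
fts_minutes = TABLE_GB * 1024 / (SCAN_MB_S_PER_PROCESS * PARALLEL_DEGREE) / 60

print(f"nested loops, worst case: {nl_hours:.1f} hours")
print(f"parallel FTS at DOP {PARALLEL_DEGREE}: {fts_minutes:.1f} minutes")
```

The worst-case nested-loops figure lands right around the 10 hours reported, and the parallel scan estimate lands under 10 minutes, which is consistent with the claims in both messages.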

On Thu, Nov 17, 2011 at 12:06 PM, Jeremy Schneider
<jeremy.schneider@xxxxxxxxxxxxxx> wrote:
> Yeah, you're not the first one to point that out...  I'm doing some
> tests now.  FWIW it's 7 million out of over a billion rows that are all
> stuffed into a 400GB heap table with the worst possible clustering
> factor on the foreign key we're using to join.  Also I haven't been able
> to get more than 20MB/s total throughput with parallel full scans,
> although single-block reads go at a somewhat decent rate of 5ms.  But
> all the same, I'm checking out the hash join... likely it will turn out
> to be more efficient.
> I should have checked this out more in depth a while ago...  I got a
> little carried away with maths and mistakenly thought that it was
> totally unrealistic.  This is the problem with ruling things out based
> on "calculations" - there's little to verify your correctness.  It's
> much easier to find your mistakes when you rule things out with
> *testing* instead...  I should know this by now; for some reason I'm
> still making the occasional poor judgement in the heat of the moment.  :)
> -Jeremy
> On 11/17/2011 01:24 PM, Greg Rahn wrote:
>> Am I reading this right?  7 million rows joined using NESTED LOOPS
>> join?  Wouldn't a parallel hash join be better/faster?
>> On Thu, Nov 17, 2011 at 3:34 AM, Jeremy Schneider
>> <jeremy.schneider@xxxxxxxxxxxxxx> wrote:
>>> PS... 8k block size; segment of index in question is 100G.  Query is
>>> pulling 7 million rows from a join of two billion row tables...

Greg Rahn
