Re: index with very high LIO

  • From: Jeremy Schneider <jeremy.schneider@xxxxxxxxxxxxxx>
  • To: Greg Rahn <greg@xxxxxxxxxxxxxxxxxx>
  • Date: Fri, 18 Nov 2011 12:46:14 -0600

The reason for 20MB/s is the storage network.  They're using DNFS with a
single (bonded-backup-mode) 1G network card.  (FYI, tcp window sizes are
16M.)  Last night I ran a parallel(64) query and I think that I was
still only getting something around 30 MB/s throughput w/direct path...
but it's tough to piece together 64 tracefiles to get an accurate total
throughput measure.  (By another measure I saw 60 MB/s but I think that
might have included some cache effect...)  At any rate the best
theoretical possible on a one gig card is around 100MB/s.  I don't think
it's possible to full scan two 400 GB tables over a 1GB network link in
10 minutes, is it?  Also, this is not a warehouse system... it's
*heavily* operational, and the operational workload is running
satisfactorily with 1G cards.  In fact this query is trying to extract a
day's worth of data to get it INTO a warehouse.

I do think the slow link skews the usual points where it becomes
beneficial to switch from index to hash methods.  I'd wager that on this
system, a couple million rows were probably faster with the index. 
(Whereas generally a hash might win for this many rows.)  Recently this
query has inched toward 10 million, and perhaps now a hash join is going
to be better -- but it's still going to take many hours.

But I thinking that I've finally started to inch toward the real
problem: rollbacks from other activity on the system -- in fact this
could also explain why recently, the problem seemed to accellerate
exponentially.  This is why I originally asked about the 13 gets/row on
the index... and I think Jonathan's pointer got me moving in the right
direction.  In fact, we would still have the same problem with a hash
join - because it's still going to take several hours on this busy
system with only a 1GB storage network.

I'm thinking that the real answer is to use shorter-running queries...
something as simple as getting 2-4 hours of data at a time instead of 24
hours at once could in fact completely resolve this problem.  Which
interestingly, might switch us back to nested loop joins again since
we're going to be getting a lot less data at once.

Still in progress... so additional thoughts (from anyone on the list)
are welcomed.


PS... we have also been discussing stats and partitioning internally...
I just view the above issues as primary.

+1 312-725-9249

Jeremy Schneider

On 11/17/2011 11:32 PM, Greg Rahn wrote:
> So a few more comments:
> - Even if you had perfect clustering factor, I still think a PX + FTS
> + HJ is a much better solution for this type/size of problem.  Would
> you really want to get 7 million rows one rowid at a time?
> - 20MB/s is about 1/10 of what 1 single process (serial execution) can
> do on a modern processor -- for CPU light operations like count(*)
> with FTS, you should be ~200MB/s.  Need to investigate why that is so
> low.
> - On a modern system (say any 5400-5600 Xeon system) this query should
> not run 10 hours, I'd wager it should take less than 10 minutes using
> PX.
> - Is BT1.LAST_MODIFIED_DATETIME the range partition key on this table
> (it probably should be if it's a frequent filter)?  Daily partitions
> might work quite nicely here.
> - Your query uses November dates of  LAST_MODIFIED_DATETIME, yet your
> stats are from 06-aug-2011.  I probably would not hurt to refresh
> those - they are 2 months old.
> On Thu, Nov 17, 2011 at 12:06 PM, Jeremy Schneider
> <jeremy.schneider@xxxxxxxxxxxxxx> wrote:
>> Yeah, you're not the first one to point that out...  I'm doing some
>> tests now.  FWIW it's 7 million out of over a billion rows that are all
>> stuffed into a 400GB heap table with the worst possible clustering
>> factor on the foreign key we're using to join.  Also I haven't been able
>> to get more than 20MB/s total throughput with parallel full scans,
>> although single-block reads go at a somewhat decent rate of 5ms.  But
>> all the same, I'm checking out the hash join... likely it will turn out
>> to be more efficient.
>> I should have checked this out more in-depth awhile ago...  I got a
>> little carried away with maths and mistakenly thought that it was
>> totally unrealistic.  This is the problem with ruling things out based
>> on "calculations".  There's little to verify your correctness.  It's
>> much easier to find your mistakes when you rule things out with
>> *testing* instead...  I should know this by now, for some reason I'm
>> still making the occasional poor judgements in the heat of the moment.  :)
>> -Jeremy
>> On 11/17/2011 01:24 PM, Greg Rahn wrote:
>>> Am I reading this right?  7 million rows joined using NESTED LOOPS
>>> join?  Wouldn't a parallel hash join be better/faster?
>>> On Thu, Nov 17, 2011 at 3:34 AM, Jeremy Schneider
>>> <jeremy.schneider@xxxxxxxxxxxxxx> wrote:
>>>> PS... 8k block size; segment of index in question is 100G.  Query is
>>>> pulling 7 million rows from a join of two billion row tables...

Other related posts: