So a few more comments:

- Even if you had a perfect clustering factor, I still think a
PX + FTS + HJ plan is a much better solution for this type/size of
problem. Would you really want to fetch 7 million rows one rowid at a
time?
- 20MB/s is about 1/10 of what a single process (serial execution) can
do on a modern processor -- for CPU-light operations like count(*)
with FTS you should see ~200MB/s. It's worth investigating why yours
is so low.
- On a modern system (say, any 5400/5600-series Xeon box) this query
should not run for 10 hours; I'd wager it should take less than 10
minutes using PX.
- Is BT1.LAST_MODIFIED_DATETIME the range partition key on this table?
It probably should be if it's a frequent filter. Daily partitions
might work quite nicely here.
- Your query filters on November dates of LAST_MODIFIED_DATETIME, yet
your stats are from 06-aug-2011. It probably would not hurt to refresh
them -- they're over three months old.

Rough sketches for a few of these are at the end of this message,
after the quoted thread.

On Thu, Nov 17, 2011 at 12:06 PM, Jeremy Schneider
<jeremy.schneider@xxxxxxxxxxxxxx> wrote:
> Yeah, you're not the first one to point that out... I'm doing some
> tests now. FWIW it's 7 million out of over a billion rows that are
> all stuffed into a 400GB heap table with the worst possible
> clustering factor on the foreign key we're using to join. Also I
> haven't been able to get more than 20MB/s total throughput with
> parallel full scans, although single-block reads go at a somewhat
> decent rate of 5ms. But all the same, I'm checking out the hash
> join... likely it will turn out to be more efficient.
>
> I should have checked this out in more depth a while ago... I got a
> little carried away with maths and mistakenly thought that it was
> totally unrealistic. This is the problem with ruling things out
> based on "calculations" -- there's little to verify your
> correctness. It's much easier to find your mistakes when you rule
> things out with *testing* instead... I should know this by now; for
> some reason I'm still making the occasional poor judgement in the
> heat of the moment. :)
>
> -Jeremy
>
>
> On 11/17/2011 01:24 PM, Greg Rahn wrote:
>> Am I reading this right? 7 million rows joined using a NESTED LOOPS
>> join? Wouldn't a parallel hash join be better/faster?
>>
>> On Thu, Nov 17, 2011 at 3:34 AM, Jeremy Schneider
>> <jeremy.schneider@xxxxxxxxxxxxxx> wrote:
>>> PS... 8k block size; the segment of the index in question is 100G.
>>> The query is pulling 7 million rows from a join of two billion-row
>>> tables...

--
Regards,
Greg Rahn
http://structureddata.org
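
On the first point, this is roughly the shape of plan I have in mind.
It's only a sketch: BIG_TABLE_1/BIG_TABLE_2, the ID/BT1_ID join
columns, and the DoP of 16 are made-up stand-ins for your real
objects, and the date range is just from your description of the
filter.

    select /*+ full(bt1) full(bt2) use_hash(bt1 bt2)
               parallel(bt1 16) parallel(bt2 16) */
           bt1.id, bt2.bt1_id   -- project whatever columns you need
    from   big_table_1 bt1,
           big_table_2 bt2
    where  bt2.bt1_id = bt1.id  -- the FK join Jeremy mentioned
    and    bt1.last_modified_datetime >= date '2011-11-01'
    and    bt1.last_modified_datetime <  date '2011-12-01';

With both tables full-scanned in parallel and hash-joined, the
rowid-at-a-time index work disappears entirely.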
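
On the throughput point, one crude way to measure your scan rate:
time a parallel count(*) and divide the segment size by the elapsed
time (the APP schema below is hypothetical; swap in your own).

    set timing on
    select /*+ full(t) parallel(t 8) */ count(*)
    from   app.big_table_1 t;

    -- segment size in MB; MB / elapsed seconds = aggregate scan rate
    select sum(bytes)/1024/1024 as mb
    from   dba_segments
    where  owner = 'APP'
    and    segment_name = 'BIG_TABLE_1';

One serial process should manage on the order of 200MB/s for a
CPU-light scan, so eight PX slaves stuck at 20MB/s aggregate points at
the I/O path (or something throttling it), not at the CPUs.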
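
On daily partitions: if you're on 11g, interval partitioning would
give you automatic daily partitions with something like the sketch
below (the column list is obviously invented).

    create table big_table_1 (
      id                      number        not null,
      payload                 varchar2(100),
      last_modified_datetime  date          not null
    )
    partition by range (last_modified_datetime)
    interval (numtodsinterval(1, 'DAY'))
    ( partition p0 values less than (date '2011-01-01') );

A November filter on LAST_MODIFIED_DATETIME would then prune to ~30
daily partitions instead of scanning the whole 400GB heap.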
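
And for the stats refresh, a minimal gather along these lines (schema
name and degree again hypothetical):

    begin
      dbms_stats.gather_table_stats(
        ownname          => 'APP',         -- substitute your schema
        tabname          => 'BIG_TABLE_1',
        estimate_percent => dbms_stats.auto_sample_size,
        degree           => 8,
        cascade          => true);         -- include the indexes
    end;
    /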