Huge tempspace requirement - doesn't match explain plan

  • From: cam <kadmon@xxxxxxxxx>
  • To: oracle-l@xxxxxxxxxxxxx
  • Date: Thu, 21 Aug 2008 23:51:27 +0200

Hello all,

I'm having terrible trouble with a complex query that is consuming huge
amounts of temp space. Sometimes.

This query is large with several nested queries and uses parallelism. One
upshot of this is that the explain plans, already complex, are very
challenging to read with the addition of the PX send and receive events -
they're over a hundred lines long with deep nesting.  In text form from
awrsqrpt they are truncated halfway thru the tmpspc column which doesn't
help since temp space is a specific concern.

The instance has 320 GB (!) of temp space assigned and this query is using
it all, but the tmpspc report in the html form of the explain plan generated
by awrsqrpt seems to show a requirement of less than 10 GB. I'm monitoring
the tmp space usage with a little script which polls v$tempspace &
v$sort_usage and I can see each of 64 parallel servers using 8GB each. the
query runs for about 2 hours before failing with temp space exhaustion.
Reducing parallelism doesn't seem to have any effect on the eventual temp
requirement.

On a test instance on another but fairly similar box, the temp space usage
is peaking at around 18 GB - this is with very similar driving table
volumes, identical parallelism settings etc. Run completes in under 2 hours.
Explain plans are different but it is difficult to track down how and why,
given their sheer size and complexity. Also, different 'successful' runs
with this smaller temp requirement on the test box are themselves using
quite different explain plans.

This is 10.2.0.4, full table stats (i.e. estimate=>null, cascade=>true) have
been gathered for all tables involved in the joins and BVP has been disabled
with _OPTIM_PEEK_USER_BINDS=false to avoid histograms confusing the picture
for the moment.

So, I suppose my questions are as follows for anyone willing to read this
far... I know I haven't posted much to go on - I'm pretty sure I'm unable to
post actual details for reasons of IPR etc.

- is there a workaround for awrsqrpt truncating horizontal output? I'm
guessing it would just be to edit the script but haven't dared to yet... The
HTML version is an acceptable workaround but I thought it might be nice to
be able to use diff on text versions of plans. I'd also really like to be
able to see the parent_id of a rowset - at depths like this, the indentation
is close to useless for figuring out siblings and parents.

- Why would the explain plans estimate for temspace requirement be so wildly
at odds with what is actually being consumed?

- *Very broadly speaking*, I'm assuming that this massive use of temp space
is caused by too many hash joins of full table scans - is trying to force
more nested loops a reasonable strategy to reduce temp requirement? I'd be
inclined to use optimizer_index_cost_adj and _cache parameters for this
since I want to avoid hints if possible. Is this a good way to encourage
NLs?

- Are there any tools/techniques for organising/viewing/comparing large
explain plans?

Cheers for any insights,
cam

Other related posts: