Re: UNION ALL with ROW_NUMBER vs UNION (WAS: "Wow - has anyone used ROW_NUMBER() to get around using a UNION statement when UNION ALL doesn't work??")

From: Greg Rahn <greg@xxxxxxxxxxxxxxxxxx>
To: Jeremy Schneider <jeremy.schneider@xxxxxxxxxxxxxx>, "Taylor, Chris David" <ChrisDavid.Taylor@xxxxxxxxxxxxxxx>
Date: Fri, 12 Aug 2011 14:15:35 -0700

So, let's start from the top...

At a high level the difference here is a case of "less bad or less
unlucky" - the cardinality estimates are horrid here which makes be
believe there are probably non-representative stats.  This raises the
question: What is the stats strategy and dbms_stats command being used
here?

The main problem with #1 is that deep in the plan cardinality is
grossly underestimated and that cascaded in to a series of NLJs where
56,540,199 rows show up and then have a FILTER applied to sort it down
to 89,200 (so much for FILTER early).  Compare that to #3 (and #2)
where there is just less than 3 orders of magnitude less rows flowing
through the NLJ and 89,276 sort down to 89,200 (because of a plan
difference where the FILTER is applied in an un-merged view (lines
22-26) which also shows up in the UNION ALL #2 plan).  This is a
chronic case of CBO optimizing for a small number of rows and getting
orders of magnitude more than that.  When that happens all bets are
off...

One can not make any generalizations about UNION vs UNION ALL vs
WINDOW SORT because the input into those row sources for these plans
are grossly different here (the latter 2 have the un-merged view with
the filter which is why they are close in execution times).  This is
why I asked early on to see the plans and why it is so important to
include that information - both the plan and execution stats/actual
row counts/etc.

Assuming this is not 11g, I'd be interested to see what plan you would
get if you gathered stats with a 100% sample size - offering the CBO
the best possible chance at stats (in 11g, the default of
auto_sample_size should be used).

Main takeaway here is to fix the input (stats) before attempting to
fix the output (the execution plan).  The CBO generally follows the
cliche "garbage (stats) in, garbage (plans) out".

Hope that helps...

On Fri, Aug 12, 2011 at 1:38 PM, Jeremy Schneider
<jeremy.schneider@xxxxxxxxxxxxxx> wrote:
> I don't know exactly how oracle would cost a UNION versus how oracle would
> cost a UNION-ALL plus a WINDOW.  But it probably depends a bit on some
> environmental factors including row-source cardinality, work area size,
> estimated cardinality of sort columns, etc...  But it seems immaterial which
> comes out cheaper from the CBO - you just have to decide which way to code
> it then help the optimizer get the best plan with your code.  Personally I'd
> go with the union and fix the optimizing of that plan, which is the simpler
> and more readable to do the same thing.  Then, a little digging to see where
> Oracle went wrong in it's guessing.  Calling
> display_plan(null,null,'ALLSTATS LAST') right after an execution can help
> with this since it'll give you a side-by-side comparison of estimated to
> actual rowcounts.  Ideally the optimizer should be getting the same plan for
> both ways of writing the query.
>
> On Fri, Aug 12, 2011 at 3:21 PM, Taylor, Chris David
> <ChrisDavid.Taylor@xxxxxxxxxxxxxxx> wrote:
>>
>> So it is interesting to me then that Oracle would pick a better plan for
>> UNION-ALL vs UNION in that case with the same information available to the
>> optimizer at the time each query is executed.
>> In that regard then, UNION would seem to be a more expensive operation
>> REGARDLESS of the additional SORT operation that must occur.
>>
>> (Theory) Following that, IF UNION-ALL is always less expensive than UNION
>> (before the additional SORT operation) then the WINDOW function applied
>> (depending on overhead) will OFTEN come out cheaper than a UNION.
>>
>> From: Jeremy Schneider [mailto:jeremy.schneider@xxxxxxxxxxxxxx]
>> Sent: Friday, August 12, 2011 2:49 PM
>> To: Taylor, Chris David
>> Cc: Greg Rahn; Wolfgang Breitling (breitliw@xxxxxxxxxxxxx); mwf@xxxxxxxx;
>> oracle-l@xxxxxxxxxxxxx
>> Subject: Re: UNION ALL with ROW_NUMBER vs UNION (WAS: "Wow - has anyone
>> used ROW_NUMBER() to get around using a UNION statement when UNION ALL
>> doesn't work??")
>>
>> Aaaah, it's clear now.  This has nothing to do with an optimization
>> between UNION vs ROW_NUMBER -- it's that your plan completely changed when
>> you switched the syntax -- most importantly the table join order.  If you
>> use the same join order with the UNION then you should see similar
>> performance.
>>
>> It seems that there are two table join orders with nearly identical cost.
>> Something in the costing of the UNION statement causes the optimizer to pick
>> the unlucky order, whereas with the UNION ALL it picks the more lucky one.

-- 
Regards,
Greg Rahn
http://structureddata.org
--
//www.freelists.org/webpage/oracle-l

References:
- UNION ALL with ROW_NUMBER vs UNION (WAS: "Wow - has anyone used ROW_NUMBER() to get around using a UNION statement when UNION ALL doesn't work??")
  - From: Taylor, Chris David
- Re: UNION ALL with ROW_NUMBER vs UNION (WAS: "Wow - has anyone used ROW_NUMBER() to get around using a UNION statement when UNION ALL doesn't work??")
  - From: Jeremy Schneider
- RE: UNION ALL with ROW_NUMBER vs UNION (WAS: "Wow - has anyone used ROW_NUMBER() to get around using a UNION statement when UNION ALL doesn't work??")
  - From: Taylor, Chris David
- Re: UNION ALL with ROW_NUMBER vs UNION (WAS: "Wow - has anyone used ROW_NUMBER() to get around using a UNION statement when UNION ALL doesn't work??")
  - From: Jeremy Schneider

Re: UNION ALL with ROW_NUMBER vs UNION (WAS: "Wow - has anyone used ROW_NUMBER() to get around using a UNION statement when UNION ALL doesn't work??")

Other related posts: