Re: Subquery partition pruning

  • From: amonte <ax.mount@xxxxxxxxx>
  • To: info@xxxxxxxxxxxxxxxxxxxxx
  • Date: Sun, 4 Apr 2010 00:01:52 +0200

Hi Randolf

I tried as you suggested by adding d_time dimension but I didnt specify
T.TIME_EID between 1 and 86400 because it has 86400 rows and I dont see why
I should specify a condition to get all these rows (ok we can talk about
transitivity...)

...

FROM   DW.D_DATE A,

       DW.D_TIME T
       DWH_OWN.T_MOBILE_EVENT B
WHERE  A.DATE_EID  = B.DATE_EID

  AND  T.TIME_EID = B.TIME_EID
  AND  T.TIME_EID between 1 and 86400
  AND  A.DATE_NAME = TO_DATE('20100322', 'yyyymmdd')
...


The result is with out the 1 and 86400 it reads all 751 partitions (ugly),
if I specify between a and 86400 then it runs ok although not the best plan
neither, it cartesian merge joins two dimensions (since d_date only returns
a row) and then NL the fact table, not the best way because dimension joins
returns 86400 rows. The best plan IMHO Is nl join d_date and fact table then
hash joins d_time.


Thank you

2010/4/3 Randolf Geist <info@xxxxxxxxxxxxxxxxxxxxx>

> > Thanks for the tip, I modified the query and added TIME_EID (it contains
> 24
> > hours time dimension) as follows but no luck even it looks like it is
> doing
> > pruning it filters first the second column of partition key so it is
> still
> > reading the whole table! Statistics is refreshed as well. (Global and
> > partition level statistics)
> >
> >
> > SELECT
> >        TO_CHAR(A.DATE_NAME,'YYYY-MM-DD'),
> >        B.MSISDN,
> >        B.IMSI,
> >        SUM(B.EVENT_AMOUNT)
> > FROM   DW.D_DATE A,
> >        DWH_OWN.T_MOBILE_EVENT B
> > WHERE  A.DATE_EID  = B.DATE_EID
> >   AND  A.DATE_NAME = TO_DATE('20100322', 'yyyymmdd')
> >   AND  B.TIME_EID between 1 and 86400
> > GROUP  BY TO_CHAR(A.DATE_NAME,'YYYY-MM-DD'),
> >           B.MSISDN,
> >           B.IMSI
>
>
>
> If you think about how subquery pruning is supposed to work which is to run
> a recursive subquery on the driving row sources being joined (the D_DATE
> dimension in your case) it might become obvious that adding a predicate on
> the T_MOBILE_EVENT alone probably won't help in that matter.
>
>
>
> Since the subquery pruning obviously requires all range partition keys to
> be specified in the join condition to work you would need to add the
> TIME_EID as join predicate. Now since your date dimension obviously doesn't
> cover a TIME, you probably can't simply add something like that:
>
>
>
> ...
>
> FROM   DW.D_DATE A,
>        DWH_OWN.T_MOBILE_EVENT B
> WHERE  A.DATE_EID  = B.DATE_EID
>
>   AND  A.TIME_EID = B.TIME_EID
>   AND  A.TIME_EID between 1 and 86400
>
>   AND  A.DATE_NAME = TO_DATE('20100322', 'yyyymmdd')
> ...
>
>
>
> because there is no TIME_EID in D_DATE I suppose.
>
>
>
> Since you probably have a suitable TIME dimension, you could try something
> like this:
>
>
>
> ...
>
> FROM   DW.D_DATE A,
>
>        DW.D_TIME T
>
>        DWH_OWN.T_MOBILE_EVENT B
> WHERE  A.DATE_EID  = B.DATE_EID
>
>   AND  T.TIME_EID = B.TIME_EID
>   AND  T.TIME_EID between 1 and 86400
>
>   AND  A.DATE_NAME = TO_DATE('20100322', 'yyyymmdd')
> ...
>
>
>
> But even with the underscore parameters altered as suggested by Christian
> ("_subquery_pruning_cost_factor" = 1 and "_subquery_pruning_reduction" =
> 100) I didn't get the subquery pruning to work in a simple example setup,
> since obviously the cost calculation underneath decided that the subquery
> pruning was not worth it - restricting the second table to less rows
> switched to subquery pruning at a certain threshold. Unfortunately I don't
> think that even the 10053 optimizer trace reveals the underlying cost
> calculations for the subquery pruning.
>
>
>
> Your mileage might vary so you might want to give this setup a try - if
> this is a reasonable approach is another question.
>
>
>
> If your example of a single row (or only few rows) being returned by the
> predicate on the dimension is representative for the kind of query you try
> to tune, then I would be more encouraged to look into why the optimizer
> doesn't choose the NESTED LOOP approach on its own. It looks like in your
> posted examples that the cost of the PARTITION RANGE ITERATOR operation is
> equal to the PARTITION RANGE ALL which is very likely caused by the fact
> that you join only on one of the two partition keys. In my simple test with
> a range-hash composite partitioned table the cost of the PARTITION RANGE
> ITERATOR operation was significantly lower when using the NESTED LOOP plan
> compared to the other options when joining on both columns - in fact the
> optimizer chose this plan automatically, but again only if the number of
> iterations of the loop was low, which will not be the case in your setup if
> you add the unrestricted TIME dimension holding 86400 rows.
>
>
>
> Regards,
> Randolf
>
> Oracle related stuff blog:
> http://oracle-randolf.blogspot.com/
>
> Co-author of the "OakTable Expert Oracle Practices" book:
> http://www.apress.com/book/view/1430226684
>
> http://www.amazon.com/Expert-Oracle-Practices-Database-Administration/dp/1430226684
>
>
> GRATIS für alle WEB.DE-Nutzer: Die maxdome Movie-FLAT!
> Jetzt freischalten unter http://movieflat.web.de
> -- //www.freelists.org/webpage/oracle-l

Other related posts: