Re: Better cardinality estimate when dialing optimizer_features_enable back

  • From: Neil Kodner <nkodner@xxxxxxxxx>
  • To: Randolf Geist <info@xxxxxxxxxxxxxxxxxxxxx>
  • Date: Tue, 24 Nov 2009 15:12:01 -0500

Sorry about that - I've fixed the permissions on the two Google spreadsheets.

Here's the histogram
http://bit.ly/8wCq5k

and the table/index/column stats
http://bit.ly/7aWLOK

To answer your initial question, I think index access does in fact make
sense.  I'm retrieving anywhere from one to a thousand rows from letter_bin
per call.  This query is executed thousands of times as part of a batch job.

As a frame of reference, the table represents letters being mailed to an
employer (prty_id), one for every employee. There are many small businesses,
each having a few rows; there are medium and large businesses that have
many rows; and finally there are employment agencies, which have a LOT of
rows. Hopefully this explains the distribution of the data.

As for the transitive closure issue, there are also other queries that
access letter_bin by prty_id without the join to parties - those are
affected as well. The cardinality estimate is always incorrect for any
query on letter_bin.prty_id.
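
For reference, a quick way to see the misestimate for a single prty_id
(this assumes 10g or later; the literal value is just a placeholder):

select /*+ gather_plan_statistics */ count(*)
  from letter_bin
 where prty_id = 12345;  -- hypothetical prty_id

select *
  from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));

E-Rows is the optimizer's estimate, A-Rows the actual row count.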



On Tue, Nov 24, 2009 at 2:56 PM, Randolf Geist
<info@xxxxxxxxxxxxxxxxxxxxx> wrote:

> > Here is the histogram on letter_bin.prty_id
> >
> > http://spreadsheets.google.com/ccc?key=0ArUTHdqN0J--dDVXZ1ZPTGlROUxfU0FXV3RNdFVINlE&hl=en
> >
> > I've never manually crafted a histogram before.
>
> Neil,
>
> I can't access that Google spreadsheet - it tells me I lack privileges to
> do so.
>
> How many distinct values are in the column, and what does the actual
> distribution look like?
>
> Are there values that are so rare that the index access actually makes
> sense? If not, consider simply removing the histogram - or consider
> removing the index.
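>
> If you go that route, re-gathering just this column with SIZE 1 removes
> the histogram - a minimal, untested sketch (it assumes LETTER_BIN lives
> in your own schema):
>
> begin
>   -- re-gather stats for PRTY_ID only, without a histogram (SIZE 1)
>   dbms_stats.gather_table_stats(
>     ownname    => user,
>     tabname    => 'LETTER_BIN',
>     method_opt => 'FOR COLUMNS PRTY_ID SIZE 1'
>   );
> end;
> /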
>
> Instead of fiddling with the histogram itself, you can simply try to set
> the DENSITY of the column manually using DBMS_STATS.SET_COLUMN_STATS to
> the value currently shown in DENSITY - according to Alberto's post this
> should revert to the old density calculation using the DENSITY value (the
> new calculation ignores the value from the dictionary), but I haven't
> tested this myself recently. That should modify the cardinality estimate
> of unpopular column values.
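>
> An untested sketch of what I mean (the owner and the exact call shape
> are my assumptions):
>
> declare
>   l_density number;
> begin
>   -- read the DENSITY currently stored in the dictionary
>   select density
>     into l_density
>     from user_tab_col_statistics
>    where table_name  = 'LETTER_BIN'
>      and column_name = 'PRTY_ID';
>
>   -- writing the same value back marks the stats as user-defined,
>   -- which should make the optimizer honour this DENSITY again
>   dbms_stats.set_column_stats(
>     ownname => user,
>     tabname => 'LETTER_BIN',
>     colname => 'PRTY_ID',
>     density => l_density
>   );
> end;
> /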
>
> Regarding the plans that you've posted: there is transitive closure taking
> place on PRTY_ID between LETTER_BIN and PARTIES, which has the side effect
> that the optimizer removes the join condition between LETTER_BIN and
> PARTIES. With the join condition in place, the cardinality estimate might
> be lower and closer to reality. It's just a workaround and does not address
> the root cause, but what do you get if you manually add
> "PARTIES.PRTY_ID = <val>" to the queries?
>
> This workaround should retain the join predicate and thus lower the
> cardinality estimate.
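>
> In other words, something along these lines (the column list and bind
> name are just placeholders):
>
> select lb.*
>   from letter_bin lb,
>        parties    p
>  where lb.prty_id = p.prty_id
>    and lb.prty_id = :prty_id
>    and p.prty_id  = :prty_id  -- manually added predicate
> ;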
>
> Regards,
> Randolf
>
> Oracle related stuff blog:
> http://oracle-randolf.blogspot.com/
>
> Co-author of the forthcoming "OakTable Expert Oracle Practices" book:
> http://www.apress.com/book/view/1430226684
>
> http://www.amazon.de/Expert-Oracle-Practices-Database-Administration/dp/1430226684
