Antwort: is SIZE AUTO ever wrong? or am I wrong?

From: "Martin Klier" <Martin.Klier@xxxxxxxxxx>
To: nkodner@xxxxxxxxx
Date: Fri, 20 Nov 2009 18:14:01 +0100
Hi,

Richard Foote did some really interesting things to prove that this feature
is dangerous.
http://richardfoote.files.wordpress.com/2008/01/dbms_stats-method_opt-auto-dangers-demo-version-3.txt

--
Mit freundlichem Gruß


Martin Klier
Senior Oracle Database Administrator
------------------------------------------------------------------------------

Klug GmbH integrierte Systeme
Lindenweg 13, D-92552 Teunz
Tel.:  +49 9671/9216-245
Fax.: +49 9671/9216-112
mailto: martin.klier@xxxxxxxxxx
www.klug-is.de
------------------------------------------------------------------------------

Geschäftsführer: Johann Klug, Roman Sorgenfrei
Sitz der Gesellschaft: Teunz, USt-ID-Nr. DE175481608,
HRB Nr. 2037, Amtsgericht Amberg


|------------>
| Von:       |
|------------>
  
>--------------------------------------------------------------------------------------------------------------------------------------------------|
  |neil kodner <nkodner@xxxxxxxxx>                                              
                                                                     |
  
>--------------------------------------------------------------------------------------------------------------------------------------------------|
|------------>
| An:        |
|------------>
  
>--------------------------------------------------------------------------------------------------------------------------------------------------|
  |oracle-l List <oracle-l@xxxxxxxxxxxxx>                                       
                                                                     |
  
>--------------------------------------------------------------------------------------------------------------------------------------------------|
|------------>
| Datum:     |
|------------>
  
>--------------------------------------------------------------------------------------------------------------------------------------------------|
  |20.11.2009 18:08                                                             
                                                                     |
  
>--------------------------------------------------------------------------------------------------------------------------------------------------|
|------------>
| Betreff:   |
|------------>
  
>--------------------------------------------------------------------------------------------------------------------------------------------------|
  |is SIZE AUTO ever wrong?  or am I wrong?                                     
                                                                     |
  
>--------------------------------------------------------------------------------------------------------------------------------------------------|
|------------>
| Gesendet   |
| von:       |
|------------>
  
>--------------------------------------------------------------------------------------------------------------------------------------------------|
  |oracle-l-bounce@xxxxxxxxxxxxx                                                
                                                                     |
  
>--------------------------------------------------------------------------------------------------------------------------------------------------|





Does Oracle ever make the wrong decision when using FOR COLUMN column_name
SIZE AUTO?

I ask because I have table letter_bin with 4778159 rows, claims with 487842
rows.

letter_bin is indexed by both clm_id and prty_id

Here are the distributions of the counts of the columns:

table claims column clmt_prty_id

     count        min        max        avg     stddev
---------- ---------- ---------- ---------- ----------
    319156          1         29 1.52855657 .939775607

table letter_bin column prty_id

     count        min        max        avg     stddev
---------- ---------- ---------- ---------- ----------
    234326          1    1712371 20.3912157 3542.44185

table letter_bin column clm_id

     count        min        max        avg     stddev
---------- ---------- ---------- ---------- ----------
    470855          1       1424  10.147916 7.88601772


I just now realized that the numbers for letter_bin include NULLs.

excluding NULLs, I get

table letter_bin column clm_id

     count        min        max        avg     stddev
---------- ---------- ---------- ---------- ----------
    470859          1       1424 10.1464345 7.82825864

table letter_bin column prty_id

     count        min        max        avg     stddev
---------- ---------- ---------- ---------- ----------
    234326          1      42314 13.0835759 188.911964


letter_bin is typically queried by both clm_id and prty_id at the same
time.  although both columns are NULL able,  when the table is queried in a
certain fashion, both values are used, one as a literal and one as a result
of a join.  prty_id is supplied, clm_id is the result of querying by
clmt_prty_id

I analyzed letter bin using for columns prty_id size auto, and for columns
clm_id size auto.  both 100% samples, execute separately.

When I compute statistics on column prty_id, I get 254 buckets.which I
expect.  When I compute statistics on clm_id, I get no histogram although I
think I should be expecting one.  To me, the concept of 'skewed' is
somewhat nebulous but I think I ought to get a histogram here.

I prefer letter_bin to by clm_id and not by prty_id.  I say this because
the average number of rows in letter_bin for a clm_id should be 10 with a
lower SD

an example query would be along the lines of
select xyz from letter_bin a,claims b where a.prty_id = 123 and
a.clm_id=b.clm_id and b.clmt_prty_id=456.
The query typically uses access path of letter_bin-party_id and then join
to claims.  I would expect it to be the other way around.

select count(*) "count",count(distinct &2)"distinct",min(cnt)"min", max
(cnt)"max", avg(cnt)"avg", stddev(cnt)"stddev"
  from ( select /*+ PARALLEL (a 6) */ &2, count(*) cnt from &1 a group by
&2 );

Now here's where the plot thickens:  This query is executed thousands of
times in a batch job.  I think it would be more efficient for letter_bin to
be accessed by clm_id, and its not.   It's getting accessed by prty_id and
HJ to claims.  Am I being unreasonable, or are my stats no good?


--
//www.freelists.org/webpage/oracle-l






--
//www.freelists.org/webpage/oracle-l
References:
- is SIZE AUTO ever wrong? or am I wrong?
  - From: neil kodner
Antwort: is SIZE AUTO ever wrong? or am I wrong?

Other related posts: