Re: histogram oddity: Height-balanced histograms where NDV < num_buckets

  • From: "Charles Schultz" <sacrophyte@xxxxxxxxx>
  • To: oracle-l <oracle-l@xxxxxxxxxxxxx>
  • Date: Fri, 11 Aug 2006 12:07:40 -0500

Further insights:

The system view (dba_tab_col_statistics) is built upon a couple other nested
system views. After digging in, I came across this piece of code from the
base view (dba_tab_cols):
      case when nvl(h.row_cnt,0) = 0 then 'NONE'
           when (h.bucket_cnt > 255
                 or
                 (h.bucket_cnt > h.distcnt and h.row_cnt = h.distcnt
                  and h.density*h.bucket_cnt*2 <= 1))
               then 'FREQUENCY'
           else 'HEIGHT BALANCED'
      end,

In order by this column to be "HEIGHT BALANCED", it has to meet the
following criteria:

  - nvl(h.row_cnt,0) != 0 (more than 0 rows)
  - h.bucket_cnt <= 255 (this was just a tad weird)
  - h.bucket_cnt > h.distcnt or h.row_cnt = h.distcnt or
  h.density*h.bucket_cnt*2 <= 1


After checking off the first two items for a particular example (table INDPART$, column TS#, bucket_cnt = 217), I looked at the 3rd. I found out that bucket_cnt > h.distcnt. So that explains why the view reported this information, but why is bucket_cnt > h.distcnt? In working through this example, I also found cases where bucket_cnt > 254, which encouraged me to check for the largest: SQL > select max(bucket_cnt) from sys.hist_head$;

MAX(BUCKET_CNT)
---------------
       2631537


Wow. I guess I really do not know what bucket_cnt means - I thought it was a count of histogram buckets, which I understaood to be limited to 254.

But back to the example (INDPART$.TS#) for a second.
SQL > select count(*), count(distinct TS#) from sys.INDPART$;

 COUNT(*) COUNT(DISTINCTTS#)
---------- --------------------
      236                    1

To make matters even more confusing
SQL > select ENDPOINT_NUMBER,ENDPOINT_VALUE from dba_histograms where
table_name = 'INDPART$' and column_name = 'TS#';

ENDPOINT_NUMBER ENDPOINT_VALUE
--------------- --------------
           217             79


Color me confused. In some ways, I hope I am missing something obvious. *grin*

PS - some day I need to get Wolfgang or Jonathan to further explain the
HEIGHT BALANCED Density formula. I am having trouble with "the sum of the
squared frequencies fo all non-popular values divided by the sum of the
frequencies of all non-popular values times the count of rows with not null
values". I would love to see those complex examples that you mention in
writing, and I regret missing the presentations to see in person.

On 8/11/06, Charles Schultz < sacrophyte@xxxxxxxxx> wrote:

Oracle 10.2.0.2, Solaris 8

Wondering if anyone could explain this. We have 105 SYS-owned tables where
num_distinct < num_buckets according to dba_tab_col_statistics (see
following query). Even weirder, some columns have 217 buckets for only 1
distinct value (verified in dba_tab_histograms). What kind of sense does
that make? Similarly, we have one application partition that exhibits the
same symptom.

select distinct table_name, column_name, num_distinct, num_buckets
from dba_tab_col_statistics where histogram = 'HEIGHT BALANCED' and
num_distinct < num_buckets
order by num_distinct,num_buckets
/




-- Charles Schultz

Other related posts: