RE: Histograms - SIZE clause & num_buckets anomaly (with a piggyback question)

  • From: "Lex de Haan" <lex.de.haan@xxxxxxxxxxxxxx>
  • To: <oracle-l@xxxxxxxxxxxxx>
  • Date: Tue, 3 Aug 2004 15:00:14 +0200

Charudatta,
to answer your piggyback question:
that's about popular values,
if I interpret the text correctly.
"compression" is slightly confusing here --
"reduction" would have been better.

just create a histogram, make sure there are some popular values,
check out USER_HISTOGRAMS, and you will see that some buckets
seem to be missing -- there are gaps in the bucket sequence numbers.
Oracle saves some space by storing a single row for the *last* bucket
corresponding to the popular value only.
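The effect can be sketched outside the database. What follows is a toy model under stated assumptions, not Oracle's actual algorithm: the function names and the way bucket endpoints are picked are made up for illustration. It builds the endpoint list of a height-balanced histogram and then keeps only the last bucket number for each run of equal endpoint values, which is what produces the gaps.

```python
def height_balanced_endpoints(values, num_buckets):
    """Endpoint values for buckets 0..num_buckets (toy model).

    Bucket 0 holds the minimum; bucket b holds the value of the last
    row in the b-th equal-sized slice of the sorted data.
    Assumes len(values) >= num_buckets.
    """
    data = sorted(values)
    n = len(data)
    endpoints = [data[0]]
    for bucket in range(1, num_buckets + 1):
        endpoints.append(data[(bucket * n) // num_buckets - 1])
    return endpoints


def compress(endpoints):
    """Keep one (bucket_number, value) row per run of equal endpoints:
    the row for the *last* bucket of the run."""
    rows = []
    for number, value in enumerate(endpoints):
        if rows and rows[-1][1] == value:
            rows[-1] = (number, value)  # overwrite with the later bucket
        else:
            rows.append((number, value))
    return rows


# Hypothetical column: the value 5 is popular.
column = [1, 2, 5, 5, 5, 5, 5, 5, 7, 8]
rows = compress(height_balanced_endpoints(column, 5))
for number, value in rows:
    print(number, value)
```

In this sketch the rows for buckets 2 and 3 are absent; only bucket 4 is kept for the value 5.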

Kind regards,
Lex.

---------------------------------------------
visit my website at http://www.naturaljoin.nl
---------------------------------------------



-----Original Message-----
From: oracle-l-bounce@xxxxxxxxxxxxx
[mailto:oracle-l-bounce@xxxxxxxxxxxxx]On Behalf Of Charudatta Joshi
Sent: Tuesday, August 03, 2004 13:33
To: oracle-l@xxxxxxxxxxxxx
Subject: RE: Histograms - SIZE clause & num_buckets anomaly (with
a piggyback question)



Thanks Lex,

Actually, I presupposed skew in the data when asking the question.

You are right that in quite a few cases the CBO will correctly deduce
the effective cardinality using an h.b. histogram. E.g., for a table with
10000 rows and 100 distinct values, if 99 DVs appear once or twice, an h.b.
histogram with 10 buckets would be more than sufficient. I need to look out
for such cases.
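A rough way to check that example without a database is sketched below. It is a simulation under assumptions, not Oracle's algorithm: the endpoint rule is a simplification, the function name is made up, and the 99 rare values are assumed to appear exactly once each, with the remaining rows all holding the value 50.

```python
def popular_values(values, num_buckets):
    """Return {value: number_of_buckets_it_ends} for every value that is
    the endpoint of more than one bucket (toy height-balanced model)."""
    data = sorted(values)
    n = len(data)
    hits = {}
    for bucket in range(1, num_buckets + 1):
        endpoint = data[(bucket * n) // num_buckets - 1]
        hits[endpoint] = hits.get(endpoint, 0) + 1
    return {value: count for value, count in hits.items() if count > 1}


# 99 rare values (once each) plus one dominant value filling the rest.
rare = [v for v in range(1, 101) if v != 50]
column = rare + [50] * (10000 - len(rare))

popular = popular_values(column, 10)
print(popular)
```

With these assumed data, the dominant value is the endpoint of 9 of the 10 buckets, so even a small h.b. histogram makes the skew visible.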

Thanks once again.

Now the piggyback question:
While looking for the reason for fewer buckets than specified, I
came upon a mention of 'Histogram Compression'. From one of the FMs (relevant
portion in caps):

<FM>
Note: The number of buckets in a histogram is
specified in the SIZE parameter of the SQL statement
ANALYZE. However, Oracle does not create a
histogram with more buckets than the number of
rows in the sample. ALSO, IF THE SAMPLE CONTAINS ANY
VALUES THAT ARE VERY REPETITIOUS, ORACLE CREATES THE
SPECIFIED NUMBER OF BUCKETS, BUT THE VALUE INDICATED
BY THIS COLUMN MAY BE SMALLER BECAUSE OF AN INTERNAL
COMPRESSION ALGORITHM.
</FM>

How can I identify whether the actual number of buckets is different from the
value shown in the NUM_BUCKETS column of USER_TAB_COLUMNS?
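One possible check, sketched here as an assumption rather than a documented procedure, is to compare the number of rows stored for the column with the highest bucket number among them. The rows below are hand-written stand-ins for (endpoint number, endpoint value) pairs, and the function name is hypothetical.

```python
def summarize(rows):
    """Summarize stored histogram rows given as (endpoint_number, value).

    A jump of more than 1 between consecutive endpoint numbers means the
    value at the end of the jump covers several buckets.
    """
    spans = {}
    previous = 0
    for number, value in sorted(rows):
        span = number - previous
        if span > 1:
            spans[value] = span
        previous = number
    return {
        "stored_rows": len(rows),
        "highest_endpoint": previous,
        "multi_bucket_values": spans,
    }


# Hypothetical stored rows: bucket numbers 2 and 3 are not present.
stored = [(0, 1), (1, 2), (4, 5), (5, 8)]
summary = summarize(stored)
print(summary)
```

Here four rows are stored although the highest bucket number is 5, and the value 5 accounts for three buckets.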

Thanks & regards,
Charu.


-----Original Message-----
From: oracle-l-bounce@xxxxxxxxxxxxx
[mailto:oracle-l-bounce@xxxxxxxxxxxxx]On Behalf Of Lex de Haan
Sent: Tuesday, August 03, 2004 5:15 PM
To: oracle-l@xxxxxxxxxxxxx
Subject: RE: Histograms - SIZE clause & num_buckets anomaly


Hi Charudatta,
I think you said it all yourself already :-)

choosing anything "blindly" sounds dangerous/adventurous, doesn't it?
first of all, your data distribution should be the main factor,
not the number of distinct values; if you have an even data distribution,
histograms will give you no benefits at all.

if your data is skewed, and if you can afford the maintenance overhead,
having f.b. histograms provides the ultimate precision --
but I certainly think you should test this.
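To illustrate why an f.b. histogram is exact, here is a small sketch. It assumes the usual representation of one row per distinct value carrying a cumulative row count; the function names and the sample data are made up for illustration.

```python
from collections import Counter


def frequency_histogram(values):
    """One (cumulative_count, value) row per distinct value, in value order."""
    rows = []
    running = 0
    for value, count in sorted(Counter(values).items()):
        running += count
        rows.append((running, value))
    return rows


def row_count(rows, value):
    """Exact number of rows for a value: the difference between its
    cumulative count and the previous one. Returns 0 if the value is absent."""
    previous = 0
    for cumulative, current in rows:
        if current == value:
            return cumulative - previous
        previous = cumulative
    return 0


# Hypothetical skewed column with three distinct values.
column = ["A"] * 7 + ["B"] + ["C"] * 2
histogram = frequency_histogram(column)
print(histogram)
print(row_count(histogram, "B"))
```

Every distinct value gets its own row, so the count for any value can be recovered exactly; the price is one row per distinct value to maintain.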

about your example (a column with 100 distinct values) it could be
that e.g. 10 h.b. buckets will be enough to guide the optimizer properly.

in my histogram testing, I typically look at two things:
- how do the estimated CBO costs change for a typical SQL statement
- how many popular values show up

you probably know this, but popular values are values showing up more than
once as an endpoint value in an h.b. histogram; they are treated in a
special way by the CBO for equality searches.

Kind regards,
Lex.




-----Original Message-----
From: oracle-l-bounce@xxxxxxxxxxxxx
[mailto:oracle-l-bounce@xxxxxxxxxxxxx]On Behalf Of Charudatta Joshi
Sent: Tuesday, August 03, 2004 12:28
To: oracle-l@xxxxxxxxxxxxx
Subject: RE: Histograms - SIZE clause & num_buckets anomaly



Hi Lex,

Been thinking about this comment of yours:

>>I agree that frequency histograms give the ultimate information about a
>>column population, compared with height-based histograms, but on the other
>>hand the performance should be "comparable" -- that is, the height-based
>>histogram with a reasonably high number of buckets shouldn't give bad
>>execution plans.

Currently, wherever possible & applicable, I (blindly) set the number of
buckets equal to the number of distinct values. This means that whenever
the number of distinct values is less than 255, that is my bucket count.

Given the fact that histogram maintenance can be very expensive, I now
wonder if I should try to get the same benefit using fewer buckets.
I guess in this case the logic for determining the bucket size will vary
from case to case. But in general:

-- Should one experiment with different histogram sizes even if the NDV is
very low?

-- Roughly what difference in the number of buckets justifies using h.b.
over f.b.? E.g., if I have to use 100 buckets for f.b. and 85 for h.b.,
that doesn't seem to be a good bargain for h.b.

I realize the answers can vary from case to case; still, a general opinion
would be much appreciated.

Thanks & regards,
Charu.


*********************************************************
Disclaimer:

This message (including any attachments) contains
confidential information intended for a specific
individual and purpose, and is protected by law.
If you are not the intended recipient, you should
delete this message and are hereby notified that
any disclosure, copying, or distribution of this
message, or the taking of any action based on it,
is strictly prohibited.

*********************************************************
Visit us at http://www.mahindrabt.com

----------------------------------------------------------------
Please see the official ORACLE-L FAQ: http://www.orafaq.com
----------------------------------------------------------------
To unsubscribe send email to:  oracle-l-request@xxxxxxxxxxxxx
put 'unsubscribe' in the subject line.
--
Archives are at //www.freelists.org/archives/oracle-l/
FAQ is at //www.freelists.org/help/fom-serve/cache/1.html
-----------------------------------------------------------------


