RE: Logic for collecting statistics

From: "Charudatta Joshi" <joshic@xxxxxxxxxxxxxx>
To: "Oracle List" <oracle-l@xxxxxxxxxxxxx>
Date: Fri, 30 Jul 2004 11:50:29 +0530
Hi all,

I should have been more specific in my query. What I am really unsure about
is collecting stats regularly, varying the estimate percent, and deciding
histogram bucket sizes automatically. So here's a short questionnaire:

(BTW, ours is a D/W environment.)

1. Do you use the MONITORING option on tables and regularly gather
statistics on the stale ones?
2. Do you vary your estimate percent based on table size? If so, what
criteria do you use?
3. Do you create histograms on all indexed columns that have low
cardinality?
4. Do you have any other useful tips for collecting stats regularly?
5. Lastly, any comments on my script?
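For context on question 1: on 8i, the MONITORING table option and the
GATHER STALE option of DBMS_STATS are meant to be used together, roughly as
sketched below. This is a minimal sketch only; SALES_FACT is a hypothetical
table name, and the 10 percent sample is an arbitrary placeholder, not a
recommendation.

```sql
-- SALES_FACT is a hypothetical table name; run as the schema owner.
-- With MONITORING on, Oracle records the table's insert/update/delete
-- counts in USER_TAB_MODIFICATIONS.
ALTER TABLE sales_fact MONITORING;

-- Later, for example from a scheduled job: regather statistics only for
-- the monitored tables that DBMS_STATS considers stale.
BEGIN
    DBMS_STATS.GATHER_SCHEMA_STATS(ownname          => USER,
                                   estimate_percent => 10,   -- placeholder
                                   cascade          => TRUE, -- index stats too
                                   options          => 'GATHER STALE');
END;
/
```

Tables that have not changed enough since their last analysis are skipped,
which keeps their statistics (and therefore their plans) as they are.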

Okay, some background about my confusion:

1. I have read on this list the advice against gathering stats regularly, as
it can lead to unwanted changes in query plans. Makes sense. But how can I
be sure that the plan that works now will still work in future?

2. I have read Tom's opinion that we should gather accurate statistics and
then let the CBO make the decisions, as it considers many more factors than
we do. Makes sense too. But what about the 'Fallacies of the CBO'? Maybe I
shouldn't leave it all to the CBO?
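On point 1, one safety net is to keep the statistics as they were before
each regather, which the script below already does by exporting them to
STAT_TAB with the table name as the statid, so that a table can be put back
if its plans regress. A minimal sketch of the restore step, assuming those
STAT_TAB/statid conventions and a hypothetical table SALES_FACT:

```sql
-- Put back the statistics that were exported for SALES_FACT (hypothetical
-- name) before the last regather. The collection script stores each
-- table's backup under statid = table name in STAT_TAB.
BEGIN
    DBMS_STATS.IMPORT_TABLE_STATS(ownname => USER,
                                  tabname => 'SALES_FACT',
                                  stattab => 'STAT_TAB',
                                  statid  => 'SALES_FACT',
                                  cascade => TRUE);  -- index stats as well
END;
/
```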

Given these two conflicting views, I am a little unsure. Please let me know
your opinion.

Thanks & regards,
Charu.

-----Original Message-----
From: Charudatta Joshi [mailto:joshic@xxxxxxxxxxxxxx]
Sent: Thursday, July 29, 2004 4:33 PM
To: oracle-l@xxxxxxxxxxxxx
Subject: Logic for collecting statistics


Hi All,

Version : 8.1.7.4
OS: Win2K Server

I am writing a procedure to automate the collection of statistics for
modified tables. Prompted by JPL's comment about better statistics for
smaller tables, the procedure calculates estimate_percent and the histogram
size on the fly. However, the row-count cutoffs I chose for deriving
estimate_percent are totally arbitrary.

Please share your comments, if any.

Thanks & regards,
Charu.

The logic is:

-- Get the list of tables and num_rows that need to be analyzed (User_Tab_Modifications).
-- Back up the current statistics after deleting the previous backup.
-- Calculate estimate percent.
CASE
    WHEN tabrows(i) <= 50000 THEN
       v_estmt_prcnt := 100;
    WHEN tabrows(i) BETWEEN 50001 AND 200000 THEN
       v_estmt_prcnt := 40;
    WHEN tabrows(i) BETWEEN 200001 AND 600000 THEN
       v_estmt_prcnt := 20;
    WHEN tabrows(i) BETWEEN 600001 AND 4000000 THEN
       v_estmt_prcnt := 10;
    WHEN tabrows(i) > 4000000 THEN
       v_estmt_prcnt := 5;
END CASE;
(Does anybody have a better formula for calculating this?)
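To experiment with the bands separately from the main block, they can be
wrapped in a small stand-alone function. The name estimate_pct_for is
hypothetical. The bands are the same as above, but written with IF/ELSIF,
because the PL/SQL CASE statement only arrived in Oracle9i, and with <= on
each upper bound so that exactly 50,000 rows no longer falls between two
bands.

```sql
-- Hypothetical helper: maps a row count to an estimate_percent using the
-- same bands as the CASE above. 8i-compatible (no PL/SQL CASE statement).
CREATE OR REPLACE FUNCTION estimate_pct_for (p_rows IN NUMBER)
    RETURN NUMBER
IS
BEGIN
    IF p_rows <= 50000 THEN
        RETURN 100;          -- small tables: full compute
    ELSIF p_rows <= 200000 THEN
        RETURN 40;
    ELSIF p_rows <= 600000 THEN
        RETURN 20;
    ELSIF p_rows <= 4000000 THEN
        RETURN 10;
    ELSE
        RETURN 5;            -- also reached for a NULL row count
    END IF;
END estimate_pct_for;
/
```

For example, SELECT estimate_pct_for(150000) FROM dual returns 40.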

-- Get the list of indexed columns that have few distinct values.
-- Prepare the method_opt parameter from each column's number of distinct values.
-- Gather statistics.


And the entire PL/SQL block is:

DECLARE

    TYPE tablist_typ IS TABLE OF user_tables.table_name%TYPE
         INDEX BY BINARY_INTEGER;
    TYPE tabrows_typ IS TABLE OF user_tables.num_rows%TYPE
         INDEX BY BINARY_INTEGER;
    TYPE tabcols_typ IS TABLE OF user_tab_columns.column_name%TYPE
         INDEX BY BINARY_INTEGER;
    TYPE tabndv_typ IS TABLE OF user_tab_columns.num_distinct%TYPE
         INDEX BY BINARY_INTEGER;

    tablist tablist_typ;
    tabrows tabrows_typ;
    tabcols tabcols_typ;
    tabndv  tabndv_typ;

    v_estmt_prcnt NUMBER;
    v_degree NUMBER := 8;
    v_method_opt VARCHAR2(32000) := 'FOR ALL INDEXED COLUMNS';

BEGIN

    -- Get the list of tables and num_rows that need to be analyzed.
    SELECT a.table_name,
           NVL(b.num_rows, 0) + NVL(a.inserts - a.deletes, 0)
           BULK COLLECT INTO tablist,
                             tabrows
    FROM user_tab_modifications a,
         user_tables b
    WHERE a.table_name = b.table_name;

    FOR i IN NVL(tablist.FIRST, 1)..NVL(tablist.LAST, 0)
    LOOP

        -- Back up the current statistics: first delete this table's
        -- previous backup from STAT_TAB, then export the current ones.

        DBMS_STATS.DELETE_SCHEMA_STATS(ownname=>USER,
                                       stattab=>'STAT_TAB',
                                       statid=>tablist(i));
        DBMS_STATS.EXPORT_TABLE_STATS(ownname=>USER,
                                      stattab=>'STAT_TAB',
                                      statid=>tablist(i),
                                      tabname=>tablist(i),
                                      cascade=>TRUE);
        -- Calculate estimate percent. The PL/SQL CASE statement is not
        -- available before Oracle9i, so use IF/ELSIF on 8.1.7. Using <=
        -- also covers exactly 50000 rows, which the original bands missed.
        IF tabrows(i) <= 50000 THEN
            v_estmt_prcnt := 100;
        ELSIF tabrows(i) <= 200000 THEN
            v_estmt_prcnt := 40;
        ELSIF tabrows(i) <= 600000 THEN
            v_estmt_prcnt := 20;
        ELSIF tabrows(i) <= 4000000 THEN
            v_estmt_prcnt := 10;
        ELSE
            v_estmt_prcnt := 5;
        END IF;

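        -- Build method_opt string. Reset it for every table so that the
        -- columns added for the previous table are not carried over.
        v_method_opt := 'FOR ALL INDEXED COLUMNS';
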
        -- Collect columns having low distinct values
        -- and also having index on them.

        SELECT DISTINCT a.column_name,
                        a.num_distinct
        BULK COLLECT INTO tabcols,
                          tabndv
        FROM user_tab_columns a, user_ind_columns b
        WHERE a.table_name = b.table_name
        AND a.column_name = b.column_name
        AND a.table_name = tablist(i)
        AND a.num_distinct <= 254;  -- histogram SIZE must be 1 to 254

        IF tabcols.COUNT > 0 THEN

            v_method_opt := v_method_opt || ' FOR COLUMNS ';
            FOR j IN tabcols.FIRST..tabcols.LAST
            LOOP
                v_method_opt := v_method_opt || tabcols(j)
                                || ' SIZE ' || tabndv(j) || ', ' ;
            END LOOP;

            -- Remove the last comma.
            v_method_opt := SUBSTR(v_method_opt, 1, LENGTH(v_method_opt) - 2);

        END IF;

        -- Analyze the table.
        DBMS_STATS.GATHER_TABLE_STATS(ownname=>USER,
                                      tabname=>tablist(i),
                                      estimate_percent=>v_estmt_prcnt,
                                      degree=>v_degree,
                                      method_opt=>v_method_opt,
                                      cascade=>TRUE);

    END LOOP;
END;
/
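The block assumes that the user statistics table STAT_TAB already exists.
A minimal sketch of the one-time setup, plus a query to see which tables
currently have a backup (assuming the block's convention of using the
table name as the statid):

```sql
-- One-time setup: create the user statistics table that the block above
-- exports to. The name must match its stattab argument ('STAT_TAB').
BEGIN
    DBMS_STATS.CREATE_STAT_TABLE(ownname => USER,
                                 stattab => 'STAT_TAB');
END;
/

-- After a run: list the tables whose statistics were backed up
-- (the block stores each table's name in the STATID column).
SELECT DISTINCT statid
FROM   stat_tab
ORDER  BY statid;
```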

