Logic for collecting statistics

From: "Charudatta Joshi" <joshic@xxxxxxxxxxxxxx>
To: <oracle-l@xxxxxxxxxxxxx>
Date: Thu, 29 Jul 2004 16:32:38 +0530
Hi All,

Version : 8.1.7.4
OS: Win2K Server

I am writing a procedure to automate collection of statistics for modified
tables. The procedure calculates the estimate_percent and histogram size on
the fly, prompted by JPL's comment regarding better statistics for smaller
tables. However, the cutoff number of rows selected for deriving
estimate_percent are totally arbitrary.

Please care to share your comments if any.

Thanks & regards,
Charu.

The logic is:

-- Get the list of tables and num_rows that need to be analyzed
(User_Tab_Modifications).
-- Backup the current statistics after deleting the old ones.
-- Calculate estimate percent.
CASE
    WHEN tabrows(i) < 50000 THEN
       v_estmt_prcnt := 100;
    WHEN tabrows(i) BETWEEN 50001 AND 200000 THEN
       v_estmt_prcnt := 40;
    WHEN tabrows(i) BETWEEN 200001 AND 600000 THEN
       v_estmt_prcnt := 20;
    WHEN tabrows(i) BETWEEN 600001 AND 4000000 THEN
       v_estmt_prcnt := 10;
    WHEN tabrows(i) > 4000000 THEN
       v_estmt_prcnt := 5;
END CASE;
(Anybody has better formula for calculating this?)

-- Get the list of columns having low distinct values and also having index
on them.
-- Prepare method_opt parameter based on number of distinct values for the
column.
-- Gather statistics.


And the entire PL/SQL block is:

DECLARE

    TYPE tablist_typ IS TABLE OF user_tables.table_name%TYPE
         INDEX BY BINARY_INTEGER;
    TYPE tabrows_typ IS TABLE OF user_tables.num_rows%TYPE
         INDEX BY BINARY_INTEGER;
    TYPE tabcols_typ IS TABLE OF user_tab_columns.column_name%TYPE
         INDEX BY BINARY_INTEGER;
    TYPE tabndv_typ IS TABLE OF user_tab_columns.num_distinct%TYPE
         INDEX BY BINARY_INTEGER;

    tablist tablist_typ;
    tabrows tabrows_typ;
    tabcols tabcols_typ;
    tabndv  tabndv_typ;

    v_estmt_prcnt NUMBER;
    v_degree NUMBER := 8;
    v_method_opt VARCHAR2(32000) := 'FOR ALL INDEXED COLUMNS';

BEGIN

    -- Get the list of tables and num_rows that need to be analyzed.
    SELECT a.table_name,
           NVL(b.num_rows, 0) + NVL(a.inserts - a.deletes, 0)
           BULK COLLECT INTO tablist,
                             tabrows
    FROM user_tab_modifications a,
         user_tables b
    WHERE a.table_name = b.table_name;

    FOR i IN NVL(tablist.FIRST, 1)..NVL(tablist.LAST, 0)
    LOOP

        -- Backup earlier statistics after deleting the old ones.

        DBMS_STATS.DELETE_SCHEMA_STATS(ownname=>USER,
                                       stattab=>'STAT_TAB',
                                       statid=>tablist(i));
        DBMS_STATS.EXPORT_TABLE_STATS(ownname=>USER,
                                      stattab=>'STAT_TAB',
                                      statid=>tablist(i),
                                      tabname=>tablist(i),
                                      cascade=>TRUE);
        -- Calculate estimate percent.
        CASE
            WHEN tabrows(i) < 50000 THEN
                v_estmt_prcnt := 100;
            WHEN tabrows(i) BETWEEN 50001 AND 200000 THEN
                 v_estmt_prcnt := 40;
            WHEN tabrows(i) BETWEEN 200001 AND 600000 THEN
                 v_estmt_prcnt := 20;
            WHEN tabrows(i) BETWEEN 600001 AND 4000000 THEN
                 v_estmt_prcnt := 10;
            WHEN tabrows(i) > 4000000 THEN
                 v_estmt_prcnt := 5;
        END CASE;

        -- Build method_opt string.

        -- Collect columns having low distinct values
        -- and also having index on them.

        SELECT DISTINCT a.column_name,
                        a.num_distinct
        BULK COLLECT INTO tabcols,
                          tabndv
        FROM user_tab_columns a, user_ind_columns b
        WHERE a.table_name = b.table_name
        AND a.column_name = b.column_name
        AND a.table_name = tablist(i)
        AND a.num_distinct <= 255;

        IF tabcols.COUNT > 0 THEN

            v_method_opt := v_method_opt || ' FOR COLUMNS ';
            FOR j IN tabcols.FIRST..tabcols.LAST
            LOOP
                v_method_opt := v_method_opt || tabcols(j)
                                || ' SIZE ' || tabndv(j) || ', ' ;
            END LOOP;

            -- Remove the last comma.
            v_method_opt := SUBSTR(v_method_opt, 1, LENGTH(v_method_opt) -
2);

        END IF;

        -- Analyze the table.
        DBMS_STATS.GATHER_TABLE_STATS(ownname=>USER,
                                      tabname=>tablist(i),
                                      estimate_percent=>v_estmt_prcnt,
                                      degree=>v_degree,
                                      method_opt=>v_method_opt,
                                      cascade=>TRUE);

    END LOOP;
END;
/


*********************************************************
Disclaimer:          

This message (including any attachments) contains 
confidential information intended for a specific 
individual and purpose, and is protected by law. 
If you are not the intended recipient, you should 
delete this message and are hereby notified that 
any disclosure, copying, or distribution of this
message, or the taking of any action based on it, 
is strictly prohibited.

*********************************************************
Visit us at http://www.mahindrabt.com

----------------------------------------------------------------
Please see the official ORACLE-L FAQ: http://www.orafaq.com
----------------------------------------------------------------
To unsubscribe send email to:  oracle-l-request@xxxxxxxxxxxxx
put 'unsubscribe' in the subject line.
--
Archives are at //www.freelists.org/archives/oracle-l/
FAQ is at //www.freelists.org/help/fom-serve/cache/1.html
-----------------------------------------------------------------
Follow-Ups:
- RE: Logic for collecting statistics
  - From: Charudatta Joshi
References:
- Re: 9.2.0.5 SQL-s converted to UPPER case & comments are truncated.
  - From: J . Velikanovs
Logic for collecting statistics

Other related posts: