Re: Which one is good in terms of performance

Spooky, isn't it?  Block sampling really means rolling the dice, I think...

If you'd like the script I used to generate these test results, feel free t=
o
email me offline.   They're nothing special, but it might allow others a
jumpstart to test on different data conditions...



on 12/27/04 11:48 AM, Jacques Kilchoer at Jacques.Kilchoer@xxxxxxxxx wrote:

> If I read your table example right, then dbms_stats and block sampling ga=
ve
> worse results for 50% sampling than it did for 20%, .01%, .000001% sampli=
ng?
>=20
> -----Original Message-----
> From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx=
] On
> Behalf Of Tim Gorman
> Sent: vendredi, 24. d=E9cembre 2004 07:23
> To: 'Oracle-L (E-mail)
> Subject: Re: Which one is good in terms of performance
>=20
> Seema and Wolfgang,
> Here is some info I generated recently, to add to the discussion...
>=20
> My apologies for the length of the post, but feel free to delete if not
> interested...
>=20
>=20
> Accuracy when calculating table statistics with different sample sizes:
>>=20
> These are the cumulated averaged results from comparing statistics gather=
ed
> using both DBMS_STATS and ANALYZE TABLE against the baseline of performin=
g a
> COMPUTE using DBMS_STATS.  For the first round, I concentrated on table
> statistics, and ignored column statistics (a.k.a. histograms) by using =B3F=
OR
> ALL COLUMNS SIZE 1=B2.
>=20
> I tested against four major tables in the Oracle E-Business Suite ERP
> system, of varying sizes (from small to humongous).
>=20
> For each of the table statistics gathered for the cost-based optimizer (i=
.e.
> the columns NUM_ROWS, BLOCKS, and AVG_ROW_LEN in DBA_TABLES), displayed i=
s
> the percentage deviation from the baseline of a COMPUTE using DBMS_STATS:
>=20
>                                                  Nbr of
>                                                  tables % Deviation %
> Deviation % Deviation
> LABEL                                             sampled Nbr of Blks Nbr=
 of
> Rows Avg Row Len
> -------------------------------------------------- ------- -----------
> ----------- -----------
> DBMS_STATS: COMPUTE, FOR ALL COLUMNS SIZE 1              4       0.00
> 0.00        0.00
> DBMS_STATS: 10.000000%, FOR ALL COLUMNS SIZE 1           4       0.00
> 0.02        0.00
> DBMS_STATS: 05.000000%, FOR ALL COLUMNS SIZE 1           4       0.00
> 0.03        0.00
> DBMS_STATS: 01.000000%, FOR ALL COLUMNS SIZE 1           4       0.00
> 0.10        0.00
> DBMS_STATS: 50.000000%, FOR ALL COLUMNS SIZE 1           4       0.00
> 0.22        0.00
> DBMS_STATS: 10.000000% BLKSAMP, FOR ALL COLUMNS SI       4       0.00
> 0.15        0.11
> DBMS_STATS: 00.010000%, FOR ALL COLUMNS SIZE 1           4       0.00
> 0.46        0.00
> DBMS_STATS: 20.000000%, FOR ALL COLUMNS SIZE 1           4       0.00
> 0.58        0.00
> DBMS_STATS: 05.000000% BLKSAMP, FOR ALL COLUMNS SI       4       0.00
> 0.91        0.11
> DBMS_STATS: 00.000001%, FOR ALL COLUMNS SIZE 1           4       0.00
> 1.12        0.00
> DBMS_STATS: 20.000000% BLKSAMP, FOR ALL COLUMNS SI       4       0.00
> 1.35        0.00
> DBMS_STATS: 00.010000% BLKSAMP, FOR ALL COLUMNS SI       4       0.00
> 1.47        0.00
> DBMS_STATS: 00.000001% BLKSAMP, FOR ALL COLUMNS SI       4       0.00
> 1.58        0.00
> DBMS_STATS: 50.000000% BLKSAMP, FOR ALL COLUMNS SI       4       0.00
> 2.03        0.00
> DBMS_STATS: 01.000000% BLKSAMP, FOR ALL COLUMNS SI       4       0.00
> 4.61        0.00
> ANALYZE: 05.000000%, FOR ALL COLUMNS SIZE 1              4       0.00
> 0.42        7.56
> ANALYZE: COMPUTE, FOR ALL COLUMNS SIZE 1                 4       0.00
> 0.00        8.34
> ANALYZE: 50.000000%, FOR ALL COLUMNS SIZE 1              4       0.00
> 0.00        8.34
> ANALYZE: 20.000000%, FOR ALL COLUMNS SIZE 1              4       0.00
> 0.62        7.73
> ANALYZE: 10.000000%, FOR ALL COLUMNS SIZE 1              4       0.00
> 1.44        7.15
> ANALYZE: 01.000000%, FOR ALL COLUMNS SIZE 1              4       0.00
> 7.24        8.62
> ANALYZE: 00.010000%, FOR ALL COLUMNS SIZE 1              4       0.00
> 7.24        8.62
> ANALYZE: 00.000001%, FOR ALL COLUMNS SIZE 1              4       0.00
> 7.24        8.62
>=20
> The sorting in this report is by the summarization of the three percentag=
e
> columns, from "most accurate" results to "least accurate" results.
>=20
> A couple things to note about the results shown here:
>=20
> 1. The value in BLOCKS is always completely accurate, regardless of sampl=
e
> size, because the value is not calculated at all, but is obtained directl=
y
> from the segment header
> 2. The very worst calculations from the DBMS_STATS package are (apparentl=
y)
> more accurate than the very best calculations from ANALYZE command
>> * The biggest cause seems to be vastly different results on AVG_ROW_LEN
>>> * However, the method of calculating AVG_ROW_LEN by DBMS_STATS is easil=
y
>>> exposed by performing SQL trace
>>>> * Method is:  select avg(vsize(col-1)) + avg(vsize(col-2)) + ... +
>>>> avg(vsize(col-N)) from table-name ...
>> * The method of calculating AVG_ROW_LEN by the ANALYZE command cannot be
>> determined using SQL trace
> 3. The worst deviation with the results from DBMS_STATS when calculating
> NUM_ROWS is 4.61%, while the worst deviation with the results from ANALYZ=
E
> is almost twice that, at 7.24%
> 4. The results from ANALYZE TABLE ... SAMPLE 1 PERCENT, SAMPLE 0.01 PERCE=
NT,
> and SAMPLE 0.000001 PERCENT are exactly the same, indicating the lowest
> valid sampling value for the ANALYZE command is 1 PERCENT
>> * DBMS_STATS allows 0.000001% sampling
>> * DBMS_STATS also permits the use of BLOCK SAMPLING, which ANALYZE does =
not
>=20
> Accuracy when calculating index statistics with different sample sizes:
>=20
> In contrast, where even the very worst results obtained by the smallest
> sample of table statistics is just 7.24% off from performing a COMPUTE
> operation, the results from using smaller samples when calculating index
> statistics get much worse very quickly as the sampling size decreases.
>=20
> Here is the results of testing against the nine indexes of the
> FND_CONCURRENT_REQUESTS table, which vary widely in terms of density, dat=
a
> characteristics, etc.  The results are being sorted by the summation of t=
he
> percentage columns, so that the "most accurate" shows first and the "leas=
t
> accurate" shows last...
>=20
>                              Nbr of                             % Dev'n  =
%
> Dev'n
>                             indexes % Dev'n   % Dev'n   % Dev'n Avg Leaf
> Avg Data % Dev'n       % Dev'n
> LABEL                         sampled  BLevel Leaf Blks Dist Keys Blks/Ke=
y
> Blks/Key Clu Fac # Idx Entries
> ------------------------------ ------- ------- --------- --------- ------=
--
> -------- ------- -------------
> DBMS_STATS: INDEX COMPUTE           9    0.00      0.00      0.00     0.0=
0
> 0.00   0.00          0.00
> DBMS_STATS: INDEX 05.000000%        9    0.00      0.00     22.13    14.8=
5
> 16.40   0.44          0.00
> DBMS_STATS: INDEX 20.000000%        9    0.00      0.00     25.39    24.2=
9
> 24.22   0.44          0.00
> DBMS_STATS: INDEX 10.000000%        9    0.00      0.00     25.56    16.6=
1
> 33.89   0.44          0.00
> DBMS_STATS: INDEX 00.010000%        9    0.00      0.00     27.45    20.1=
3
> 37.91   0.44          0.00
> DBMS_STATS: INDEX 50.000000%        9    0.00      0.00     26.42    21.5=
3
> 38.65   0.44          0.00
> DBMS_STATS: INDEX 01.000000%        9    0.00      0.00     31.17    27.9=
0
> 28.18   0.44          0.00
> DBMS_STATS: INDEX 00.000001%        9    0.00      0.00     27.81    22.4=
6
> 40.37   0.44          0.00
> ANALYZE: INDEX COMPUTE              9    0.00    595.30     0.00    239.8=
8
> 0.00   0.00          0.00
> ANALYZE: INDEX 50.000000%           9    0.00    596.27     23.55   329.3=
1
> 13.85  17.29         17.60
> ANALYZE: INDEX 01.000000%           9    8.33    530.13     32.28   353.8=
6
> 23.04  32.87         34.08
> ANALYZE: INDEX 00.010000%           9    8.33    530.13     32.28   353.8=
6
> 23.04  32.87         34.08
> ANALYZE: INDEX 00.000001%           9    8.33    530.13     32.28   353.8=
6
> 23.04  32.87         34.08
> ANALYZE: INDEX 20.000000%           9    0.00    596.27     32.97   551.4=
3
> 58.40  19.22         19.51
> ANALYZE: INDEX 05.000000%           9    0.00    596.27     40.77   854.4=
6
> 59.40  24.67         23.23
> ANALYZE: INDEX 10.000000%           9    0.00    596.27     40.86   792.7=
2
> 141.73  29.25         29.07
>=20
> First of all, there is the huge deviation the calculations of LEAF_BLOCKS=
 by
> DBMS_STATS and by ANALYZE. Each command (DBMS_STATS or ANALYZE) is
> consistent, so one must be wrong and the other right.  Unfortunately, in
> this case, it is the DBMS_STATS package that is uniformly wrong on this,
> because the results produced by the ANALYZE INDEX commands are corroborat=
ed
> by similar results from the ANALYZE INDEX ... VALIDATE STRUCTURE command,
> the DBMS_SPACE package, and the values stored in the DBA_SEGMENTS view.  =
I
> searched MetaLink for a bug report related to DBMS_STATS and LEAF_BLOCKS,
> but could not find one.  I intend to retest this against 9.2.0.5 and
> 10.1.0.3, when I get a chance.  I=B9ll also test it against 9.2.0.1...
>=20
> A few more things to note about the results shown here:
>=20
> 1. DBMS_STATS.GATHER_INDEX_STATS and ANALYZE INDEX produce the exact same
> results when COMPUTEing
>> * except the LEAF_BLOCKS anomaly
>>> * which also apparently causes an anomaly with AVG_LEAF_BLOCKS_PER_KEY
> 2. the ANALYZE INDEX ... COMPUTE results are corroborated by other
> utilities, so the ANALYZE INDEX command is more accurate in v9.2.0.4 than
> the DBMS_STATS.GATHER_INDEX_STATS command
> 3. The results from both DBMS_STATS and ANALYZE are wildly worse when
> performing an estimated sample, instead of compute
>=20
> That last point is the most important.  Nonetheless, since the ANALYZE IN=
DEX
> ... COMPUTE command is clearly the most accurate, here is the same result
> data, this time using the results from ANALYZE INDEX .. COMPUTE STATISTIC=
S
> as the baseline instead of DBMS_STATS:
>>=20
>                              Nbr of                             % Dev'n  =
%
> Dev'n
>                             indexes % Dev'n   % Dev'n   % Dev'n Avg Leaf
> Avg Data % Dev'n       % Dev'n
> LABEL                         sampled  BLevel Leaf Blks Dist Keys Blks/Ke=
y
> Blks/Key Clu Fac # Idx Entries
> ------------------------------ ------- ------- --------- --------- ------=
--
> -------- ------- -------------
> ANALYZE: INDEX COMPUTE              9    0.00     0.00      0.00    0.00
> 0.00    0.00          0.00
> ANALYZE: INDEX 50.000000%           9    0.00     0.13     23.55    15.18
> 13.85   17.29         17.60
> DBMS_STATS: INDEX COMPUTE           9    0.00    85.04      0.00    41.30
> 0.00    0.00          0.00
> DBMS_STATS: INDEX 05.000000%        9    0.00    85.04     22.13    38.50
> 16.40    0.44          0.00
> DBMS_STATS: INDEX 20.000000%        9    0.00    85.04     25.39    36.70
> 24.22    0.44          0.00
> ANALYZE: INDEX 01.000000%           9    8.33     8.42     32.28    36.11
> 23.04   32.87         34.08
> ANALYZE: INDEX 00.010000%           9    8.33     8.42     32.28    36.11
> 23.04   32.87         34.08
> ANALYZE: INDEX 00.000001%           9    8.33     8.42     32.28    36.11
> 23.04   32.87         34.08
> DBMS_STATS: INDEX 01.000000%        9    0.00    85.04     31.17    36.24
> 28.18    0.44          0.00
> DBMS_STATS: INDEX 10.000000%        9    0.00    85.04     25.56    38.07
> 33.89    0.44          0.00
> ANALYZE: INDEX 20.000000%           9    0.00     0.13     32.97    53.44
> 58.40   19.22         19.51
> DBMS_STATS: INDEX 50.000000%        9    0.00    85.04     26.42    37.22
> 38.65    0.44          0.00
> DBMS_STATS: INDEX 00.010000%        9    0.00    85.04     27.45    37.53
> 37.91    0.44          0.00
> DBMS_STATS: INDEX 00.000001%        9    0.00    85.04     27.81    37.05
> 40.37    0.44          0.00
> ANALYZE: INDEX 05.000000%           9    0.00     0.13     40.77   107.16
> 59.40   24.67         23.23
> ANALYZE: INDEX 10.000000%           9    0.00     0.13     40.86    95.99
> 141.73   29.25         29.07
>=20
> Again, this perspective shows the two types of COMPUTE largely agreeing w=
ith
> each other (except for the LEAF_BLOCKS and the AVG_LEAF_BLOCKS_PER_KEY va=
lue
> derived from it), with all other results showing dramatically less accura=
te
> due to decreases in the sample size.
>=20
> For indexes, clearly computing yields the best results, while a bug in
> DBMS_STATS appears to be skewing the LEAF_BLOCK results in v9.2.0.4. maki=
ng
> the ANALYZE INDEX command the best alternative for that version.
>=20
> So, from this data, which is admittedly sparse and limited (needs more
> testing on a wider range of versions and applications), I think that one =
can
> conclude:
>=20
> * Perform the gathering of table/column and index statistics separately f=
rom
> one another
>> * Gather statistics on tables and columns in one operation using
>> DBMS_STATS.GATHER_TABLE_STATS
>>> * leave the CASCADE parameter at the default of FALSE
>>> * use a low ESTIMATE_PERCENT and even use BLOCK_SAMPLE=3D>TRUE, if necess=
ary
>> * Gather statistics on indexes in another operation to either ANALYZE IN=
DEX
>> or
>> DBMS_STATS.GATHER_INDEX_STATS
>>> * only do COMPUTE
> * There is a bug in DBMS_STATS.GATHER_INDEX_STATS in 9.2.0.4 regarding th=
e
> calculation of LEAF_BLOCKS that needs to be explored further...
>=20
>> From this, I don=B9t think that one can conclude that it is better to ANAL=
YZE
> INDEX ... COMPUTE instead of
> DBMS_STATS.GATHER_INDEX_STATS(ESTIMATE_PERCENT=3D>NULL), because I don=B9t kn=
ow
> if the anomaly found exists outside of 9.2.0.4...
>=20
> Hope this helps...
>=20
>=20

--
http://www.freelists.org/webpage/oracle-l

Other related posts: