I've found that dates are more likely to present a problem than sequentially generated keys, because the latter are usually highly selective anyway. Say a financial application loads three million trades a day. If the stats are not updated, the optimizer will think a given TRADE_ID has 0 occurrences (since it's after the column's high endpoint) and round that up to 1, which may well be correct anyway. However, when you run a query for this week's trades on a Friday and it thinks there are none when in fact there are 15 million, you have a much bigger problem. That's my experience anyway.

2008/10/1 Niall Litchfield <niall.litchfield@xxxxxxxxx>
> dates and sequence driven key columns both exhibit this behaviour.
>
> On 30/09/2008, William Robertson <william@xxxxxxxxxxxxxxxxxxxx> wrote:
> > I'm not sure how using or not using histograms makes any fundamental
> > difference to your approach.
> >
> > One thing that comes to mind is the use of date predicates. If the
> > optimizer knows the highest sale date you have (for example) is 30 Sept
> > 2008, you are fine today and maybe some of next week, but in a few
> > months' time any queries for "last week's sales" may get unrealistically
> > low cardinality estimates, leading to inappropriate index/nested loop
> > access paths, Cartesian joins and so on.
> >
> > If you use partitioning and you have date columns as
> > partition/subpartition keys, you could have even more fun in store.
> >
> >
> > -----Original message-----
> > From: Sriram Kumar
> > Date: 30/9/08 22:41
> >> Hi Folks,
> >>
> >> We migrated an OLTP/batch hybrid 9i RBO application to 10g, and
> >> after rounds of tuning the application is now working at its best.
> >> We experimented with various sampling rates for statistics on
> >> various tables, and I now feel we are at an optimal point of
> >> performance where the CBO is picking up the right indexes.
> >>
> >> 1) We do not use histograms.
> >> 2) From here on, the application volumes are bound to increase
> >> proportionately.
> >> 3) No new tables will be added in production.
> >> 4) As of now, we have locked the schema stats for the application
> >> schemas. The default scheduler job collects stats for all other
> >> schemas.
> >>
> >> Since the increase in volume of data is quite proportional to the
> >> existing data, what would be the compelling reason to periodically
> >> recollect statistics for the tables that already have good execution
> >> plans?
> >>
> >> E.g. let's say there is a table with a million records and a few
> >> indexes. We collected the stats for this table and its indexes at a
> >> million records, and all the access paths to this table have been
> >> validated: the CBO is picking the correct, optimal index in every
> >> case.
> >>
> >> Now the volume of the table increases to 2 million, and from an
> >> application point of view the access paths that were valid for 1
> >> million would be valid for 2 million as well.
> >>
> >> Given this scenario, is it required to collect statistics at 2
> >> million as well? I opine that it would not be required, but some of
> >> my colleagues feel that we should collect statistics periodically,
> >> and I am not able to get a valid reason for periodically recollecting
> >> the stats. Is this one of the myths?
> >>
> >> I would agree to periodically recollect stats if we were using
> >> histograms, but we are not. Any other reason that would call for
> >> periodic statistics gathering?
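
To make the out-of-range point above concrete, here is a rough sketch (the TRADES table, the TRADE_DATE column and the dates are hypothetical, just standing in for the examples discussed in the thread). With the column high value frozen at the last stats gathering, a "last week's trades" predicate falls beyond the range the optimizer knows about and gets a tiny cardinality estimate; re-gathering stats moves the high value forward:

  -- Stats last gathered 30 Sep 2008, so the optimizer's high value for
  -- TRADE_DATE is stuck at that date.
  SELECT COUNT(*)
    FROM trades
   WHERE trade_date >= TRUNC(SYSDATE) - 7;
  -- The predicate lies beyond the known high value, so the cardinality
  -- estimate collapses towards 1 row even if millions of rows qualify.

  -- Refreshing the table (and index) stats brings the high value up to date:
  BEGIN
    DBMS_STATS.GATHER_TABLE_STATS(
      ownname          => USER,
      tabname          => 'TRADES',
      estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,
      cascade          => TRUE);
  END;
  /

If the schema stats are locked, as in Sriram's setup, the gather will fail with ORA-20005 unless the lock is removed first with DBMS_STATS.UNLOCK_TABLE_STATS (or, where the parameter is available, the gather is called with force => TRUE).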