dbms_stats.gather_system_stats - crazy numbers

From: "CRISLER, JON A" <JC1706@xxxxxxx>
To: "oracle-l@xxxxxxxxxxxxx" <oracle-l@xxxxxxxxxxxxx>
Date: Mon, 29 Jul 2013 04:56:05 +0000
I have been working on a pair of RAC systems that are otherwise identical, but 
sometimes give some crazy explain plans that force full table scans in 
situations where a FTS would never be done.  It got to the point where putting 
in sql profiles and otherwise tweaking could not keep up with the problem, so I 
decided to dig into it from a different perspective.

I finally found that dbms_stats.gather_system_stats had collected some really 
strange numbers for CPUSPEEDNW (no workload) and CPUSPEED (workload), along 
with some really odd numbers for disk transfer speed.

System A had an absurdly low number for cpuspeednw and cpuspeed (1/4 expected 
numbers), and at the same time a disk throughput number that was 10-50 times 
higher than expected.  System B had the opposite problem: Better than possible 
numbers for cpuspeed while disk transfer was at the default number due to some 
quirk in workload system stats gathering.

In order to get the systems to match, the workaround was to delete the system 
stats (dbms_stats.delete_system_stats) on both and then run it only with a 
noworkload: cpu and disk popped into the expected ranges.  The two clusters are 
tuned exactly the same, have the exact same OS, hardware, SAN storage, etc.  
Slight variances are to be expected for sure in the numbers, but not huge 
differences.  The system that had a really low cpu num and ultrastupid high 
IOTFRSPEED would force a FTS at the strangest times, and users would complain 
about performance.  The system with better than possible cpu numbers and low 
IOTFRSPEED did not get complaints.

Here is an example

column sname format a20
column pname format a20
column pval2 format a20
select sname, pname , pval1, pval2  
from  sys.aux_stats$;

SNAME                PNAME                     PVAL1 PVAL2
-------------------- -------------------- ---------- --------------------
SYSSTATS_INFO        STATUS                          COMPLETED
SYSSTATS_INFO        DSTART                          07-27-2013 23:02
SYSSTATS_INFO        DSTOP                           07-27-2013 23:02
SYSSTATS_INFO        FLAGS                         1
SYSSTATS_MAIN        CPUSPEEDNW                 1249  (when bad this was either 
hi or low, like 300 or 10000)
SYSSTATS_MAIN        IOSEEKTIM                    7
SYSSTATS_MAIN        IOTFRSPEED                28662 (when bad this number was 
like 600000)
SYSSTATS_MAIN        SREADTIM
SYSSTATS_MAIN        MREADTIM
SYSSTATS_MAIN        CPUSPEED
SYSSTATS_MAIN        MBRC

SNAME                PNAME                     PVAL1 PVAL2
-------------------- -------------------- ---------- --------------------
SYSSTATS_MAIN        MAXTHR
SYSSTATS_MAIN        SLAVETHR

Two bugs are similar but not exact hits for this problem-
Unrealistic high value for IOTFRSPEED can drive CBO to choose bad expensive 
plan (Doc ID 785799.1)
Bug 9842771 - Wrong SREADTIM and MREADTIM statistics in AUX_STATS$ (Doc ID 
9842771.8)

The hardware is typical very large servers, native Red Hat, fast SAN.  It is 
possible that some Intel Turbo Boost or power saving feature is causing odd cpu 
numbers that mess with oracle's understanding of the hardware ?
model name      :        Intel(R) Xeon(R) CPU E7- 4820  @ 2.00GHz
stepping        : 2
cpu MHz         : 1997.886
cache size      : 18432 KB
--
//www.freelists.org/webpage/oracle-l
dbms_stats.gather_system_stats - crazy numbers

Other related posts: