RE: Missing SQL in DBA_HIST_SQLSTAT

  • From: "Herring Dave - dherri" <Dave.Herring@xxxxxxxxxx>
  • To: "Petr Novak" <Petr.Novak@xxxxxxxxxxxx>, "Teehan, Mark" <mark.teehan@xxxxxxxxxxxxxxxxx>, "ORACLE-L" <oracle-l@xxxxxxxxxxxxx>
  • Date: Tue, 2 Nov 2010 10:14:05 -0500

True, smaller statements may not be as important, but per my original post, why 
would they be skipped if the default for TOPNSQL is 2 billion?

I got into this research because I was looking into SQL statements suffering 
from bind variable peeking and skewed values.  I wanted to check AWR history to 
see the frequency of different plans chosen for some statements and the average 
resource impact per SQL_ID + PLAN_HASH_VALUE.  In reviewing the results I saw 
that some days certain plans weren't chosen at all.  Those were for recent 
entries and it just so happened that those entries were still in the shared 
pool, even after a few days (because they're executed multiple times per day 
and haven't been invalidated).  Because of this, AWR won't help because it's 
inaccurate, which sent me down this whole path.

Dave Herring  | DBA
Acxiom Global Technology Solutions   

630-944-4762 office | 630-430-5988 cell | 630-944-4989 fax
1501 Opus Pl | Downers Grove, IL, 60515 | U.S.A. | www.acxiom.com
Service Desk: 888-243-4566, https://servicedesk.acxiom.com, GSCA@xxxxxxx



-----Original Message-----
From: Petr Novak [mailto:Petr.Novak@xxxxxxxxxxxx] 
Sent: Tuesday, November 02, 2010 10:01 AM
To: Herring Dave - dherri; Teehan, Mark; ORACLE-L
Subject: AW: Missing SQL in DBA_HIST_SQLSTAT

Hallo Dave,

I executed your query and checked one of the returned SQL_IDs.


SQL> select FIRST_LOAD_TIME,EXECUTIONS,LAST_LOAD_TIME,LAST_ACTIVE_TIME 
,PARSING_SCHEMA_NAME,CHILD_NUMBER,PLAN_HASH_VALUE,OPTIMIZER_ENV_HASH_VALUE  
from  v$sql where sql_id='cjkmnxsvny74t';

FIRST_LOAD_TIME         EXECUTIONS LAST_LOAD_TIME      LAST_ACTIVE_TIME    
PARSING_SCHEMA_NAME              CHILD_NUMBER PLAN_HASH_VALUE 
OPTIMIZER_ENV_HASH_VALUE
------------------- -------------- ------------------- ------------------- 
------------------------------ -------------- --------------- 
------------------------
2010-09-01/05:00:29             45 2010-11-02/14:55:50 02.11.2010 14:55:49 TMU  
                                       0      1721219251               
2180547739
2010-09-01/05:00:29              7 2010-11-01/05:00:48 01.11.2010 05:07:01 TMU  
                                       1      1721219251               
2180547739

AWR reports only one interval for this query:

End             Plan Hash   Optim Env        CPU    Elapsed    IOTime  AppTime  
Cluster  Concurr    Execs  Loa    PaC      Ela/Ex     Buffer/Ex   DReads/Ex     
Rows/Ex Prf
----------- ------------- ----------- ---------- ---------- --------- -------- 
-------- -------- -------- ---- ------ ----------- ------------- ----------- 
----------- ---
25.10 11:00    1721219251  2180547739        .01        .05        .0       .0  
     .0       .0        3    0      3         .02          20.0        14.0     
    3.0



I thing it is natural, that AWR does not capture every statement executed in 
the interval,even if the statement was not aged out from V$SQL.
Such statements are very 'small' , concerning resource consumption.
I didnt try to play with TOPNSQL, but you could check the maximum count of 
captured statements for snapshot and instance, if TOPNSQL has any impact.
It is not important how many statements are captured, but how big percentage of 
DB load (reads,gets,writes,CPU) you can explain with captured statements.

Best Regards,
Petr






________________________________________
Von: oracle-l-bounce@xxxxxxxxxxxxx [oracle-l-bounce@xxxxxxxxxxxxx]&quot; im 
Auftrag von &quot;Herring Dave - dherri [Dave.Herring@xxxxxxxxxx]
Gesendet: Dienstag, 2. November 2010 14:37
Bis: Teehan, Mark; ORACLE-L
Betreff: RE: Missing SQL in DBA_HIST_SQLSTAT

Mark,

Thanks for the detailed response!

I can understand issues where SQL is aged out/invalidated inbetween AWR 
snapshots, causing statements to be missed.  But I'm finding SQL statements 
having changes in executions ACROSS snapshots, which is where I'm confused.

Here's one of the SQL statements I'm using to try and determine the extent of 
this problem.  It'll list statements whose last active time is greater than 
then date of the last AWR snapshot + snapshot interval, meaning it should have 
been captured but hasn't:

SELECT sql_id, TO_CHAR(awr_last_active_dt, 'DD-MON-YY HH24:MI:SS') 
chr_awr_last_active_dt
     , TO_CHAR(last_snapshot_dt, 'DD-MON-YY HH24:MI:SS') chr_last_snapshot_dt
     , shared_pool_last_active_dt
  FROM (SELECT MAX(ss.end_interval_time) awr_last_active_dt
             , sq.sql_id, snap_interval, ss2.last_snapshot_dt, 
s.shared_pool_last_active_dt
          FROM dba_hist_sqlstat sq
             , dba_hist_snapshot ss
               /* Get the date of the most recent snapshot */
             , (SELECT MAX(begin_interval_time) last_snapshot_dt
                  FROM dba_hist_snapshot) ss2
               /* Get the date of LAST_ACTIVE_TIME for every SQL stmt in the 
shared pool */
             , (SELECT sql_id, plan_hash_value
                     , MAX(last_active_time) shared_pool_last_active_dt
                  FROM v$sql
                 GROUP BY sql_id, plan_hash_value) s
               /* Get the SNAP_INTERVAL, to help determine the offset for the 
most recent
                  possible LAST_ACTIVE_TIME */
             , (SELECT snap_interval FROM dba_hist_wr_control)
         WHERE (    sq.snap_id = ss.snap_id
                AND sq.instance_number = ss.instance_number
                AND sq.dbid = ss.dbid)
           AND (    sq.sql_id = s.sql_id
                AND sq.plan_hash_value = s.plan_hash_value)
         GROUP BY sq.sql_id, snap_interval, ss2.last_snapshot_dt, 
s.shared_pool_last_active_dt)
 WHERE LEAST(shared_pool_last_active_dt, last_snapshot_dt) - snap_interval > 
awr_last_active_dt
 ORDER BY LEAST(shared_pool_last_active_dt, last_snapshot_dt) - snap_interval -
          awr_last_active_dt DESC;

Dave Herring  | DBA
Acxiom Global Technology Solutions

630-944-4762 office | 630-430-5988 cell | 630-944-4989 fax
1501 Opus Pl | Downers Grove, IL, 60515 | U.S.A. | www.acxiom.com
Service Desk: 888-243-4566, https://servicedesk.acxiom.com, GSCA@xxxxxxx



-----Original Message-----
From: Teehan, Mark [mailto:mark.teehan@xxxxxxxxxxxxxxxxx]
Sent: Tuesday, November 02, 2010 12:53 AM
To: Herring Dave - dherri; ORACLE-L
Subject: RE: Missing SQL in DBA_HIST_SQLSTAT

Dave,
I've noticed that the quality of AWR data collection varies a lot depending on 
the application profile.

I am not an expert on database internals ("Tanel!!") but this is what I have 
worked out; my problems in this area are due to flushing of the sql area. As 
you can see the metrics in gv$sqlstat; then it may not be relevant.

A database hosting an application that executes most of its workload during a 
batch window seems to have lower quality of AWR data collection during its 
busiest period. One major reason for this is aging of single-execution 
statements.
The SQL Area sizes automatically based on the workload of the entire day, so 
when a large burst of statements is run (= overnight batch cycle), the SQL area 
will not resize up immediately (if at all), and single-execution statements 
will be aged out first before the next AWR snap occurs. I've seen this happen a 
lot - the statement metrics are no longer visible in gv$sqlstat at snap-time so 
it never goes into AWR.

Aging of metrics for statements is only a problem if the database needs to 
flush the sql area: it can be flushed by other activity. Single-execution 
statements appear to be chosen first, irrespective of how long they ran for - 
AFAIK no attempt is made to classify which single-execution statements may 
still have important metrics.

A flush offender is a stats gather on objects with a lot of column statistics: 
it generates a large number of recursive statements for each column (select 
min(), max() etc) in a short span, which can flush a lot of metrics from the 
sql area.
The AWR partition rolling job can also generate a lot of statements - this 
generally runs once a day.

I've raised an SR to find out if there is any way to keep statements in the SQL 
area until they have been snapped into AWR, but support told me to go away, its 
supposed to be like that. It is not possible to directly set the size of the 
sql area to improve metric retention. Statement pinning doesn't work either: 
the text is kept, but the metrics (gv$sqlstat) are flushed!
I hit a brick wall on this. It seems a design flaw to me - the sql area flush 
process should assess the metrics for each statement to see if it is eligible 
for AWR retention. Not trivial I know, but a lot depends on AWR quality.

Interesting that you have seen this on a 4-node RAC - the worst offender I have 
is a 3-node RAC - is there any chance AWR is not picking up metrics across all 
instances?

HTH
Mark Teehan
Singapore


-----Original Message-----
From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx] On 
Behalf Of Herring Dave - dherri
Sent: 26 October 2010 06:08
To: ORACLE-L
Subject: Missing SQL in DBA_HIST_SQLSTAT

Has anyone checked into the accuracy and reliability of AWR SQL stats?  From 
checks I've done it appears that a lot of SQL statements are not getting 
captured in AWR snapshots.

Here's our config: 4-node RAC, Oracle 10.2.0.2, RH 4.x, 30-minute AWR snapshots.

We've left TOPNSQL as "DEFAULT", which in our case should be enough:

SELECT topnsql
  FROM sys.wrm$_wr_control;

           TOPNSQL
------------------
        2000000000

This means that the first 2 BILLION SQL statements per category will be 
selected (seems rather excessive!).  Yet I ran a few checks where I compared 
(GV$SQL.LAST_ACTIVE_TIME - snapshot interval) to the MAX(end_interval_time) for 
snapshots (all by SQL_ID) and found over 1600 statements that are executed yet 
no updates are captured by AWR.

As a double check, I reviewed data in DBA_HIST_ACTIVE_SESS_HISTORY and found 
that 1 of the 1600 statements from above is executed multiple times per day, 
yet the last DBA_HIST_SQLSTAT snapshot it's in is from 4 months ago.

Has anyone else seen this type of thing and/or done detailed checking on the 
accuracy of AWR SQL data?

BTW, I tried tracing the snapshot activity and can't get past what's done to 
populate internal objects X$KEWRSQLIDTAB and X$KEWRSQLCRIT.

Dave Herring  | DBA
Acxiom Global Technology Solutions

630-944-4762 office | 630-430-5988 cell | 630-944-4989 fax
1501 Opus Pl | Downers Grove, IL, 60515 | U.S.A. | www.acxiom.com Service Desk: 
888-243-4566, https://servicedesk.acxiom.com, GSCA@xxxxxxx



***************************************************************************
The information contained in this communication is confidential, is intended 
only for the use of the recipient named above, and may be legally privileged.

If the reader of this message is not the intended recipient, you are hereby 
notified that any dissemination, distribution or copying of this communication 
is strictly prohibited.

If you have received this communication in error, please resend this 
communication to the sender and delete the original message or any copy of it 
from your computer system.

Thank You.
****************************************************************************

--
//www.freelists.org/webpage/oracle-l



===============================================================================
Please access the attached hyperlink for an important electronic communications 
disclaimer:
http://www.credit-suisse.com/legal/en/disclaimer_email_ib.html
===============================================================================

--
//www.freelists.org/webpage/oracle-l


--
//www.freelists.org/webpage/oracle-l


Other related posts: