SGA polling

  • From: Stephane Faroult <sfaroult@xxxxxxxxxx>
  • To: "oracle-l@xxxxxxxxxxxxx" <oracle-l@xxxxxxxxxxxxx>
  • Date: Mon, 17 May 2004 10:33:15 +0200

List,

  Several vendors market some tools (extremely expensive ones) which
poll the SGA at a subsecond rate by attaching directly to the SGA, since
polling the V$ views as fast is out of question. Their argument is that
this is the only way not to miss anything.
  I don't object to that, except that I have always been extremely
dubious about the real practical use of such a technical feat. I am
ready to miss on 10%, when I am called in for a performance problem it's
usually because things are 2 or 3 times slower at least than expected
... My reasoning has always been that, either you have a dreadful query
which you can't miss, or queries executed very repetitively which will
stay forever in the SGA. The only problem are unbinded fast queries; but
first you will certainly catch quite a number, if not all, of them at
any moment, and second a high hard-parse rate is easy to spot and is,
anyhow, the very first thing to check before proceeding further.

  To test my hypothesis, I have run every 5 minutes the following sql
script, imbededded in a shell script :

 
set linesize 250 
set pagesize 0 
set feedback off 
set recsep off 
set colsep '    '  -- Tab, to make loading into Excel easier 
select sum(decode(s.name, 'logons cumulative', s.value, 0))
logons_cumul, 
       sum(decode(s.name, 'logons current', s.value, 0)) logons_current, 
       sum(decode(s.name, 'session logical reads', s.value, 0)) lios, 
       m.sga_lios, 
       sum(decode(s.name, 'physical reads', s.value, 0)) pios, 
       m.sga_pios, 
       sum(decode(s.name, 'parse count (total)', s.value, 0)) parsing, 
       m.sga_parse, 
       sum(decode(s.name, 'execute count', s.value, 0)) executions, 
       m.sga_exec, 
       sum(decode(s.name, 'sorts (memory)', s.value, 
                        'sorts (disk)', s.value, 
                                        0)) sorts, 
       m.sga_sort 
from v$sysstat s, 
     (select sum(buffer_gets) SGA_LIOS, 
             sum(disk_reads)  SGA_PIOS, 
             sum(executions)  SGA_EXEC, 
             sum(parse_calls) SGA_PARSE, 
             sum(sorts)       SGA_SORT 
      from v$sql 
      where command_type in (2, 3, 6, 7)) m 
where s.name in ('logons cumulative', 
                 'logons current', 
                 'session logical reads', 
                 'physical reads', 
                 'parse count (total)', 
                 'execute count', 
                 'sorts (memory)', 
                 'sorts (disk)') 
group by m.sga_lios, 
         m.sga_pios, 
         m.sga_parse, 
         m.sga_exec, 
         m.sga_sort 
/

Basically the idea is to compare the values recorded in V$SYSSTAT to the
values obtained by summing up what we find in V$SQL. Note that I have
restrained my query to SELECTs, INSERTs, UPDATEs and DELETEs (that's
what 
      command_type in (...)
means). I was wary of including PL/SQL blocks (command_type 47), which
already aggregate all the 'basic' queries they execute (I know how to
relate them, but this is the type of query I am uncomfortable running
against a production database), as well as invalid rows (command_type
0). I therefore do a 'systemic' error, not counting (especially) COMMITs
nor DDL (I am unsure whether COMMITs really show up as executions too in
V$SYSSTAT).

I have run this against a true production database (not a toy), loaded
the results into Excel, and begun computing for each 5mn slice the
values according to V$SYSSTAT and V$SQL. A funny thing to notice is that
with V$SQL you sometimes have negative values - A function replaces them
with 0 in the spreadsheet (mostly to have scales starting from 0 in
charts). However, it's obvious that a mere interpolation (averaging he
two nearest values) would yield a correct result. I guess it comes from
some periodic 'garbage collection'.
I have uploaded the result at this address :
  
    http://www.roughsea.com/public/SGA_POLL.zip

Bar the 'accidents', the LIO count is close within less than one
percent. There is a delay in the PIO count, probably because V$SYSSTAT
is updated asynchronously. Interestingly, V$SQL shows more PIOs than
V$SYSSTAT. The difference in the number of executions is very small, and
I was expecting *some* anyway. The only place where there is any
noticeable difference is with sorts. Unfortunately, V$SQL (at least in
9.2) doesn't make any difference between memory and disk sorts. My
feeling is that missed sorts belong to this myriad of small memory
sorts. As you can see from the figures, that was a 'serious' database
(telco).

  My conclusion is that my gut feeling was good, and that you can have,
let's say in 99.99% of cases (always let the door open to the unexpected
:-)) a perfectly valid image of what is occurring (once again, assuming
no massive hard-parse problem) by querying V$SQL at a relatively sedate
rate. And that anyway you can have a fair estimate of what you are
missing by checking V$SYSSTAT.

  Would be glad to hear of your views, and even gladder if you could run
the same test on some of your bases (the query runs reasonably fast).

 And thanks for having taken the pain to read so far ...

Stephane Faroult
Oriole Ltd
----------------------------------------------------------------
Please see the official ORACLE-L FAQ: http://www.orafaq.com
----------------------------------------------------------------
To unsubscribe send email to:  oracle-l-request@xxxxxxxxxxxxx
put 'unsubscribe' in the subject line.
--
Archives are at //www.freelists.org/archives/oracle-l/
FAQ is at //www.freelists.org/help/fom-serve/cache/1.html
-----------------------------------------------------------------

Other related posts: