Re: Disk Device Busy (%) - What exactly is this?

  • From: Karl Arao <karlarao@xxxxxxxxx>
  • To: grzegorzof@xxxxxxxxxx
  • Date: Mon, 21 Nov 2011 11:25:25 -0600

To add to this blog link: if you have collectl installed somewhere, there's
a file called formatit.ph that contains all the formatting/formulas that
collectl uses. It has a section where the device busy % ($dskUtil) is
derived:
[root@desktopserver ~]# locate formatit.ph
/usr/share/collectl/formatit.ph
[root@desktopserver ~]# less /usr/share/collectl/formatit.ph

....

      # we only need these if doing individual disk calculations
      if ($subsys=~/D/)
      {
        # if doing hires time, we need the interval duration and unfortunately at
        # this point in time $intSecs has not been set so we can't use it
        $microInterval=($fullTime-$lastSecs[$rawPFlag])*100    if $hiResFlag;

        $numIOs=$dskRead[$dskIndex]+$dskWrite[$dskIndex];
        $dskRqst[$dskIndex]=   $numIOs ? ($dskReadKB[$dskIndex]+$dskWriteKB[$dskIndex])/$numIOs : 0;
        $dskQueLen[$dskIndex]= $dskWeighted[$dskIndex]/$microInterval*$HZ/1000;
        $dskWait[$dskIndex]=   $numIOs ? ($dskReadTicks[$dskIndex]+$dskWriteTicks[$dskIndex])/$numIOs : 0;
        $dskSvcTime[$dskIndex]=$numIOs ? $dskTicks[$dskIndex]/$numIOs : 0;
        $dskUtil[$dskIndex]=   $dskTicks[$dskIndex]*10/$microInterval;
      }

....
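To make the arithmetic above concrete, here is a minimal sketch of the
$dskUtil and $dskSvcTime lines as shell functions. It is not collectl code:
the function names dsk_util and dsk_svctime are hypothetical, and it assumes
$dskTicks is the per-interval delta of the "milliseconds spent doing I/O"
counter and that $microInterval is the interval in seconds times 100, as in
the hires branch of the excerpt.

```shell
#!/bin/sh
# Hypothetical helpers mirroring two lines of the formatit.ph excerpt.
# Assumption: busy_ms is the delta of the "ms spent doing I/O" counter
# over one sample interval (what the excerpt calls $dskTicks).

dsk_util() {
    # $1 = busy_ms, $2 = interval in seconds
    awk -v ticks="$1" -v secs="$2" 'BEGIN {
        micro = secs * 100                  # $microInterval
        printf "%.1f\n", ticks * 10 / micro # $dskUtil, in percent
    }'
}

dsk_svctime() {
    # $1 = busy_ms, $2 = reads + writes completed in the interval
    awk -v ticks="$1" -v ios="$2" 'BEGIN {
        # same guard as the excerpt: 0 when there were no I/Os
        printf "%.2f\n", (ios ? ticks / ios : 0)
    }'
}

dsk_util 500 1       # 500 ms busy in a 1 s interval: prints 50.0
dsk_svctime 500 250  # 500 ms busy over 250 I/Os: prints 2.00
```

In other words, under these assumptions the busy % is just the busy time
divided by the wall-clock interval, and the service time is the busy time
spread over the I/Os completed in that interval.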


If you are troubleshooting "slow IO", you also need to consider and
correlate the service times of the SAN, the Oracle datafiles, and the
session IO. Of course you need to sample them in a consistent and
fine-grained manner; I would use a 5-second interval for all 3 subsystems
(the iostat example below samples every 1 second; its first numeric
argument is the interval):
- SAN -> iostat -xnc 1 100000 | while read line; do echo "`date +%T`" "$line" ; done >> iostat_1.txt
- datafiles -> https://www.dropbox.com/s/jzcl5ydt29mvw69/PerformanceAndTroubleshooting/filestat.sql
- session -> @snapper ash=sql_id+sid+event+wait_class+module+service,stats 5 5 sid=<sid>
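The timestamping loop in the SAN item can be pulled out into a small
reusable filter, so the same stamp format is applied to every sampler and
the outputs are easier to line up afterwards. This is only a sketch: the
function name stamp_lines and the output file name in the usage comment are
made up for illustration.

```shell
#!/bin/sh
# Prefix every line read from stdin with the current wall-clock time.
# IFS= and -r keep leading whitespace and backslashes in the sampler
# output intact.
stamp_lines() {
    while IFS= read -r line; do
        printf '%s %s\n' "$(date +%T)" "$line"
    done
}

# Usage, with the iostat flags from the list above and a 5-second interval:
#   iostat -xnc 5 100000 | stamp_lines >> iostat_5.txt
```

Since the stamp is taken when each line is read, it reflects when the
sampler emitted the line, which is what you want when correlating against
the datafile and session samples.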

I had a recent scenario on Solaris M5000/9000 where the SAN (Symmetrix) and
datafile service times were in the 10-60ms range, while the Oracle sessions
were doing slow IO with service times of around 900ms to 1sec. That issue
turned out to be related to CPU scheduling (they had a really high load
average) and sessions spinning on vxfslocks (because concurrent IO was not
set). It is something to keep in mind when troubleshooting IO: the total is
the response time of the kernel-mode calls down to the low-level components
(not preempted) plus the response time of the user-mode calls (session IO,
which is not serviced properly because of preemption brought on by
scheduling/lock issues).

Here's the sample distribution for that scenario:
http://karlarao.tiddlyspot.com/#%5B%5Bavg%20latency%20issue%5D%5D





-- 
Karl Arao
karlarao.wordpress.com
karlarao.tiddlyspot.com


--
//www.freelists.org/webpage/oracle-l

