Re: Deadlock ITL Waits

  • From: kyle Hailey <kylelf@xxxxxxxxx>
  • To: Stalin <stalinsk@xxxxxxxxx>, ORACLE-L <Oracle-L@xxxxxxxxxxxxx>
  • Date: Thu, 21 Jul 2011 08:23:01 -0700

At the simplest you can just run a statspack report and search for "log file
parallel write" . It will be in the report as well.

Well the metric tables are not part of perf pack, so this should be fine:

Latencies in the past minute

col name for a25
select m.intsize_csec,
       n.name ,
       m.time_waited,
       m.wait_count,
       10*m.time_waited/nullif(m.wait_count,0) avgms
from v$eventmetric m,
     v$event_name n
where m.event_id=n.event_id
  and n.name in (
                  'log file sync',
                  'log file parallel write'
);

but alas that's the last minute. On the other hand if you have statspack you
can query the statspack  tables in a similar way to   the AWR querying based
on

       DBA_HIST_SYSTEM_EVENT e,
       DBA_HIST_SNAPSHOT s

Could you give this a try on statspack and post the query?


- Kyle
http://dboptimizer.com


On Wed, Jul 20, 2011 at 11:50 PM, Stalin <stalinsk@xxxxxxxxx> wrote:

> Thanks Kyle.
>
> Apparently we don't have diagnostic pack licenses to use DBA_HIST tables.
> Is this something i can get from statspack tables ?
>
>
> On Wed, Jul 20, 2011 at 10:20 PM, kyle Hailey <kylelf@xxxxxxxxx> wrote:
>
>>
>> Yeah, the massive log waits could exacerbate the ITL waits as things will
>> be slow commiting their changes.
>> The db file sequential reads are huge as well. Looks like you have a
>> problem on your storage.
>>
>> Check the log write times  ( I blogged some of this at
>> http://dboptimizer.com/2011/07/20/wait-event-and-wait-class-metrics-vs-vsystem_event/)
>>
>>
>> Latencies in the past minute
>>
>> col name for a25
>> select m.intsize_csec,
>>        n.name ,
>>        m.time_waited,
>>        m.wait_count,
>>        10*m.time_waited/nullif(m.wait_count,0) avgms
>> from v$eventmetric m,
>>      v$event_name n
>> where m.event_id=n.event_id
>>   and n.name in (
>>                   'log file sync',
>>                   'log file parallel write'
>> );
>>
>>
>> Latencies averaged over each hour
>>
>> select
>>        btime,
>>        (time_ms_end-time_ms_beg)/nullif(count_end-count_beg,0) avg_ms
>> from (
>> select
>>        to_char(s.BEGIN_INTERVAL_TIME,'DD-MON-YY HH24:MI')  btime,
>>        total_waits count_end,
>>        time_waited_micro/1000 time_ms_end,
>>        Lag (e.time_waited_micro/1000)
>>               OVER( PARTITION BY e.event_name ORDER BY s.snap_id) 
>> time_ms_beg,
>>        Lag (e.total_waits)
>>               OVER( PARTITION BY e.event_name ORDER BY s.snap_id) count_beg
>> from
>>        DBA_HIST_SYSTEM_EVENT e,
>>        DBA_HIST_SNAPSHOT s
>> where
>>          s.snap_id=e.snap_id
>>    and e.event_name in (
>>                   'log file sync',
>>                   'log file parallel write'
>> )
>> order by begin_interval_time
>> )
>> order by btime
>> /
>>
>> If the log file parallel writes are bad then probably something wrong on
>> the storage system.
>>
>> - Kyle Hailey
>> http://dboptimizer.com
>>
>>
>>
>> On Wed, Jul 20, 2011 at 3:37 PM, Stalin <stalinsk@xxxxxxxxx> wrote:
>>
>>> We have been seeing lots of deadlock errors lately in load testing
>>> environments and they all have been due to enq: TX - allocate ITL entry. In
>>> reviewing the statspack report for the periods of deadlock, i see that, log
>>> file sync wait being the top consumer with a terrible wait time. That makes
>>> to me think the deadlock, is just a symptom of high log file sync wait
>>> times.  Below is the snippet from statspack and looking at these numbers,
>>> especially CPU not being heavily loaded, wondering if this could be a case
>>> of storage issue. Sys Admins are checking the storage layer but thought
>>> would check here get any opinions/feedback.
>>>
>>> Top 5 Timed Events                                                    Avg
>>> %Total
>>> ~~~~~~~~~~~~~~~~~~                                                   wait
>>>   Call
>>> Event                                            Waits    Time (s)   (ms)
>>>   Time
>>> ----------------------------------------- ------------ ----------- ------
>>> ------
>>> log file sync                                1,400,773   4,357,902
>>> 3111   91.4
>>> db file sequential read                        457,568     334,834    732
>>>    7.0
>>> db file parallel write                         565,843      27,573     49
>>>     .6
>>> read by other session                           16,168       7,395    457
>>>     .2
>>> enq: TX - allocate ITL entry                       575       6,854  11919
>>>     .1
>>>          -------------------------------------------------------------
>>> Host CPU  (CPUs: 64  Cores: 8  Sockets: 1)
>>> ~~~~~~~~              Load Average
>>>                      Begin     End      User  System    Idle     WIO
>>> WCPU
>>>                    ------- -------   ------- ------- ------- -------
>>> --------
>>>                       3.13    7.04      2.26    3.30   94.44    0.00
>>>  7.81
>>>
>>> Statistic                                      Total     per Second
>>>  per Trans
>>> --------------------------------- ------------------ --------------
>>> ------------
>>> redo synch time                          435,852,302      120,969.3
>>>  309.7
>>> redo synch writes                          1,400,807          388.8
>>>    1.0
>>> redo wastage                               5,128,804        1,423.5
>>>    3.6
>>> redo write time                              357,414           99.2
>>>    0.3
>>> redo writes                                    9,935            2.8
>>>    0.0
>>> user commits                               1,400,619          388.7
>>>    1.0
>>>
>>>
>>> Environment : 11gr2 EE (11.2.0.1), Sol 10 Sparc
>>>
>>> Thanks,
>>> Stalin
>>>
>>>
>>
>
>
> --
> Thanks,
>
> Stalin
>

Other related posts: