Yes, bad log write times:

Event                      Waits  <1ms  <2ms  <4ms  <8ms <16ms <32ms  <=1s   >1s
-------------------------- ----- ----- ----- ----- ----- ----- ----- ----- -----
log file parallel write      38K  72.5  15.4   5.4   2.0    .8    .4   1.3   2.2

You've got over 2% of writes taking over a second! That's pretty bad. There is
something wrong with your I/O subsystem.

I did some lgwr I/O write time distribution graphs on a couple of systems here:
http://dboptimizer.com/2011/07/14/lgwr-redo-write-times-log-file-parallel-write/

Even at 6MB log writes, I was still getting under 300ms over NFS-mounted file
systems. Typically log writes are much smaller and take on the order of 3ms.
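A quick way to re-check this without waiting for another statspack snapshot is
v$event_histogram. A minimal sketch (10g and up; note the counts are cumulative
since instance startup, not per-snapshot deltas like the report below):

-- bucket distribution for lgwr writes and commits, cumulative since startup
-- wait_time_milli is the bucket's upper bound in milliseconds
select event,
       wait_time_milli,
       wait_count,
       round(100 * ratio_to_report(wait_count)
                     over (partition by event), 1) pct
from   v$event_histogram
where  event in ('log file parallel write', 'log file sync')
order  by event, wait_time_milli;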
- Kyle Hailey
http://dboptimizer.com


On Thu, Jul 21, 2011 at 12:16 AM, Stalin <stalinsk@xxxxxxxxx> wrote:

> Hah, totally forgot about event_histograms :)
>
> I think either the storage is being pushed to its limits or something is
> terribly wrong on the storage side.
>
> Wait Event Histogram  DB/Inst: NBBSDB/nbbsdb  Snaps: 11203-11204
> -> Total Waits - units: K is 1000, M is 1000000, G is 1000000000
> -> % of Waits - column heading: <=1s is truly <1024ms, >1s is truly >=1024ms
> -> % of Waits - value: .0 indicates value was <.05%, null is truly 0
> -> Ordered by Event (idle events last)
>
>                            Total ----------------- % of Waits ------------------
> Event                      Waits  <1ms  <2ms  <4ms  <8ms <16ms <32ms  <=1s   >1s
> -------------------------- ----- ----- ----- ----- ----- ----- ----- ----- -----
> ADR block file read           13  69.2  15.4   7.7   7.7
> ADR block file write           5  80.0  20.0
> ADR file lock                  6 100.0
> ARCH wait for archivelog l     8 100.0
> Data Pump slave startup        1 100.0
> Datapump dump file I/O        18  72.2  27.8
> Disk file operations I/O    1567  92.5    .1    .3    .1    .1   2.1           4.9
> LGWR wait for redo copy      562  98.9    .9    .2
> Log archive I/O                2 100.0
> PL/SQL lock timer             28   3.6   3.6  92.9
> Parameter File I/O           104  76.0   9.6   1.9   1.9   1.9   3.8   4.8
> SGA: allocation forcing co    19   5.3  94.7
> SQL*Net break/reset to cli    56  94.6   3.6   1.8
> SQL*Net more data to clien  3818  91.9    .0   8.1
> Streams AQ: enqueue blocke     1                                           100.0
> asynch descriptor resize     269 100.0
> buffer busy waits           7556  87.7    .8    .3    .2    .2    .2   2.5   8.2
> buffer deadlock              212 100.0
> buffer exterminate           92K    .0    .0    .0  99.1    .5    .4
> change tracking file synch    58  17.2  13.8  10.3   1.7  43.1              13.8
> change tracking file synch    58  39.7   3.4   5.2  43.1                     8.6
> control file parallel writ  1474  32.2   6.8   2.0   1.6   2.1   2.8  44.1   8.3
> control file sequential re  8078  55.9    .8   1.1   3.6   2.0   1.6  30.0   5.0
> cursor: mutex S                7 100.0
> cursor: pin S               1363  29.2   7.7   7.9   2.4   2.5   6.4  43.9
> cursor: pin S wait on X      378   9.8   6.6  82.3                          1.3
> db file parallel read          7  14.3  14.3  28.6                         42.9
> db file parallel write      276K  13.5   7.9  11.8  13.2  12.4  10.8  30.0    .3
> db file scattered read       32K  11.3   2.7  10.2  20.2  44.3   8.9   2.3    .0
> db file sequential read     308K  30.2    .9   2.3   4.1   1.6    .6  38.6  21.7
> db file single write           5  60.0  40.0
> direct path read             143  31.5   5.6  17.5  30.1   6.3    .7   7.0   1.4
> direct path read temp       9032   7.8   5.9  14.2  28.2  10.1   2.0  25.5   6.2
> direct path sync               2 100.0
> direct path write              4  25.0  25.0  50.0
> direct path write temp      3441   2.2   2.6   6.4  22.2  58.8   6.2   1.6    .1
> enq: CF - contention          22   4.5  22.7                               72.7
> enq: CR - block range reus     6  83.3  16.7
> enq: FB - contention           2 100.0
> enq: HW - contention          17  35.3   5.9  11.8   5.9   5.9              35.3
> enq: KO - fast object chec     1 100.0
> enq: RO - fast object reus     7  42.9  28.6  14.3  14.3
> enq: SQ - contention          11  27.3  72.7
> enq: TQ - DDL contention       1 100.0
> enq: TQ - TM contention        1 100.0
> enq: TX - allocate ITL ent  1716    .9    .1    .2    .6   1.9   2.2  30.7  63.4
> enq: TX - contention           1 100.0
> enq: TX - index contention   592  37.2   6.6    .2    .3   1.7   3.0  38.0  13.0
> enq: TX - row lock content     4  25.0  25.0  25.0  25.0
> enq: UL - contention           1 100.0
> kksfbc child completion        1 100.0
> kupp process wait              2 100.0
> latch free                   33K  26.8  10.7  12.3   8.2  17.0  19.5   5.5
> latch: In memory undo latc  1046  98.6    .7    .5    .3
> latch: cache buffers chain  4750  96.3   1.1   1.1    .9    .2    .4    .0
> latch: cache buffers lru c    51  37.3   7.8   5.9  21.6   5.9  19.6   2.0
> latch: enqueue hash chains  1927   6.8   3.3   3.2   2.5   3.0    .9  80.3
> latch: messages              333 100.0
> latch: object queue header    39  76.9   5.1   5.1  12.8
> latch: redo allocation         5  20.0  20.0  20.0  20.0  20.0
> latch: redo writing           12  83.3   8.3   8.3
> latch: row cache objects     173  30.1   6.9    .6   4.0  36.4  22.0
> latch: session allocation    240  17.1  11.7  18.3  49.6   3.3
> latch: shared pool             2  50.0  50.0
> latch: undo global data        9  77.8  22.2
> library cache load lock        1 100.0
> library cache lock            31  64.5   9.7   6.5  19.4
> library cache: mutex X       649  98.9    .6    .2    .3
> log file parallel write      38K  72.5  15.4   5.4   2.0    .8    .4   1.3   2.2
> log file sequential read     470   2.8    .2    .9    .6    .2  72.6        22.8
> log file single write          8  62.5  12.5  12.5                         12.5
> log file switch (private s  2303    .0    .2   6.2                         93.6
> log file sync               838K   2.9   1.0    .5   1.7   1.7    .8   7.6  83.8
> os thread startup             21 100.0
> rdbms ipc reply               29  89.7   6.9                                3.4
> read by other session        96K  73.6   2.0   3.7   3.5   1.4    .4  10.7   4.6
> reliable message              58  62.1  37.9
> row cache lock                 4 100.0
> undo segment extension        20  15.0  15.0  10.0  60.0
> utl_file I/O               4602K  99.7    .0    .0    .3    .0    .0    .0
> wait list latch free           1 100.0
> DIAG idle wait              6675 100.0                                      .0
> KSV master wait                3  66.7  33.3
> SQL*Net message from clien 2871K  33.7   2.8   4.4   3.8  10.9  18.3  22.2   3.8
> SQL*Net message to client  2811K 100.0    .0    .0    .0    .0    .0    .0
> SQL*Net more data from cli   26K  99.8    .0    .0    .0    .0    .1
> Space Manager: slave idle    690                                          100.0
> Streams AQ: qmn coordinato   269  51.3    .4                               48.3
> Streams AQ: qmn slave idle   251   2.4                                     97.6
> Streams AQ: waiting for ti     2  50.0                                     50.0
> class slave wait              30 100.0
> jobq slave wait              120 100.0
> pmon timer                  1243   8.5                                     91.5
> rdbms ipc message            53K  13.0  30.5   8.7   1.8   2.1   2.5  14.6  26.8
> smon timer                    18   5.6                                     94.4
> wait for unread message on    81   1.2   1.2   2.5  18.5   8.6   6.2  61.7
> -------------------------------------------------------------
>
>
> On Wed, Jul 20, 2011 at 11:50 PM, Stalin <stalinsk@xxxxxxxxx> wrote:
>
>> Thanks Kyle.
>>
>> Apparently we don't have the diagnostic pack license needed to use the
>> DBA_HIST tables. Is this something I can get from the statspack tables?
>>
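(It is. Statspack keeps the same cumulative counters per snapshot, so the
lag() trick in the DBA_HIST query further down works unchanged. A rough sketch
against stats$system_event/stats$snapshot, assuming the usual PERFSTAT schema:)

-- same technique as the DBA_HIST version below, but against the statspack
-- repository, so no diagnostic pack license is needed
select btime,
       event_name,
       (time_ms_end-time_ms_beg)/nullif(count_end-count_beg,0) avg_ms
from (
   select to_char(s.snap_time,'DD-MON-YY HH24:MI') btime,
          e.event event_name,
          e.total_waits count_end,
          e.time_waited_micro/1000 time_ms_end,
          lag(e.time_waited_micro/1000)
             over (partition by e.event order by s.snap_id) time_ms_beg,
          lag(e.total_waits)
             over (partition by e.event order by s.snap_id) count_beg
   from   perfstat.stats$system_event e,
          perfstat.stats$snapshot s
   where  s.snap_id=e.snap_id
   and    s.dbid=e.dbid
   and    s.instance_number=e.instance_number
   and    e.event in ('log file sync','log file parallel write')
)
order by btime, event_name;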
>> On Wed, Jul 20, 2011 at 10:20 PM, kyle Hailey <kylelf@xxxxxxxxx> wrote:
>>
>>> Yeah, the massive log waits could exacerbate the ITL waits, as things
>>> will be slow committing their changes.
>>> The db file sequential reads are huge as well. Looks like you have a
>>> problem on your storage.
>>>
>>> Check the log write times (I blogged some of this at
>>> http://dboptimizer.com/2011/07/20/wait-event-and-wait-class-metrics-vs-vsystem_event/)
>>>
>>> Latencies in the past minute:
>>>
>>> col name for a25
>>> select m.intsize_csec,
>>>        n.name,
>>>        m.time_waited,
>>>        m.wait_count,
>>>        -- time_waited is in centiseconds, so x10 = average ms per wait
>>>        10*m.time_waited/nullif(m.wait_count,0) avgms
>>> from   v$eventmetric m,
>>>        v$event_name n
>>> where  m.event_id=n.event_id
>>> and    n.name in ('log file sync','log file parallel write');
>>>
>>> Latencies averaged over each hour:
>>>
>>> select btime,
>>>        event_name,
>>>        (time_ms_end-time_ms_beg)/nullif(count_end-count_beg,0) avg_ms
>>> from (
>>>    select to_char(s.begin_interval_time,'DD-MON-YY HH24:MI') btime,
>>>           e.event_name,
>>>           e.total_waits count_end,
>>>           e.time_waited_micro/1000 time_ms_end,
>>>           lag(e.time_waited_micro/1000)
>>>              over (partition by e.event_name order by s.snap_id) time_ms_beg,
>>>           lag(e.total_waits)
>>>              over (partition by e.event_name order by s.snap_id) count_beg
>>>    from   dba_hist_system_event e,
>>>           dba_hist_snapshot s
>>>    where  s.snap_id=e.snap_id
>>>    and    e.event_name in ('log file sync','log file parallel write')
>>> )
>>> order by btime, event_name
>>> /
>>>
>>> If the log file parallel writes are bad then there is probably something
>>> wrong on the storage system.
>>>
>>> - Kyle Hailey
>>> http://dboptimizer.com
>>>
>>>
>>> On Wed, Jul 20, 2011 at 3:37 PM, Stalin <stalinsk@xxxxxxxxx> wrote:
>>>
>>>> We have been seeing lots of deadlock errors lately in load-testing
>>>> environments, and they have all been due to enq: TX - allocate ITL
>>>> entry. Reviewing the statspack report for the deadlock periods, I see
>>>> that log file sync is the top wait event, with a terrible average wait
>>>> time. That makes me think the deadlocks are just a symptom of the high
>>>> log file sync times. Below is the snippet from statspack; looking at
>>>> these numbers, especially with the CPU not being heavily loaded, I am
>>>> wondering if this could be a storage issue. The sysadmins are checking
>>>> the storage layer, but I thought I would check here for any
>>>> opinions/feedback.
>>>>
>>>> Top 5 Timed Events                                                Avg %Total
>>>> ~~~~~~~~~~~~~~~~~~                                               wait   Call
>>>> Event                                            Waits    Time (s)   (ms)   Time
>>>> ----------------------------------------- ------------ ----------- ------ ------
>>>> log file sync                                1,400,773   4,357,902   3111   91.4
>>>> db file sequential read                        457,568     334,834    732    7.0
>>>> db file parallel write                         565,843      27,573     49     .6
>>>> read by other session                           16,168       7,395    457     .2
>>>> enq: TX - allocate ITL entry                       575       6,854  11919     .1
>>>> -------------------------------------------------------------
>>>>
>>>> Host CPU  (CPUs: 64 Cores: 8 Sockets: 1)
>>>> ~~~~~~~~               Load Average
>>>>                        Begin     End    User  System    Idle     WIO    WCPU
>>>>                      ------- -------  ------ ------- ------- ------- --------
>>>>                         3.13    7.04    2.26    3.30   94.44    0.00     7.81
>>>>
>>>> Statistic                                      Total     per Second    per Trans
>>>> --------------------------------- ------------------ -------------- ------------
>>>> redo synch time                          435,852,302      120,969.3        309.7
>>>> redo synch writes                          1,400,807          388.8          1.0
>>>> redo wastage                               5,128,804        1,423.5          3.6
>>>> redo write time                              357,414           99.2          0.3
>>>> redo writes                                    9,935            2.8          0.0
>>>> user commits                               1,400,619          388.7          1.0
>>>>
>>>> Environment: 11gR2 EE (11.2.0.1), Sol 10 SPARC
>>>>
>>>> Thanks,
>>>> Stalin
>>>>
>>>
>>
>>
>> --
>> Thanks,
>>
>> Stalin
>>
>
>
> --
> Thanks,
>
> Stalin
>
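(For what it's worth, the redo statistics in that original snippet already
quantify the problem: redo synch time and redo write time are reported in
centiseconds, so a quick arithmetic check on the totals gives the averages:)

-- averages from the statspack totals above (centiseconds x10 = ms)
select 10*435852302/1400807 avg_log_file_sync_ms,   -- ~3111 ms per commit
       10*357414/9935       avg_redo_write_ms       -- ~360 ms per lgwr write
from   dual;

Both are orders of magnitude above the ~3ms cited at the top of the thread for
a healthy log write, which points the same way as the histograms: the redo
writes themselves are slow, not just the sessions waiting behind them.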