Hi Matt,

dmesg shows only timeouts on a cdrom drive and reservation conflicts on tape devices. Multipathing is not used.

scsi3 (0,2,0) : reservation conflict
scsi3 (0,2,0) : reservation conflict
ide-cd: cmd 0x1e timed out
hda: irq timeout: status=0xd0 { Busy }
hda: irq timeout: error=0x00
hda: ATAPI reset complete
ide-cd: cmd 0x25 timed out
hda: irq timeout: status=0xd0 { Busy }
hda: irq timeout: error=0x00
hda: ATAPI reset complete

Thanks,
Pawel

On 2009-05-15 15:57, Matthew Zito wrote:
If you run "dmesg", do you see any errors in the kernel logs? If the devices stop responding to I/O for periods of time, there should be SCSI timeouts in the logs, or at least some warnings from the multipathing driver.

Thanks,
Matt

--
Matthew Zito
Chief Scientist
GridApp Systems
P: 646-452-4090
mzito@xxxxxxxxxxx
http://www.gridapp.com

-----Original Message-----
From: oracle-l-bounce@xxxxxxxxxxxxx on behalf of Pawel Kotlarz
Sent: Fri 5/15/2009 9:48 AM
To: oracle-l@xxxxxxxxxxxxx
Subject: Oracle 10g hangs intermittently waiting for I/O

Hello all.

I have an Oracle 10.2.0.3 data warehouse database on 11.1.0.7 ASM with asmlib. RHEL 4.7. Proliant DL585 G2 with MSA70 storage.

The problem I face is an 'I/O hiccup'. The database can work properly for a week or two and then suddenly starts stalling for no apparent reason. Users complain that their selects take 2x or 3x longer. vmstat shows I/O activity (bi, bo columns) for half a minute, and then for another half a minute shows no activity (bi and bo columns equal to 0) and a number of processes waiting for I/O (procs/b column). strace on an Oracle process waiting for I/O shows it is waiting for the completion of a 'read' call. The only thing that helps is rebooting the box.

I can isolate the problem to specific disks using iostat. On a given day the problem occurs, the affected disks are the same, but they are different on another occurrence of the problem. Storage / Linux admins do not see any problem on their side.

I have several one-off patches recommended by Oracle support:

Bug 5452672: Hung database instance if linux kernel miss aio request
Bug 6656824: LNX-10204-TC6 SIGSEGV AT SKGFR_REAP64()+281, IN DBW0
Bug 6087207: WARNING:ORACLE PROCESS RUNNING OUT OF OS KERNEL I/O RESOURCES
Bug 6882513: MERGE LABEL REQUEST ON TOP OF 10.2.0.3 FOR BUGS 6801535 5576584
Bug 5576584 (4880399): ASM PARALLEL READS PERFORMANCE NOT ACCEPTABLE

I plan to upgrade to 10.2.0.4 but first need to sort out some hash join bugs (as yet unknown to Oracle) that break our large queries with ORA-600 errors.

What would you recommend to narrow the problem down to an Oracle / ASM / asmlib / Linux / storage fault? Do you know of any other bugs that can cause such behaviour?

Thanks.
Pawel Kotlarz

--
//www.freelists.org/webpage/oracle-l
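
One way to pin down exactly when the stalls described in the original message happen is to log vmstat output and flag the samples where bi and bo are both 0 while the b (blocked) column is non-zero. The following is a minimal sketch, assuming the usual vmstat column header line (r, b, ..., bi, bo, ...). The function name find_stalls and the sample numbers are made up for illustration and do not come from the thread.

```python
def find_stalls(vmstat_lines):
    """Return (sample_number, blocked_count) for every vmstat sample that
    shows no block I/O (bi == 0 and bo == 0) while processes are blocked."""
    columns = None   # column name -> index, taken from the header line
    stalls = []
    sample = 0
    for line in vmstat_lines:
        fields = line.split()
        if not fields:
            continue
        # vmstat reprints its header periodically; re-read it each time.
        if "b" in fields and "bi" in fields and "bo" in fields:
            columns = {name: i for i, name in enumerate(fields)}
            continue
        # Skip the "procs ---memory---" banner and anything non-numeric.
        if columns is None or not all(f.isdigit() for f in fields):
            continue
        if len(fields) != len(columns):
            continue
        sample += 1
        blocked = int(fields[columns["b"]])
        bi = int(fields[columns["bi"]])
        bo = int(fields[columns["bo"]])
        if bi == 0 and bo == 0 and blocked > 0:
            stalls.append((sample, blocked))
    return stalls


# Hypothetical vmstat output, invented for this example.
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 1  0      0 100000  20000 300000    0    0  5120  2048 1200  3400 10  5 80  5
 0  6      0 100000  20000 300000    0    0     0     0 1010   500  0  1  0 99
 0  7      0 100000  20000 300000    0    0     0     0 1005   480  0  1  0 99
 2  1      0 100000  20000 300000    0    0  4096  1024 1300  3600 12  6 75  7
"""

if __name__ == "__main__":
    for number, blocked in find_stalls(SAMPLE.splitlines()):
        print(f"sample {number}: no block I/O, {blocked} processes blocked")
```

Running the same check over a timestamped vmstat log could give stall windows to line up against the iostat per-disk figures and the dmesg messages, which may help show which layer goes quiet first.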