I was able to catch the hang in production again. Really ugly this time - one command took 8-9 minutes to execute and a simple lsnrctl status took close to 15 minutes to execute in a production environment. Here was my command... truss -d -D -f -o bill2.out ls -lrt This is a small section of the resulting output...A bad start right out of the gates... Base time stamp: 1323721323.0651 [ Mon Dec 12 15:22:03 EST 2011 ] 1186: 0.0000 0.0000 execve("/usr/bin/ls", 0xFFBFF514, 0xFFBFF520) argc = 2 1186: 102.3725 102.3725 resolvepath("/usr/lib/ld.so.1", "/lib/ld.so.1", 1023) = 12 1186: 107.4393 5.0668 resolvepath("/usr/bin/ls", "/usr/bin/ls", 1023) = 11 1186: 107.4420 0.0027 stat64("/usr/bin/ls", 0xFFBFF1D0) = 0 Here was the second incident...(I replaced some of our file system names with variables to protect the innocent... :) ) The wait times are outrageous... truss -d -D -f -o bill3.out lsnrctl status <listener_name> Base time stamp: 1323722672.7062 [ Mon Dec 12 15:44:32 EST 2011 ] 9898: 0.0000 0.0000 execve("{ORACLE_HOME}/bin/lsnrctl", 0xFFFFFFFF7FFFF368, 0xFFFFFFFF7FFFF388) argc = 3 9898: 80.3434 80.3434 resolvepath("/usr/lib/sparcv9/ld.so.1", "/lib/sparcv9/ld.so.1", 1023) = 20 9898: 116.6389 36.2955 resolvepath("{ORACLE_HOME}/bin/lsnrctl", "{ORACLE_HOME}/bin/lsnrctl", 1023) = 46 9898: 116.6403 0.0014 stat("{ORACLE_HOME}/bin/lsnrctl", 0xFFFFFFFF7FFFEF60) = 0 9898: 569.7705 453.1302 open("/var/ld/64/ld.config", O_RDONLY) Err#2 ENOENT ________________________________ From: Johnson, William L (TEIS) Sent: Sunday, December 11, 2011 7:37 PM To: 'Jared Still' Cc: ORACLE-L Subject: RE: Solaris 10 dtrace command... Well, this is rather embarrassing. I just looked through my .sh_history file and found that I never included the "-d -D" option for my truss command. I was building a script to run and never ran the script when the problem surfaced...I ran the command by hand and forgot the options. Now I have to hope I can catch the system hanging again... Here is what the output looks like when I run it properly... <server_name>:XXX02/xxx02/data00/oracle/local>more bill2.out Base time stamp: 1323649536.7395 [ Sun Dec 11 19:25:36 EST 2011 ] 1402: 0.0000 0.0000 execve("/usr/bin/ls", 0xFFBFF51C, 0xFFBFF528) argc = 2 1402: 0.0031 0.0031 resolvepath("/usr/lib/ld.so.1", "/lib/ld.so.1", 1023) = 12 1402: 0.0032 0.0001 resolvepath("/usr/bin/ls", "/usr/bin/ls", 1023) = 11 1402: 0.0034 0.0002 stat64("/usr/bin/ls", 0xFFBFF1D8) = 0 1402: 0.0035 0.0001 open("/var/ld/ld.config", O_RDONLY) Err#2 ENOENT 1402: 0.0037 0.0002 stat64("/usr/lib/X11R5/libsec.so.1", 0xFFBFE968) Err#2 ENOENT 1402: 0.0038 0.0001 stat64("/usr/lib/Motif1.2/libsec.so.1", 0xFFBFE968) Err#2 ENOENT ________________________________ From: Jared Still [mailto:jkstill@xxxxxxxxx] Sent: Friday, December 09, 2011 6:50 PM To: Johnson, William L (TEIS) Cc: ORACLE-L Subject: Re: Solaris 10 dtrace command... On Fri, Dec 9, 2011 at 10:46 AM, Johnson, William L (TEIS) <WLJohnson@xxxxxx<mailto:WLJohnson@xxxxxx>> wrote: I am having sporadic problems with a Solaris 10 database server running on a ZFS file system. Every once in a while, a simple OS command like "ls -al" in a directory with 10-20 files will hang for more than 1 minute. I was able to use the truss command to finally capture one of the incidents where the "ls -al" command took over 1 minute. The unfortunate thing is that the truss output wasn't able to capture where the wait occurred. I am now moving on to dtrace - but wow...I am really hoping that someone on the list has had prior Please share the truss output. You can use http://pastebin.com<http://pastebin.com/fmtGg5rM> to share it. Just paste the text into the box, click submit, and share the resulting URL. Such as: http://pastebin.com/fmtGg5rM Jared Still Certifiable Oracle DBA and Part Time Perl Evangelist Oracle Blog: http://jkstill.blogspot.com Home Page: http://jaredstill.com -- //www.freelists.org/webpage/oracle-l