reliable message - bug, PMON involved

  • From: Grzegorz Goryszewski <grzegorzof@xxxxxxxxxx>
  • To: oracle-l@xxxxxxxxxxxxx
  • Date: Sat, 26 Nov 2011 16:54:59 +0100

Hi,
This is really more of a blog post, but I'm not blogging, so I thought I'd share it here :)
It looks like we hit this bug (in a 10.2.0.3 environment):
DBMS_SERVER_ALERT.SET_THRESHOLD HANGS FOREVER AT RELIABLE MESSAGE [ID 794589.1]

At first that doesn't look too bad (reliable message is an idle wait, right?), but when I
tried to deal with the hanging processes via kill -9, the processes disappeared from the
OS process list, yet from Oracle's point of view we still had sessions for those OS PIDs,
and PMON was unable to clean those sessions up properly.
From the PMON trace:
**** 2011-11-25 13:44:35.047
found process 0x25f5f8bd0 pid=40 serial=2 ospid = 15291 dead
found process 0x25f5f93b8 pid=42 serial=1 ospid = 30345 dead
*** 2011-11-25 13:44:45.060
found process 0x25f5f8bd0 pid=40 serial=2 ospid = 15291 dead
found process 0x25f5f93b8 pid=42 serial=1 ospid = 30345 dead
*** 2011-11-25 13:44:47.064
found process 0x25f5f8bd0 pid=40 serial=2 ospid = 15291 dead
found process 0x25f5f93b8 pid=42 serial=1 ospid = 30345 dead

The alert log shows the corresponding messages that PMON is unable to clean up these processes.
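To double-check from the OS side that the killed server processes really are gone (as PMON's "dead" lines say), a signal-0 probe is enough: kill(pid, 0) delivers no signal, but fails with ESRCH when no such process exists. A minimal sketch, assuming a Unix host and that you run it on the database server; `os_pid_alive` is a hypothetical helper name, not an Oracle utility:

```python
import os


def os_pid_alive(pid: int) -> bool:
    """Return True if the OS still has a process with this PID.

    Signal 0 only performs the existence/permission check; nothing is delivered.
    """
    try:
        os.kill(pid, 0)
    except ProcessLookupError:  # ESRCH: no such process
        return False
    except PermissionError:     # EPERM: process exists but belongs to another user
        return True
    return True


# OS PIDs that PMON reports as dead in the trace above.
for ospid in (15291, 30345):
    state = "alive" if os_pid_alive(ospid) else "gone"
    print(f"ospid {ospid}: {state}")
```

Keep in mind that the OS can reuse a PID for an unrelated process after a while, so the result is only meaningful shortly after the kill -9.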
After restarting the EM Grid agents, there are two new processes hanging in
dbms_server_alert.set_threshold, again waiting on reliable message.
When you strace one of those processes, you see:
 strace -p 30540
Process 30540 attached - interrupt to quit
semtimedop(819218, 0x7fbfff7270, 1, {1, 0}) = -1 EAGAIN (Resource temporarily unavailable)
semtimedop(819218, 0x7fbfff7270, 1, {1, 0}) = -1 EAGAIN (Resource temporarily unavailable)
semtimedop(819218, 0x7fbfff7270, 1, {1, 0}) = -1 EAGAIN (Resource temporarily unavailable)
semtimedop(819218, 0x7fbfff7270, 1, {1, 0}) = -1 EAGAIN (Resource temporarily unavailable)
semtimedop(819218, 0x7fbfff7270, 1, {1, 0}) = -1 EAGAIN (Resource temporarily unavailable)
getrusage(RUSAGE_SELF, {ru_utime={0, 352946}, ru_stime={0, 95985}, ...}) = 0
getrusage(RUSAGE_SELF, {ru_utime={0, 352946}, ru_stime={0, 95985}, ...}) = 0
semtimedop(819218, 0x7fbfff7270, 1, {1, 0}) = -1 EAGAIN (Resource temporarily unavailable)
semtimedop(819218, 0x7fbfff7270, 1, {1, 0}) = -1 EAGAIN (Resource temporarily unavailable)
semtimedop(819218, 0x7fbfff7270, 1, {1, 0}) = -1 EAGAIN (Resource temporarily unavailable)
semtimedop(819218, 0x7fbfff7270, 1, {1, 0}) = -1 EAGAIN (Resource temporarily unavailable)
semtimedop(819218, 0x7fbfff7270, 1, {1, 0}) = -1 EAGAIN (Resource temporarily unavailable)
semtimedop(819218, 0x7fbfff7270, 1, {1, 0}) = -1 EAGAIN (Resource temporarily unavailable)


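So each call is a semaphore set operation (semtimedop) that times out after 1 second ({1, 0} is the timeout timespec) with EAGAIN, and the process just keeps looping on it.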
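To see what one of those strace lines means, here is a minimal sketch that issues a single semtimedop() call with the same {1, 0} (1 second) timeout against a private semaphore set that nobody ever posts to. Assumptions: 64-bit Linux with glibc (called through Python's ctypes), new SysV semaphores starting at 0 as they do on Linux, and a decrement operation, since the trace does not show what the sembuf at 0x7fbfff7270 actually contains. `timed_sem_wait` is a hypothetical helper; this only reproduces the syscall pattern, not whatever Oracle is waiting for.

```python
import ctypes
import ctypes.util
import errno
import time

# Linux values of the SysV IPC constants used here.
IPC_PRIVATE = 0
IPC_RMID = 0


class SemBuf(ctypes.Structure):
    # struct sembuf from <sys/sem.h>
    _fields_ = [("sem_num", ctypes.c_ushort),
                ("sem_op", ctypes.c_short),
                ("sem_flg", ctypes.c_short)]


class Timespec(ctypes.Structure):
    # struct timespec on 64-bit Linux (time_t and long are both 64-bit)
    _fields_ = [("tv_sec", ctypes.c_long),
                ("tv_nsec", ctypes.c_long)]


libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
libc.semget.argtypes = [ctypes.c_int, ctypes.c_int, ctypes.c_int]
libc.semget.restype = ctypes.c_int
libc.semtimedop.argtypes = [ctypes.c_int, ctypes.POINTER(SemBuf),
                            ctypes.c_size_t, ctypes.POINTER(Timespec)]
libc.semtimedop.restype = ctypes.c_int


def timed_sem_wait(semid, seconds):
    """One semtimedop() call: try to decrement semaphore 0, give up after `seconds`.

    Returns (return code, errno).
    """
    op = SemBuf(0, -1, 0)       # sem_num=0, sem_op=-1 (wait for a post), no flags
    ts = Timespec(seconds, 0)   # {seconds, 0}, like the {1, 0} in the trace
    rc = libc.semtimedop(semid, ctypes.byref(op), 1, ctypes.byref(ts))
    return rc, ctypes.get_errno()


semid = libc.semget(IPC_PRIVATE, 1, 0o600)  # new set of 1 semaphore, value 0 on Linux
if semid == -1:
    raise OSError(ctypes.get_errno(), "semget failed")
try:
    start = time.monotonic()
    rc, err = timed_sem_wait(semid, 1)
    elapsed = time.monotonic() - start
    print(rc, errno.errorcode.get(err), round(elapsed, 1))
finally:
    libc.semctl(semid, 0, IPC_RMID)  # remove the semaphore set
```

Since nothing ever increments the semaphore, the call returns -1 with EAGAIN after about a second. A process that simply retries this in a loop produces exactly the repeating pattern in the strace output, which fits the session looking idle while never making progress.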
There is an SR open, but Oracle hasn't responded so far.
I don't want to be dramatic, but I'm pretty sure SHUTDOWN IMMEDIATE won't help here :)
Any ideas on how to deal with sessions hanging on this event (reliable message)?
Regards
GregG



--
//www.freelists.org/webpage/oracle-l

