I meant SYN_SENT, not TCP_WAIT. Slightly different.

On Fri, Mar 21, 2008 at 5:09 PM, Jeremy Schneider <jeremy.schneider@xxxxxxxxxxxxxx> wrote:

> That's the workhorse script called by CRS to start/stop/stat resources.
> Find out what the parameter is (start, stop or stat) with something like
> this:
>
> cat /proc/[pid####]/cmdline|tr '\000' '\n'
>
> That'll tell us whether CRS is continually restarting ONS or just trying
> to "stat" it. (crs_stat can also tell you if there were failed restarts.)
> Then you might try to figure out what racgmain is waiting for. To start, I'd
> look at the process status (is it 'D'? what's WCHAN from ps -l?) and the
> network connections (does netstat show any connections in TCP_WAIT state?).
> You might also get a stack trace with gdb -p and then "backtrace".
>
> Just a few ideas... I'm really interested to hear what you turn up. :)
>
> -Jeremy
>
>
> On Fri, Mar 21, 2008 at 3:21 PM, William Wagman <wjwagman@xxxxxxxxxxx>
> wrote:
>
> > Greetings,
> >
> > The question pertains to a two-node RAC cluster running Oracle
> > 10.2.0.3.0 SE on 32-bit Linux 2.6.9-67.ELsmp. CRS, ASM & RDBMS are each
> > in a separate home. Yesterday on node 1 I started seeing messages in the
> > /var/log/messages file of the form...
> >
> > Mar 20 07:5:34 spenser init: Id "h3" respawning too fast: disabled for 5
> > minutes
> >
> > We did some looking around to try and determine the cause of this but
> > didn't come up with anything immediately. A core dump was generated
> > in the $CRS_HOME/log/<node_name>/crsd directory at about the time we
> > noticed this beginning. Various error messages indicating various
> > failures (I can provide a segment) also appeared in crsd.log at this
> > time. At this point I didn't know what was occurring, so I opened an SR
> > with Oracle.
> >
> > This morning, while gathering some additional information, I found that
> > on node2 in this cluster there were a large number of racgmain processes
> > running, and the number of these processes was increasing; all of the
> > swap space and virtually all of the memory on this node were in use.
> > Some of the processes were running out of the CRS home and some out of
> > the ASM home. I did some investigating to see if it would be possible to
> > stop these processes gracefully but was unable to gather any
> > information. Ultimately we rebooted node2 of the cluster, and everything
> > appears to be functioning as expected at this point.
> >
> > My question is what would cause the racgmain process to run amok this
> > way. Currently ps -ef|grep racgmain shows none running on either node.
> > I'm puzzled by this, and other than information indicating that this
> > process is part of ONS, I am not able to find any further information or
> > details. Any suggestions would be greatly appreciated.
> >
> > Thanks.
> >
> > Bill Wagman
> > Univ. of California at Davis
> > IET Campus Data Center
> > wjwagman@xxxxxxxxxxx
> > (530) 754-6208
> >
> > --
> > //www.freelists.org/webpage/oracle-l
>
>
> --
> Jeremy Schneider
> Chicago, IL
> http://www.ardentperf.com/category/technical
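For anyone hitting the same symptom, the checks suggested in this thread (count the racgmain processes, read their arguments from /proc, look at process state and WCHAN, and look for connections stuck in SYN_SENT) can be bundled into one rough script to run while the problem is happening. This is a minimal sketch, assuming a Linux box with /proc, procps (`pgrep`, `ps`) and net-tools (`netstat`); it is a hypothetical helper of my own, not an Oracle tool, and the output layout is arbitrary.

```shell
#!/bin/sh
# Rough racgmain triage sketch. Hypothetical helper, not an Oracle tool.
# Assumes Linux /proc plus procps (pgrep, ps) and net-tools (netstat).

# How many racgmain processes exist right now? Run this a few times
# to see whether the number keeps climbing.
count=$(pgrep racgmain | wc -l)
echo "racgmain processes: $count"

for pid in $(pgrep racgmain); do
    # The process may have exited since pgrep ran; skip it if so.
    [ -r "/proc/$pid/cmdline" ] || continue

    # PID, state and wait channel. STAT 'D' = uninterruptible sleep
    # (usually I/O); WCHAN is the kernel function it is sleeping in.
    ps -o pid= -o stat= -o wchan= -p "$pid"

    # cmdline is NUL-separated; the arguments show whether CRS invoked
    # it to start, stop or stat a resource.
    printf '  args: '
    tr '\000' ' ' < "/proc/$pid/cmdline"
    echo
done

# TCP connections stuck in SYN_SENT: a SYN went out but no reply came
# back (e.g. the remote host is down or packets are being dropped).
# -p shows the owning program, but only for your own processes unless
# run as root; the non-root warning goes to stderr and is discarded.
netstat -tanp 2>/dev/null | grep SYN_SENT

# For a stack trace of one stuck process: gdb -p <pid>, then "backtrace".
# Attaching pauses the process until gdb detaches.
```

Running it two or three times a minute apart should show whether the process count is growing and whether the new processes are all "stat" calls blocked in the same place.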