Hello Riyaj,

I'm re-installing the operating system on the machines, and tomorrow I'll re-install Oracle RAC with the default settings. I'll check the crsd logs and try the tuning again this time.

Thanks again.

PS: In general, how long does it take from the moment the first node stops until the second node brings up the first node's VIP interface?

Good night, all.
Waldirio

2008/6/12 Riyaj Shamsudeen <riyaj.shamsudeen@xxxxxxxxx>:
> Hello Waldirio
>
> Breaking up crsd.log, approximately 30 seconds were spent on CLSC recv/send
> failures etc. The CSS misscount parameter is set to 30 on Unix platforms. I would
> say misscount is controlling this duration, but that needs to be validated by
> enabling further tracing and looking at cssd.log etc., if you want.
>
> 2008-06-12 14:19:15.781: [ OCRMSG][1484962144]prom_rpc: CLSC recv failure..ret code 7
> 2008-06-12 14:19:42.464: [ OCRMSG][1484962144]prom_rpc: CLSC send failure..ret code 6
>
> Another 26 seconds were spent in the cluster reconfiguration below:
>
> 2008-06-12 14:19:46.036: [ OCRSRV][2541411904]proath_init: Failed to retrieve pubdata. Expect a rcfg
> 2008-06-12 14:20:12.283: [ OCRMAS][1210108256]th_master:12: I AM THE NEW OCR MASTER at incar 1. Node Number 1
>
> Changing these parameters has a profound effect on availability, especially
> if the network architecture is not good enough.
>
> Cheers
> Riyaj Shamsudeen
> The Pythian Group www.pythian.com <http://www.pythian.com/>
> Personal blog: orainternals.wordpress.com <http://orainternals.wordpress.com/>
>
> Waldirio Manhães Pinheiro wrote:
>
>> Hello Friend
>> Thank you for the answer; let's check.
>>
>> 2008/6/12, Riyaj Shamsudeen <riyaj.shamsudeen@xxxxxxxxx>:
>>
>>   Hello Waldirio
>>
>>     the time for the first machine to detect that the second machine
>>     powered off is very long (between 1 and 2 min),
>>
>>   How are you measuring this time? Are you checking the alert log or
>>   are you using DB connections to check it?
>>
>> I measured this time from the moment I sent the shutdown to the
>> server until the second VIP interface came up on the second node (backup node).
>>
>>   Can you also send crsd.log?
>>
>> OK, here is the address, because of the size:
>> http://rafb.net/p/hqE13995.html
>> When I power off the first node, the second node (crsd log at the link
>> above) logs on line 1 the message "[ COMMCRS][1147169120]clsc_receive:
>> (0xc6d180) Error receiving, ns (12535, 12560), transport (505, 110, 0)" and
>> keeps logging "Connection not active" until line 2045.
>> PS: Now the VIP address of the first node no longer migrates to the second
>> node after a power off (maybe it will be necessary to re-install the OS and
>> Oracle Clusterware, because I've changed the system a lot while testing).
>>
>>   Further, refer to $CRS_HOME/bin/racgvip; there are a few parameters,
>>   such as check interval, restart attempts etc., controlling the behavior
>>   of VIP failover too. Not sure they are applicable when the machine is
>>   rebooted, since the heartbeat will fail before the VIP check.
>>
>> Yes, I checked this file too, but didn't change it.
>> Now, looking at the crsd log file, I believe Oracle knows when another
>> node is out, but who is responsible for the failover (mounting the VIP
>> aliases on the other machine)? (Script, Daemon, Angel :P )
>> Thank you, friends, for the help.
>> Waldirio
>>
>>   Cheers
>>   Riyaj Shamsudeen
>>   The Pythian Group www.pythian.com <http://www.pythian.com/>
>>   Personal blog: orainternals.wordpress.com <http://orainternals.wordpress.com/>
>>
>>   Waldirio Manhães Pinheiro wrote:
>>
>>     Hello Friends
>>     I'd like to ask about Oracle RAC in a Linux environment. I
>>     installed two machines with RedHat AS 4Up5 and Oracle 10.2.0.3
>>     with Clusterware. The installation finished successfully and the
>>     database works fine.
>>     I checked the availability of my environment with the test below:
>>     Station cambeba UP
>>     Station cangua UP
>>     # crs_stat -t
>>     Name           Type         Target  State   Host
>>     ------------------------------------------------------------
>>     ora....BA.lsnr application  ONLINE  ONLINE  cambeba
>>     ora....eba.gsd application  ONLINE  ONLINE  cambeba
>>     ora....eba.ons application  ONLINE  ONLINE  cambeba
>>     ora....eba.vip application  ONLINE  ONLINE  cambeba
>>     ora....UA.lsnr application  ONLINE  ONLINE  cangua
>>     ora.cangua.gsd application  ONLINE  ONLINE  cangua
>>     ora.cangua.ons application  ONLINE  ONLINE  cangua
>>     ora.cangua.vip application  ONLINE  ONLINE  cangua
>>     ora.ora10gq.db application  ONLINE  ONLINE  cangua
>>     ora....q1.inst application  ONLINE  ONLINE  cangua
>>     ora....q2.inst application  ONLINE  ONLINE  cambeba
>>     At this point everything is OK, but when I force a power off on
>>     cangua or cambeba (the names of my machines), the time for the
>>     first machine to detect that the second machine powered off is
>>     very long (between 1 and 2 min), so if my client was working, it
>>     will lose the query due to a timeout.
>>     I changed the configuration of the objects ora.cambeba.vip and
>>     ora.cangua.vip, but without success.
>>     Any idea how to fix this problem (decrease the check time
>>     between the nodes of the cluster)?
>>     PS: I searched the list archive, but without success for
>>     this problem.
>>
>>     Thanks in advance.

--
______________
Atenciosamente
Waldirio
msn: wmp@xxxxxxxxxxxxx
Site: www.waldirio.com.br
Blog: blog.waldirio.com.br
PGP: www.waldirio.com.br/public.html
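
The durations discussed in the thread can be checked directly from the timestamps of the four crsd.log lines Riyaj quoted. The following is only a minimal sketch: the helper names `parse_ts` and `gaps` are made up for illustration, and it assumes every line starts with a timestamp in the `YYYY-MM-DD HH:MM:SS.mmm` form shown in the excerpts. For the quoted lines, the gaps come to roughly 26.7 s, 3.6 s and 26.2 s, about 56.5 s in total, which is in the same range as the 1 to 2 minutes reported for the failover.

```python
from datetime import datetime

# The four crsd.log lines quoted in the thread, rejoined onto single lines.
LOG_LINES = [
    "2008-06-12 14:19:15.781: [ OCRMSG][1484962144]prom_rpc: CLSC recv failure..ret code 7",
    "2008-06-12 14:19:42.464: [ OCRMSG][1484962144]prom_rpc: CLSC send failure..ret code 6",
    "2008-06-12 14:19:46.036: [ OCRSRV][2541411904]proath_init: Failed to retrieve pubdata. Expect a rcfg",
    "2008-06-12 14:20:12.283: [ OCRMAS][1210108256]th_master:12: I AM THE NEW OCR MASTER at incar 1. Node Number 1",
]


def parse_ts(line):
    """Parse the leading 23-character timestamp of a log line (assumed format)."""
    return datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S.%f")


def gaps(lines):
    """Return the seconds elapsed between each consecutive pair of log lines."""
    stamps = [parse_ts(line) for line in lines]
    return [(later - earlier).total_seconds()
            for earlier, later in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    deltas = gaps(LOG_LINES)
    for delta in deltas:
        print(f"{delta:.3f} s")
    print(f"total: {sum(deltas):.3f} s")
```

Running the same function over a complete crsd.log (or cssd.log) makes it easier to see which phase dominates before touching misscount or the VIP check parameters.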