Re: CRS install on Firewire Linux

  • From: Mladen Gogala <gogala@xxxxxxxxxxxxx>
  • To: uwe.weber@xxxxxxxxxxxxx
  • Date: Mon, 10 Oct 2005 05:15:37 +0000

Uwe, according to your logfile, your primary node is evicting the second node
because it has missed several checkins. In other words, the heartbeat is not
coming through. The cause might simply be a faulty Ethernet card.
Also, hopefully, you used a switch when connecting the two machines over the
private interconnect. If you didn't, your database will simply hang, waiting
for an answer from the other side which might never come. The first thing,
though, should be to look in the messages file (/var/log/messages) and the
output of the "dmesg" command to see whether everything is OK with your
Ethernet card. Using the "ping" command to further check the NIC is also a
smart idea. Second, if everything is OK, you may be having a contention
problem with your network "send" and "receive" buffers. On 2.6 kernels you
can speed things up by setting the following parameters:

net.core.rmem_default = 524288
net.core.wmem_default = 524288
net.core.rmem_max = 524288
net.core.wmem_max = 524288

You can do that with the sysctl command. The Linux 2.6 kernel is of what is
known as "Microsoft quality" and gives you almost no options to control it.
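
As a rough sketch of how to check those buffer settings before changing
anything, the Python script below reads the current values from /proc/sys and
prints, in sysctl.conf form, only the ones that differ from the targets listed
above. The function names (proc_path, read_current, report) and the "root"
parameter are made up for illustration; the 524288 values are simply the ones
from this post, not tuned recommendations.

```python
from pathlib import Path

# Target values copied from the settings listed in this post.
TARGETS = {
    "net.core.rmem_default": 524288,
    "net.core.wmem_default": 524288,
    "net.core.rmem_max": 524288,
    "net.core.wmem_max": 524288,
}


def proc_path(key, root="/proc/sys"):
    """Map a sysctl key such as net.core.rmem_max to its file under root."""
    return Path(root, *key.split("."))


def read_current(key, root="/proc/sys"):
    """Return the current integer value, or None if it cannot be read."""
    try:
        return int(proc_path(key, root).read_text().split()[0])
    except (OSError, ValueError, IndexError):
        return None


def report(targets=TARGETS, root="/proc/sys"):
    """Return 'key = value' lines for keys whose current value differs."""
    lines = []
    for key, want in targets.items():
        if read_current(key, root) != want:
            lines.append(f"{key} = {want}")
    return lines


if __name__ == "__main__":
    # Nothing is changed here; the script only reports what would need setting.
    for line in report():
        print(line)
```

The printed lines can then be applied as root, either one at a time with
"sysctl -w key=value" or by adding them to /etc/sysctl.conf and loading the
file with "sysctl -p".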

Sometimes, just sometimes, there are things called bugs. Then you call
Oracle and open a TAR. In 6 months or so you'll probably receive a patch,
usually for the problem you don't have, unless you've purchased
golden support. Oracle has outsourced technical support and parts of
development to Elbonia and that reflects on the quality of both code
and support. The price for golden or platinum support is, of course,
your firstborn child.

In my opinion, given the quality of both Oracle and Linux, you should put
your production database on Linux only if you have a particularly strong
desire for an ulcer or a heart attack.

One such nice feature, recently discovered, is that one cannot use the export
utility from a 9.2.0.7 database to export a 9.2.0.x, x<=6 database. The reason
is that there is a new part of the SYS.DBMS_EXPORT_EXTENSION package,
namely the FUNC_INDEX_DEFAULT procedure, which doesn't exist in 9.2.0.x, x<=6
and is, of course, invoked by the "exp" utility in 9.2.0.7. No warning was
given; that particular pearl wasn't published anywhere. Those @#$%!
[EXPLETIVE DELETED] [EXPLETIVE DELETED] at Oracle seem to think that
compatibility is a game in which you win points if you dig out surprises.
You don't want to know what I really feel like.

On 10/09/2005 05:42:43 PM, Uwe Weber wrote:
> Hello,
> 
> I have some trouble installing CRS on two WBEL 4.0 nodes.
> Shortly after installing and starting the cssd on the 2nd node, this
> node is evicted from the cluster and hangs.
> 
> This is what the logfile says:
> 
> [alert$nodename.log]:  [cssd(7501)]CRS-1607:CSSD evicting node inigo. 
> Details in /opt/oracle/product/10.2.0/crs/log/athlon/cssd/ocssd.log
> 
> [ocssd.log]: [    CSSD]2005-10-09 22:41:52.745 [3051846576] >TRACE:   
> clssnmPollingThread: node inigo (2) missed(4) checkin(s)
> [snips 58 repetitions of this message]
> [    CSSD]2005-10-09 22:42:48.850 [3051846576] >TRACE:   
> clssnmPollingThread: Eviction started for node
>  inigo (2), flags 0x000d, state 3, wt4c 0
> 
> Not much help to me and metalink is silent too. But maybe some
> of you can point me in the right direction?
> 
> My setup is:
> 
> 2 nodes, White Box Enterprise Linux 4.0,
> FW-Kernel from oss.oracle.com,
> OCR and CSS on ocfs2 shared filesystem,
> ASM modules from otn,
> interconnect is through 100Mb Ethercards connected with a 
> crosslink cable,
> Firewirecards are Agere Systems (former Lucent Microelectronics) FW323 
> (rev 61) (Belkin)
> and the harddisk is a Maxtor One Touch II. 
> 
> Regards,
> uwe
>
> --
> //www.freelists.org/webpage/oracle-l
> 
> 

-- 
Mladen Gogala
http://www.mgogala.com

