Re: RAC on OCFS2 acceptance testing

From: Steve Perry <sperry@xxxxxxxxxxx>
To: kevinc@xxxxxxxxxxxxx
Date: Sat, 30 Dec 2006 11:17:57 -0600

A customer ran into similar problems with OCFS2 on RHEL4 Update 4 (smp kernel). Heavy db updates or mixed I/O (cp from OCFS2 to ext3, Oracle export to ext3) would cause the cluster to become unresponsive and crash a node. The cp and exp caused a high load average and heavy swapping. We couldn't even ssh to the host.

I didn't understand the heavy swapping because there was 3GB of cache memory available (shown by free -m). It was something to do with ocfs and low mem usage. I never got a clear answer on it.

They ended up setting "vm.lower_zone_protection=100", which helped the swapping issue.


The fencing problem was attributed to the following init.ora parms.
filesystemio_options     = asynch
disk_asynch_io           = TRUE

They were changed to:
disk_asynch_io=FALSE
filesystemio_options='DIRECTIO'
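
For anyone who wants to check a parameter file for the same combination, here is a minimal Python sketch. It is not an Oracle tool. The function names parse_init_params and uses_async_io are made up for illustration, and it assumes a plain text pfile with one name = value pair per line and "#" comments. It treats ASYNCH and SETALL as the values of filesystemio_options that request asynchronous I/O.

```python
def parse_init_params(text):
    """Parse simple 'name = value' lines from init.ora-style text into a dict."""
    params = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if "=" not in line:
            continue
        name, value = line.split("=", 1)
        # Normalize: lowercase names, uppercase values, strip optional quotes.
        params[name.strip().lower()] = value.strip().strip("'\"").upper()
    return params


def uses_async_io(params):
    """True if either parameter discussed in this thread still requests async I/O."""
    fs_opts = params.get("filesystemio_options", "")
    disk_async = params.get("disk_asynch_io", "")
    return fs_opts in ("ASYNCH", "SETALL") or disk_async == "TRUE"


# The two settings quoted in this message.
before = """
filesystemio_options     = asynch
disk_asynch_io           = TRUE
"""
after = """
disk_asynch_io=FALSE
filesystemio_options='DIRECTIO'
"""

print(uses_async_io(parse_init_params(before)))
print(uses_async_io(parse_init_params(after)))
```

The check only reports what the file says. Whether async I/O is actually the cause of fencing on a given cluster is a separate question.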

Things have improved since.

I asked Oracle for a good document for OCFS2 and RAC and still haven't got a response.

I also asked for optimal kernel parameter settings for OCFS2.

The closest I got was the following list, but no values.
- vm.swappiness
- vm.lower_zone_protection
- vm.vfs_cache_pressure
- vm.dirty_ratio
- vm.dirty_background_ratio
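
Since no values came with that list, about all one can do is record what a host is currently running with. A minimal Python sketch, assuming a Linux host that exposes these settings as files under /proc/sys/vm. The helper name read_vm_tunables is made up for illustration. A missing file is reported as None rather than an error, because vm.lower_zone_protection is only present on some kernels (the RHEL4-era 2.6.9 kernels have it, newer ones generally do not).

```python
from pathlib import Path

# The parameter names from the list above, without the "vm." prefix.
VM_TUNABLES = [
    "swappiness",
    "lower_zone_protection",
    "vfs_cache_pressure",
    "dirty_ratio",
    "dirty_background_ratio",
]


def read_vm_tunables(names=VM_TUNABLES, root="/proc/sys/vm"):
    """Return {name: value string, or None if the file is not readable on this host}."""
    values = {}
    for name in names:
        try:
            values[name] = (Path(root) / name).read_text().strip()
        except OSError:
            # Not present on this kernel, or not readable by this user.
            values[name] = None
    return values


if __name__ == "__main__":
    for name, value in read_vm_tunables().items():
        shown = value if value is not None else "(not present)"
        print(f"vm.{name} = {shown}")
```

This only reads the current values. It does not recommend or change anything.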

I'm not sure about the "unbreakable" Oracle/Linux combo. I'd be happy if they focused on "stable" Oracle/Linux.

It comes back to "You get what you pay for". Customers think that Oracle spends as much money on the "freebies" (i.e. OCFS) as they do on the database.


my 2¢

P.S. I spend as much time on Bugzilla as I do on Metalink these days.


On Dec 28, 2006, at 11:14 AM, Kevin Closson wrote:


And to point out that I'm not being obtuse,
here is a snippet from
http://oss.oracle.com/bugzilla/show_bug.cgi?id=822 :


Environment:
   Linux x86-64  Redhat 4.0 Update 3
   OCFS2 1.2.3  3-node cluster.
Problem:

   After installation, created two filesystems to be used for
   software. To limit timeout problems, increased the
   O2CB_HEARTBEAT_THRESHOLD to 31.

   During maintenance window, decided to use the OCFS2 filesystem
   to store a large backup file (about 5-10 gig file).
   SCP'ed the file from an outside server to node1 of the cluster
   using command "scp $file oracle@sachlp10:/ocfs2_fs1/.

   After a few minutes, node1 crashed.
   Did not find error messages on node1, but found them in
   /var/log/messages on node2:

...wow, sounds like a pretty aggressive workload, right?
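
For context on the O2CB_HEARTBEAT_THRESHOLD of 31 in the report above: as I understand the OCFS2 FAQ, the threshold counts 2-second disk heartbeat intervals, and the timeout works out to (threshold - 1) * 2 seconds. A minimal Python sketch of that arithmetic, assuming the 2-second interval applies to the release in question. The function names are made up for illustration.

```python
def heartbeat_timeout_seconds(threshold, interval_seconds=2):
    """Seconds without a disk heartbeat before a node is declared dead."""
    if threshold < 2:
        raise ValueError("threshold must be at least 2")
    return (threshold - 1) * interval_seconds


def threshold_for_timeout(timeout_seconds, interval_seconds=2):
    """Inverse of the above: threshold needed for a desired timeout in seconds."""
    return timeout_seconds // interval_seconds + 1


print(heartbeat_timeout_seconds(31))  # 60
print(threshold_for_timeout(60))      # 31
```

So under that assumption, the node in the bug report had a full minute of heartbeat slack and still went down during a single scp.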
--
//www.freelists.org/webpage/oracle-l


