Re: RAC on OCFS2 acceptance testing
- From: Steve Perry <sperry@xxxxxxxxxxx>
- To: kevinc@xxxxxxxxxxxxx
- Date: Sat, 30 Dec 2006 11:17:57 -0600
A customer ran into similar problem(s) with OCFS2 on RHEL4 Update 4
(smp kernel).
Heavy db updates or mixed I/O (cp from OCFS2 to ext3, Oracle export to
ext3) would cause the cluster to become unresponsive and crash a node.
cp and exp caused a high load average and heavy swapping. We couldn't
even ssh to the host.
I didn't understand the heavy swapping because there was 3GB of cache
mem available (shown by free -m).
It had something to do with OCFS and low mem usage. I never got a clear
answer on it.
They ended up setting "vm.lower_zone_protection=100", which helped the
swapping issue.
The fencing problem was attributed to the following init.ora parms.
filesystemio_options = asynch
disk_asynch_io = TRUE
They were changed to:
disk_asynch_io=FALSE
filesystemio_options='DIRECTIO'
Things have improved since.
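
For anyone who wants to check an existing parameter file for these two settings, here is a minimal Python sketch. The function names (parse_init_ora, uses_async_io) are made up for illustration, and the check only looks at values set explicitly in the text; it knows nothing about Oracle's defaults when a parameter is absent.

```python
def parse_init_ora(text):
    """Return {lowercased_parameter_name: value} from init.ora-style text."""
    params = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if "=" not in line:
            continue
        name, value = line.split("=", 1)
        # Strip whitespace and surrounding single or double quotes.
        params[name.strip().lower()] = value.strip().strip("'\"")
    return params


def uses_async_io(params):
    """True if an explicitly set parameter still requests asynchronous I/O.

    Assumption: only explicit settings are considered, not Oracle defaults.
    """
    disk_async = params.get("disk_asynch_io", "").upper() == "TRUE"
    fs_opts = params.get("filesystemio_options", "").upper()
    # SETALL enables both asynchronous and direct I/O.
    return disk_async or fs_opts in ("ASYNCH", "SETALL")


before = """filesystemio_options = asynch
disk_asynch_io = TRUE"""
after = """disk_asynch_io=FALSE
filesystemio_options='DIRECTIO'"""

print(uses_async_io(parse_init_ora(before)))  # True
print(uses_async_io(parse_init_ora(after)))   # False
```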
I asked Oracle for a good document for OCFS2 and RAC and still
haven't got a response.
I also asked for optimal kernel parameter settings for OCFS2.
The closest I got was the following list, but no values.
- vm.swappiness
- vm.lower_zone_protection
- vm.vfs_cache_pressure
- vm.dirty_ratio
- vm.dirty_background_ratio
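
Since no values came with that list, about all one can do is record what a node is currently running with before changing anything. A rough Python sketch, assuming the usual mapping of sysctl names to files under /proc/sys; the helper name read_vm_params is hypothetical, and some of these entries (vm.lower_zone_protection in particular) may not exist on a given kernel, so missing ones are reported rather than treated as errors.

```python
from pathlib import Path

# The parameters from the list above; no recommended values are implied.
VM_PARAMS = [
    "vm.swappiness",
    "vm.lower_zone_protection",
    "vm.vfs_cache_pressure",
    "vm.dirty_ratio",
    "vm.dirty_background_ratio",
]


def read_vm_params(names=VM_PARAMS, root="/proc/sys"):
    """Return {sysctl_name: current value as a string, or None if unreadable}."""
    values = {}
    for name in names:
        # vm.swappiness maps to <root>/vm/swappiness
        path = Path(root) / name.replace(".", "/")
        try:
            values[name] = path.read_text().strip()
        except OSError:
            values[name] = None  # not present (or not readable) on this kernel
    return values


if __name__ == "__main__":
    for name, value in read_vm_params().items():
        shown = value if value is not None else "(not present on this kernel)"
        print(f"{name} = {shown}")
```

Running it on each node and keeping the output gives a baseline to compare against after any tuning.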
I'm not sure about the "unbreakable" Oracle/Linux combo. I'd be happy if
they focused on "stable" Oracle/Linux.
It comes back to "You get what you pay for". Customers think that
Oracle spends as much money on the "freebies" (i.e. OCFS) as they do
on the database.
my 2¢
P.S. I spend as much time on Bugzilla as I do on Metalink these days.
On Dec 28, 2006, at 11:14 AM, Kevin Closson wrote:
And to point out that I'm not being obtuse,
here is a snippet from
http://oss.oracle.com/bugzilla/show_bug.cgi?id=822 :
Environment:
Linux x86-64 Redhat 4.0 Update 3
OCFS2 1.2.3 3-node cluster.
Problem:
After installation, created two filesystems to be used for
software.
To limit timeout problems, increased the
O2CB_HEARTBEAT_THRESHOLD to 31.
During maintenance window, decided to use the OCFS2 filesystem
to store a large backup file (about 5-10 gig file).
SCP'ed the file from an outside server to node1 of the cluster
using command "scp $file oracle@sachlp10:/ocfs2_fs1/.
After a few minutes, node1 crashed.
Did not find error messages on node1, but found them in
/var/log/messages
on node2:
...wow, sounds like a pretty aggressive workload, right?
--
http://www.freelists.org/webpage/oracle-l