RE: High "global cache blocks lost" statistics on one RAC node

  • From: "Hameed, Amir" <Amir.Hameed@xxxxxxxxx>
  • To: <racdba@xxxxxxxxxxxxx>
  • Date: Wed, 7 Sep 2005 21:41:29 -0400

Hi Anand,
Xerox is primarily a Sun/Oracle shop; deviating from Sun is not an
option and is also a decision that was made at the top level a long
time ago. When we started this project, the goal was to get to RAC in
two steps: 1) implement HA, and 2) then RAC. When I was looking into
choosing a cluster product, my primary pick was Sun Cluster, mainly
because of their RSM interconnect technology, which is lightning fast.
But there were a few things that I did not like about Sun: 1) Oracle
had not certified their GFS for RAC, and we would have had to choose
raw devices if we were to go with Sun; 2) I became aware that there
were hardly any RSM implementations out in the industry and it was
mostly being used in the labs. So Veritas was an obvious choice. We
could also have looked into PolyServe, but we have good relations with
Veritas and are quite satisfied with their service.
The RAC project has already been delayed by a few months and we are
trying to get it in by Jan-Feb next year. We also want to make sure
that it does not impact the existing transaction response time,
because, just like other organizations, we have elements within our
organization who have been against RAC from the get-go.
So, we have to implement RAC first; then we can say bye-bye to these
big irons from Sun, go to smaller servers (V890s, etc.), and scale
horizontally.
If my subscription to the DL has expired then let me know how to renew
it :-)

Amir
-----Original Message-----
From: racdba-bounce@xxxxxxxxxxxxx [mailto:racdba-bounce@xxxxxxxxxxxxx]
On Behalf Of Anand Rao
Sent: Wednesday, September 07, 2005 8:56 PM
To: racdba@xxxxxxxxxxxxx
Subject: Re: High "global cache blocks lost" statistics on one RAC node

Hi Amir,

I have personally configured UDP for a 2-node E15K cluster on Solaris
9 and Apps 11.5.9.

we used 2 cards and load balanced them. They are also a $$$
company with millions of $$$ at stake. They are happy now and not
complaining.

If you want the best in the business, then I suggest you change your
platform to Tru64; RDG is miles ahead of anyone else. Memory Channel
is absolutely charming and flies.

I don't think you can do that overnight ..can you !?!?!!

So, UDP can be used, but as you have already said, you need to present
the case to your management. There are some caveats to using 2 cards;
you should speak to Oracle Support before going that route.
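
Purely as an illustration (the addresses below are made up, not from
your setup): one way people tell Oracle about both cards on 9i is, I
believe, the CLUSTER_INTERCONNECTS init parameter, something like

  # colon-separated private interconnect IPs (hypothetical values)
  cluster_interconnects = "192.168.1.1:192.168.2.1"

and I believe that is exactly where the caveats come in, because it
changes the failover behaviour of the interconnect, so run it past
Support first.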

A VOS call is a good idea; I think they can come up with some good
tuning suggestions.

cheers
anand

On 08/09/05, Hameed, Amir <Amir.Hameed@xxxxxxxxx> wrote:
> This is a mission-critical application that processes millions of $$ of
> revenue, and I am a little hesitant about going with UDP because of its
> not-so-reliable reputation. I may escalate this with VOS tomorrow and
> see what they have to say.
> 
> Amir
> -----Original Message-----
> From: racdba-bounce@xxxxxxxxxxxxx [mailto:racdba-bounce@xxxxxxxxxxxxx]
> On Behalf Of Anand Rao
> Sent: Wednesday, September 07, 2005 8:43 PM
> To: racdba@xxxxxxxxxxxxx
> Subject: Re: High "global cache blocks lost" statistics on one RAC node
> 
> Hi,
> 
> you need to speak to your sysadmin or Veritas support to check the
> buffer parameters on LLT and increase them appropriately. I am sure
> there are some recommendations from their side for RAC.
> 
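> For example, something along these lines on each node (the flags are
> from memory, so double-check them; lltstat and /etc/llttab are VCS
> LLT's):
>
>   lltstat -c          (dump the current LLT configuration values)
>   lltstat -l          (per-link statistics; look for errors/drops)
>   cat /etc/llttab     (where the private links and tunables are defined)
>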
> and yes, as Gopal has mentioned, you are better off on UDP. Customers
> and Oracle have found out (the hard way) that UDP, though jokingly
> labeled the 'Unreliable Datagram Protocol', has worked well in many
> cases. You need to tune it properly.
> 
> I would love to go into a discussion of user-mode protocols, etc.
> Having said that UDP is OK to use, it is still susceptible to dropped
> packets and buffer overflows when overloaded, so you have to tune it
> correctly to get the best out of it.
> 
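> A quick way to see whether the OS is actually dropping UDP traffic
> (the Solaris counter names are from memory, so treat this as a sketch):
>
>   netstat -s | grep -i udp            (udpInErrors should stay near 0)
>   netstat -s | grep udpInOverflows    (socket buffer overflows)
>   netstat -i                          (per-interface errors/collisions)
>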
> cheers
> anand
> 
> On 08/09/05, Hameed, Amir <Amir.Hameed@xxxxxxxxx> wrote:
> > This is the same 15K pair and the OS is Solaris 8. We are using
> > Veritas DBE/AC and it uses its own protocol (LMX/LLT) for
> > communication. The following is the output from "oradebug ipc".
> >
> > VCSIPC DCTX ===================================================
> > VCSIPC DCTX
> >         VERITAS IPC 3.5MP3 11:27:01 Apr  2 2004
> > VCSIPC DCTX[ ctx.g 0x1033e0958.0x1033e0958, at 0, lif
> > 0xffffffff7f153c90, lptyp 0, iocr 0, sverr 0[]
> > VCSIPC DCTX: hist
> > VCSIPC DCTX: LMX: inst 274, gen 1756; crt 2649
> > VCSIPC DCTX: tflg INIT, flg INIT, lmxfd 10, cbk 0x1029503e0 0x10331a208
> > VCSIPC DCTX: last genno: cnno 0, pino 0, rgnno 1, bino 0
> > VCSIPC DCTX: bufinit: q 0x0, tp 0x0, cnt 0
> > VCSIPC   dlmx: crt 2649 oracleracvt1N2
> > VCSIPC   dlmx: thr: enabled 1, fd 11, ioc: cnt 0, ret 0, err 0,
> > stk: sz 8192, base 0x103404000, tid 2
> > VCSIPC   dlpl[ ctx 0x1033e0958
> > VCSIPC   dlpl: tout 0, flg 0, nd 0.0x1033e0a38, ptidp 0x0, rgn 0,
> > req 0x0, doneq 0x0, biq 0x0
> > VCSIPC   dlpl: last LMX rqh in poll struct: NULL
> > VCSIPC   dlpl: LMX doneq: EMPTY
> > VCSIPC   dlpl: ctx: biq 0x0, tp 0x0, cnt 0
> > VCSIPC   dlpl]
> > VCSIPC DDON: == done queue: EMPTY
> > VCSIPC DCNH: == cnh  queue: EMPTY
> > VCSIPC DWIR: == wire queue: EMPTY
> > VCSIPC DPT : == port queue: EMPTY
> > VCSIPC DRGN: == rgn  queue: begin
> > VCSIPC   drgn[ rgn 0x10341f390 ACTV, no 1.1872, base 0x3800b6000, sz
> > 3841236992
> > VCSIPC   drgn: -- allocated rbds: EMPTY
> > VCSIPC   drgn: -- free rbds: total 128
> > VCSIPC   drgn]
> > VCSIPC DRGN: == rgn  queue: end: total = 1
> > VCSIPC DCTX] ctx 0x1033e0958
> > VCSIPC DCTX ===================================================
> >
> >
> >
> > Thank you
> > Amir
> >
> > -----Original Message-----
> > From: racdba-bounce@xxxxxxxxxxxxx [mailto:racdba-bounce@xxxxxxxxxxxxx]
> > On Behalf Of Anand Rao
> > Sent: Wednesday, September 07, 2005 7:27 PM
> > To: racdba@xxxxxxxxxxxxx
> > Subject: Re: High "global cache blocks lost" statistics on one RAC node
> >
> > Is this the same Sun E15K?
> >
> > If using UDP, check the values of
> >
> > udp_xmit_hiwat
> > udp_recv_hiwat
> >
> > usually, the default values are not enough for large
> > machines/applications, especially 11i, so you need to increase them.
> > I believe you cannot set udp_xmit_hiwat and udp_recv_hiwat above 64K.
> > Moreover, the TCP receive buffer is 64K, so there is no point in
> > setting a large value at the higher level (UDP). Need to check this
> > on Solaris.
> >
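> > To check and change them on Solaris it would be something along the
> > lines of (values purely as an example, and only after the sysadmin
> > signs off):
> >
> >   ndd -get /dev/udp udp_xmit_hiwat
> >   ndd -get /dev/udp udp_recv_hiwat
> >   ndd -set /dev/udp udp_xmit_hiwat 65536
> >   ndd -set /dev/udp udp_recv_hiwat 65536
> >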
> > increase the value of udp_max_buf to 512K, or a max of 1MB if you are
> > really seeing a lot of packet drops/socket buffer overflows at the OS
> > level. Talk to your sysadmin before changing these values. I have
> > never seen any sysadmin readily agree to change them.
> >
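> > For udp_max_buf, again only as an illustration (1MB shown; pick the
> > actual value with your sysadmin), and remember that ndd settings do
> > not survive a reboot unless you also put them in a startup script:
> >
> >   ndd -get /dev/udp udp_max_buf
> >   ndd -set /dev/udp udp_max_buf 1048576
> >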
> > I think you are nearing the end of your free service from racdba, my
> > friend :-)
> >
> > anand
> >
> > On 08/09/05, Ravi_Kulkarni@xxxxxxxx <Ravi_Kulkarni@xxxxxxxx> wrote:
> > >
> > > Amir,
> > >
> > > Does your netstat -i list any non-zero values for RX-ERR / TX-ERR
> > > for any of the nodes?
> > >
> > > Also, netstat -s for UDP should show negligible errors. Check both
> > > netstat -i and -s for BOTH instances. (Switches may vary with your
> > > *nix; the following is on Linux.)
> > >
> > > $ netstat -i
> > > Kernel Interface table
> > > Iface     MTU Met    RX-OK RX-ERR RX-DRP RX-OVR    TX-OK TX-ERR TX-DRP TX-OVR Flg
> > > bond1    1500   0 16359630      0      0      0 16963867      0      0      0 BMmRU
> > >
> > > $netstat -s
> > > ..
> > > Udp:
> > >     12577981 packets received
> > >     162 packets to unknown port received.
> > >     0 packet receive errors
> > >     12466995 packets sent
> > > ..
> > >
> > >
> > > Thanks,
> > > Ravi.
> > >
> > >
> > >  ________________________________
> > > From: racdba-bounce@xxxxxxxxxxxxx [mailto:racdba-bounce@xxxxxxxxxxxxx]
> > > On Behalf Of Hameed, Amir
> > > Sent: Wednesday, September 07, 2005 3:38 PM
> > > To: racdba@xxxxxxxxxxxxx; oracle-l@xxxxxxxxxxxxx
> > > Subject: High "global cache blocks lost" statistics on one RAC node
> > >
> > >
> > >
> > >
> > > I have a two-node RAC running the 11i E-Business Suite
> > > (11.5.9/9.2.0.6 64-bit). I am seeing the following from Statspack:
> > >
> > > Inst   Statistic                                  Total   per Second   per Trans
> > > -----  --------------------------------------  --------  -----------  ----------
> > > 1.     global cache blocks lost                      28          0.0         0.0
> > > 2.     global cache blocks lost                   4,410          2.7         0.9
> > >
> > > We are using three gigabit private interconnects going through a
> > > switch. This does not seem like an interconnect issue; otherwise, I
> > > think, I would have seen a higher number on node 1 as well.
> > >
> > > Any idea what might be causing it?
> > >
> > > Thanks
> > > Amir
> >
> >
> >
> 
> 
>

