Re: High "global cache blocks lost" statistics on one RAC node

  • From: Anand Rao <panandrao@xxxxxxxxx>
  • To: racdba@xxxxxxxxxxxxx
  • Date: Thu, 8 Sep 2005 10:55:56 +1000

Hi Amir,

I have personally configured UDP for a 2-node E15K cluster on Solaris
9 and Apps 11.5.9.

We used 2 cards and load balanced them. They are also a $$$ company
with millions of $$$ at stake, and they are happy now and not
complaining.

If you want the best in the business, then I suggest you change your
platform to Tru64; RDG is miles ahead of anyone else, and Memory
Channel is absolutely charming and flies.

I don't think you can do that overnight... can you!?

So, UDP can be used, but as you have already said, you need to present
the case to your management. There are some caveats in using 2 cards,
so you should speak to Oracle Support before going that route.
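
For what it's worth, if you do go the UDP route with 2 cards, one way
of telling 9i about more than one private network is the
CLUSTER_INTERCONNECTS parameter. This is only a sketch with made-up
addresses and instance names, and it is exactly the kind of thing to
run past Oracle Support first (I believe it changes how failover
between the NICs behaves):

$ sqlplus -s '/ as sysdba' <<'EOF'
-- static parameter, so spfile only; one entry per instance
-- addresses and SIDs below are examples, not your real ones
alter system set cluster_interconnects = '192.168.10.1:192.168.11.1'
  scope=spfile sid='PROD1';
alter system set cluster_interconnects = '192.168.10.2:192.168.11.2'
  scope=spfile sid='PROD2';
EOF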

A VOS call is a good idea; I think they can come up with some good tuning suggestions.
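
Before that call, the UDP buffer settings I mention further down this
thread are easy enough to check yourself. A rough sketch on Solaris
(the -set lines need root and the values are only examples along the
lines discussed below - clear any change with your sysadmin first):

# current values - the defaults are usually too small for 11i
$ ndd /dev/udp udp_xmit_hiwat
$ ndd /dev/udp udp_recv_hiwat
$ ndd /dev/udp udp_max_buf

# example only: bump the send/receive buffers and the per-socket ceiling
$ ndd -set /dev/udp udp_xmit_hiwat 65536
$ ndd -set /dev/udp udp_recv_hiwat 65536
$ ndd -set /dev/udp udp_max_buf 1048576

# note: ndd settings do not survive a reboot, so anything you keep has
# to go into a startup script as well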

cheers
anand
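
PS: to see whether any of this actually helps, you can watch the same
statistic Statspack is reporting straight out of gv$sysstat. Just a
sketch - it assumes you can connect as sysdba from one of the nodes:

$ sqlplus -s '/ as sysdba' <<'EOF'
col name format a30
-- the statistic from the Statspack report; run it a few times and
-- watch how fast the value grows on instance 2
select inst_id, name, value
from gv$sysstat
where name = 'global cache blocks lost';
EOF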

On 08/09/05, Hameed, Amir <Amir.Hameed@xxxxxxxxx> wrote:
> This is a mission-critical application that processes millions of $$
> of revenue, and I am a little hesitant to go with UDP because of its
> not-so-reliable reputation. I may escalate this with VOS tomorrow and
> see what they have to say.
> 
> Amir
> -----Original Message-----
> From: racdba-bounce@xxxxxxxxxxxxx [mailto:racdba-bounce@xxxxxxxxxxxxx]
> On Behalf Of Anand Rao
> Sent: Wednesday, September 07, 2005 8:43 PM
> To: racdba@xxxxxxxxxxxxx
> Subject: Re: High "global cache blocks lost" statistics on one RAC node
> 
> Hi,
> 
> You need to speak to your sysadmin or Veritas support to check the
> buffer parameters on LLT and increase them appropriately. I am sure
> they have some recommendations on their side for RAC.
> 
> And yes, as Gopal has mentioned, you are better off on UDP. Customers
> and Oracle have found out (the hard way) that UDP, though nicknamed the
> 'Unreliable Datagram Protocol', has worked well in many cases. You need
> to tune it properly.
> 
> I would love to go into a discussion of user-mode protocols, etc.
> Having said that UDP is OK to use, it is still susceptible to dropped
> packets and buffer overflows when overloaded, so you have to tune it
> correctly to get the best out of it.
> 
> cheers
> anand
> 
> On 08/09/05, Hameed, Amir <Amir.Hameed@xxxxxxxxx> wrote:
> > This is the same 15K pair and the OS is Solaris 8. We are using
> > Veritas
> > DBE/AC and it uses its own protocol (LMX/LLT) for communication. The
> > following is the output from "oradebug ipc".
> >
> > VCSIPC DCTX ===================================================
> > VCSIPC DCTX
> >         VERITAS IPC 3.5MP3 11:27:01 Apr  2 2004
> > VCSIPC DCTX[ ctx.g 0x1033e0958.0x1033e0958, at 0, lif
> > 0xffffffff7f153c90, lptyp 0, iocr 0, sverr 0[]
> > VCSIPC DCTX: hist
> > VCSIPC DCTX: LMX: inst 274, gen 1756; crt 2649
> > VCSIPC DCTX: tflg INIT, flg INIT, lmxfd 10, cbk 0x1029503e0 0x10331a208
> > VCSIPC DCTX: last genno: cnno 0, pino 0, rgnno 1, bino 0
> > VCSIPC DCTX: bufinit: q 0x0, tp 0x0, cnt 0
> > VCSIPC   dlmx: crt 2649 oracleracvt1N2
> > VCSIPC   dlmx: thr: enabled 1, fd 11, ioc: cnt 0, ret 0, err 0, stk: sz 8192, base 0x103404000, tid 2
> > VCSIPC   dlpl[ ctx 0x1033e0958
> > VCSIPC   dlpl: tout 0, flg 0, nd 0.0x1033e0a38, ptidp 0x0, rgn 0, req
> > 0x0, doneq 0x0, biq 0x0
> > VCSIPC   dlpl: last LMX rqh in poll struct: NULL
> > VCSIPC   dlpl: LMX doneq: EMPTY
> > VCSIPC   dlpl: ctx: biq 0x0, tp 0x0, cnt 0
> > VCSIPC   dlpl]
> > VCSIPC DDON: == done queue: EMPTY
> > VCSIPC DCNH: == cnh  queue: EMPTY
> > VCSIPC DWIR: == wire queue: EMPTY
> > VCSIPC DPT : == port queue: EMPTY
> > VCSIPC DRGN: == rgn  queue: begin
> > VCSIPC   drgn[ rgn 0x10341f390 ACTV, no 1.1872, base 0x3800b6000, sz
> > 3841236992
> > VCSIPC   drgn: -- allocated rbds: EMPTY
> > VCSIPC   drgn: -- free rbds: total 128
> > VCSIPC   drgn]
> > VCSIPC DRGN: == rgn  queue: end: total = 1
> > VCSIPC DCTX] ctx 0x1033e0958
> > VCSIPC DCTX ===================================================
> >
> >
> >
> > Thank you
> > Amir
> >
> > -----Original Message-----
> > From: racdba-bounce@xxxxxxxxxxxxx [mailto:racdba-bounce@xxxxxxxxxxxxx]
> > On Behalf Of Anand Rao
> > Sent: Wednesday, September 07, 2005 7:27 PM
> > To: racdba@xxxxxxxxxxxxx
> > Subject: Re: High "global cache blocks lost" statistics on one RAC node
> >
> > Is this the same Sun E15K?
> >
> > If using UDP, check the values of
> >
> > udp_xmit_hiwat
> > udp_recv_hiwat
> >
> > Usually, the default values are not enough for large
> > machines/applications, especially 11i, so you need to increase them. I
> > believe you cannot set udp_xmit_hiwat and udp_recv_hiwat above 64K;
> > moreover, the TCP receive buffer is 64K, so there is no point in
> > setting a much larger value at the higher (UDP) level. This needs to
> > be checked on Solaris.
> >
> > Increase the value of udp_max_buf to 512K, or a maximum of 1MB, if
> > you are really seeing a lot of packet drops/socket buffer overflows at
> > the OS level. Talk to your sysadmin before changing these values; I
> > have never seen any sysadmin readily agree to change them.
> >
> > I think you are nearing the end of your free service from racdba, my
> > friend :-)
> >
> > anand
> >
> > On 08/09/05, Ravi_Kulkarni@xxxxxxxx <Ravi_Kulkarni@xxxxxxxx> wrote:
> > >
> > > Amir,
> > >
> > > Does your netstat -i list any non-zero values for RX-ERR / TX-ERR
> > > for any of the nodes?
> > >
> > > Also netstat -s for UDP should have negligible errors. Check both
> > > netstat -i & -s for BOTH instances. (Switches may vary with your
> > > *nix; the following is on Linux.)
> > >
> > > $ netstat -i
> > > Kernel Interface table
> > > Iface       MTU Met    RX-OK RX-ERR RX-DRP RX-OVR    TX-OK TX-ERR TX-DRP TX-OVR Flg
> > > bond1      1500   0 16359630      0      0      0 16963867      0      0      0 BMmRU
> > >
> > > $netstat -s
> > > ..
> > > Udp:
> > >     12577981 packets received
> > >     162 packets to unknown port received.
> > >     0 packet receive errors
> > >     12466995 packets sent
> > > ..
> > >
> > >
> > > Thanks,
> > > Ravi.
> > >
> > >
> > >  ________________________________
> > > From: racdba-bounce@xxxxxxxxxxxxx [mailto:racdba-bounce@xxxxxxxxxxxxx]
> > > On Behalf Of Hameed, Amir
> > > Sent: Wednesday, September 07, 2005 3:38 PM
> > > To: racdba@xxxxxxxxxxxxx; oracle-l@xxxxxxxxxxxxx
> > > Subject: High "global cache blocks lost" statistics on one RAC node
> > >
> > >
> > >
> > >
> > > I have a two-node RAC running 11i E-Business Suite (11.5.9/9.2.0.6,
> > > 64-bit). I am seeing the following from Statspack:
> > >
> > > Inst  Statistic                          Total   per Second   per Trans
> > > ----  --------------------------------  ------   ----------   ---------
> > > 1.    global cache blocks lost               28          0.0         0.0
> > > 2.    global cache blocks lost            4,410          2.7         0.9
> > >
> > > We are using three gigabit private interconnects going through a
> > > switch. This does not seem like an interconnect issue; otherwise, I
> > > think, I would have seen a higher number on node 1 as well.
> > >
> > > Any idea what might be causing it?
> > >
> > > Thanks
> > > Amir
> >
> >
> >
> 
> 
>
