Re: Oracle RAC - Any formula to calculate/preempt the right number of LMS processes?

  • From: PD Malik <pdthedba@xxxxxxxxx>
  • To: Riyaj Shamsudeen <riyaj.shamsudeen@xxxxxxxxx>, kaygopal@xxxxxxxxx, Oracle-L Group <oracle-l@xxxxxxxxxxxxx>, rshamsud@xxxxxxxxxxxxxxxx
  • Date: Wed, 22 Jun 2011 12:52:24 +0100

Riyaj & Gopal - Thanks for responding to my query.

Riyaj Bhai - You are an expert - no denying there - for yourself you may
have the entire ocean to cover for people like me who are still learning to
swim - you are an expert :-)

Just to answer your & Gopal's Que on RT to start with : We are on 10gR2 on
AIX 6 and LMS is in _high_priority_processes and _os_sched_high_priority  is
also set to default 1 so I'd assume that it is set on high priority.

Gopal, Is this somehow can be done in AIX as well pls do you know?
"One of the most common practices is to schedule the number of RT processes
< # of CPUs to avoid spin/hang on CPUs."
and also how to check whether these settings are working fine or not?

A bit more explanation of our CPU behviour (very broadly) - We've seen that
in past some months our CPU usage has gone very high compared to the
incremental user load on the system and our capacity management departement
has highlighted LMS being one of the top consumers (amongst other
candidates). Its not that it always used to behave earlier no apparanely its
always been bad and has come to our attention now because of this CPU issue.

While also looking at other things it seems that this LMS setting is anyway
too high and looks like Oracle is 'correcting' it in 11g  so this is
something we shuold look at doing anyway.

I had a bit of google digging on this CR Fabrication lead that you
provided (Thanks to Gopal and his answer he posted in one of the forums)
and this is probably you meant :

SQL> select inst_id, cr_requests, light_works, data_requests,
fairness_down_converts, (fairness_down_converts/data_requests)*100
   INST_ID CR_REQUESTS LIGHT_WORKS DATA_REQUESTS FAIRNESS_DOWN_CONVERTS
RATIO
---------- ----------- ----------- ------------- ----------------------
----------
         2    58831118    74411475      58831114                9720672
16.523012
         1   350485652   884185071     350485608               59529981
16.9850001
         6    14175101    20025830      14175101                2019579
14.2473694
         4    13457882    19142318      13457880                1960022
14.5641215
         3   440964324   909842437     440964274               79782554
18.0927478

And the ratio looks OK to me (less than 40% - even though aggregated but
still indicative I think) and it seems there isn't much of CR Fabrication
going on BUT I may be looking totally at wrong thing  and if thats the case
then I'd request if you can point me to any writeups/whitepaperts etc. that
I can go thru to understand it a bit better - that'll be a great help.

I know, to know RAC I'll need to read Gopal's book which is already in my To
Do list but unfortunately thats not an overnight thing.

So our current approach is take these new settings (5 most probably) thru a
Performance Test and see how these settings are bahiving by looking and gc
waits and stats. I know its not the best approch as its 'trying to fix'
without getting to the root cause (due to lack of total mastery in this
area) but hopefully a performance test will give us a good confidence. Any
comments about our approach pls?

The only other thing that I've been thinking is that in 11g as you know
remastering is at file level instead on object level so is it that this file
block remastering is more 'efficient' than object remastering hence Oracle
has massively downsized the formula to calculate the default number of LMS?
This along with other optimizations in building/pinning/sending a block etc.
in 11g might have reduced the pressure on LMS hence the lower number may be
OK in 11g but it may not be a good idea in 10g yet? Just thinking ...

Sorry this email has gone a bit long ..

Thanks.


On Tue, Jun 21, 2011 at 5:14 PM, Riyaj Shamsudeen <
riyaj.shamsudeen@xxxxxxxxx> wrote:

> Hello Malik
>
>   I am no RAC expert, seems like a ocean more to learn.
>
>   I would agree with KG's comments with one enhancement. I would go with an
> odd number, 3 or 5, prior to 11g version. There is an internal bug (10.2)
> where the distribution of workload is non-uniform if # of LMS processes is
> an even number. ( I know, it is weird, but I don't have any more details,
> can't reproduce it, a RAC developer told me this few years ago, and seems
> like it is worth a risk to take making it a prime number. This comment does
> not apply to 11g and above. )
>
>  As you have figured out, 11.2 by default, sets this parameter to more
> meaningful value. If you are looking for an algorithm, why not use 11.2
> default algorithm, then? In this case, for 80 CPUs, formula will be
> 2+(80/32)= 4.5, round that up to 5.So, set gcs_server_processes to 5.
>
>
> http://download.oracle.com/docs/cd/E14072_01/server.112/e10820/initparams088.htm
>
>   Remember that just because LMS is running in RT priority (Solaris lingo)
> doesn't mean that it consumes CPU. RT simply bumps the LMS process to the
> front of the CPU run queue for immediate scheduling, if LMS needs to use
> CPU. You should also review to understand why LMS consuming CPU? CR
> fabrication or Current mode buffer creation (most common reasons) or is it
> due to some other reason?
>
> Cheers
>
> Riyaj Shamsudeen
> Principal DBA,
> Ora!nternals -  http://www.orainternals.com - Specialists in Performance,
> RAC and EBS11i
> Blog: http://orainternals.wordpress.com
> OakTable member http://www.oaktable.com
>
> Co-author of the books: Expert Oracle 
> Practices<http://tinyurl.com/book-expert-oracle-practices/>,
> Pro Oracle SQL,  Expert PL/SQL 
> Practices<http://tinyurl.com/book-expert-plsql-practices>
>
>
>
> On Tue, Jun 21, 2011 at 10:02 AM, K Gopalakrishnan <kaygopal@xxxxxxxxx>wrote:
>
>> Malik,
>>
>> Are you running them in real time priority? If yes, you can reduce it to 4
>> or (# of instances-1)  and monitor the latencies of the RAC wait events/CPU
>> usage of the LMS processes.  AFAIK there is no magic formula for
>>  calculating the # of LMSs.
>>
>> One of the most common practices is to schedule the number of RT processes
>> < # of CPUs to avoid spin/hang on CPUs.  If your current settings works fine
>> for your workload, I would stick to it.
>>
>> -Gopal
>>
>>
>> On Tue, Jun 21, 2011 at 6:02 AM, PD Malik <pdthedba@xxxxxxxxx> wrote:
>>
>>> Hello,
>>>
>>> We are on 10gR2 5 node RAC (a huge and very busy DB) and also very high
>>> number of CPUs (80ish) hence the default number of LMS processes that Oracle
>>> generates for us (num_cpu dependant) is way too high and is causing CPU
>>> consumption issue (due to their high priority) for some time now so we need
>>> to look to reduce (and fix) them to a sensible number and going by RAC
>>> expert Riyaz's advice we shud reduce them down to 4 (one for each 'other'
>>> RAC node) which makes sense to us particularly becs if we were going 11g and
>>> left this parameter to Oracle's default calculation it'll come down to 3.
>>> (Current number of LMS processes : (Due to the varied number of CPUs ) - On
>>> 2 nodes, 10, on 1 12 and on the other 2 : 8).
>>>
>>> Moreover, we've done the analysis and there are no major RAC wait events
>>> (as in we are not much suffering from RAC) and we "think" we should be fine
>>> by reducing these and obviously we'll take them thru to Performance Test to
>>> gain some confidence.
>>>
>>> However the question is, is there any other mathematical way of getting
>>> to just the right number of LMS processes for a RAC instance pls? Is there
>>> any formula to predict them or even to predict if there will be any latency
>>> impact by reducing these?
>>>
>>> Thanks in advance.
>>>
>>
>>
>

Other related posts: