Re: Scalability of Batch Operations in RAC

  • From: Tim Gorman <tim@xxxxxxxxx>
  • To: oracle-l <oracle-l@xxxxxxxxxxxxx>
  • Date: Fri, 06 Jan 2006 10:03:06 -0700

Sriram,

Please remember that cache contention (i.e. "global cache services" or GCS)
is only half of the synchronization issue with RAC.  The other half is
"global enqueue services" or GES, and there is absolutely no way to
"partition" that activity.

Essentially, if the batch operations in question perform frequent
acquisition/release of transaction locks (i.e. TX and TM), then your
scalability problems may be (at least partially) blamed on GES.  For a quick
scratch-n-sniff test, see if the algorithm of the batch process looks
something like:

    open main_cursor;
    loop
        fetch main_cursor into ...;        -- fetch the next row
        exit when main_cursor%notfound;
        <do-some-INSERT-UPDATE-DELETEs>
        commit;                            -- COMMIT on every iteration
    end loop;
    close main_cursor;

The significant point is that the COMMIT statement is occurring on each
iteration of the loop.  If so, Oracle will be acquiring TX and TM enqueues
with each iteration of the loop, implicitly acquired with the first DML
statement, and then releasing those same enqueues with the explicit COMMIT.
So, if your batch processes are iterating 1,000,000 times in the
"main_cursor" loop, then you'll be performing millions and millions of
global enqueue acquisitions and releases (i.e. 1,000,000 "TX" acquisitions,
1,000,000 "TX" releases, and several million "TM" acquisitions and
releases).  This is a very "RAC-unfriendly" program design, in essence.
Unfortunately, there are many good reasons, stemming from program
requirements, why this design (committing frequently) is necessary.
However, if the algorithm of the batch processes looks more like:

    open main_cursor;
    loop
        fetch main_cursor into ...;        -- fetch the next row
        exit when main_cursor%notfound;
        <do-some-INSERT-UPDATE-DELETEs>
    end loop;
    close main_cursor;
    commit;                                -- a single COMMIT at the end

Then, Oracle will be acquiring TX and TM enqueues only at the first DML
statement(s) of the first iteration of the "main_cursor" loop, and then
releasing those same TM and TX enqueues at the COMMIT.  So, we're talking
about only a handful of global enqueue operations, as opposed to millions.
This is a very "RAC-friendly" program design, but unfortunately, in many
cases, it may not accommodate the needs of the application.
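
If you want to put a number on the GES traffic, one quick check (a sketch,
assuming a release with V$ENQUEUE_STAT, present in 9.2 and later) is to
sample the TX and TM request counts before and after a batch run and compare
the deltas:

    select inst_id, eq_type, total_req#, total_wait#, cum_wait_time
      from gv$enqueue_stat      -- gv$ shows all instances; v$ shows one
     where eq_type in ('TX', 'TM')
     order by inst_id, eq_type;

If TOTAL_REQ# climbs by roughly the number of loop iterations (or several
times that, for TM), then GES is almost certainly part of your problem.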

The conventional wisdom is that more frequent commits improve performance;
the reality is the opposite.  This is true in non-RAC as well as RAC, but as
has been stated frequently in the past, "RAC exaggerates".

It is very easy to determine if GES is a significant contributor to your
scalability problem: just COMMIT less frequently.  If you can, COMMIT only
once at the close of the loop; if not, then try COMMITing less frequently,
as sketched below.
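
For example, here is a minimal sketch of committing every N rows instead of
every row (the table name "txns" and the batch size of 10,000 are
hypothetical placeholders):

    declare
        l_rows pls_integer := 0;
    begin
        for rec in (select id from txns) loop    -- hypothetical driving cursor
            update txns                          -- hypothetical DML
               set processed = 'Y'
             where id = rec.id;
            l_rows := l_rows + 1;
            if mod(l_rows, 10000) = 0 then       -- hypothetical batch size
                commit;    -- one TX/TM acquire/release cycle per 10,000 rows
            end if;
        end loop;
        commit;            -- pick up the final partial batch
    end;
    /

Over 1,000,000 iterations, that is roughly 100 global enqueue cycles instead
of 1,000,000: still not as good as a single COMMIT, but orders of magnitude
friendlier to GES.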

The good news is, if application logic permits, it is far easier to minimize
GES scalability problems than it is to minimize GCS scalability problems.
Just COMMIT less frequently.

Hope this helps...

-Tim


on 12/26/05 11:53 AM, Sriram Kumar at k.sriramkumar@xxxxxxxxx wrote:

> Hi Mark,
>  
> Thanks for your insight. Presently we are not using partitioning as an
> option, but I will think over your suggestion and discuss with the
> Development team to see if this approach can be followed.
>  
> Thanks again for the insight
>  
> Best Regards
>  
> Sriram Kumar
> 
>  
> On 12/26/05, Mark W. Farnham <mwf@xxxxxxxx> wrote:
>>  
>> Your traces should tell you a lot. Of course, the "RA" in RAC stands for Real
>> Applications, with at least the marketing implication that you don't have to
>> re-write your applications to run on multiple nodes.
>>  
>> However, if you take a batch application that is threading through blocks in
>> search of rows to work on that are intertwined with likely identical plans,
>> those 4 ms block shuttles, even though pretty doggone fast,
>> are likely the extra overhead. You might also have "consistent write" (see
>> Tom Kyte) problems in running the job in parallel that are exacerbated by
>> bringing a second node into the equation. Hmm, let's see - twice as much
>> horsepower and the job takes three-halves (1.5x) the previous elapsed time.
>> That is pretty bad scaling. If adding 3rd, 4th, ... nth nodes tracks, then you can
>> use half-life equations to see how many nodes you have to add to make the job
>> asymptotic to forever.
>>  
>> So while you don't have to re-write the application to make it continue to
>> function logically correctly in RAC, maybe you can come up with a reasonable
>> way to make the 2 (or n) nodes at least start out after
>> different blocks to begin with. If there is a natural way to partition or
>> cluster for the batch job so that threads on node 1 naturally go after one
>> set of blocks and threads on node 2 naturally go after a different set of
>> blocks, then you are more likely to reduce the overall elapsed time by
>> keeping inter node traffic light. Even the overhead of assembling disparate
>> lists of blocks to process into intermediate "select against" tables and
>> adding the node's block candidate list to the query might work out in your
>> favor if partitioning is not a reasonable solution. Do please remember that
>> disjoint sets of rows are NOT good enough and that you need disjoint sets of
>> blocks to avoid internode dependencies.
>>  
>> Good luck.
>>  
>> mwf
>>> -----Original Message-----
>>> From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx]
>>> On Behalf Of Sriram Kumar
>>> Sent: Monday, December 26, 2005 10:50 AM
>>> To: ORACLE-L
>>> Subject: Scalability of Batch Operations in RAC
>>> 
>>> Hi Gurus,
>>>  
>>>    We are running a heavy batch operation of a banking application on RAC.
>>> Say the configuration is a 2-node, 4-CPU Itanium2 box running 9.2.0.4.
>>>  
>>>   We have batch operations that can be run in parallel. What I am noticing
>>> is that when I run all the batch operations on node 1, they complete in 2
>>> hours. During this time, the other node is not used at all.  To optimally
>>> use the resources, I run the batches across both nodes, and I see a
>>> degradation in performance (the elapsed time is around 3 hours). There are
>>> no blocking locks, and interconnect performance is good (block transfer
>>> time is around 4 ms). I am planning to run a 10046 trace on both scenarios
>>> and compare the top SQLs and waits.
>>>  
>>> I just wanted to know if any one of you has faced this issue before, or is
>>> this normal?
>>>  
>>> Best Regards
>>>  
>>> Sriram Kumar
> 
> 

