RE: batch process runs slower and slower over time

  • From: "Mark W. Farnham" <mwf@xxxxxxxx>
  • To: <exriscer@xxxxxxxxx>, "'Oracle Mailinglist'" <oracle-l@xxxxxxxxxxxxx>
  • Date: Thu, 7 Jun 2012 15:54:46 -0400

I've read this thread with interest. Most of the suggestions have focused on
what could your PL/SQL be doing that would degrade overtime or what "leak"
type bugs there might be and how you could diagnose them.

That is all good stuff.

I do note, however, that you actually asked "does anyone know how to table
this sort of problem...:-?"

YES. The operational solution, to get this off your plate as a production
issue while you proceed with scientific investigation to your heart's desire
is to observe that you process about 1800 customers in the first hour.
If the operational set-up to run the loop is relatively cheap, I'd have the
process bail and resubmit itself every 1000 customers until a submission
finds no work to do.
If the operational set-up to run the loop is expensive enough to care about
make that 1800.
If there are no update dependency conflicts, you might consider a pre-job
query to establish the rough customer (id number? alpha range id?) range
conditions such that you might run 10 copies of this in parallel (not
parallel degree, parallel jobs) such that each of the jobs does roughly 10%
of the work, each at the "fast" rate and you get the entire job done in
about 33 minutes. (Well, that's if there is no de-scaling effect. Still, I'd
expect the whole thing to be done in less than an hour that way.)

This will NOT diagnose what the technology complex failure is, but changes
the operation to something that probably will work.

Even if there are concurrency issues per customer so you cannot do the
disjoint customer ranges in parallel, running smaller chunks head to tail
should achieve your objective.

Regards,

mwf

-----Original Message-----
From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx]
On Behalf Of Ls Cheng
Sent: Thursday, May 17, 2012 2:49 PM
To: Oracle Mailinglist
Subject: batch process runs slower and slower over time

Hi all
My developers have a pretty complex batch process (written entirely in
PL/SQL packages & functions) running in 10.2.0.5 RAC database (Solaris 10),
this process runs in around 12 hours time, we have noticed that the process
gets slower and slower over time.

This process treats customers information, roughly 10000 customers, during
the first hour it takes around 2 seconds per customer, after 1 hour it tajes
3, after 2 hours 4 seconds and so on, the funny thing is that if we abort
the process and restart again the elapsed time per customer drops again to 2
seconds but after 1 hour it starts increasing again. I have add debug
information, after each customer treatment we capture v$sesstat and
v$session_event to see what statistics or wait event is increasing when the
elapsed time goes up, the only statistics which increases over time is "CPU
used by this session", basically the process is burning CPU, the rest of
statistics doesnt not vary.

It is very puzzling and I cannot find more information to debug, does anyone
know how to tackle this sort of problem... :-?


TIA

PD: I ran dtrace (http://www.brendangregg.com/DTrace/procsystime) in the
last test run but the only call which had sustantial variation between a
fast execution and slow execution is pollsys


--
//www.freelists.org/webpage/oracle-l


--
//www.freelists.org/webpage/oracle-l


Other related posts: