Re: When to run batch in parallel?

  • From: Charlotte Hammond <charlottejanehammond@xxxxxxxxx>
  • To: Daniel Fink <daniel.fink@xxxxxxxxxxxxxx>, oracle-l@xxxxxxxxxxxxx
  • Date: Wed, 26 May 2010 12:23:03 -0700 (PDT)

Thanks Daniel,
I guess my problem boils down to: how do I know if the SAN has spare capacity? 
This is shared hardware being hit by all sorts of other servers doing 
mysterious stuff and data cacheing characteristics can change drastically 
depending what order things are run in.
We actually get our best results when parallel sessions are hitting the SAME 
data, presumably as they are getting it from the buffer/SAN caches rather going 
off to fight for yet more I/O.
Given very limited resource for empirical testing / benchmarking, is there any 
reasonable way to say if we can add more job streams or if the SAN is already 
maxed out?
Thanks!
Charlotte



----- Original Message ----
From: Daniel Fink <daniel.fink@xxxxxxxxxxxxxx>
To: charlottejanehammond@xxxxxxxxx
Cc: oracle-l@xxxxxxxxxxxxx
Sent: Mon, May 24, 2010 11:29:54 PM
Subject: Re: When to run batch in parallel?

I would start with reading Doug Burns' papers on Parallelism 
(http://oracledoug.com/serendipity/index.php?/archives/835-PX-and-the-Magic-of-2.html)

Just because you are waiting on I/O does not necessarily mean that the I/O 
subsystem is the bottleneck. What is the average duration and duration spread 
for the reads? If the duration is small and the SAN is not maxing out it's 
throughput, then it may have unused capacity that can be used for parallel 
processing.

Are the underlying tables and data parallel 'friendly'? If you are running in 
parallel, but each process is wanting to read the same data, you can induce 
contention waits (read by other session). If each parallel process is accessing 
different data/index blocks, you can reduce the likelihood of contention. 
Better still if the underlying objects are partitions and the parallel 
processes read different partitions.

This sounds like a great opportunity to benchmark and test the changes. Test, 
trace, profile, report.


Charlotte Hammond wrote:
> We've got a long running batch job.  Tracing shows 95% of the time is I/O 
> waits (mostly scattered reads steming from hash joins) against our SAN.  Our 
> architect wants to split it into several parallel streams but I'm dubious.  
> Surely the SAN won't serve up data any faster just because we're running in 
> parallel.
> 
> Is there any benefit in this strategy?  I would expect parallelism to come 
> into play if there's significant non-I/O time (e.g. busy on CPU) where other 
> sessions could take advantage of the burstiness, or if the database was 
> accessing multiple disks directly, not virtualised over a big SAN cache.  I 
> reckon we just need a faster SAN (and/or some SQL tuning).
> 
> Any comments?
> 
> Many thanks
> Charlotte
> 
> 
>      
> --
> //www.freelists.org/webpage/oracle-l
> 
> 
> 
>  


      

--
//www.freelists.org/webpage/oracle-l


Other related posts: