RE: best way to invoke parallel in DW loads?

  • From: "Herring Dave - dherri" <Dave.Herring@xxxxxxxxxx>
  • To: "neil kodner" <nkodner@xxxxxxxxx>
  • Date: Wed, 16 Sep 2009 09:26:23 -0500

Neil,

> How do you know when you have parallel_max_servers set too high  
> relative to your CPU?  Is there a certain wait event that I should be

> looking for, or should i just go off of what top/sar tell me?

Good question and unfortunately I don't have a simple answer.  You can
probably dig up recommendations for the max # of processes per CPU
specifically for your hardware, but for me it's more of throughput.
Increasing parallelism eventually tapers off in terms of elapsed time of
the operation, even degrading due to the overhead of all the processes
created and management of their messages.

If you know no one else will be running parallel operations during the
load, then I'd first determine the optimal parallelism for the load and
make sure that PARALLEL_MAX_SERVERS won't limit the load.

Throughput above also applies to having concurrent parallel processes,
pretty much the breaking point of where more concurrent parallel
processes will cause all concurrently running to run slower than if 1 or
more processes were dropped to running serially.  Kind of like having a
large batch of SQL statements that, when all of them run concurrently
(with parallelism) takes a total time of X.  Then you find out taking
out 1 or more statements and having them run without parallelism and the
overall elapsed time is Y, which is less than X.  Not sure if that makes
sense - probably instead confuses the issue.

I guess it's much different if you have 1 parallel process you care
about vs. managing many concurrent parallel ones.  If it's just 1, make
sure you find the max the server can handle.  If you've got many
concurrent ones, you now need to find out where the "elbow" is in the
curve of elapsed times.

David C. Herring  | DBA, Acxiom Database Services
 
630-944-4762 office | 630-430-5988 cell | 630-944-4989 fax
1501 Opus Pl | Downers Grove, IL, 60515 | U.S.A. | www.acxiom.com
 

-----Original Message-----
From: neil kodner [mailto:nkodner@xxxxxxxxx] 
Sent: Wednesday, September 16, 2009 8:35 AM
To: Herring Dave - dherri
Cc: oracle-l@xxxxxxxxxxxxx
Subject: Re: best way to invoke parallel in DW loads?

Thanks for the detailed response!    I should add that CPU_COUNT and  
PARALLEL_THREADS_PER_CPU were set to Oracle's defaults and that I did  
not override them.  I am experimenting with the parallel_max_servers  
values.  For what its worth, I expect our warehouse loads to run in  
parallel, serially (ugh!) meaning only one process will be running at  
a time.

To clarify, I'm not looking at improving the performance of our  
existing  9i production system.  All of the work I'm doing is in our  
10g development environment, to prepare our existing warehouse for 10g.

We are not currently using partitioning.

The types of queries we're using parallelism for are mostly INSERT /*+  
APPEND */, CTAS, and a few SELECTs with large hash joins to populate  
cursors.

We have parallel_automatic_tuning set to FALSE - maybe that's an  
issue.  I thought that since its a deprecated parameter in 10g, that I  
could leave it alone.  Maybe that's not the case.

parallel_execution_message_size is 2152, I haven't tested with any  
other values thus far.

In our 10g environment, we have full statistics computed recently with  
estimate_percent=>dbms_stats.auto_sample_size.  Our 9i production  
environment lacks statistics.  My goal is to remove add statistics,  
remove RULE hints, and make sure things run acceptably under 10g in  
preparation for a production rollout.

How do you know when you have parallel_max_servers set too high  
relative to your CPU?  Is there a certain wait event that I should be  
looking for, or should i just go off of what top/sar tell me?

Your suggestion to leave the larger segments with DEFAULT degree was  
along the lines of what I was thinking, as well as the removal of the  
parallel hints.  It sounds like I need to enable  
parallel_automatic_tuning in order to have Oracle do its thing.




On Sep 16, 2009, at 8:52 AM, Herring Dave - dherri wrote:

> I've worked with a number of DWs over the past years and have a few  
> suggestions that seem to conflict with some of your settings.
>
> CPU_COUNT -> please don't change this.  Let Oracle figure it out.   
> In 10g it's an underscore parameter and it doesn't make much sense  
> to mess with it.  The only impact that a manual change will have (I  
> believe) is the default parallel degree, calculated by CPU_COUNT *  
> PARALLEL_THREADS_PER_CPU.
>
> PARALLEL_THREADS_PER_CPU -> if you feel the need to adjust the  
> default parallel degree, change this parameter.  Normally the  
> default (2) is fine, but based on your use of the default and  
> concurrent parallel operations, you may want to drop it to 1.
>
> PARALLEL_MAX_SERVERS -> I may be missing something, but if you set  
> this to "8" you'll be allowing, AT MOST, 1 concurrent parallel  
> operation.  If that's what you want then you're fine.  If you'd like  
> to have more than one parallel operation running concurrently,  
> you'll need to increase this.  Determine what your default parallel  
> degree would be, then make this parameter a multiple of it.  Once  
> you reach this threshold, all operations requesting parallelism will  
> be dropped to serial operations.  And don't forget that plenty of  
> parallel operations will use 2 to 3 times the degree requested, when  
> a P->P handoff happening.  So you may have 8 set on a table but  
> based on the query you may busy 16 or 24 PX slaves.
>
> PARALLEL_AUTOMATIC_TUNING -> set this to TRUE to allow default  
> parallelism settings to work, along with having PX messages run  
> through the large pool instead of the shared pool.
>
> PARALLEL_EXECUTION_MESSAGE_SIZE -> under 10g this isn't as much of a  
> factor but under 9i I'd make sure to set this to AT LEAST 8192 and  
> if you see this threshold being reach perhaps increase to 16384.
>
> For me, I felt it was best to define any object over 200MB to have  
> it's degree set to PARALLEL (the default degree).  Development and  
> test servers usually are much smaller than production.  Setting it  
> to the default allows parallel operations to behave similarly in  
> each environment, relative to the hardware.  So on a smaller server  
> the degree could be 4 whereas, without any changes, the same  
> operation uses 8 on production.  Ideally your performance would more  
> closely match environment to environment.
>
> I chose 200MB as the line in the sand for parallelism based on  
> careful monitoring.  It's important to work off of size vs. row  
> count since the work for each PX slave is based on total blocks  
> involved in the operation (the old 9/13th / degree breakup, then  
> 1/13 after that).  So it really doesn't matter how many rows but how  
> big the object is.
>
> I also recommend against defining a parallel degree in hints.   
> Normally object sizes will vary and if you really want specific  
> parallel degrees involved, then any major changes in object sizes  
> would require query modifications for optimal operations.  In fact  
> normally the optimizer will properly pick a parallel operation  
> without having to be told so.  If this is not the case, the it's  
> more likely statistics are off (which you mentioned aren't be  
> gathered) and that dynamic sampling is off as well.
>
> I'm also curious as to what types of DML statements you're seeing  
> gains with when forcing parallel DML.  Normally only an INSERT /*+  
> APPEND */ will get you parallel benefits, in a non-partitioned  
> environment.  Beyond partitioning, you may be seeing benefits with  
> parallelism in general if UPDATEs and DELETEs are based on queries  
> who in turn are now using parallelism.  Again, more details would be  
> needed to assess this.
>
> HTH.
>
> David C. Herring  | DBA, Acxiom Database Services
>
> 630-944-4762 office | 630-430-5988 cell | 630-944-4989 fax
> 1501 Opus Pl | Downers Grove, IL, 60515 | U.S.A. | www.acxiom.com
>
>
> ________________________________________
> From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l- 
> bounce@xxxxxxxxxxxxx] On Behalf Of Neil Kodner
> Sent: Monday, September 14, 2009 8:22 AM
> To: oracle-l@xxxxxxxxxxxxx
> Subject: best way to invoke parallel in DW loads?
>
> I've taken over a 9ir2 data warehouse and its loading process.
>
> Here are some 'highlights' of the current 9i warehouse environment(I  
> didn't do this!)
>
> Oracle 9.2.0.7
> Solaris 9
> 4x1200mhz sparcv9 cpu
> parallel_max_servers - 70(!)
> parallel_adaptive_multi_user=false
> many, many, many PARALLEL and RULE hints in the warehouse load  
> procedures.
> NO table statistics.
>
> The warehouse was designed years ago by someone very well-versed in  
> RBO.  Management has insisted we leave it alone in its oddly- 
> configured state because well, it works. It might not be efficient,  
> and we've had moments with concurrency, but the thing works.  It  
> might take all weekend to process the load, but it runs, and it runs  
> consistently. It is important to note that the production warehouse  
> is NOT on a dedicated server.   It's odd to see up to 30 parallel  
> slaves on a 4-cpu machine but that's just the way it is.
>
> My first task is to prepare the data load for a 10g migration.  To  
> this end, I've created the 10g development environment and copied  
> all of the procs/data.  I gather statistics on the tables and the  
> procs seem to work fine with current statistics and the RULE hints  
> removed.
>
> In the past, the users have used PARALLEL hints to force parallel  
> loading, often with 6 or 8 slaves on a 4-cpu box, even with multiple  
> processes running.
>
> My question lies with the best practices for invoking parallel.  We  
> will be moving the production warehouse to a machine with at least 8  
> CPU.  In our development environment, I set
>
> parallal_adaptive_multi_user=TRUE
> parallel_max_servers=8
> cpu_count=8
>
> I'm wondering where to invoke parallel processing.  If I compile the  
> procs without hints and execute, they run without parallel.  If I  
> ALTER SESSION FORCE PARALLEL DML, I get terrific results.  I'm not  
> sure if that's the best way to enable parallel processing (and let  
> parallel_adaptive_multi_user do its thing), or if I should be  
> setting the degree of each and every segment.  I'm sure that in this  
> 10g and above world, parallel hints are not the way to go.
>
> Is there a 'best practice' for handling this?
>
>
>
************************************************************************
***
> The information contained in this communication is confidential, is
> intended only for the use of the recipient named above, and may be  
> legally
> privileged.
>
> If the reader of this message is not the intended recipient, you are
> hereby notified that any dissemination, distribution or copying of  
> this
> communication is strictly prohibited.
>
> If you have received this communication in error, please resend this
> communication to the sender and delete the original message or any  
> copy
> of it from your computer system.
>
> Thank You.
>
************************************************************************
****
>

--
//www.freelists.org/webpage/oracle-l


Other related posts: