RE: Parallel Distributed CTAS

  • From: "Mark W. Farnham" <mwf@xxxxxxxx>
  • To: <greg@xxxxxxxxxxxxxxxxxx>, <Laimutis.Nedzinskas@xxxxxx>
  • Date: Fri, 4 Feb 2011 13:16:19 -0500

Exactly. So if you did have two or more distinct "pipes" (yes, I believe he
was referring to networks) unbonded between the remote hosts, and if,
despite your understanding about the network not being an issue, it actually
was, then separate sessions each fielding half the source data (again
presuming there is a reasonable way to write that query) then you might
improve the speed with two (or more) non-parallel sessions.

 

IF network is actually the bottleneck, then it might be even worth it to
dump half the data into the real destination, half into another "fake"
destination and follow up with a local CTAS (which would be a simple select
* in bulk) from "fake" to final for that half. Of course if you have
partitioning and the pieces can usefully be dumped directly into sensible
(sub)partitions you avoid that as well.

 

Oracle does a damn fine job of PX, but there has to be some overhead for
coordination. If the job is big enough to look at eliminating that overhead
and looking shows that running separate sessions does not require more
effort in figuring out how to do it than the savings possible then you win.
Most of the time (Greg, I hope I'm saving you from having to add this back
in), perhaps even the vast preponderance of the time if Oracle's PX
engineering built in the product can't come pretty doggone close to the
theoretical maximum throughput, then you have some resource obstacle to
parallelism that is gating Oracle's performance and not the other way
around.

 

If you DO have two "pipes" available, then Oracle's using only one within a
single session is an exception. Of course you might be able to remove that
obstacle by "bonding" the "pipes." Your mileage may vary. As far as I know
there is no way other than "bonding" network resources so they appear to
Oracle to be a single resource to utilize multiple "pipes" within an Oracle
session in a way that can be exploited by PX. [Again, I think Greg said that
same thing a little differently and much shorter.]

 

The exception to the effort expenditure is when the cost of time is
different for preparation versus execution. For example if you have months
to prepare for a move and an uncomfortably small window for the production
move, then putting in a lot of effort to do better for your specific case
than Oracle has engineered to handle the general case might be worthwhile.

 

That's where I come in.   (wink)

 

If you don't have 2 pipes, then what is making you think PX would speed this
up anyway? Is the source data query complex and slow? Oh, and does the query
involve some grouping or ordering that means it has to do the extra work of
putting the pieces all back together before it can start spilling the
results across the pipe?

 

mwf

  _____  

From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx]
On Behalf Of Greg Rahn
Sent: Friday, February 04, 2011 11:36 AM
To: Laimutis.Nedzinskas@xxxxxx
Cc: oracle-l@xxxxxxxxxxxxx
Subject: Re: Parallel Distributed CTAS

 

What happens here is that you get a parallel select on the remote db, a
parallel insert on the local db, but the communication between the two nodes
over the db link is done via the QC so its 1:1 (or serial).

 

On Fri, Feb 4, 2011 at 4:23 AM, <Laimutis.Nedzinskas@xxxxxx> wrote:

What is a pipe ? You mean network ?
My understanding is that network is not an issue.

It is how oracle handles it. Execution plans do differ for remote and
local, that's true:

 

-- 
Regards,
Greg Rahn
http://structureddata.org

Other related posts: