Re: Sorting Large Amounts of Data- Oracle to SAS, Part Deux

I'm not sure I understand how using a sorted hash cluster eliminates
the need for the group by that you mention.  I still am also unclear
on why there is a sorting requirement for the outbound data to SAS?
What problem does a sorted "export" solve?  SAS can sort and group and
order data also, correct?

I'm not sure there is a need to incorporate parallel here.  If the
requirement is to have the rows be returned in a global sorted order,
that by definition, requires (at some point) a serialization point and
thus is not parallel.  If you have forever to load rows (loading hash
clusters is slower and is a serial process compared to heap tables)
and limited temp space or memory, then that is likely the best
solution.  Also, you have a single process sending rows to SAS
understand one lonely dude can only send so many rows per second.  One
doesn't need massive amounts of parallelism if all roads lead to a
single lane -- there is no point in having 32 exit lanes if there is
only 1 door, having 1 exit lane is likely enough to keep 1 door
"busy".

So look at it this way: if you were to use Parallel Execution and we
assume that the sort can not fit entirely into memory what would
happen is 1) read all the data 2) sort the data back to disk/temp 3)
read data on disk/temp back in sorted order.  Compare that to just
reading the data in sorted order from a sorted hash cluster.  And
remember in both scenarios there is just one processing sending rows
to SAS.  From a resource requirement it should be obvious which one
takes less given they can likely send rows at the same speed.


On Wed, May 12, 2010 at 12:05 PM, Kellyn Pedersen <kjped1313@xxxxxxxxx> wrote:
> This is a simple select with a pass thru query from SAS, over the wire.  It 
> is 1.4 million rows, using a sorted_hash_cluster to sort the data for them so 
> they won't do a group by and an order by in their query.
>
> The statement with both, without the sorted hash cluster takes an unknown 
> amount of time-  they just couldn't get it to complete.  The sorted hash 
> cluster completes in 4 hrs.
>
> Of course, now the boss wants to know-  can I incorporate parallel?  So much 
> for that idea...  We are pretty far into this design at this point, but any 
> other ideas are always welcome!



--
Regards,
Greg Rahn
http://structureddata.org
--
http://www.freelists.org/webpage/oracle-l


Other related posts: