Re: Sorting Large Amounts of Data- Oracle to SAS, Part Deux

  • From: Kellyn Pedersen <kjped1313@xxxxxxxxx>
  • To: greg@xxxxxxxxxxxxxxxxxx
  • Date: Wed, 12 May 2010 15:15:40 -0700 (PDT)

OK Greg, first-  you're preaching to the choir... :)
 
The group by is part of the larger aggregate function, but when we are not 
using the sorted hash cluster, then it appears in the mix.  You are right, it's 
not part of the sorted hash cluster though.
 
SAS can and in my opinion, should sort the the data on it's side, but per an 
agreement with earlier groups, this put too much "pressure" on the SAS server 
and someone decided that ALL DEMANDS should be on the database and performed by 
Oracle.
 
The users are demanding a sorted output of data for their SAS files.  I won't 
lie-  I have still not received an answer that explains this requirement other 
than if they do not pull the data sorted, SAS will sort it each time they run 
it on their side??  It doesn't make complete sense to me and being a newbie to 
SAS, none of my research has resulted in an explanation of this or an example 
of repeated sorting from SAS if the data isn't presented in a sorted format 
from Oracle.
 
I also agree on the lack of reasoning behind parallel and this process.  I'm 
having a more difficult time convincing folks of the reasons to not use 
parallel here, so the question was posed to me and I did want to ensure I 
wasn't missing something, so please accept my thanks for answering the original 
question and how you answered it.

Let me know if I missed anything here-  the snow and the cold medicine is 
getting to me today! :)
Kellyn 

--- On Wed, 5/12/10, Greg Rahn <greg@xxxxxxxxxxxxxxxxxx> wrote:


From: Greg Rahn <greg@xxxxxxxxxxxxxxxxxx>
Subject: Re: Sorting Large Amounts of Data- Oracle to SAS, Part Deux
To: "Kellyn Pedersen" <kjped1313@xxxxxxxxx>
Cc: "oracle Freelists" <oracle-l@xxxxxxxxxxxxx>
Date: Wednesday, May 12, 2010, 3:25 PM


I'm not sure I understand how using a sorted hash cluster eliminates
the need for the group by that you mention.  I still am also unclear
on why there is a sorting requirement for the outbound data to SAS?
What problem does a sorted "export" solve?  SAS can sort and group and
order data also, correct?

I'm not sure there is a need to incorporate parallel here.  If the
requirement is to have the rows be returned in a global sorted order,
that by definition, requires (at some point) a serialization point and
thus is not parallel.  If you have forever to load rows (loading hash
clusters is slower and is a serial process compared to heap tables)
and limited temp space or memory, then that is likely the best
solution.  Also, you have a single process sending rows to SAS
understand one lonely dude can only send so many rows per second.  One
doesn't need massive amounts of parallelism if all roads lead to a
single lane -- there is no point in having 32 exit lanes if there is
only 1 door, having 1 exit lane is likely enough to keep 1 door
"busy".

So look at it this way: if you were to use Parallel Execution and we
assume that the sort can not fit entirely into memory what would
happen is 1) read all the data 2) sort the data back to disk/temp 3)
read data on disk/temp back in sorted order.  Compare that to just
reading the data in sorted order from a sorted hash cluster.  And
remember in both scenarios there is just one processing sending rows
to SAS.  From a resource requirement it should be obvious which one
takes less given they can likely send rows at the same speed.


On Wed, May 12, 2010 at 12:05 PM, Kellyn Pedersen <kjped1313@xxxxxxxxx> wrote:
> This is a simple select with a pass thru query from SAS, over the wire.  It 
> is 1.4 million rows, using a sorted_hash_cluster to sort the data for them so 
> they won't do a group by and an order by in their query.
>
> The statement with both, without the sorted hash cluster takes an unknown 
> amount of time-  they just couldn't get it to complete.  The sorted hash 
> cluster completes in 4 hrs.
>
> Of course, now the boss wants to know-  can I incorporate parallel?  So much 
> for that idea...  We are pretty far into this design at this point, but any 
> other ideas are always welcome!



--
Regards,
Greg Rahn
http://structureddata.org
--
//www.freelists.org/webpage/oracle-l





      

Other related posts: