Re: partitioning options for a system that will add 50 million rows/day

  • From: Job Miller <jobmiller@xxxxxxxxx>
  • To: ryan_gaffuri@xxxxxxxxxxx, oracle-l@xxxxxxxxxxxxx
  • Date: Tue, 11 Sep 2007 05:33:02 -0700 (PDT)

range with sub-hash is quite valuable when you need to do large partition wise 
joins.    oracle can parallelize the join by sub hash partition.     it sounded 
like you would be doing simple key lookups that return 150 rows?  do you have 
anything meaningful that is part of the planned query to hash on?  are you 
expecting a query performance benefit from the prune, or is it just to 
introduce additional loading concurrency?

If you use hash partitions underneath range and you expect to get some kind of 
performance benefit you have to assure that the hash part key is part of the 
query.    even if it is, compare that to the range without the hash sub.  

my tests in the past have shown that the second level prune operation into a 
smaller index is equivalent to the i/o that would be done if there was just a 
little larger index without the sub partitions.  your mileage will vary, but 
just don't expect a big benefit from a second level prune, it should be for 
management reasons, i/o distributions, concurrency on load, etc..

corrections welcome.



ryan_gaffuri@xxxxxxxxxxx wrote:  I was quoted numbers of 50 Gbs/day so that 
assumes 1,000 bytes/row. I have not confirmed this. This database will probably 
grow to around 2 TBs since older data will be purged. All data will be added 
with bulk loads using external tables. There will be a query element that will 
use indexes. About 150 rows/query. Not sure what the load will be on this. 
Right now I am working on a partitioning strategy. We are going to use RAC and 
we are in 10.2.  
  
 We will definitely use date partitions with local indexes. There has been some 
talk of 1 partition/hour. Some data loads may take a while. This way newer 
loads can start on newer partitions. 
  
 Should I explore hash sub-partitions? I need to run tests, but I believe hash 
partitions will actually hurt performance of inserts since Oracle has to 
decided where to put the record. 
  
 I think we definitely need ASM because its too hard to manage spreading out 
partitions across the LUNs on the SAN manually and since we are adding and 
dropping alot of partitions we would need code to determine which LUN to put 
partitions on. 
  
 The data model is relatively simple. So there is not alot of complex business 
logic in the database. The rows are scrubbed before getting to the database. 
Its just straight array inserts off of external tables. 
  
 anyone have any suggestions or comments?  
  
  


       
---------------------------------
Luggage? GPS? Comic books? 
Check out fitting  gifts for grads at Yahoo! Search.

Other related posts: