Re: Horizontally scaling a database

  • From: Nigel Thomas <nigel_cl_thomas@xxxxxxxxx>
  • To: dubey.sandeep@xxxxxxxxx, oracle-l <oracle-l@xxxxxxxxxxxxx>
  • Date: Thu, 29 Mar 2007 02:04:16 -0700 (PDT)

Sadeep

Have you looked at Hibernate Shards (http://www.hibernate.org/414.html)? That 
gives some of the facilities that John Darrah mentions below. However, the key 
word is "some". 

The idea is that you can
- partition data across multiple databases (into "shards") - using application 
supplied strategies, which can be changed over time
- configure sequential and/or parallel selection strategies to retrieve data 
from those multiple shards
- use the normal Hibernate model to build your application 

It looks like a good start, but there are some pretty serious limitations as 
you run through the documentation. Perhaps most serious is in Chapter 6 
"Limitations" 
http://www.hibernate.org/hib_docs/shards/reference/en/html/shards-limitations.html:
"Hibernate Shards does not currently support cross-shard object graphs": That 
means you can't have inter-shard (as opposed to intra-shard) relationships 
handled automatically
"Hibernate Shards does not provide support for distributed transactions ..." - 
ie you have to plug in your own transaction manager
HQL support is very limited - they say "You're probably better off staying 
clear of HQL in this release if you can help it."
Replicated data is not (well) supported (see 6.6)
However, even given all that, you may still get some mileage out of it for your 
project - even if only as a learning exercise.

Regards Nigel



On 3/17/07, John Darrah <darrah.john@xxxxxxxxx> wrote:
Seems like you would need to logically partition your data onto separate 
servers using whatever on back end (mysql, BerkleyDB, etc).  You would then 
need application servers sitting in front that have an index into where the 
data is physically stored based on the partition key.  These application 
servers would act as routers and aggregaters sending the actual queries to 
appropriate servers and aggregating the results back to the requester.  I don't 
know of anything that would do this out of the box but it can be done.  If 
there wasn't a good key that could consistently reduce the number of back end 
servers to a small subset of the total then you would forget about the index 
routing strategy all together and just broadcast to all back end servers every 
time.  How long would you need to retain the data?  A much simpler strategy to 
implement would be to keep the data on one database and partition it  you can 
put a LOT of rows into a partitioned table in oracle, several years
 worth, even at your requirements.  I realize you are looking at shared nothing 
but it'll cost you to implement the solution I describe above.  You will also 
need exceptional developers and architects that have implemented systems like 
this before.

Other related posts: