Re: Horizontally scaling a database

  • From: "Zhu,Chao" <zhuchao@xxxxxxxxx>
  • To: darrah.john@xxxxxxxxx
  • Date: Thu, 29 Mar 2007 01:27:29 +0800

Very well said, John!
Another issue is how you partition your data: as your database grows, you may
need to split your database again, with minimum change to your app and
minimum downtime.
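One common way to keep that future re-split cheap is consistent hashing, so that adding a server moves only the keys that land on the new node. A minimal sketch (the hash ring, `vnodes` count, and server names here are my own illustration, not something from this thread):

```python
import bisect
import hashlib

def h(key: str) -> int:
    """Stable 64-bit hash of a key (any stable hash would do)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

class Ring:
    """Consistent-hash ring: each server owns arcs of the hash space."""
    def __init__(self, servers, vnodes=100):
        self._points = []  # sorted (hash, server) pairs
        for s in servers:
            self.add(s, vnodes)

    def add(self, server, vnodes=100):
        # Virtual nodes spread each server's arcs evenly around the ring.
        for i in range(vnodes):
            bisect.insort(self._points, (h(f"{server}#{i}"), server))

    def lookup(self, key):
        # First ring point at or after the key's hash owns the key.
        hashes = [p[0] for p in self._points]
        i = bisect.bisect(hashes, h(key)) % len(self._points)
        return self._points[i][1]

ring = Ring(["db1", "db2"])
before = {k: ring.lookup(k) for k in map(str, range(10000))}
ring.add("db3")  # "split the database again"
after = {k: ring.lookup(k) for k in before}
moved = sum(before[k] != after[k] for k in before)
# Only roughly a third of the keys migrate to db3; the rest stay where
# they were, so the re-split touches a minority of the data and the app
# keeps calling the same lookup().
```

With naive modulo hashing (`hash(key) % n_servers`), by contrast, almost every key would change homes when `n_servers` changes, forcing a near-total data reshuffle.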

On 3/17/07, John Darrah <darrah.john@xxxxxxxxx> wrote:

Seems like you would need to logically partition your data onto separate
servers using whatever back end you like (MySQL, BerkeleyDB, etc.).  You would
then need application servers sitting in front that keep an index of where the
data is physically stored based on the partition key.  These application
servers would act as routers and aggregators, sending the actual queries to
the appropriate servers and aggregating the results back to the requester.  I
don't know of anything that would do this out of the box, but it can be
done.  If there isn't a good key that can consistently narrow a query to a
small subset of the back-end servers, then you would forget about the index
routing strategy altogether and just broadcast to all back-end servers every
time.  How long would you need to retain the data?  A much simpler strategy
to implement would be to keep the data on one database and partition it; you
can put a LOT of rows into a partitioned table in Oracle, several years'
worth, even at your requirements.  I realize you are looking at shared
nothing, but it will cost you to implement the solution I describe above.
You will also need exceptional developers and architects who have implemented
systems like this before.
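The router/aggregator layer described above can be sketched in a few lines. This is a toy illustration only (the `Shard` and `Router` classes and the row shape are mine, not an existing product); it shows both paths: routing on the partition key when one exists, and broadcasting to every shard when it doesn't.

```python
class Shard:
    """Stand-in for one back-end database server (MySQL, BerkeleyDB, ...)."""
    def __init__(self):
        self.rows = {}  # partition_key -> list of rows

    def insert(self, key, row):
        self.rows.setdefault(key, []).append(row)

    def query(self, key=None):
        if key is not None:          # indexed lookup on the partition key
            return list(self.rows.get(key, []))
        return [r for rows in self.rows.values() for r in rows]

class Router:
    """Application-server layer: routes on the partition key, aggregates."""
    def __init__(self, shards):
        self.shards = shards

    def _home(self, key):
        # Toy placement; a real system would use a stable hash or a
        # lookup table so placement survives process restarts.
        return self.shards[sum(key.encode()) % len(self.shards)]

    def insert(self, key, row):
        self._home(key).insert(key, row)

    def query(self, key=None):
        if key is not None:          # key narrows the query to one shard
            return self._home(key).query(key)
        results = []                 # no good key: scatter to all shards
        for shard in self.shards:    # and gather the results
            results.extend(shard.query())
        return results

router = Router([Shard() for _ in range(4)])
router.insert("cust-17", {"cust": "cust-17", "amount": 10})
router.insert("cust-99", {"cust": "cust-99", "amount": 25})
hit = router.query("cust-17")   # routed to exactly one back-end server
everything = router.query()     # broadcast + aggregate across all shards
```

The design choice John describes is exactly the trade-off visible here: the keyed path touches one server, while the broadcast path costs a round trip to every server on every query.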

On 3/15/07, Sandeep Dubey <dubey.sandeep@xxxxxxxxx> wrote:
> Hi all,
> I am looking for ideas on how to scale a database horizontally, as opposed
> to adding more CPUs, memory, and disks to a single machine.
> Here are the main requirements:
> We need to store approx 300 GB of data a month. It will be an OLTP system.
> We want to use commodity hardware and an open source database. We are
> willing to sacrifice performance for cost; e.g., a single-row search
> from a 2-billion-row table should return in 2 sec.
> Start with one server; when it reaches its capacity, add one more
> server, and so on.
> Oracle RAC doesn't fit, as it is a single-storage system. MySQL Cluster
> replicates data to all its nodes; we don't want to
> replicate the data.
> I am trying to emulate a Google-type file system with RDBMS capabilities:
> something like a database farm where each database works independently.
> A query is submitted to all databases in the farm, and matching data is
> found, gathered in one place, and returned as a single output.
> Any ideas?
> Thanks
> Sandeep
> --

Zhu Chao
