The problem to solve is to create a very large db (> 1 TB) of synthetic data using a cluster of machines. Once created, the db will be accessed by one machine only, predominantly for read-only use. The filesystem is network attached.
One method to create the db is for each machine to create a dictionary of data and save it on the filesystem - this is pretty fast. Next, get one machine (only) to write each dictionary's data to the db. One machine writing to LMDB on a filesystem across a network should be okay but slow - yes?
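
The two-stage plan above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function names (save_shard, load_shards, open_env), the pickle shard format, the shard file naming and the 2 TiB map_size are all hypothetical and do not come from py-lmdb or from this thread. Keys and values are assumed to be bytes.

```python
import pickle
from pathlib import Path


def save_shard(data, shard_dir, worker_id):
    """Stage 1, run on every cluster machine: dump one dict to a shard file.

    `data` is assumed to map bytes keys to bytes values.
    """
    path = Path(shard_dir) / f"shard-{worker_id:05d}.pkl"
    tmp = path.with_suffix(".tmp")
    with open(tmp, "wb") as f:
        pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)
    # Rename only after the dump finished, so the loader's "shard-*.pkl"
    # glob does not pick up a half-written file.
    tmp.replace(path)
    return path


def load_shards(env, shard_dir):
    """Stage 2, run on ONE machine only: copy every shard into the db.

    `env` is an opened LMDB environment (see open_env below).
    Returns the number of key/value pairs written.
    """
    total = 0
    for path in sorted(Path(shard_dir).glob("shard-*.pkl")):
        with open(path, "rb") as f:
            data = pickle.load(f)
        # One write transaction per shard rather than one per key.
        with env.begin(write=True) as txn:
            for key in sorted(data):
                txn.put(key, data[key])
        total += len(data)
    return total


def open_env(db_path, map_size=2 * 1024**4):
    """Open the LMDB environment; map_size (2 TiB here) is an assumed value
    that must be at least as large as the finished db."""
    import lmdb  # third-party package: pip install lmdb
    return lmdb.open(str(db_path), map_size=map_size)
```

Each worker would call save_shard(its_dict, shard_dir, worker_id) with a distinct worker_id; afterwards a single machine would call load_shards(open_env(db_path), shard_dir).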

Best ...
Dinesh

--------------------------------------------------
From: "David Wilson" <dw@xxxxxxxx>
Sent: Tuesday, May 27, 2014 5:36 AM
To: <py-lmdb@xxxxxxxxxxxxx>
Subject: [py-lmdb] Re: py-lmdb write performance

On Tue, May 27, 2014 at 05:26:56AM -0700, Dinesh Vadhia wrote:

Hi David

That pretty much explains why it hasn't been working! When you say "Instead you should probably wrap the database in an application-layer protocol that hides the file access behind a single small request/response roundtrip." - does such a layer exist or will a custom one have to be written?

Hey Dinesh,

There are many of these, but none of them can be used with py-lmdb. You could write a custom one around py-lmdb, but it would probably be a waste of time. There is a large list of systems at http://symas.com/mdb/ ("LMDB In Other Projects"). Which one makes sense really depends on the operations you require, e.g. do you need the ability to do range scans? Most of those apps don't support that.

David

Best ...
Dinesh

--------------------------------------------------
From: "David Wilson" <dw@xxxxxxxx>
Sent: Tuesday, May 27, 2014 4:56 AM
To: <py-lmdb@xxxxxxxxxxxxx>
Subject: [py-lmdb] Re: py-lmdb write performance

>Hey Dinesh,
>
>It is almost certainly unsafe to use LMDB over the network from multiple
>clients, and even if it were safe, the performance is going to suck..
>
>* The lock file will record incorrect information, assuming it
>  does not become corrupt
>
>* Each random page-in will involve at least 1 roundtrip and at least 4
>  frames (1 tx, 3 rx), perhaps unless you've played with the MTU for
>  your network segment.
>
>* Each random page-out will involve similar numbers
>
>More generally, the bus speed of ethernet is vastly higher latency
>(500usec vs. 0.1usec) and vastly lower bandwidth than RAM (320+Gbit/sec
>vs. 1Gbit/sec).
>
>Accessing the raw database file over a network may incur this penalty
>for every page of the file needing to be accessed. Instead you should
>probably wrap the database in an application-layer protocol that hides
>the file access behind a single small request/response roundtrip.
>
>
>David
>
>On Tue, May 27, 2014 at 04:07:55AM -0700, Dinesh Vadhia wrote:
>>Hi!
>>The write performance from machines on a fast network cluster to a
>>file system is not that great. The write code is:
>>
>>    with env.begin(write=True) as txn:
>>        txn.put('a', 'b')
>>
>>The hardware and network will impact performance but is it also because
>>lmdb is not geared for distributed computing?
>>
>>Best ...
>>Dinesh