Hi David! Thanks a lot for your input! I realise that I should have included more info in my original e-mail.

The app is a service that will back a web app. Reads will be performed on page requests, but (almost) all writes will be performed in background jobs.

My app uses two main databases that will have the most reads/writes. Both use int64s as keys, and a list of int64s as values (using dupsort=True). The first DB maps object IDs to a list of user IDs which are associated with that object. The user ID lists range in length between zero and a couple of thousand, and average about 120. The total number of unique keys in this DB is currently a little over 3 million (so around 360 million user ID values), but it is expected to grow 20x over the coming months. The second DB maps user IDs to the list of object IDs that they are associated with. The total number of unique keys in this DB is currently about 70 million, with an average of about 5 values per key.

In the typical write scenario, my service receives an object ID and a list of user IDs for that object. It then constructs one list of (object_id, user_id) tuples and one list of (user_id, object_id) tuples, and uses putmulti() to write this data to the two databases. The service will also do this for existing object IDs, in which case it will also remove user IDs and object IDs (that were previously present, but are no longer in the user ID list that was passed in) from the values in both databases.

In the typical read scenario, the service will read all user IDs for an object ID, and then retrieve the list of object IDs for each of those user IDs. The reads and writes will be pretty random, and there will be many more reads than writes.

I'm not sure whether I'll need MDB_SYNC. Losing the last minute of data in case of a crash would be acceptable, but getting a corrupt database would not.
The documentation (https://lmdb.readthedocs.org/en/release/#environment-class) seems to suggest that using sync=False would be okay as long as I also use writemap=False and a filesystem that preserves write order. Which filesystems preserve write order? How about ext4?

To me, it seems like the only drawback of using different environments would be that the two DBs might end up in an inconsistent state after a crash. I might also get inconsistent backup dumps. I could probably live with this, if it would greatly increase my write performance.

In an ideal world I would write extensive load tests that I could use for profiling the different options, but as the only developer on a project with scaling pains, I'm aiming at "profiling it in production (TM)" (running side by side with the current system).

Again, thanks a lot for your input. I've tried to provide as much detail as possible about the system, and if you have any additional thoughts, I'd appreciate them greatly!

Best,
Jonatan

On Tue, Feb 3, 2015 at 9:03 PM, David Wilson <dw@xxxxxxxx> wrote:
> Hi Jonatan!
>
> There's not quite enough info to work with here, and probably it's
> better if you just profile how your app behaves first. For example:
>
> * If you're writing all those keys from a sequential range of keys,
>   that will perform much better than completely random keys. If your
>   write txns are fast enough, then probably it's not worth the effort
>   of splitting your data into separate environments. Often sequential
>   write is faster by a factor of 2 or more.
>
> * Are your keys/values huge or tiny? To get an intuitive feel for
>   performance, it's sometimes helpful to think of txn size in terms of
>   total MB read/written; this is especially true when the entire DB is
>   cached or on very fast media. Writing 100,000 8-byte key + 8-byte
>   values (1.52MiB) is going to be much faster than, say, 512-byte
>   values (49.59MiB).
>
> * If you're overwriting existing keys during the write txn, that
>   changes things a lot too, since DB growth will be bounded.
>
> * Similarly, the presence/absence of MDB_SYNC (if suitable for your
>   app) drastically changes write performance characteristics.
>
> * How many concurrent writers do you need? Is it a web app? Another
>   option is to push your large writer off to a task queue.
>
> * Perhaps tweak examples/dirtybench.py from the Git repository to
>   create keys/values roughly of your size. This will produce some nice
>   example numbers for you to work from for your hardware/disk
>   combination.
>
> Drop me a line if you need some more ideas, but probably it's best for
> you to just write some throwaway code to profile your specific use
> case.
>
> David
>
> On Tue, Feb 03, 2015 at 08:50:43AM -0800, Jonatan Heyman wrote:
> > Hi!
> >
> > I'm new to LMDB and python-lmdb, but so far I find it really
> > promising and well documented.
> >
> > I'm writing an app where I plan to use LMDB to store records in
> > three different named databases. One action in my app might result
> > in hundreds or thousands of reads/writes to these databases. If I've
> > understood LMDB correctly, there is a global write lock for each
> > Environment. Would it then make sense for me to store my different
> > databases in three different Environments, in order to decrease the
> > time spent waiting for write locks?
> >
> > Best,
> > Jonatan
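As a footnote to David's "think in total MB" point above, the transaction sizes he quotes can be reproduced with plain arithmetic (no LMDB needed); the first figure was truncated rather than rounded in the mail:

```python
# Back-of-envelope txn sizes from the reply above: 100,000 entries,
# 8-byte keys, with 8-byte vs 512-byte values, expressed in MiB.
entries = 100_000
small = entries * (8 + 8) / 2**20    # 8B key + 8B value
large = entries * (8 + 512) / 2**20  # 8B key + 512B value
print(f'{small:.2f} MiB vs {large:.2f} MiB')  # ~1.53 MiB vs 49.59 MiB
```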