[py-lmdb] Re: py-lmdb write performance

  • From: "Din Vadhia" <dineshvadhia@xxxxxxxxxxx>
  • To: <py-lmdb@xxxxxxxxxxxxx>
  • Date: Tue, 30 Sep 2014 12:04:42 -0700

Hi David

Do you remember from a few months back I was trying to get LMDB to work across network attached storage and got into a right mucking fuddle? Anyway, moving on ...

On a multi-core machine and/or a cluster of multi-core machines with network attached storage, would it be possible for each core to run an in-memory-only LMDB db instance? The data could be loaded into the LMDB db on each core at startup. Alternatively, pre-create the LMDB db for each core and store it on the network storage; at startup, load the db into each core's memory. Each core's LMDB db would have at most 1 million records. Hope this makes sense!

Best ...

From: "David Wilson" <dw@xxxxxxxx>
Sent: Thursday, May 29, 2014 8:35 AM
To: <py-lmdb@xxxxxxxxxxxxx>
Subject: [py-lmdb] Re: py-lmdb write performance

On Thu, May 29, 2014 at 07:46:29AM -0700, Dinesh Vadhia wrote:

- Next, one machine writes each dictionary data to lmdb on filesystem
across network which takes ~2.5 hours per dictionary.

It sounds like you are still opening an LMDB database over a networked
filesystem.

Just to make this clear: mounting an NFS / SMB / CIFS / Ceph filesystem
then calling lmdb.open("/path/to/that/filesystem") is slow and
fundamentally broken. You should never do it. If you are experiencing
slowness in this configuration, it is because this configuration is slow
and fundamentally broken.

As previously discussed, you should stream the database over the network
using some alternative means to the machine that will open the LMDB
database on its local disk. Alternatively you could export the volume
over e.g. iSCSI or ATAoE, neither of which suffers from the caching and
coherency problems of NFS.
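
As a rough illustration of the "copy it to local disk first" approach, here is a
minimal sketch. It assumes the environment uses the default directory layout
(a data.mdb file inside the environment directory) and that nothing is writing
to the source while it is copied. The function name fetch_lmdb_to_local and
both paths are hypothetical, not part of py-lmdb.

```python
import shutil
from pathlib import Path


def fetch_lmdb_to_local(remote_dir, local_dir):
    """Copy an LMDB environment's data file to local disk; return the local dir.

    Assumption: no process is writing to the source environment during the copy.
    """
    remote_dir = Path(remote_dir)
    local_dir = Path(local_dir)
    local_dir.mkdir(parents=True, exist_ok=True)

    src = remote_dir / "data.mdb"          # default LMDB data file name
    dst = local_dir / "data.mdb"
    tmp = local_dir / "data.mdb.partial"   # hypothetical temporary name

    # Plain sequential file copy: the only operation that touches the
    # network mount. LMDB itself never maps the remote file.
    shutil.copyfile(src, tmp)

    # Rename only after the copy finished, so a half-copied file is never
    # mistaken for a complete database.
    tmp.replace(dst)
    return local_dir
```

The returned directory is then on local disk, so it can be opened in the
usual way, e.g. lmdb.open(str(local_dir), readonly=True).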

Attached are output for dirtybench.py from Windows and Linux.

Are you experiencing the problem on Linux or Windows?

I cannot tell if your host environment is Windows or Linux - I asked for
dirtybench output only from the slow environment.

I really cannot help if you do not answer my questions or pay attention
to my responses.

>   * Are you still using a network filesystem? We already know that is
>     broken
>   * What OS?
>   * What filesystem?
>   * What host machine?
>   * Does your job start fast, and then slow down? If so, is your
>     dataset larger than RAM?
>   * Are there any other users of the machine that might cause it to be
>     slow?
>   * How large are your transactions? (how many records / how many GB).
>   * Have you tried splitting your writes into smaller txns?

You did not answer these questions.
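
For the last question in the list above, splitting writes into smaller
transactions might look roughly like the sketch below. It assumes env is an
already-open lmdb.Environment on a local disk and that items yields
(key, value) pairs of bytes. The helper name write_in_batches and the default
batch size are illustrative only; a suitable batch size depends on the
workload.

```python
from itertools import islice


def write_in_batches(env, items, batch_size=10000):
    """Write (key, value) byte pairs, committing every batch_size records.

    Returns the number of write transactions that were committed.
    """
    it = iter(items)
    commits = 0
    while True:
        # Pull at most batch_size records without loading the whole
        # dataset into memory.
        batch = list(islice(it, batch_size))
        if not batch:
            break
        # One write transaction per batch; the context manager commits on
        # normal exit and aborts if an exception escapes the block.
        with env.begin(write=True) as txn:
            for key, value in batch:
                txn.put(key, value)
        commits += 1
    return commits
```

Comparing the run time for a few different batch_size values is one way to
tell whether transaction size is contributing to the slowdown.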
