[py-lmdb] Re: py-lmdb write performance

  • From: "Dinesh Vadhia" <dineshvadhia@xxxxxxxxxxx>
  • To: <py-lmdb@xxxxxxxxxxxxx>
  • Date: Thu, 29 May 2014 08:49:49 -0700

* I thought that opening an LMDB database from one machine over a networked filesystem would work. OK, it looks like this is the problem, so I will look for an alternative method.

* The cluster is linux-based.

Thanks
Dinesh

--------------------------------------------------
From: "David Wilson" <dw@xxxxxxxx>
Sent: Thursday, May 29, 2014 8:35 AM
To: <py-lmdb@xxxxxxxxxxxxx>
Subject: [py-lmdb] Re: py-lmdb write performance

On Thu, May 29, 2014 at 07:46:29AM -0700, Dinesh Vadhia wrote:

> - Next, one machine writes each dictionary data to lmdb on filesystem
> across network which takes ~2.5 hours per dictionary.

It sounds like you are still opening an LMDB database over a networked
filesystem.

Just to make this clear: mounting an NFS / SMB / CIFS / Ceph filesystem
then calling lmdb.open("/path/to/that/filesystem") is slow and
fundamentally broken; you should never do it. If you are experiencing
slowness in this configuration, it is because this configuration is slow
and fundamentally broken.

As previously discussed, you should stream the database over the network
using some alternative means to the machine that will open the LMDB
database on its local disk. Alternatively you could export the volume
over e.g. iSCSI or ATAoE, neither of which suffers from the caching and
coherency problems of NFS.
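
As a rough sketch of the streaming approach described above, and not a
recommended design: the producer sends length-prefixed key/value records
over any buffered byte stream (a socket, a pipe through ssh), and the
consumer writes them into an LMDB environment on its own local disk. The
framing format and the names write_record, read_records and
load_into_local_env are assumptions made up for illustration; the only
py-lmdb calls relied on are env.begin(write=True) and txn.put().

```python
import io
import struct

# Hypothetical framing: two big-endian 32-bit lengths, then key and value bytes.
_HEADER = struct.Struct(">II")


def write_record(stream, key, value):
    """Producer side: append one length-prefixed record to a byte stream."""
    stream.write(_HEADER.pack(len(key), len(value)))
    stream.write(key)
    stream.write(value)


def read_records(stream):
    """Consumer side: yield (key, value) pairs until the stream ends."""
    while True:
        header = stream.read(_HEADER.size)
        if not header:
            return  # clean end of stream
        if len(header) < _HEADER.size:
            raise EOFError("truncated record header")
        key_len, value_len = _HEADER.unpack(header)
        key = stream.read(key_len)
        value = stream.read(value_len)
        if len(key) < key_len or len(value) < value_len:
            raise EOFError("truncated record body")
        yield key, value


def load_into_local_env(env, stream):
    """Write every streamed record into an environment opened on local disk.

    `env` is expected to behave like an lmdb.Environment.
    """
    with env.begin(write=True) as txn:
        for key, value in read_records(stream):
            txn.put(key, value)
```

On the receiving machine, env would come from lmdb.open() on a local
path, and stream could be something like sock.makefile("rb"). This loads
everything in one transaction for brevity.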


> Attached are output for dirtybench.py from Windows and Linux.

Are you experiencing the problem on Linux or Windows?

I cannot tell if your host environment is Windows or Linux - I asked for
dirtybench output only from the slow environment.

I really cannot help if you do not answer my questions or pay attention
to my responses.


>   * Are you still using a network filesystem? We already know that is
>     broken
>   * What OS?
>   * What filesystem?
>   * What host machine?
>   * Does your job start fast, and then slow down? If so, is your
>     dataset larger than RAM?
>   * Are there any other users of the machine that might cause it to be
>     slow?
>   * How large are your transactions? (how many records / how many GB).
>   * Have you tried splitting your writes into smaller txns?

You did not answer these questions.
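
For the last question in that list, splitting writes into smaller
transactions might look roughly like the sketch below. The helper names
batched and write_in_batches, and the default batch size, are
placeholders invented for this example and not tuned values; env is
assumed to be an lmdb.Environment opened on a local disk.

```python
from itertools import islice


def batched(items, batch_size):
    """Yield successive lists of at most batch_size items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def write_in_batches(env, items, batch_size=10000):
    """Commit (key, value) pairs in many small write transactions.

    Returns the number of records written. The default batch size is an
    arbitrary placeholder.
    """
    committed = 0
    for batch in batched(items, batch_size):
        # Each `with` block is one write transaction, committed on exit.
        with env.begin(write=True) as txn:
            for key, value in batch:
                txn.put(key, value)
        committed += len(batch)
    return committed
```

Timing a run at a few different batch sizes would show whether
transaction size is related to the slowdown at all.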


