[py-lmdb] Re: Fw: Re: py-lmdb write performance

  • From: David Wilson <dw@xxxxxxxx>
  • To: py-lmdb@xxxxxxxxxxxxx
  • Date: Thu, 29 May 2014 12:25:17 +0000

Hey Dinesh,

On Thu, May 29, 2014 at 02:15:06AM -0700, Dinesh Vadhia wrote:

> - on Windows, the data.mdb is created with size=map_size
> - on Linux, data.mdb is created with size ~12K irrespective of map_size.
> Once db population starts, map_size disk space is allocated.
> 
> If so, then why is it taking 2.5 hours to write ~1gb of data?

Are you using writemap=True? On Linux and Windows, writemap=True relies
on sparse files. The "size" reported by Windows/Linux may be confusing,
depending on which tool you use, as some will report "allocated size"
and others "logical size".

E.g. the Linux "du" command will report allocated size, while the "ls"
command will report logical size.

IIRC Windows explorer will report logical size, whereas DOS "dir"
command will report allocated size.
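
To illustrate the du/ls distinction, here is a quick sparse-file demo (this assumes a Linux filesystem that supports sparse files, e.g. ext4; the filename is arbitrary):

```shell
# Create a 1 GiB sparse file, then compare logical vs allocated size.
truncate -s 1G sparse.mdb
ls -l sparse.mdb    # logical size: 1073741824 bytes
du -k sparse.mdb    # allocated size: close to 0, since no blocks were written
```

The same effect explains why a freshly created data.mdb with a large map_size can show a tiny size in du while ls reports the full map_size.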

Additionally, even when writemap=False, LMDB holds some state in memory
while a transaction is alive, so the disk usage may not be reflected
until .commit() is called.


Regarding your performance problem, you are not providing nearly enough
information for me to help you.
    * What OS?
    * What filesystem?
    * What host machine?
    * Is it a VM?
    * Have you run your script under 'cProfile --sort=cum' and verified
      LMDB is really the cause?
    * Are you still using a network filesystem? We already know that is
      broken.
    * Does your job start fast, and then slow down? If so, is your
      dataset larger than RAM?
    * What kind of disks are you writing to?
    * Are there any other users of the machine that might cause it to be
      slow?
    * How large are your transactions? (how many records / how many GB).
    * Have you tried splitting your writes into smaller txns?
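
On that last point, here is a minimal sketch of batched writes. A plain dict stands in for the LMDB environment so the example is self-contained; with py-lmdb, each batch would instead run inside its own env.begin(write=True) block:

```python
from itertools import islice

def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

# Dict standing in for the environment; record values are illustrative.
store = {}
commits = 0
records = [(b"key-%d" % i, b"value-%d" % i) for i in range(10_000)]

for batch in chunked(records, 1_000):
    for key, value in batch:
        store[key] = value
    commits += 1  # one commit per batch instead of one huge transaction
```

Smaller transactions bound the dirty-page set LMDB must track per commit, which is often the difference between a fast job and one that degrades as the transaction grows.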

I have updated examples/dirtybench.py in the Git repository to work on
Windows. Please provide dirtybench.py output from your host environment
where the slowness is being observed.
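
For the cProfile check mentioned above, something along these lines works (demo.py here is just a stand-in for your writer script):

```shell
# Self-contained demo of cProfile's cumulative sort; substitute
# your own writer script for demo.py.
cat > demo.py <<'EOF'
import time
def work():
    time.sleep(0.01)
for _ in range(5):
    work()
EOF
python -m cProfile -s cumulative demo.py | head -15
```

If LMDB calls do not dominate the cumulative column, the slowness lies elsewhere in the script.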


David
> 
> 
> --------------------------------------------------
> From: "Dinesh Vadhia" <dineshvadhia@xxxxxxxxxxx>
> Sent: Wednesday, May 28, 2014 10:15 AM
> To: <py-lmdb@xxxxxxxxxxxxx>
> Subject: Re: [py-lmdb] Re: py-lmdb write performance
> 
> >Looks like the (centos-based) cluster is not creating the lmdb db with the
> >correct map_size.  A small map_size works fine but larger ones (eg. >
> >30gb) creates a 12K db!  Not sure what is going on but the admins are
> >looking into it.  Not sure if it is an lmdb or OS problem yet.
> >
> >
> >--------------------------------------------------
> >From: "David Wilson" <dw@xxxxxxxx>
> >Sent: Wednesday, May 28, 2014 5:57 AM
> >To: <py-lmdb@xxxxxxxxxxxxx>
> >Subject: [py-lmdb] Re: py-lmdb write performance
> >
> >>That's about 120kb/sec, which doesn't sound right.
> >>
> >>On Wed, May 28, 2014 at 05:48:47AM -0700, Dinesh Vadhia wrote:
> >>>Generated sorted keys to create local dictionaries on each cluster
> >>>machine;
> >>>next, one machine merges each dictionary in sorted order into db; but it
> >>>still takes ~2.5 hours to write ~1gb of data with append=True;
> >>>doesn't sound right does it or does it?
> >>
> >>
> 
