Eek! What binding version / Python version / OS? That is very broken.


David

On Tue, May 27, 2014 at 06:58:48AM -0700, Dinesh Vadhia wrote:
> Do you see what is wrong with this put code?
>
> def put(env, db, append=False, key, value):
>     with env.begin(db, write=True) as txn:
>         txn.put(key, value, append=False)
>     return
>
> put(env=env, db=db, append=True, key='a', value='b')
>
> TypeError: put() got an unexpected keyword argument 'append'
>
>
>
> --------------------------------------------------
> From: "David Wilson" <dw@xxxxxxxx>
> Sent: Tuesday, May 27, 2014 6:18 AM
> To: <py-lmdb@xxxxxxxxxxxxx>
> Subject: [py-lmdb] Re: py-lmdb write performance
>
> >Hi Dinesh,
> >
> >Your "divide and conquer" approach sounds interesting. In fact, assuming
> >the merge step is literally just combining the partitions into one
> >master database without any extra processing, LMDB includes a special
> >'append' mode that would speed this operation up.
> >
> >A nice side effect of this approach is that the final database becomes
> >optimally packed in the merge step, since it is written sequentially.
> >
> >Perhaps something like:
> >
> >    def sorted_union(i1, i2):
> >        i1 = iter(i1)
> >        i2 = iter(i2)
> >        e1 = next(i1, None)
> >        e2 = next(i2, None)
> >        while e1 and e2:
> >            if e1 <= e2:
> >                yield e1
> >                e1 = next(i1, None)
> >            else:
> >                yield e2
> >                e2 = next(i2, None)
> >
> >        for elem, it in (e1, i1), (e2, i2):
> >            if elem:
> >                yield elem
> >                for elem in it:
> >                    yield elem
> >
> >    def iterate_remote_db(num):
> >        """Do whatever necessary to call Cursor.iternext() on the remote
> >        database, returning an iterable of (key, value) pairs"""
> >
> >    # Build a recursive union of all the cursor iterators
> >    merged = iter_local_db()
> >    for num in range(NUM_REMOTE_DBS):
> >        merged = sorted_union(merged, iterate_remote_db(num))
> >
> >    # Write sequentially to the final DB
> >    with master_env.begin(write=True) as txn:
> >        curs = txn.cursor()
> >        curs.putmulti(merged, append=True)
> >
> >
> >David
> >
> >On Tue, May 27, 2014 at 05:57:53AM -0700, Dinesh Vadhia wrote:
> >>The problem to solve is to create a very large db (> 1tb) of synthetic
> >>data using a cluster of machines. Once created, the db will be accessed
> >>by one machine only for predominantly read-only use. The filesystem is
> >>network attached.
> >>
> >>One method to create the db is for each machine to create a dictionary of
> >>data and save it on the filesystem - this is pretty fast. Next, get one
> >>machine (only) to write each dictionary data to the db. One machine
> >>writing to lmdb on filesystem across a network should be okay but slow -
> >>yes?
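
For reference, the merge step discussed in this thread can probably also be
expressed with the standard library's heapq.merge, which lazily merges any
number of already-sorted inputs. This is only a sketch, not the code from the
thread: the partition lists and the merge_partitions name are made up for
illustration, the keys are assumed to be bytes that are unique across
partitions, and each partition is assumed to be sorted by key already (as a
cursor iteration would be).

```python
import heapq
from operator import itemgetter

# Hypothetical stand-ins for the per-partition iterators of (key, value)
# pairs. Each one must already be sorted by key.
local = [(b"a", b"1"), (b"d", b"4")]
remote_1 = [(b"b", b"2"), (b"e", b"5")]
remote_2 = [(b"c", b"3"), (b"f", b"6")]


def merge_partitions(*partitions):
    """Lazily yield (key, value) pairs from all partitions in key order."""
    # Compare on the key only, so values never take part in the ordering.
    return heapq.merge(*partitions, key=itemgetter(0))


merged = list(merge_partitions(local, remote_1, remote_2))
for key, value in merged:
    print(key, value)
```

In the real case the lazy iterator returned by merge_partitions would be
handed straight to the sequential write step quoted above rather than being
collected into a list.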