Oh! Maybe there is a problem. Reading a dictionary (d) and writing each key/value pair to the db with:

    for key, value in d.items():
        with env.begin(db, write=True) as txn:
            txn.put(key=key, value=value, append=True)

and reading it back with:

    with env.begin(db) as txn:
        value = txn.get(key)

But value is None for all keys. The dictionary has been checked and it has valid keys and values.
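One likely cause, assuming LMDB's standard MDB_APPEND semantics: put() with append=True requires every key to sort after the previously written one, and a plain dict does not iterate in sorted order, so out-of-order puts fail. If py-lmdb follows LMDB here, the failed put() returns False rather than raising, which would leave the database empty and get() returning None. A minimal pure-Python sketch of that ordering rule (no lmdb needed; append_ok is a hypothetical helper mimicking the check, not a py-lmdb API):

```python
def append_ok(pairs):
    """Mimic LMDB's MDB_APPEND rule: each key must sort after the last one."""
    last = None
    for key, _ in pairs:
        if last is not None and key <= last:
            # LMDB would reject this put; py-lmdb's put() would return False.
            return False
        last = key
    return True

d = {'b': '2', 'a': '1', 'c': '3'}
print(append_ok(sorted(d.items())))  # True: sorted keys satisfy append mode
```

Checking put()'s return value, or writing sorted(d.items()) instead of d.items(), would confirm or rule this out.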
Using Python 2.7.5 on a Windows box. What am I missing?

--------------------------------------------------
From: "David Wilson" <dw@xxxxxxxx>
Sent: Tuesday, May 27, 2014 7:17 AM
To: <py-lmdb@xxxxxxxxxxxxx>
Subject: [py-lmdb] Re: py-lmdb write performance
Eek! What binding version / Python version / OS? That is very broken.

David

On Tue, May 27, 2014 at 06:58:48AM -0700, Dinesh Vadhia wrote:

Do you see what is wrong with this put code?

    def put(env, db, append=False, key, value):
        with env.begin(db, write=True) as txn:
            txn.put(key, value, append=False)
        return

    put(env=env, db=db, append=True, key='a', value='b')

    TypeError: put() got an unexpected keyword argument 'append'

--------------------------------------------------
From: "David Wilson" <dw@xxxxxxxx>
Sent: Tuesday, May 27, 2014 6:18 AM
To: <py-lmdb@xxxxxxxxxxxxx>
Subject: [py-lmdb] Re: py-lmdb write performance

> Hi Dinesh,
>
> Your "divide and conquer" approach sounds interesting. In fact, assuming
> the merge step is literally just combining the partitions into one
> master database without any extra processing, LMDB includes a special
> 'append' mode that would speed this operation up.
>
> A nice side effect of this approach is that the final database becomes
> optimally packed in the merge step, since it is written sequentially.
>
> Perhaps something like:
>
>     def sorted_union(i1, i2):
>         i1 = iter(i1)
>         i2 = iter(i2)
>         e1 = next(i1, None)
>         e2 = next(i2, None)
>         while e1 and e2:
>             if e1 <= e2:
>                 yield e1
>                 e1 = next(i1, None)
>             else:
>                 yield e2
>                 e2 = next(i2, None)
>
>         # At most one side still has elements; drain it.
>         for e, it in (e1, i1), (e2, i2):
>             if e:
>                 yield e
>             for e in it:
>                 yield e
>
>     def iterate_remote_db(num):
>         """Do whatever necessary to call Cursor.iternext() on the remote
>         database, returning an iterable of (key, value) pairs"""
>
>     # Build a recursive union of all the cursor iterators
>     merged = iter_local_db()
>     for num in range(NUM_REMOTE_DBS):
>         merged = sorted_union(merged, iterate_remote_db(num))
>
>     # Write sequentially to the final DB
>     with master_env.begin(write=True) as txn:
>         curs = txn.cursor()
>         curs.putmulti(merged, append=True)
>
> David
>
> On Tue, May 27, 2014 at 05:57:53AM -0700, Dinesh Vadhia wrote:
>> The problem to solve is to create a very large db (> 1tb) of synthetic
>> data using a cluster of machines. Once created, the db will be accessed
>> by one machine only for predominantly read-only use. The filesystem is
>> network attached.
>>
>> One method to create the db is for each machine to create a dictionary
>> of data and save it on the filesystem - this is pretty fast. Next, get
>> one machine (only) to write each dictionary data to the db. One machine
>> writing to lmdb on filesystem across a network should be okay but
>> slow - yes?
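The merge step above can be exercised without LMDB. The sketch below repeats sorted_union with the `e`/`elem` names made consistent (the quoted version mixed them up) and feeds it two small made-up sorted lists; since (key, value) tuples are always truthy, the `while e1 and e2` exhaustion test is safe for real cursor data:

```python
def sorted_union(i1, i2):
    """Merge two sorted iterables into one sorted stream."""
    i1 = iter(i1)
    i2 = iter(i2)
    e1 = next(i1, None)
    e2 = next(i2, None)
    while e1 and e2:
        if e1 <= e2:
            yield e1
            e1 = next(i1, None)
        else:
            yield e2
            e2 = next(i2, None)
    # At most one side still has elements; drain it.
    for e, it in (e1, i1), (e2, i2):
        if e:
            yield e
        for e in it:
            yield e

a = [('a', 1), ('c', 3), ('e', 5)]
b = [('b', 2), ('d', 4)]
print(list(sorted_union(a, b)))
# [('a', 1), ('b', 2), ('c', 3), ('d', 4), ('e', 5)]
```

Because the merged stream arrives in sorted key order, it is exactly what Cursor.putmulti(..., append=True) needs for the fast sequential-write path.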