Given that everything is JITted, it might not be that difficult to calculate relative memory addresses. If I told my lua_state at start that its base allocation was at 0xfffffffffffffffff000000000, it could continue to use only 31 bits but calculate all its loads and stores as base + offset. This would take me to the moon.

On Wed, Jan 22, 2014 at 5:06 PM, Dan Eloff <dan.eloff@xxxxxxxxx> wrote:

> With 32 cores on a box and 32 corresponding lua interpreters, we're
> limited to 2^31/2^5 bytes of addressable space (2^26 -> 64MB). That can go
> fast.
>
> That's exactly why I went to the trouble of implementing a custom mmap as
> well. Expanding it to the full 32 bits is probably the low-hanging fruit,
> and it would buy us a couple more years at the rate processor core counts
> are going up (maybe by then Mike would have found time/sponsors for the new
> garbage collector).
>
> Another option is to use separate processes to encapsulate the LuaJITs,
> but then you have to mess with shared-memory data structures for sharing
> data between Lua states. That's painful, especially as many data structure
> implementations don't permit that. (Or you can use something slower for
> IPC, but that's not an option for me.) It gets worse, though, because each
> LuaJIT has access to huge amounts of memory in these data structures (64GB
> in testing, but over 1TB in practice). Duplicating the page tables for
> that 32 times is both massively slow and massively wasteful. And it's
> only going to get worse as Moore's law marches on.

-- 
Theo Schlossnagle
http://omniti.com/is/theo-schlossnagle
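The base + offset idea at the top of this message can be sketched in C. This is a minimal illustration under stated assumptions, not LuaJIT's actual GC reference scheme: `arena_t`, `gcref_t`, `arena_init`, `arena_alloc`, `ref_to_ptr`, `ptr_to_ref` and `shared_budget` are hypothetical names, and `malloc` stands in for a custom mmap placed at a chosen base address. Each state stores 31-bit offsets and adds its own base on every access, so each state gets a private 2^31-byte window instead of a slice of one shared window.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch: references are 31-bit offsets from a per-state base,
 * not absolute pointers. This is NOT LuaJIT's real GC reference layout. */
typedef uint32_t gcref_t;                     /* only the low 31 bits are used */

#define OFFSET_BITS 31
#define MAX_WINDOW ((size_t)1 << OFFSET_BITS) /* 2^31 bytes per state */

typedef struct {
    uint8_t *base;  /* start of this state's private address window */
    size_t size;    /* window size in bytes, at most MAX_WINDOW */
    size_t used;    /* bump-allocation cursor (offset of next free byte) */
} arena_t;

/* Assumption: malloc stands in for an mmap at a chosen base address. */
static bool arena_init(arena_t *a, size_t size) {
    if (size < 8 || size > MAX_WINDOW) return false;
    a->base = malloc(size);
    a->size = size;
    a->used = 8;    /* reserve offset 0 so that 0 can mean "null reference" */
    return a->base != NULL;
}

static void arena_destroy(arena_t *a) {
    free(a->base);
    a->base = NULL;
}

/* Bump-allocate n bytes, rounded up to 8-byte alignment;
 * returns 0 when the window is full. */
static gcref_t arena_alloc(arena_t *a, size_t n) {
    size_t need = (n + 7) & ~(size_t)7;
    if (need > a->size - a->used) return 0;
    gcref_t ref = (gcref_t)a->used;
    a->used += need;
    return ref;
}

/* Every load/store resolves a reference as base + offset. */
static void *ref_to_ptr(const arena_t *a, gcref_t ref) {
    assert(ref != 0 && ref < a->used);
    return a->base + ref;
}

/* Inverse mapping: absolute pointer back to a 31-bit offset. */
static gcref_t ptr_to_ref(const arena_t *a, const void *p) {
    return (gcref_t)((const uint8_t *)p - a->base);
}

/* Per-state budget when n_states share ONE absolute 31-bit address space. */
static size_t shared_budget(size_t n_states) {
    return MAX_WINDOW / n_states;
}
```

`shared_budget(32)` reproduces the 2^31/2^5 = 2^26 bytes (64MB) per-state figure from the quoted text; with a separate base per state, each arena could instead grow to `MAX_WINDOW` bytes, at the cost of one extra add on every dereference.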