Re: memory allocation policy

  • From: Laurent Deniau <Laurent.Deniau@xxxxxxx>
  • To: "<luajit@xxxxxxxxxxxxx>" <luajit@xxxxxxxxxxxxx>
  • Date: Fri, 26 Sep 2014 06:05:30 +0000

On Sep 26, 2014, at 6:09 AM, Юрий Соколов 
<funny.falcon@xxxxxxxxx> wrote:

> Have you tried some kind of "autorelease" pool: the allocation function puts
> all allocated objects into a list (or their pointers into an array),

They are usually implemented as stacks (growing dynamic arrays), since their 
purpose is to provide a user-defined stack of objects whose release is 
deferred. They are also an elegant library solution for managing stack 
unwinding with exceptions (i.e. non-local jumps).
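
A minimal sketch of such a stack-based pool, in plain Lua. The names Pool, 
mark, autorelease and drain are hypothetical and chosen for illustration only; 
objects are assumed to provide a release() method. It only shows how draining 
back to a saved mark also cleans up after a non-local jump:

```lua
-- Hypothetical autorelease pool: a growing array used as a stack.
local Pool = {}
Pool.__index = Pool

function Pool.new()
  return setmetatable({ objs = {}, top = 0 }, Pool)
end

-- Remember the current stack height; drain() unwinds back to it.
function Pool:mark()
  return self.top
end

-- Defer the release of obj (assumed to have a release() method).
function Pool:autorelease(obj)
  self.top = self.top + 1
  self.objs[self.top] = obj
  return obj
end

-- Release everything pushed since the mark, most recent first.
function Pool:drain(mark)
  for i = self.top, mark + 1, -1 do
    self.objs[i]:release()
    self.objs[i] = nil
  end
  self.top = mark
end

-- Usage: the error skips the rest of the function, but the pool still
-- knows which objects were created since the mark.
local released = {}
local function newobj(name)
  return {
    name = name,
    release = function(self) released[#released + 1] = self.name end,
  }
end

local pool = Pool.new()
local m = pool:mark()
local ok = pcall(function()
  pool:autorelease(newobj("a"))
  pool:autorelease(newobj("b"))
  error("non-local jump")
end)
pool:drain(m)
```

The mark is what makes the pool usable across nested scopes: each scope drains 
only what it pushed itself.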

> when the calculation is finished,

From the point of view of the library, how do you know that the calculation is 
finished?

> you mark the result as "needed", then call "free all objects in the list
> that aren't marked as needed".

Semantically, it is no different from the "get" below, just more conservative. 
You can use autorelease pools inside the library to group the "get" over a 
set of statements, but this is not their main purpose, and the result may be 
very conservative (like the GC) or error-prone.

> It is how people in Objective C used to simplify reference counting for
> years: the autorelease pool just decrements the reference count, and
> incrementing the reference count once more works as "mark as needed".
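
The scheme described in the quote can be sketched roughly as follows, in plain 
Lua. All names (new, retain, release, autorelease, drain) are hypothetical and 
only mimic the Objective-C vocabulary; this is an illustration of the idea, 
not the Objective-C runtime:

```lua
local freed = {}

local function new(name)
  return { name = name, rc = 1 }   -- the creator owns one reference
end

local function retain(obj)         -- plays the role of "mark as needed"
  obj.rc = obj.rc + 1
  return obj
end

local function release(obj)
  obj.rc = obj.rc - 1
  if obj.rc == 0 then freed[#freed + 1] = obj.name end
end

local pool = {}
local function autorelease(obj)    -- defer one release() until drain()
  pool[#pool + 1] = obj
  return obj
end

local function drain()
  for i = #pool, 1, -1 do
    release(pool[i])
    pool[i] = nil
  end
end

-- Usage: both objects are autoreleased, only the result is retained.
local tmp    = autorelease(new("tmp"))
local result = retain(autorelease(new("result")))
drain()
-- tmp has been freed; result survives with one remaining reference.
```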

I have implemented a DSL in C similar to Objective-C (but better ;-) called 
the C Object System, which uses autorelease pools intensively. They are very 
useful, but the class of problems they solve is different: it is an explicit, 
deferred, non-local release.

Best,
Laurent.

On 25.09.2014 at 16:44, "Laurent Deniau" <Laurent.Deniau@xxxxxxx> wrote:
On Sep 24, 2014, at 11:05 PM, Javier Guerra Giraldez 
<javier@xxxxxxxxxxx> wrote:

> On Wed, Sep 24, 2014 at 3:50 PM, Cosmin Apreutesei
> <cosmin.apreutesei@xxxxxxxxx> wrote:
>> I think he means implementing a pool in Lua, so the function you pass
>> to ffi.gc() is Lua all the way down.
>
>
> i think the solution has more than one part (both are in Mike's answer)
>
> 1.- release resources as soon as possible.  The issue (and this
> happens both in Lua and LuaJIT), is that Lua only sees very small
> objects, barely bigger than a pointer; so there's not much pressure to
> collect garbage.  The solution is to add some 'release' method to your
> objects and call them the moment they're not needed.

That was the first observation in my post. Our first attempt was to rely on 
the speed of the LJ GC, which worked until we had to scale to larger problems.

If I could call a release method explicitly, I would not have the problem even 
with a pool managed in C (I consider the cost of ffi calls to be negligible).

An intermediate approach would be to explicitly tag temporaries for immediate 
destruction or stealing, but it does not work without explicit intervention by 
the _user_. The common problem with this approach is:

assume a, b are matrices

c = a*b  -- c refers to a temporary created by *
d = 2*c -- * steals the temporary referenced by c
e = 3*c -- boom, c is not valid

In C++, we can overload operator= (plus the copy constructor, move constructor 
and move assignment) to clear the tmp flag, but not in Lua, AFAIK...
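
To make the failure concrete, here is a small runnable sketch in plain Lua. 
It is an assumption-laden toy: Lua tables stand in for the real matrix 
objects, only scalar * matrix is implemented, and the names mat and Mat are 
hypothetical. Every result of * is flagged tmp, and * steals the buffer of a 
tmp operand:

```lua
local Mat = {}

local function mat(data, tmp)
  return setmetatable({ data = data, tmp = tmp or false }, Mat)
end

-- Scalar * matrix only, to keep the sketch short.
Mat.__mul = function(k, m)
  if type(k) ~= "number" then k, m = m, k end
  local src = m.data
  if src == nil then error("operand was stolen") end
  local dst
  if m.tmp then
    dst = src          -- steal the temporary's buffer and work in place
    m.data = nil       -- the old handle is now invalid
  else
    dst = {}           -- non-temporary operand: allocate a fresh buffer
  end
  for i = 1, #src do dst[i] = k * src[i] end
  return mat(dst, true)  -- every result of * starts life as a temporary
end

local a = mat({ 1, 2, 3 })
local c = 1 * a                                  -- c refers to a temporary
local d = 2 * c                                  -- * steals c's buffer
local ok = pcall(function() return 3 * c end)    -- fails: c is not valid
```

Nothing in the assignment "c = ..." gives the library a chance to clear the 
tmp flag, which is the hook that Lua does not provide.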

A possible (error-prone) solution would be to force the use of:
c = get(a*b)
d = get(2*c)
e = get(3*c)

In LuaJIT, we can use __index to write:
c = (a*b).get
d = (2*c).get
e = (3*c).get

where get is not defined as a field, so the access falls through to __index:
   __index = function (self, key)
     if key == "get" then
       self.tmp = false
       return self
     end
     error(…)
   end
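
A self-contained version of the same idea, as a sketch in plain Lua. A table 
with a metatable stands in for the FFI object here, and the temporary() 
constructor, the value field and the error message are hypothetical additions 
made only so the example runs:

```lua
local mt = {}

-- Called only for keys that are not present in the object itself.
mt.__index = function(self, key)
  if key == "get" then
    self.tmp = false   -- the caller keeps the object: no longer a temporary
    return self
  end
  error("unknown field: " .. tostring(key))
end

-- Hypothetical constructor: every new object starts as a temporary.
local function temporary(value)
  return setmetatable({ value = value, tmp = true }, mt)
end

local c = temporary(42).get   -- reads like a field access, clears the flag
```

The weakness stays the same as with get(...): forgetting ".get" is silent, 
and the object remains flagged as a temporary.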


> 2.- (de)allocators are slow, and LuaJIT doesn't compile calls to __gc.
> But if you don't wait for the GC to release, you can go much faster.

If I know when to release, I don't have the problem...

> What we do in SnabbSwitch is to allocate a big FFI array and then
> handle freelists (just an array of pointers to the elements).  just
> getting an element from the freelist and returning it later is _much_
> faster than any general-purpose allocator, no garbage is generated and
> there's no fragmentation.

This is what test with R=22 does, but in C.

I don't see the difference between managing it in C or in LuaJIT for this 
purpose. I do see why Mike proposes to manage it in LuaJIT, but that only helps 
if the bottleneck comes from the speed of the interpreter. I suspect that this 
would not kill the performance by a factor of 30; the problem is elsewhere.
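
For reference, the freelist scheme quoted above can be sketched as follows. 
This is plain Lua with preallocated tables standing in for the elements of 
one big FFI array; the names slots, free, alloc and release are hypothetical, 
and it is not the SnabbSwitch code:

```lua
local N = 4

-- Preallocate all elements once; the freelist is an array used as a stack.
local slots, free, nfree = {}, {}, N
for i = 1, N do
  slots[i] = { id = i, payload = 0 }
  free[i] = slots[i]
end

-- Pop an element from the freelist: no allocation, no garbage.
local function alloc()
  if nfree == 0 then return nil end   -- pool exhausted; the caller decides
  local s = free[nfree]
  free[nfree] = nil
  nfree = nfree - 1
  return s
end

-- Push the element back; this must be called explicitly by the user.
local function release(s)
  nfree = nfree + 1
  free[nfree] = s
end

-- Usage: a released element is the next one to be handed out again.
local a = alloc()
local b = alloc()
release(a)
local c = alloc()
```

Both operations are O(1), but release() is still an explicit call, so the 
sketch does not answer the question of when to release.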

Best,
Laurent.


