Re: alleviate the load of the GC

  • From: Stefano <phd.st.p@xxxxxxxxx>
  • To: luajit@xxxxxxxxxxxxx
  • Date: Wed, 2 Sep 2015 14:30:23 +0100

On 2 September 2015 at 10:49, Laurent Deniau <Laurent.Deniau@xxxxxxx> wrote:

Motivation:
Overloading other operators (beyond syntactic sugar) enables
optimisations such as sharing temporaries, building lazy expressions, or
flattening expressions, which greatly reduce the load on the GC. C++
libraries have used these kinds of optimisations successfully for
decades. To be a bit provocative, overloading operators makes little
sense, beyond basic syntactic sugar, without being able to overload '='.
The operator '=' should be seen as a component of the expression (or
statement), not as a special case, and there is no reason not to be
able to overload it.

Experience:
I have observed with LuaJIT (and other GC-ed languages, including C/C++
with BoehmGC) that I can get a speed-up of 10x to 100x when I can
properly manage and share temporaries within expressions, but this
requires some _local_ "finalisation" when the result is assigned
somewhere. The 'where' is not important (local, global, key, whatever);
what matters is knowing whether the user can reuse the result (the
resulting temporary has been assigned) or not (no access to the
temporary), that is, whether the result is semantically anchored.
Without the reuse of temporaries, the GC gets quadratically slower
(especially with large objects that LuaJIT does not sink).

Best,
Laurent.

I believe that what you are referring to is mainly the application of
expression template techniques as in the Eigen C++ matrix/vector
algebra library (http://eigen.tuxfamily.org/).
By overloading the arithmetic and assignment operators it is possible
to write vector expressions such as:
x += y + z + w;
that get 'expanded' at compile time into a single loop, with no need
for temporaries.
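
As a rough illustration of the technique (this is not Eigen's actual
implementation), here is a minimal C++ sketch of expression templates.
The names Vec, Sum and fused_update_demo are hypothetical and only
serve the example. operator+ builds a lightweight node that holds
references to its operands, and operator+= evaluates the whole tree
element by element in a single loop:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Lazy node for "lhs + rhs": stores references only, allocates nothing.
// It must be consumed within the statement that created it, because the
// inner nodes are temporaries.
template <typename L, typename R>
struct Sum {
    const L& lhs;
    const R& rhs;
    double operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
    std::size_t size() const { return lhs.size(); }
};

// Hypothetical vector type used only for this sketch.
struct Vec {
    std::vector<double> data;
    explicit Vec(std::size_t n, double v = 0.0) : data(n, v) {}
    double operator[](std::size_t i) const { return data[i]; }
    std::size_t size() const { return data.size(); }

    // Evaluates any expression tree in one pass, with no temporary vector.
    template <typename E>
    Vec& operator+=(const E& e) {
        assert(e.size() == size());
        for (std::size_t i = 0; i < data.size(); ++i) data[i] += e[i];
        return *this;
    }
};

// Vec + Vec creates a leaf-level node.
inline Sum<Vec, Vec> operator+(const Vec& a, const Vec& b) { return {a, b}; }

// (expression) + Vec nests the existing node instead of evaluating it.
template <typename L, typename R>
Sum<Sum<L, R>, Vec> operator+(const Sum<L, R>& a, const Vec& b) {
    return {a, b};
}

// Usage: the statement below runs a single loop over the four vectors.
inline double fused_update_demo() {
    Vec x(4, 1.0), y(4, 2.0), z(4, 3.0), w(4, 4.0);
    x += y + z + w;
    return x[0];
}
```

The type of y + z + w here is Sum<Sum<Vec, Vec>, Vec>, so the compiler
sees the full expression and can inline it into the loop of operator+=.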

Previous versions of my sci.alg (http://scilua.org/sci_alg.html)
module employed similar techniques and you could write:
x:set(x + y + z + w)
I tested different implementations, both via cdata and tables, that
relied heavily on allocation sinking to achieve good performance.
Micro-benchmarks (like iterating the above a thousand times) showed
competitive performance; in some benchmarks it was even faster than
Eigen.
However, when employed in more complex simulations it ended up putting
too much of a burden on the LuaJIT compiler, so I have ditched the idea
for now (I suspect further LuaJIT optimisations and hyperblock
scheduling could make it work).

For now I am resorting to plain old loops for expressions like the
above, but I am working on extending the Lua syntax to allow for
things like:
x[] = x[] + y[] + z[] + w[]
The way this is implemented is via code-transform: a new Lua file with
inlined expressions is generated and executed.
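
To make the idea of such a source-to-source transform concrete, here is
a small C++ sketch that rewrites one statement in the bracket syntax
into a plain Lua loop. It is only a guess at the general shape, not the
actual transform: the function name expand_elementwise, the loop
variable i, and the use of the length of the first array (#x) as the
loop bound are all assumptions made for the example.

```cpp
#include <regex>
#include <string>

// Rewrites "x[] = x[] + y[]" into an explicit Lua loop over i.
// Assumption: every "name[]" denotes the i-th element of that array, and
// the first array named in the statement determines the loop bound.
inline std::string expand_elementwise(const std::string& stmt) {
    static const std::regex slot(R"(([A-Za-z_]\w*)\[\])");

    std::smatch first;
    if (!std::regex_search(stmt, first, slot)) {
        return stmt;  // no "name[]" slots: leave the statement untouched
    }
    const std::string target = first[1].str();

    // Turn every "name[]" into "name[i]".
    const std::string body = std::regex_replace(stmt, slot, "$1[i]");

    return "for i = 1, #" + target + " do\n  " + body + "\nend\n";
}
```

Calling expand_elementwise("x[] = x[] + y[] + z[] + w[]") returns the
text of a three-line Lua loop whose body is
x[i] = x[i] + y[i] + z[i] + w[i], with no intermediate vectors.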

Apologies if I am off-topic; it seemed to me that this is what you
wanted to achieve, and I wanted to share my (long) experience and
conclusions.

Btw, your proposal would introduce issues with function calls, for
instance (would __assign be called for function arguments?)

Stefano
