Re: How come simple "closures" aren't compiled?

From: Las <lasssafin@xxxxxxxxx>
To: luajit@xxxxxxxxxxxxx
Date: Sun, 4 Dec 2016 22:30:14 +0100

For reference:

local clock = os.clock;

local n = 0;
local now = clock();
for i = 1, 2^24 do
    n = (function(i) return i + 2 end)(n);
end

now = clock() - now;
print("Time for 1: " .. now);

local n = 0;
local now = clock();
for i = 1, 2^24 do
    n = n + 2;
end

now = clock() - now;
print("Time for 2: " .. now);

This code prints:
Time for 1: 1.104
Time for 2: 0.023

I don't know whether to be impressed that the JIT compiler compiles
functions so quickly, or disappointed that it can't compile the loop
containing 'fn' efficiently.

BTW, this was run on an AMD Athlon X4 760K clocked at 4.1 GHz.
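Since the anonymous function in test 1 captures no upvalues, one workaround is to create it once, outside the loop, so the loop body contains only a call and no FNEW. A minimal sketch, assuming the same 2^24 iterations; the name add2 is illustrative and no timings for this variant were measured in the thread:

```lua
local clock = os.clock

-- The function captures no upvalues, so it can be created once here;
-- the loop body then performs only a call instead of FNEW + call.
local add2 = function(i) return i + 2 end

local n = 0
local now = clock()
for i = 1, 2^24 do
    n = add2(n)
end
now = clock() - now
print("Time for hoisted closure: " .. now)
```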

Las

On 4 December 2016 at 22:24, Las <lasssafin@xxxxxxxxx> wrote:

That's a very good idea!
But I'd probably use _[index] for arguments instead of 'a', 'b', etc.
(Somewhat related: I really hate the Lua designers' aversion to complex
syntax. Why shouldn't I be able to write {...}[2] instead of resorting to
'select'?)
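For what it's worth, Lua's grammar only rejects indexing a bare table constructor; a parenthesized expression is a prefix expression and can be indexed. A small sketch comparing the two forms (the function names are illustrative):

```lua
-- {...}[2] is a syntax error, but ({...})[2] is valid Lua.
local function second_via_table(...)
    return ({...})[2]          -- allocates a new table on every call
end

local function second_via_select(...)
    return (select(2, ...))    -- outer parentheses keep only the first result
end

print(second_via_table("a", "b", "c"))   --> b
print(second_via_select("a", "b", "c"))  --> b
```

The select form avoids the table allocation, which matters for the JIT discussion in this thread.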

Then the question is whether the JIT compiler can compile it efficiently.

The code:
local memo = {} -- memoization table
function fn (expr)
   if memo[expr] == nil then
      local code = ("return function (...) _ = {...}; return %s end"):format(expr)
      memo[expr] = assert(loadstring(code))()
   end
   return memo[expr]
end

local n = 0;
for i = 1, 2^16 do
    -- 1
    n = fn("_[1] + 2")(n);
    -- 2
    n = n + 2;
end
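Note that the generated code assigns _ = {...} without "local", so every call writes the global _; the TNEW, HSTORE and TBAR instructions in the first IR listing below appear to correspond to that table creation and global store. A sketch of the same helper with _ made local, under the assumption that the global was unintended. It still allocates one table per call, and whether LuaJIT can sink that allocation was not tested in this thread. The alias "compile" is an addition for portability, not part of the original:

```lua
-- loadstring exists in LuaJIT/Lua 5.1; Lua 5.2+ uses load for strings.
local compile = loadstring or load

local memo = {} -- memoization table
local function fn(expr)
    local f = memo[expr]
    if f == nil then
        -- "local _" keeps the argument table out of the global environment
        local code = ("return function (...) local _ = {...}; return %s end"):format(expr)
        f = assert(compile(code))()
        memo[expr] = f
    end
    return f
end

local n = 0
for i = 1, 2^16 do
    n = fn("_[1] + 2")(n)
end
print(n)  --> 131072
```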

With variant 2 commented out, the generated IR is:
0036 ------ LOOP ------------
0037 >  p32 UREFO  test.lua:2  #0
0038 >  p32 EQ     0037  0000
0039 >  tab TNEW   #3    #0
0040    p32 FLOAD  0039  tab.array
0041    p32 AREF   0040  +1
0042    num ASTORE 0041  0033
0043    tab HSTORE 0028  0039
0044    nil TBAR   0024
0045  + num ADD    0033  +2
0046  + int ADD    0034  +1
0047 >  int LE     0046  +65536
0048    int PHI    0034  0046
0049    num PHI    0033  0045

With variant 1 commented out, the generated IR is:
0006 ------ LOOP ------------
0007  + num ADD    0003  +2
0008  + int ADD    0004  +1
0009 >  int LE     0008  +65536
0010    int PHI    0004  0008
0011    num PHI    0003  0007

Obviously, the first variant isn't very fast.

Both variants were timed like this, commenting out one line or the other:
local n = 0;
local now = clock();
for i = 1, 2^24 do
    --n = fn("_[1] + 2")(n);
    n = n + 2;
end

now = clock() - now;

Time for 1: 1.081
Time for 2: 0.022

Your approach was ingenious, but sadly the JIT compiler is too stupid for it.

Las

On 4 December 2016 at 20:58, Luke Gorrie <luke@xxxxxxxx> wrote:

On 4 December 2016 at 18:18, Las <lasssafin@xxxxxxxxx> wrote:

Yeah, that's what I meant.

Oh :). Well, I needed the practice at explaining how a tracing JIT
operates anyway. It starts to sound comical saying "Just read Thomas
Schilling's PhD thesis and you will have some initial idea..." too often.

I know it can compile calls, but it can't compile the FNEW and UCLO
bytecodes, even for simple functions.
I'm asking because such "closures" can obviously be used to simplify
APIs.

I see. Yes, in a perfect world the JIT would compile the closure creation
(and perhaps even sink the allocation). I suppose one compensation is that
the loop inside the foreach() function could still be compiled, and only
the caller would suffer from the NYI.
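To illustrate that split, here is a sketch with a hypothetical foreach(t, f); its real definition is not shown in this thread, so the semantics (replace each t[i] with f(i, t[i])) are an assumption based on the call quoted below:

```lua
-- Hypothetical foreach: assumed to set t[i] = f(i, t[i]) for the array part.
local function foreach(t, f)
    for i = 1, #t do
        -- This inner loop can still be traced: f already exists,
        -- so the loop only performs calls, not closure creation.
        t[i] = f(i, t[i])
    end
    return t
end

local t = {1, 2, 3}
-- The closure is created (FNEW) here, in the caller, each time this line
-- runs; in a hot caller loop that is the part hitting the NYI.
foreach(t, function(i, n) return n * 2 end)
```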

One workaround could be to invent a new formulation that has the runtime
behavior you want, even if it doesn't use the traditional syntax.

How about if you would replace the original code:

foreach(t, function(i, n) return n * 2 end)

with an alternative:

foreach(t, fn[[b*2]])

which JITs efficiently and does not create new closures. It could be
implemented as:

-- fn(expr): Create a closure that returns the value of <expr>.
--           expr is a Lua expression with up to five arguments (a, b, c, d, e).
--
-- Example: fn[[a*b]](21,2) => 42
local memo = {} -- memoization table
function fn (expr)
   if memo[expr] == nil then
      local code = ("return function (a,b,c,d,e) return %s end"):format(expr)
      memo[expr] = assert(loadstring(code))()
   end
   return memo[expr]
end

-- Example:
local acc = 0
for i = 1, 100 do
   acc = acc + fn[[a*b]](21,2)
end
assert(acc == 42*100)
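Because of the memo table, repeated calls with the same expression string return the identical function object, so no new closure is created per call (a string lookup in memo still happens each time). A self-contained sketch combining fn with a hypothetical foreach, assumed to set t[i] = f(i, t[i]) since its real definition is not in this thread; the alias "compile" is added for portability:

```lua
-- loadstring exists in LuaJIT/Lua 5.1; Lua 5.2+ uses load for strings.
local compile = loadstring or load

local memo = {} -- memoization table
local function fn(expr)
    if memo[expr] == nil then
        local code = ("return function (a,b,c,d,e) return %s end"):format(expr)
        memo[expr] = assert(compile(code))()
    end
    return memo[expr]
end

-- Hypothetical foreach (not defined in this thread): t[i] = f(i, t[i]).
local function foreach(t, f)
    for i = 1, #t do
        t[i] = f(i, t[i])
    end
    return t
end

-- Same string, same function object.
assert(fn[[b*2]] == fn[[b*2]])

local t = {1, 2, 3}
foreach(t, fn[[b*2]])  -- a receives the index, b the value
```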
