Re: Data-dependent slowdown in loop involving io.lines()

  • From: Florian Weimer <fw@xxxxxxxxxxxxx>
  • To: luajit@xxxxxxxxxxxxx
  • Date: Sat, 08 Nov 2014 23:29:58 +0100

* Tudor Bosman:

> Couldn't LuaJIT, for example, do full (dense) hashing for strings of length
> of less than 100 or so bytes, and assume that all longer strings are data
> and therefore likely to be unique (and intern them without any hashing /
> comparison check)? Is there an assumption that two strings are equal if
> *and only if* their interned ids are the same?

Once you have uninterned strings anywhere, you cannot use address
comparisons to check for string equality anymore, slowing everything
down.

Maybe it is possible to use a different collision resolution strategy
(e.g., a crit-bit tree per hash bucket).  That would only affect
string interning, and would not create changes throughout the VM.  I
think that in terms of performance, its performance impact would be
mostly restricted to strings whose hashes collide (where the benefits
would greatly outweight the costs), although there will be the
occasional hash bucket collision between strings of different hash
value.  All this seems to involve very difficult trade-offs.

Other related posts: