Re: Data-dependent slowdown in loop involving io.lines()

  • From: Coda Highland <chighland@xxxxxxxxx>
  • To: luajit@xxxxxxxxxxxxx
  • Date: Sat, 8 Nov 2014 14:51:03 -0800

On Sat, Nov 8, 2014 at 2:29 PM, Florian Weimer <fw@xxxxxxxxxxxxx> wrote:
> * Tudor Bosman:
>
>> Couldn't LuaJIT, for example, do full (dense) hashing for strings of length
>> of less than 100 or so bytes, and assume that all longer strings are data
>> and therefore likely to be unique (and intern them without any hashing /
>> comparison check)? Is there an assumption that two strings are equal if
>> *and only if* their interned ids are the same?
>
> Once you have uninterned strings anywhere, you cannot use address
> comparisons to check for string equality anymore, slowing everything
> down.
>
> Maybe it is possible to use a different collision resolution strategy
> (e.g., a crit-bit tree per hash bucket).  That would only affect
> string interning, and would not create changes throughout the VM.  I
> think that in terms of performance, its performance impact would be
> mostly restricted to strings whose hashes collide (where the benefits
> would greatly outweight the costs), although there will be the
> occasional hash bucket collision between strings of different hash
> value.  All this seems to involve very difficult trade-offs.

I think the conclusion to be made here is that you should drag in
fopen and fread through the FFI and treat the content as char arrays
instead of using lines() if you need that kind of performance.

/s/ Adam

Other related posts: