Re: Data-dependent slowdown in loop involving io.lines()

  • From: Юрий Соколов <funny.falcon@xxxxxxxxx>
  • To: luajit@xxxxxxxxxxxxx
  • Date: Fri, 7 Nov 2014 15:23:45 +0400

Mike, do you still think that the oversimplified string hash function is
sufficient?

2014-11-07 4:24 GMT+03:00 Tudor Bosman <tudorb@xxxxxxxxx>:

> This is a reduced test case from production code; we noticed that looping
> over a large list of filenames was taking a long time, so we decided to dig
> deeper. I tested this with LuaJIT 2.0.2 and 2.1, on Linux x86_64.
>
> I'm attaching two Lua files.
>
> gen.lua generates a 56MB file with 1 million lines. It can generate the
> file in one of two formats that only differ in the last few characters on
> each line (corresponding lines are of the same length in both formats). Run
> as luajit gen.lua 1 > /tmp/file1, luajit gen.lua 2 > /tmp/file2.
>
> wc.lua counts lines in stdin, similarly to running the Unix command "wc
> -l". Run as luajit wc.lua < /tmp/file1, luajit wc.lua < /tmp/file2.
>
> Running wc.lua on the first file (where lines end in JPEG.1) takes 0.4
> seconds on my machine. Running wc.lua on the second file (where lines end
> in 1.JPEG) takes over 4 seconds: 4.2 seconds with LuaJIT 2.0.2 and 5.3
> seconds with LuaJIT 2.1.
>
> Moreover, if we allocate a lot of memory on the heap before the loop (a 10
> million entry, integer-keyed, integer-valued table), the first file takes a
> bit longer (3.2 seconds) whereas the second file... well, I interrupted
> after 30 seconds.
>
> I suspect this has something to do with the way strings are hashed, but I
> haven't dug any deeper.
>
> Thanks,
> -Tudor.
>
>
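
To make the suspicion about string hashing concrete, here is a minimal
sketch of how a hash that samples only a few fixed positions can collapse a
whole family of file names into one bucket. Everything in it is an
assumption for illustration: sparse_hash, make_name, count_distinct and the
/srv/images/ prefix are hypothetical, the name layout only mimics the
"JPEG.1" versus "1.JPEG" endings described above, and the hash is not
LuaJIT's actual function. Since every line returned by io.lines() is
interned as a new string, strings that share one hash value would all land
in the same chain of the string table, which is one plausible explanation
for the slowdown.

```lua
-- Hypothetical sparse hash: mixes the length plus four 4-byte windows
-- (start, quarter, middle, end). NOT LuaJIT's real hash function.
local function sparse_hash(s)
  local len = #s
  local h = len
  local starts = { 1, math.floor(len / 4), math.floor(len / 2), len - 3 }
  for _, start in ipairs(starts) do
    for i = start, start + 3 do
      local b = s:byte(i) or 0          -- out-of-range positions count as 0
      h = (h * 31 + b) % 4294967296     -- keep the value within 32 bits
    end
  end
  return h
end

-- Hypothetical file names: a 37-byte constant prefix, then either
-- "JPEG.<id>" (style 1) or "<id>.JPEG" (style 2). Both are 48 bytes long.
local prefix = "/srv/images/" .. string.rep("d", 25)

local function make_name(i, style)
  local id = string.format("%06d", i)
  if style == 1 then
    return prefix .. "JPEG." .. id      -- varying digits sit in the end window
  end
  return prefix .. id .. ".JPEG"        -- varying digits sit in no window
end

-- Count how many distinct hash values n generated names produce.
local function count_distinct(style, n)
  local seen, distinct = {}, 0
  for i = 1, n do
    local h = sparse_hash(make_name(i, style))
    if not seen[h] then
      seen[h] = true
      distinct = distinct + 1
    end
  end
  return distinct
end

print(count_distinct(1, 1000))  --> 1000
print(count_distinct(2, 1000))  --> 1
```

In this sketch the first style spreads 1000 names over 1000 hash values,
while the second maps all 1000 names to a single value, because the only
bytes that differ between names are never sampled. A hash that reads every
byte would avoid this at the cost of more work per string.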
