This is a reduced test case from production code; we noticed that looping over a large list of filenames was taking a long time, so we decided to dig deeper. I tested this with LuaJIT 2.0.2 and 2.1, on Linux x86_64.

I'm attaching two Lua files.

gen.lua generates a 56MB file with 1 million lines. It can generate the file in one of two formats that differ only in the last few characters of each line (corresponding lines have the same length in both formats). Run as:

    luajit gen.lua 1 > /tmp/file1
    luajit gen.lua 2 > /tmp/file2

wc.lua counts lines in stdin, similarly to running the Unix command "wc -l". Run as:

    luajit wc.lua < /tmp/file1
    luajit wc.lua < /tmp/file2

Running wc.lua on the first file (where lines end in JPEG.1) takes 0.4 seconds on my machine. Running wc.lua on the second file (where lines end in 1.JPEG) takes over 4 seconds (it's faster with LuaJIT 2.0.2, at 4.2 seconds, than with LuaJIT 2.1, at 5.3 seconds).

Moreover, if we allocate a lot of memory on the heap before the loop (a 10 million entry, integer-keyed, integer-valued table), the first file takes a bit longer (3.2 seconds) whereas the second file... well, I interrupted it after 30 seconds.

I suspect this has something to do with the way strings are hashed, but I haven't dug any deeper.

Thanks,
-Tudor.
Attachment: gen.lua
Attachment: wc.lua
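
For readers without the attachments, here is a minimal sketch of a line counter along the lines of wc.lua. It is an assumption about the script's shape, not the attached file itself: the names count_lines and timed_count are hypothetical, and the timing wrapper is added only to make the two runs easy to compare.

```lua
-- Hypothetical stand-in for the attached wc.lua; the real script may differ.

-- Count the items produced by a line iterator such as io.lines().
local function count_lines(next_line)
  local n = 0
  for _ in next_line do
    n = n + 1
  end
  return n
end

-- Time one counting pass; os.clock() reports CPU time in seconds.
local function timed_count(next_line)
  local start = os.clock()
  local n = count_lines(next_line)
  return n, os.clock() - start
end

-- Small in-memory demonstration with three made-up lines.
local sample = "a.JPEG.1\nb.JPEG.1\nc.JPEG.1\n"
local n, seconds = timed_count(sample:gmatch("[^\n]*\n"))
print(n, seconds)
```

To reproduce the measurements above, replace the in-memory iterator with io.lines() and redirect /tmp/file1 or /tmp/file2 into stdin. Each line read this way becomes a new Lua string, which is the point at which string hashing would come into play.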