Mike, do you still think that oversimplified string hash function is
sufficient?

2014-11-07 4:24 GMT+03:00 Tudor Bosman <tudorb@xxxxxxxxx>:
> This is a reduced test case from production code; we noticed that looping
> over a large list of filenames was taking a long time, so we decided to dig
> deeper. I tested this with LuaJIT 2.0.2 and 2.1, on Linux x86_64.
>
> I'm attaching two Lua files.
>
> gen.lua generates a 56MB file with 1 million lines. It can generate the
> file in one of two formats that only differ in the last few characters on
> each line (corresponding lines are of the same length in both formats). Run
> as luajit gen.lua 1 > /tmp/file1, luajit gen.lua 2 > /tmp/file2.
>
> wc.lua counts lines in stdin, similarly to running the Unix command
> "wc -l". Run as luajit wc.lua < /tmp/file1, luajit wc.lua < /tmp/file2.
>
> Running wc.lua on the first file (where lines end in JPEG.1) takes 0.4
> seconds (on my machine). Running wc.lua on the second file (where lines end
> in 1.JPEG) takes over 4 seconds (it's faster with LuaJIT 2.0.2, 4.2
> seconds, than with LuaJIT 2.1, 5.3 seconds).
>
> Moreover, if we allocate a lot of memory on the heap before the loop (a 10
> million entry, integer-keyed, integer-valued table), the first file takes a
> bit longer (3.2 seconds) whereas the second file... well, I interrupted
> after 30 seconds.
>
> I suspect this has something to do with the way strings are hashed, but I
> haven't dug any deeper.
>
> Thanks,
> -Tudor.
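
The attachments are not reproduced in this message, so here is a minimal sketch of what the two scripts described above might look like. It is not Tudor's code. The names make_line and count_lines and the body of each line are assumptions for illustration; only the two suffixes (JPEG.1 and 1.JPEG) and the equal line lengths come from the report. It may or may not show the same timings.

```lua
-- Hypothetical stand-in for the attached gen.lua and wc.lua.
-- ASSUMPTION: the line body below is invented; the report only states that
-- the two formats differ in the last few characters and have equal length.

-- Build line number i in format 1 (ends in "JPEG.1") or format 2 (ends in "1.JPEG").
local function make_line(i, format)
  local body = string.format("/tmp/dataset/images/file_%08d.", i) -- assumed body
  local suffix = (format == 1) and "JPEG.1" or "1.JPEG"
  return body .. suffix
end

-- Count the items produced by a line iterator, similar to "wc -l".
-- Each line the iterator returns is a new Lua string, so it is interned
-- (and therefore hashed) as it is read.
local function count_lines(iter)
  local n = 0
  for _ in iter do
    n = n + 1
  end
  return n
end

-- Generator usage (gen.lua role):
--   local format = tonumber(arg[1]) or 1
--   for i = 1, 1000000 do io.write(make_line(i, format), "\n") end
-- Counter usage (wc.lua role):
--   print(count_lines(io.lines()))
```

Timing the counter over both generated files, for example with the shell's time command, is the comparison the report describes.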