Re: Performance of ffi.new v/s malloc() arrays

From: Sean Conner <sean@xxxxxxxxxx>
To: luajit@xxxxxxxxxxxxx
Date: Sun, 19 Feb 2017 22:34:22 -0500

It was thus said that the Great Ammar Hakim once stated:

> Hi All,
>
> I have been working on a computational physics code that mostly uses
> LuaJIT, with a few pieces written in C. The code performs very well; in
> fact, for the solution of some equations it is actually 3x faster than
> the corresponding Fortran code (not written by me).
>
> Anyway, I have found a strange issue on my Mac. Basically, we deal with
> huge arrays and hence have to use malloc/calloc to manage the memory
> ourselves. However, it seems that code working on ffi.new() arrays
> performs differently from code working on malloc/calloc-allocated
> fields. I don't mean the allocator efficiency, which is not a big deal
> as most fields we allocate live for the lifetime of the application.
>
> I managed to boil the problem down to the example pasted below.
> If I run it with the "useFFIAlloc" flag set to "true" the code runs about
> 3x faster on my Mac! This is with LJ 2.1 beta2. The difference seems to
> be much smaller on a Linux box, but I have not tested on Linux
> extensively (I do most of my dev work on a Mac). If anyone has any ideas
> or can point to something I am not doing properly, that would be great.

  Try testing with malloc() and see if the performance goes up.
calloc() zeros out the memory while malloc() doesn't.  There's an
optimization under Linux where, if you allocate a large enough block (via
malloc() or calloc()), the allocator pulls the memory directly from the
operating system, which ensures the pages are already filled with zeros,
so calloc() can skip a potentially expensive write operation.
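
  A minimal sketch of how the two allocators could be compared in plain C,
outside of LuaJIT.  The names time_alloc() and touch_and_sum() are
hypothetical helpers made up for this illustration, not part of the original
test case, and clock() measures CPU time only coarsely, so treat the numbers
as a rough indication rather than a benchmark:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Write every element once, then read it back into a running sum so the
 * compiler cannot discard the stores.  (Hypothetical helper.) */
static double touch_and_sum(double *a, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        a[i] = (double)i;   /* the first write to a page faults it in */
        sum += a[i];
    }
    return sum;
}

/* Allocate n doubles with calloc() when use_calloc is non-zero, otherwise
 * with malloc(); touch the whole block once and free it.  Returns the
 * elapsed CPU time in seconds, or -1.0 if the allocation failed.
 * (Hypothetical helper; allocation time is included on purpose.) */
static double time_alloc(int use_calloc, size_t n, double *sum_out)
{
    assert(sum_out != NULL);

    clock_t start = clock();
    double *a = use_calloc ? calloc(n, sizeof *a) : malloc(n * sizeof *a);
    if (a == NULL)
        return -1.0;

    *sum_out = touch_and_sum(a, n);
    free(a);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

  Calling time_alloc(0, n, &sum) and time_alloc(1, n, &sum) from main() with
a large n and printing both results would show whether calloc() costs
anything extra on a given platform.  If the two times are close, the zeroing
is probably not what separates the two cases.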

  -spc

References:
- Performance of ffi.new v/s malloc() arrays
  - From: Ammar Hakim
