Peter Colberg wrote:
> The reason for not unrolling manually is to have dimension-independent
> code, suitable for both two- and three-dimensional systems of particles.

To achieve consistently high performance you'll need to unroll those
small vector operations by hand. See the previous posts about tuning
GSL on the Lua mailing list (it uses a template pre-processor to
automate that).

> I was actually surprised to see that the bound checks have no
> influence on the performance (maybe the above “benchmark” is too
> trivial and flawed...). Is ABCelim in lj_opt_fold.c responsible for
> eliminating such bound checks?

Bounds checks use the integer units of the CPU, whereas the actual
computations use the floating-point units. With a super-scalar
out-of-order CPU, the integer overhead is completely hidden.

--Mike
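
As an illustration of the unrolling advice above, here is a minimal Lua sketch (not code from this thread). The names dot_loop, dot2, dot3 and make_dot are hypothetical. make_dot only stands in for the idea of a template pre-processor: it generates the unrolled source for a given dimension and compiles it once, so the calling code can stay dimension-independent. It is not the pre-processor referred to above.

```lua
-- Generic version: works for any dimension, but the inner loop has a
-- very small trip count (2 or 3).
local function dot_loop(a, b, dim)
  local s = 0
  for i = 1, dim do
    s = s + a[i] * b[i]
  end
  return s
end

-- Hand-unrolled versions: one function per dimension, no inner loop.
local function dot2(a, b)
  return a[1] * b[1] + a[2] * b[2]
end

local function dot3(a, b)
  return a[1] * b[1] + a[2] * b[2] + a[3] * b[3]
end

-- Hypothetical generator (an assumption for illustration): builds the
-- unrolled source text for `dim` components and compiles it once.
local compile = loadstring or load  -- loadstring in Lua 5.1/LuaJIT, load in 5.2+

local function make_dot(dim)
  local terms = {}
  for i = 1, dim do
    terms[i] = ("a[%d] * b[%d]"):format(i, i)
  end
  local src = "return function(a, b) return "
    .. table.concat(terms, " + ") .. " end"
  return assert(compile(src))()
end

local dot = make_dot(3)
print(dot({1, 2, 3}, {4, 5, 6}))  --> 32
```

Choosing the dimension once at start-up (make_dot(2) or make_dot(3)) leaves the rest of the simulation code unchanged. Whether this pays off would have to be measured on the actual workload.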