[haiku-commits] Re: BRANCH pdziepak-github.scheduler [8ec8973] src/system/kernel/arch/x86 headers/private/kernel src/system/boot/platform/bios_ia32 src/add-ons/kernel src/system/kernel/arch/x86/32

  • From: Pawel Dziepak <pdziepak@xxxxxxxxxxx>
  • To: haiku-commits@xxxxxxxxxxxxx
  • Date: Wed, 2 Oct 2013 21:01:50 +0200

2013/10/2 pulkomandy <pulkomandy@xxxxxxxxxxxxx>:
>> > +nextPowerOf2(uint32 v)
>> [...]
>> > +countSetBits(uint32 v)
>>
>> Maybe put those in some header for reuse?
>
> It may be better to use GCC builtins when available (check for
> #ifdef __GNUC__):
> http://gcc.gnu.org/onlinedocs/gcc-4.4.2/gcc/Other-Builtins.html
>
> __builtin_popcount will replace countSetBits.
> There is no direct version of 'nextPowerOf2', but __builtin_clz (count leading
> zeros) may be used:
> http://locklessinc.com/articles/next_pow2/
>
> Since the builtins resolve to dedicated asm instructions, this should result
> in smaller and faster code whenever possible, and to an algorithm similar to
> the one you used in other cases.

I should have expected they have some builtins for x86 instructions like bsr.

The article about finding the next power of 2 shows that using
__builtin_clz (which on x86 is translated to the bsr instruction)
produces slower code than the 'bit hack' with assignments, ors and
shifts. Because of that, and the fact that __builtin_clz alone is not
enough to compute the next power of 2, I don't really see any point in
using it.
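
For reference, the two approaches being compared look roughly like the
sketch below. This is not the code from the commit: the function names
next_pow2_bithack and next_pow2_clz are hypothetical, uint32_t stands in
for Haiku's uint32, and a 32-bit unsigned int is assumed. The guards in
the clz variant illustrate why __builtin_clz alone is not enough: its
result is undefined for 0, and a shift by 32 would be undefined too.

```c
#include <stdint.h>

/* 'Bit hack': copy the highest set bit of (v - 1) into every lower
 * position, then add one. Returns 0 for v == 0 and for v > 2^31. */
static inline uint32_t
next_pow2_bithack(uint32_t v)
{
	v--;
	v |= v >> 1;
	v |= v >> 2;
	v |= v >> 4;
	v |= v >> 8;
	v |= v >> 16;
	v++;
	return v;
}

#ifdef __GNUC__
/* Builtin variant (GCC/Clang only). The explicit guards keep the
 * results identical to the bit hack for the edge cases. */
static inline uint32_t
next_pow2_clz(uint32_t v)
{
	if (v <= 1)
		return v;	/* __builtin_clz(0) is undefined */
	if (v > UINT32_C(0x80000000))
		return 0;	/* result would not fit; avoid shifting by 32 */
	return UINT32_C(1) << (32 - __builtin_clz(v - 1));
}
#endif
```

Which variant is actually faster depends on the CPU and compiler flags,
so it would need to be measured on the target.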

The situation with __builtin_popcount isn't much better. Unless GCC is
allowed to use the popcnt instruction (which was introduced in SSE4), it
is replaced by a call to a helper function and as a result it is slower
than the 'bit hack'.
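
A conditional use of the builtin could be sketched as follows. Again,
this is only an illustration, not the code from the commit: the names
popcount_bithack and popcount_select are hypothetical, and the sketch
assumes that GCC defines __POPCNT__ when it is allowed to emit popcnt
(for example with -mpopcnt or -msse4.2).

```c
#include <stdint.h>

/* 'Bit hack': sum bits in parallel within 2-, 4- and 8-bit fields,
 * then add the four byte counts together with a multiply. */
static inline uint32_t
popcount_bithack(uint32_t v)
{
	v = v - ((v >> 1) & 0x55555555u);
	v = (v & 0x33333333u) + ((v >> 2) & 0x33333333u);
	v = (v + (v >> 4)) & 0x0f0f0f0fu;
	return (v * 0x01010101u) >> 24;
}

/* Use the builtin only when the compiler may emit popcnt directly;
 * otherwise fall back to the bit hack instead of a helper call. */
static inline uint32_t
popcount_select(uint32_t v)
{
#if defined(__GNUC__) && defined(__POPCNT__)
	return (uint32_t)__builtin_popcount(v);
#else
	return popcount_bithack(v);
#endif
}
```

With default flags on 32-bit x86 the fallback branch is the one that
gets compiled, which matches the behaviour described above.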

I am going to, as Axel suggested, move those functions to kernel/util
since it is not the first place in the kernel where they are used (and
certainly not the last). However, I don't really think that
unconditional use of builtins is a good idea.

Paweł
