2013/10/2 pulkomandy <pulkomandy@xxxxxxxxxxxxx>:
>> > +nextPowerOf2(uint32 v)
>> [...]
>> > +countSetBits(uint32 v)
>>
>> Maybe put those in some header for reuse?
>
> It may be better to use GCC builtins when available (check for
> #ifdef __GNUC__):
> http://gcc.gnu.org/onlinedocs/gcc-4.4.2/gcc/Other-Builtins.html
>
> __builtin_popcount will replace countSetBits.
> There is no direct version of 'nextPowerOf2', but __builtin_clz (count
> leading zeros) may be used:
> http://locklessinc.com/articles/next_pow2/
>
> Since the builtins resolve to dedicated asm instructions, this should
> result in smaller and faster code whenever possible, and to an algorithm
> similar to the one you used in other cases.

I should have expected that there are builtins for x86 instructions like
bsr.

The article about finding the next power of 2 shows that using
__builtin_clz (which on x86 is translated to the bsr instruction) produces
slower code than the 'bit hack' with assignments, ORs and shifts. Because
of that, and because __builtin_clz alone is not enough to compute the next
power of 2, I don't really see any point in using it.

The situation with __builtin_popcount isn't much better. Unless GCC is
allowed to use the popcnt instruction (which was introduced with SSE4), the
builtin is replaced by a call to a helper function, and as a result it is
slower than the 'bit hack'.

I am going to, as Axel suggested, move those functions to kernel/util,
since this is not the first place in the kernel where they are used (and
certainly not the last). However, I don't really think that unconditional
use of builtins is a good idea.

Paweł
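
For reference, a minimal sketch of what such shared helpers could look like. This is not the code from the patch: the bodies are the commonly known 'bit hack' versions, the uint32 typedef is only a stand-in for the Haiku type, and the check for __POPCNT__ (which GCC defines when popcnt is enabled, e.g. with -mpopcnt) is an assumption about how a conditional use of the builtin might be written.

```cpp
#include <stdint.h>

// Stand-in for Haiku's uint32 type (assumption, for a self-contained example).
typedef uint32_t uint32;

// Round v up to the next power of 2; a power of 2 is returned unchanged.
// Returns 0 for v == 0 and for v > 2^31 (the result wraps around).
static inline uint32
nextPowerOf2(uint32 v)
{
	v--;
	// Smear the highest set bit into all lower bit positions.
	v |= v >> 1;
	v |= v >> 2;
	v |= v >> 4;
	v |= v >> 8;
	v |= v >> 16;
	v++;
	return v;
}

// Count the bits set in v.
static inline uint32
countSetBits(uint32 v)
{
#if defined(__GNUC__) && defined(__POPCNT__)
	// Only use the builtin when the compiler may emit popcnt directly;
	// otherwise it would turn into a call to a helper function.
	return __builtin_popcount(v);
#else
	// Sum bits in parallel: pairs, then nibbles, then add up the bytes.
	v = v - ((v >> 1) & 0x55555555);
	v = (v & 0x33333333) + ((v >> 2) & 0x33333333);
	return (((v + (v >> 4)) & 0x0f0f0f0f) * 0x01010101) >> 24;
#endif
}
```

With this layout the builtin is used only when it maps to a single instruction, and every other configuration falls back to the shift-and-mask version.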