Added assembler-optimized INT64 byte swapping; here are the latest results (using Visual C++ 6.0 Standard Edition on an AMD Athlon XP 2200+ @ 1800 MHz):

NOTE: USING x86 inline assembly swap functions!
testing B_HOST_TO_LENDIAN_* ...
testing B_LENDIAN_TO_HOST_* ...
testing B_HOST_TO_BENDIAN_* ...
testing B_BENDIAN_TO_HOST_* ...
Correctness test complete. Now doing speed testing....
B_SWAP_INT16 exercise took 343ms to complete.
B_SWAP_INT32 exercise took 312ms to complete.
B_SWAP_INT64 exercise took 359ms to complete.
B_SWAP_FLOAT exercise took 812ms to complete.
B_SWAP_DOUBLE exercise took 1562ms to complete.

---

NOTE: Using unoptimized C++ swap functions.
testing B_HOST_TO_LENDIAN_* ...
testing B_LENDIAN_TO_HOST_* ...
testing B_HOST_TO_BENDIAN_* ...
testing B_BENDIAN_TO_HOST_* ...
Correctness test complete. Now doing speed testing....
B_SWAP_INT16 exercise took 500ms to complete.
B_SWAP_INT32 exercise took 593ms to complete.
B_SWAP_INT64 exercise took 1000ms to complete.
B_SWAP_FLOAT exercise took 1109ms to complete.
B_SWAP_DOUBLE exercise took 1875ms to complete.
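
For anyone who wants to see what a 64-bit swap looks like without inline assembly, here is a rough portable sketch. It is not the code that was benchmarked above, and it is not necessarily how B_SWAP_INT64 is implemented; the names SwapInt64Portable and CheckSwapInt64Portable are hypothetical, made up for this illustration. It assumes a compiler that provides <cstdint>.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from the benchmarked code): reverse the byte
// order of a 64-bit value using only shifts and masks.
static inline uint64_t SwapInt64Portable(uint64_t v)
{
   // Step 1: swap each pair of adjacent bytes.
   v = ((v & 0x00FF00FF00FF00FFULL) << 8)  | ((v >> 8)  & 0x00FF00FF00FF00FFULL);
   // Step 2: swap each pair of adjacent 16-bit words.
   v = ((v & 0x0000FFFF0000FFFFULL) << 16) | ((v >> 16) & 0x0000FFFF0000FFFFULL);
   // Step 3: swap the two 32-bit halves.
   return (v << 32) | (v >> 32);
}

// Hypothetical self-check: swapping twice must give back the original value.
static inline void CheckSwapInt64Portable()
{
   const uint64_t original = 0x0102030405060708ULL;
   assert(SwapInt64Portable(SwapInt64Portable(original)) == original);
}
```

Calling SwapInt64Portable(0x0102030405060708ULL) returns 0x0807060504030201ULL. Three shift-and-mask passes are used rather than a byte-by-byte loop so that the function has no branches or memory accesses; whether that actually beats a given compiler's output for a loop would need to be measured, as in the timings above.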