2014-09-10 22:31 GMT+02:00 pdziepak-github.memcpy-v2 <community@xxxxxxxxxxxx>:
> added 2 changesets to branch 'refs/remotes/pdziepak-github/memcpy-v2'
> old head: bb159c73fb99410fb591ee25b1fe0b92cd5870a5
> new head: c4c07587ea65bf0a7854ca91204324301b72e8df
> overview: https://github.com/pdziepak/Haiku/compare/bb159c7...c4c0758
>
> ----------------------------------------------------------------------------
>
> f4c3362: kernel/x86_64: save fpu state at interrupts
>
> The kernel is allowed to use the FPU anywhere, so we must save the FPU
> state at interrupt entry to make sure that user state is not clobbered.
> There is no need to do that for system calls, since all FPU data
> registers are caller-saved.
>
> We do not, however, need to save the whole FPU state at task switch
> (again, thanks to the calling convention). Only the status and control
> registers are preserved. This patch does add xmm0-15 to the clobber
> list of the task switch code, but the only reason for that is to make
> sure that nothing bad happens inside the function that performs the
> task switch. Inspection of the generated code shows that no xmm
> registers are actually saved.
>

I admit, this is a mess. The next thing I am definitely going to do for Haiku is rewrite the kernel entry/exit code.

> Signed-off-by: Paweł Dziepak <pdziepak@xxxxxxxxxxx>
>
> c4c0758: libroot/x86_64: new memcpy implementation
>
> This patch introduces a new memcpy() implementation that improves
> performance when the buffer is small. It was written for processors
> that support ERMSB, but performs reasonably well on older CPUs too.
>
> The following benchmarks were done on a Haswell i7 running Debian Jessie
> with Linux 3.16.1. In each iteration a 64MB buffer was copied; the
> parameter "size" is the size of the buffer passed in a single call (i.e.
> for "size: 2" memcpy() was called ~32 million times to copy the whole
> 64MB).
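The measurement method described above can be illustrated with a small harness. This is a minimal sketch under stated assumptions, not the author's benchmark (his actual code is the gist linked in this message); the names `kTotalBytes`, `count_calls`, `copy_in_chunks` and `delta_percent` are hypothetical. It assumes "64MB" means 64 MiB, so for size 2 the buffer takes 2^25 = 33,554,432 calls, roughly the "~32 million" quoted. It also assumes the ∆ column is (f − g) / f expressed in percent, which reproduces every row of the tables, e.g. (79971 − 20419) / 79971 ≈ 74.47%.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstring>
#include <vector>

// Assumption: "64MB" in the commit message means 64 MiB.
constexpr std::size_t kTotalBytes = std::size_t{64} * 1024 * 1024;

// Number of memcpy() calls needed to copy kTotalBytes in `size`-byte chunks.
std::size_t count_calls(std::size_t size) {
    assert(size > 0 && kTotalBytes % size == 0);
    return kTotalBytes / size;
}

// Hypothetical harness: copies `src` into `dst` with one memcpy() per
// `size`-byte chunk and returns the elapsed wall-clock time in microseconds.
long long copy_in_chunks(std::vector<unsigned char>& dst,
                         const std::vector<unsigned char>& src,
                         std::size_t size) {
    assert(dst.size() == src.size() && size > 0 && src.size() % size == 0);
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t off = 0; off < src.size(); off += size)
        std::memcpy(dst.data() + off, src.data() + off, size);
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(end - start)
        .count();
}

// Relative improvement of g over f in percent (assumed meaning of "∆").
double delta_percent(double f_us, double g_us) {
    return (f_us - g_us) / f_us * 100.0;
}
```

A real benchmark would repeat each measurement and warm up the buffers first; this sketch only shows how the chunking and the ∆ arithmetic fit together.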
>
> f - original implementation, g - new implementation, all buffers 16-byte
> aligned
>
> cpy, size:    8, f: 79971 µs, g: 20419 µs, ∆: 74.47%
> cpy, size:   32, f: 42068 µs, g: 12159 µs, ∆: 71.10%
> cpy, size:  128, f: 13408 µs, g: 10359 µs, ∆: 22.74%
> cpy, size:  512, f: 10634 µs, g: 10433 µs, ∆:  1.89%
> cpy, size: 1024, f: 10474 µs, g: 10536 µs, ∆: -0.59%
> cpy, size: 4096, f:  9419 µs, g:  8630 µs, ∆:  8.38%
>
> f - glibc 2.19 implementation, g - new implementation, all buffers 16-byte
> aligned
>
> cpy, size:    8, f: 26299 µs, g: 20919 µs, ∆: 20.46%
> cpy, size:   32, f: 11146 µs, g: 12159 µs, ∆: -9.09%
> cpy, size:  128, f: 10778 µs, g: 10354 µs, ∆:  3.93%
> cpy, size:  512, f: 12291 µs, g: 10426 µs, ∆: 15.17%
> cpy, size: 1024, f: 13923 µs, g: 10571 µs, ∆: 24.08%
> cpy, size: 4096, f: 11770 µs, g:  8671 µs, ∆: 26.33%
>
> f - glibc 2.19 implementation, g - new implementation, all buffers
> unaligned
>
> cpy, size:   16, f: 13376 µs, g: 13009 µs, ∆:  2.74%
> cpy, size:   32, f: 11130 µs, g: 12171 µs, ∆: -9.35%
> cpy, size:   64, f: 11017 µs, g: 11231 µs, ∆: -1.94%
> cpy, size:  128, f: 10884 µs, g: 10407 µs, ∆:  4.38%
> cpy, size:  256, f: 10826 µs, g: 10106 µs, ∆:  6.65%
> cpy, size:  512, f: 12354 µs, g: 10396 µs, ∆: 15.85%
>

If anyone is interested, this is the code used for the measurements (both memcpy and memset): https://gist.github.com/pdziepak/2ae4e5ea88e8477d136a

> Signed-off-by: Paweł Dziepak <pdziepak@xxxxxxxxxxx>
>
> [ Paweł Dziepak <pdziepak@xxxxxxxxxxx> ]
>
> ----------------------------------------------------------------------------
>
> 11 files changed, 244 insertions(+), 67 deletions(-)
>
>  headers/private/kernel/arch/x86/64/cpu.h     |  10 +-
>  headers/private/kernel/arch/x86/64/iframe.h  |   1 +
>  headers/private/kernel/arch/x86/arch_cpu.h   |  11 +-
>  src/system/kernel/arch/x86/64/arch.S         |  29 -----
>  src/system/kernel/arch/x86/64/interrupts.S   |  67 ++++++++--
>  src/system/kernel/arch/x86/64/thread.cpp     |  31 ++---
>  src/system/kernel/arch/x86/arch_cpu.cpp      |   5 +-
>  src/system/kernel/arch/x86/arch_thread.cpp   |   4 +
>  .../kernel/arch/x86/arch_user_debugger.cpp   |  23 +++-
>  src/system/kernel/arch/x86/asm_offsets.cpp   |   1 +
>  .../posix/string/arch/x86_64/arch_string.cpp | 129 ++++++++++++++++++-
>
> <snip, full diff>
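The commit message describes the new memcpy() only at a high level. For readers curious how a copy routine can favour small buffers while leaving large ones to ERMSB, here is a minimal sketch of one common approach. It is not the code from c4c0758, whose diff is snipped above; the name `sketch_memcpy` and the 16-byte threshold are hypothetical. Sizes up to 16 bytes are copied as two possibly overlapping fixed-size chunks (the first and the last bytes), which avoids a per-byte loop. Larger sizes use `rep movsb` on x86-64 with GCC or Clang, which is fast on CPUs that advertise ERMSB (CPUID leaf 7, EBX bit 9), and plain std::memcpy elsewhere. The sketch does not check the CPUID bit: `rep movsb` is correct on any x86-64 CPU, only its speed depends on ERMSB.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copy exactly N bytes. With N known at compile time, compilers normally
// emit a single (possibly unaligned) load/store pair here; a real libc
// would write the loads explicitly instead of calling memcpy itself.
template <std::size_t N>
static inline void copy_fixed(unsigned char* d, const unsigned char* s) {
    std::memcpy(d, s, N);
}

// Hypothetical small-size-first memcpy; not the Haiku implementation.
void* sketch_memcpy(void* dst, const void* src, std::size_t n) {
    auto* d = static_cast<unsigned char*>(dst);
    auto* s = static_cast<const unsigned char*>(src);
    const auto da = reinterpret_cast<std::uintptr_t>(d);
    const auto sa = reinterpret_cast<std::uintptr_t>(s);
    // memcpy() requires non-overlapping buffers (checked in debug builds).
    assert(n == 0 || da + n <= sa || sa + n <= da);

    if (n > 16) {
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
        // rep movsb copies RCX bytes from [RSI] to [RDI]. The SysV ABI
        // guarantees the direction flag is clear, so the copy runs forward.
        void* dp = d;
        const void* sp = s;
        std::size_t count = n;
        __asm__ volatile("rep movsb"
                         : "+D"(dp), "+S"(sp), "+c"(count)
                         :
                         : "memory");
#else
        std::memcpy(d, s, n);
#endif
    } else if (n >= 8) {
        // 8..16 bytes: first 8 and last 8 bytes; the two chunks may overlap.
        copy_fixed<8>(d, s);
        copy_fixed<8>(d + n - 8, s + n - 8);
    } else if (n >= 4) {
        // 4..7 bytes: same trick with 4-byte chunks.
        copy_fixed<4>(d, s);
        copy_fixed<4>(d + n - 4, s + n - 4);
    } else if (n >= 2) {
        // 2..3 bytes: same trick with 2-byte chunks.
        copy_fixed<2>(d, s);
        copy_fixed<2>(d + n - 2, s + n - 2);
    } else if (n == 1) {
        *d = *s;
    }
    return dst;
}
```

The overlapping-chunk trick relies on memcpy()'s non-overlap requirement: the second store rewrites some bytes the first one already wrote, but with the same values read from the unchanged source.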