[haiku-development] Re: Tricks to get SSE working??

  • From: Fredrik Holmqvist <fredrik.holmqvist@xxxxxxxxx>
  • To: haiku-development@xxxxxxxxxxxxx
  • Date: Mon, 26 Mar 2012 22:00:03 +0200

On 26 March 2012 at 21:37, Axel Dörfler <axeld@xxxxxxxxxxxxxxxx> wrote:
> On 26.03.2012 21:15, Clemens Zeidler wrote:
>> shouldn't memcpy use the fastest method by itself? or is that not
>> efficiently possible?
>
>
> It already does so at boot time. However, it might not be the optimal method
> for your machine (in which case the general mechanism should be improved) or
> your use case.
> In case of the app_server, I'm not sure whether a dedicated memcpy() is the
> correct solution. I guess we should have a benchmark for this, so that we
> can decide which memcpy() versions end up in our kernel.

At the moment we use the standard rep movs[l,b] copying. As far as I
know, gcc doesn't try to replace this by default, because it trusts
glibc to do it fast.

The interesting thing is that we have a cpu_module which allows
setting other implementations of memcpy and memset depending on CPU
detection, at least for x86.

The default memcpy is here:
http://haiku.it.su.se:8180/source/xref/src/system/kernel/arch/x86/arch_x86.S#179

And the checking for optimized version for the cpu is here:
http://haiku.it.su.se:8180/source/xref/src/system/kernel/arch/x86/arch_cpu.cpp#839
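
As a rough illustration of that kind of selection (this is not the
actual Haiku code; every name below, including the feature flag, is
made up for the example), the idea is a function pointer that starts
out at the generic copy and is switched once after CPU detection:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical signature shared by all copy implementations. */
typedef void *(*memcpy_fn)(void *dest, const void *src, size_t n);

/* Plain byte loop, standing in for the default rep movs version. */
static void *generic_memcpy(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dest;
}

/* Placeholder for a CPU-specific version (e.g. an SSE copy loop).
 * It just forwards to the C library here so the sketch stays portable. */
static void *optimized_memcpy(void *dest, const void *src, size_t n)
{
    return memcpy(dest, src, n);
}

/* Hypothetical feature bit; a real kernel would derive it from CPUID. */
#define CPU_FEATURE_SSE2 (1u << 0)

/* The active implementation; the generic one until detection has run. */
static memcpy_fn active_memcpy = generic_memcpy;

/* Called once at boot with the detected feature bits. */
void select_memcpy(unsigned features)
{
    active_memcpy = (features & CPU_FEATURE_SSE2)
        ? optimized_memcpy
        : generic_memcpy;
}

/* Every caller goes through the pointer and gets the chosen version. */
void *dispatch_memcpy(void *dest, const void *src, size_t n)
{
    return active_memcpy(dest, src, n);
}
```

The indirect call costs a little on every copy, which is one reason to
benchmark before deciding which versions are worth carrying.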

I've played around with this a bit, but since my asm optimization
knowledge dates from the 486/Pentium era, I need to learn how to write
fast assembly for modern CPUs. If anyone is interested in playing with
this, libmicro can help with testing and give you an idea of how your
code performs. For my experiments I used a script that ran a few of
its tests.
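
For a quick first comparison before setting up libmicro, a timing loop
along these lines may be enough. This is only a sketch and not part of
libmicro; the function name and parameters are invented for the
example, and clock() is too coarse for small sizes unless the
iteration count is large:

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical signature for the copy routine under test. */
typedef void *(*copy_fn)(void *dest, const void *src, size_t n);

/* Copies `size` bytes `iterations` times (iterations >= 1) and returns
 * the elapsed CPU time in seconds, or -1.0 if allocation failed or the
 * destination does not match the source afterwards. */
double time_copy(copy_fn fn, size_t size, int iterations)
{
    unsigned char *src = malloc(size);
    unsigned char *dst = malloc(size);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }
    memset(src, 0xA5, size);

    clock_t start = clock();
    for (int i = 0; i < iterations; i++)
        fn(dst, src, size);
    clock_t end = clock();

    /* Check the result so a broken copy is not reported as a fast one. */
    int ok = memcmp(dst, src, size) == 0;
    free(src);
    free(dst);
    return ok ? (double)(end - start) / CLOCKS_PER_SEC : -1.0;
}
```

Calling time_copy(memcpy, 4096, 1000000) and then the same with a
candidate routine gives a first impression; results vary a lot with
buffer size and alignment, so it is worth trying several of each.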

/Fredrik Holmqvist, TQH
