[haiku-development] Re: Optimizing Painter::_DrawBitmapBilinearCopy32

  • From: Joseph Prostko <joe.prostko+haiku@xxxxxxxxx>
  • To: haiku-development@xxxxxxxxxxxxx
  • Date: Tue, 16 Jun 2009 19:58:22 +0000

On Tue, Jun 16, 2009 at 4:02 PM, Michael Lotz <mmlr@xxxxxxxx> wrote:

> On Mon, Jun 15, 2009 at 8:18 PM, Ingo Weinhold <ingo_weinhold@xxxxxx> wrote:
>> Or FreeBSD. I'm also not aware of any features disabled in Haiku's
>> gcc4, but Michael might know more about it.
>
> No features are disabled. The GCC4 builds of Haiku do not have full
> optimizations enabled, but that doesn't mean that the compiler wouldn't
> provide them. It's a standard build of GCC4.
>
> Regards
> Michael
>

I was going to chime in basically saying the same thing.  By default,
Haiku builds at -O2.  The only things that aren't utilized in the GCC4
builds of Haiku at that optimization level are the strict-aliasing and
tree-vrp optimizations.  Unfortunately, those will have to stay the
same way with the upcoming GCC 4.4.1 due to the nature of the Haiku
code and the quirkiness of the tree-vrp optimization.  That said, they
aren't crippled in GCC at all, just they aren't used when building
Haiku itself.  Architecture-specific optimizations (like -march,
-mtune) should work as expected as well.  Same goes for other
optimizations in general.
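The email doesn't say which parts of Haiku trip up strict aliasing, so the following is only a generic illustration, not taken from Haiku's sources. It shows the assumption GCC's `-fstrict-aliasing` (turned on at -O2) lets the compiler make, plus the usual aliasing-safe way to reinterpret bits. The function names `store_and_reload` and `float_bits` are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Illustrative only, not Haiku code. int and float are incompatible types,
 * so with -fstrict-aliasing GCC may assume `i` and `f` never point at the
 * same memory and return the constant 1 without re-reading *i. Passing two
 * pointers into the same object is undefined behaviour in standard C;
 * -fno-strict-aliasing only tells GCC not to exploit that, so it reloads *i. */
int store_and_reload(int *i, float *f)
{
    *i = 1;
    *f = 0.0f;
    return *i;
}

/* Aliasing-safe way to read a float's bit pattern: copy the bytes instead
 * of casting the pointer. GCC typically reduces this memcpy to a single
 * register move at -O2, so it costs nothing. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

Code that instead reads `*(uint32_t *)&f` is the kind of pattern that forces a project to build with `-fno-strict-aliasing`.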

Actually, due to this thread (and my curiosity), I decided to try my
current GCC 4.4 patch against the most recent GCC 4.5 snapshot, and
Haiku built with no issues.  I tried this since I wanted to see if I
could build Haiku with -march=atom for the Intel Atom in this laptop.
The 4.4 branch may never merge in the changes for Atom, so I
decided to try out 4.5 for the fun of it.  Anyways, Haiku built, but
unfortunately it rebooted before it even got to the boot screen.  I then
tried with -mtune=atom, and that worked just fine (and I'm writing
this email from it).  Also, building with no processor
architecture-specific optimizations worked fine as well, so that is
good to know.  In any case, I won't dwell on that too much, since 4.5
is still in active development.
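The practical difference between the two flags can be checked from inside a program. A minimal sketch, assuming GCC on x86: `-march=atom` lets the compiler emit the Atom's full instruction set and defines the matching feature macros, whereas `-mtune=atom` only tunes scheduling for the Atom and keeps the default instruction set. The function `enabled_simd` and the `SIMD_*` constants are hypothetical names, and the sketch says nothing about why the -march=atom build rebooted.

```c
/* Illustrative sketch: report which x86 SIMD extensions the compiler was
 * allowed to use for this translation unit. GCC derives these macros from
 * -march (or explicit -msse* options); -mtune does not change them.
 * On non-x86 targets all bits stay clear. */
#define SIMD_SSE2  (1u << 0)
#define SIMD_SSE3  (1u << 1)
#define SIMD_SSSE3 (1u << 2)

unsigned enabled_simd(void)
{
    unsigned mask = 0;
#ifdef __SSE2__
    mask |= SIMD_SSE2;
#endif
#ifdef __SSE3__
    mask |= SIMD_SSE3;
#endif
#ifdef __SSSE3__
    mask |= SIMD_SSSE3;
#endif
    return mask;
}
```

Built with `-march=atom`, all three bits should be set (the N270 below reports SSE2, SSE3 and SSSE3); built with only `-mtune=atom`, the mask reflects whatever the default -march provides.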

So yes, digressions aside, GCC 4.4 will give us three more
optimizations across -O1 and -O2, as well as the integrated register
allocator (IRA), which should speed things up a bit, generally
speaking.  The Graphite loop framework can also be built into GCC
(giving an additional three optimizations), but they aren't built or
enabled by default.  I get the impression they will add more register
pressure, so they may counter any good done by the new register allocator
if enabled (at least for i386 and the like).  I think it'll be a "try
and see" kind of thing.

Anyways, after a new snapshot of 4.4 comes out tonight, I want to try
building that natively on Haiku (and then build Haiku with it) to make
sure the changes I made to the patch lately don't negatively impact
anything.  GCC 4.4.1 is supposed to be released on or around Sunday,
so if all goes to plan and there is interest, we could have GCC 4.4.1
shortly after it comes out.

I'll be sure to post an enhancement ticket once I have all of my ducks in a row.

If anybody's curious, here are my out-of-context numbers for Haiku built
via cross-compile in Linux with the GCC 4.5 snapshot from June 11th,
with -mtune=atom and no Graphite optimizations.  As you can see, they
aren't much different from Urias' numbers in a previous email.

Benchmark: Haiku app_server bilinear copy
Compile date: Jun 14 2009 14:38:02
GCC version: 2.95.3-haiku-081024

CPU vendor ID: GenuineIntel
CPU:          Intel(R) Atom(TM) CPU N270   @ 1.60GHz
  SIMD instructions: MMX SSE SSE-Integer SSE2 SSE3 SSSE3

Can't lock process to CPU on this platform.
Estimated CPUID/RDTSC overhead: 144 clock cycles.
10 runs per benchmark.

                    --  Results  --

       Minimum    Average    Maximum
# 1:   1046676    1074435    1248300  - 'C, original'
# 2:   1164624    1172548    1195956  - 'C, precise'
# 3:   1288152    1291638    1298124  - 'C, precise DIV'
# 4:    520176     521718     529692  - 'MMX/SSE'
# 5:    454680     456427     471504  - 'MMX/SSE optim-test'
# 6:    482052     490591     534156  - 'SSE2'
# 7:    515844     517653     526800  - 'SSSE3'
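For readers new to the thread, each variant above computes the same thing: every destination pixel of a 32-bit bitmap is a weighted blend of four neighbouring source pixels. The sketch below shows that per-pixel blend in plain C with 8.8 fixed-point weights and round-to-nearest. It is hypothetical (the name `bilinear32` and the weight convention are assumptions for illustration) and is not the app_server code or any of the benchmarked variants.

```c
#include <stdint.h>

/* Hypothetical sketch, not Haiku's implementation: blend four 32-bit pixels
 * (four 8-bit channels each) with bilinear weights. fx and fy are the
 * fractional x/y positions in 1/256 units, range [0, 256]. The four weights
 * always sum to 65536 (= 1.0 in 16.16), so each channel stays within 0..255
 * and never spills into its neighbour. */
static uint32_t bilinear32(uint32_t p00, uint32_t p10,
                           uint32_t p01, uint32_t p11,
                           uint32_t fx, uint32_t fy)
{
    const uint32_t w00 = (256 - fx) * (256 - fy);
    const uint32_t w10 = fx * (256 - fy);
    const uint32_t w01 = (256 - fx) * fy;
    const uint32_t w11 = fx * fy;
    uint32_t result = 0;

    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t c = ((p00 >> shift) & 0xFF) * w00
                   + ((p10 >> shift) & 0xFF) * w10
                   + ((p01 >> shift) & 0xFF) * w01
                   + ((p11 >> shift) & 0xFF) * w11;
        /* Add 0.5 in 16.16 before shifting to round to nearest. */
        result |= ((c + 32768) >> 16) << shift;
    }
    return result;
}
```

The SIMD variants in the table presumably win by processing the four channels in parallel rather than looping over them one at a time as this scalar version does.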

- joe
