[linux-cirrus] Re: Yet another MaverickCrunch hardware bug?

  • From: Martin Guy <martinwguy@xxxxxxxx>
  • To: linux-cirrus@xxxxxxxxxxxxx
  • Date: Tue, 3 Mar 2009 15:29:58 +0000

Hi again
   A quick update. It wasn't the funny values in constant pools that
threw FFTW - it turns out that there is an undocumented hardware bug
in the E1 revision of the silicon, presumably also in D1 to E2,
whereby sequences such as
    ldr     r2, [pc, #44]
    cfstrd  mvd1, [r2]
or
    ldr     r6, [r3]
    cfldrd  mvd1, [r6]
write 64-bit words at seemingly random locations in memory, usually
into the text segment of its own executable (which should be
read-only!). See wiki.debian.org/ArmEabiMaverickCrunch, erratum 14.

I've now worked around that and rewritten the entire
avoid-bad-instruction-sequence code, removing the broken rubbish. The
workarounds are now enabled unconditionally when generating Maverick
hardfloat, since all existing silicon has the same problems. That also
removes the -mcirrus-fix-invalid-insns flag and the purported fixes
for revision D0 silicon, which were rubbish.

I can now compile the FFTW benchmark with -O[0123s] with or without
-ffast-math and run the exhaustive correctness tests (which only take
48 hours each...).

Thinking "no known bugs" again, I looked around for a 64-bit
arithmetic correctness test and hit on the OpenSSL testsuite. Guess
what. No, you guessed it.
The chip only shifts up to 31 bits left or 32 bits right, but GCC
thinks it can shift a 64-bit word by up to 64 bits. This is stated in
the manual for constant shifts, but it also applies to shifts by an
amount in an ARM register, where only the bottom 6 bits are examined
as a signed quantity, so asking for a shift left by 48 bits results in
an arithmetic shift right by 16.
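
To make that concrete, here is a small C model of the shift-by-register
behaviour as described above. It is only a sketch built from the
description in this mail, not from the datasheet, and the names
crunch_shift_count and crunch_shift64_model are made up for
illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Interpret the low 6 bits of the count register as a signed value,
 * giving a range of -32..31 (assumption: taken from the text above). */
static int crunch_shift_count(uint32_t reg)
{
    int n = (int)(reg & 0x3f);
    return n >= 32 ? n - 64 : n;
}

/* Portable arithmetic shift right by 1..32 places. */
static int64_t asr64(int64_t v, unsigned n)
{
    assert(n >= 1 && n <= 32);
    uint64_t u = (uint64_t)v >> n;
    if (v < 0)
        u |= ~(UINT64_MAX >> n);   /* fill the vacated top bits with the sign */
    return (int64_t)u;             /* assumes two's complement conversion */
}

/* Hypothetical model of a 64-bit shift by a register amount:
 * non-negative counts shift left, negative counts shift right
 * arithmetically. */
static int64_t crunch_shift64_model(int64_t value, uint32_t reg)
{
    int n = crunch_shift_count(reg);
    if (n >= 0)
        return (int64_t)((uint64_t)value << n);
    return asr64(value, (unsigned)-n);
}
```

With this model, a requested left shift by 48 decodes to a count of
-16, so the value is shifted right by 16 instead, which matches the
misbehaviour described above.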

I'm running out of steam here, so will probably just braindamage the
64-bit shift ops and let GCC cope with it, rather than handling it in
an efficient way, but if someone who cares about EP93xx speed would
like to step up with some funding, that'd help me solve the problem
more effectively.
In fact the whole 64-bit shift area could be optimised in several ways.
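
As one illustration of what handling it could look like, here is a
sketch in C that builds a left shift by 0..63 places purely out of
shifts of at most 31 places, which is the most the chip is said to do
in one go. This is not what the GCC changes do; hw_shl64 is a
hypothetical stand-in for a single hardware shift, and shl64_in_steps
is a made-up name.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for one hardware left shift, limited to 0..31 places
 * (assumption: the limit quoted in the text above). */
static uint64_t hw_shl64(uint64_t v, unsigned n)
{
    assert(n <= 31);
    return v << n;
}

/* Left shift by 0..63 places using only shifts the hardware can do.
 * Needs at most three steps, e.g. 63 = 31 + 31 + 1. */
static uint64_t shl64_in_steps(uint64_t v, unsigned n)
{
    assert(n <= 63);
    while (n > 31) {
        v = hw_shl64(v, 31);
        n -= 31;
    }
    return hw_shl64(v, n);
}
```

For a constant shift amount the number of steps is known at compile
time, so a compiler could emit exactly one, two or three shift
instructions; for a shift amount in a register, something like the
loop or an equivalent branch would be needed.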

By the way, does anyone actually use revision D0 silicon, and does
anyone have a copy of the D0 errata document? (It seems to have
disappeared from the Cirrus website.)

Cheers

    M
