[linux-cirrus] Re: Yet another MaverickCrunch hardware bug?

  • From: Martin Guy <martinwguy@xxxxxxxx>
  • To: linux-cirrus@xxxxxxxxxxxxx
  • Date: Thu, 26 Mar 2009 10:25:28 +0000

On 3/26/09, Hasjim Williams <linux-cirrus@xxxxxxxxxxxxxxxxx> wrote:
> Hi Martin (and everyone else),
Thanks for writing!

>  Looks like you made some more progress
>  on standalone gcc.
Found more bugs, fixed more bugs. Now found more bugs again.
One is in GCC's softfloat support and also bites Maverick in the
libvorbisenc and LAME 32-bit float code:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39501
Yesterday I found another, which makes LAME segfault when encoding
11025 samples-per-second WAVs; cause unknown.

>  NB, I haven't tested your patches yet.
They pass more testsuite than any other, but the more I test the more
bugs I find.

>  The binutils save/restore bug is simple enough to fix.  See attachment.
Cheers! Looks good.
Excuse me if I'm being stupid, but shouldn't GCC emit ".save
mvf/mvd/mvfx/mvdx" so that 32-bit saves can be done where appropriate?

>  IEEE 754 compliance can also be done at the glibc level (and/or the
>  kernel level), in a similar way to the VFP code does it.  I think the
>  DENORM exception is supposed to be done at the kernel level, since it is
>  usually done by microcode on most processors.  The other exceptions get
>  signalled up to glibc, and it handles them, or passes them onto the
>  running process.
>
>  IEEE 754 exceptions on MavCrunch = Inexact, Underflow, Overflow, Denorm

That's a nice idea, but I don't think it can be made to work for the
following reasons.

There is a Denorm *bit* in the DSPSC but no denorm exception (the
fourth exception is "Invalid operator", with no further explanation as
to what that means!).
In any case, the Denorm bit is *not* set when denormalized values are
presented as input to add/sub/abs/neg/cpy. Test program attached.
Multiply does set the Denorm bit and raise the Inexact exception
(presumably in response to the inexact result, not the denorm inputs)...
but it is the only FP instruction that correctly handles denormalized
values, so that's totally useless!

What's worse, if you want to trap exceptions and pinpoint the
instruction that caused them, you have to run the FPU in synchronous
mode, which cuts its performance in half for all operations.

> I assume you haven't recompiled glibc with Maverick Crunch enabled
No, I looked at it but it needed hacking so I put it on the TODO list.

> The patch does fix the longjmp problem though.
Cool

>  Has anyone built a complete system with crunch enabled (e.g. in
>  OpenEmbedded), or is it still only in certain apps, without accelerated
>  libraries?

My own objective is building accelerated Debian packages (apps and
libraries), and I'm starting to have some success, but it seems that
the more I test things the more bugs are revealed.

I'd like to look at the 64-bit integer stuff too, as I think that
would speed up SSL. I can fix the brain-damaged shift counts (which
are truncated to 32 bits right or 31 bits left) but there are yet more
undocumented hardware bugs in that area.
At present doubles seem to work, and I'm weeding out single-precision
hardware bugs.
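One way a compiler could work around the limited shift counts is to split a 64-bit shift into several shifts the hardware does accept. The sketch below is a host-side C model, not Maverick code: hw_shl64 and hw_shr64 are hypothetical stand-ins for the hardware shift, assumed to accept left counts 0..31 and right counts 0..32 as described above, and modelled as unsigned logical shifts (the real instruction's count encoding and right-shift signedness may differ).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed hardware limits, taken from the description above. */
#define HW_MAX_SHL 31
#define HW_MAX_SHR 32

/* Hypothetical model of the hardware shift: valid only for small counts. */
static uint64_t hw_shl64(uint64_t x, unsigned n)
{
    assert(n <= HW_MAX_SHL);
    return x << n;
}

static uint64_t hw_shr64(uint64_t x, unsigned n)
{
    assert(n <= HW_MAX_SHR);
    return x >> n;
}

/* Full-range shifts (0..63) built from repeated in-range hardware shifts. */
uint64_t shl64(uint64_t x, unsigned n)
{
    assert(n < 64);
    while (n > HW_MAX_SHL) {
        x = hw_shl64(x, HW_MAX_SHL);
        n -= HW_MAX_SHL;
    }
    return hw_shl64(x, n);
}

uint64_t shr64(uint64_t x, unsigned n)
{
    assert(n < 64);
    while (n > HW_MAX_SHR) {
        x = hw_shr64(x, HW_MAX_SHR);
        n -= HW_MAX_SHR;
    }
    return hw_shr64(x, n);
}
```

For example, shl64(1, 63) is done as shifts of 31, 31 and 1, and shr64(x, 40) as 32 then 8.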

This has to be the crappest FPU in the world.
The only thing it seems to lack is a halt-and-catch-fire instruction!

    M

/*
 *      Test whether denormalized values as parameters raise an
 *      exception or set the Denorm bit in the control register.
 *
 *      Compile with: cc -mcpu=ep9312 -mfpu=maverick
 *      (plus -mfloat-abi=softfp if using EABI).
 *
 *      Results (tested on Revision E1 silicon):
 *      - cfaddd neither sets Denorm nor raises an exception for
 *        denormalized inputs, silently taking them as zero.
 *      - cfmuld raises IX (Inexact) and UF (Underflow) if you multiply
 *        two denorms together, but doesn't set the Denorm bit either.
 *
 * Martin Guy <martinwguy@xxxxxxxx> 5 Oct 2007 - 22 Mar 2009
 */
#include <stdlib.h>
#include <string.h>
#include <stdio.h>

/* Contents of Maverick Crunch DSPSC register in little-endian mode.
 * See EP9307 Users Guide, Chapter 2 */
struct DSPSC {
#define u unsigned
        u IO:1; u RSVD1:1; u OF:1; u UF:1; u IX:1; u IOE:1; u RSVD6:1; u OFE:1;
        u UFE:1; u IXE:1; u RM:2; u Denorm:1; u Invalid:1; u FWDEN:1; u V:1;
        u FCC:2; u SAT:2; u AEXC:1; u INT:1; u UI:1; u ISAT:1;
        u RSVD24:2; u HVID:3; u DAID:3;
        u INST:32;
#undef u
} dspsc;

static void read_dspsc(void);
static void write_dspsc(void);
static void print_dspsc(void);
void denorm_add(void);
void denorm_mul(void);

/* A little union for bit pattern<->float conversion.
 * We use double, not float, to avoid float->double lossage when values are
 * passed to printf.
 */
union u {
        double d;
        unsigned long long ull;
} u;

double one = 1.0;       /* Use to test cfmuld behaviour */

int main(void)
{
        u.ull = 0x0000000000000001ULL;  /* smallest denormalized value */

        puts("Before:");
        read_dspsc();
        print_dspsc();
        printf("u: 0x%016llx = %g\n", u.ull, u.d);

        denorm_add();   /* or denorm_mul() */

        puts("After:");
        read_dspsc();
        print_dspsc();
        printf("u: 0x%016llx = %g\n", u.ull, u.d);

        exit(0);
}

/* Perform Maverick operations on a denormalized value */
void denorm_add(void)
{
        asm("ldr r3,=u");
        asm("nop");     /* ward off the evil eye */
        asm("cfldrd mvd0,[r3]");
        asm("cfaddd mvd1, mvd0, mvd1");
        asm("nop");     /* ward off the evil eye */
        asm("cfstrd mvd1,[r3]");
}

void denorm_mul(void)
{
        asm("ldr r1,=one");
        asm("ldr r3,=u");
        asm("nop");     /* ward off the evil eye */
        asm("cfldrd mvd0,[r3]");
        asm("cfldrd mvd1,[r1]");
        asm("cfmuld mvd2, mvd0, mvd1"); /* try ,mvd0,mvd0 to see IX,UF */
        asm("nop");     /* ward off the evil eye */
        asm("cfstrd mvd2,[r3]");
}

/* Print the contents of the structure in human-readable form */
static void
print_dspsc(void)
{
#define d dspsc
        printf("INST=0x%08x\n", d.INST);
        printf("DAID=%d HVID=%d ISAT=%d UI=%d INT=%d AEXC=%d SAT=%d\n",
              d.DAID, d.HVID, d.ISAT, d.UI, d.INT, d.AEXC, d.SAT);
        printf("FCC=%d V=%d FWDEN=%d Invalid=%d Denorm=%d RM=%d\n",
              d.FCC, d.V, d.FWDEN, d.Invalid, d.Denorm, d.RM);
        printf("IXE=%d UFE=%d OFE=%d IOE=%d IX=%d UF=%d OF=%d IO=%d\n",
              d.IXE, d.UFE, d.OFE, d.IOE, d.IX, d.UF, d.OF, d.IO);
#undef d
}

/* Copy register contents into our structure */
static void
read_dspsc(void)
{
        /* If you put the ldr after the cfmv32sc, you can
         * trigger an undocumented Maverick hardware bug, whereby
         * ldr rN, foo; cfstr64 mvdX, [rN] corrupts memory at random.
         * (Tested on Rev E1 hardware).
         */
        asm("ldr        r3, =dspsc");           /* Get address of dspsc */
        asm("cfmv32sc   mvdx0, dspsc");         /* Get DSPSC contents */
        asm("cfstr64    mvdx0, [r3]");          /* Store DSPSC contents in dspsc */
}

/* Copy our structure's contents into the register */
static void
write_dspsc(void)
{
        asm("ldr        r0, =dspsc");           /* Get address of dspsc */
        /* No-op to work around Maverick timing bug: cfldr64 must not busy-wait.
         * Without this, rubbish is written into all 64 bits of DSPSC.
         * (Erratum 3, tested on Rev E1 hardware)
         */
        asm("mov        r0, r0");
        asm("cfldr64    mvdx0, [r0]");          /* Read our variable into mvdx0 */
        asm("cfmvsc32   dspsc, mvdx0");         /* Write to DSPSC */
}
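The DSPSC struct above relies on GCC allocating little-endian bitfields from bit 0 upwards. A layout-independent alternative is to decode the low 32-bit word (for example, the first word that read_dspsc() stores into dspsc) with shifts and masks. This is a sketch: decode_dspsc and struct dspsc_fields are hypothetical names, only a subset of fields is shown, and the bit positions are derived from the struct's declaration order under that allocation assumption, not from the User's Guide.

```c
#include <assert.h>
#include <stdint.h>

/* Extract `width` bits starting at bit `lsb` (bit 0 = least significant). */
static unsigned field(uint32_t w, unsigned lsb, unsigned width)
{
    return (w >> lsb) & ((1u << width) - 1u);
}

/* Subset of DSPSC fields, positions taken from the struct DSPSC order. */
struct dspsc_fields {
    unsigned io, of, uf, ix;        /* sticky flags: bits 0, 2, 3, 4 */
    unsigned ioe, ofe, ufe, ixe;    /* enables: bits 5, 7, 8, 9 */
    unsigned rm;                    /* rounding mode: bits 10-11 */
    unsigned denorm, invalid;       /* bits 12, 13 */
    unsigned hvid, daid;            /* bits 26-28, 29-31 */
};

struct dspsc_fields decode_dspsc(uint32_t w)
{
    struct dspsc_fields f;

    f.io      = field(w, 0, 1);
    f.of      = field(w, 2, 1);
    f.uf      = field(w, 3, 1);
    f.ix      = field(w, 4, 1);
    f.ioe     = field(w, 5, 1);
    f.ofe     = field(w, 7, 1);
    f.ufe     = field(w, 8, 1);
    f.ixe     = field(w, 9, 1);
    f.rm      = field(w, 10, 2);
    f.denorm  = field(w, 12, 1);
    f.invalid = field(w, 13, 1);
    f.hvid    = field(w, 26, 3);
    f.daid    = field(w, 29, 3);
    return f;
}
```

Checking the Denorm bit after an operation then becomes decode_dspsc(word).denorm, whatever bitfield layout the compiler would have chosen.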
