On 3/26/09, Hasjim Williams <linux-cirrus@xxxxxxxxxxxxxxxxx> wrote:
> Hi Martin (and everyone else),

Thanks for writing!

> Looks like you made some more progress
> on standalone gcc.

Found more bugs, fixed more bugs. Now found more bugs again.
One in GCC softfloat that also bites Maverick in libvorbisenc and LAME
32-bit float code
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39501
Yesterday another that bites LAME encoding 11025 samples-per-second
WAVs giving segfault, cause unknown.

> NB, I haven't tested your patches yet.

They pass more of the testsuite than any other, but the more I test
the more bugs I find.

> The binutils save/restore bug is simple enough to fix.  See attachment.

Cheers! Looks good. Scuse me if I'm being stupid, but shouldn't GCC
emit ".save mvf/mvd/mvfx/mvdx" so that 32-bit saves can be done where
appropriate?

> IEEE 754 compliance can also be done at the glibc level (and/or the
> kernel level), in a similar way to the VFP code does it.  I think the
> DENORM exception is supposed to be done at the kernel level, since it is
> usually done by microcode on most processors.  The other exceptions get
> signalled up to glibc, and it handles them, or passes them onto the
> running process.
>
> IEEE 754 exceptions on MavCrunch = Inexact, Underflow, Overflow, Denorm

That's a nice idea, but I don't think it can be made to work for the
following reasons.

There is a Denorm *bit* in the DSPSC but no denorm exception (the
fourth exception is "Invalid operator" with no further explanation as
to what that means!)
In any case, the Denorm bit is *not* set when denormalized values are
presented as input to add/sub/abs/neg/cpy. Test program attached.
Multiply does set the Denorm bit and raise Inexact exception
(presumably in response to inexact result, not denorm inputs)... but
it is the only FP instruction that does correctly handle denormalized
values so that's totally useless!
What's worse, if you want to trap exceptions and pinpoint the
instruction that caused them, you have to run the FPU in synchronous
mode, which cuts its performance in half for all operations.

> I assume you haven't recompiled glibc with Maverick Crunch enabled

No, I looked at it but it needed hacking so I put it on the TODO list.

> The patch does fix the longjmp problem though.

Cool

> Has anyone built a complete system with crunch enabled (e.g. in
> OpenEmbedded), or is it still only in certain apps, without accelerated
> libraries?

My own objective is building accelerated Debian packages (apps and
libraries), and I'm starting to have some success, but it seems that
the more I test things the more bugs are revealed.

I'd like to look at the 64-bit integer stuff too, as I think that
would speed up SSL. I can fix the brain-damaged shift counts (which
are truncated to 32 bits right or 31 bits left) but there are yet more
undocumented hardware bugs in that area.

At present doubles seem to work, and I'm weeding out single-precision
hardware bugs.

This has to be the crappest FPU in the world. The only thing it seems
to lack is a halt-and-catch-fire instruction!

    M

/*
 * Test whether denormalized values as parameters raise an
 * exception or set the Denorm bit in the control register.
 *
 * cc -mcpu=ep9312 -mfpu=maverick and -mfloat-abi=softfp if using EABI.
 *
 * Results (tested on Revision E1 silicon):
 * - cfaddd neither sets Denorm nor raises an exception for
 *   denormalized inputs, silently taking them as zero.
 * - cfmuld raises IX(Inexact) and UF(underflow) if you multiply two
 *   denorms together, but doesn't set the Denorm bit either.
 *
 *	Martin Guy <martinwguy@xxxxxxxx> 5 Oct 2007 - 22 Mar 2009
 */

#include <stdlib.h>
#include <string.h>
#include <stdio.h>

/* Contents of Maverick Crunch DSPSC register in little-endian mode.
 * See EP9307 Users Guide, Chapter 2 */
struct DSPSC {
#define u unsigned
    u IO:1;
    u RSVD1:1;
    u OF:1;
    u UF:1;
    u IX:1;
    u IOE:1;
    u RSVD6:1;
    u OFE:1;
    u UFE:1;
    u IXE:1;
    u RM:2;
    u Denorm:1;
    u Invalid:1;
    u FWDEN:1;
    u V:1;
    u FCC:2;
    u SAT:2;
    u AEXC:1;
    u INT:1;
    u UI:1;
    u ISAT:1;
    u RSVD24:2;
    u HVID:3;
    u DAID:3;
    u INST:32;
#undef u
} dspsc;

static void read_dspsc(void);
static void write_dspsc(void);
static void print_dspsc(void);

/* A little union for bit pattern<->float conversion.
 * We use double, not float, to avoid float->double lossage when values are
 * passed to printf.
 */
union u {
    double d;
    unsigned long long ull;
} u;

double one = 1.0;	/* Use to test cfmuld behaviour */

/* Declared before use so that main() does not rely on implicit int */
void denorm_add(void);
void denorm_mul(void);

int
main(void)
{
    u.ull = 0x0000000000000001ULL;	/* smallest denormalized value */

    puts("Before:");
    read_dspsc(); print_dspsc();
    /* %llx, not %x: u.ull is an unsigned long long */
    printf("u: 0x%016llx = %g\n", u.ull, u.d);

    denorm_add();	/* or denorm_mul() */

    puts("After:");
    read_dspsc(); print_dspsc();
    printf("u: 0x%016llx = %g\n", u.ull, u.d);

    exit(0);
}

/* Perform Maverick operations on a denormalized value */
void
denorm_add(void)
{
    /* mvd1 is not loaded here; whatever it already holds is the
     * other addend. */
    asm("ldr r3,=u");
    asm("nop");		/* ward off the evil eye */
    asm("cfldrd mvd0,[r3]");
    asm("cfaddd mvd1, mvd0, mvd1");
    asm("nop");		/* ward off the evil eye */
    asm("cfstrd mvd1,[r3]");
}

void
denorm_mul(void)
{
    asm("ldr r1,=one");
    asm("ldr r3,=u");
    asm("nop");		/* ward off the evil eye */
    asm("cfldrd mvd0,[r3]");
    asm("cfldrd mvd1,[r1]");
    asm("cfmuld mvd2, mvd0, mvd1");	/* try ,mvd0,mvd0 to see IX,UF */
    asm("nop");		/* ward off the evil eye */
    asm("cfstrd mvd2,[r3]");
}

/* Print the contents of the structure in human-readable form */
static void
print_dspsc(void)
{
#define d dspsc
    printf("INST=0x%08x\n", d.INST);
    printf("DAID=%d HVID=%d ISAT=%d UI=%d INT=%d AEXC=%d SAT=%d\n",
	   d.DAID, d.HVID, d.ISAT, d.UI, d.INT, d.AEXC, d.SAT);
    printf("FCC=%d V=%d FWDEN=%d Invalid=%d Denorm=%d RM=%d\n",
	   d.FCC, d.V, d.FWDEN, d.Invalid, d.Denorm, d.RM);
    printf("IXE=%d UFE=%d OFE=%d IOE=%d IX=%d UF=%d OF=%d IO=%d\n",
	   d.IXE, d.UFE, d.OFE, d.IOE, d.IX, d.UF, d.OF, d.IO);
#undef d
}

/* Copy register contents into our structure */
static void
read_dspsc(void)
{
    /* If you put the ldr after the cfmv32sc, you can
     * trigger an undocumented Maverick hardware bug, whereby
     *     ldr rN, foo; cfstr64 mvdX, [rN]
     * corrupts memory at random.
     * (Tested on Rev E1 hardware).
     */
    asm("ldr r3, =dspsc");		/* Get address of dspsc */
    asm("cfmv32sc mvdx0, dspsc");	/* Get DSPSC contents */
    asm("cfstr64 mvdx0, [r3]");		/* Store DSPSC contents in dspsc */
}

/* Copy our structure's contents into the register */
static void
write_dspsc(void)
{
    asm("ldr r0, =dspsc");		/* Get address of dspsc */
    /* no-op to workaround Maverick timing bug: cfldr64 must not busy-wait.
     * Without this, rubbish is written into all 64 bits of DSPSC.
     * (Erratum 3, tested on Rev E1 hardware) */
    // asm("mov r0, r0");
    asm("cfldr64 mvdx0, [r0]");		/* Read our variable into c0 */
    asm("cfmvsc32 dspsc, mvdx0");	/* Write to DSPSC */
}