[linux-cirrus] Re: Yet another MaverickCrunch hardware bug?

On Thu, 26 Mar 2009 16:29 +0000, "Martin Guy" <martinwguy@xxxxxxxx> wrote:
> > Do any of the exception flags get set for the other Maverick Crunch
> >  opcodes?
> Yes, I've seen the underflow and overflow and inexact bits go up.

Good to know.

> > Not supporting denorms is more of a problem.
> Not for me. I'm trying to make it do the best real time audio it can,
> so am dealing in samples in the range -1 to +1 (plus
> analysis/resynthesis of these).
> Which applications care whether the ground is at 2^-1022 or 2^-1074,
> or is it more a question of scientific apps thinking they know the
> smallest possible number, using it as an edge case and testing against
> it?

For real-time audio, it doesn't matter, because you're talking about
such low volumes.  You can just add a DC offset, or force
subnormals-to-zero.  Incidentally if you're forcing subnormals to zero,
then you shouldn't honor signed zeros.
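To make the force-subnormals-to-zero option concrete, here is a minimal C sketch, assuming the samples sit in a plain array of doubles. The function name flush_subnormals is made up for illustration; it is not from any library.

```c
#include <math.h>    /* fpclassify, FP_SUBNORMAL */
#include <stddef.h>  /* size_t */

/* Hypothetical helper: replace every subnormal sample with +0.0 in place.
   Returns the number of samples that were flushed. */
static size_t flush_subnormals(double *buf, size_t n)
{
    size_t flushed = 0;
    for (size_t i = 0; i < n; i++) {
        if (fpclassify(buf[i]) == FP_SUBNORMAL) {
            buf[i] = 0.0;   /* the sign is dropped too: always +0.0 */
            flushed++;
        }
    }
    return flushed;
}
```

Note that a negative subnormal comes out as +0.0 here, which is the sense in which flushing and honoring signed zeros don't go together.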

denormal = subnormal = gradual underflow
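A small C sketch of what "gradual" means in practice, assuming IEEE double on the host. The helper name halvings_until_zero is made up for illustration. Starting from the smallest normal (2^-1022) you can keep halving through the subnormals down to 2^-1074 before hitting zero; hardware that flushes subnormals would hit zero on the first halving.

```c
#include <float.h>  /* DBL_MIN */

/* Hypothetical helper: count how many times x can be halved before the
   value becomes exactly zero. The volatile forces each intermediate to
   be stored as a double, so extended-precision registers don't skew it. */
static int halvings_until_zero(double x)
{
    volatile double v = x;
    int n = 0;
    while (v != 0.0) {
        v = v / 2;
        n++;
    }
    return n;
}
```

With full IEEE subnormal support, halvings_until_zero(DBL_MIN) comes out as 53: 52 steps through the subnormal range, plus the final step that rounds 2^-1075 to zero.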

Don't glibc and C99 require full IEEE compliance?
I guess it depends on what flags you're passing to gcc, and the

In any case, there appear to be a few properties/flags built into GCC,
that we can use, e.g.

HONOR_SIGNED_ZEROS - if this is disabled, then more of the
MaverickCrunch HW operations can be used, since -0 == +0 in that case.  Not
sure if any of them are actually faster than what you've replaced them
maybe we should do something similar
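For reference, a short C sketch of what honoring signed zeros means at the IEEE level. This illustrates the arithmetic only, not GCC's internal HONOR_SIGNED_ZEROS machinery, and the helper names are made up.

```c
#include <math.h>  /* signbit */

/* -0.0 and +0.0 compare equal, so an equality test sees both as zero. */
static int is_zero(double x)
{
    return x == 0.0;
}

/* The sign of a zero is still observable through signbit(). */
static int is_negative_zero(double x)
{
    return x == 0.0 && signbit(x) != 0;
}

/* Hypothetical canonicalisation: what code effectively gets when the
   sign of zero is not honored. Any zero becomes +0.0. */
static double drop_zero_sign(double x)
{
    return x == 0.0 ? 0.0 : x;
}
```

Code that only ever compares against zero can't tell the two apart; code that looks at the sign bit (or divides by the zero) can, and that is the case that breaks when the flag is disabled.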

In any case, I think the trig functions in libm (glibc) need it.

I think there are other FPUs that have a -mieee flag, e.g. Alpha.  This
is in old gcc code, though.

http://www.tybor.com/ has a few C99 FPCE test suites that will probably
still fail.

> > Looks very evil.  May I propose that we modify the softfloat routines?
> That's an interesting idea. A faster trick might be to use one of the
> very instructions that take denormal inputs as zero and emit code
> fragments to test the parameters using them. For example, when -mieee,
> for
>     cfaddd mvd0, mvd1, mvd2
> with mvd3 as a scratch register, emit something like
>     cfcpyd mvd3, mvd1   @ test first parameter
>     cfcmpd mvd3, #0     @  zero or denorm?
>     beq L55             @ yes. use softfloat version
>     cfcpyd mvd3, mvd2   @ test second parameter
>     cfcmpd mvd3, #0    @  zero or denorm?
>     beq L55             @ yes. use softfloat version
>     cfaddd mvd0, mvd1, mvd2   @ neither param is subnormal.
> L56
> and at L55 have a code fragment that stacks ARM regs r0-r3, moves the
> arguments into them, calls the softfloat routines, moves the result
> back to mvd0, restores r0-r3 and branches to L56. That way the code
> would go in a straight line most of the time without even having to do
> function calls. However implementing that's not on my horizon since I
> don't need denormalized values.

An even better idea.  A little performance hit for all operations, but
it could be fine-tuned to do only a post-operation check, or both pre-
and post-operation checks, depending on the expected usage.

for: cfaddd %0, %1, %2

do this - bigger hit for normals, since the comparison is pre & post:
    cfcmpd %1, #0     @ first operand zero/denorm?
    cfcmpdne %2, #0   @ second operand zero/denorm? only run if %1 is norm
    beq L55           @ one of the operands is zero/denorm
    cfaddd %0, %1, %2
    cfcmpd %0, #0     @ result zero/denorm? probably not possible for
                      @ cfaddd, maybe possible for other operations?
    bne L56           @ result is normal, done
L55:                  @ do softfloat here (recalculate)
L56:

or this - bigger hit for subnormals or a zero result, since the comparison
is only post:
    cfaddd %0, %1, %2
    cfcmpd %0, #0     @ result zero/denorm?
    bne L56           @ result is normal, done
    @ do softfloat here (recalculate)
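To sanity-check the two variants without the hardware, here is a rough C model. Everything in it is an assumption for illustration: hw_add only imitates the behaviour described in this thread (subnormal inputs read as zero, subnormal results flushed to zero), and soft_add simply uses the host's IEEE add as a stand-in for the softfloat routine.

```c
#include <math.h>  /* fpclassify, FP_ZERO, FP_SUBNORMAL */

/* Assumed hardware behaviour: subnormals are treated as zero. */
static double flush(double x)
{
    return fpclassify(x) == FP_SUBNORMAL ? 0.0 : x;
}

/* Hypothetical model of cfaddd: flush both inputs and the result. */
static double hw_add(double a, double b)
{
    return flush(flush(a) + flush(b));
}

/* Hypothetical stand-in for the softfloat add: the host's IEEE add. */
static double soft_add(double a, double b)
{
    return a + b;
}

static int zero_or_subnormal(double x)
{
    int c = fpclassify(x);
    return c == FP_ZERO || c == FP_SUBNORMAL;
}

/* Variant 1: check both operands first, then check the result. */
static double add_pre_post(double a, double b)
{
    if (zero_or_subnormal(a) || zero_or_subnormal(b))
        return soft_add(a, b);
    double r = hw_add(a, b);
    return r == 0.0 ? soft_add(a, b) : r;  /* zero may be a flushed subnormal */
}

/* Variant 2: run the hardware add and only check the result. */
static double add_post_only(double a, double b)
{
    double r = hw_add(a, b);
    return r == 0.0 ? soft_add(a, b) : r;
}
```

Two things show up in this model. The difference of two normals can be subnormal (1.5 * 2^-1022 minus 2^-1022 is 2^-1023), so the post check is not dead code even for addition. And the post-only variant misses the case of a subnormal added to a small normal: the hardware result is nonzero, so nothing triggers the recalculation, but the subnormal operand has been silently dropped.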

Anyhow, there's probably no point in implementing all this until
everything else is working correctly.
