This update includes two hand-optimised, double-pipe Altivec minters.
They're essentially identical in performance to the first one on my
7450, which suggests that it isn't going to get any faster on that
processor without actually reducing the amount of work to do, or
introducing even more parallel pipes using the scalar side. They might
run faster on a 7400 or (especially) 970, though, as they've been
carefully scheduled with all three processors' characteristics in mind.

The "compact" version is a little unusual in design. The Wf()
calculations are done using a hybrid technique: the 16-entry
circular buffer from "Method B" is held in registers, eliminating a
rather large number of load instructions from the loop, but each new
entry is also stored to an 80-entry W buffer on the fly. The rounds
are then calculated by loading one entry from the W buffer per round,
like a normal compact routine - it turns out the overhead from this is
completely hidden by ILP, even on the G4 with its rather weak
out-of-order capability.
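
To make the hybrid scheme concrete, here is a rough scalar C sketch of
the idea. It is not the Altivec code from this update: the names
rol32(), expand_w() and sha1_compress() are made up for illustration,
and the 16-word window is a small local array standing in for what the
real minter keeps in named vector registers inside an unrolled loop.

```c
#include <stdint.h>

/* Rotate left by n bits (0 < n < 32). */
static inline uint32_t rol32(uint32_t x, int n)
{
    return (x << n) | (x >> (32 - n));
}

/* Hybrid expansion (hypothetical helper): the last 16 schedule words
 * live in a small window (a stand-in for registers), so computing a new
 * word needs no loads from W[]; each new word is also stored to the
 * 80-entry buffer as it is produced. */
void expand_w(const uint32_t block[16], uint32_t W[80])
{
    uint32_t r[16];
    int t;

    for (t = 0; t < 16; t++)
        W[t] = r[t] = block[t];

    for (t = 16; t < 80; t++) {
        /* r[t & 15] still holds W[t-16] at this point. */
        uint32_t x = rol32(r[(t - 3) & 15] ^ r[(t - 8) & 15] ^
                           r[(t - 14) & 15] ^ r[t & 15], 1);
        r[t & 15] = x;   /* update the circular window */
        W[t] = x;        /* store on the fly for the rounds */
    }
}

/* Compact-style rounds: exactly one load from W[] per round. */
void sha1_compress(uint32_t h[5], const uint32_t block[16])
{
    uint32_t W[80];
    uint32_t a = h[0], b = h[1], c = h[2], d = h[3], e = h[4];
    int t;

    expand_w(block, W);

    for (t = 0; t < 80; t++) {
        uint32_t f, k, tmp;

        if (t < 20)      { f = (b & c) | (~b & d);          k = 0x5A827999u; }
        else if (t < 40) { f = b ^ c ^ d;                   k = 0x6ED9EBA1u; }
        else if (t < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDCu; }
        else             { f = b ^ c ^ d;                   k = 0xCA62C1D6u; }

        tmp = rol32(a, 5) + f + e + k + W[t];
        e = d; d = c; c = rol32(b, 30); b = a; a = tmp;
    }

    h[0] += a; h[1] += b; h[2] += c; h[3] += d; h[4] += e;
}
```

In this sketch the expansion runs to completion before the rounds
start, which keeps it short. In a hand-scheduled routine the two would
presumably be interleaved so that the stores and the per-round loads
overlap with the arithmetic.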

I think my next update will be to implement Malcolm's work-reduction
techniques in one or more of the minters. I should be able to
arrange things so the same minter can do general-purpose minting, as
well as optimised versions for counting in word 7 or 12. I'll also put
in options for the front-end routine to automatically pad the counter
in various ways.

Following that, I think I'll try to write a couple of optimised PowerPC
scalar routines, since the compiler is obviously not up to the task by
itself. I intend to get maximum performance out of older PowerPCs,
like the G3, 603e and 604e (though I don't have a 604e to test with,
and it may take some effort to get my 603e running). I also have a 601
that works, although I will need to make further adjustments to make
some of the helper routines run on Classic MacOS.

--------------------------------------------------------------
from:     Jonathan "Chromatix" Morton
mail:     chromi@xxxxxxxxxxxxxxxxxxxxx
website:  http://www.chromatix.uklinux.net/
tagline:  The key to knowledge is not to rely on people to teach you it.