This update includes two hand-optimised, double-pipe Altivec minters.
They're essentially identical in performance to the first one on my
7450, which suggests that it isn't going to get any faster on that
processor without actually reducing the amount of work to do, or
introducing even more parallel pipes using the scalar side. They might
run faster on a 7400 or (especially) 970, though, as they've been
carefully scheduled with all three processors' characteristics in mind.
The "compact" version is a little unusual in design. The Wf() calculations are done using a hybrid technique: the 16-entry circular buffer from "Method B" is held in registers, which eliminates a rather large number of load instructions from the loop, but each result is also written out to an 80-entry W buffer on the fly. The rounds are then calculated by loading one entry from the W buffer per round, like a normal compact routine. It turns out the overhead from this is completely hidden by ILP, even on the G4 with its rather weak out-of-order capability.
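To make the hybrid idea concrete, here is a minimal scalar C sketch, not the actual Altivec code. It assumes the standard SHA-1 message schedule; the function names (expand_reference, expand_hybrid, sha1_rounds) are hypothetical, and the 16-entry array c[] merely stands in for the values the real routine keeps in registers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Rotate left; n must be in 1..31. */
static uint32_t rol32(uint32_t x, unsigned n)
{
    return (x << n) | (x >> (32 - n));
}

/* Reference: plain 80-entry expansion, reading back from W[] itself. */
static void expand_reference(const uint32_t block[16], uint32_t W[80])
{
    int t;
    for (t = 0; t < 16; t++)
        W[t] = block[t];
    for (t = 16; t < 80; t++)
        W[t] = rol32(W[t - 3] ^ W[t - 8] ^ W[t - 14] ^ W[t - 16], 1);
}

/* Hybrid: the schedule is computed entirely from a 16-entry circular
 * buffer (c[] stands in for registers here), and each new word is
 * stored to W[] as it is produced. W[] is never read in this loop. */
static void expand_hybrid(const uint32_t block[16], uint32_t W[80])
{
    uint32_t c[16];
    int t;
    for (t = 0; t < 16; t++) {
        c[t] = block[t];
        W[t] = c[t];
    }
    for (t = 16; t < 80; t++) {
        /* Indices are t-3, t-8, t-14 and t-16, taken modulo 16. */
        uint32_t w = rol32(c[(t + 13) & 15] ^ c[(t + 8) & 15] ^
                           c[(t + 2) & 15] ^ c[t & 15], 1);
        c[t & 15] = w;   /* overwrite the oldest entry */
        W[t] = w;        /* store on the fly for the rounds */
    }
}

/* Compact-style rounds: one load from W[] per round. */
static void sha1_rounds(uint32_t h[5], const uint32_t W[80])
{
    uint32_t a = h[0], b = h[1], c = h[2], d = h[3], e = h[4];
    int t;
    for (t = 0; t < 80; t++) {
        uint32_t f, k, tmp;
        if (t < 20)      { f = (b & c) | (~b & d);          k = 0x5A827999u; }
        else if (t < 40) { f = b ^ c ^ d;                   k = 0x6ED9EBA1u; }
        else if (t < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDCu; }
        else             { f = b ^ c ^ d;                   k = 0xCA62C1D6u; }
        tmp = rol32(a, 5) + f + e + k + W[t];
        e = d; d = c; c = rol32(b, 30); b = a; a = tmp;
    }
    h[0] += a; h[1] += b; h[2] += c; h[3] += d; h[4] += e;
}
```

In this sketch the expansion loop only ever reads c[], so a register-allocated version needs no loads there, while the rounds loop stays small because it pulls a single word from W[] per round. Whether the extra stores are hidden as well as they are on the G4 will depend on the processor.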
I think my next update will be to incorporate Malcolm's work-reduction techniques into one or more of the minters. I should be able to arrange things so the same minter can do general-purpose minting, as well as optimised versions for counting in word 7 or 12. I'll also put in options for the front-end routine to automatically pad the counter in various ways.
Following that, I think I'll try to write a couple of optimised PowerPC scalar routines, since the compiler is obviously not up to the task by itself. I intend to get maximum performance out of older PowerPCs, like the G3, 603e and 604e (though I don't have a 604e to test with, and it may take some effort to get my 603e running). I also have a 601 that works, although I will need to make further adjustments to make some of the helper routines run on Classic MacOS.
--------------------------------------------------------------
from: Jonathan "Chromatix" Morton
mail: chromi@xxxxxxxxxxxxxxxxxxxxx
website: http://www.chromatix.uklinux.net/
tagline: The key to knowledge is not to rely on people to teach you it.