>> I recently sent Adam an updated version of libsha1.c, which is about >> 20% faster on x86 machines. After some more thought in this vein, I'd >> like to help write a set of "fast minting" routines which maximise >> sustained minting performance on a given class of machines. > > In that case, I'd like to help out if I may. Optimising assembly > language is > a hobby of mine. Although I mostly have experience of PPC so it seems > we > overlap a bit there. > >> The minter routine would then append the random salt string, which >> would be at least 64 (or 96) bits worth but extended (if necessary) to >> allow minting to progress on a single SHA-1 block - which is much more >> efficient than calculating SHA-1 afresh for the entire message. > > A couple of further suggestions: first, some of the hash computation > can be > calculated just once. For example, in the stamp > > 1:20:040428:foo::0298b26853c8ebb3:beca0 > > there are 34 bytes up to and including the last ':'. This means that > the > first eight words of the block are constant, so the first eight steps > of the > 80 can be calculated once and then re-used. This could mean a 10% > improvement, roughly. > > Second, the random salt string could even be extended to take up more > of the > block. Suppose the stamp above is extended to have 48 bytes up to the > last > ':', then 12 words remain constant and hence 12 steps out of 80 don't > need to > be repeated, a possible 15% speedup. Good catch. This could fairly easily be implemented by adding an inner loop to each routine, although it does make the code slightly messier. I'll keep it in mind. At the moment, I'm filling in random data until the following conditions are satisfied: - There are at least 16 characters of random (96 bits, as it's in base-64). - After a separator, an 8-character count field (enough for a 32-bit count in base-16) won't straddle a block boundary. It would be easy enough to change the second condition to something like "...will be aligned at the end of the block". > Obviously this leaves us looking at a trade-off between stamp size and > minting time. Indeed. In this scheme, the possible maximum-efficiency stamp sizes are 55, 55+64, 55+128, etc. (55 is the maximum byte count in the final block - there needs to be space for the end-of-message marker and the 8-byte message bit count). That means we could potentially be looking at a line-full (63 bytes) of extra padding near the end of the stamp, depending on the size of the resource and extension fields. Does this matter? >> Hexadecimal and base64 strings are both feasible, though the tail of >> the salt string may be more efficiently coded in some subset of >> base64, >> eg. the first 16 letters of the alphabet. > > I agree, this would make advancing through the counter values cleaner > and faster. My point exactly. At the moment, however, I'm just using an index into a base-64 alphabet string, which appears to be fast enough for now. When we get down to the really fine detail, we'll be able to see whether this overhead is significant or not. The compiler seems to boil it down to an indexed byte load and byte store, which is nearly trivial. It is also easily overlapped during the beginning of a multi-pipe algorithm. If the tail will be in a fixed position, it gets even cheaper. > This could also depend on the particular mint function, for example, > an optimised PPC G4 function might use one set of vector registers and > two > sets of integer registers to implement a six pipeline minter. Then it > might > be convenient to use the first 24 letters of the alphabet, and perhaps > a > simple matter to use both upper and lower case for a total of 48 > different > values per byte. (The G4 can carry out one vector and two integer ops > per > clock cycle, so this is a likely scenario). That depends on the particular G4 you're talking about (the 7400 and 745x are noticeably different), and the G5 is different again, but yes, the principle is valid. At the moment, I'm working on functions which try to be generically efficient on any compatible processor, rather than trying to micro-optimise for any particular model. I think I'll leave that part up to you. :) -------------------------------------------------------------- from: Jonathan "Chromatix" Morton mail: chromi@xxxxxxxxxxxxxxxxxxxxx website: http://www.chromatix.uklinux.net/ tagline: The key to knowledge is not to rely on people to teach you it.