Corrections:

Accumulator
====
There is a signed 48-bit accumulator associated with each lane. It saturates to MAX 0x7fffffffffff and MIN 0x800000000000.

f_i:
0 1-----  ENA   Enable normal accumulation function.
0 -1----  HIGH  Apply the accumulation to the top 32 bits of the accumulator.
0 --1---  SIGN  Accumulate values as if they were signed quantities.
0 ---1--  CLRA  Clear accumulator before initial instruction execution (doesn't apply to repeated instruction phases).
0 ----1-  WBA   Write back new value to accumulator.
0 -----1  SUB   Subtract ALU result from accumulator.

Clearing accumulator:
====
CLRA  Clear accumulator on initially entering instruction.

Accumulation/Decumulation
====
UACC   (ENA|WBA)                Accumulate with unsigned value.
UDEC   (ENA|WBA|SUB)            Decumulate with unsigned value.
SACC   (ENA|SIGN|WBA)           Accumulate with signed value.
SDEC   (ENA|SIGN|WBA|SUB)       Decumulate with signed value.
UACCH  (ENA|HIGH|WBA)           Accumulate with unsigned value to high word of accumulator.
UDECH  (ENA|HIGH|WBA|SUB)       Decumulate with unsigned value to high word of accumulator.
SACCH  (ENA|HIGH|SIGN|WBA)      Accumulate with signed value to high word of accumulator.
SDECH  (ENA|HIGH|SIGN|WBA|SUB)  Decumulate with signed value to high word of accumulator.

The parts of the names are:
U | S -> Unsigned, Signed value
ACC   -> Accumulate: i.e. ACC[i] += unsigned(D[i]), or ACC[i] += signed(D[i])
DEC   -> Decumulate: i.e. ACC[i] -= unsigned(D[i]), or ACC[i] -= signed(D[i])
H     -> Use high 32 bits of accumulator.

Add/Sub Accumulator to result value (without accumulator update):
===
(The key difference being that the ACC[i] values are _not_ updated by the operation.)

UADDA   (ENA)                D'[i] = ACC[i]+unsigned(D[i]).
USUBA   (ENA|SUB)            D'[i] = ACC[i]-unsigned(D[i]).
SADDA   (ENA|SIGN)           D'[i] = ACC[i]+signed(D[i]).
SSUBA   (ENA|SIGN|SUB)       D'[i] = ACC[i]-signed(D[i]).
UADDAH  (ENA|HIGH)           D'[i] = ACCH[i]+unsigned(D[i]).
USUBAH  (ENA|HIGH|SUB)       D'[i] = ACCH[i]-unsigned(D[i]).
SADDAH  (ENA|HIGH|SIGN)      D'[i] = ACCH[i]+signed(D[i]).
SSUBAH  (ENA|HIGH|SIGN|SUB)  D'[i] = ACCH[i]-signed(D[i]).

These names are speculative (I don't have a solid reference point). The parts of the names are:
U, S -> Unsigned or signed
ADD  -> Add accumulator to result
SUB  -> Subtract result from accumulator
H    -> Use high 32 bits of accumulator.

I'll try to post the equivalent C code to make the computations clearer.

We still have an unknown when the ENA bit is clear. From inspection of the blob it may relate to the SETF/IFxx flags (Phire's idea). I'm still poking around.

Cheers
Herman
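
In the meantime, here is a rough C sketch of one lane, built only from the flag table and mnemonics above. It is a guess at the semantics, not a verified model. The function name acc_step, the bit values chosen for the flags (read off the f_i column positions), and the idea that HIGH aligns the operand with bits 16..47 are all assumptions. How the 48-bit result is narrowed to a 32-bit D'[i] is not known, so the sketch simply returns the full saturated value. The ENA-clear case is modelled as a pass-through purely as a placeholder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* f_i flag bits, read off the column positions in the table (assumed encoding). */
enum { ENA = 0x20, HIGH = 0x10, SIGN = 0x08, CLRA = 0x04, WBA = 0x02, SUB = 0x01 };

/* A few of the (speculative) mnemonics, expressed as flag combinations. */
enum {
    UACC  = ENA | WBA,
    UDEC  = ENA | WBA | SUB,
    SACC  = ENA | SIGN | WBA,
    UACCH = ENA | HIGH | WBA,
    UADDA = ENA
};

#define ACC_MAX INT64_C(0x7fffffffffff)
#define ACC_MIN (-INT64_C(0x800000000000))

/* Clamp to the signed 48-bit range given in the post. */
static int64_t sat48(int64_t v)
{
    if (v > ACC_MAX) return ACC_MAX;
    if (v < ACC_MIN) return ACC_MIN;
    return v;
}

/* Hypothetical single-lane step: d is the 32-bit ALU result D[i],
 * *acc is ACC[i]. Returns the saturated 48-bit result. Each call is
 * treated as an initial instruction execution, so CLRA always applies. */
static int64_t acc_step(int64_t *acc, uint32_t d, unsigned flags)
{
    assert(acc != NULL);

    if (flags & CLRA)
        *acc = 0;

    /* Behaviour with ENA clear is unknown; pass D[i] through as a placeholder. */
    if (!(flags & ENA))
        return (int64_t)d;

    /* Zero-extend or sign-extend the operand to 64 bits. */
    int64_t v = (int64_t)d;
    if ((flags & SIGN) && (d & UINT32_C(0x80000000)))
        v -= INT64_C(0x100000000);

    /* Assumption: HIGH lines the operand up with accumulator bits 16..47. */
    if (flags & HIGH)
        v *= 65536;

    int64_t r = sat48((flags & SUB) ? *acc - v : *acc + v);

    if (flags & WBA)
        *acc = r;   /* the xACC/xDEC forms update the accumulator */
    return r;       /* the xADDA/xSUBA forms only return the result */
}
```

For example, starting from ACC[i] = 0, acc_step(&acc, 5, UACC) leaves 5 in the accumulator, and a following acc_step(&acc, 10, UADDA) returns 15 without changing it.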