I went back and rechecked the Scalar Register Update (SRU) functionality of the vector instructions. It only applies to vector ALU ops (vops), not to memory operations (vld, vst, ...).

The two basic forms of vector operations are:

1111 11 X v:6 r:3 d:10 a:10 F 0 b:10 f_d:6 f_a:6 Ra_x:4 Cond:3 f_i:7 f_b:6
1111 11 X v:6 r:3 d:10 a:10 F 1 k:10 f_d:6 f_a:6 Ra_x:4 Cond:3 f_i:7 j:6

With f_i selecting one of the following (its top bit is 0 for accumulator ops and 1 for scalar register updates):

Accumulator
====

There is a 32 bit accumulator associated with each lane.

eg.
vmov H(0,0), H(16,0) CLRA UACC
vmov H(1,0), H(17,0) UACC

f_i:
0 1-----  ENA   Enable accumulator.
0 -1----  HIGH  Apply the accumulation to the top 16 bits of the accumulator.
0 --1---  SIGN  Accumulate values as if they were signed quantities.
0 ---1--  CLRA  Clear accumulator before initial instruction execution (doesn't apply to repeated instruction phases).
0 ----1-  ACCA  Add ALU result to accumulator.
0 -----1  SUB   Subtract ALU result from accumulator.

Generally (matching patents and so forth) the useful combinations are:

CLRA                             Clear accumulator on initially entering instruction.
UACC   (ENA|ACCA)                Accumulate as unsigned.
UDEC   (ENA|ACCA|SUB)            Decumulate as unsigned.
SACC   (ENA|SIGN|ACCA)           Accumulate as signed.
SDEC   (ENA|SIGN|ACCA|SUB)       Decumulate as signed.
UACCH  (ENA|HIGH|ACCA)           Accumulate as unsigned to high word of accumulator.
SACCH  (ENA|HIGH|SIGN|ACCA)      Accumulate as signed to high word of accumulator.
UDECH  (ENA|HIGH|ACCA|SUB)       Decumulate as unsigned to high word of accumulator.
SDECH  (ENA|HIGH|SIGN|ACCA|SUB)  Decumulate as signed to high word of accumulator.

More combinations may be useful. Stay tuned.

Scalar Register Updates
====

Calculate a function across the elements and write the result to a scalar register. Uses the ALU result values from the current operation.

eg.
vmov -, H(0,0) IMIN r0          ; r0 = i, 0<=i<=15, where H(0+i, 0) <= H(0+j, 0) forall 0<=j<=15
vmov H(1,0), H(0,0), SUMS r1    ; r1 = H(0,0)+H(1,0)+...+H(15,0)

f_i:
1 000sss  SUMU rs  rs = unsigned sum of ALU result (0 if no lanes active).
1 001sss  SUMS rs  rs = signed sum of ALU result (1 if no lanes active).
1 010sss  max2 rs
1 011sss  IMIN rs  rs = index of ALU lane containing smallest signed value (-1 if no lanes active).
1 100sss  max4 rs
1 101sss  IMAX rs  rs = index of ALU lane containing largest signed value (-1 if no lanes active).
1 110sss  max6 rs
1 111sss  MAX rs   rs = max signed element of ALU result (MIN_INT if no lanes active).

The maxn operations seem to produce the same result as MAX but generally are not used in the existing blob.

Cheers HH.
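To make the two 80-bit vector ALU layouts concrete, here is a minimal Python sketch that packs and unpacks the fields listed above. It assumes the fields are concatenated most significant first exactly as written, and it says nothing about how the five 16-bit halfwords are ordered in memory. The name `imm` for the unnamed bit after F (0 selects `b`, 1 selects `k`) is hypothetical, as are the helper names `encode` and `decode`.

```python
from typing import Dict, List, Optional, Tuple

# (name, width in bits, fixed value or None), most significant field first.
Field = Tuple[str, int, Optional[int]]

_HEAD: List[Field] = [
    ("prefix", 6, 0b111111),  # the leading "1111 11"
    ("X", 1, None), ("v", 6, None), ("r", 3, None),
    ("d", 10, None), ("a", 10, None), ("F", 1, None),
]
_TAIL: List[Field] = [
    ("f_d", 6, None), ("f_a", 6, None), ("Ra_x", 4, None),
    ("Cond", 3, None), ("f_i", 7, None),
]
# "imm" is a hypothetical name for the unnamed bit after F that picks b or k.
REG_FORM: List[Field] = _HEAD + [("imm", 1, 0), ("b", 10, None)] + _TAIL + [("f_b", 6, None)]
IMM_FORM: List[Field] = _HEAD + [("imm", 1, 1), ("k", 10, None)] + _TAIL + [("j", 6, None)]
assert sum(w for _, w, _ in REG_FORM) == sum(w for _, w, _ in IMM_FORM) == 80


def encode(layout: List[Field], values: Dict[str, int]) -> int:
    """Pack field values into an 80-bit integer (unlisted fields default to 0)."""
    word = 0
    for name, width, fixed in layout:
        value = fixed if fixed is not None else values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def _split(layout: List[Field], word: int) -> Dict[str, int]:
    """Cut an 80-bit integer into the fields of the given layout."""
    fields: Dict[str, int] = {}
    shift = sum(width for _, width, _ in layout)
    for name, width, _ in layout:
        shift -= width
        fields[name] = (word >> shift) & ((1 << width) - 1)
    return fields


def decode(word: int) -> Dict[str, int]:
    """Split an 80-bit vector ALU instruction into named fields."""
    fields = _split(REG_FORM, word)  # the imm bit sits at the same offset in both forms
    if fields["prefix"] != 0b111111:
        raise ValueError("not an 80-bit vector ALU instruction")
    if fields["imm"]:
        fields = _split(IMM_FORM, word)
    return fields
```

For example, `decode(encode(IMM_FORM, {"k": 300}))["k"]` gives back 300, and the decoded dictionary then has `k` and `j` keys instead of `b` and `f_b`.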
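The accumulator flag table can be read as a per-lane state update. The Python sketch below models one lane under several assumptions that the notes above do not confirm: the ALU result is 16 bits wide, SIGN sign-extends it (otherwise it is zero-extended), HIGH shifts it into the top 16 bits, the 32-bit accumulator wraps rather than saturates, and ENA without ACCA does nothing. `update_lane` and `first_phase` are hypothetical names; the flag and combination names come from the tables.

```python
# f_i accumulator flags (f_i bit 6 clear), per the table above.
ENA, HIGH, SIGN, CLRA, ACCA, SUB = 0x20, 0x10, 0x08, 0x04, 0x02, 0x01

# The "useful combinations" listed above.
UACC = ENA | ACCA
UDEC = ENA | ACCA | SUB
SACC = ENA | SIGN | ACCA
SDEC = ENA | SIGN | ACCA | SUB
UACCH = ENA | HIGH | ACCA
SACCH = ENA | HIGH | SIGN | ACCA
UDECH = ENA | HIGH | ACCA | SUB
SDECH = ENA | HIGH | SIGN | ACCA | SUB

MASK32 = 0xFFFFFFFF


def update_lane(acc: int, alu: int, f_i: int, first_phase: bool = True) -> int:
    """Return one lane's 32-bit accumulator after one instruction phase.

    Assumed (not confirmed): 16-bit ALU result, sign/zero extension by SIGN,
    HIGH means shift left by 16, and wrap-around modulo 2**32.
    """
    if f_i & 0x40:
        raise ValueError("f_i bit 6 set: scalar register update, not an accumulator op")
    if (f_i & CLRA) and first_phase:
        acc = 0  # CLRA is skipped on repeated instruction phases
    if (f_i & ENA) and (f_i & ACCA):
        value = alu & 0xFFFF
        if (f_i & SIGN) and (value & 0x8000):
            value -= 0x10000  # treat the ALU result as a signed 16-bit quantity
        if f_i & HIGH:
            value <<= 16  # apply to the top 16 bits of the accumulator
        acc = acc - value if (f_i & SUB) else acc + value
    return acc & MASK32
```

Under these assumptions the two-instruction example above maps to `acc = update_lane(acc, x0, CLRA | UACC)` followed by `acc = update_lane(acc, x1, UACC)`, leaving `x0 + x1` in each lane's accumulator.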
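The SRU table is a reduction over the active lanes, with `1 ooosss` splitting into an operation code and a destination register. The Python sketch below reproduces the documented empty-lane results (including the observed 1 for SUMS) and treats max2/max4/max6 as MAX, as the notes suggest. Everything beyond the table is assumed: 16-bit lane values, 32-bit scalar registers (so MIN_INT is -2**31), and ties in IMIN/IMAX resolving to the lowest lane index. The function name `scalar_register_update` is hypothetical.

```python
from typing import List, Sequence, Tuple

SRU_BIT = 0x40  # f_i bit 6 set selects a scalar register update
# Operation codes from the ooo bits of "1 ooosss".
SUMU, SUMS, MAX2, IMIN, MAX4, IMAX, MAX6, MAX = range(8)
MIN_INT = -(1 << 31)  # assumption: 32-bit scalar registers


def to_signed16(x: int) -> int:
    """Interpret the low 16 bits of x as a two's-complement value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def scalar_register_update(f_i: int, alu: Sequence[int],
                           active: Sequence[bool]) -> Tuple[int, int]:
    """Return (register number, value) for an SRU f_i field over per-lane ALU results."""
    if not f_i & SRU_BIT:
        raise ValueError("f_i bit 6 clear: accumulator op, not an SRU")
    op, rs = (f_i >> 3) & 0b111, f_i & 0b111
    lanes: List[int] = [i for i, on in enumerate(active) if on]
    if op == SUMU:
        value = sum(alu[i] & 0xFFFF for i in lanes)  # 0 when no lanes are active
    elif op == SUMS:
        value = sum(to_signed16(alu[i]) for i in lanes) if lanes else 1  # 1 as observed
    elif op in (IMIN, IMAX):
        pick = min if op == IMIN else max
        # min()/max() return the first (lowest-index) lane on ties: an assumed tie-break.
        value = pick(lanes, key=lambda i: to_signed16(alu[i])) if lanes else -1
    else:
        # MAX, plus max2/max4/max6, which appear to give the same result.
        value = max((to_signed16(alu[i]) for i in lanes), default=MIN_INT)
    return rs, value
```

In this model the `IMIN r0` example corresponds to `f_i = SRU_BIT | (IMIN << 3) | 0`, and `SUMS r1` to `SRU_BIT | (SUMS << 3) | 1`.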