[raspi-internals] Review: 80 bit vector instructions - scalar register update & accumulator

  • From: Herman Hermitage <hermanhermitage@xxxxxxxxxxx>
  • To: "raspi-internals@xxxxxxxxxxxxx" <raspi-internals@xxxxxxxxxxxxx>
  • Date: Mon, 1 Jul 2013 14:51:57 +1200

I went back and rechecked the Scalar Register Update (SRU) functionality of the
vector instructions.
It only applies to vector ALU ops (vops), not to memory operations (vld, vst, ...).

The two basic forms of vector operations:

  1111 11 X v:6 r:3 d:10 a:10 F 0 b:10 f_d:6 f_a:6 Ra_x:4 Cond:3 f_i:7 f_b:6
  1111 11 X v:6 r:3 d:10 a:10 F 1 k:10 f_d:6 f_a:6 Ra_x:4 Cond:3 f_i:7 j:6
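
To make the field positions concrete, here is a rough Python sketch that splits
an 80 bit value into the fields listed above. It assumes the fields are packed
most significant bit first, in the order written, and that the 80 bits have
already been assembled into a single integer (how they sit in memory is not
covered here). The names "prefix" and "form" are made up for the fixed
1111 11 bits and for the literal 0/1 bit that selects between the two forms.

```python
# Field widths copied from the two encodings above, most significant bit first.
# Assumption: the 80 bits are already assembled into one int in that order.
# "prefix" and "form" are hypothetical names (see the text above).
HEAD = [("prefix", 6), ("X", 1), ("v", 6), ("r", 3), ("d", 10), ("a", 10),
        ("F", 1), ("form", 1)]
TAIL_FORM0 = [("b", 10), ("f_d", 6), ("f_a", 6), ("Ra_x", 4), ("Cond", 3),
              ("f_i", 7), ("f_b", 6)]
TAIL_FORM1 = [("k", 10), ("f_d", 6), ("f_a", 6), ("Ra_x", 4), ("Cond", 3),
              ("f_i", 7), ("j", 6)]


def layout_for(form):
    """Return the full 80 bit field layout for form 0 or form 1."""
    return HEAD + (TAIL_FORM1 if form else TAIL_FORM0)


def pack_fields(fields, layout):
    """Pack a dict of field values into one integer, first field highest."""
    word = 0
    for name, width in layout:
        word = (word << width) | (fields[name] & ((1 << width) - 1))
    return word


def decode_vop(word):
    """Split an 80 bit vector ALU instruction into named fields."""
    if not 0 <= word < (1 << 80):
        raise ValueError("expected an 80 bit value")
    form = (word >> 42) & 1  # 42 bits follow the form bit in both layouts
    fields = {}
    shift = 80
    for name, width in layout_for(form):
        shift -= width
        fields[name] = (word >> shift) & ((1 << width) - 1)
    if fields["prefix"] != 0b111111:
        raise ValueError("not a 1111 11 prefixed instruction")
    return fields
```

decode_vop(word)["f_i"] then gives the 7 bit f_i field.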

With:

Accumulator
====

There is a 32 bit Accumulator associated with each lane.

eg.
  vmov H(0,0), H(16,0) CLRA UACC
  vmov H(1,0), H(17,0) UACC


f_i:
  0 1-----  ENA   Enable Accumulator.
  0 -1----  HIGH  Apply the accumulation to the top 16 bits of the Accumulator.
  0 --1---  SIGN  Accumulate values as if they were signed quantities.
  0 ---1--  CLRA  Clear accumulator before initial instruction execution (doesn't apply to repeated instruction phases).
  0 ----1-  ACCA  Add ALU result to Accumulator.
  0 -----1  SUB   Sub ALU result from Accumulator.

Generally (matching patents and so forth) the useful combinations are:

  CLRA                               Clear accumulator on initially entering instruction.
  UACC  (ENA|ACCA)                   Accumulate as unsigned.
  UDEC  (ENA|ACCA|SUB)               Decumulate as unsigned.
  SACC  (ENA|SIGN|ACCA)              Accumulate as signed.
  SDEC  (ENA|SIGN|ACCA|SUB)          Decumulate as signed.
  UACCH (ENA|HIGH|ACCA)              Accumulate as unsigned to high word of accumulator.
  SACCH (ENA|HIGH|SIGN|ACCA)         Accumulate as signed to high word of accumulator.
  UDECH (ENA|HIGH|ACCA|SUB)          Decumulate as unsigned to high word of accumulator.
  SDECH (ENA|HIGH|SIGN|ACCA|SUB)     Decumulate as signed to high word of accumulator.
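
As a quick way to go from an f_i value back to these mnemonics, something like
the following Python sketch might help. The bit values are read straight off
the f_i table (leading bit 0, then ENA down to SUB). The Acc class, the
MNEMONICS table and describe_acc are illustrative names only, and the "raw:"
fallback for combinations not listed above is my own convention.

```python
from enum import IntFlag


class Acc(IntFlag):
    """Low six bits of f_i when the top (seventh) bit is 0."""
    ENA = 0b100000
    HIGH = 0b010000
    SIGN = 0b001000
    CLRA = 0b000100
    ACCA = 0b000010
    SUB = 0b000001


# Combinations listed in the post; CLRA is handled as a separate modifier.
MNEMONICS = {
    "UACC": Acc.ENA | Acc.ACCA,
    "UDEC": Acc.ENA | Acc.ACCA | Acc.SUB,
    "SACC": Acc.ENA | Acc.SIGN | Acc.ACCA,
    "SDEC": Acc.ENA | Acc.SIGN | Acc.ACCA | Acc.SUB,
    "UACCH": Acc.ENA | Acc.HIGH | Acc.ACCA,
    "SACCH": Acc.ENA | Acc.HIGH | Acc.SIGN | Acc.ACCA,
    "UDECH": Acc.ENA | Acc.HIGH | Acc.ACCA | Acc.SUB,
    "SDECH": Acc.ENA | Acc.HIGH | Acc.SIGN | Acc.ACCA | Acc.SUB,
}


def describe_acc(f_i):
    """Return the modifier text for an accumulator-form f_i (top bit 0)."""
    if f_i >> 6:
        raise ValueError("top bit set: not an accumulator-form f_i")
    names = ["CLRA"] if f_i & int(Acc.CLRA) else []
    base = f_i & 0b111111 & ~int(Acc.CLRA)  # everything except CLRA
    for mnemonic, bits in MNEMONICS.items():
        if int(bits) == base:
            names.append(mnemonic)
            break
    else:
        if base:
            names.append("raw:" + format(base, "06b"))  # unlisted combination
    return " ".join(names)
```

For example, describe_acc(0b0100110) gives "CLRA UACC", which is the modifier
pair used in the first vmov example in this section.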

More combinations may be useful.  Stay posted.

Scalar Register Updates
====

An SRU calculates a function across the elements (lanes) and writes the result
to a scalar register.
It uses the ALU result values from the current operation.

eg.
  vmov -, H(0,0) IMIN r0       ; r0 = i, 0<=i<=15, where H(0+i, 0) <= H(0+j, 0) forall 0<=j<=15
  vmov H(1,0), H(0,0), SUMS r1 ; r1 = H(0,0)+H(1,0)+...+H(15,0)

f_i:
  1 000sss  SUMU rs    rs = unsigned sum of ALU result. (0 if no lanes active).
  1 001sss  SUMS rs    rs = signed sum of ALU result. (0 if no lanes active).
  1 010sss  max2 rs
  1 011sss  IMIN rs    rs = index of ALU lane containing smallest signed value (-1 if no lanes active).
  1 100sss  max4 rs
  1 101sss  IMAX rs    rs = index of ALU lane containing largest signed value (-1 if no lanes active).
  1 110sss  max6 rs
  1 111sss  MAX  rs    rs = max signed element of ALU result (as a signed quantity) (MIN_INT if no lanes active).

The maxn operations seem to produce the same result as MAX, but are generally
not used in the existing blob.
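
For anyone who wants to check results against a software model, here is a
small Python sketch of the SRU functions as described in the table. Several
things in it are assumptions on my part rather than verified behaviour: lane
values are treated as 16 bit quantities, MIN_INT is taken to be the 32 bit
minimum, ties in IMIN/IMAX go to the lowest lane index, sss is read as r0..r7,
and an empty set of lanes simply sums to 0 for both SUMU and SUMS. The
function names are invented for the example.

```python
SRU_OPS = {0b000: "SUMU", 0b001: "SUMS", 0b011: "IMIN",
           0b101: "IMAX", 0b111: "MAX"}
MIN_INT = -(1 << 31)  # assumption: 32 bit scalar register minimum


def decode_sru(f_i):
    """Split an SRU-form f_i (1 ooo sss) into (operation, register)."""
    if not (f_i >> 6) & 1:
        raise ValueError("top bit clear: not an SRU-form f_i")
    op = (f_i >> 3) & 0b111
    reg = f_i & 0b111  # assumption: sss selects r0..r7
    return SRU_OPS.get(op, "max%d" % op), "r%d" % reg


def to_signed(value, bits):
    """Reinterpret the low `bits` bits of value as two's complement."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def sru(op, lanes, active=None, bits=16):
    """Model one SRU function over per-lane ALU results.

    lanes  -- raw ALU result per lane
    active -- per-lane enable flags (default: all lanes active)
    bits   -- assumed width of each ALU result
    """
    if active is None:
        active = [True] * len(lanes)
    idx = [i for i, on in enumerate(active) if on]
    signed = {i: to_signed(lanes[i], bits) for i in idx}
    if op == "SUMU":
        return sum(lanes[i] & ((1 << bits) - 1) for i in idx)
    if op == "SUMS":
        return sum(signed.values())
    if op == "IMIN":  # first minimum wins, so ties go to the lowest index
        return min(idx, key=lambda i: signed[i]) if idx else -1
    if op == "IMAX":  # first maximum wins, so ties go to the lowest index
        return max(idx, key=lambda i: signed[i]) if idx else -1
    if op == "MAX":
        return max(signed.values()) if idx else MIN_INT
    raise ValueError("unmodelled SRU op: %r" % (op,))
```

With this, decode_sru(0b1011000) returns ("IMIN", "r0"), matching the first
example in this section.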


Cheers
HH.                                       
