[raspi-internals] Re: GPU FFT Disassembly

  • From: Herman Hermitage <hermanhermitage@xxxxxxxxxxx>
  • To: "raspi-internals@xxxxxxxxxxxxx" <raspi-internals@xxxxxxxxxxxxx>
  • Date: Sun, 2 Feb 2014 14:12:37 +1200

- also recall there are 3 'horizontal' slots per instruction: add, mul and control.
see: https://github.com/hermanhermitage/videocoreiv-qpu

----------------------------------------
> From: hermanhermitage@xxxxxxxxxxx
> To: raspi-internals@xxxxxxxxxxxxx
> Subject: RE: [raspi-internals] GPU FFT Disassembly
> Date: Sun, 2 Feb 2014 13:39:03 +1200
>
> Reminder for anyone looking through the code:
> - branches have 3 delay slots.
> - registers (ra, rb) have a latency of a cycle or so.
> - accumulators (r0, ..., r3) are available back to back.
> - the next word from the uniform stream is fetched with: mov rn, unif
> - bra is branch absolute (ie really a jump)
> - brr is branch relative.
> - branches can store the link/return address in a register.
> - .setf means update the cc flags
> - .nz, etc are predication (eg. not zero) to choose which SIMD lanes are 
> active based on the cc flags.
> - remember it's all lock step (ie one 'PC' per QPU), so brr.allz means branch
> if all are zero - ie there is no possibility of diverging flow of control
> across the SIMD lanes.
> - texture unit 0 looks like it's being used for random access (indexed) to the
> source data.
> - vpm (vertex pipe memory), with vr_setup/vw_setup, is used for vpm transfers.
>
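
To make the .setf / .nz / brr.allz points above concrete, here is a rough toy model in Python (not QPU assembly, and not an emulator). It assumes 16 lanes per QPU and models only the Z flag; the names setf, mov_nz, branch_allz and countdown are made up for illustration and do not come from any real toolchain.

```python
# Toy model of QPU lock-step execution. NOT real QPU code or an emulator;
# all function names here are hypothetical, chosen only for illustration.
LANES = 16  # assumption: one QPU operates on 16-element vectors


def setf(values):
    """Model '.setf': record a per-lane Z (zero) flag from a result."""
    return [v == 0 for v in values]


def mov_nz(dst, src, z_flags):
    """Model a '.nz' predicated write: only lanes with Z clear are updated."""
    return [s if not z else d for d, s, z in zip(dst, src, z_flags)]


def branch_allz(z_flags):
    """Model the 'brr.allz' condition: one decision for the whole QPU."""
    return all(z_flags)


def countdown(counters):
    """Decrement every lane's counter until all lanes reach zero.

    Returns (per-lane step counts, number of loop iterations executed).
    """
    counters = list(counters)
    steps = [0] * LANES
    iterations = 0
    z = setf(counters)
    while not branch_allz(z):  # single shared 'PC': all lanes loop together
        # Lanes that already hit zero are masked off, not branched away.
        counters = mov_nz(counters, [c - 1 for c in counters], z)
        steps = mov_nz(steps, [s + 1 for s in steps], z)
        z = setf(counters)
        iterations += 1
    return steps, iterations


if __name__ == "__main__":
    steps, iterations = countdown([i % 4 for i in range(LANES)])
    print(steps, iterations)
```

The loop runs as many times as the slowest lane needs; lanes that finish early just sit there with their writes predicated off, which is the "no diverging flow of control" point.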
