[raspi-internals] Re: GPU FFT Disassembly

  • From: Herman Hermitage <hermanhermitage@xxxxxxxxxxx>
  • To: "raspi-internals@xxxxxxxxxxxxx" <raspi-internals@xxxxxxxxxxxxx>
  • Date: Sun, 2 Feb 2014 13:39:03 +1200

Reminder for anyone looking through the code:
- branches have 3 delay slots.
- registers (ra, rb) have a latency of a cycle or so.
- accumulators (r0, ..., r3) are available back to back.
- the next word from the uniform stream is fetched with: mov rn, unif
- bra is branch absolute (ie really a jump)
- brr is branch relative.
- branches can store the link/return address in a register.
- .setf means update the cc flags
- .nz, etc are predication (eg. not zero) to choose which SIMD lanes are active 
based on the cc flags.
- remember it's all lock step (ie one 'PC' per QPU), so brr.allz means branch
if all are zero - ie there is no possibility of diverging flow of control 
across the SIMD lanes.
- texture unit 0 looks like it's being used for random access (indexed) to the
source data.
- vpm (vertex pipe memory): vr_setup/vw_setup are used to set up vpm transfers.
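
To make the .setf / .nz / brr.allz points above concrete, here is a rough toy model in Python. It is only a sketch, not QPU code. It assumes 16 SIMD lanes and models just the zero flag. The helper names (setf, mov_nz, allz) are made up for illustration and are not part of any real toolchain.

```python
# Toy model of lock-step SIMD predication. All names here are hypothetical.
LANES = 16  # assumed lane count for this sketch


def setf(values):
    """Model .setf: record a per-lane zero flag from a result vector."""
    return [v == 0 for v in values]


def mov_nz(dst, src, zflags):
    """Model a mov predicated with .nz: only lanes whose Z flag is clear are written."""
    return [s if not z else d for d, s, z in zip(dst, src, zflags)]


def allz(zflags):
    """Model the brr.allz condition: true only if every lane has Z set."""
    return all(zflags)


# Each lane holds 0, 1, 2, 3, 0, 1, ... so every fourth lane is zero.
counters = [i % 4 for i in range(LANES)]
zflags = setf(counters)

# Lanes with a non-zero counter receive 99; the others keep their old value.
result = mov_nz([0] * LANES, [99] * LANES, zflags)

# There is a single branch decision for the whole QPU, never one per lane.
branch_taken = allz(zflags)
```

Here only some lanes are zero, so the predicated mov touches a subset of lanes while the allz condition is false for the QPU as a whole. Per-lane differences are handled by predication; the branch itself is all or nothing.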
