- also recall there are 3 'horizontal' slots per instruction: add; mul; control;

see: https://github.com/hermanhermitage/videocoreiv-qpu

----------------------------------------
> From: hermanhermitage@xxxxxxxxxxx
> To: raspi-internals@xxxxxxxxxxxxx
> Subject: RE: [raspi-internals] GPU FFT Disassembly
> Date: Sun, 2 Feb 2014 13:39:03 +1200
>
> Reminder for anyone looking through the code:
> - branches have 3 delay slots.
> - registers (ra, rb) have a latency of a cycle or so.
> - accumulators (r0, ..., r3) are available back to back.
> - the next word from the uniform stream is fetched with: mov rn, unif
> - bra is branch absolute (ie really a jump)
> - brr is branch relative.
> - branches can store the link/return address in a register.
> - .setf means update the cc flags
> - .nz, etc are predication (eg. not zero) to choose which SIMD lanes are
>   active based on the cc flags.
> - remember it's all lock step (ie one 'PC' per QPU), so brr.allz means branch
>   if all are zero - ie there is no possibility of diverging flow of control
>   across the SIMD lanes.
> - texture unit 0 looks like it's being used for random access (indexed) to the
>   source data.
> - vpm (vertex primitive memory)/vr_setup/vw_setup for vpm transfers.
>
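
For anyone who finds the .setf / .nz / brr.allz notes easier to follow as code, here is a rough Python model of the lock-step behaviour. It is only a sketch, not QPU code: the 16-lane width and the helper names (setf, mov_nz, all_z, countdown) are assumptions made up for illustration, and delay slots and register latency are ignored.

```python
LANES = 16  # assumed SIMD width of one QPU, for illustration only


def setf(values):
    """Model '.setf': record a per-lane zero (Z) flag for a result."""
    return [v == 0 for v in values]


def mov_nz(dest, src, zflags):
    """Model a '.nz'-predicated write: only lanes whose Z flag is clear update."""
    return [s if not z else d for d, s, z in zip(dest, src, zflags)]


def all_z(zflags):
    """Model the 'allz' branch condition: true only if every lane is zero."""
    return all(zflags)


def countdown(counters):
    """Every lane counts down to zero under a single shared loop.

    There is one 'PC' for all lanes, so the loop keeps running until the
    all-lanes-zero condition holds. Lanes that finish early are masked
    off by predication rather than leaving the loop on their own.
    """
    counters = list(counters)
    iterations = 0
    zflags = setf(counters)
    while not all_z(zflags):  # i.e. the brr.allz exit branch is not taken yet
        decremented = [c - 1 for c in counters]
        counters = mov_nz(counters, decremented, zflags)
        zflags = setf(counters)
        iterations += 1
    return counters, iterations


if __name__ == "__main__":
    final, n = countdown([0, 1, 2, 3] * (LANES // 4))
    print(final, n)
```

The loop runs as many times as the slowest lane needs; the faster lanes just sit idle under the predicate, which is the "no diverging flow of control" point above.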