[raspi-internals] Re: GPU FFT Disassembly

  • From: Scott Mansell <phiren@xxxxxxxxx>
  • To: raspi-internals@xxxxxxxxxxxxx
  • Date: Mon, 3 Feb 2014 07:13:11 +1300

Looks like Broadcom have encoded some extra operands in the .never
conditional space of the ldi opcodes.
My guess would be that these are control codes for the rotator, which we
never found before.

The patent describes one rotator with 16 possible rotations, but it looks
like we have 2 rotators with 16 possible rotations each, and each rotator
is always set to the opposite of the other one, i.e. the two settings
differ by 8 (1 == 9, 2 == a, 3 == b, 4 == c, 5 == d, 6 == e, 7 == f).
That would make a large chunk of that code a jump table just to set the
correct rotation.
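
To make the pairing concrete, here is a minimal Python sketch. It assumes
that "opposite" means nothing more than the pairs listed above, so the second
rotator's setting is the first one's plus 8, modulo 16. The function name
paired_rotation is hypothetical and is not taken from the disassembly.

```python
def paired_rotation(r: int) -> int:
    """Return the setting the second rotator would hold for setting r.

    Assumption: the two rotators always differ by 8 (mod 16), as the
    observed pairs 1/9, 2/a, ... 7/f suggest.
    """
    if not 0 <= r < 16:
        raise ValueError("rotation must be in the range 0..15")
    return (r + 8) % 16


# Rebuild the pairs quoted in the text, using hex digits as in the listing.
pairs = {format(r, "x"): format(paired_rotation(r), "x") for r in range(1, 8)}
print(pairs)
```

Under that assumption the mapping is its own inverse (applying it twice gives
back the original setting), which fits the idea of the two rotators mirroring
each other.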

I would also be suspicious about the mov.never and add.never instructions.


____________
Scott Mansell


On Sun, Feb 2, 2014 at 3:12 PM, Herman Hermitage <
hermanhermitage@xxxxxxxxxxx> wrote:

> - also recall there are 3 'horizontal' slots per instruction: add; mul;
> control;
> see: https://github.com/hermanhermitage/videocoreiv-qpu
>
> ----------------------------------------
> > From: hermanhermitage@xxxxxxxxxxx
> > To: raspi-internals@xxxxxxxxxxxxx
> > Subject: RE: [raspi-internals] GPU FFT Disassembly
> > Date: Sun, 2 Feb 2014 13:39:03 +1200
> >
> > Reminder for anyone looking through the code:
> > - branches have 3 delay slots.
> > - registers (ra, rb) have a latency of a cycle or so.
> > - accumulators (r0, ..., r3) are available back to back.
> > - the next word from the uniform stream is fetched with: mov rn, unif
> > - bra is branch absolute (ie really a jump)
> > - brr is branch relative.
> > - branches can store the link/return address in a register.
> > - .setf means update the cc flags
> > - .nz, etc are predication (eg. not zero) to choose which SIMD lanes are
> > active based on the cc flags.
> > - remember it's all lock step (ie one 'PC' per QPU), so brr.allz means
> > branch if all are zero - ie there is no possibility of diverging flow of
> > control across the SIMD lanes.
> > - texture unit 0 looks like it's being used for random access (indexed)
> > to the source data.
> > - vpm (vertex pipe memory)/vr_setup/vw_setup for vpm transfers.
>
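
The .setf / .nz / brr.allz notes quoted above can be pictured with a toy
model. This is only a sketch in Python, assuming 16 SIMD lanes and a single
per-lane zero flag. The helper names (setf, mov_nz, branch_taken_allz) are
made up for illustration and do not correspond to any real tool or to the
actual hardware encoding.

```python
LANES = 16  # assumed SIMD width of one QPU


def setf(values):
    """Model of .setf: derive a per-lane zero flag from a result vector."""
    return [v == 0 for v in values]


def mov_nz(dest, src, zflags):
    """Model of a .nz-predicated mov: write only lanes whose Z flag is clear."""
    return [d if z else s for d, s, z in zip(dest, src, zflags)]


def branch_taken_allz(zflags):
    """Model of brr.allz: one decision for the whole QPU, never per lane."""
    return all(zflags)


# Lanes 1..3 hold non-zero values, every other lane holds zero.
result = [0, 1, 2, 3] + [0] * (LANES - 4)
flags = setf(result)

# Only the non-zero lanes pick up the new value; the rest keep the old one.
updated = mov_nz([-1] * LANES, [7] * LANES, flags)

# Not every lane is zero, so the single shared branch is not taken.
taken = branch_taken_allz(flags)
```

The point of the model is that predication changes which lanes get written,
while a branch is a single yes/no for all lanes together, so control flow
cannot diverge between lanes.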
