[raspi-internals] Re: 24 GFLOPS QPUs

  • From: Herman Hermitage <hermanhermitage@xxxxxxxxxxx>
  • To: "raspi-internals@xxxxxxxxxxxxx" <raspi-internals@xxxxxxxxxxxxx>
  • Date: Sun, 4 Aug 2013 15:12:34 +1200

I've now aligned the QPU mnemonics.

Here is a sample of the output:
- Note the first 'shader code' segment is the QPU fragment matching the 
fragment shader.
- The second fragment is the full vertex shader.
- The third is the "coordinate" shader fragment which corresponds to the 
portion of the vertex shader generating coordinates (used by the tiling engine, 
before deciding to commit vertices/fragments/triangles to a full vertex shader).
So the shaders are listed in the reverse of the order in which they would be used.
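
As a rough aid for reading the listings further down, here is a minimal Python sketch that splits one dump line into its offset, its two 32-bit words and the mnemonic text. It assumes, from the listings in this post only, that the offset counts 32-bit words (the 56-byte segment holds 7 two-word instructions) and that the top hex digit of the second word tracks the signal printed at the end of the line (1 = none, 3 = thrend, 4 = sbwait, 5 = sbdone). The names parse_line and signal_name are made up for illustration and are not part of qpu-sniff.

```python
import re

# Signal names inferred only from the listings in this post (an assumption);
# any other value is reported as unknown rather than guessed.
SIGNALS = {0x1: "none", 0x3: "thrend", 0x4: "sbwait", 0x5: "sbdone"}

# "offset: word word mnemonics", all fields as 8 lowercase hex digits.
LINE_RE = re.compile(r"^([0-9a-f]{8}): ([0-9a-f]{8}) ([0-9a-f]{8}) (.*)$")


def parse_line(line):
    """Return a dict for one instruction line, or None for any other line."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    offset = int(m.group(1), 16)
    w0 = int(m.group(2), 16)
    w1 = int(m.group(3), 16)
    return {
        "offset": offset,           # in 32-bit words (assumed)
        "byte_offset": offset * 4,  # byte position within the segment
        "w0": w0,
        "w1": w1,
        "sig": w1 >> 28,            # top hex digit of the second word
        "text": m.group(4),
    }


def signal_name(sig):
    """Map a signal nibble to the name seen in the listing, if any."""
    return SIGNALS.get(sig, "unknown(%d)" % sig)


if __name__ == "__main__":
    ins = parse_line("00000002: 80827036 415059e0 nop; mov r0.8b, unif; sbwait")
    print(ins["byte_offset"], signal_name(ins["sig"]), ins["text"])
```

Header lines such as ('shader code' 18402720 56) do not match the pattern and come back as None, so a whole dump can be fed through line by line.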

Meanings:
tlbc - tile buffer (fragment) color
unif - uniform memory.  The read position restarts each time the shader is 
called, and each read steps through to the next uniform value.
vpm - vertex and primitive memory
r4 - the result FIFO from math calculations (recip, log, etc.)
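
In the vertex and coordinate listings further down, each write to recip is followed by an fmul with r4, an fsub from a uniform, and another fmul with r4. That pattern looks like one Newton-Raphson refinement of the reciprocal. Below is a minimal Python sketch of that reading. The uniform's value is not visible in the dump, so 2.0 is an assumption, as is the operand order of fsub; refine_recip is a hypothetical name.

```python
def refine_recip(w, approx, two=2.0):
    """One Newton-Raphson step for 1/w, starting from a rough estimate.

    Mirrors the three-instruction pattern in the listings:
      fmul r0, ra0, r4      -> t = w * approx
      fsub r0, unif, r0     -> t = two - t   (uniform assumed to be 2.0)
      fmul ra0, r4, r0      -> approx * t
    """
    t = w * approx
    t = two - t
    return approx * t


if __name__ == "__main__":
    w = 4.0
    rough = 0.24  # stand-in for a low-precision value read from r4
    better = refine_recip(w, rough)
    print(abs(rough - 1.0 / w), abs(better - 1.0 / w))
```

If the reading is right, each step roughly squares the relative error of the estimate, which would explain why a single step after the hardware recip is enough.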

I put a whole host of samples at:
 https://github.com/hermanhermitage/videocoreiv-qpu/tree/master/qpu-sniff/fs

Hopefully in a week or so I'll get time to post some samples of injecting our 
own QPU code and using DMA to load/store data between VPM and physical memory.

Cheers
HH.

$ ./qpu-sniff --testgl vs/simple.vs fs/simple.fs

vs/simple.vs:
attribute vec4 vertex;
void main(void) {
  gl_Position = vertex;
}

fs/simple.fs:
uniform vec4 c1;
void main(void) {
  gl_FragColor = c1;
}

('shader code' 18402720 56)
00000000: 80827036 114059e0 nop; mov r0.8a, unif
00000002: 80827036 415059e0 nop; mov r0.8b, unif; sbwait
00000004: 80827036 116059e0 nop; mov r0.8c, unif
00000006: 80827036 117059e0 nop; mov r0.8d, unif
00000008: 159e7000 30020ba7 mov tlbc, r0; nop; thrend
0000000a: 009e7000 100009e7 nop
0000000c: 009e7000 500009e7 nop; nop; sbdone
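
The four .8a to .8d moves in the listing above look like they build one 32-bit colour word a byte at a time before it is written to tlbc. Here is a minimal Python sketch of that idea. It assumes each uniform is a float in [0, 1] that is clamped and scaled to 0..255, with .8a as the lowest byte and .8d as the highest. The rounding and the mapping of bytes to colour channels are guesses, and to_byte and pack_8abcd are hypothetical names.

```python
def to_byte(x):
    """Clamp a float to [0, 1] and scale to 0..255 (rounding is assumed)."""
    x = min(1.0, max(0.0, x))
    return int(x * 255.0 + 0.5)


def pack_8abcd(a, b, c, d):
    """Pack four floats into one 32-bit word: a -> bits 0..7, d -> bits 24..31."""
    return (
        to_byte(a)
        | (to_byte(b) << 8)
        | (to_byte(c) << 16)
        | (to_byte(d) << 24)
    )


if __name__ == "__main__":
    # Four components of a uniform such as c1 in fs/simple.fs.
    print(hex(pack_8abcd(1.0, 0.0, 0.0, 1.0)))
```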

('shader code' 18402780 216)
00000000: 15827d80 10020c67 mov vr_setup, unif
00000002: 95c20dbf 100240f1 mov ra3, vpm; mov vw_setup, unif
00000004: 15c27d80 10020027 mov ra0, vpm
00000006: 950f0ff6 10025030 mov rb0, vpm; mov vpm, ra3
00000008: 20020037 100059e1 nop; fmul r1, ra0, unif
0000000a: 95030ff6 10024070 mov ra1, vpm; mov vpm, ra0
0000000c: 200e0037 100059e2 nop; fmul r2, ra3, unif
0000000e: 95040dbf 10024d30 mov recip, ra1; mov vpm, rb0
00000010: 15067d80 10020c27 mov vpm, ra1
00000012: 2080003e 100059e3 nop; fmul r3, rb0, unif
00000014: 20067034 100059e0 nop; fmul r0, ra1, r4
00000016: 02827c00 10020827 fsub r0, unif, r0
00000018: 209e7020 100059c2 nop; fmul ra2, r4, r0
0000001a: 009e7000 100009e7 nop
0000001c: 200a700e 100059e0 nop; fmul r0, r1, ra2
0000001e: 210a01d6 10024821 fadd r0, r0, unif; fmul r1, r2, ra2
00000020: 270a701e 100248a3 ftoi r2, r0; fmul r3, r3, ra2
00000022: 01827380 10020827 fadd r0, r1, unif
00000024: 079e7000 10120127 ftoi ra4.16a, r0
00000026: 81827792 10225804 fadd r0, r3, unif; mov ra4.16b, r2
00000028: 009e7000 100009e7 nop
0000002a: 15127d80 10020c27 mov vpm, ra4
0000002c: 159e7000 10020c27 mov vpm, r0
0000002e: 150a7d80 10020c27 mov vpm, ra2
00000030: 009e7000 300009e7 nop; nop; thrend
00000032: 009e7000 100009e7 nop
00000034: 009e7000 100009e7 nop

('shader code' 18d029e0 208)
00000000: 15827d80 10020c67 mov vr_setup, unif
00000002: 95c20dbf 100248b1 mov r2, vpm; mov vw_setup, unif
00000004: 20c20037 100059e1 nop; fmul r1, vpm, unif
00000006: 35c20d97 100248e2 mov r3, vpm; fmul r2, r2, unif
00000008: 35c20d9f 10024023 mov ra0, vpm; fmul r3, r3, unif
0000000a: 009e7000 100009e7 nop
0000000c: 15027d80 10020d27 mov recip, ra0
0000000e: 009e7000 100009e7 nop
00000010: 009e7000 100009e7 nop
00000012: 20027034 100059e0 nop; fmul r0, ra0, r4
00000014: 02827c00 10020827 fsub r0, unif, r0
00000016: 209e7020 100059c0 nop; fmul ra0, r4, r0
00000018: 009e7000 100009e7 nop
0000001a: 2002700e 100059e0 nop; fmul r0, r1, ra0
0000001c: 210201d6 10024821 fadd r0, r0, unif; fmul r1, r2, ra0
0000001e: 2702701e 100248a3 ftoi r2, r0; fmul r3, r3, ra0
00000020: 01827380 10020827 fadd r0, r1, unif
00000022: 079e7000 10120067 ftoi ra1.16a, r0
00000024: 81827792 10225801 fadd r0, r3, unif; mov ra1.16b, r2
00000026: 009e7000 100009e7 nop
00000028: 15067d80 10020c27 mov vpm, ra1
0000002a: 159e7000 10020c27 mov vpm, r0
0000002c: 15027d80 10020c27 mov vpm, ra0
0000002e: 009e7000 300009e7 nop; nop; thrend
00000030: 009e7000 100009e7 nop
00000032: 009e7000 100009e7 nop

----------------------------------------
> From: hermanhermitage@xxxxxxxxxxx
> To: raspi-internals@xxxxxxxxxxxxx
> Subject: RE: 24 GFLOPS QPUs
> Date: Thu, 1 Aug 2013 14:10:51 +1200
>
>> The assembly it spits out is still raw (no names for operations), will 
>> populate it again shortly... just working out which way to go with the names.
>> Unlike the VPU, for the 3d we know the names as some debug builds of the 
>> blob contain the relevant strings.
>
> Possibly contentious for a clean room effort - but I've decided to go with 
> the names as per the blob.  I will switch to those on the github wiki.
>
> They can be extracted (from very early releases) using:
>   strings /boot/vlls/khronos.vll
>
> Alternatively, it's possible to load the vll and run (quite tricky) the 
> included fragment disassembler:
>    329: 1002db80  1278 FUNC    GLOBAL DEFAULT    1 glsl_qdisasm_instruction
>    330: 1002e11c   416 FUNC    GLOBAL DEFAULT    1 glsl_qdisasm_with_uniform
>
> Feeding this lots of different combinations of bits allows the correct 
> instruction names to be inferred.
>
> Cheers
> HH.
