[raspi-internals] Re: QPU Tutorials/Samples

  • From: Herman Hermitage <hermanhermitage@xxxxxxxxxxx>
  • To: "raspi-internals@xxxxxxxxxxxxx" <raspi-internals@xxxxxxxxxxxxx>
  • Date: Thu, 20 Feb 2014 03:11:35 +1200

> Can you also push the experiment fixture and code to the git? It will  
> make life easier for development later on (and test that we didn't  
> break anything) 

There is something basic in the qpu-sniff directory.  It seems it was broken 
because vcdbg switched some of its output from stdout to stderr.  I made a 
fix; hopefully it works now.

If you run dis-fs.sh, it runs each shader in the fs/ directory, then captures 
and disassembles the fragments.  I was mainly using it to understand more in 
the early days.  As it stands, it's just for capturing shaders rather than 
modifying them in memory.

$ ./dis-fs.sh
Disassembling fs/add.fs
...

$ nodejs ../qpu-tutorial/qpuasm.js --ignore-errors fs/add.fs.qdis
/* Exported Symbols */

/* Assembled Program */
/* 0x00000000: */ 0x15827d80, 0x10020827, /* mov r0, unif */
/* 0x00000008: */ 0x01827c00, 0x40020867, /* fadd r1, unif, r0; nop; sbwait */
/* 0x00000010: */ 0x15827d80, 0x10020827, /* mov r0, unif */
/* 0x00000018: */ 0x01827c00, 0x10020827, /* fadd r0, unif, r0 */
/* 0x00000020: */ 0x95827d80, 0x114248a0, /* mov r2, unif; mov r0.8a, r0 */
fs/add.fs.qdis:21
Assembly check failed. Got 0x95827d80 0x114248a0 expected 0x95827d80 0x114258a0
/* 0x00000028: */ 0x81827c89, 0x11524860, /* fadd r1, unif, r2; mov r0.8b, r1 */
fs/add.fs.qdis:22
Assembly check failed. Got 0x81827c89 0x11524860 expected 0x81827c89 0x11525860
/* 0x00000030: */ 0x95827d89, 0x11624860, /* mov r1, unif; mov r0.8c, r1 */
fs/add.fs.qdis:23
Assembly check failed. Got 0x95827d89 0x11624860 expected 0x95827d89 0x11625860
/* 0x00000038: */ 0x01827c40, 0x10020867, /* fadd r1, unif, r1 */
/* 0x00000040: */ 0x809e7009, 0x317049e0, /* nop; mov r0.8d, r1; thrend */
fs/add.fs.qdis:25
Assembly check failed. Got 0x809e7009 0x317049e0 expected 0x809e7009 0x317059e0
/* 0x00000048: */ 0x159e7000, 0x10020ba7, /* mov tlbc, r0 */
/* 0x00000050: */ 0x009e7000, 0x500009e7, /* nop; nop; sbdone */
/* 0x00000058: */ 0x15827d80, 0x10120027, /* mov ra0.16a, unif */
/* 0x00000060: */ 0x15827d80, 0x10220027, /* mov ra0.16b, unif */
/* 0x00000068: */ 0x15827d80, 0x10021c67, /* mov vw_setup, unif */
/* 0x00000070: */ 0x15827d80, 0x10020c27, /* mov vpm, unif */
/* 0x00000078: */ 0x15827d80, 0x10020c27, /* mov vpm, unif */
/* 0x00000080: */ 0x15827d80, 0x10020c27, /* mov vpm, unif */
/* 0x00000088: */ 0x15827d80, 0x10020c27, /* mov vpm, unif */
/* 0x00000090: */ 0x95020dbf, 0x10024c20, /* mov vpm, ra0; mov r0, unif */
/* 0x00000098: */ 0x01827c00, 0x10020c27, /* fadd vpm, unif, r0 */
/* 0x000000a0: */ 0x15827d80, 0x10020c27, /* mov vpm, unif */
/* 0x000000a8: */ 0x009e7000, 0x300009e7, /* nop; nop; thrend */
/* 0x000000b0: */ 0x009e7000, 0x100009e7, /* nop */
/* 0x000000b8: */ 0x009e7000, 0x100009e7, /* nop */
/* 0x000000c0: */ 0x15827d80, 0x10120027, /* mov ra0.16a, unif */
/* 0x000000c8: */ 0x15827d80, 0x10220027, /* mov ra0.16b, unif */
/* 0x000000d0: */ 0x15827d80, 0x10021c67, /* mov vw_setup, unif */
/* 0x000000d8: */ 0x95020dbf, 0x10024c20, /* mov vpm, ra0; mov r0, unif */
/* 0x000000e0: */ 0x01827c00, 0x10020c27, /* fadd vpm, unif, r0 */
/* 0x000000e8: */ 0x15827d80, 0x10020c27, /* mov vpm, unif */
/* 0x000000f0: */ 0x009e7000, 0x300009e7, /* nop; nop; thrend */
/* 0x000000f8: */ 0x009e7000, 0x100009e7, /* nop */
/* 0x00000100: */ 0x009e7000, 0x100009e7, /* nop */

The simplest change to the assembler to get it to stop complaining (and keep it 
compatible with the other test cases I'm throwing at it from the blob) would be 
to check that if wa and wb are both one of r0, r1, r2, r3, then ignore the X 
bit.

(Actually, I just implemented it; use --strictmatch to force a bit-for-bit 
match.)
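
A minimal sketch of that relaxed comparison, for illustration only.  It is 
not taken from qpuasm.js: the function names and the [low word, high word] 
pair layout are assumptions chosen to match the listing above.  It also 
assumes that the X bit is bit 12 of the second word (the only bit that differs 
in the failures above) and that wa and wb are the two 6-bit fields below it, 
with r0..r3 at write addresses 32..35.

```javascript
// Hypothetical helper; names and layout are assumptions, not qpuasm.js code.
// An instruction is [lowWord, highWord], in the order printed by the listing.
const X_BIT = 1 << 12; // assumed position of the X (write swap) bit in highWord

// r0..r3 are assumed to be write addresses 32..35 in both bank a and bank b.
function isAccumulator(waddr) {
  return waddr >= 32 && waddr <= 35;
}

// Returns true when `got` matches `expected`.  Unless `strict` is set, the
// X bit is ignored when both write addresses name one of r0..r3.
function instructionsMatch(got, expected, strict = false) {
  const [gotLo, gotHi] = got;
  const [expLo, expHi] = expected;
  if (gotLo !== expLo) return false;

  const wa = (gotHi >>> 6) & 0x3f; // add ALU write address
  const wb = gotHi & 0x3f;         // mul ALU write address
  const ignoreX = !strict && isAccumulator(wa) && isAccumulator(wb);
  const mask = ignoreX ? ~X_BIT : ~0;
  return ((gotHi ^ expHi) & mask) === 0;
}

// The first failing pair above (fs/add.fs.qdis:21):
const got = [0x95827d80, 0x114248a0];
const expected = [0x95827d80, 0x114258a0];
console.log(instructionsMatch(got, expected));       // true
console.log(instructionsMatch(got, expected, true)); // false
```

Under these assumptions the pair reported at fs/add.fs.qdis:25 would still be 
flagged, because its add destination decodes to nop rather than one of 
r0..r3.  The real assembler may well treat that case differently.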

> eg. 
>   - X=0, or X=1 when a name (eg, r0..r3) is available in both bank a and  
> bank b as a destination.  eg.  mov r0, r1; mov r2, r3 can be packed  
> with X=0 or X=1. 
> Is there performance impact of any sort? 

I don't think so.

> I've been thinking of doing a reference vpuasm.js for the VPU, mainly  
>  ...

> This will be great, especially if it could be combined with a test  
> fixture like qpu-02.c, allowing simple execution and result collection  
> of VPU running kernels.  The real keys to the kingdom is CPU-VPU-QPU  
> data round-trip. Now, for that cooling tower for my Pi :) 

Hah.

Yes, a VPU fixture would be good - what I use at the moment isn't fit for 
human consumption.  Maybe even a VPU and QPU instruction simulator in the 
fullness of time, to allow a "dry run" of code with greater debugging insight 
than a complete system lockup :)

/hh                                       
