[raspi-internals] Re: QPU Tutorials/Samples

  • From: Shachar Raindel <shacharr@xxxxxxxxx>
  • To: raspi-internals <raspi-internals@xxxxxxxxxxxxx>
  • Date: Wed, 19 Feb 2014 13:43:11 +0200

On Wed, Feb 19, 2014 at 12:09 PM, Herman Hermitage <
hermanhermitage@xxxxxxxxxxx> wrote:

> I've pushed an update to qpuasm.js with some experimental packing support.
Cool! Will try to play with this support later on.

> I've tried round tripping some shader fragments sniffed from OpenGL,
> disassembled with qpudis.c and then assembled with qpuasm.js.  Mostly
> working, but there are some issues with the arbitrary ways in which
> instructions may sometimes be packed:
Can you also push the experiment fixture and code to the git repository? It
will make life easier for development later on (and test that we didn't break

> eg.
>  - X=0, or X=1 when a name (eg, r0..r3) is available in both bank a and
> bank b as a destination.  eg.  mov r0, r1; mov r2, r3 can be packed with
> X=0 or X=1.
Is there a performance impact of any sort?

>  - raX.pack as a mulop destination can be packed with either packmode=0 or
> packmode=1 (if not constrained by other parts of the instruction).
>
> I was thinking that the OpenGL ES shader fragments are computer generated
> and not passing through an assembler, whereas the Blob itself contains QPU
> code that was probably hand written in assembler.  As such I'll probably
> try to get a bit match for a round trip (disassemble then reassemble) of
> Blob routines, and let the dynamic shaders slide.
> (Of course the assembler could be extended to have an instruction
> equivalence test in addition to a simple bit test...).

That would be nice, it will also help us make sure we understand everything
there (and start the field of GPU kernel obfuscation ;) )
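
To make the bit test versus equivalence test distinction concrete, here is a
rough sketch in Python (chosen only for brevity, it is not taken from
qpuasm.js). It treats each instruction as a 64-bit word and only handles the
X=0/X=1 case quoted above. The field positions (X at bit 44, 6-bit
destination fields at bits 38 and 32, r0..r3 at write addresses 32..35) and
all function names are assumptions for illustration and should be checked
against the real encoding. The packmode case is not covered.

```python
# Sketch only: every constant below is an assumption, not a verified encoding.
WS_BIT = 1 << 44              # assumed position of the "X" (write swap) bit
WADDR_ADD_SHIFT = 38          # assumed 6-bit add-ALU destination field
WADDR_MUL_SHIFT = 32          # assumed 6-bit mul-ALU destination field
ACCUMULATORS = range(32, 36)  # assumed write addresses of r0..r3


def field(word, shift, width=6):
    """Extract an unsigned bit field from a 64-bit instruction word."""
    return (word >> shift) & ((1 << width) - 1)


def canonical(word):
    """Clear X when both destinations exist in bank a and bank b."""
    add_dst = field(word, WADDR_ADD_SHIFT)
    mul_dst = field(word, WADDR_MUL_SHIFT)
    if add_dst in ACCUMULATORS and mul_dst in ACCUMULATORS:
        return word & ~WS_BIT
    return word


def bit_match(original, reassembled):
    """Strict test: the round trip reproduced the exact bits."""
    return original == reassembled


def equivalent(original, reassembled):
    """Relaxed test: the words differ only in a don't-care choice."""
    return canonical(original) == canonical(reassembled)


def round_trip_report(originals, reassembled):
    """Classify each instruction pair as 'bit', 'equiv' or 'diff'."""
    report = []
    for a, b in zip(originals, reassembled):
        if bit_match(a, b):
            report.append("bit")
        elif equivalent(a, b):
            report.append("equiv")
        else:
            report.append("diff")
    return report
```

With something like this, hand-written Blob routines could be required to
come back as "bit", while the dynamic shaders would only need "equiv".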

> I've been thinking of doing a reference vpuasm.js for the VPU, mainly
> because I like to have a simple tool to experiment with, and it might be a
> useful syntax reference for those working on more serious tools.

This will be great, especially if it could be combined with a test fixture
like qpu-02.c, allowing simple execution of kernels on the VPU and collection
of their results.  The real key to the kingdom is a CPU-VPU-QPU data
round-trip. Now, for that cooling tower for my Pi :)

