[raspi-internals] Re: QPU Tutorials/Samples

  • From: Shachar Raindel <shacharr@xxxxxxxxx>
  • To: raspi-internals <raspi-internals@xxxxxxxxxxxxx>
  • Date: Sat, 15 Feb 2014 14:40:32 +0200

Hi HH,

First of all, kudos on getting the QPU assembly to this level.

I have been working on getting the shader_256.s example to compile. I have a
few fixes in the pipe, and the assembler no longer crashes when compiling it.
I sent a pull request with what I have so far. I am still missing the rotator
and packer support - any chance you can document these at the bit level?

Now I'm trying to understand the source of the differences I see between
the assembled result and the original binary.
So far, I have spotted the following differences:
ldi with 32-bit argument:
Original - ldi.never -, 0x00000019 #/* 00000180: 00000019 e80009e7 */
Compiled - ldi.never -, 0x00000019 #/* 00000180: 00000019 e00009e7 */

Note the 8 in the second word, which appears in the original binary but not
in our decompile-recompile result. The assembler documentation marks these 8
bits as "unknown"; any guess as to the meaning of this field?
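
To pin down exactly which bit differs, the two second words can be XORed. The
sketch below is plain JavaScript for Node.js. The helper names `diffBits` and
`unknownField` are made up for illustration, and the position used for the
"unknown" field (the 8 bits directly below the 4-bit signature) is an
assumption, not something confirmed here.

```javascript
// Hypothetical helper: list the bit positions at which two 32-bit words differ.
function diffBits(a, b) {
  const x = (a ^ b) >>> 0; // >>> 0 keeps the XOR result unsigned
  const bits = [];
  for (let i = 31; i >= 0; i--) {
    if ((x >>> i) & 1) bits.push(i);
  }
  return bits;
}

const original = 0xe80009e7; // second word from the original binary
const compiled = 0xe00009e7; // second word from the reassembled output

console.log(diffBits(original, compiled)); // [ 27 ]

// ASSUMPTION: the "unknown" field is the 8 bits right below the 4-bit
// signature, i.e. bits 27..20 of this word.
const unknownField = (w) => (w >>> 20) & 0xff;
console.log(unknownField(original).toString(16)); // 80
console.log(unknownField(compiled).toString(16)); // 0
```

Under that assumption, the original binary sets only the top bit of the field
and the reassembled output leaves the whole field zero.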

Branch with return address:
The original file contains the line:
brr rb4, after_write_qpu_1_7 #// 0x00000268 #/* 00000210: 00000038 f0f81127 */
However, the description of branches in
https://github.com/hermanhermitage/videocoreiv-qpu is


  addr:32, 1111 0000 cond:4 relative:1 register:1 ra:5 X:1 wa:6 wb:6

Decoding f0f81127 against that layout gives wa=4, wb=39 ("-"), X=1. Should
the description of brr be "writes to the register number specified by the wa
bits; the X bit selects which bank to write to"?
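
Splitting the word by hand is error-prone, so here is a small sketch that
decodes it according to the field layout quoted above. It is plain JavaScript
for Node.js; the function name `decodeBranch` is hypothetical, and the bit
positions simply follow that layout read from the most significant bit down.

```javascript
// Hypothetical decoder for the second (high) word of a branch instruction,
// following the layout: 1111 0000 cond:4 relative:1 register:1 ra:5 X:1 wa:6 wb:6
function decodeBranch(word) {
  return {
    sig:      (word >>> 24) & 0xff, // expected to be 0xf0 for a branch
    cond:     (word >>> 20) & 0xf,
    relative: (word >>> 19) & 0x1,
    register: (word >>> 18) & 0x1,
    ra:       (word >>> 13) & 0x1f,
    X:        (word >>> 12) & 0x1,
    wa:       (word >>> 6)  & 0x3f,
    wb:       word & 0x3f,
  };
}

// Second word of "brr rb4, after_write_qpu_1_7" from the original binary.
console.log(decodeBranch(0xf0f81127));
// sig=0xf0, cond=15, relative=1, register=0, ra=0, X=1, wa=4, wb=39
```

The decoded wa=4 with X=1 is consistent with the return address going to rb4,
which is what prompts the question about the X bit.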

Thanks,
--Shachar



On Sat, Feb 15, 2014 at 4:51 AM, Herman Hermitage <
hermanhermitage@xxxxxxxxxxx> wrote:

> I've pushed a basic assembler (written in js) to github.  See:
>
>
> https://github.com/hermanhermitage/videocoreiv-qpu/blob/master/qpu-tutorial/qpuasm.md
>
> https://github.com/hermanhermitage/videocoreiv-qpu/blob/master/qpu-tutorial/qpuasm.js
>
> I would caution it's probably going to be a couple of days before it's
> robust enough even for the most basic usage!
>
> /HH
>
> Simple Assembler:
>
> A rudimentary assembler (very alpha at this stage, will improve with use).
> It needs Node.js to run (will target web page later).
>
> NOTE: Rotator and Pack/Unpack are not yet supported.
>
>
> Usage:
> =====
>   node[js] qpuasm.js [--showbits] [--dumpglobals] [--dumpsymbols] [--verbose] [--in]filename
>
>
> Source Syntax:
> ===========
>   {[label:] [instruction/directive] [# comment] LF}
>
>
> Instructions:
> =========
>
> [addop] [; mulop] [; op]
>
> Where addop, mulop and op are:
>   op [dst [, src1 [, src2]]]
>
> addop:
>   nop, fadd, fsub, fmin, fmax, fminabs, fmaxabs, ftoi, itof, add, sub,
>   shr, asr, ror, shl, min, max, and, or, xor, not, clz, v8adds, v8subs, mov
>
> mulop:
>   nop, fmul, mul24, v8muld, v8min, v8max, v8adds, v8subs, mov
>
> op:
>   bkpt, nop, thrsw, thrend, sbwait, sbdone, lthrsw, loadcv, loadc, ldcend,
>   ldtmu0, ldtmu1, loadam, ldi, bra, brr
>
> dst, src1, src2:
>   a register reference: r0...r5 or ra0...ra63, or rb0...rb63, or special
>   reg (vpm, unif, ...).
>   a small constant
>
>
> Directives:
> ========
>
>   .set    symbol, jsexpr
>   .global symbol
>
>
> Example:
> =======
>
> nodejs qpuasm.js qpu-02.s
>
> 'qpu-02.s':
> ------------
>
> .set vw_layout, function vw_layout(row_step, element_stride, offset) { return (offset | 0xa00 | row_step << 12 | element_stride << 20); } vw_layout
> .set vw_setup0, function vw_setup0(x, y) { return (2<<30|y<<23|x<<16); } vw_setup0
> .set vw_setup1, function vw_setup1(x, y) { return (3<<30|x<<16|y); } vw_setup1
>
> .global entry
> .global exit
>
> entry:
>         # Determine if this QPU will signal on completion (flag is from uniforms)
>         mov rb3, unif
>
>         # Configure access to vpm
>         ldi vw_setup, vw_layout(1, 1)
>
>         # Write 5x16 words into vpm
>         mov vpm, 1
>         mov vpm, 2
>         mov vpm, 4
>         mov vpm, 8
>         mov vpm, elem_num
>
>         # Configure vpm write to memory
>         ldi vw_setup, vw_setup0(5, 16)
>         ldi vw_setup, vw_setup1(0, 0)
>
>         # Trigger transfer to destination in memory (address is from uniforms)
>         nop; mov vw_addr, unif
>
>         # Wait for vpm transfer to finish
>         mov.never -, vw_wait
>
> exit:
>         # Signal done
>         mov irq, rb3
>         nop; nop; thrend
>         nop
>         nop
>
>
> produces:
> =======
>
> /* Exported Symbols */
> #define qpu_symbol_entry 0x00000000
> #define qpu_symbol_exit 0x00000058
>
> /* Assembled Program */
> /* entry: */
> /* 0x00000000: */ 0x15827d80, 0x100210e7, /* mov rb3, unif */
> /* 0x00000008: */ 0x00101a00, 0xe0021c67, /* ldi vw_setup, vw_layout(1, 1) */
> /* 0x00000010: */ 0x159c1fc0, 0xd0020c27, /* mov vpm, 1 */
> /* 0x00000018: */ 0x159c2fc0, 0xd0020c27, /* mov vpm, 2 */
> /* 0x00000020: */ 0x159c4fc0, 0xd0020c27, /* mov vpm, 4 */
> /* 0x00000028: */ 0x159c8fc0, 0xd0020c27, /* mov vpm, 8 */
> /* 0x00000030: */ 0x159a7d80, 0x10020c27, /* mov vpm, elem_num */
> /* 0x00000038: */ 0x88050000, 0xe0021c67, /* ldi vw_setup, vw_setup0(5, 16) */
> /* 0x00000040: */ 0xc0000000, 0xe0021c67, /* ldi vw_setup, vw_setup1(0, 0) */
> /* 0x00000048: */ 0x80827036, 0x100049f2, /* nop; mov vw_addr, unif */
> /* 0x00000050: */ 0x159f2fc0, 0x100009e7, /* mov.never -, vw_wait */
> /* exit: */
> /* 0x00000058: */ 0x159c3fc0, 0x100209a7, /* mov irq, rb3 */
> /* 0x00000060: */ 0x009e7000, 0x300009e7, /* nop; nop; thrend */
> /* 0x00000068: */ 0x009e7000, 0x100009e7, /* nop */
> /* 0x00000070: */ 0x009e7000, 0x100009e7, /* nop */
>
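
The immediates in the quoted listing can be cross-checked by evaluating the
`.set` helpers directly in Node.js. The sketch below copies the three
functions from qpu-02.s as they appear above; the `hex` formatter is a made-up
helper, and `>>> 0` is only there to print the 32-bit results as unsigned.

```javascript
// Helper functions copied from the .set directives in qpu-02.s.
function vw_layout(row_step, element_stride, offset) {
  // offset is omitted in the listing; undefined behaves as 0 under |.
  return (offset | 0xa00 | row_step << 12 | element_stride << 20);
}
function vw_setup0(x, y) { return (2 << 30 | y << 23 | x << 16); }
function vw_setup1(x, y) { return (3 << 30 | x << 16 | y); }

// Hypothetical formatter: unsigned 32-bit value as 8 hex digits.
function hex(v) {
  return '0x' + (v >>> 0).toString(16).padStart(8, '0');
}

console.log(hex(vw_layout(1, 1)));  // 0x00101a00
console.log(hex(vw_setup0(5, 16))); // 0x88050000
console.log(hex(vw_setup1(0, 0)));  // 0xc0000000
```

These values match the first word of each corresponding ldi line in the
assembled output quoted above.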
