[raspi-internals] Re: Update: Vector Memory Operations

  • From: Scott Mansell <phiren@xxxxxxxxx>
  • To: raspi-internals@xxxxxxxxxxxxx
  • Date: Sun, 14 Jul 2013 07:34:26 +1200

> 00001 lookupmh Gather values. D[i] = *(rb+ACCH[i]*width)

Does this mean sru006 is simply clearing the lower part of ACC?

____________
Scott Mansell


On Sun, Jul 14, 2013 at 3:20 AM, Herman Hermitage <hermanhermitage@xxxxxxxxxxx> wrote:

> These are the memory operations I now have:
>
> (define-table M ["ld", "lookupmh", "lookupml", "mem03", "st",
> "indexwritemh", "indexwriteml", "mem07", "readlut", "writelut", "mem10",
> "mem11", "mem12", "mem13", "mem14", "mem15", "mem16", "mem17", "mem18",
> "mem19", "mem20", "mem21", "mem22", "mem23", "readacc", "mem25", "mem26",
> "mem27", "mem28",  "mem29", "mem30", "mem31"])
>
> Load/Store:
> ==
>
> 00000 ld            Load vector from memory.  D[i] = *(rb+i*width)
> 00001 lookupmh      Gather values.            D[i] = *(rb+ACCH[i]*width)
> 00010 lookupml      Gather values.            D[i] = *(rb+ACC[i]*width)
>
> 00100 st            Store vector to memory.   *(rb+i*width) = A[i]
> 00101 indexwritemh  Scatter values.           *(rb+ACCH[i]*width) = A[i]
> 00110 indexwriteml  Scatter values.           *(rb+ACC[i]*width) = A[i]
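
A minimal Python sketch of the load/store semantics listed above, modelling memory as a flat bytearray. The lane count of 16, the little-endian byte order, and every function name here are assumptions for illustration; the table only gives the address formulas. The index vector passed to lookup/indexwrite stands in for ACCH[i] or ACC[i].

```python
LANES = 16  # assumed vector length; the post does not state it


def _read(mem, addr, width):
    """Read one element of `width` bytes (little-endian is an assumption)."""
    return int.from_bytes(mem[addr:addr + width], "little")


def _write(mem, addr, width, value):
    """Write one element, truncated to `width` bytes (little-endian assumed)."""
    mask = (1 << (8 * width)) - 1
    mem[addr:addr + width] = (value & mask).to_bytes(width, "little")


def ld(mem, rb, width):
    """ld: D[i] = *(rb + i*width)"""
    return [_read(mem, rb + i * width, width) for i in range(LANES)]


def lookup(mem, rb, index, width):
    """lookupmh/lookupml (gather): D[i] = *(rb + index[i]*width)"""
    return [_read(mem, rb + index[i] * width, width) for i in range(LANES)]


def st(mem, rb, a, width):
    """st: *(rb + i*width) = A[i]"""
    for i in range(LANES):
        _write(mem, rb + i * width, width, a[i])


def indexwrite(mem, rb, index, a, width):
    """indexwritemh/indexwriteml (scatter): *(rb + index[i]*width) = A[i]"""
    for i in range(LANES):
        _write(mem, rb + index[i] * width, width, a[i])


mem = bytearray(range(64))
reverse = list(range(LANES - 1, -1, -1))
print(ld(mem, 0, 1))               # each lane reads one consecutive byte
print(lookup(mem, 0, reverse, 2))  # lanes gather 16-bit elements in reverse order
```

Here `width` is in bytes (1, 2 or 4), matching the 8/16/32-bit transfer sizes.
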
>
> LUT:
> ==
>
> There is a 1024-byte lookup table for fast lookups.
>
> 01000 readlut   Look up values in LUT.  D[i] = lut(B[i]*width)
>                 e.g. readlut V(0,0), -, V(16,0)
> 01001 writelut  Write values to LUT.    lut(B[i]*width) = A[i]
>                 e.g. writelut 0, V(0,0), (r0)
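
The two LUT operations can be modelled the same way as a gather/scatter into a private 1024-byte table. This is only a sketch: the function names, little-endian layout, and the wrap-around of out-of-range offsets are assumptions, since the post does not say what happens past the end of the table.

```python
LUT_SIZE = 1024  # bytes, as stated in the post


def readlut(lut, b, width):
    """readlut: D[i] = lut(B[i]*width)"""
    out = []
    for idx in b:
        off = (idx * width) % LUT_SIZE  # wrap-around is an assumption
        out.append(int.from_bytes(lut[off:off + width], "little"))
    return out


def writelut(lut, b, a, width):
    """writelut: lut(B[i]*width) = A[i]"""
    mask = (1 << (8 * width)) - 1
    for idx, value in zip(b, a):
        off = (idx * width) % LUT_SIZE  # wrap-around is an assumption
        lut[off:off + width] = (value & mask).to_bytes(width, "little")


lut = bytearray(LUT_SIZE)
writelut(lut, [0, 1, 255], [0x11223344, 7, 9], 4)
print(readlut(lut, [255, 1, 0], 4))
```
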
>
> Accumulator Access:
> ==
>
> 11000 readacc      Read from accumulator.    D[i] = ACC[i]>>> B[i]
>
> Width:
> ==
> ld, lookup (gather), st, indexwrite (scatter), readlut and writelut all
> support 8, 16 and 32 bit operations (00, 01, 10).  Width (11) seems to be a
> no-op.
>
> For readacc, the width field behaves differently.
>  00 -> reads acc without saturation
>  01 -> reads acc saturating to 32 bits
>  10 -> no-op
>  11 -> reads acc saturating to 16 bits
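
A hedged Python sketch of how the readacc width field might be modelled. Several things here are assumptions and not confirmed by the post: that `>>>` behaves as an arithmetic right shift on a signed accumulator, that saturation is signed, that it is applied after the shift, and that a no-op leaves the destination unchanged. The function names are illustrative only.

```python
def saturate(value, bits):
    """Clamp a signed integer to the range of a `bits`-wide signed value."""
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))


def readacc(acc, b, width_code, dest):
    """readacc: D[i] = ACC[i] >>> B[i], post-processed per the width field."""
    if width_code == 0b10:
        return list(dest)  # no-op: destination left as it was (assumed)
    out = []
    for a, shift in zip(acc, b):
        v = a >> shift  # arithmetic shift on a signed value (assumed)
        if width_code == 0b01:
            v = saturate(v, 32)
        elif width_code == 0b11:
            v = saturate(v, 16)
        # width_code == 0b00: no saturation
        out.append(v)
    return out


acc = [1 << 40, -5, 100000]
print(readacc(acc, [0, 0, 1], 0b11, [0, 0, 0]))
```
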
>
> Other memory operations:
> ==
> I've probed the other memory operations and they seem to return 0 for
> reads.  Also, no other obvious operations have appeared during preliminary
> examination of the start.elf blob.
>
> I think (hope? :) this comes close to concluding the semantics of the
> memory operations.
>
> Cheers
> Herman Hermitage
>
> > And for Memory operations:
> >    v<mop><memsize> H(yd,xd)[+rs], H(ya,xa)[+rs], (rb) [SETF]
> >    v<mop><memsize> V(yd,xd)[+rs], V(ya,xa)[+rs], (rb)  [SETF]
> >    v<mop><memsize> H(yd,xd)[+rs], H(ya,xa)[+rs], H(yb,xb)[+rs]
> >    v<mop><memsize> V(yd,xd)[+rs], V(ya,xa)[+rs], H(yb,xb)[+rs]
> >    v<mop><memsize> H(yd,xd)[+rs], H(ya,xa)[+rs], #immediate  [SETF]
> >        [IFZ|IFNZ|IFN|IFNN|IFC|IFNC]
> >    v<mop><memsize> V(yd,xd)[+rs], V(ya,xa)[+rs], #immediate  [SETF]
> >        [IFZ|IFNZ|IFN|IFNN|IFC|IFNC]
> >
> > Where <memsize> is B (8), H (16), or L (32), and determines the size of
> > transfers to memory.
> >
> > And <mop> is defined as:
> >    00000 ld           Load vector from memory.  D[i] = *(rb+i*width)
> >    00010 lookup       Gather values.            D[i] = *(rb+ACC[i]*width)
> >    00100 st           Store vector to memory.   *(rb+i*width) = A[i]
> >    11000 readacc      Read from accumulator.    D[i] = ACC[i]>>> B[i]
> >
> > (more to follow in coming days).
> >
> > I have written H(), or V(), but all sizes are valid - HX(), HY(), VX(),
> > VY().  Or alternatively HB(), HH(), HL() (or H8(), H16(), H32()) -
> > depending on which way we go with the names.
> >
> > Cheers
> > Herman.
> >
> > --
> > Encoding Definitions:
> >
> > # Scalar register position/value offset
> > (define-table r ["+r0", "+r1", "+r2", "+r3", "+r4", "+r5", "+r6", "+r7"])
> >
> > # Lane selection: check each lane's flags and use ALL, Never, Zero,
> > # Non Zero, Negative, Non Negative, Carry, Non Carry.
> > (define-table P ["", " NV", " IFZ", " IFNZ", " IFN", " IFNN", " IFC",
> > " IFNC"])
> >
> > # Update Zero, Negative and Carry flags at end of operation (in each
> > # active lane).
> > (define-table F ["", " SETF"])
> >
> > # Operation Size (for sign extension)
> > # alt: (define-table X ["16", "32"])
> > (define-table X ["H", "L"])
> >
> > # Operation Width (memory transfer)
> > # alt: (define-table W ["8", "16", "32", "<width-unknown>"])
> > (define-table W ["B", "H", "L", "<width-unknown>"])
> >
> > # Memory Operations
> > (define-table M ["ld", "mem01", "lookup", "mem03", "st", "mem05",
> > "mem06", "mem07", "mem08", "mem09", "mem10", "mem11", "mem12", "mem13",
> > "mem14", "mem15", "mem16", "mem17", "mem18", "mem19", "mem20", "mem21",
> > "mem22", "mem23", "readacc", "mem25", "mem26", "mem27", "mem28",
> > "mem29", "mem30", "mem31"])
> >
> >
> > # These below are out of date, but give the general idea of bit positions
> > #
> > # *WARNING* the vertical/horizontal issues and scalar register addition
> > # are not handled correctly.
> > #
> >
> > Memory Access:
> > 1111 00MM MMMW Wrrr DDDD dddd ddAA Axaa aaaa z011 1Fbb bbbb ";
> > v%s{W}%s{M} %s{D}%d{d}%s{r}, %s{A}%d{a}%s{r}, (r%d{b}) %s{F}"
> > 1111 00MM MMMW Wrrr DDDD dddd ddAA Axaa aaaa z0BB Bybb bbbb ";
> > v%s{W}%s{M} %s{D}%d{d}%s{r}, %s{A}%d{a}%s{r}, %s{B}%d{b}%s{r}"
> > 1111 00MM MMMW Wrrr DDDD dddd ddAA Axaa aaaa z1PP PFii iiii ";
> > v%s{W}%s{M} %s{D}%d{d}%s{r}, %s{A}%d{a}%s{r}, #%d{i} %s{P}%s{F}"
> >
> > Operations:
> > 1111 01Xv vvvv vrrr DDDD dddd ddAA Axaa aaaa r011 1Fbb bbbb ";
> > v%s{X}%s{v} %s{D}%d{d}%s{r}, %s{A}%d{a}%s{r}, r%d{b} %s{F}"
> > 1111 01Xv vvvv vrrr DDDD dddd ddAA Axaa aaaa r0BB Bybb bbbb ";
> > v%s{X}%s{v} %s{D}%d{d}%s{r}, %s{A}%d{a}%s{r}, %s{B}%d{b}%s{r}"
> > 1111 01Xv vvvv vrrr DDDD dddd ddAA Axaa aaaa r1PP PFii iiii ";
> > v%s{X}%s{v} %s{D}%d{d}%s{r}, %s{A}%d{a}%s{r}, #%d{i} %s{P}%s{F}"
> >
> >
>
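
To make the "Memory Access" bit patterns quoted above concrete, here is a rough Python sketch that pulls the M (operation) and W (width) fields out of an instruction and maps them through the updated M table. It treats the instruction as a single 48-bit integer with the leftmost pattern bit as the most significant bit; that layout, and the function name, are assumptions. The quoted patterns are themselves flagged as out of date, so this only illustrates the field positions.

```python
# Updated memory-operation table from the post (index = 5-bit M field).
M = (["ld", "lookupmh", "lookupml", "mem03", "st", "indexwritemh",
      "indexwriteml", "mem07", "readlut", "writelut"]
     + ["mem%02d" % n for n in range(10, 24)]
     + ["readacc"]
     + ["mem%02d" % n for n in range(25, 32)])

# Transfer-width table from the post (index = 2-bit W field).
W = ["B", "H", "L", "<width-unknown>"]


def decode_mem_op(word):
    """Return (operation, width) for a '1111 00MM MMMW W...' instruction.

    `word` is assumed to be the 48-bit instruction as one integer.
    Returns None if the top six bits are not 111100.
    """
    if (word >> 42) & 0x3F != 0b111100:
        return None
    mop = (word >> 37) & 0x1F    # MM MMM: bits 41..37
    width = (word >> 35) & 0x3   # W W:    bits 36..35
    return M[mop], W[width]


example = (0b111100 << 42) | (0b00001 << 37) | (0b10 << 35)
print(decode_mem_op(example))
```
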
