[raspi-internals] Re: caches?

  • From: Herman Hermitage <hermanhermitage@xxxxxxxxxxx>
  • To: "raspi-internals@xxxxxxxxxxxxx" <raspi-internals@xxxxxxxxxxxxx>
  • Date: Thu, 23 May 2013 16:44:49 +1200

> I am experiencing the following problem:
> 
> Many times when I load a new GPU image (with a loader derived from 
> mailbox.c) after having run another one before, it will hang or somehow 
> not work correctly. After rebooting, it will work as expected. Loading 
> another program will then probably cause problems again unless I reboot 
> in between.
> 
> Loading and running the same program several times works.
> 
> After quite some amount of debugging, I have not been able to pinpoint 
> the problem.
> 
> Currently, I am suspecting cache effects. I allocate one buffer which is 
> used for code and data. In subsequent runs, I usually get the same 
> buffer with the same GPU addresses. The complete buffer is written every 
> time a program is started.

Eizo-san has reported problems in the past.  I can't remember exactly how he 
resolved them, but I recall he found that if he skipped the first 512 bytes of 
the allocated memory and placed his instructions at base+512, he was OK.  I 
haven't had any issues myself, but I'm quitting and re-running an ARM Linux 
process each time I try a different test.

We can ask dom here:
https://github.com/raspberrypi/firmware/issues?state=open
if problems persist.

https://github.com/raspberrypi/firmware/wiki/Mailbox-property-interface 
(Execute Code) has the only public documentation I know of.

If you can reduce the problem to the smallest repeatable sequence, I can have a 
shot at debugging it.

> 
> Is it possible that parts of the buffer remain cached in the GPU?
> 
> What is known about GPU instruction and data caching?

> The mailbox interface documentation suggests that the instruction cache 
> is flushed before every code execution. It does not mention a data 
> cache, however.
> 
> Is it possible to manually flush caches?

An i-cache flush is the default when calling into VideoCore with the current 
interface.  To tell the dispatcher to skip the i-cache flush, set the LSB of 
the execution address to 1.
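
As a rough sketch of the LSB convention described above, the helpers below 
build the execution address to pass to the Execute Code call.  The names 
VC_SKIP_ICACHE_FLUSH, vc_entry_skip_flush and vc_entry_with_flush are made up 
for illustration and are not part of mailbox.c or the firmware interface.  The 
code also assumes bit 0 of the code address is otherwise unused.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name: bit 0 of the execution address, which asks the
 * dispatcher to skip its default i-cache flush. */
#define VC_SKIP_ICACHE_FLUSH 0x1u

/* Execution address that requests "do not flush the i-cache". */
static inline uint32_t vc_entry_skip_flush(uint32_t code_addr)
{
    /* Assumption: bit 0 must be free to carry the flag. */
    assert((code_addr & VC_SKIP_ICACHE_FLUSH) == 0);
    return code_addr | VC_SKIP_ICACHE_FLUSH;
}

/* Execution address that keeps the default (flush before executing). */
static inline uint32_t vc_entry_with_flush(uint32_t code_addr)
{
    return code_addr & ~VC_SKIP_ICACHE_FLUSH;
}
```

When reloading different code into the same buffer, the vc_entry_with_flush 
form is presumably the one you want, since it leaves the default flush in 
place.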

I think the data addresses allocated by  are L2 cacheable but not L1.  So they 
should be in sync between ARM and VC4.

You can generate L1 addresses by changing the higher order bits:

BCM2835 ARM Peripherals manual (page 5):
  0x00000000 - '0' alias, L1 and L2 cached
  0x40000000 - '4' alias, L2 cache coherent (non-allocating)
  0x80000000 - '8' alias, L2 cached only
  0xC0000000 - 'C' alias, direct uncached

(These are addresses as seen by the VC4 side, i.e. VC CPU bus addresses.  ARM 
addresses are munged into VC CPU bus addresses by:
  busAddress = VC-ARM-MMU(ARM-MMU(armVirtualAddress)) ).
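
To make the alias table concrete, here is a minimal sketch of switching a bus 
address between aliases by rewriting its top two bits.  The macro and function 
names are my own, not from the manual or from mailbox.c, and the code only 
does the bit arithmetic.  It does not flush or invalidate anything.

```c
#include <assert.h>
#include <stdint.h>

/* The top two bits of a VC CPU bus address select the cache alias. */
#define VC_ALIAS_MASK         0xC0000000u

/* Hypothetical names for the four aliases listed in the manual. */
#define VC_ALIAS_L1_L2        0x00000000u  /* '0': L1 and L2 cached       */
#define VC_ALIAS_L2_COHERENT  0x40000000u  /* '4': L2 coherent, non-alloc */
#define VC_ALIAS_L2_ONLY      0x80000000u  /* '8': L2 cached only         */
#define VC_ALIAS_UNCACHED     0xC0000000u  /* 'C': direct uncached        */

/* Return the alias bits of a bus address. */
static inline uint32_t vc_alias_of(uint32_t bus_addr)
{
    return bus_addr & VC_ALIAS_MASK;
}

/* Return the same location viewed through a different alias. */
static inline uint32_t vc_with_alias(uint32_t bus_addr, uint32_t alias)
{
    assert((alias & ~VC_ALIAS_MASK) == 0);  /* only the top two bits may be set */
    return (bus_addr & ~VC_ALIAS_MASK) | alias;
}
```

For example, vc_with_alias(addr, VC_ALIAS_UNCACHED) gives the 'C' view of a 
buffer, which may be a useful experiment when you suspect stale cached data.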

Let me send you a link to some code re cache flushing.

Herman.


                                          
