[retroforth] Source compression, Bytecode and faster startup for huger apps.

  • From: "Helmar Wodtke" <helmwo@xxxxxx>
  • To: retroforth@xxxxxxxxxxxxx
  • Date: Sat, 16 Oct 2004 20:09:46 +0200

-- Attached file included as plaintext by Ecartis --

Hi,

I currently think a lot about a kind of Bytecode.
I've seen
http://www.forthfreak.net/wiki/index.cgi?ForthScripts

Charles, you made a comment there and now I'm interested to know, if you are 
have some plans to support things like Bytecode with RetroForth (not in core 
but in extension).

I've an idea on how to implement a very simple kind of bytecode that is 
absolutely portable between FORTH systems since it does not define any 
semantics to a special encoding. Here is my idea:

1) FORTH programs consists of words. A bytecode for FORTH have to encode words. 
Nothing more.
2) There are such things as strings in FORTH. So a bytecode have also to encode 
arbitrary data.

Here my encoding:

-----------------------------------------------------
Byte value

0  End byte code.

1..14  followed by 1 to 14 bytes that build a word that is terminated by space.
15  15 bytes that are not followed by a space.

If a bytecode 1..15 occurs while decoding, the decoder takes the contents and 
remembers the contents. The decoder assigns a number from 32..254 to the 
contents, beginning at 32 and assigning numbers up to 254. After 254 is 
assigned, the decoder starts at 32 again and overwrites the remembered contents.

16  Insert a space

17..30  followed by 1 to 14 bytes that build a word that is terminated by space.
31  15 bytes that are not followed by a space.

The bytecodes 17..31 differ to 1..15 in that they do not remember the contents.

32..254 Insert a content that was stored by bytecodes 1..15.

255  followed by a single byte that represents a number in the range 0..255 
(eg. a literal).
-----------------------------------------------------

At the first thinking this looks only like simple (and not very efficent) 
compression of the source. But this code could be abused to minimize dictionary 
searches while decoding the bytecode and compiling it. The "interpret" loop of 
a FORTH system does essentially something like

32 parse find ...

again and again. The assigned bytecodes could be used to avoid seaching a word 
in dictionary. For this, with each bytecode a position in dictionary could be 
cached.

One problem with RetroForth I see at the moment is the two-dictionary design, 
that makes optimization of the dictionary searches a little more complicated 
than they would have to be - so for each bytecode possibly two entries in the 
different dictionaries could match. A way out of this (but reducing compilation 
speed) would be to only honour the forth dictionary.

So far I've written an encoder in Perl and a decoder in x86 assembly. Now I'm 
on the big way to make it running inside an interpret-loop to do the 
dictionary-search optimizations.

Bis dann,
Helmar


-- Binary/unsupported file stripped by Ecartis --
-- Type: text/x-vCard
-- File: Helmar Wodtke.vcf



-- Binary/unsupported file stripped by Ecartis --
-- Type: application/x-pkcs7-signature
-- File: smime.p7s
-- Desc: S/MIME Cryptographic Signature



Other related posts: