[gameprogrammer] Re: Scripting engines: Upvalues and functionrefs

From: Bob Pendleton <bob@xxxxxxxxxxxxx>
To: Gameprogrammer Mailing List <gameprogrammer@xxxxxxxxxxxxx>
Date: Tue, 06 Apr 2004 11:24:55 -0500
On Tue, 2004-04-06 at 07:18, David Olofson wrote:
> On Monday 05 April 2004 20.20, Bob Pendleton wrote:
> > On Mon, 2004-04-05 at 06:42, David Olofson wrote:
> [...]
> > > What I mean is just that if I go for the C/C++ variant, I'd like
> > > to have the compiler and/or language prevent such situations,
> > > preferably without removing other language features.
> >
> > Good, I follow you. Notice that in C all functions are essentially
> > "static" and in C++ you can only get pointers to static methods and
> > they can only access static data.
> 
> Yes... And some OO languages have a special type for "function of 
> object" (Object Pascal, for example), which seems really rather 
> similar to a reference to a local function from the implementation 
> POV. Both can have implicit or explicit "context management", so I 
> guess which way you do it depends a bit on whether the language uses 
> GC (or similar) or explicit memory management.

Yep. Having a GC makes life so much easier for both the language
developer and the programmer! But, using a GC based language for a task
with RT constraints is difficult. 

> 
> 
> [...]
> > Yeah, I read several at one time. I can't think of them right now.
> > Look for anything written about Scheme by Guy Steele about the
> > original thinking behind scheme will help. One of the motivations
> > behind scheme was to support exactly the feature you are looking at
> > in a lexically scoped language.
> >
> > This looks like a good place to start looking:
> > http://library.readscheme.org/page1.html
> >
> > LISP and all of its relatives and variants have this feature.
> 
> Ok, I'll have a look at that. (Probably later; trying to get things 
> going ASAP here...)
> 
> 
> > > > > Anyway, that's a matter of taste, I guess... The main reason
> > > > > why I'm doing it this way is that I want the VM to be RT
> > > > > safe, without opening up the can of words that is RT safe GC.
> > > > > (Maybe later... Right now, I just need this stuff to do the
> > > > > job ASAP.)
> > > >
> > > > There are actually several questions here. One is, how do I do
> > > > RT GC and the other is how do I save the semantics of the
> > > > language.
> > > >
> > > > The answer is not as elegant as what I would like, but it
> > > > works. When you take a reference to a function you save enough
> > > > of the context as is needed to run the function.
> > >
> > > Right away, as part of the "get funcref" operation?
> >
> > Yep, right then. That is the only time you know exactly what it is
> > supposed to be.
> 
> Yes...
> 
> 
> > > > That is generally a snap
> > > > shot of some portion of the stack. When you call the function
> > > > you push the context on the stack call the function, and when
> > > > it returns save the context. Then you provide the equivalent of
> > > > free() for contexts. That makes clean up the programmers
> > > > problem.
> > >
> > > Yes... However, that makes it very important to grab those
> > > function references (and context snapshots) at the exact right
> > > moment, so you don't accidentally get a snapshot when the context
> > > is in an undesired state. Feels a bit weird...
> >
> > I thought eating alligator was a bit weird, but it tastes pretty
> > good. :-)
> 
> I'll have try that some time. (Not that I think it's more weird than 
> eating any other animal; we just don't have many alligators around 
> here. ;-)

We have farms that raise them for both meat and leather. In the wild
they are (were?) a protected species. But their numbers have rebounded
due to federal protection. When it became illegal to hunt wild
alligators people (mostly) stopped hunting them and started raising them
on farms. Turns out that an alligator will eat lots of stuff that is
normally considered to be waste and turn it into valuable leather and
meat. A lot of egg farms also raise alligators.

So, alligator is becoming fairly popular in the southern US. It doesn't
taste like chicken... Well sort of like greasy chicken :-) Any
restaurant that sells Cajun style food will have alligator on the menu.
I'm only about 300 miles from the Texas/Louisiana border so of course
Cajun food is popular here.

If I remember correctly you are somewhere in the far northern part of
Europe? I eat a lot more pickled herring and bratwurst (not together)
than I do alligator... 

> 
> 
> > It is all in what you are used to. I "grew up" with LISP so it
> > seems perfectly normal to me.
> 
> I grew up with 68k asm (though I started with BASIC, like most home 
> computer people at the time) - so *both* ways seem sensible to 
> me... :-)

The 68K came out a while after I graduated from college.

> 
> It's just a matter of whether you think of upvalues as accessing a 
> parent's "locals", or as implicit arguments. Both ways have their 
> uses, but I don't know which one is more useful or which one is less 
> dangerous - and I don't think I want both in the same language. (At 
> least not with remotely similar syntax...)
> 
> How about thinking of functions as classes? The function itself is 
> both the class definition and the constructor. (That is, the only way 
> to create an instance is to call the function.) Local variables and 
> functions are members of the class. A reference to a local function 
> is effectively a reference to "function of object". Refcounting, GC 
> or similar ensures that instances are kept around until all 
> references are gone. A function that returns it's public interface 
> (or something) is actually a traditional OO style constructor + class 
> definition.
> 
> ---8<--------------------------------------------------
> public function (instance)my_class(argument)
> {
>       // Private stuff
>       some_variable;
> 
>       // Interface
>       public function set(value)
>       {
>               some_variable = value;
>       }
>       public function (value)get
>       {
>               value = some_variable;
>       }
> 
>       // Constructor body
>       set argument;
>       print "Created an instance of my_class.\n";
>       instance = context;
> }
> 
> o = my_class 5;
> print "o.get returns ", o.get, "\n";
> o.set 42;
> print "o.get returns ", o.get, "\n";
> o = nil;
> -------------------------------------------------->8---
> 

I've seen something very much like this in early versions of FORTRAN. It
supported functions with multiple entry points. Worked just like what
you describe. I think you are working through the thought process that
finally gave use objects. A lot of languages have static local variables
that allow functions to have hidden state and act like "mini" objects.

> 
> > > OTOH, if you think of the upvalues as a form of implicit
> > > arguments (many languages implement them as invisible arguments
> > > and similar "hacks"), this actually makes a lot more sense than
> > > managing contexts as real (and shared) objects that hang around
> > > for as long as you need them.
> >
> > Six of one half a dozen of the other. Semantically the same.
> 
> Except that one treats the calling context like an object passed by 
> reference while the other effectively passes the used upvalues by 
> value. One allows multiple reference instancnes to share and modify 
> upvalues, whereas the other makes upvalues local and/or read-only, 
> like normal function arguments.

I see your point. 

What it comes down to is that each piece of code has a context in which
it executes. That context consists of a set of named values. Each name
can be bound in a different way. So, some can be bound to values
dynamically allocated on a stack (arguments, return values, and local
variables) or bound to a persistent value (static global variables,
static locals variables, and variables in objects). Persistent values
can by dynamically allocated or statically allocated.

Decide what you want and design a syntax to match.

> 
> 
> > The
> > thing is that you want to treat the context the same way no matter
> > how the function is called. Having two ways of doing it will
> > introduce bugs.
> 
> Well... What's "the same way", actually? Is it "getting the reference" 
> or "calling via the reference" that is supposed to be the same as a 
> direct call? :-)

Calling through a reference and a direct call are the same thing. The
only difference is that you have to do one level of indirection to get
the address of the code when jumping through a reference. 

> 
> 
> > > The "grabbing a refence to f() always gives you the same thing"
> > > logic doesn't apply anyway, so why not break the rules properly
> > > while at it? ;-)
> >
> > Why doesn't it always act the same way?
> 
> Well, it always does the same thing, but the behavior of the function 
> will depend on the current state when the reference is taken. That 
> is, two references to the exact same function may do very different 
> things when called with the same (explicit) arguments. Of course, 
> that's the idea, but it could also be a bit confusing, as it looks 
> like you're just calling a function.

You have to capture the reference into a named variable which will be
different for each reference. Not much chance of confusion.

> 
> 
> [...]
> > > Yes, especially compared to some other solutions for this
> > > problem... I'm going to need one eventually anyway, so maybe I
> > > should just leave this stuff alone until then. (That gives me the
> > > "illegal context - let's crash!" behavior of C/C++, which is ok
> > > for now.)
> >
> > As long as you can be sure that it causes a crash. You don't want
> > it to work by accident.
> 
> Right. (Though not even that is a showstopper at this point.) What's 
> the simplest way of ensuring that a function call with an illegal 
> context will always fail?

Design the language so that the default value of a reference is
something you can check to see if it is invalid and so that once
assigned a valid value the value can not become invalid.

> 
> 
> > > I'm thinking about the basic N pools of power-of-two sized
> > > blocks, perhaps with block splitting and merging. That has the
> > > disadvantage of restricting the maximum block size to a fraction
> > > of the size of the memory pool, but that won't be much of an
> > > issue if arrays are fragmented at the VM level. (No need for
> > > copying data around when resizing dynamic arrays, which is kinda'
> > > nice in an RT system...)
> >
> > I dislike wasting, on average, a quarter of allocated memory. I
> > prefer using a vector of free lists indexed by size with block
> > merging at the time when free() is called. Very fast allocations
> > and frees with good block merging behavior. And, if the free lists
> > are created so that the most recently freed block on the list is
> > also the first to be allocated you get good virtual memory and
> > cache behavior as well.
> 
> That sounds a lot like what I'm thinking about, except I guess you'd 
> recommend other than power-of-two block sizes. Would you suggest just 
> using other block sizes, or some sort of dynamic selection of block 
> sizes?

I have usually use d a fixed length table combined with a generic "big
blocks" list. Lets say the minimum allocation unit is 16 bytes (the size
of an allocation header) then you never allocate anything smaller than
32 bytes and always allocate multiples of 16 bytes. Then free[0] points
to blocks that are 32 bytes long, free[1] points to block 48 bytes long,
and so on. A table with 16K entries should handle most allocations you
are going to run into in code. Then really big blocks go on a big block
list. You can make the free list table start out small and expand if you
like. But, you do want to put some maximum size limit on it. 

Most programs use the same size blocks over and over while multimedia
apps tend to do the same thing but also through in big blocks of random
sizes.

> 
> 
> [...]
> > Consider this example. I have 4 variables in the local context, x1,
> > y1, x2, y2 and a function drawLine(). Draw line draws a line from
> > x1, y1 to x2, y2. So I set those variables to the values I want and
> > get a reference to drawLine() and store it in dl1. Now, I set the
> > variables to new value and get another reference to drawLine() and
> > store it in dl2.
> [...]
> 
> This requires late binding by name or something like it... So far, all 
> binding is done at compile time in my engine, for speed and 
> simplicity, but that doesn't mix very well with full dynamic typing. 
> (If you look carefully at the example above, you'll notice that it 
> won't work unless the "set" and "get" functions are bound at run 
> time. "o" is dynamically typed...)
> 
> I'm considering just keeping an <index, name> table, where relevant 
> names from imported interfaces are added. The table would be used 
> when identifier look-ups fail, so that "new" (previously unknown) 
> names can be mapped to unique indices when their type is not known at 
> compile time.

That is called and alist or "associative list" systems like what you are
describing have been used for this task since at least the 1950s.

> 
> Example, based on the code above:
> 
>       * The definition of my_class adds <0, get> and
>         <1, set> to the name table.
> 
>       * When the compiler sees o.get and realizes it
>         doesn't know what type o is (well, it could
>         figure it out in this simple example, but let's
>         ignore that for now), it looks it up in the
>         name table. Thus, o.get evaluates to an element
>         of type NAMEINDEX with a value of 0. This is
>         used as the index argument to get the actual
>         funcref from o.
> 
>       * When the resulting code is executed, the VM
>         grabs o (a local variable) and then tries to
>         index it with a NAMEINDEX value of 0. Any
>         object that has something named "get" will
>         return that when name-indexed by 0. Anything
>         else evaluates to nil.
> 
> 
> The problem is that this gets ugly if/when objects way down the line 
> have fields with names that are already in the table - and spread all 
> over it... Making it a hash table seems like the obvious answer, but 
> I'm open to suggestions. (Speed is important, but bounded look-up 
> time is absolutely critical.)

Have a table/per object instead of a global table. When you see o.get
just look up get in the table belonging to "o". If o doesn't point to a
class object then it doesn't have a table and o.get is illegal. 

That keeps all the tables local to their objects, no global interactions
to worry about. Also, since the tables are small you can get good
performance with a sequential search of the tables so that you don't
have to use space for a hash table. OTOH, hash tables can be pretty
small so use a hash table per class.

> 
> 
> [...]
> > > Yes. I just have to decide what I actually want to achieve. :-)
> >
> > That is the hardest part :-)
> 
> Yeah... *hehe* That's why I'm developing this engine step by step 
> while using it for various stuff. (The first generation was the 
> source interpretting thingy that runs the AGW scripts in Audiality.)
> 
> 
> [...]
> > > Yeah... But I need function references. Or maybe I just *think* I
> > > do, as a result of hacking too much C, C++, Pascal and the like?
> >
> > I suppose you can live without it. But, I would hate to have to.
> > :-) to many things that are just too easy to implement using
> > function pointers.
> 
> Yeah, that's what I'm thinking, and I can't really think of a sensible 
> alternative.
> 
> 
> [...]
> > P.S.
> >
> > At one time I was a compiler writer. I got into graphics by working
> > on tools for developing microcode for graphic accelerators.
> 
> I think you've mentioned compilers before (or I read it somewhere), so 
> I figured there would be at least one person on this list with 
> serious experience with this stuff. :-)
> 
> 
> //David Olofson - Programmer, Composer, Open Source Advocate

                Bob Pendleton
> 
> .- Audiality -----------------------------------------------.
> |  Free/Open Source audio engine for games and multimedia.  |
> | MIDI, modular synthesis, real time effects, scripting,... |
> `-----------------------------------> http://audiality.org -'
>    --- http://olofson.net --- http://www.reologica.se ---
> 
> 
-- 
+---------------------------------------+
+ Bob Pendleton: writer and programmer. +
+ email: Bob@xxxxxxxxxxxxx              +
+ web:   www.GameProgrammer.com         +
+---------------------------------------+
Follow-Ups:
- [gameprogrammer] Re: Scripting engines: Upvalues and function refs
  - From: David Olofson
References:
- [gameprogrammer] Scripting engines: Upvalues and function refs
  - From: David Olofson
- [gameprogrammer] Re: Scripting engines: Upvalues and function refs
  - From: David Olofson
- [gameprogrammer] Re: Scripting engines: Upvalues and functionrefs
  - From: Bob Pendleton
- [gameprogrammer] Re: Scripting engines: Upvalues and function refs
  - From: David Olofson
[gameprogrammer] Re: Scripting engines: Upvalues and functionrefs

Other related posts: