SECD-like (explicit stack) machine for executing Joy?

symstym — 2003-06-28 07:34:54

Greetings all!

I'm interested in using Joy (or a modified version of it) as a kind of scripting language
for some projects I am working on. I had searched far and wide for a language (or
any model of computation) that was simple, powerful, flexible, and most importantly
was strongly reflective (in the sense that programs could easily modify their own code
or the code of other programs). I'm very impressed with Joy, and it seems like it
might fit the bill.

The problem I ran into is that in the current C implementation of Joy, evaluation of
quotations (by combinators) is done using recursion in C. For example, the
implementation of dip 'recursively' calls the exeterm function to execute the
quotation it popped off the stack. So, the current state of execution (dare I say, the
continuation?) is stored implicitly on the C stack.

For my (C++) application, I want a Joy interpreter that does one small (O(1)?) unit of
work each time I invoke it. This seems to imply that I need to do more explicit
bookeeping on some kind of stack. I looked into SECD machines a bit, and I think I'm
looking for something exactly like that: a simple machine where the evaluation of
each instruction affects a simple transformation of the data structures, so there is no
recursive invocation of the evaluation function.

So I'm wondering if anyone has already figured out such a machine for Joy, or has any
ideas as to the best way to design one. I've come up with a few ideas but they are
somewhat unsatisfying. It seems like we must introduce some new instructions, or
even an entire intermediate language to make things work. Take for example the ifte
combinator. Part of its evaluation would be setting up some kind of new 'frame' to
execute the IF quotation. But after that quotation finishes executing, there has to be
some kind of instruction lying around to do the remaining work of checking the
resulting true/false, popping that frame, and instating the correct THEN or ELSE
frame. Other combinators (map?) may be more difficult. So is the best solution to
introduce new (internal-use only?) 'RISC' instructions that these 'CISC' combinators
would be broken down into? Or better to introduce an entire new intermediate
execution language? Or is there a simple solution I am missing?

I'm not sure that such an explicit-stack implementation would compete performance-
wise with the current implementation, but it seems like it might be worth looking
into. Also, I don't really understand continuations, but it seems like it might make
the implementation of continuations easier.

Any input would be greatly appreciated.

Russ
symstym@...

p.s. Just searching the archive some more now, and I'm wondering if Monkey or Toy
might be close to what I am looking for?

Martin Young — 2003-06-28 17:18:23

Hi,

On Sat, 2003-06-28 at 08:34, symstym wrote:
> p.s. Just searching the archive some more now, and I'm wondering if Monkey or Toy
> might be close to what I am looking for?

Monkey certainly behaves, in this aspect, in the way you describe: all
primitive words are O(1) in time and, aside from the heap, the
interpreter runs in constant space. Beware though, Monkey is not Joy -
I've changed the syntax to be more visually pleasing (to me) and some of
the word semantics are very slightly changed. Most of the functional
changes I've made are (I hope) toward making Monkey compilable.

Rather than recurse on the C stack I have a second stack (which I call
the dip stack). When a new program is executed (generally with the i
combinator) the remainder of the current program is pushed onto the dip
stack. Whenever the end of a program is reached it is replaced with the
top item from the dip stack. If the dip stack is empty, the whole
program is finished.

A couple of new words make the dip stack explicit. 'move' and 'unmove'
transfer data between the top of the normal stack and the top of the dip
stack. This allows control words, defined in the usual manner, to
manipulate the current continations (which is what the dip stack
effectively is) and hence affect program flow.

Here's how some of the more complex combinators are built up:

<< Helper words. >>
swapdup == {{ swap dup }}
swapif == {{ swap if }}
pcon ==
{{
{ dip swapdup { swapif } dip unmove swap { dup unmove swap move swap
move move 0 } if drop }
cons cons
dup move move
}}

The word 'pcon' is a combinator which takes two items from the top of
the stack (T P pcon --> ~). It executes T first. If boolean true is
left on top of the stack it executes P then starts over again, else it
finishes.

One of the difference between Monkey and Joy is that I don't set up a
"local" stack for execution of the condition program - they execute on
the normal stack and are expected to leave exactly one boolean result.

Here's ifte. 'if' is a primitve word which takes two programs. If the
first leaves a boolean True on top of the stack, it executes the second.

ifte ==
{{
{ i } dipd roll
dup
{
{ drop i } if
} dip
not
{
{ drop } dip
i
} if
}}

Here's map. This would work in normal Joy - it relies only on having
tailrec (defined below). Again, however, it doesn't use "local" stacks
so "badly behaved" words will leave the normal stack in a state
(although the dip stack is immune to badness on the normal stack).

map ==
{{
swap [ ]
{ { dup [ ] = } dip swap }
{ }
{
{ uncons { swap dup { i } dip swap } dip swap } dip cons
}
tailrec
{ drop drop } dip
reverse
}}

Something a bit more complex:

tailrec ==
{{
unit cons cons << ... [ {P} {T} {R1} ] >>
{ << body: >>
dup
{ rest rest first i } dip
} ... [ {P} {T} {R1} ] >>
{ << test: >>
dup first swap << ... {P} [ {P} {T} {R1} ] >>
{ i } dip swap << ... [ {P} {T} {R1} ] bool >>
unit programise << ... [ {P} {T} {R1} ] {bool} >>
{ rest first i false } << ... false >>
{ true } << ... [ {P} {T} {R1} ] true >>
ifte
}
pcon
}}

linrec ==
{{
[ 1 ] cons cons cons cons
{ << body: >>
dup
{ rest rest first i } dip
}
{ << test: >>
dup first swap
{ i } dip swap
unit programise
{ dup { rest first i } dip false }
{ uncons uncons uncons uncons first 1 + unit cons cons cons cons
true }
ifte
}
pcon
rest rest rest
unswons swap first
repeat
}}

The word 'programise' is Monkey fluff. I differentiate between words
which are programs (and hence potentially compilable) and data lists
(which are not compilable). Programise changes a data list into a
program list but is simply a typecast thing - lists and programs are
still fully reflective.

For performance purposes I can optionally use primitive versions of many
of the helper words (in particular I can split pcon into two O(1)
primitive words).

Garbage collection in Monkey will break the O(1) thing. It's done
continuously but it is a recursive algorithm (but at least runs using
only the memory of the list cells which are being collected).

The source code isn't exceptionally pretty (and is always a work in
progress) but if you (or anybody else) wants a copy, do let me know.

--
Martin Young, at home.