Concatenative Hardware

Christopher Diggins — 2007-11-02 00:39:43

So I've been thinking about concatenative hardware lately. Here is a
single-stack
machine instruction set:

0: JZ (val fxn -> ) // jump to fxn if value is below zero
1: CALL (fxn) // push return address and jump to fxn
2: LTEQ (val val -> bool) // compare values
3: AND (val val -> val) // perform bitwise and
4: XOR (val val -> val) // perform XOR operation
5: SHL (val val -> val) // shift-left
6: DUP (a -> a a) // duplicate top-value
7: QTE (val -> fxn) // create a thunk
8: COMP (fxn fxn -> fxn) // compose two functions (append their contents)
9: POP (a -> ) // pop value from stack
10: DIP (a fxn -> a) // applies function to below top of stack
11: SWP (a b -> b a) // swap top two items

So essentially we can use the data stack also as a return address
stack. This is a zero operand instruction set, so it is very compact
and can be implemented efficiently, but it takes a lot of instructions
to do simple things (e.g. accessing items deep in the stack).

The "dip" instruction is so important I'm tempted to include variants:

12: DIP2 : (a b fxn -> a b)
13: DIP3 : (a b c fxn -> a b c)
14: DIP4 : (a b c d fxn -> a b c d)
15: open for suggestions

Of course "DIP" can be implemented as "SWP QTE COMP CALL", but I'm
concerned about performance. The "DIP" is one of the most frequent
instructions in my code.

Anyway, hopefully this makes sense to my fellow concatenators. I'd
love to hear thoughts and suggestions. I'm pretty naive when it comes
to hardware.

- Christopher

Daniel Ehrenberg — 2007-11-02 00:46:03

This is pretty high-level for an instruction set, don't you think?
Would it be that easy to implement QTE, COMP or DIP? How would
quotations be represented? This looks like just a small vocabulary for
a concatenative langauge, or I'm missing something.

Daniel Ehrenberg

On 11/1/07, Christopher Diggins <cdiggins@...> wrote:
>
> So I've been thinking about concatenative hardware lately. Here is a
> single-stack
> machine instruction set:
>
> 0: JZ (val fxn -> ) // jump to fxn if value is below zero
> 1: CALL (fxn) // push return address and jump to fxn
> 2: LTEQ (val val -> bool) // compare values
> 3: AND (val val -> val) // perform bitwise and
> 4: XOR (val val -> val) // perform XOR operation
> 5: SHL (val val -> val) // shift-left
> 6: DUP (a -> a a) // duplicate top-value
> 7: QTE (val -> fxn) // create a thunk
> 8: COMP (fxn fxn -> fxn) // compose two functions (append their contents)
> 9: POP (a -> ) // pop value from stack
> 10: DIP (a fxn -> a) // applies function to below top of stack
> 11: SWP (a b -> b a) // swap top two items
>
> So essentially we can use the data stack also as a return address
> stack. This is a zero operand instruction set, so it is very compact
> and can be implemented efficiently, but it takes a lot of instructions
> to do simple things (e.g. accessing items deep in the stack).
>
> The "dip" instruction is so important I'm tempted to include variants:
>
> 12: DIP2 : (a b fxn -> a b)
> 13: DIP3 : (a b c fxn -> a b c)
> 14: DIP4 : (a b c d fxn -> a b c d)
> 15: open for suggestions
>
> Of course "DIP" can be implemented as "SWP QTE COMP CALL", but I'm
> concerned about performance. The "DIP" is one of the most frequent
> instructions in my code.
>
> Anyway, hopefully this makes sense to my fellow concatenators. I'd
> love to hear thoughts and suggestions. I'm pretty naive when it comes
> to hardware.
>
> - Christopher
>

Christopher Diggins — 2007-11-02 01:01:56

On 11/1/07, Daniel Ehrenberg <microdan@...> wrote:
> This is pretty high-level for an instruction set, don't you think?

Well part of the fun of designing hardware is you get to do what you want. :-)
It is very high-level, which is what I want to experiment with.

> Would it be that easy to implement QTE, COMP or DIP?

I don't know. A cons cell implementation wouldn't be too bad, but I
fear the performance might be horrendous.

> How would
> quotations be represented?

This is an open question, I'm fishing for ideas. Some possibilities
that I am aware of:
- cons cells
- linked arrays
- vlists

> This looks like just a small vocabulary for
> a concatenative langauge, or I'm missing something.

That's one way of looking at it. It is also similar to the language
minimal instruction set computers (e.g.
http://www.cs.uiowa.edu/~jones/arch/cisc/), except for higher-order
instructions.

It appears that higher-order instructions can allow for significantly
more compact code than is possible with traditional stack-oriented
architectures, which is an area of research I am interested in
pursuing.

Anyway, I'd like to hear suggestions on how to implement quotations in
hardware, or ways to improve the effectiveness of the instruction set
while maintaining higher-order instructions.

I'm open to the idea of 1 or even 2 operand instruction sets as well.

Cheers,
Christopher

Don Groves — 2007-11-02 02:22:46

Hi Christopher,

I like this idea.

Any thoughts of emulating this set and building Cat on that virtual
machine (or is that already how Cat is implemented)? It seems to
me that doing so would yield significant insights into the utility of
your instruction set.
--
Don Groves



On Nov 1, 2007, at 19:39 , Christopher Diggins wrote:

> So I've been thinking about concatenative hardware lately. Here is a
> single-stack
> machine instruction set:
>
> 0: JZ (val fxn -> ) // jump to fxn if value is below zero
> 1: CALL (fxn) // push return address and jump to fxn
> 2: LTEQ (val val -> bool) // compare values
> 3: AND (val val -> val) // perform bitwise and
> 4: XOR (val val -> val) // perform XOR operation
> 5: SHL (val val -> val) // shift-left
> 6: DUP (a -> a a) // duplicate top-value
> 7: QTE (val -> fxn) // create a thunk
> 8: COMP (fxn fxn -> fxn) // compose two functions (append their
> contents)
> 9: POP (a -> ) // pop value from stack
> 10: DIP (a fxn -> a) // applies function to below top of stack
> 11: SWP (a b -> b a) // swap top two items
>
> So essentially we can use the data stack also as a return address
> stack. This is a zero operand instruction set, so it is very compact
> and can be implemented efficiently, but it takes a lot of instructions
> to do simple things (e.g. accessing items deep in the stack).
>
> The "dip" instruction is so important I'm tempted to include variants:
>
> 12: DIP2 : (a b fxn -> a b)
> 13: DIP3 : (a b c fxn -> a b c)
> 14: DIP4 : (a b c d fxn -> a b c d)
> 15: open for suggestions
>
> Of course "DIP" can be implemented as "SWP QTE COMP CALL", but I'm
> concerned about performance. The "DIP" is one of the most frequent
> instructions in my code.
>
> Anyway, hopefully this makes sense to my fellow concatenators. I'd
> love to hear thoughts and suggestions. I'm pretty naive when it comes
> to hardware.
>
> - Christopher
>
>
>
> Yahoo! Groups Links
>
>
>
>

Christopher Diggins — 2007-11-02 03:22:44

Hi Don,

I'm glad you like the idea. The current Cat implementation has over a
hundred built-in primitives, because I was obsessed with performance in the
early stages. This is no longer the case, so the design is a bit weird, some
parts are heavily optimized while others are extremely naive.

I do plan on writing an VM emulator for the instruction set once I get some
more feedback. I figure many people might have some insights to share, to
help reduce the development time. I want to make sure that I am not going
down a particularly boneheaded path.

Thanks,
Christopher

On 11/1/07, Don Groves <dgroves@...> wrote:
>
> Hi Christopher,
>
> I like this idea.
>
> Any thoughts of emulating this set and building Cat on that virtual
> machine (or is that already how Cat is implemented)? It seems to
> me that doing so would yield significant insights into the utility of
> your instruction set.
> --
> Don Groves
>
> On Nov 1, 2007, at 19:39 , Christopher Diggins wrote:
>
> > So I've been thinking about concatenative hardware lately. Here is a
> > single-stack
> > machine instruction set:
> >
> > 0: JZ (val fxn -> ) // jump to fxn if value is below zero
> > 1: CALL (fxn) // push return address and jump to fxn
> > 2: LTEQ (val val -> bool) // compare values
> > 3: AND (val val -> val) // perform bitwise and
> > 4: XOR (val val -> val) // perform XOR operation
> > 5: SHL (val val -> val) // shift-left
> > 6: DUP (a -> a a) // duplicate top-value
> > 7: QTE (val -> fxn) // create a thunk
> > 8: COMP (fxn fxn -> fxn) // compose two functions (append their
> > contents)
> > 9: POP (a -> ) // pop value from stack
> > 10: DIP (a fxn -> a) // applies function to below top of stack
> > 11: SWP (a b -> b a) // swap top two items
> >
> > So essentially we can use the data stack also as a return address
> > stack. This is a zero operand instruction set, so it is very compact
> > and can be implemented efficiently, but it takes a lot of instructions
> > to do simple things (e.g. accessing items deep in the stack).
> >
> > The "dip" instruction is so important I'm tempted to include variants:
> >
> > 12: DIP2 : (a b fxn -> a b)
> > 13: DIP3 : (a b c fxn -> a b c)
> > 14: DIP4 : (a b c d fxn -> a b c d)
> > 15: open for suggestions
> >
> > Of course "DIP" can be implemented as "SWP QTE COMP CALL", but I'm
> > concerned about performance. The "DIP" is one of the most frequent
> > instructions in my code.
> >
> > Anyway, hopefully this makes sense to my fellow concatenators. I'd
> > love to hear thoughts and suggestions. I'm pretty naive when it comes
> > to hardware.
> >
> > - Christopher
> >
> >
> >
> > Yahoo! Groups Links
> >
> >
> >
> >
>
>
>


[Non-text portions of this message have been removed]

Don Groves — 2007-11-02 03:48:30

Not boneheaded at all , imho. Forth hardware has been around for a
long time.
With the ubiquity of Java, I'm expecting to see JVM hardware someday
also.

Lately, I've been re-reading the ideas behind Ralph Griswold's Icon
language.
Icon has higher level semantics than most programming languages and some
excellent underlying ideas, such as generators, which implement
laziness.

I'm planning on including generators in my experimental concatenative
language
as well as other Icon ideas. For example, [1 2 ...] is a lazy list of
positive integers.
The ellipsis is a generator which unconses the next integer and
produces [2 3 ...],
[3 4 ...], etc.

Icon can be found at http://www.cs.arizona.edu/icon/ for those
interested.
--
Don



On Nov 1, 2007, at 19:22 , Christopher Diggins wrote:

> Hi Don,
>
> I'm glad you like the idea. The current Cat implementation has over a
> hundred built-in primitives, because I was obsessed with
> performance in the
> early stages. This is no longer the case, so the design is a bit
> weird, some
> parts are heavily optimized while others are extremely naive.
>
> I do plan on writing an VM emulator for the instruction set once I
> get some
> more feedback. I figure many people might have some insights to
> share, to
> help reduce the development time. I want to make sure that I am not
> going
> down a particularly boneheaded path.
>
> Thanks,
> Christopher
>
> On 11/1/07, Don Groves <dgroves@...> wrote:
>>
>> Hi Christopher,
>>
>> I like this idea.
>>
>> Any thoughts of emulating this set and building Cat on that virtual
>> machine (or is that already how Cat is implemented)? It seems to
>> me that doing so would yield significant insights into the utility of
>> your instruction set.
>> --
>> Don Groves
>>
>> On Nov 1, 2007, at 19:39 , Christopher Diggins wrote:
>>
>>> So I've been thinking about concatenative hardware lately. Here is a
>>> single-stack
>>> machine instruction set:
>>>
>>> 0: JZ (val fxn -> ) // jump to fxn if value is below zero
>>> 1: CALL (fxn) // push return address and jump to fxn
>>> 2: LTEQ (val val -> bool) // compare values
>>> 3: AND (val val -> val) // perform bitwise and
>>> 4: XOR (val val -> val) // perform XOR operation
>>> 5: SHL (val val -> val) // shift-left
>>> 6: DUP (a -> a a) // duplicate top-value
>>> 7: QTE (val -> fxn) // create a thunk
>>> 8: COMP (fxn fxn -> fxn) // compose two functions (append their
>>> contents)
>>> 9: POP (a -> ) // pop value from stack
>>> 10: DIP (a fxn -> a) // applies function to below top of stack
>>> 11: SWP (a b -> b a) // swap top two items
>>>
>>> So essentially we can use the data stack also as a return address
>>> stack. This is a zero operand instruction set, so it is very compact
>>> and can be implemented efficiently, but it takes a lot of
>>> instructions
>>> to do simple things (e.g. accessing items deep in the stack).
>>>
>>> The "dip" instruction is so important I'm tempted to include
>>> variants:
>>>
>>> 12: DIP2 : (a b fxn -> a b)
>>> 13: DIP3 : (a b c fxn -> a b c)
>>> 14: DIP4 : (a b c d fxn -> a b c d)
>>> 15: open for suggestions
>>>
>>> Of course "DIP" can be implemented as "SWP QTE COMP CALL", but I'm
>>> concerned about performance. The "DIP" is one of the most frequent
>>> instructions in my code.
>>>
>>> Anyway, hopefully this makes sense to my fellow concatenators. I'd
>>> love to hear thoughts and suggestions. I'm pretty naive when it
>>> comes
>>> to hardware.
>>>
>>> - Christopher
>>>
>>>
>>>
>>> Yahoo! Groups Links
>>>
>>>
>>>
>>>
>>
>>
>>
>
>
> [Non-text portions of this message have been removed]
>
>
>
>
> Yahoo! Groups Links
>
>
>
>

Christopher Diggins — 2007-11-02 04:11:27

> I'm planning on including generators in my experimental concatenative
> language > as well as other Icon ideas. For example, [1 2 ...] is a lazy list of
> positive integers. The ellipsis is a generator which unconses the next integer and
> produces [2 3 ...], > [3 4 ...], etc.

Cat has lazy lists too. Cat is sometimes lazy and sometimes eager
depending on the constructors used. The "unfold" function generates a
Haskell style lazy list. It takes a successor function and a
termination predicate (which can be always return true) and a next
function. In other words the type signature is:

unfold : ('a ('a -> 'a) ('a -> bool) -> list)

- Christopher

Joe Bowbeer — 2007-11-02 04:14:01

On 11/1/07, Don Groves <dgroves@...> wrote:
> With the ubiquity of Java, I'm expecting to see JVM hardware someday
>

ARM's Jazelle-enabled processors do this in the small

http://www.arm.com/products/esd/jazelle_home.html

Azul Compute Appliance does this in the large

http://www.azulsystems.com/products/compute_appliance.htm

--Joe

Don Groves — 2007-11-02 05:57:13

Thanks, Joe, both look interesting. I sure wish the ARM processor family
had been available before I retired from embedded systems programming!
--
Don


On Nov 1, 2007, at 19:14 , Joe Bowbeer wrote:

> On 11/1/07, Don Groves <dgroves@...> wrote:
>> With the ubiquity of Java, I'm expecting to see JVM hardware someday
>>
>
> ARM's Jazelle-enabled processors do this in the small
>
> http://www.arm.com/products/esd/jazelle_home.html
>
> Azul Compute Appliance does this in the large
>
> http://www.azulsystems.com/products/compute_appliance.htm
>
> --Joe
>
>
>
> Yahoo! Groups Links
>
>
>
>

Don Groves — 2007-11-02 06:00:16

Most interesting. Haskell is another language I know a bit about
and need to learn much more. So little time, so many languages ;-)
--
Don


On Nov 1, 2007, at 19:11 , Christopher Diggins wrote:

>> I'm planning on including generators in my experimental concatenative
>> language > as well as other Icon ideas. For example, [1 2 ...] is
>> a lazy list of
>> positive integers. The ellipsis is a generator which unconses the
>> next integer and
>> produces [2 3 ...], > [3 4 ...], etc.
>
> Cat has lazy lists too. Cat is sometimes lazy and sometimes eager
> depending on the constructors used. The "unfold" function generates a
> Haskell style lazy list. It takes a successor function and a
> termination predicate (which can be always return true) and a next
> function. In other words the type signature is:
>
> unfold : ('a ('a -> 'a) ('a -> bool) -> list)
>
> - Christopher
>
>
>
> Yahoo! Groups Links
>
>
>
>

Don Groves — 2007-11-02 07:38:25

Hi, All --

Since I've mentioned generators and my lazy list example, I may as well
present the entire design for discussion. Also, I've decided on a name
change. Instead of Catenate, I've decided on ACL. ACL could stand for
A Concatenative Language, or Another Concatenative Language, or
Anterior Cruciate Ligament, but it doesn't have to. It will be
referred to
simply as ACL.

A generator is a persistent function. When activated, it pushes a
copy of
itself on the stack. The copy performs the generative function and
remains
on the stack until specifically dropped. Each time it comes to the
top of
the stack it generates a value according to its definition.

Example: [1 2 ...] defgen Z+

Z+ is now a generator a copy of which will generate positive integers as
long as it remains active (until dropped from the stack).

stack:
=> Z+

yields -

stack:[2 3 ...] 1
=>

The next time it comes to the top of the stack, it will generate a 2,
thusly -

stack:[3 4 ...] 2
=>

This behaviour continues until the user/program decides to stop it.
Thus,
the ellipsis generator, ..., can mimic an infinite list. Z+ is
included in the
ACL standard library.

This form of generator may also be finite: [1 3 ... 99] defgen odd<100
behaves like this -

stack:
=> odd<100

stack:[3 5 ... 99] 1
=>
... and so on until 99 is produced, at which point the generator copy
self-destructs.

ACL will have other generator forms, such as the Fibonacci sequence.

Comments, suggestions, etc., are solicited and greatly appreciated,
Don Groves

"Computers are like Old Testament gods; lots of rules and no mercy."
-- Joseph Campbell

pml060912 — 2007-11-02 09:25:15

--- In concatenative@yahoogroups.com, "Christopher Diggins"
<cdiggins@...> wrote:
>
> So I've been thinking about concatenative hardware lately. Here is a
> single-stack
> machine instruction set:
>
> 0: JZ (val fxn -> ) // jump to fxn if value is below zero
> 1: CALL (fxn) // push return address and jump to fxn
> 2: LTEQ (val val -> bool) // compare values
> 3: AND (val val -> val) // perform bitwise and
> 4: XOR (val val -> val) // perform XOR operation
> 5: SHL (val val -> val) // shift-left
> 6: DUP (a -> a a) // duplicate top-value
> 7: QTE (val -> fxn) // create a thunk
> 8: COMP (fxn fxn -> fxn) // compose two functions (append their
contents)
> 9: POP (a -> ) // pop value from stack
> 10: DIP (a fxn -> a) // applies function to below top of stack
> 11: SWP (a b -> b a) // swap top two items
>
> So essentially we can use the data stack also as a return address
> stack.

Maybe I'm missing a layer of sophistication hidden within the
capabilities of one of these instructions (COMP, maybe?), but doesn't
that restrict the virtual machine to the capabilities of a pure single
stack machine? As I mentioned the other day, those aren't Turing
equivalent - I had a quick look at the wikipedia entry on stack
machines to confirm it - although you get that with two stack machines
and machines with more generalised instructions like PICK or ROLL or
data structuring/accessing instructions (ignoring finite physical
restrictions for the moment).

This is a zero operand instruction set, so it is very compact
> and can be implemented efficiently, but it takes a lot of instructions
> to do simple things (e.g. accessing items deep in the stack).

For what it's worth, I contributed the original versions of the high
level PICK and ROLL used in Ting's Eforth. They work using recursion
and the return stack for temporary storage. That approach doesn't use
many instructions in the code or many different opcodes, but it does
use a lot of cycles to get very deep.

>
> The "dip" instruction is so important I'm tempted to include variants:
>
> 12: DIP2 : (a b fxn -> a b)
> 13: DIP3 : (a b c fxn -> a b c)
> 14: DIP4 : (a b c d fxn -> a b c d)
> 15: open for suggestions
>
> Of course "DIP" can be implemented as "SWP QTE COMP CALL", but I'm
> concerned about performance. The "DIP" is one of the most frequent
> instructions in my code.
>
> Anyway, hopefully this makes sense to my fellow concatenators. I'd
> love to hear thoughts and suggestions. I'm pretty naive when it comes
> to hardware.
>
> - Christopher
>

Christopher Diggins — 2007-11-02 12:02:12

On 11/2/07, pml060912 <pml540114@...> wrote:
> --- In concatenative@yahoogroups.com, "Christopher Diggins"
> <cdiggins@...> wrote:
> >
> > So I've been thinking about concatenative hardware lately. Here is a
> > single-stack
> > machine instruction set:
> >
> > 0: JZ (val fxn -> ) // jump to fxn if value is below zero
> > 1: CALL (fxn) // push return address and jump to fxn
> > 2: LTEQ (val val -> bool) // compare values
> > 3: AND (val val -> val) // perform bitwise and
> > 4: XOR (val val -> val) // perform XOR operation
> > 5: SHL (val val -> val) // shift-left
> > 6: DUP (a -> a a) // duplicate top-value
> > 7: QTE (val -> fxn) // create a thunk
> > 8: COMP (fxn fxn -> fxn) // compose two functions (append their
> contents)
> > 9: POP (a -> ) // pop value from stack
> > 10: DIP (a fxn -> a) // applies function to below top of stack
> > 11: SWP (a b -> b a) // swap top two items
> >
> > So essentially we can use the data stack also as a return address
> > stack.
>
> Maybe I'm missing a layer of sophistication hidden within the
> capabilities of one of these instructions (COMP, maybe?), but doesn't
> that restrict the virtual machine to the capabilities of a pure single
> stack machine?

QTE and COMP allows you to treat function like stacks, by adding data
to them, which you can accesss later by calling CALL. So
computationally it is a multi-stack machine.

> As I mentioned the other day, those aren't Turing
> equivalent - I had a quick look at the wikipedia entry on stack
> machines to confirm it - although you get that with two stack machines
> and machines with more generalised instructions like PICK or ROLL or
> data structuring/accessing instructions (ignoring finite physical
> restrictions for the moment).
>
> This is a zero operand instruction set, so it is very compact
> > and can be implemented efficiently, but it takes a lot of instructions
> > to do simple things (e.g. accessing items deep in the stack).
>
> For what it's worth, I contributed the original versions of the high
> level PICK and ROLL used in Ting's Eforth. They work using recursion
> and the return stack for temporary storage. That approach doesn't use
> many instructions in the code or many different opcodes, but it does
> use a lot of cycles to get very deep.

What are the stack effects of PICK and ROLL? (I don't really know
FORTH that well).

I wonder if PICK and ROLL would be a better choice than DIP, QTE, COMP, etc.

- Christopher

pml060912 — 2007-11-02 13:15:37

--- In concatenative@yahoogroups.com, "Christopher Diggins"
<cdiggins@...> wrote:
>
> On 11/2/07, pml060912 <pml540114@...> wrote:
.
.
.
> > Maybe I'm missing a layer of sophistication hidden within the
> > capabilities of one of these instructions (COMP, maybe?), but doesn't
> > that restrict the virtual machine to the capabilities of a pure single
> > stack machine?
>
> QTE and COMP allows you to treat function like stacks, by adding data
> to them, which you can accesss later by calling CALL. So
> computationally it is a multi-stack machine.

Could you describe their behaviour a bit more?
.
.
.
> > > to do simple things (e.g. accessing items deep in the stack).
> >
> > For what it's worth, I contributed the original versions of the high
> > level PICK and ROLL used in Ting's Eforth. They work using recursion
> > and the return stack for temporary storage. That approach doesn't use
> > many instructions in the code or many different opcodes, but it does
> > use a lot of cycles to get very deep.
>
> What are the stack effects of PICK and ROLL? (I don't really know
> FORTH that well).
>
> I wonder if PICK and ROLL would be a better choice than DIP, QTE,
COMP, etc.

They provide means of accessing items arbitrarily deep in the stack,
e.g. 7 PICK puts a copy of the 7th item on top and 7 ROLL moves the
7th item to the top withe higher ones each moving down one, a
generalised DUP and a generalised SWAP. But they don't do the other stuff.

However, my personal taste would be to separate the different kinds of
functionality during the research phase anyway, at least until I found
the combinations that seemed to come up the way you found with the DIP
generalisations. That would mean, say, using PICK and I rather than a
generalised DIP.

William Tanksley, Jr — 2007-11-02 17:57:06

Christopher Diggins <cdiggins@...> wrote:
> So I've been thinking about concatenative hardware lately. Here is a
> single-stack machine instruction set:

Interesting. I've played a lot with stack hardware (designed a
complete processor for a HW class), so your twist on the subject is
very thought-provoking.

> 0: JZ (val fxn -> ) // jump to fxn if value is below zero
> 1: CALL (fxn) // push return address and jump to fxn
> 2: LTEQ (val val -> bool) // compare values
> 3: AND (val val -> val) // perform bitwise and
> 4: XOR (val val -> val) // perform XOR operation
> 5: SHL (val val -> val) // shift-left
> 6: DUP (a -> a a) // duplicate top-value
> 7: QTE (val -> fxn) // create a thunk
> 8: COMP (fxn fxn -> fxn) // compose two functions (append their contents)
> 9: POP (a -> ) // pop value from stack
> 10: DIP (a fxn -> a) // applies function to below top of stack
> 11: SWP (a b -> b a) // swap top two items

Very high-level... You'll need more detailed semantics for that to
work at all (memory management will be a BEAR for you). Of course,
you're entirely missing ADD and such. Oh, and I see you've got a
comparison instruction, but no conditional call or branch. I assume
you intended to imply those, of course.

You might want to look at Henry Baker's papers on a similar topic.
Check out his paper on O(1) instructions which could implement a VERY
flexible VM that doesn't require GC;
http://home.pipeline.com/~hbaker1/LinearLisp.html. "Hash cons"ing is
very important, and might even fit into hardware.

Once you get memory management a bit more simple, you need to
determine the implications of your instruction encoding. You don't
explain your encoding, which leads me to suspect that each instruction
is encoded as a byte. Needless to say, bit packing is a LOT nicer,
even when you're doing a VM.

The worst implication of your instruction encoding -- and this isn't
something you've made explicit -- is that your instructions _appear_
to be stored in CONS cells. This implies that there's one per cell.
That's MAJORLY inefficient in every way, needless to say. It's obvious
that one optimization would be to pack multiple instructions into a
single CONS cell, but that could well give you worse problems, as then
you have to decide how to handle CALLs. Perhaps the answer should be
that a CALL is always the CDR of a CONS pair...

> So essentially we can use the data stack also as a return address
> stack. This is a zero operand instruction set, so it is very compact
> and can be implemented efficiently, but it takes a lot of instructions
> to do simple things (e.g. accessing items deep in the stack).

"Doctor, it hurts when I do this." ;-)

It's possible that you shouldn't consider "accessing items deep in the
stack" to be a "simple thing".

> The "dip" instruction is so important I'm tempted to include variants:
> 12: DIP2 : (a b fxn -> a b)
> 13: DIP3 : (a b c fxn -> a b c)
> 14: DIP4 : (a b c d fxn -> a b c d)

The DIP instruction is so complex, high-overhead, and dependent on
precise details of user needs that I'm tempted to not include it.

> 15: open for suggestions

YES! 4 bits! (Assuming you don't want math or conditionals...)

> Of course "DIP" can be implemented as "SWP QTE COMP CALL", but I'm
> concerned about performance. The "DIP" is one of the most frequent
> instructions in my code.

It seems to me that DIP is inherently complex -- you don't want to
stretch your clock cycle to accommodate a single instruction. Much
better to let users implement DIP the way they need it.

Although I just noticed you use CONCAT instead of CONS. Hmm. That's
another inherently complex instruction (it takes linear time).

> Anyway, hopefully this makes sense to my fellow concatenators. I'd
> love to hear thoughts and suggestions. I'm pretty naive when it comes
> to hardware.

Think about this carefully... Generally speaking, the minimal
requirement for hardware is that all instructions should be O(1). I've
seen that requirement skipped... Chuck Moore's processors use cycle
times so fast that there's isn't time for the carry flag to propagate
on addition, so when you do addition you have to allow the stack to
settle in the ALU before you read the result of the addition (it's
simpler in practice than it sounds). Most CISCs have varying
instruction times. But in general it makes things messy.

Whatever you do, you do NOT want any single instruction to NOT take
constant time.

> - Christopher

-Billy

William Tanksley, Jr — 2007-11-02 18:26:56

Don Groves <dgroves@...> wrote:
> A generator is a persistent function. When activated, it pushes a
> copy of
> itself on the stack. The copy performs the generative function and
> remains
> on the stack until specifically dropped. Each time it comes to the
> top of the stack it generates a value according to its definition.

"Comes to the top of the stack" is unclear. It would make more sense
to say that every time GENERATE is called on it, it produces the next
value. (You could also overload UNCONS to work that way... but that
seems dangerous.)

> Example: [1 2 ...] defgen Z+
> Z+ is now a generator a copy of which will generate positive integers as
> long as it remains active (until dropped from the stack).

As long as you're defining new syntax, why not have "[x ...]" be
defined in the language as a "generator quotation", so you don't need
'defgen'? Just use 'const' or whatever.

> stack:
> => Z+
> yields -
> stack:[2 3 ...] 1
> =>

> The next time it comes to the top of the stack, it will generate a 2,
> thusly -
> stack:[3 4 ...] 2
> =>

Specifically, does this mean that:

=> Z+ SWAP SWAP
yields -
stack:1 2 [4 5 ...] 3

?

If so, is there any way to drop a generator from the stack?

> This behaviour continues until the user/program decides to stop it.

How?

> Thus,
> the ellipsis generator, ..., can mimic an infinite list. Z+ is
> included in the
> ACL standard library.

Obviously the ellipsis generator is either very clever with lots of
special cases, or it can only handle additive sequences.

> ... and so on until 99 is produced, at which point the generator copy
> self-destructs.

Without being dropped? How will client code know that their stack's
been changed?

> Don Groves

-Billy

Christopher Diggins — 2007-11-02 18:32:58

On 11/2/07, William Tanksley, Jr <wtanksleyjr@...> wrote:
> Christopher Diggins <cdiggins@...> wrote:
> > So I've been thinking about concatenative hardware lately. Here is a
> > single-stack machine instruction set:
>
> Interesting. I've played a lot with stack hardware (designed a
> complete processor for a HW class), so your twist on the subject is
> very thought-provoking.
>
> > 0: JZ (val fxn -> ) // jump to fxn if value is below zero
> > 1: CALL (fxn) // push return address and jump to fxn
> > 2: LTEQ (val val -> bool) // compare values
> > 3: AND (val val -> val) // perform bitwise and
> > 4: XOR (val val -> val) // perform XOR operation
> > 5: SHL (val val -> val) // shift-left
> > 6: DUP (a -> a a) // duplicate top-value
> > 7: QTE (val -> fxn) // create a thunk
> > 8: COMP (fxn fxn -> fxn) // compose two functions (append their contents)
> > 9: POP (a -> ) // pop value from stack
> > 10: DIP (a fxn -> a) // applies function to below top of stack
> > 11: SWP (a b -> b a) // swap top two items
>
> Very high-level... You'll need more detailed semantics for that to
> work at all (memory management will be a BEAR for you).

Yes, this is where I am stuck.

> Of course,
> you're entirely missing ADD and such. Oh, and I see you've got a
> comparison instruction, but no conditional call or branch. I assume
> you intended to imply those, of course.

Whoops some mistakes. Especially the missing ADD.

> You might want to look at Henry Baker's papers on a similar topic.
> Check out his paper on O(1) instructions which could implement a VERY
> flexible VM that doesn't require GC;
> http://home.pipeline.com/~hbaker1/LinearLisp.html. "Hash cons"ing is
> very important, and might even fit into hardware.

Great, I'll look into it.

> Once you get memory management a bit more simple, you need to
> determine the implications of your instruction encoding. You don't
> explain your encoding, which leads me to suspect that each instruction
> is encoded as a byte.

4 bits is my goal.

> Needless to say, bit packing is a LOT nicer,
> even when you're doing a VM.
>
> The worst implication of your instruction encoding -- and this isn't
> something you've made explicit -- is that your instructions _appear_
> to be stored in CONS cells.

I'm looking for alternatives, I don't like the idea of CONS cells. I'm
thinking of a linked list of mini-stacks, but this is just a vague
notion at this point.

> This implies that there's one per cell.
> That's MAJORLY inefficient in every way, needless to say.

Yep.

> It's obvious
> that one optimization would be to pack multiple instructions into a
> single CONS cell,

I'm thinking of 32 bit stack, which can contain 8 4 bit instructions.

> but that could well give you worse problems, as then
> you have to decide how to handle CALLs. Perhaps the answer should be
> that a CALL is always the CDR of a CONS pair...

Interesting idea.

> > So essentially we can use the data stack also as a return address
> > stack. This is a zero operand instruction set, so it is very compact
> > and can be implemented efficiently, but it takes a lot of instructions
> > to do simple things (e.g. accessing items deep in the stack).
>
> "Doctor, it hurts when I do this." ;-)
>
> It's possible that you shouldn't consider "accessing items deep in the
> stack" to be a "simple thing".
>
> > The "dip" instruction is so important I'm tempted to include variants:
> > 12: DIP2 : (a b fxn -> a b)
> > 13: DIP3 : (a b c fxn -> a b c)
> > 14: DIP4 : (a b c d fxn -> a b c d)
>
> The DIP instruction is so complex, high-overhead, and dependent on
> precise details of user needs that I'm tempted to not include it.

Good point. Besides I need those slots for ADD, a PUSH, and possibly a NOOP.

> > 15: open for suggestions
>
> YES! 4 bits! (Assuming you don't want math or conditionals...)

JZ was the conditional (which I am thinking of combining with LTEQ, to
make JLE (Jump if less than or equal) when do we use LTEQ outside of
a branch anyway, if we need a true LTEQ we can say: JLE

> > Of course "DIP" can be implemented as "SWP QTE COMP CALL", but I'm
> > concerned about performance. The "DIP" is one of the most frequent
> > instructions in my code.
>
> It seems to me that DIP is inherently complex -- you don't want to
> stretch your clock cycle to accommodate a single instruction.

That is a good point.

> Much
> better to let users implement DIP the way they need it.

Its just that when I generate Cat from C code algorithmically it
generates a ton of DIPs.

> Although I just noticed you use CONCAT instead of CONS. Hmm. That's
> another inherently complex instruction (it takes linear time).

Not neccessarily, if we use lists with tail pointers. However, that is ugly.

> > Anyway, hopefully this makes sense to my fellow concatenators. I'd
> > love to hear thoughts and suggestions. I'm pretty naive when it comes
> > to hardware.
>
> Think about this carefully... Generally speaking, the minimal
> requirement for hardware is that all instructions should be O(1).

Good idea.

> I've
> seen that requirement skipped... Chuck Moore's processors use cycle
> times so fast that there's isn't time for the carry flag to propagate
> on addition, so when you do addition you have to allow the stack to
> settle in the ALU before you read the result of the addition (it's
> simpler in practice than it sounds). Most CISCs have varying
> instruction times. But in general it makes things messy.
>
> Whatever you do, you do NOT want any single instruction to NOT take
> constant time.

Great point. Thank you for mentioning it.

Thanks a lot for sharing your ideas!

- Christopher

Rodney D Price — 2007-11-02 20:11:58

Chris,

This suggestion may be somewhat orthogonal to your memory management
issue (or not), but it's something I've been interested in
for some time: Could you build a machine with capability-based
addressing? That is, instead of building an MMU (not that you said
you were going to) could you protect a process' memory from other
processes with capabilities?

In this context, a "capability" is like a conventional pointer into
memory, but it also includes other information, such as the size of
the memory area it points to, some permission bits (read/write/
execute user code/execute system code), and an extra bit to indicate
that this is a capability.

See Levy, "Capability-based Computing Systems" for a somewhat dated
review, at http://www.cs.washington.edu/homes/levy/capabook/

So why do I suggest this for your Cat machine? Cat has a real static
type system (not just "types"). There's a very interesting
interaction between capabilities and type systems: if your machine
had capabilities, the Cat type system might be able to handle memory
management for you. The suggestion below for LinearLisp is somewhat
in this vein, but doesn't go as far as some current research in
handling the problem. For a start, have a look at Greg Morrisett,
"Linear regions are all you need", and "Monadic regions", both
available at his web page, http://www.eecs.harvard.edu/~greg/papers/
index.html. Another site of interest is the Typed Assembly Language
site at http://www.cs.cornell.edu/talc/

This way, you "pay" for memory management, and you get MMU-like
process separation for free. Or vice versa.

-Rod


On Nov 2, 2007, at 12:32 PM, Christopher Diggins wrote:

> On 11/2/07, William Tanksley, Jr <wtanksleyjr@...> wrote:
> > Christopher Diggins <cdiggins@...> wrote:
> > > So I've been thinking about concatenative hardware lately. Here
> is a
> > > single-stack machine instruction set:
> >
> > Interesting. I've played a lot with stack hardware (designed a
> > complete processor for a HW class), so your twist on the subject is
> > very thought-provoking.
> >
> > > 0: JZ (val fxn -> ) // jump to fxn if value is below zero
> > > 1: CALL (fxn) // push return address and jump to fxn
> > > 2: LTEQ (val val -> bool) // compare values
> > > 3: AND (val val -> val) // perform bitwise and
> > > 4: XOR (val val -> val) // perform XOR operation
> > > 5: SHL (val val -> val) // shift-left
> > > 6: DUP (a -> a a) // duplicate top-value
> > > 7: QTE (val -> fxn) // create a thunk
> > > 8: COMP (fxn fxn -> fxn) // compose two functions (append their
> contents)
> > > 9: POP (a -> ) // pop value from stack
> > > 10: DIP (a fxn -> a) // applies function to below top of stack
> > > 11: SWP (a b -> b a) // swap top two items
> >
> > Very high-level... You'll need more detailed semantics for that to
> > work at all (memory management will be a BEAR for you).
>
> Yes, this is where I am stuck.
>
> > Of course,
> > you're entirely missing ADD and such. Oh, and I see you've got a
> > comparison instruction, but no conditional call or branch. I assume
> > you intended to imply those, of course.
>
> Whoops some mistakes. Especially the missing ADD.
>
> > You might want to look at Henry Baker's papers on a similar topic.
> > Check out his paper on O(1) instructions which could implement a
> VERY
> > flexible VM that doesn't require GC;
> > http://home.pipeline.com/~hbaker1/LinearLisp.html. "Hash cons"ing is
> > very important, and might even fit into hardware.
>
> Great, I'll look into it.
>
> > Once you get memory management a bit more simple, you need to
> > determine the implications of your instruction encoding. You don't
> > explain your encoding, which leads me to suspect that each
> instruction
> > is encoded as a byte.
>
> 4 bits is my goal.
>
> > Needless to say, bit packing is a LOT nicer,
> > even when you're doing a VM.
> >
> > The worst implication of your instruction encoding -- and this isn't
> > something you've made explicit -- is that your instructions _appear_
> > to be stored in CONS cells.
>
> I'm looking for alternatives, I don't like the idea of CONS cells. I'm
> thinking of a linked list of mini-stacks, but this is just a vague
> notion at this point.
>
> > This implies that there's one per cell.
> > That's MAJORLY inefficient in every way, needless to say.
>
> Yep.
>
> > It's obvious
> > that one optimization would be to pack multiple instructions into a
> > single CONS cell,
>
> I'm thinking of 32 bit stack, which can contain 8 4 bit instructions.
>
> > but that could well give you worse problems, as then
> > you have to decide how to handle CALLs. Perhaps the answer should be
> > that a CALL is always the CDR of a CONS pair...
>
> Interesting idea.
>
> > > So essentially we can use the data stack also as a return address
> > > stack. This is a zero operand instruction set, so it is very
> compact
> > > and can be implemented efficiently, but it takes a lot of
> instructions
> > > to do simple things (e.g. accessing items deep in the stack).
> >
> > "Doctor, it hurts when I do this." ;-)
> >
> > It's possible that you shouldn't consider "accessing items deep
> in the
> > stack" to be a "simple thing".
> >
> > > The "dip" instruction is so important I'm tempted to include
> variants:
> > > 12: DIP2 : (a b fxn -> a b)
> > > 13: DIP3 : (a b c fxn -> a b c)
> > > 14: DIP4 : (a b c d fxn -> a b c d)
> >
> > The DIP instruction is so complex, high-overhead, and dependent on
> > precise details of user needs that I'm tempted to not include it.
>
> Good point. Besides I need those slots for ADD, a PUSH, and
> possibly a NOOP.
>
> > > 15: open for suggestions
> >
> > YES! 4 bits! (Assuming you don't want math or conditionals...)
>
> JZ was the conditional (which I am thinking of combining with LTEQ, to
> make JLE (Jump if less than or equal) when do we use LTEQ outside of
> a branch anyway, if we need a true LTEQ we can say: JLE
>
> > > Of course "DIP" can be implemented as "SWP QTE COMP CALL", but I'm
> > > concerned about performance. The "DIP" is one of the most frequent
> > > instructions in my code.
> >
> > It seems to me that DIP is inherently complex -- you don't want to
> > stretch your clock cycle to accommodate a single instruction.
>
> That is a good point.
>
> > Much
> > better to let users implement DIP the way they need it.
>
> Its just that when I generate Cat from C code algorithmically it
> generates a ton of DIPs.
>
> > Although I just noticed you use CONCAT instead of CONS. Hmm. That's
> > another inherently complex instruction (it takes linear time).
>
> Not neccessarily, if we use lists with tail pointers. However, that
> is ugly.
>
> > > Anyway, hopefully this makes sense to my fellow concatenators. I'd
> > > love to hear thoughts and suggestions. I'm pretty naive when it
> comes
> > > to hardware.
> >
> > Think about this carefully... Generally speaking, the minimal
> > requirement for hardware is that all instructions should be O(1).
>
> Good idea.
>
> > I've
> > seen that requirement skipped... Chuck Moore's processors use cycle
> > times so fast that there's isn't time for the carry flag to
> propagate
> > on addition, so when you do addition you have to allow the stack to
> > settle in the ALU before you read the result of the addition (it's
> > simpler in practice than it sounds). Most CISCs have varying
> > instruction times. But in general it makes things messy.
> >
> > Whatever you do, you do NOT want any single instruction to NOT take
> > constant time.
>
> Great point. Thank you for mentioning it.
>
> Thanks a lot for sharing your ideas!
>
> - Christopher
>
>



[Non-text portions of this message have been removed]

William Tanksley, Jr — 2007-11-02 20:31:22

William Tanksley, Jr <wtanksleyjr@...> wrote:
> You might want to look at Henry Baker's papers on a similar topic.
> Check out his paper on O(1) instructions which could implement a VERY
> flexible VM that doesn't require GC;
> http://home.pipeline.com/~hbaker1/LinearLisp.html. "Hash cons"ing is
> very important, and might even fit into hardware.

While re-reading this, I was thinking how this might not only fit into
hardware, it might even allow multiple parallel processors on the same
chip talking to external memory. Whoops, looks like Baker beat me to
it! See his section titled "Implications for Real Multiprocessors".

If you also want to handle arrays in such an architecture, you might
have to explore his suggestion in
http://home.pipeline.com/~hbaker1/ShallowArrays.html. That might have
the nice side effect of giving you machine-addressable code (the most
common architecture right now).

This looks like a really interesting processor arch.

-Billy

William Tanksley, Jr — 2007-11-02 20:51:57

William Tanksley, Jr <wtanksleyjr@...> wrote:
> If you also want to handle arrays in such an architecture, you might
> have to explore his suggestion in
> http://home.pipeline.com/~hbaker1/ShallowArrays.html. That might have
> the nice side effect of giving you machine-addressable code (the most
> common architecture right now).

Ugh. I read that, and have NO idea what he's trying to say. If anyone
else does, please speak up...

-Billy

Stevan Apter — 2007-11-02 22:16:39

who IS henry baker anyway?





[Non-text portions of this message have been removed]

John Cowan — 2007-11-02 21:38:42

William Tanksley, Jr scripsit:

> Ugh. I read that, and have NO idea what he's trying to say. If anyone
> else does, please speak up...

I understand it, but it wasn't easy....

Basically the idea is that you represent each state of an array by a
linked list holding the history of changes made to that state.
There can be several different lines of change which may share
things; if I give you an array with 5 changes, you can add 3 more
without interfering with the changes I make.

The algorithm is a clever way to implement this without giving up
the idea of a having a linear array. The underlying linear array
is used as a cache for one particular state: when we want to work
with a different state, we permute the data structure so that the
new state is now what appears in the cache.

--
Mark Twain on Cecil Rhodes: John Cowan
I admire him, I freely admit it, http://www.ccil.org/~cowan
and when his time comes I shall cowan@...
buy a piece of the rope for a keepsake.

John Cowan — 2007-11-02 21:43:29

Stevan Apter scripsit:
> who IS henry baker anyway?

He's probably best known for the Baker garbage collector, which divides
the available space into Tospace and Fromspace. Each gc run lazily
copies some non-garbage from Fromspace to Tospace until nothing is left
in Fromspace: then the two semi-spaces swap identities. It was the
first stop and copy (as opposed to mark and sweep or mark-sweep-compact)
garbage collector.

One of his papers also inspired Chicken Scheme.

--
John Cowan cowan@... http://ccil.org/~cowan
Half the lies they tell about me are true.
-- Tallulah Bankhead, American actress

Don Groves — 2007-11-02 21:53:18

Sorry for not changing the subject line earlier...

On Nov 2, 2007, at 19:26 , William Tanksley, Jr wrote:

> Don Groves <dgroves@...> wrote:
>> A generator is a persistent function. When activated, it pushes a
>> copy of
>> itself on the stack. The copy performs the generative function and
>> remains
>> on the stack until specifically dropped. Each time it comes to the
>> top of the stack it generates a value according to its definition.
>
> "Comes to the top of the stack" is unclear. It would make more sense
> to say that every time GENERATE is called on it, it produces the next
> value. (You could also overload UNCONS to work that way... but that
> seems dangerous.)

Good point! I could just use the I or DIP combinator to execute a
generator.
This would make it more generally useful.


>> Example: [1 2 ...] defgen Z+
>> Z+ is now a generator a copy of which will generate positive
>> integers as
>> long as it remains active (until dropped from the stack).
>
> As long as you're defining new syntax, why not have "[x ...]" be
> defined in the language as a "generator quotation", so you don't need
> 'defgen'? Just use 'const' or whatever.

That was my original design and I'll return to that, thanks!


>> stack:
>> => Z+
>> yields -
>> stack:[2 3 ...] 1
>> =>
>
>> The next time it comes to the top of the stack, it will generate a 2,
>> thusly -
>> stack:[3 4 ...] 2
>> =>
>
> Specifically, does this mean that:
>
> => Z+ SWAP SWAP
> yields -
> stack:1 2 [4 5 ...] 3
>
> ?
>
> If so, is there any way to drop a generator from the stack?

I was thinking of having a dropY instruction, but then that couldn't be
defined as swap drop could it? This problem is also solved by using
a combinator to execute a generator.


>> This behaviour continues until the user/program decides to stop it.
>
> How?
>
>> Thus,
>> the ellipsis generator, ..., can mimic an infinite list. Z+ is
>> included in the
>> ACL standard library.
>
> Obviously the ellipsis generator is either very clever with lots of
> special cases, or it can only handle additive sequences.

That depends on the definition. [0 -1 ...] would generate negative
sequences, for example. I could also include an operator in the
definition
and make it more group-like. Nothing in ACL is cast in concrete yet
which
is one reason I appreciate your (and other's) comments and suggestions
so much!


>> ... and so on until 99 is produced, at which point the generator copy
>> self-destructs.
>
> Without being dropped? How will client code know that their stack's
> been changed?

One thing I forgot to mention is that a finite generator acts like an
empty
list after generating its last value so a null value will be
generated when
the client code asks for the next value. I suppose I could require
the client
code to drop the generator at that point...
--
Don


>
>> Don Groves
>
> -Billy
>
>
>
> Yahoo! Groups Links
>
>
>
>

William Tanksley, Jr — 2007-11-02 21:59:32

Stevan Apter <sa@...> wrote:
> who IS henry baker anyway?

He should be especially notable to THIS group for having written "The
Forth Shall Be First", a paper exploring how Forth (ahem, a
concatenative language) implemented a cutting-edge theoretical concept
long before that concept became known.

An interesting read, and applicable to almost all concatenative
languages (more so to the functional ones).

His home page is here: http://home.pipeline.com/~hbaker1/

The "Forth Shall Be First" paper is here:
http://home.pipeline.com/~hbaker1/ForthStack.html

-Billy

William Tanksley, Jr — 2007-11-03 00:05:44

Christopher Diggins <cdiggins@...> wrote:
> William Tanksley, Jr <wtanksleyjr@...> wrote:
> > Very high-level... You'll need more detailed semantics for that to
> > work at all (memory management will be a BEAR for you).

> Yes, this is where I am stuck.

I'm picturing a multiprocessor model based on Baker's model, with one
dedicated memory processor (hooks up to external memory) and a bunch
of general purpose processors, each with a decent amount (TBD) of
internal memory.

The memory processor will handle synchronizing all the internal
memories with the external shared memory, and will also continuously
compact external memory.

(Of course, this can also be done in software as a VM, in which case
the "memory processor" will simply be a compactor task.)

> > Of course,
> > you're entirely missing ADD and such. Oh, and I see you've got a
> > comparison instruction, but no conditional call or branch. I assume
> > you intended to imply those, of course.

> Whoops some mistakes. Especially the missing ADD.

The "missing comparison" claim was my mistake -- I didn't notice JZ. My bad.

> > Once you get memory management a bit more simple, you need to
> > determine the implications of your instruction encoding. You don't
> > explain your encoding, which leads me to suspect that each instruction
> > is encoded as a byte.

> 4 bits is my goal.

Whoops, I did see that later; should have corrected my
misapprehension. 4 bits is great.

Another thing to consider is that control flow instructions are
special and rarely used; if you're packing multiple instructions in a
single memory word, it makes sense to define those instructions as
valid only at the last instruction in the word, which leaves those
opcodes open for other uses in the other instruction slots. If you do
this, you can has room for all the arithmetical opcodes.

> > The worst implication of your instruction encoding -- and this isn't
> > something you've made explicit -- is that your instructions _appear_
> > to be stored in CONS cells.

> I'm looking for alternatives, I don't like the idea of CONS cells. I'm
> thinking of a linked list of mini-stacks, but this is just a vague
> notion at this point.

That's equivalent to what I was thinking of as a linked list of
arrays, so we're on the same page.

> > It's obvious
> > that one optimization would be to pack multiple instructions into a
> > single CONS cell,

> I'm thinking of 32 bit stack, which can contain 8 4 bit instructions.

Ah, I see. Yes, that's just fine.

> > but that could well give you worse problems, as then
> > you have to decide how to handle CALLs. Perhaps the answer should be
> > that a CALL is always the CDR of a CONS pair...

> Interesting idea.

It would be like the RPC-4000
(http://catb.org/jargon/html/story-of-mel.html, highly recommended
programming literature) -- every instruction word would include its
own implicit GOTO.

I guess you'd bitpack a bit more and use the last instruction slot to
choose between JUMP, BRANCH, and CALL behavior (so only 2 bits; I
guess we'd invent a CALLBRANCH instruction to use all the bits, and
we'd encode a few extra instructions in the 2 remaining bits, so every
32-bit word would have 3 full instructions, 1 short instruction, and 1
call instruction).

Now, because the instruction stream isn't on the stack, it might be
possible to load it from a linked list in shared memory (canonical
form) into a array form in-processor, thus making it much more
compact. I have no idea what would work better. (There will not be any
issues with cache locality, so it may be better to simply store the
on-chip form as a linked list -- in fact the first software model
should do that.)

> Good point. Besides I need those slots for ADD, a PUSH, and
> possibly a NOOP.

NOOP is very handy when dealing with slotted instructions... Sometimes
you just don't have 4 instructions to fill a word with.

> > Much
> > better to let users implement DIP the way they need it.
> Its just that when I generate Cat from C code algorithmically it
> generates a ton of DIPs.

Sure, but this processor isn't Cat; a decent generator for it will
generate different code than a decent one would for Cat.

To be fair, if we can do DIP in vaguely similar time to the other
instructions I've got no beef with it.

> > Although I just noticed you use CONCAT instead of CONS. Hmm. That's
> > another inherently complex instruction (it takes linear time).

> Not neccessarily, if we use lists with tail pointers. However, that
> is ugly.

Oh, yes, you could tail-CONS the dipped value on, if that's what you
meant. But tail pointers would be really, really nasty... I'm pretty
sure they wouldn't work in general (EVERY node of a list would have to
maintain a tail pointer, right?).

There's got to be a cleaner solution...

I definitely would prefer using CONS as the machine primitive; CONCAT
is (as I mentioned) linear time, and that's just not good in hardware.

> Thanks a lot for sharing your ideas!

Hard to stop me, isn't it. ;-)

> - Christopher

-Billy

Christopher Diggins — 2007-11-03 01:51:29

On 11/2/07, Rodney D Price <rodprice@...> wrote:
> Chris,
>
> This suggestion may be somewhat orthogonal to your memory management
> issue (or not), but it's something I've been interested in
> for some time: Could you build a machine with capability-based
> addressing? That is, instead of building an MMU (not that you said
> you were going to) could you protect a process' memory from other
> processes with capabilities?

Yes I could do that, but I am currently strudying minimalist system
architectures.

> So why do I suggest this for your Cat machine? Cat has a real static
> type system (not just "types"). There's a very interesting
> interaction between capabilities and type systems: if your machine
> had capabilities, the Cat type system might be able to handle memory
> management for you.

So the gain be would be increased safety, correct? Capabilities as I
understand them are essentially dynamically managed permissions. I
haven't yet been swooned by capabilities, but perhaps I just need to
do more reading on them.

Would there be commercial interest in this line of exploration? Right
about now some kind of financial sponsorship would be very motivating.
;-)

- Christopher

William Tanksley, Jr — 2007-11-03 03:45:06

Christopher Diggins <cdiggins@...> wrote:
>>for some time: Could you build a machine with
>>capability-based addressing? That is, instead
>>of building an MMU (not that you said you were
>>going to) could you protect a process' memory
>>from other processes with capabilities?

>Yes I could do that, but I am currently studying
>minimalist system architectures.

Then you'll certainly want to study Capability based security. A
capability is a pointer without which it is impossible to access a
given object.

Very powerful, very simple. Shapiro (KeyKOS, EROS, Coyotos) has
written a lot about it online. The paper
http://citeseer.ist.psu.edu/689826.html was useful to me.

The Baker model I pointed to would make Capabilities very simple to
build; it would be entirely impossible (not merely improbable) to
access a data structure which isn't linked to from something already
given to you.

> Would there be commercial interest in this line
> of exploration? Right
> about now some kind of financial sponsorship
> would be very motivating. ;-)

Grin.

I don't know, but I doubt it.

Unless maybe you can convince Chuck Moore that your work would go
perfectly on his SeaForth24 chip. He's got plenty of money now that
Intel and AMD are paying him royalties on his patents.

> - Christopher

-Billy

Daniel Ehrenberg — 2007-11-03 04:39:12

On 11/2/07, Stevan Apter <sa@...> wrote:
>
> who IS henry baker anyway?
>

Kind of funny you asked. I've heard his name referenced so many times,
but never really understood who he is, except some kind of programming
demigod who you can cite as a source of absolute truth. Wikipedia
(http://en.wikipedia.org/wiki/Henry_Baker_(computer_scientist)) says
"a computer scientist who has made contributions in garbage
collection, functional programming languages, and linear logic. He was
also one of the founders of Symbolics." but not much else. His website
(http://home.pipeline.com/~hbaker1/), in dire need of a CSS file,
shows that he invented a whole bunch of stuff. I guess he's most
notable for Cheney on the MTA (a model for a CPS translation of Lisp
to C, the inspiration for Chicken) and his Linear Lisp, a lisp which
compiles to a forth and needs no garbage collection, just a whole
bunch of copying. He also opposed relational databases, and came up
with a whole bunch of algorithms.

But really, to me, he's somewhat of a mystery, still. Where did he
come from? What was his job?

Daniel Ehrenberg

Christopher Diggins — 2007-11-04 18:23:51

On 11/2/07, William Tanksley, Jr <wtanksleyjr@...> wrote:
> Christopher Diggins <cdiggins@...> wrote:
> >>for some time: Could you build a machine with
> >>capability-based addressing? That is, instead
> >>of building an MMU (not that you said you were
> >>going to) could you protect a process' memory
> >>from other processes with capabilities?
>
> >Yes I could do that, but I am currently studying
> >minimalist system architectures.
>
> Then you'll certainly want to study Capability based security. A
> capability is a pointer without which it is impossible to access a
> given object.

For non-mission critical embedded systems, is this really a huge gain?
I can imagine though it being useful for embedded devices on cars and
rockets, and other such things.

How should an architecture react in the case of catastrophe (i.e.
capability violation)? Some kind of interrupt mechanism I imagine?

> Very powerful, very simple. Shapiro (KeyKOS, EROS, Coyotos) has
> written a lot about it online. The paper
> http://citeseer.ist.psu.edu/689826.html was useful to me.

I'll look at it soon.

> The Baker model I pointed to would make Capabilities very simple to
> build; it would be entirely impossible (not merely improbable) to
> access a data structure which isn't linked to from something already
> given to you.

The kind of architecture I am considering at this time is so tiny that
the overheard of capability pointers would be too big. However, as
soon as I move into larger architectures I can see advantages.

> > Would there be commercial interest in this line
> > of exploration? Right
> > about now some kind of financial sponsorship
> > would be very motivating. ;-)
>
> Grin.
>
> I don't know, but I doubt it.
>
> Unless maybe you can convince Chuck Moore that your work would go
> perfectly on his SeaForth24 chip. He's got plenty of money now that
> Intel and AMD are paying him royalties on his patents.

You know, that isn't a half-bad idea!

It would be interesting to implement a Cat machine using one of those
SeaForth 24 chips.

- Christopher

Christopher Diggins — 2007-11-04 18:25:53

> > QTE and COMP allows you to treat function like stacks, by adding data
> > to them, which you can accesss later by calling CALL. So
> > computationally it is a multi-stack machine.
>
> Could you describe their behaviour a bit more?

I'll be providing some C code shortly, but here is the brief description.

QTE creates a thunk (a nullary function) from a value. The value is
popped from the stack and replaced with a function that always returns
the value.

COMP composes two function pointers to create a new function that
executes the first function then the second function.

- Christopher

Don Groves — 2007-11-04 21:30:32

On Nov 4, 2007, at 19:25 , Christopher Diggins wrote:

>>> QTE and COMP allows you to treat function like stacks, by adding
>>> data
>>> to them, which you can accesss later by calling CALL. So
>>> computationally it is a multi-stack machine.
>>
>> Could you describe their behaviour a bit more?
>
> I'll be providing some C code shortly, but here is the brief
> description.
>
> QTE creates a thunk (a nullary function) from a value. The value is
> popped from the stack and replaced with a function that always returns
> the value.

Enlightenment time...

I understand the form of this construct but not its essential nature.
How does it differ from, and in what way is it more useful than, a
constant?

-- Don



> COMP composes two function pointers to create a new function that
> executes the first function then the second function.
>
> - Christopher
>
>
>
> Yahoo! Groups Links
>
>
>
>

Christopher Diggins — 2007-11-04 23:13:15

> > QTE creates a thunk (a nullary function) from a value. The value is
> > popped from the stack and replaced with a function that always returns
> > the value.
>
> Enlightenment time...
>
> I understand the form of this construct but not its essential nature.
> How does it differ from, and in what way is it more useful than, a
> constant?

It allows you to create functions dynamically out of thin air. This is
useful when performing computations with higher-order functions.

- Christopher

Don Groves — 2007-11-05 00:17:22

On Nov 4, 2007, at 19:13 , Christopher Diggins wrote:

>>> QTE creates a thunk (a nullary function) from a value. The value is
>>> popped from the stack and replaced with a function that always
>>> returns
>>> the value.
>>
>> Enlightenment time...
>>
>> I understand the form of this construct but not its essential nature.
>> How does it differ from, and in what way is it more useful than, a
>> constant?
>
> It allows you to create functions dynamically out of thin air. This is
> useful when performing computations with higher-order functions.

Ah, since higher-order functions only operate on functions... Thanks!
--
Don


>
> - Christopher
>
>
>
> Yahoo! Groups Links
>
>
>
>

William Tanksley, Jr — 2007-11-05 17:22:16

Christopher Diggins <cdiggins@...> wrote:
> William Tanksley, Jr <wtanksleyjr@...> wrote:
> > Then you'll certainly want to study Capability
> > based security.

> For non-mission critical embedded systems, is
> this really a huge gain?

Sorry, I had a different understanding of 'minimalist' than you do.
Yes, there's no need to implement capabilities when you don't need any
process security at all.

> How should an architecture react in the case of catastrophe (i.e.
> capability violation)? Some kind of interrupt mechanism I imagine?

The Baker system can't express a capability violation (it's simply not
possible). For most other systems, a capability violation is the same
as accessing a page of memory that doesn't exist -- page fault. It
doesn't mean somebody's being bad; it simply means someone tried to
access memory using something that isn't a pointer.

> It would be interesting to implement a Cat machine using one of those
> SeaForth 24 chips.

I agree.

> - Christopher

-Billy