William Tanksley, Jr — 2008-04-03 22:24:56

The Factor developers have been discovering that their new
combinators, inspired by Dr. von Thun's 'cleave' combinator documented
at http://www.latrobe.edu.au/philosophy/phimvt/joy/j01tut.html, have
been dramatically reducing stack noise in their fairly substantial
library.

Slava, the author/primary author of the Factor language, writes:

---------- Forwarded message ----------
From: Slava Pestov <slava@...>
Date: Tue, Apr 1, 2008 at 8:18 PM
Subject: [Factor-talk] cleave, 2cleave, and spread
To: factor-talk@...

Hi all,

Yesterday I discovered that using these three words together with
'call', any stack shuffle can be expressed.

Let's recap:

- cleave takes one value, and an array of quotations. It applies each
quotation to the value; the value is removed from the stack afterwards.
- 2cleave takes two values, and an array of quotations. it applies
each quotation to the two values; the two values are removed from the
stack afterwards.
- spread takes n values and an array of n quotations; it applies the
nth quotation to the nth value in turn.

Now, we can define

: drop { } cleave ;

This works because cleave always consumes one input value, and the
number of outputs is the sum of the number of outputs of each
quotation in the array. The array is empty, so it has one input and no
outputs.

We need a way to duplicate values. This does the trick:

: dup { [ ] [ ] } cleave ;

We need to be able to swap values:

: swap { [ nip ] [ drop ] } 2cleave ;

But what about nip?

: nip { [ drop ] [ ] } spread ;

Now we only need one more combinator, dip:

: dip swap { [ call ] [ ] } spread ;

Once you have drop, dup, swap and dip, you can express any stack
permutation.

While this doesn't have much practical value, it does show that there
is something fundamental about these combinators. The reason they can
replace stack shufflers in so many cases is because they're equivalent
in a sense. However they capture something that stack shufflers don't,
and that is explicit one-to-many, one-to-one and many-to-one dataflow.

Slava

_______________________________________________
Factor-talk mailing list
Factor-talk@...
https://lists.sourceforge.net/lists/listinfo/factor-talk

John Nowak — 2008-04-04 03:12:57

On Apr 3, 2008, at 6:24 PM, William Tanksley, Jr wrote:

> Slava, the author/primary author of the Factor language, writes:
>
> - cleave takes one value, and an array of quotations. It applies each
> quotation to the value; the value is removed from the stack
> afterwards.
> - 2cleave takes two values, and an array of quotations. it applies
> each quotation to the two values; the two values are removed from the
> stack afterwards.

While cleave/2cleave do seem to form a possible base, they are
impossible to type. That's not to say cleave isn't convenient, but for
convenience alone you can always offer it as a macro that translates a
statically-known list of functions to the same thing in terms of some
simpler core set. I'd argue that using cleave with arrays not known at
compile-time is very difficult for a human to reason about anyway and
likely should be avoided in all cases.

- John

William Tanksley, Jr — 2008-04-04 04:10:05

John Nowak <john@...> wrote:

> William Tanksley, Jr wrote:
> > Slava, the author/primary author of the Factor language, writes:

> > - cleave takes one value, and an array of quotations. It applies each
> > quotation to the value; the value is removed from the stack
> > afterwards.
> > - 2cleave takes two values, and an array of quotations. it applies
> > each quotation to the two values; the two values are removed from the
> > stack afterwards.

> While cleave/2cleave do seem to form a possible base, they are
> impossible to type.

That's a good point; obviously, that's not yet an issue with Factor,
and when it becomes an issue, they probably won't be able to use
untyped arrays with it (as you clarify later). I'm not sure that it's
impossible to type, by the way; it seems like it should be possible if
the arrays are typed.

> That's not to say cleave isn't convenient, but for
> convenience alone you can always offer it as a macro that translates a
> statically-known list of functions to the same thing in terms of some
> simpler core set.

What would such a thing look like -- either the macro or the "simpler
core set"? (Just curious.)

Actually, I think we may be on the same page here. Cleave and friends
do nicely get rid of stack shuffles and clarify dataflow -- but they
don't always look clear to me, and I think the problem may be
fundamental. There are just too many nested layers.

Does anyone have any ideas for a possibly more clear -- and statically
safe -- way to cleave dataflow? Obviously, implicit iteration (ala
J/K) is one.

> I'd argue that using cleave with arrays not known at
> compile-time is very difficult for a human to reason about anyway and
> likely should be avoided in all cases.

I entirely agree -- although since I enjoy dynamic typing when it's
appropriate, I'd like to modify that to "arrays not knowable at
compile time". I don't object to the compiler being clueless; I only
object to the programmer being so.

> - John

-Wm

Daniel Ehrenberg — 2008-04-04 05:09:42

On Thu, Apr 3, 2008 at 10:12 PM, John Nowak <john@...> wrote:

>
>
> While cleave/2cleave do seem to form a possible base, they are
> impossible to type. That's not to say cleave isn't convenient, but for
> convenience alone you can always offer it as a macro that translates a
> statically-known list of functions to the same thing in terms of some
> simpler core set. I'd argue that using cleave with arrays not known at
> compile-time is very difficult for a human to reason about anyway and
> likely should be avoided in all cases.
>
> - John

If cleave/2cleave/spread are considered to take an array as an
argument, they can't be typed. But if they take a heterogeneous
vector, whose length and the types of all entries are known before
runtime, then it should be possible to give it an overall type given
some mechanism to do calculations on types. I just don't know what
that type is, exactly... Given how complicated their type would be,
though, I think a dup/swap/dip/drop base would be easier to work with
in a statically typed language.

Dan

John Nowak — 2008-04-04 05:53:38

On Apr 4, 2008, at 12:10 AM, William Tanksley, Jr wrote:

> I'm not sure that it's impossible to type, by the way;
> it seems like it should be possible if the arrays are typed.

I'll cover this in my next email (responding to Daniel).

>> That's not to say cleave isn't convenient, but for
>> convenience alone you can always offer it as a macro that
>> translates a
>> statically-known list of functions to the same thing in terms of some
>> simpler core set.
>
> What would such a thing look like -- either the macro or the "simpler
> core set"? (Just curious.)

The translation looks something like this:

{[foo] [bar] [baz]} cleave ==> [foo] keep [bar] keep [baz] i

In other words, you just insert 'keep' between the functions and add
'i' at the end (as you don't want to keep the value around that last
time). For 2cleave, use 2keep, etc. Accordingly, you have an easy way
to implement 'cleave' as a macro in a typed language.

Of course, since it is a macro, you do have to know the list at
compile time (i.e. it has to be provided as a literal). According to a
discussion earlier tonight with Slava, this seems to be the most
common use case by far.

- John

John Nowak — 2008-04-04 06:48:18

On Apr 4, 2008, at 1:09 AM, Daniel Ehrenberg wrote:

> If cleave/2cleave/spread are considered to take an array as an
> argument, they can't be typed. But if they take a heterogeneous
> vector, whose length and the types of all entries are known before
> runtime, then it should be possible to give it an overall type given
> some mechanism to do calculations on types. I just don't know what
> that type is, exactly...

If you wanted to give a type to 'cleave', it would look like this:

cleave :: A(0) b {[A(0) b -> A(1)] ... [A(N-1) b -> A(N)]} -> A(N)

The reason for the complicated type is that you need to express not
only that the types of each element may be different, but also that
they are dependent on each other and can be sensibly composed.

> though, I think a dup/swap/dip/drop base would be easier to work with
> in a statically typed language.

Absolutely; they're simple, direct, and efficient to implement. Of
course, it's easy to imagine reducing this list. For example:

drop :: A b -> b
blort :: A b c [A -> D] -> D c b b

And from this we can derive the standard functions as such:

swap = [] blort drop
dup = swap [] blort
nip = swap drop
dip = [] swap blort drop nip
i = [] dip drop
...

There is possibly a cleaner typed two function base than this; it's
just my first stab at it. I am not sure though there is a cleaner base
where only one of the two functions is a combinator.

- John

William Tanksley, Jr — 2008-04-04 14:47:23

John Nowak <john@...> wrote:

> Daniel Ehrenberg wrote:
> > though, I think a dup/swap/dip/drop base would be easier to work with
> > in a statically typed language.

I think that's going to have to depend on the meaning of "easy to work with".

> Absolutely; they're simple, direct, and efficient to implement.

In other words, you're looking from the perspective of the compiler
writer (especially a type inferencing compiler).

From a programmer's perspective, the simplest base will depend on the
type of programming. This is why I so much liked the stack shuffle
notation instead of the classic named stack rearrangement words; they
replace a bunch of executable concepts with a single non-executing
pattern.

BUT... These words are also a replacement for a base, and they would
be useful for writing code where parallel dataflow is the central
concept.

> Of course, it's easy to imagine reducing this list. For example:
> drop :: A b -> A
> blort :: A b c [A -> D] -> D c b b

You have read http://tunes.org/~iepos/joy.html, right? If I'm reading
you right, the big difference between your basis and his bases is that
yours is designed to be typed (whoops, and that it doesn't enlist). In
his terms, I think your combinator is:

[C] [B] [A] blort == A [B] [C] [C]

> And from this we can derive the standard functions as such:
> swap = [] blort drop
> dup = swap [] blort
> nip = swap drop
> dip = [] swap blort drop nip
> i = [] dip drop
> ...

> There is possibly a cleaner typed two function base than this; it's
> just my first stab at it. I am not sure though there is a cleaner base
> where only one of the two functions is a combinator.

I'm not sure what you'd define as "cleaner", but I suspect a smaller
number of inputs and outputs would qualify.