a minor change to cons

John Nowak — 2010-06-08 01:43:46

I've commented before that the map fusion rule for Haskell:

map f . map g == map (f . g)

... has no nice analog in Joy. I now think this may not be because Joy is stack-based, but rather because it's not stack-based enough!

Because I need to for this to work, I separate quotations ("[]"), lists ("{}"), and stacks ("()"; these are just first class stacks similar to lists but heterogeneous) into separate data types. I believe this is a good idea regardless; see previous posts.

Currently, 'cons' and 'uncons' have the following semantics (where "'a" denotes a single element of any type and "X" denotes zero or more elements):

'a {X} cons == {'a X}
{'a X} uncons == 'a {X}

I'm proposing a small change; the idea is that 'cons' can only add a stack to a list; it is an error to attempt to add anything else (e.g. "1 {} cons"):

(Y) {X} cons == {(Y) X}
{(Y) X} uncons == (Y) {X}

Because all lists now are composed of zero or more stacks, we can write a nice version of 'map' that uses the 'infra' combinator:

(X) [F] infra == (X F)
infrad == [infra] dip

map = [swap null?]
[drop]
[[uncons] dip [infrad] keep map cons]
ifte

This version of 'map' enjoys the following property:

[G] map [F] map == [G F] map

It's also strictly more useful. For example, we can now map a function across a list that returns more than one value (such as 'dup' to give a trivial example).

This "stack on the stack" approach can be used to clean up the semantics for the cleave and spread combinators as well. For example, if we redefine Factor's 'bi', 'bi@', and 'bi*' as such:

X [G] [F] bi == X G (X F)
Y (X) [G] [F] bi* == Y G (X F)
Y (X) [F] bi@ == Y F (X F)

... we then enjoy the following laws:

[I] [H] bi [G] [F] bi* == [I G] [H F] bi
[I] [H] bi* [G] [F] bi* == [I G] [H F] bi*
[H] bi@ [G] [F] bi* == [H G] [H F] bi
[I] [H] bi [F] bi@ == [I F] [H F] bi
[I] [H] bi* [F] bi@ == [I F] [H F] bi*
[H] bi@ [F] bi@ == [H F] bi@

This approach also eliminates the need for '2bi', '3bi', et cetera; the programmer can simply elect to push more values into the stack on top of the stack.

Final point: All of the above functions are easily typeable. First-class stacks cause no issues.

Thoughts?

- jn

William Tanksley, Jr — 2010-06-08 16:02:29

John Nowak <john@...> wrote:
> I've commented before that the map fusion rule for Haskell:
>   map f . map g  ==  map (f . g)
> ... has no nice analog in Joy. I now think this may not be because Joy is stack-based, but rather because it's not stack-based enough!

Very interesting! I need to explore this, and ask you for practical
and demonstrative examples. You tend to get a little theoretical, so I
need to bring you down to earth so that I can implement your ideas in
my HIGHLY practical zeroone language (cue the laugh track, ha ha ha).

> Because I need to for this to work, I separate quotations ("[]"), lists ("{}"), and stacks ("()"; these are just first class stacks similar to lists but heterogeneous) into separate data types. I believe this is a good idea regardless; see previous posts.

I think you're implying that lists contain multiple elements of the
same data type, right? So a "list" contains an unknown number of
elements all of the same type, while a "stack" contains an unknown
number of elements of any data type. (In Joy, of course, the two would
be the same type.) It would be quite possible to make a runtime-typed
language (like Joy) that enforced the distinction you're hinting at.

> Currently, 'cons' and 'uncons' have the following semantics (where "'a" denotes a single element of any type and "X" denotes zero or more elements):
>      'a {X} cons  ==  {'a X}
>    {'a X} uncons  == 'a {X}
> I'm proposing a small change; the idea is that 'cons' can only add a stack to a list; it is an error to attempt to add anything else (e.g. "1 {} cons"):
>     (Y) {X} cons  ==  {(Y) X}
>   {(Y) X} uncons  ==  (Y) {X}

Iiiiinteresting. We probably shouldn't start talking about
multidimensional arrays now, and what happens when we expand this to
allow adding lists to lists (instead of only stacks to lists).

One thing that's REALLY interesting about this is that Manfred's
notation for [cons] is that he uses quotation syntax, that is [square
brackets], to indicate something that's pushed on the stack (whereas
you and I prefer a single-item quotation mark like \ or '). Amusingly,
under your proposal, Manfred's notation would remain literally valid
(i.e. you could type it into an interpreter and have it work), while
ours wouldn't.

> Because all lists now are composed of zero or more stacks, we can write a nice version of 'map' that uses the 'infra' combinator:
>   (X) [F] infra  ==  (X F)
>          infrad  ==  [infra] dip
>   map = [swap null?]
>         [drop]
>         [[uncons] dip  [infrad] keep  map cons]
>         ifte

Ok, here I have to ask for examples. I really want to know what this
looks like in a live context -- that is, with someone using the
results as part of an algorithm... But that's a lot to ask (and I
didn't ask for it with the banana notation or the 'cleave'
combinators). So let me just ask for some simple examples, because I
really don't get the depth of this change just by looking at your
definitions.

I'm glad you mention the cases where this change makes reasoning about
programs simpler, so that I could start to get enthusiastic about it
:-). I think I need a little more explanation, though. Sorry, I'm
slow.

> This version of 'map' enjoys the following property:
>   [G] map [F] map  ==  [G F] map

That's a really nice property, concatenatively speaking.

> It's also strictly more useful. For example, we can now map a function across a list that returns more than one value (such as 'dup' to give a trivial example).

Okay, I can see that... But, just for the sake of discussion, could
you please expand on this? It seems that the setup and takedown from a
map could be quite messy, since you have to 'enstack' and 'destack'
all of the items. Do you have an algorithm in mind that would be made
simpler by this approach?

> This "stack on the stack" approach can be used to clean up the semantics for the cleave and spread combinators as well. For example, if we redefine Factor's 'bi', 'bi@', and 'bi*' as such:
> This approach also eliminates the need for '2bi', '3bi', et cetera; the programmer can simply elect to push more values into the stack on top of the stack.

I really like hearing that. Numbered combinators annoy me. (The fact
that this is actually a doubly-numbered combinator just takes it to
new extremes... '2bi'???)

> - jn

-Wm