rewriting + second order vs. higher order expressivity

John Nowak — 2008-06-19 07:50:08

I've recently been considering a concatenative language variant
without first-class functions. In such a language, you'd have a clear
hierarchy with objects at the bottom, functions that manipulate
objects above that, and finally functionals (aka combining forms) that
manipulate functions at the top. For the sake of clarity, we'll call
this language C2 (as it is a second order concatenative language).

In order to get to my point, let me very briefly explain C2. In C2,
functions are written solely in terms of the composition of other
functions. For example:

a = b c(d, e f) g

Here, we're defining some function 'a' that is the composition of
functions 'b', 'c(d, e, f)', and 'g'. Note that 'c' is a combining
form which takes the functions 'd', 'e', and 'f' as arguments and
yields a new function as a result.

Functionals, however, are not written solely in terms of composition.
Such a thing is impossible because functions are not first class
objects. Instead, we express functionals using rewrite rules. For
example, here is the definition for 'bi@' (uppercase letters are used
for function variables):

bi@($F) = dip($F) $F

Here's an example of how the rewriting works:

2 4 bi@(8 *)
= 2 4 dip(8 *) 8 *
= 2 8 * 4 8 *
= 16 4 8 *
= 16 32

It is very important to note that, in C2, variables can only be used
in definitions for functionals and that only function variables are
allowed (i.e. you can't give a name to an object). For example, this
is not legal in C2 (lowercase letters are used for object variables):

$a $b hypot = $a sqr $b sqr + sqrt

The reason this is illegal in C2 is that '$a' and '$b' are objects,
not functions. If this were legal, functions would not be defined
solely in terms of composition of existing functions.

This restriction (a lack of object variables) leads to some
expressivity problems. Say we want to write some function 'foo' that
takes four arguments and multiplies the bottom three by the first. In
other words, '1 2 3 4 foo = 4 8 12'. In a language with higher order
functions, we might write 'foo' as such (where 'papply' is partial
application):

foo = [*] papply tri@

In C2, however, this is obviously not an option as we don't have first
class functions or 'papply'. Hence, we'd need to write something like
this, which is much uglier and opaque than the higher order version:

foo = tuck 2dip(tuck) tri@(*)

One way to solve this problem is to introduce the language C2'. C2' is
the same as C2 with the exception that functions and functionals can
use object variables. These variables can only appear at the "base"
level of a definition. For example, this is legal:

$a $b swap = $b $a

But this is not (since $a is "lifted" into a functional):

$a map-with($F) = map($a $F)

The reason for this restriction on C2' is that it means we can, using
'dip' 'dup' 'drop' and 'swap', translate any C2' program into a C2
program. In other words, C2' is essentially just a sort of syntactic
sugar.

Using C2', we can now write 'foo' more easily (if still not quite as
nicely as the higher order version):

$a $b $c $d foo = $a $d * $b $d * $c $d *

Unfortunately, C2' also has expressivity problems. Let's return to the
'map-with' example above. There's no way to write this functional in
C2' unless the function given to 'map' is allowed to access the entire
stack (as opposed to be restricted to an arity of 1 -> 1). If we
assume the function can access the entire stack, we can write 'map-
with' as such:

map-with($F) = swap map(over $F)

What this does is call 'map' with some value under the list that gets
copied above the value pushed onto the list each time '$F' is called.
This works, but it's rather ugly compared to the higher order version:

map-with = papply map

That, however, it only half of the problem. The other half is that we
need an n-ary map. N-ary combinators are more difficult to explain and
arguably more prone to error. They also have rather ugly rewriting
rules. For example, the best rule we can come up with for an n-ary
'map-with' in C2' is this:

{$x0, $x1, ... $xN} $a map-with($F) ->
null $a $x0 over $F cons $x1 over $F cons ... $xN over $F cons

It is necessary to expand everything like this because $F is allowed
to access the entire stack. This is much uglier than how 'map-with'
works if '$F' is restricted to an arity of 1 -> 1 and we lift '$a'
directly:

{$x0, $x1, ... $xN} $a map-with($F) ->
{$x0 $a $F, $x1 $a $F, ... $xN $a $F}

The difference between the two is not merely cosmetic. In the second
version, it is easy to reduce each list element separately and in any
order you please. In the first version, each resulting list element
depends on the call to 'over', and hence you're stuck doing everything
left to right.

The way to solve this problem is to introduce the final language, C2R
(C2 with unrestricted rewrite rules). In C2R, it is legal to "lift"
objects into functionals. This essentially gives us back a way of
doing partial application without needing first-class functions. For
example, in C2R, we can write 'map-with' as such:

$a map-with($F) = map($a $F)

This version is as clear as can be and permits the second version of
the rewriting semantics given above.

Alright, so we have three languages. C2 is nice in that the right side
of function definitions only consists of compositions. The benefit of
C2' is that it makes writing certain functions (primarily mathematical
formulas) easier while still being translatable with C2.

C2R, however, is not translatable to C2. The addition of unrestricted
rewrite rules objectively increases the expressiveness of the
language. This illustrates a big difference between the C2 languages
and their higher order cousins: In a higher order concatenative
language, variables can always be translated away, even if the lifting
of objects into functions is permitted.

For 5th, it seems that C2R is the way to go. However, I have some
serious reservations. In particular, I'm not sure if C2R is even a
concatenative language, as there's no way to get rid of the variables
and hence you can't deal solely with function composition. In fact,
the introduction of unrestricted rewriting essentially gives you let
expressions (although I'm not sure if I'd allow overlapping scopes).

I'm not really sure what I'm asking here. Perhaps I'm just looking for
a gut response to the C2R proposal. If anyone has any thoughts, I'd
very much appreciate them. This is the last "big decision" for 5th
before I can release something.

- John

John Nowak — 2008-06-19 08:49:30

Slava has given an interesting example that has made me rethink
getting rid of the n-ary 'map' functional:

Given an array like {1, f, f, f, 2, f, f, 3, 4, f}, turn it
into {1, 1, 1, 1, 2, 2, 2, 3, 4, 4}.

This sort of thing is easy to do with a fold. However, if you want to
avoid generating a new array, it seems that you *need* an n-ary map
combinator that works as such:

{$x0, $x1, ... $xN} map($F) =
null $x0 $F cons $x1 $F cons ... $xN $F cons

Of course, given that we're dealing with arrays, 'map' wouldn't
*actually* be consing and could avoid the generation of another array.
Of course, we're also assuming the array is singly referenced so that
it can be modified in place without visible mutation (which can be
enforced on the type level in 5th).

If we were to write 'map-with' using this n-ary 'map' in C2R, we could
manage to avoid the ugly 'over's of the C2 version and still be able
to reduce each list item independently:

{$x0, $x1, ... $xN} $a map-with($F) =
null $x0 $a $F cons $x1 $a $F cons ... $xN $a $F cons

Interesting. I guess 'map 'should remain n-ary and an n-ary fold
(called 'each' in Factor) should be included as well.

- John

William Tanksley, Jr — 2008-06-19 13:27:46

John Nowak <john@...> wrote:
> I'm not really sure what I'm asking here. Perhaps I'm just looking for
> a gut response to the C2R proposal. If anyone has any thoughts, I'd
> very much appreciate them. This is the last "big decision" for 5th
> before I can release something.

Well, you've got my sympathy... That's an unfortunate dilemma. It
looks like this model of a first-order language works very poorly with
concatenativity. It this purely a result of this model -- might it be
possible to partly solve this by other means?

I'm not coming up with any enlightenment. Forth solves this with a
full macro system, which is WAY too powerful. Would it be possible to
define some kind of pseudofunctional that can be called from inside a
functional to mean "lift a stack item into here"? My mind is rebelling
from this, so it's probably a really bad idea. I can't tell.

> - John

-Wm

John Nowak — 2008-06-19 15:24:52

On Jun 19, 2008, at 9:27 AM, William Tanksley, Jr wrote:

> John Nowak <john@...> wrote:
>
>> I'm not really sure what I'm asking here. Perhaps I'm just looking
>> for
>> a gut response to the C2R proposal. If anyone has any thoughts, I'd
>> very much appreciate them. This is the last "big decision" for 5th
>> before I can release something.
>
> Well, you've got my sympathy... That's an unfortunate dilemma. It
> looks like this model of a first-order language works very poorly with
> concatenativity. It this purely a result of this model -- might it be
> possible to partly solve this by other means?
>
> Would it be possible to
> define some kind of pseudofunctional that can be called from inside a
> functional to mean "lift a stack item into here"? My mind is rebelling
> from this, so it's probably a really bad idea. I can't tell.


Such an approach is the only way I can think of to solve the problem.
Unfortunately, such an approach is also impossible (or, at least,
requires far too much magic to be worth it). I could give some
examples that demonstrate this, but you probably already have a good
intuition for why this can't work.

I should note that a postfix language based on term rewriting is
actually a very nice thing! I'm not exactly disappointed to be in this
situation; it's just something of a surprise.

I should also note that allowing the programmer to define things in
terms of rewrite rules doesn't complicate the language. The reason is
that you need some notion of substitution in order to make sense of
the language primitives in the first place.

Being able to work with objects directly also makes proofs possible
without a ton of function-level axioms. Let's say, for example, that I
want to prove that the program 'dip(swap)' is equivalent to the
program '-rot swap rot'. In order to do this, we first need to know
the semantics of these functions and functionals:

$a dip($F) = $F $a
$a $b swap = $b $a
$a $b $c -rot = $c $a $b
$a $b $c rot = $b $c $a

The next step, of course, is to set the programs equal to each other:

dip(swap) = -rot swap rot

At this point, if we're not allowed to deal with objects, we're stuck
unless we have some additional set of function-level axioms. However,
if we're allowed to deal with objects, this is easy. The first thing
we need to do is employ one of the rules of "concatenative algebra" (I
need to post something about this eventually) that says we can prepend
or append any expression to both sides of the equality:

$a $b $c dip(swap) = $a $b $c -rot swap rot

Now, we just simplify both sides using the rewriting semantics for
these function(al)s:

$a $b $c dip(swap) = $a $b $c -rot swap rot
$a $b swap $c = $a $b $c -rot swap rot
$b $a $c = $a $b $c -rot swap rot
$b $a $c = $c $a $b swap rot
$b $a $c = $c $b $a rot
$b $a $c = $b $a $c

While Backus does talk much about how FP's pointfree nature makes
proofs simpler, it's really the second order nature of the language
that is responsible for most of the easy manipulatability. Of course,
pointfree manipulations are still possible in many cases even if the
programmer is allowed to deal with objects directly.

Anyway, I hope that shows that it's beneficial to be able to
manipulate objects, at least in proofs (if that wasn't already
obvious). If the language is defined in terms of rewrite rules, I
don't think it adds much complication to allow programmers to use the
same system to define new functions and functionals. The only
unfortunate thing is that you *need* to express programs in terms of
rewrite rules in some cases to get around the lack of first-class
functions. That said, I see this neither as a major theoretical issue
nor a major practical issue.

- John