recursion is too hard
John Nowak — 2009-03-09 21:41:31
A non-controversial title...
One issue I have with stack-based languages is that operating on
recursively-defined data structures is awkward and has subtle
pitfalls. To keep this simple, I'll use a list; this problem is only
magnified with more complex structures however.
Assume these operations (where curly braces denote lists):
cons = \((s, x), xs) -> (s, x:xs)
uncons = \(s, x:xs) -> ((s, x), xs)
Here's an implementation of 'map' in a second-order concatenative
language; I use the second-order approach as 'map' with a first-class
function is more difficult and obscures the issue:
map(F) = ifte(null?, id, uncons spread(F, map(F)) cons)
Can you spot the problem? Is there a *general* method for avoiding it?
To prove my point that the stack is to blame, I note that the exact
same definition in a language that uses tuples to bundle elements only
when necessary does not have the same problem. Its primitives would
work as follows:
cons = \(x, xs) -> x:xs
uncons = \(x:xs) -> (x, xs)
- John
P.S. Apologies for the Haskell, but I needed a uniform syntax for
giving the semantics of the primitives in both languages.
John Nowak — 2009-03-09 21:56:48
On Mar 9, 2009, at 5:41 PM, John Nowak wrote:
> Assume these operations (where curly braces denote lists):
>
> cons = \((s, x), xs) -> (s, x:xs)
> uncons = \(s, x:xs) -> ((s, x), xs)
Sorry. Ignore that parenthetical. ':' denotes 'cons' and (,) denotes a
tuple as in Haskell.
- John
William Tanksley — 2009-03-09 22:06:19
John Nowak wrote:
> map(F) = ifte(null?, id, uncons spread(F, map(F)) cons)
> Can you spot the problem? Is there a *general* method for avoiding it?
I hope you get a clueful reply. I'll speak for all whose answer is "no"
to say, "no, I don't spot the problem."
> - John
-Wm
John Nowak — 2009-03-09 22:44:41
On Mar 9, 2009, at 6:06 PM, William Tanksley wrote:
> John Nowak wrote:
>
>> map(F) = ifte(null?, id, uncons spread(F, map(F)) cons)
>>
>> Can you spot the problem? Is there a *general* method for avoiding
>> it?
>
> I hope you get a clueful reply. I'll speak for all whose answer is
> "no"
> to say, "no, I don't spot the problem."
I'll give a (big) hint then. Here's the fixed version:
map(F) = ifte(null?, id, uncons swap spread(map(F), F) swap cons)
- John
John Nowak — 2009-03-12 06:35:19
On Mar 9, 2009, at 6:06 PM, William Tanksley wrote:
> John Nowak wrote:
>
>> map(F) = ifte(null?, id, uncons spread(F, map(F)) cons)
>> Can you spot the problem? Is there a *general* method for avoiding
>> it?
>
> I hope you get a clueful reply. I'll speak for all whose answer is
> "no"
> to say, "no, I don't spot the problem."
Alright, if no one will bite...
The problem is that all calls to 'F' except the first see the values
meant to be later consed onto the list from previous calls of 'F'. One
solution is to swap the values out of the way, call 'map(F)' below the
values to be later consed on, and then swap them back before the cons:
map(F) = ifte(null?, id, uncons swap spread(map(F), F) swap cons)
My alternative "elegant" solution to this is a spread combinator
variant that hides the result of the first function from the second.
Slava has said this is possible to implement efficiently using smart
combinators.
Despite some strong reservations about having semantics depend on
types or stack effects, I think that smart combinators should probably
be used more heavily. Producing better smart combinators is probably
the most interesting "practical" research topic that could be worked
on in the context of stack-based languages at the moment. I hope
someone here is interested in pursuing it.
- John
Chris Double — 2009-03-12 09:04:48
On Thu, Mar 12, 2009 at 7:35 PM, John Nowak <
john@...> wrote:
> The problem is that all calls to 'F' except the first see the values
> meant to be later consed onto the list from previous calls of 'F'.
So is this the issue whereby you want the quotation passed to 'map' in
the Factor example below to be able to access the values on the stack
to use in the calculation?
5 { 1 2 3 } [ over + ] map nip
=> { 6 7 8 }
Combinators require that calls to the quotation can't see
implementation details on the stack and this makes them hard to write?
Of so, I agree and I often fall back to locals to implement them (in
Factor).
Chris.
--
http://www.bluishcoder.co.nz
John Nowak — 2009-03-12 09:18:44
On Mar 12, 2009, at 5:04 AM, Chris Double wrote:
> On Thu, Mar 12, 2009 at 7:35 PM, John Nowak <john@...>
> wrote:
>> The problem is that all calls to 'F' except the first see the values
>> meant to be later consed onto the list from previous calls of 'F'.
>
> So is this the issue whereby you want the quotation passed to 'map' in
> the Factor example below to be able to access the values on the stack
> to use in the calculation?
Yes; or at least not have access to intermediate states of its own
computation.
> 5 { 1 2 3 } [ over + ] map nip
> => { 6 7 8 }
>
> Combinators require that calls to the quotation can't see
> implementation details on the stack and this makes them hard to write?
> Of so, I agree and I often fall back to locals to implement them (in
> Factor).
Not sure how locals would fix this particular problem. In the example
I gave, the function provided was already pulled in via substitution.
Maybe you can give an example so I can understand what you mean.
- John
Chris Double — 2009-03-12 09:32:03
On Thu, Mar 12, 2009 at 10:18 PM, John Nowak <
john@...> wrote:
>
> Not sure how locals would fix this particular problem. In the example
> I gave, the function provided was already pulled in via substitution.
> Maybe you can give an example so I can understand what you mean.
I didn't actually understand your example, sorry. Can you break down
your example, explaining what each word does and what stack effect it
has? I don't know what 'spread' does and I didn't know whether the
language given is like joy, in that it preserves the stack, or like
Factor, in that it doesn't.
Chris.
--
http://www.bluishcoder.co.nz
John Nowak — 2009-03-12 09:48:48
On Mar 12, 2009, at 5:32 AM, Chris Double wrote:
> On Thu, Mar 12, 2009 at 10:18 PM, John Nowak <john@...>
> wrote:
>>
>> Not sure how locals would fix this particular problem. In the example
>> I gave, the function provided was already pulled in via substitution.
>> Maybe you can give an example so I can understand what you mean.
>
> I didn't actually understand your example, sorry. Can you break down
> your example, explaining what each word does and what stack effect it
> has?
Sure thing.
map(F) = ifte(null?, id, uncons spread(F, map(F)) cons)
1. 'map(F) = ...' declares a second-order function named 'map' that
takes some statically-known function 'F' as an argument
2. 'ifte' is the same as Joy's 'ifte' function; the first argument is
the conditional, the second is the "then" branch, and the last is the
"else" branch; the stack is saved before the conditional and restored
for the "then" or "else" branch
3. 'id' is the identify function
4. spread(F, G) == dip(F) G
5. 'uncons' takes a list and returns the head of the list and the tail
of the list (with the tail on the top of the stack); 'cons' does the
reverse
The equivalent code in Factor would be as follows, with the exception
that the function wouldn't actually be passed on the stack:
: uncons ( xs -- x xs ) [ first ] [ rest ] bi ;
: cons ( x xs -- xs ) ... ; ! the inverse of 'uncons'
:: map ( xs f -- ys )
xs dup [ empty? ] [ ] [ uncons [ f ] [ f map ] bi* cons ] if ;
- John
William Tanksley — 2009-03-12 15:05:01
John Nowak wrote:
> William Tanksley wrote:
> > John Nowak wrote:
> >> map(F) = ifte(null?, id, uncons spread(F, map(F)) cons)
> >> Can you spot the problem? Is there a *general* method for avoiding
> >> it?
> The problem is that all calls to 'F' except the first see the values
> meant to be later consed onto the list from previous calls of 'F'. One
Got it (sorry, I was busy).
> My alternative "elegant" solution to this is a spread combinator
> variant that hides the result of the first function from the second.
> Slava has said this is possible to implement efficiently using smart
> combinators.
> Despite some strong reservations about having semantics depend on
> types or stack effects, I think that smart combinators should probably
> be used more heavily. Producing better smart combinators is probably
> the most interesting "practical" research topic that could be worked
> on in the context of stack-based languages at the moment. I hope
> someone here is interested in pursuing it.
With a first-order language like yours, smart combinators make a lot of
sense. With Factor's combinators it's a little harder to justify
(although, again, it's possible).
My preference is to rethink the combinator to not _need_ to be so smart.
The first "smartness" these combinators have is that they share the
stack between all the executions (some of your variants copy the stack,
which I absolutely hate).
The first thing I'd want is to _completely_ isolate the functions from
each other by running each infra their own fresh stack. The problem then
becomes how to get the data into each combinator, and how to pull it
out.
My first knee-jerk solution is to have the programmer provide two
_additional_ quotations, one that runs before each quotation on the main
data stack to return a tuple that'll be used as the infra stack and the
other one to run after after each quotation on the result stack of each
quotation and return a tuple that'll be expanded onto the stack after
the entire execution.
: split-dataflow-parallel ( before-each {quotations} after-each -- ?? )
I've named it "-parallel" simply because the quotations themselves have
no effect on each other. A "-sequential" version would work like
Factor's current combinators in allowing each quotation to access
previous work (but would give the same problem in Nowak's case).
Some specialized variants are imaginable.
* obviously a version that required each quotation to take and produce
a single array wouldn't require any before-after quotations (at the cost
of the array production and consumption code in each quotation, and at
the cost of "manually" adding the desired arguments to an array before).
* Using type inference, we can obviously make a smart variant. I agree
with you that smart combinators are a little tricky, but perhaps if we
understand the "dumb" underlying combinator well enough this one'll
work.
> - John
-Wm
John Nowak — 2009-03-12 15:54:46
On Mar 12, 2009, at 11:05 AM, William Tanksley wrote:
> My preference is to rethink the combinator to not _need_ to be so
> smart.
> The first "smartness" these combinators have is that they share the
> stack between all the executions (some of your variants copy the
> stack,
> which I absolutely hate).
They wouldn't copy it in a real implementation. I only explain it that
way because you can understand it without getting into types.
> The first thing I'd want is to _completely_ isolate the functions from
> each other by running each infra their own fresh stack. The problem
> then
> becomes how to get the data into each combinator, and how to pull it
> out.
>
> My first knee-jerk solution is to have the programmer provide two
> _additional_ quotations, one that runs before each quotation on the
> main
> data stack to return a tuple that'll be used as the infra stack and
> the
> other one to run after after each quotation on the result stack of
> each
> quotation and return a tuple that'll be expanded onto the stack after
> the entire execution.
The problem is that this is just a pain in the ass to use.
Ideally, we'd have a 'spread' that just "does the right thing". One
way to do this is to allow each quotation access to *only* one value.
The results of the quotations would then be concatenated to the main
stack. More precisely, it would work as such:
S {T} cat == S T
S `x `y [F] [G] jn-spread == S {`x F} cat {`y G} cat
This is simple and doesn't require you to explain things in terms of
types. The downside is that F and G can't access the rest of the
stack. The only way to make this happen is to use "smart" combinators
that only make sense if you know the types. Such combinators would
need to inspect the types of the functions they were given in order to
work properly. I can give a description of how this might work if
anyone is interested, but it's downright tedious to explain so I won't
do it here.
Instead of the type-introspective approach, an easy workaround is to
provide a 'spread-with' word that takes another argument which is a
stack to use in both quotations:
S x y {T} [F] [G] jn-spread-with == S {x T F} cat {y T G} cat
It's true that neither of these suggestions would make writing a
Factor-style map that uses the entire stack as an accumulator easier
to write. The type-introspective version would, but given that I don't
even want to explain it, I can't imagine I'd endorse using it.
I should also note that none of these suggestions are currently
typeable because they would require row variable concatenation.
- John
William Tanksley — 2009-03-12 17:09:05
John Nowak wrote:
> William Tanksley wrote:
> > My preference is to rethink the combinator to not _need_ to be so
> > smart.
> > The first "smartness" these combinators have is that they share the
> > stack between all the executions (some of your variants copy the
> > stack,
> > which I absolutely hate).
> They wouldn't copy it in a real implementation. I only explain it that
> way because you can understand it without getting into types.
I know -- "copy the stack" is shorthand for "pass the same starting
stack to every function, then merge the results together." Only mutable
stacks need to be actually copied.
> > The first thing I'd want is to _completely_ isolate the functions from
> > each other by running each infra their own fresh stack. The problem
> > then
> > becomes how to get the data into each combinator, and how to pull it
> > out.
I still think this is the essential part of any solution.
> > My first knee-jerk solution is to have the programmer provide two
> > _additional_ quotations, one that runs before each quotation on the
> > main
> > data stack to return a tuple that'll be used as the infra stack and
> > the
> > other one to run after after each quotation on the result stack of
> > each
> > quotation and return a tuple that'll be expanded onto the stack after
> > the entire execution.
> The problem is that this is just a pain in the ass to use.
I should have emphasised this -- my "knee-jerk solution" is AKA a
"programmer wants to code a solution before understanding the problem".
Yes, it's a PITA to use, which is characteristic of attempts to
construct a generalized solution to a non-problem.
However, in order to understand the problem, it helps to understand it
in general. (Sadly, my quickly constructed idea isn't _fully_ general.)
> Ideally, we'd have a 'spread' that just "does the right thing". One
> way to do this is to allow each quotation access to *only* one value.
This doesn't "do the right thing" in general; Factor's library uses the
two-argument versions of cleave and spread fairly frequently.
> This is simple and doesn't require you to explain things in terms of
> types. The downside is that F and G can't access the rest of the
> stack. The only way to make this happen is to use "smart" combinators
> that only make sense if you know the types.
Not completely true -- my solution can also allow access to "the rest of
the stack" without a smart combinator, so long as all the parallel
quotations access the stack the same way. The 'before-each' quotation
provides that access.
If the quotations don't all access the stack the same way, neither my
solution nor a smart combinator will help.
> Such combinators would
> need to inspect the types of the functions they were given in order to
> work properly. I can give a description of how this might work if
> anyone is interested, but it's downright tedious to explain so I won't
> do it here.
Nope, I already know how they work -- essentially, they'd infer the
stack effects and generate a before-each and after-each function :-).
> Instead of the type-introspective approach, an easy workaround is to
> provide a 'spread-with' word that takes another argument which is a
> stack to use in both quotations:
> S x y {T} [F] [G] jn-spread-with == S {x T F} cat {y T G} cat
That seems (in general) harder than writing before-each and after-each,
actually.
> I should also note that none of these suggestions are currently
> typeable because they would require row variable concatenation.
Except mine -- we talked about that earlier.
> - John
-Wm
John Nowak — 2009-03-13 01:07:27
On Mar 12, 2009, at 1:09 PM, William Tanksley wrote:
> John Nowak wrote:
>
>> William Tanksley wrote:
>>
>> They wouldn't copy it in a real implementation. I only explain it
>> that
>> way because you can understand it without getting into types.
>
> I know -- "copy the stack" is shorthand for "pass the same starting
> stack to every function, then merge the results together." Only
> mutable
> stacks need to be actually copied.
Not sure what you mean. I've not proposed anything that would require
any copying, any sub-stacks, or any persistent stacks. They'd all work
in Factor with smart combinators. I wouldn't accept anything else on
efficiency grounds.
>> Ideally, we'd have a 'spread' that just "does the right thing". One
>> way to do this is to allow each quotation access to *only* one value.
>
> This doesn't "do the right thing" in general; Factor's library uses
> the
> two-argument versions of cleave and spread fairly frequently... If the
> quotations don't all access the stack the same way, neither my
> solution
> nor a smart combinator will help.
A smart combinator could still work just fine, *and* it could always
do the right thing without having multiple versions. Really, it can do
most anything because it knows the effects of the functions involved.
For example, you could make these work:
"genius" spread:
3 6 7 {|sq, +|} == 9 13
3 6 7 {|+, sq|} == 9 49
"genius" cleave:
3 6 7 [|sq, +|] == 3 49 13
3 6 7 [|+, sq|] == 3 13 49
Such "genius" combinators would always do the "right thing" and would
not require versions for different arities. The only downside is that
I think it's too hard to read as you need to know the effects involved
for it to make any sense. Tool support would makes this easier (e.g. I
could select a region of text and get the stack effect for the
function it represents), but I think it's probably still intolerable.
Once you permit type introspection, you can go as far as you'd like;
it's just a matter of it getting too difficult for the programmer to
keep up. You also throw a lot of the simple reasoning benefits of
concatenative languages out the window.
- John