Re: [stack] Re: Another implemenatation of Joy.

Martin Young — 2000-08-09 11:49:24

On Aug 8, 10:32pm, ghoul6@... wrote:
> Subject: [stack] Re: Another implemenatation of Joy.
>
> Louis Madon <madonl@...> wrote:
> > f [P] [Q] fork
> >
> > fork expects three values on the stack, a flag and two quotations. The
> > flag can have one of three values:
> >
> > Left
> > Right
> > Both
> >
> > If Left, the left quotation P is executed. If Right, the right
> > quotation Q is executed. If Both, then both P and Q are executed
> > simultaenously. In this case, execution resumes after the fork only
> > when P and Q have both completed. Stack results will be presented as if
> > P and then Q had been executed sequentially on the same input stack.

This is *really* difficult. You're saying that the stack after P and Q have
executed concurrently must be identical to the stack as it would appear if P
and Q had executed sequentially. This is difficult because P and Q will
actually execute entirely asynchronously but, logically at least, on the same
data.

If there are actually executing on the *same* stack then the resultant stack
will depend on the temporal nature of their asynchronisity whilst if they
execute on separate stacks the system is required to "merge" them which, again,
requires knowledge of their exact temporal interactions. Automatically
producing a result which is a combination of several threads' work is really
tricky.

There's an algebra of concurrently called CSP, largely the work of C.A.R.
Hoare. It attempts to describe how concurrent processes can interact in a well
specified manner.

On Aug 8, 10:32pm, ghoul6@... wrote:
> It seems to me the simplest mechanism of concurrency is futures. Say you
> could write:
[snip convincing description of futures]

I'm not sure this is the simplest. Would it not be simply to have a word such
as split (or fork) such that:

P1 P2 fork -> L1 L2

Which P1 and P2 are concurrently executed on separate copies of the stack.
When they have both finished their final stacks are left (that's L1 and L2).

Both of these mechanism would make it difficult to implement concurrent
producer-consumer which is a mainstay of concurrent design. Both futures and
"fork" require a thread to finish executing thus leaving a result.

You *could* extend futures so that the future will eventually be a list, and
certain operations (most notably uncons) are allowed on a future list without
forcing the whole list to be computed. This is the rudiment of lazy
evaluation, but explicitly expressed in the langauge.

[snip description of lazy evaluation]

Adding laziness to a Joy-like system would be interesting. I'm concerned that
it might be doomed from the outset because for laziness to be useful requires
the language to be pure (i.e. no side effects, and no dependence on order of
execution). Having the whole stack passed from word to word makes the order of
execution pretty rigid which limits how much laziness can be employed. I
think.


--
Martin Young working for STMicroelectronics at `(o)_(o)' The fat wise /
1000 Aztec West, Almondsbury, Bristol, BS32 4SQ. ( V ) owl eats only >
+44 1454 462 523 `v' Martin.Young@... `.___,' clean mice. /
_(_)_ -="==="=============='

Louis Madon — 2000-08-09 12:35:21

Mark van Gulik wrote:

> It seems to me the simplest mechanism of concurrency is futures. Say you
> could write:
>
> [100 factorial] future [50 factorial] future +
>
> This mechanism would push two futures on the stack, and arrange for
> processes to be created to start computing their results. The futures turn
> into their results when the computation completes (indirection objects work
> well for this).
>
> An alternative variant of futures would make conversion from a future to its
> value more explicit:
>
> [100 factorial] future [50 factorial] future force swap force +
>
> Again, as soon as the future is constructed a process starts computing its
> result. The force operation waits for the computation to complete (if it
> hasn't already) and then extracts the result.

Yes, nice. I guess that an unforced future can act as a handle to the
thread and this may open up the possibility of inter-thread
communications (among other things). On the other hand ...


> Consider a function "race" that takes a list of futures and returns the
> future that completed first (or maybe its result). A general rendezvous
> mechanism can probably be constructed from this.
>
> One can also imagine such operations as finished (to test for completion)
> and lazy-future (the work process is only created when forced).

It seems that all these functions {future, lazy-future, forced,
finished, race} would have to be primitives. Moreover it is not clear
even with this set how additional capabilites like access to shared
resources could be provided without adding even more primitives.
I wonder what the "minimal complete" basis for concurrency in Joy might
be. By "complete" I mean that you could in principle specify any
parallel algorithm.


>
> > Another approach I've heard of for reducing copying is to maintain a
> > history of changes, ie. a list of deltas. The idea is that whenever a
> > change is made that must not be visible to other references, you simple
> > insert a delta (eg. (delete 2) -> [3] -> [2] -> [1] -> []) and then take
> > those into account when accessing the structure. I think it can be done
> > quite efficiently for lists, but I can't remember where I read it.
>
> You may want to look at a really incredible book titled "Purely Functional
> Data Structures" by Chris Okasaki (Cambridge, ISBN 0 521 63124 6). He
> presents an interesting collection of data structures. At first the data
> structures have amortized bounds on their operations. He then refines this
> for precise worst-case costs on the operations.

Yes its a great book, though its been a while since I lasted looked at
it.

>
> The main "trick" to narrowing worst-case bounds is to use lazy evaluation to
> do things like gradually producing the reverse of a list or some other
> rebalancing activity. As "pseudo-destructive" changes are made to the main
> data structure (e.g., cons), a small increment of work is performed, forcing
> one or more suspended lazy computations to be converted to non-lazy objects.
> This is vaguely similar to some schemes for incremental garbage collection.
> Note that converting the lazy-evaluation nodes into "real" nodes isn't
> really a side-effect, other than its influence on performance of later
> operations.

Indeed the breadth of possibilities for implementing data structures
(even "basic" types like lists) is enourmous. I think this is more
reason to _not_ have lists as primitives. If a particular context can't
benefit from a particular implementation strategy (like lazy
computation) I'd prefer having the choice to specify a different
strategy.


--
Louis.

Louis Madon — 2000-08-09 13:16:21

Martin Young wrote:
>
> On Aug 8, 10:32pm, ghoul6@... wrote:
> > Subject: [stack] Re: Another implemenatation of Joy.
> >
> > Louis Madon <madonl@...> wrote:
> > > f [P] [Q] fork
> > >
> > > fork expects three values on the stack, a flag and two quotations.
> The
> > > flag can have one of three values:
> > >
> > > Left
> > > Right
> > > Both
> > >
> > > If Left, the left quotation P is executed. If Right, the right
> > > quotation Q is executed. If Both, then both P and Q are executed
> > > simultaenously. In this case, execution resumes after the fork
> only
> > > when P and Q have both completed. Stack results will be presented
> as if
> > > P and then Q had been executed sequentially on the same input
> stack.
>
> This is *really* difficult. You're saying that the stack after P and Q have
> executed concurrently must be identical to the stack as it would
> appear if P and Q had executed sequentially. This is difficult because P and Q
> will actually execute entirely asynchronously but, logically at least, on
> the same data.

I wasn't really thinking about ease of implementation (although
efficient implementation is important to me). From a use perspective, it
seems simple.

>
> If there are actually executing on the *same* stack then the resultant
> stack will depend on the temporal nature of their asynchronisity whilst if
> they execute on separate stacks the system is required to "merge" them
> which, again, requires knowledge of their exact temporal interactions.

The join point after the fork is an implicit synchronisation point, the
idea being that (maybe!) we can do away with explicit synchronisation
primitives. For example,

[[P] [Q] fork] while

Here, you could get a string of invocations of P and Q and each time
around the loop the two processes could exchange data. If this style of
concurrency isn't too unintuitive or limited (a big if), it would not be
neccesary to introduce inter-thread communication or synchronisation at
all.

> Automatically producing a result which is a combination of several threads' work is
> really tricky.

Can you elaborate? I don't see any problem at all. There are _exactly_
two threads and their results are presented in order (P then Q). A
simple implementation would be:

P-num-inputs, P-num-outputs <- arity P (analysis of P to determine
arity)
Q-num-inputs, Q-num-outputs <- arity Q (ditto for Q)

P-stack <- copy P-num-inputs from input stack
Q-stack <- copy Q-num-inputs " "

exec P on P-init-stack
exec Q on Q-init-stack

wait for _both_ P and Q to finish

append P-num-outputs from final P-stack to input stack
append Q-num-outputs " "


Perhaps you're thinking that this can't be done efficiently. However, I
don't see that fork introduces any fundamentally new problems that don't
already come up in writing an optimising compiler. In other words, if
you do sufficient static analysis to do good optimisation you also have
sufficient information to "merge" the threads efficiently (as well as
avoid unnecesary copying from the input stack).

>
> There's an algebra of concurrently called CSP, largely the work of
> C.A.R.
> Hoare. It attempts to describe how concurrent processes can interact
> in a well
> specified manner.

Thanks, I'll have a look. If its algebraic it may fit in well with the
rest of Joy.

>
> [snip description of lazy evaluation]
>
> Adding laziness to a Joy-like system would be interesting. I'm concerned that
> it might be doomed from the outset because for laziness to be useful requires
> the language to be pure (i.e. no side effects, and no dependence on order of
> execution).

Joy currently is pure (well - 'get' and 'put' are exceptions)

> Having the whole stack passed from word to word makes the order of
> execution pretty rigid which limits how much laziness can be
> employed. I think.

Its not rigid, a compiler can easily ignore the explicit order if it
determines that no data dependencies will be violated.


--
Louis.

Martin Young — 2000-08-09 13:54:18

On Aug 9, 11:16pm, madonl@... wrote:
> > Automatically producing a result which is a combination of several threads'
work is
> > really tricky.
>
> Can you elaborate? I don't see any problem at all. There are _exactly_
> two threads and their results are presented in order (P then Q). A
> simple implementation would be:
>
> P-num-inputs, P-num-outputs <- arity P (analysis of P to determine
> arity)
> Q-num-inputs, Q-num-outputs <- arity Q (ditto for Q)
>
> P-stack <- copy P-num-inputs from input stack
> Q-stack <- copy Q-num-inputs " "
>
> exec P on P-init-stack
> exec Q on Q-init-stack
>
> wait for _both_ P and Q to finish
>
> append P-num-outputs from final P-stack to input stack
> append Q-num-outputs " "

Ah. <the light dawns> I read your original statement as saying that the
program [ [P1] [P2] fork ] would be defined as yielding the same result as the
program [ P1 P2 ], which is not the case. I tried to write a sequential
program which yields the same result but I couldn't see how.

The "analysis if P to determine arity" is provably non-computable, btw. It's a
special case of the halting problem. I can write a program which pops a number
of values off the stack and then either finishes (i.e. halts), or loops around
and pops some more. How can you tell, statically, when (and if) it will halt
and thus its arity? You can't, and there's a proof of this.

Here's a reference I've just dug up on the 'net. I assume it's a correct
description of the HP.

http://faculty.juniata.edu/rhodes/intro/theory2.htm

Of course this doesn't stop your scheme working - you just have to copy the
whole stack.

<thinks> It makes detemining how to construct the eventual stack difficult
because you need to define "num outputs" for each process. Normally in Joy the
result is the whole stack but in this case I understand you to mean some upper
subset of the stack, but what subset? We're back to a non-computable problem
:-(

> Perhaps you're thinking that this can't be done efficiently. However, I
> don't see that fork introduces any fundamentally new problems that don't
> already come up in writing an optimising compiler.

I think the compiler gets a relatively easy ride because all the clever stuff
is explicit (other than analysis for arity). The programmer has a harder time
because they have to decide which words can usefully be executed concurrently,
and to collect the results.

> Joy currently is pure (well - 'get' and 'put' are exceptions)

Indeed.

> > Having the whole stack passed from word to word makes the order of
> > execution pretty rigid which limits how much laziness can be
> > employed. I think.
>
> Its not rigid, a compiler can easily ignore the explicit order if it
> determines that no data dependencies will be violated.

...and that's the difficult bit.

The stack is effectively a big piece of global data which is passed to every
word. Because almost every word alters the stack, there is dependence between
most words. The most obviously helpful word here is 'dip' but you need to know
the arity of all the preceeding words to safely execute it out-of-order.

I suspect that stack based languages are inherently strict, rather than lazy,
but this is bare-faced assertion on my part and I'd be pleased to be shown to
be incorrect.


--
Martin Young working for STMicroelectronics at `(o)_(o)' The fat wise /
1000 Aztec West, Almondsbury, Bristol, BS32 4SQ. ( V ) owl eats only >
+44 1454 462 523 `v' Martin.Young@... `.___,' clean mice. /
_(_)_ -="==="=============='

wtanksley@bigfoot.com — 2000-08-09 16:33:18

From: Mark van Gulik [mailto:ghoul6@...]
>Louis Madon <madonl@...> wrote:

>> Speaking of concurrency, I'm interested in this too. One way
>> it might be
>> added to Joy is as a generalisation of ifte (I'll call it 'fork'):

>> f [P] [Q] fork

>> fork expects three values on the stack, a flag and two
>> quotations. The flag can have one of three values:

>> Left
>> Right
>> Both

>> If Left, the left quotation P is executed. If Right, the right
>> quotation Q is executed. If Both, then both P and Q are executed
>> simultaenously.

Why would you want that instead of a more typical fork:

[process] fork

?

>> This is just a start, practical concurrency would also need
>> to provide a means of communication between threads.

True and vital.

>It seems to me the simplest mechanism of concurrency is
>futures.

Hmm... It does provide semaphores and such in a transparent way. I
wouldn't call it the simplest _mechanism_, but it might be the mechanism
which fits Joy best.

-Billy

wtanksley@bigfoot.com — 2000-08-09 16:47:14

From: Martin Young [mailto:martin.young@...]

>Adding laziness to a Joy-like system would be interesting.
>I'm concerned that
>it might be doomed from the outset because for laziness to be
>useful requires
>the language to be pure (i.e. no side effects, and no
>dependence on order of
>execution). Having the whole stack passed from word to word
>makes the order of
>execution pretty rigid which limits how much laziness can be
>employed. I think.

It certainly does limit the amount of laziness. However, it does so in a
very predictable way, and the compiler can analyse and remove the
limitations. Someone here -- my memory fails me -- is working on exactly
this system.

It's also possible to categorise side-effects; for example, the word 'put'
causes a side effect, yet its side effect is precise and measurable. A
compiler which knew about all of the words which could cause side effects
would be able to optimise them.

>Martin Young working for STMicroelectronics at `(o)_(o)'

-Billy

Louis Madon — 2000-08-09 21:40:37

Martin Young wrote:
>
> On Aug 9, 11:16pm, madonl@... wrote:
> > > Automatically producing a result which is a combination of several
> threads'
> work is
> > > really tricky.
> >
> > Can you elaborate? I don't see any problem at all. There are
> _exactly_
> > two threads and their results are presented in order (P then Q). A
> > simple implementation would be:
> >
> > P-num-inputs, P-num-outputs <- arity P (analysis of P to
> determine
> > arity)
> > Q-num-inputs, Q-num-outputs <- arity Q (ditto for Q)
> >
> > P-stack <- copy P-num-inputs from input stack
> > Q-stack <- copy Q-num-inputs " "
> >
> > exec P on P-init-stack
> > exec Q on Q-init-stack
> >
> > wait for _both_ P and Q to finish
> >
> > append P-num-outputs from final P-stack to input stack
> > append Q-num-outputs " "
>
> Ah. <the light dawns> I read your original statement as saying that the
> program [ [P1] [P2] fork ] would be defined as yielding the same result as the
> program [ P1 P2 ], which is not the case. I tried to write a sequential
> program which yields the same result but I couldn't see how.
>
> The "analysis if P to determine arity" is provably non-computable, btw.

That actually depends on how you define arity. I didn't want to
complicate my original explanation too much, but all that the compiler
really needs to know is the "shape" of the arity, not neccesarily the
concrete number. For example,

[pop] [[pop] times] ifte

has "shape" pop (pop | pop*)

| means alternation
* means zero or more

* indicates the presence of a "region" - a part of the stack that can
grow to an arbitrary size, but for the purpose of something like fork,
you treat the whole region as a single "value".

This is just control flow analysis and it is definetly computable.

> It's a special case of the halting problem. I can write a program which pops
> a number of values off the stack and then either finishes (i.e. halts), or
> loops around and pops some more. How can you tell, statically, when (and if) it
> will halt and thus its arity? You can't, and there's a proof of this.

Right I can't, but I don't need to go that far.

>
> Here's a reference I've just dug up on the 'net. I assume it's a
> correct description of the HP.
>
> http://faculty.juniata.edu/rhodes/intro/theory2.htm
>
> Of course this doesn't stop your scheme working - you just have to copy the
> whole stack.
> <thinks> It makes detemining how to construct the eventual stack difficult
> because you need to define "num outputs" for each process. Normally in Joy the
> result is the whole stack but in this case I understand you to mean some upper
> subset of the stack, but what subset? We're back to a non-computable problem
> :-(

No, the regexp scheme above is quite precise. "num-outputs" can include
some hierarchy of regions.

>
> Perhaps you're thinking that this can't be done efficiently.
> However, I don't see that fork introduces any fundamentally new problems that
> don't already come up in writing an optimising compiler.
>
> I think the compiler gets a relatively easy ride because all the
> clever stuff is explicit (other than analysis for arity). The programmer has a
> harder time because they have to decide which words can usefully be executed
> concurrently, and to collect the results.

Right, it could be unintuitive.

>
> > Joy currently is pure (well - 'get' and 'put' are exceptions)
>
> Indeed.
>
> > > Having the whole stack passed from word to word makes the order of
> > > execution pretty rigid which limits how much laziness can be
> > > employed. I think.
> >
> > Its not rigid, a compiler can easily ignore the explicit order if it
> > determines that no data dependencies will be violated.
>
> ...and that's the difficult bit.
>
> The stack is effectively a big piece of global data which is passed to every
> word. Because almost every word alters the stack, there is dependence between
> most words. The most obviously helpful word here is 'dip' but you need to know
> the arity of all the preceeding words to safely execute it
> out-of-order.

Not at all, again you need to think in terms of "shape" and you'll see
that the stack can always be delineated into reachable and unreachable
parts. Moreover, most words have a very small dependance.

>
> I suspect that stack based languages are inherently strict, rather than lazy,
> but this is bare-faced assertion on my part and I'd be pleased to be shown to
> be incorrect.

I think side-effects are a much more serious issue here (especially
foreign functions which the compiler has to treat as black boxes).


--
Louis.