Re: [stack] Stack analysis (was Another implementation...)

Louis Madon — 2000-08-12 09:43:21

Martin Young wrote:
>
> On Aug 10, 7:40am, madonl@... wrote:
> > Martin Young wrote:
> > > The "analysis if P to determine arity" is provably non-computable, btw.
> >
> > That actually depends on how you define arity. I didn't want to
> > complicate my original explanation too much, but all that the compiler
> > really needs to know is the "shape" of the arity, not neccesarily the
> > concrete number. For example,
> >
> > [pop] [[pop] times] ifte
> >
> > has "shape" pop (pop | pop*)
> >
> > | means alternation
> > * means zero or more
> >
> > * indicates the presence of a "region" - a part of the stack that can
> > grow to an arbitrary size, but for the purpose of something like fork,
> > you treat the whole region as a single "value".
> >
> > This is just control flow analysis and it is definetly computable.
>
> I'm still a little confused. Given a programs P1 and P2:
>
> P1 == [ [ pop ] times ] ] (* pop N values off the stack *)
> P2 == [ dup [ dup ] times ] (* push N, N+1 times *)
>
> I /think/ the arity shape of these is:
>
> P1 : pop*
> P2 : push push*

Not quite, they would be:

P1: pop pop*
P2: pop push push*

If P1 were just "pop*" that would mean that under some circumstances it
could execute without popping anything of the stack. However, it always
first pops the value N of the stack, hence the shape "pop pop*".

I made a similar mistake in my earlier example, [pop] [[pop] times]
ifte, should have shape pop (pop | pop pop*). Each regexp defines a set
of strings (possibly infinite) made up of the symbols pop and push. Each
such string defines the stack effect of a particular program run. The
program could never run in a way that could not be identified with one
of those strings.

For P2, the program must consume one value, though it happens to push it
back. Nonetheless, its neccesary to retain that information so as to
record the dependance on the input value (ie. record a _path_ rather
than an end result).


> If I run these concurrently on the stack [ 7 8 9 2 ] (where 2 is the most
> recently pushed item), I will see these two stacks (S1 and S2 from P1 and P2
> respectively).
>
> S1: [ 7 ]
> S2: [ 7 8 9 2 2 2 ]

Ok.


> What would the final combined stack look like? All we know is that P1 has
> removed 0 or more items and that P2 has added zero or more. How can you tell
> through static analysis how many in each case? For simple examples you could
> decide at run-time by stack marking, I suppose, but this might be inefficient.

Static analysis can't always tell you "how many" - if it could there
would obviously be no need to use regexps or any other kind of symbolic
representation. Nonetheless it is useful. The compiler could analyse
your example like so:

(notes:
--- indicates unknown part of stack
. indicates top-of-stack (TOS)
left of TOS are active values
right of TOS are popped values with most recently popped to the
right)


first P1:

[--- . ] completely unknown stack initially
[--- . a ] pop (first element popped)
[--- . a b*] pop* (some more elements popped)


now P2:

[--- . ]
[--- . c ] pop (some element popped)
[--- d . c ] push (some element pushed)
[--- d e* . c ] push* (push some more items)


Putting the two together (with the TOS markers lined up):

[--- . a b*] P1
[--- d e* . c ] P2


Clearly, P1 pushes nothing - so assuming its free of side effects - the
compiler can just throw it out and execute P2 via nullary.


If you don't want to preserve the input stack but rather just keep the
"untouched" part of the stack (as you say below, this might be useful)
then its useful to note that:

c < a b*

In other words, every possible matching of pattern 'c' is also a
matching of the pattern 'a b*' as long as you assume a == c. Thats ok
here since the patterns define removal rather than addition of items and
a and c are singleton items.

So for this example the compiler only needs to copy c and then let P1
trim the input stack. Afterwards push c back on and run P2.
ie. [P1] [P2] fork => dup [P1] dip P2


> Consider the programs P3 and P4:
>
> P3 == [ [ [ [ [ pop ] dip ] dip ] dip ] ]
> p4 == 4
>
> These have arity shapes
>
> P4: push

Yes.


> The shape of P3 I find harder to form. What about:
>
> pop pop pop pop push push push
>
> Which describes which item was removed. One might simplify it to:
>
> pop
>
> Which states only that the stack has one less element. This is
> probably not useful.

I agree.


> Running P3 and P4 on [ 1 2 3 4 5 6 7 ] we see:
>
> S3: [ 1 2 4 5 6 7 ]
> S4: [ 1 2 3 4 5 6 7 4 ]
>
> What does the resultant stack look like?
>
> [ 1 2 4 5 6 7 4 ] Maybe? This is [1 2] which were untouched,
> followed by the
> the parts of P3 and P4's stacks above [1 2].

Actually its [1 2 3] thats untouched. Anyway, assuming thats the
desired semantics, the analysis is:


first P3:

[-- . ]
[-- . a b c d ] pop pop pop pop
[-- g f e . a b c d] push push push


now P4:

[-- . ]
[-- h . ] push


Here P4 doesn't pop anything. So P3 alone determines what gets left
untouched in the input stack (ie. nothing < a b c d). Therefore [P3]
[P4] fork => P3 P4.

Ofcourse, there was no non-determinism in this example, so it probably
wasn't all that enlightening.


> How does one achive it automatically?
>
> I could describe the above algorithm like this: find the highest item
> which neither program touched, remove everything about it. Added the
> segment of P1's result stack about the untouched item, then do the same for
> P2. This might be useful, but (a) it can only be done at runtime, and (b)
> it doesn't require you to know anything about the shape of P1 and P2's
> arities.

Ofcourse you could do it all at runtime but you can improve efficiency
with some static analysis. When you said the problem is "undecidable"
you implied that everything _has_ to be done at runtime - I'm saying
that its not so. While you can't neccesarily eliminate the runtime cost
completely, you can improve it, perhaps significantly. In the above
examples, the overhead was eliminated completely.


> I assume that you have an algorithm other than this. What is it?

Not specifically for fork no. But here's how it might be approached:

let Ij, Oj denote the input and output shapes of Pj respectively

case 1: I1 is nothing.
[P1] [P2] fork => P2 <save O2> P1 <restore O2>

case 2: I2 is nothing.
[P1] [P2] fork => P1 P2

case 3: I1 < I2.
[P1] [P2] fork
=> <save I1> P2 <save O2> <restore I1> P1 <restore O2>

case 4: I2 < I1.
[P1] [P2] fork => <save I2> P1 <restore I2> P2

otherwise: do it all at runtime (eg. using your algorithm)

Here x < y denotes language inclusion (ie. everything matchable by x is
also matchable by y but not neccesarily vice versa).
The save/restore ops are implemented as simple copies if the shape does
not contain any asterated terms. Otherwise, do a rewrite of the
associated program so that as it executes it also copies.


>
> > No, the regexp scheme above is quite precise. "num-outputs" can include
> > some hierarchy of regions.
>
> Hrm, but how do you match items on the stack to parts of the regexp?
> The expression pop* push* pop* can be satified numerous ways by the
> stack
> [ 1 2 3 4 ].

You seem to be thinking in terms of runtime rather than compile time.
The regexps don't even exist at runtime. They are a tool of analysis.
They could be used to infer the shapes of values/regions at compile time
and reason about some relationships between such shapes. Many cases will
be decidable this way even if this is not so generally. If undecidable,
they may be able to help to determine a more efficient runtime strategy.
Regexps are by no means the only way to do an analysis, just an approach
I'm finding useful in my compiler.


> > Not at all, again you need to think in terms of "shape" and you'll
> see that the stack can always be delineated into reachable and
> unreachable parts. Moreover, most words have a very small dependance.
>
> But surely a shape of pop* push* can reach any part of the stack?
> Any word which performs a number of stack operations determined by a number on
> the stack will have a shape of this form, and it's actualy effect on the stack
> is undecidable.

Ok, I went a bit too far in that statement. Nonetheless, my basic point
is that undecidability of the _exact_ stack effect does not mean you
can't optimise.
Lets say we need to preserve the popped elements in pop* push*. Just
rewrite the loop that does pop* to copy each value somewhere. This is in
effect a runtime solution and represents the worst case scenario. Now
lets say, we want to compare "pop* push*" with "pop pop pop" for the
purposes of optimising fork. Here we only need to preserve at most the
top three values, which is more efficient. We didn't need to know how
large pop* was.

Its also possible to make the analysis more effective by letting the
pop/push symbols in the regexp specify types. For example, instead of
pop* push* we might have pop-string* push-int*. Then, we might actually
be able to place an upper-bound on the number of elements popped by
looking at how many strings (including string-regions, eg s* where s is
a string type) are on the top of the symbolic stack.

Generally, the more detailed the analysis, the more you can discover at
compile time.


--
Louis.