type checking
wtanksley@bigfoot.com — 2000-05-12 23:28:50
From:
iepos@... [mailto:
iepos@...]
>By the way, there's one thing I'd like to bring up. While fiddling
>with my interpreter, I've been getting a lot of mysterious
>Segmentation Faults because of accidentally running things like
>"call", "dip", and "if" with integers on the stack instead of programs.
Right...
>Thus, I've been thinking about adding some type-checking. But,
>I've run into a brick wall. I realized that if we have branching,
>it is sometimes impossible to tell at compile-time what the types
>of things on the stack will be.
Not entirely correct. It's possible to require nearly perfect compile-time
type knowledge, although it requires the programmer to take a little more
care in designing his programs.
>I'm just curious to know what you suggest to remedy this. Here are
>some possible policies the system could have, if it tries to
>find out the type of a thing on the stack but can't, yet would
>like to know because it is getting ready to execute, say, a "+" and
>wants to be sure it's two ints on the stack:
> 1) Emit an error message that the programmer is too sneaky.
> 2) Install a runtime check to make sure the types are right
> (this would involve carrying around some type tags).
> 3) Don't do any checks, and just hope we're okay.
>Right now, I'm leaning toward #1. However, I'm pretty this means
>that there will be some cases that the programmer will be absolutely
>sure is program is safe, yet the system rejects it because it is
>not smart enough. My guess is that this could happen regardless
>of how smart the system is.
I'm afraid you're right. Not only is it possible that the system can't ever
be smart enough, but it's also possible that no human could ever be smart
enough. There's no way of knowing.
So we just emit a safe error message. See ML for examples.
We should also make provision for dynamic typing; there's a lot of times
when it's needed. Fortunately, this is very easily handled by, for each
type named "foo", having the system add a type named "foo-dynamic", and
adding a couple for words to do things like add type information to a static
type "dynamic ( foo -- foo-dynamic )", return a type signature "typeof (
foo-dynamic -- type-signature-foo )".
Have I explained the compile-type-stack method of compile time type checking
and polymorphism? It's very nice, and fits in elegantly with Forth.
>- "iepos" (Brent Kerby)
-Billy
srenner@mail.ru — 2000-05-13 01:51:23
Dynamic type checking is not free, but I think that any reasonable Joy
implementation has to include it. You don't want the programmer to
"have to take reasonable care" to avoid a "mysterious segmentation
fault." Forth has often been associated with very low-level (close to
the hardware) programming. I can see the point of a Joy-like language
written in assembler and running on an embedded system. Type checking
might be out of place. At any higher level, it is needed.
Oberon has dynamic type checking which only costs if it is used. Say
that our Joy-like VM is to execute a word (literal, box, whatever). If
the word is "+" or "dup" it must be performed. If the box is "3" it
must be pushed onto the stack. Of course to Joy this is not two kinds
of thing. Both are functions of type stack-->stack. But the VM
certainly has to make a distinction. There are a limited number of
"ops" like +, dup, and i, perhaps including some user-defined ones.
But there are millions of integers. Each op can be associated with a
function in Oberon, but there would be no point in constructing a
function for each integer encountered that pushed the integer onto the
stack. The VM looks like this:
PROCEDURE VM(VAR word, stack: Box); (* the stack is just another box
== chain of boxes*)
VAR top: Box;
BEGIN
REPEAT
IF word IS opBox THEN (* IS: Type test *)
WITH word:opBox DO word.op END; (* WITH: Type guard *)
ELSE
top := word.Clone; (*clone (copy) of word *)
top.next := stack;
stack := top; (*copy of word is on top of the stack*)
END;
word := word.next;
UNTIL word = NIL (*No more boxes to execute *)
END;
Ops get executed, everything else gets pushed. Remember the
multiple-VMs-as-dataflow-threads idea? That model would work best with
a VM object instead of a VM procedure (object == POINTER TO RECORD, so
that the stack can be an instance variable. That way, multiple VMs are
instances with different stacks.) But that complication aside, the
above procedure is actually THE WHOLE THING. To get Joy, one would
write all the ops, including code to handle different types, a scanner
for input files, and other "frills". But the above procedure really is
a complete virtual machine. I think.
About GC: Yes, it is true that this machine shouldn't need an external
garbage collector. The only thing that gets "lost" each time around
the loop is the word that is executed. The reason that a clone is
pushed instead of the original copy is consistency: either branch
gives us one box of garbage, which we could "collect" at that point.
(Unless the box contains an object that is referenced in the enclosing
context, in which case GC would still be needed.) Baker's linear logic
papers show that if you always clone instead of copying pointers and
always throw things away you don't need GC.
Remember that there is no cost associated with GC being present in a
system if the collector is never called. So it isn't a disadvantage
that Oberon provides GC. Like type tests and type guards, GC is paid
for when it is used.
Oberon is a little tricky about PROCEDURE type instances. I am having
difficulty assigning them to variables. Specifically, the compiler
doesn't like this when the procedures are bound to a record type. ( ==
class methods) It's probably an issue with the specific compiler, not
the language specification.
Tornadoes never come near the lake, maybe because there are no
trailers here.
sr
iepos@tunes.org — 2000-05-13 13:26:16
> >Thus, I've been thinking about adding some type-checking. But,
> >I've run into a brick wall. I realized that if we have branching,
> >it is sometimes impossible to tell at compile-time what the types
> >of things on the stack will be.
>
> Not entirely correct. It's possible to require nearly perfect compile-time
> type knowledge, although it requires the programmer to take a little more
> care in designing his programs.
Yes. That was the kind of thing I was thinking of doing... Okay,
here's one rule the system might enforce: the two branches of an
"if" must always have the same type, taking the same type inputs and
giving the same type outputs. The important thing is that after
the "if" is finished, the compiler must know what types are then on
the stack, regardless of which branch was chosen.
Okay, here's another one: when one reaches the end of a loop,
returning back to the beginning, the same type things must be on
the stack as when the loop was originally entered.
I'm not sure if these two rules are enough ...
> I'm afraid you're right. Not only is it possible that the system can't ever
> be smart enough, but it's also possible that no human could ever be smart
> enough. There's no way of knowing.
Right. Here's an example of a program that would be impossible to decide:
foo [0] [[]] if call
Suppose that "foo" is a program that pushes an unknown boolean onto
the stack. If it cannot be determined what it pushes (for instance,
because it relies on user input), then the system has no way of determining
which of the "0" or the "[]" will end up getting on the stack. If it's
a "0", then we're toast, because we're going to try to "call" it.
> So we just emit a safe error message. See ML for examples.
Yeah... ML has a pretty nice type system, I suppose. I've fiddled around
with Objective CAML, but haven't done much with it.
I guess the type rules ML enforces are similar to the ones C enforces ...
Both branches in "if" must have the same types; this is guaranteed by
the fact that "if"s in those languages never have return values, and
only take parameters from the environment. Similarly, loops do not
have return values, and only take parameters from the environment.
> We should also make provision for dynamic typing; there's a lot of times
> when it's needed. Fortunately, this is very easily handled by, for each
> type named "foo", having the system add a type named "foo-dynamic", and
> adding a couple for words to do things like add type information to a static
> type "dynamic ( foo -- foo-dynamic )", return a type signature "typeof (
> foo-dynamic -- type-signature-foo )".
Yes. In particular, this could be useful when writing an interactive
interpreter. However, it seems like you'd want just a single "dynamic"
type, rather than a bunch of them; that way, you could mix up your
new dynamic things through "if"s without the system complaining.
> Have I explained the compile-type-stack method of compile time type checking
> and polymorphism? It's very nice, and fits in elegantly with Forth.
Maybe you have ... I'm not sure ...
> >- "iepos" (Brent Kerby)
>
> -Billy
One other thing relevant to type-checking ... I think it would be useful
if the user could assert what types of things are on the stack.
Maybe something like FORTH stack-comments:
( inputtypes -- outputtypes )
Except that the system would enforce them. But, maybe we'd want to avoid
the same syntax as ordinary comments (with parentheses), since
stack-comments would no longer really be just comments.
Ahh... one other thing... We are going to need more than just a few
primitive types, such as "int", "float", and "boolean". We'll
need whole schemes of types for programs-pushing and program-popping
programs. We might use Joy's convention and say that if
int -- float
is the type of a program that takes a int and yields a float, then
-- [int -- float]
is a program that pushes such a program. We might allow the "--" to
be omitted when there are no inputs, so that we'd say "int" instead
of "-- int". Or, we could use the approach brought up earlier
of not using "--", but instead using "-" (and maybe "+"), using
something like
-int -int +int
for the type of "+", "-", "*", "/", etc.
Oh, and one other thing... what about out-of-bounds exceptions for
arrays? It would be cool if the type system could be made strong
enough so that it could be guaranteed at compile-time that array
fetches and stores were always in-bounds. But, that sounds tricky;
I don't think I'd attempt it yet. But, it's an interesting idea
to keep in mind ...
- "iepos" (Brent Kerby)
wtanksley@bigfoot.com — 2000-05-14 00:58:57
From:
iepos@... [mailto:
iepos@...]
>> >Thus, I've been thinking about adding some type-checking. But,
>> >I've run into a brick wall. I realized that if we have branching,
>> >it is sometimes impossible to tell at compile-time what the types
>> >of things on the stack will be.
>> Not entirely correct. It's possible to require nearly
>> perfect compile-time
>> type knowledge, although it requires the programmer to take
>> a little more care in designing his programs.
>Yes. That was the kind of thing I was thinking of doing... Okay,
>here's one rule the system might enforce: the two branches of an
>"if" must always have the same type, taking the same type inputs and
>giving the same type outputs. The important thing is that after
>the "if" is finished, the compiler must know what types are then on
>the stack, regardless of which branch was chosen.
That sounds reasonable.
>Okay, here's another one: when one reaches the end of a loop,
>returning back to the beginning, the same type things must be on
>the stack as when the loop was originally entered.
Close enough. Each loop or conditional word can run its own typechecking,
so that the proper constraints can be achieved. For example, some loops
have to be able to loop zero times as well as n times.
>I'm not sure if these two rules are enough ...
A more powerful version of those two typechecks would involves backtracking
until convergence. In other words, you can have an unbalanced 'if' so long
as a later conditional sets things right. I personally don't think that's a
good idea; it encourages long definitions, which is not something we want to
encourage.
>I guess the type rules ML enforces are similar to the ones C
>enforces ...
C doesn't really enforce any real type rules, does it?
>Both branches in "if" must have the same types; this is guaranteed by
>the fact that "if"s in those languages never have return values, and
>only take parameters from the environment. Similarly, loops do not
>have return values, and only take parameters from the environment.
'if's in ML do have return values. Perhaps you're thinking of the wrong
language, because I can't think of a worse comparison than one between ML
and C. ML is strongly typed; C is hardly typed at all (almost everything's
an integer). ML uses an immensely powerful backtracking engine which allows
type inference; C requires everything to be declared, and even with that it
gets things wrong.
>> We should also make provision for dynamic typing; there's a
>> lot of times
>> when it's needed. Fortunately, this is very easily handled
>> by, for each
>> type named "foo", having the system add a type named
>> "foo-dynamic", and
>> adding a couple for words to do things like add type
>> information to a static
>> type "dynamic ( foo -- foo-dynamic )", return a type
>> signature "typeof (
>> foo-dynamic -- type-signature-foo )".
>Yes. In particular, this could be useful when writing an interactive
>interpreter. However, it seems like you'd want just a single "dynamic"
>type, rather than a bunch of them; that way, you could mix up your
>new dynamic things through "if"s without the system complaining.
I forgot to mention that there's a type hierarchy. For example, both
pointers and integers are "cells", and there's a version of SWAP which will
therefore handle them. double integers are double cells, and there's a
version of SWAP to handle double cells, and another version which takes as
input a double and a single, and another one whose input is the other way
around.
An "integer-dynamic" is almost certainly a double-cell type, whereas an
"object-dynamic" is probably a single-cell type (its type info is probably
stored in the object itself). Both are subtypes of "dynamic", just as they
are also subtypes of "integer" or "object".
>> Have I explained the compile-type-stack method of compile
>> time type checking
>> and polymorphism? It's very nice, and fits in elegantly with Forth.
>Maybe you have ... I'm not sure ...
I'm lacking enthusiasm right now (very strange of me, I know), so I'm going
to point you to Dejanews and then go back to programming Omega.
http://x45.deja.com/%5bST_rn=ps]/getdoc.xp?AN=601942654&search=thread&CONTEXT=
958265382.201785353&HIT_CONTEXT=958265382.201785353&HIT_NUM=2&hitnum=12
I really, really wish I knew how to shorten that link -- I know it can be
done, but after about 15 minutes of working with it I'm at a loss. Please
don't let its length stop you from checking it out; it's a facinating, low
overhead solution for typechecking which seems to be a perfect fit for
concatenative languages.
>- "iepos" (Brent Kerby)
-Billy
iepos@tunes.org — 2000-05-14 01:47:43
> >I guess the type rules ML enforces are similar to the ones C
> >enforces ...
>
> C doesn't really enforce any real type rules, does it?
Hmm... well my "gcc" complains if I try to call an "int" or add two
functions, or pass wrong-type parameters to functions. Of course,
the rules aren't really "enforced", since you can always get around
them with a cast; I guess that's what you meant.
> >Both branches in "if" must have the same types; this is guaranteed by
> >the fact that "if"s in those languages never have return values, and
> >only take parameters from the environment. Similarly, loops do not
> >have return values, and only take parameters from the environment.
>
> 'if's in ML do have return values. Perhaps you're thinking of the wrong
> language, because I can't think of a worse comparison than one between ML
> and C.
Okay, you're right. I'd better stop talking about ML :-)
> I forgot to mention that there's a type hierarchy. For example, both
> pointers and integers are "cells", and there's a version of SWAP which will
> therefore handle them. double integers are double cells, and there's a
> version of SWAP to handle double cells, and another version which takes as
> input a double and a single, and another one whose input is the other way
> around.
Okay, that makes sense.
> >> Have I explained the compile-type-stack method of compile
> >> time type checking
> >> and polymorphism? It's very nice, and fits in elegantly with Forth.
>
> >Maybe you have ... I'm not sure ...
>
> I'm lacking enthusiasm right now (very strange of me, I know), so I'm going
> to point you to Dejanews and then go back to programming Omega.
>
> http://x45.deja.com/%5bST_rn=ps]/getdoc.xp?AN=601942654&search=thread&CONTEXT=
> 958265382.201785353&HIT_CONTEXT=958265382.201785353&HIT_NUM=2&hitnum=12
Ahh... this is the discussion on comp.lang.forth ...
> >- "iepos" (Brent Kerby)
>
> -Billy
- "iepos (Brent Kerby)