5 concat L's: Factor (draft)

Slava Pestov — 2004-09-13 02:49:26

Hi everybody,

I apologize for the very long time it took to prepare this. This is a
draft, and I intend to change a lot of things before the final version.
Pretty much everything I wanted to describe is in there, except
continuations. Feedback is welcome.

Slava

------8<------

THE CONCATENATIVE LANGUAGE FACTOR

* Introduction

Factor supports various data types; atomic types include numbers of
various kinds, strings of characters, and booleans. Compound data types
include lists consisting of cons cells, vectors, and string buffers.

Factor encourages programming in a functional style where new objects
are returned and input parameters remain unmodified, but does not
enforce this. No manifest type declarations are necessary, and all data
types use exactly one slot each on the stack (unlike, say, FORTH).

The internal representation of a Factor program is a linked list. Linked
lists that are to be executed are referred to as ``quotations.'' The
interpreter iterates the list, executing words, and pushing all other
types of objects on the data stack. A word is a unique data type because
it can be executed. Words come in two varieties: primitive and compound.
Primitive words have an implementation coded in the host language (C or
Java). Compound words are executed by invoking the interpreter
recursively on their definition, which is also a linked list.

* A note about code examples

Factor words are separated out into multiple ``vocabularies''. Each code
example given here is preceeded with a series of declarations, such as
the following:

USE: math
USE: streams

When entering code at the interactive interpreter loop, most
vocabularies are already in the search path, and the USE: declarations
can be omitted. However, in a source file they must all be specified, by
convention at the beginning of the file.

* Control flow

Control flow rests on two basic concepts: recursion, and branching.
Words with compound definitions may refer to themselves, and there is
exactly one primitive for performing conditional execution:

USE: combinators

1 10 < [ "10 is less than 1." print ] [ "whoa!" print ] ifte
==> 10 is less than 1.

Here is an example of a word that uses these two concepts:

: contains? ( element list -- remainder )
#! If the proper list contains the element, push the
#! remainder of the list, starting from the cell whose car
#! is elem. Otherwise push f.
dup [
2dup car = [ nip ] [ cdr contains? ] ifte
] [
2drop f
] ifte ;

An example:

USE: lists

3 [ 1 2 3 4 ] contains?
==> [ 3 4 ]

5 [ 1 2 3 4 ] contains?
==> f

It recurses down the list, until it reaches the end, in which case the
outer ifte's 'false' branch is executed.

A quick overview of the words used here, along with their stack effects:

Shuffle words:

dup ( x -- x x )
nip ( x y -- y )
2dup ( x y -- x y x y )
2drop ( x y -- )

Linked list deconstruction:

car ( [ x | y ] -- x )
cdr ( [ x | y ] -- y ) - push the "tail" of a list.

Equality:

= ( x y -- ? )

More complicated control flow constructs, such as loops and higher order
functions, are usually built with the help of another primitive that
simply executes a quotation at the top of the stack, removing it from
the stack:

USE: math
USE: prettyprint

[ 2 2 + . ] call
==> 4

Here is an example of a word that applies a quotation to each element of
a list. Note that it uses 'call' to execute the given quotation:

: each ( list quotation -- )
#! Push each element of a proper list in turn, and apply a
#! quotation to each element.
#!
#! In order to compile, the quotation must consume one more
#! value than it produces.
over [
>r uncons r> tuck >r >r call r> r> each
] [
2drop
] ifte ;

An example:

USE: lists
USE: math
USE: stack

[ 1 2 3 4 ] [ dup * . ] each
==> 1
4
9
16

A quick overview of the words used here:

Printing top of stack:

. ( x -- ) print top of stack in a form that is valid Factor syntax.

Shuffle words:

over ( x y -- x y x )
tuck ( x y -- y x y )
>r ( x -- r:x ) - move top of data stack to/from 'extra hand'.
r> ( r:x -- x )

Writing >r foo r> is analogous to '[ foo ] dip' in Joy. Occurrences of
>r and r> must be balanced within a single word definition.

Linked list deconstruction:

uncons ( [ x | y ] -- x y )

* Variables

Factor supports a notion of ``variables''. Whereas the stack is used for
transient, intermediate values, variables are used for more permanent
data.

Variables are retreived and stored using the 'get' and 'set' words. For
example:

USE: math
USE: namespaces
USE: prettyprint

"~" get .
==> "/home/slava"

5 "x" set
"x" get 2 * .
==> 10

The set of available variables is determined using ``dynamic scope''.
A ``namespace'' is a set of variable name/value pairs. Namespaces can be
pushed onto the ``name stack'', and later popped. The 'get' word
searches all namespaces on the namestack in turn. The 'set' word stores
a variable value into the namespace at the top of the name stack.

While it is possible to push/pop the namestack directly using the words
>n and n>, most of the time using the 'bind' combinator is more
desirable.

Good examples of namespace use are found in the I/O system.

Factor provides two sets of words for working with I/O streams: words
whose stream operand is specified on the stack (freadln, fwrite,
fprint...) and words that use the standard input/output stream (read,
write, print...).

An I/O stream is a namespace with a slot for each I/O operation. I/O
operations taking an explicit stream operand are all defined as follows:

: freadln ( stream -- string )
[ "freadln" get call ] bind ;

: fwrite ( string stream -- )
[ "fwrite" get call ] bind ;

: fclose ( stream -- )
[ "fclose" get call ] bind ;

( ... et cetera )

The second set of I/O operations, whose stream is the implicit 'standard
input/output' stream, are defined as follows:

: read ( -- string )
"stdio" get freadln ;

: write ( string -- )
"stdio" get fwrite ;

( ... et cetera )

In the global namespace, the 'stdio' variable corresponds to a stream
whose operations read/write from the standard file descriptors 0 and 1.

However, the 'with-stream' combinator provides a way to rebind the
standard input/output stream for the duration of the execution of a
single quotation. The following example writes the source of a word
definition to a file named 'definition.txt':

USE: prettyprint
USE: streams

"definition.txt" <filebw> [ "with-stream" see ] with-stream

The 'with-stream' word is implemented by pushing a new namespace on the
namestack, setting the 'stdio' variable therein, and execution the given
quotation.

phimvt@lurac.latrobe.edu.au — 2004-09-17 03:56:36

On Sun, 12 Sep 2004, Slava Pestov wrote:

> Hi everybody,
>
> I apologize for the very long time it took to prepare this. This is a
> draft, and I intend to change a lot of things before the final version.
> Pretty much everything I wanted to describe is in there, except
> continuations. Feedback is welcome.
>
> Slava

[..]

Thanks for this. Since you marked it "draft" and asked for
comments, I won't put it into the final version just yet.
(I did not get any criticism for my two bits when I sent them.)

Keep it up.

- Manfred

stevan apter — 2004-09-18 20:02:10

here's a draft of the XY paper, with one section -- Programming
in XY -- incomplete. at this point, i've shut down further work
on cK to concentrate on XY, so i won't be writing anything on
that language.

the permanent link for the paper below (which i will continue
to work on) is:

http://www.nsl.com/k/xy/xy.txt

comments, suggestions, and criticism welcome.

--------------------

THE CONCATENATIVE PROGRAMMING LANGUAGE XY
*
(FIRST DRAFT)

* Introduction

XY is a concatenative programming language with roots in Joy and K.

The elements of XY are unary functions, each of which takes and returns
a pair of the form:

[stack queue]

The stack is a list which represents the past of the computation.

The queue is a list which represents the future of the computation.

XY is written in K in continuation-passing style.

The datatypes of XY are the atoms and lists of K: integers, floats,
characters, symbols, dictionary, null, and lists.

The 20 K verbs are the primitive functions of XY:

~ ! @ # $ % ^ & * - _ = + | : < , > . ?

If v is a verb, then v denotes the associated dyadic function, v:
denotes the the associated monad, and v. denotes the commute of the
dyad. For example,

x y - x minus y
x -: negate x
x y -. y minus x

* Lists and Quotations

XY supports both lazy lists ("quotations") and eager lists.

Quotations are written using square brackets:

[2 3 +]
[2 3 +]

Eager lists are written using parentheses:

(2 3 +)
5

An eager list such as (2 3 +) is recursively transformed at parse-time
into the expression

[2 3 +] /

which pushes [2 3 +] onto the stack, and then evaluates it.

* Evaluation

In the course of execution, XY has two objects: a list of values X
and a list of tokens z^Y, where z is the next token to be processed
and Y is a list of the remaining tokens. X is the result stack and
Y is the input queue. Evaluation of z consists of applying z to the
list [X Y]. The result of the application is a new stack X' and a
new queue Y'.

z is either an atom or a list. An atom is either an integer, a float,
a character, a dictionary, null, or a symbol. A symbol is either
defined or undefined.

If z is anything other than a defined symbol, its evaluation rule is:

[X z^Y] -> [X^z Y]

That is, z is removed from the queue and appended to the stack.

XY maintains a dictionary W which maps symbols to evaluation rules.
For each defined symbol, there is an evaluation rule in W. For example,
the evaluation rule for the primitive + is:

[X^x1^x2 Y] -> [X^x1+x2 Y]

That is, + takes X^x1^x2 and Y, pops x1 and x2, adds them, appends the
result to X, and returns the new stack and the unaltered queue Y.

* Core

<- stack [X^z Y] -> [z Y]

<- replaces the stack with the item found at the top
of the stack:

10 20 [2 3 +] <-
2 3 +

<- is Joy's "unstack" operator.

-> queue [X^z Y] -> [X z]

-> replaces the queue with the item found at the top
of the stack:

10 20 [2 3 +] -> 30 40 50
10 20 5

-> has no analogue in Joy.

/ use [X^z Y] -> [X z^Y]

/ prepends to the queue the item found at the top of the stack.

10 20 [2 3 +] / 30 40 50
10 20 5 30 40 50

/ is Joy's "i" combinator.

\ mention [X z^Y] -> [X^z Y]

\ moves the item found at the head of the queue to the top of
the stack:

10 20 \ + 30 40
10 20 + 30 40

\ has no analogue in Joy.

; define [X name^..code..^;^Y] -> [X Y]

; add-mul + * ;
2 3 4 add-mul
14

All definitions in XY are inline.

{ stack pattern [X template^..code..^}^Y] -> [X^Z Y]

{ matches names in the template to elements on the stack, and
then substitutes those values for matching names in the code.
_x denotes the stack, and _y the queue, as they are at run-time.
The result of the subtitution Z is appended to the stack:

10 20 30 { [a b] a b b a }
10 20 30 30 20

{{ queue pattern [X template^..code..^}}^Y] -> [X Z^Y]

{{ matches and substitutes like {, but instead of appending
the Z to the stack, it prepends it to the queue:

10 20 30 {{ [a b] a b + a * }}
10 1000

* Auxilliary

The following words are included in the current implementation, but are not
used.

.a amend shallow amend with assignment: @[x;y;:;z]
.d dmend deep amend with assignment: .[x;y;:;z]

.g get "name" .g -> value of name
.s set value "name" .s -> associate name with value

.p parse invoke the XY parser
.r represent invoke the XY representation function

.i input accept input to the stack
.o output produce output from the stack

.e execute alternate evaluate P, if that fails evaluate Q
.xy XY evaluate X and z^Y.

* Scripts

XY supports scripts. An XY script is a text file which contains XY code.
Typically, the filename ends in ".xy", although this is not necessary.

On initialization, XY loads the system script xy.xy, which contains core
definitions, including stack manipulation words like dup and swap, K adverbs
like over, each, and prior, and Joy combinators like linrec and dip.

At this point, XY does not support comments.

* Programming in XY

This section will contain examples.

* Implementation

Interpretation in XY consists of the standard read-eval-print loop. The
three steps correspond to three functions: the parser p, the evaluator e,
and the function r, which takes a list (either the stack or the queue) and
represents it as a string, suitable for printing.

The evaluator takes a stack x and a queue to be evaluated y:

e:{({if[T>0;t[x;y]];a[*y,();x,()](1_ y),()}.)/(x;y)}

e repeatedly evaluates x and y. At each step, it checks to see whether
the trace variable T is positive. If so, it displays the current states
of x and y:

t:{if[(#x)|#y;`0:(40$r x)," : ",-40$r y;:[#y;if[#0:`;T::0];`0:,""]]}

It then applies the following algorithm: if the queue is empty or the
stack is null, it returns the current stack and queue; else if the next
element to be evaluated is not a defined symbol, it appends it to the stack
and returns that and the queue; else it applies the evaluation rule for
that symbol:

a:{:[(~#z)&_n~x;(y;z);~4=4:x;(y,,x;z);(#**W)=i:W[0;0]?x;(y,x;z);W[1;i][y;z]]}

* Open Questions

- Can { and {{ be defined in terms of the other core words?

* Links

XY: http://www.nsl.com/papers/xy.htm.
K: http://www.kx.com

stevan apter — 2004-09-19 12:45:09

i've added comments to XY:

// this comment ends at the newline

notationally, this is rather nice, since not only does it echo
k's use of / for commentary, but it complements the use of \
for atomic quotation.

that is, the word \ takes the next item on the queue and moves
it to the stack:

2 3 \ + 4 5
2 3 + 4 5

in queue programming (as opposed to stack programming) this
comes up with some frequency, so i decided to support a syntactic
form as well:

2 3 \+ 4 5
2 3 + 4 5

prefix anything with \ to move it to the stack.

this means that \\ can't be a word in XY, since it already
means "move \ to the stack."