infix syntax - iverson notation

stevan apter — 2005-06-26 14:09:58

i hear that dan is working on an infix parser for factor.
here's a short introduction to the notation invented by
ken iverson, and used in APL, J, and K. it is independent
of the vector semantics of those languages.


a token has type noun (n), unary (n), verb (v), or adverb (a).

nouns are data, like

12
"foo"
a.b
[10 20 30]

verbs are operators which take two nouns and return a noun,
like

+

unaries are operators which take one noun and return one
noun. in K, a unary is written with a trailing ":", like

+: (+: is transpose)

except where adverbs are involved, the parser given below
can determine by context whether + is a verb or a unary.
you only really need to specify the unary form when passing
the operator to an adverb, since context may not be available
at parse-time.

adverbs are operators which take and return verbs and
unaries, like

+/ (/ takes + and returns summation)
+:' (' takes +: and returns transpose-each)

a tokenized expression is therefore a sequence of typed
tokens:

2+3 -> n:2 v:+ n:3

Infix expressions are parsed as:

(2+)3

that is, verb + is curried on noun 2 and the result --
a unary verb -- is applied to the noun 3. The derived
category of a curried verb is u (unary).

parsing an expression requires two tables: a binding-
strength table B, which gives the binding strengths of
category pairs, and a category-result table C, which
gives the category of the result of binding two objects.

except for the hierarchy implied by B, there is no
precedence relation defined for operators in an iverson
language. parentheses are used, infrequently, to redirect
order of execution.

the algorithm A -- developed by the IBM research scientists
Bunda and Gerth -- works this way:

(a b) is notation for binding a and b.

take an expression E. starting at the right, scan left.

if E is empty then return E.

if E consists of a single element then return E.

if E consists of two elements only:

E = a b

then return

E' = C(a,b):(a b).

else E is:

E = .. a b c d e f

scanning from the right, bind the first pair E[i] and E[i+1]
which satisfies the following condition:

E[i] is a noun and E[i+1] is a noun OR
B(E[i],E[i+1]) > B(E[i-1],E[i])

E' might now look like this:

.. a b C(c,d):(c d) e f

so apply A to E'.

For example:

2 + - 4 * foo 5
n v v n v n n

2 + - 4 * (foo 5) / because foo and 5 are nouns
n v v n v n

2 + - (4*) (foo 5)
n v v u n

2 + - ((4*)(foo 5))
n v v n

2 + (-((4*)(foo 5)))
n v n

(2+) (-((4*)(foo 5)))
u n

((2+)(-((4*)(foo 5))))
n

For K, the binding-strength table B is:

n v u a
- - - -
n 0 4 . 5
v 2 1 1 5
u 3 1 1 5
a . . . .

and the category-result table C is:

n v u a
- - - -
n n u . v
v n v v v
u n v v v
a . . . .

e.g. the parse tree for 2++/!3-4*5 is:

+-------+-------+
| |
+-+-+ +-----+------+
| | | |
v:+ n:2 +-+--+ +----+-----+
| | | |
a:/ v:+ u:!: +----+-----+
| |
+-+-+ +--+--+
| | | |
v:- n:3 +-+-+ n:5
| |
v:* n:4

other iverson-languages use different tables and
different sets of basic categories. for example,
in APL and J, there are conjunctions. a conjunction
take a pair of verbs/unaries and returns a verb/
unary. also, the noun-noun modification to the B-G
algorithm is needed to handle right-associativity
of function-application and indexing in K.

notice also that the K tables provide for expressions
like

2+
-+

the first expression evaluates to a unary, while the
second, which has the form v v, evaluates to a verb,
namely, the unary which applies unary - to the result
of adding two nouns.

although expressions in iverson notation are parsed
right-to-left, they are read left-to-right. In nearly
every instance, it is possible to directly transliterate
into english:

2++/!3-4*5

2 plus (the result of) sum over (the result of) 3 -
(the result) of 4 * 5.

ori_berger — 2005-06-26 22:43:08

Stevan, I think this description is wonderful.
Just to make sure I'm following everything
correctly, a few questions:

--- In concatenative@yahoogroups.com, "stevan apter" <sa@n...> wrote:

..snip
> a token has type noun (n), unary (n), verb (v), or adverb (a).

I assume unary should be "u", based on the explanation below.
But why don't you call it monadic like Ken, Arthur and everyone
else always does (and you did to -- having looked at your PBG
for K a long time ago ...?)

..snip
> the algorithm A -- developed by the IBM research scientists
> Bunda and Gerth -- works this way:

If I understand the PBG algorithm correctly, this is what
the Dragon book calls "operator precedence grammar" or
"operator grammar". Is that correct? (2nd edition: sec
4.6, pg 203)

Ori.