Hi!

Consider the following example. Is this language concatenative?

The set of terminal symbols T = { 'a', '(', ')' }.

The grammar G_0 has the following production rules:
S -> e # empty string
S -> S S
S -> ( S1 )
S1 -> a | a S1

The grammar G_0 generates the language L \subset T*.
Some elements of L are
e, (a), (a)(aa), (aaa)(a), etc.

We have LL = L, i.e. the concatenation of two elements of L belongs to L.

It seems to me that L is a concatenative language, but it doesn't satisfy
the Christopher's definition (there is no concatenative grammar which
generates this language - the proof is a simple exercise).

What do you think?

Michael Nedzelsky

Very nice counter-example!
Any ideas on how to refine the definition?

-Christopher

On 4/29/07, Michael Nedzelsky <MichaelNedzelsky@...> wrote:
>
> Hi!
>
> Consider the following example. Is this language concatenative?
>
> The set of terminal symbols T = { 'a', '(', ')' }.
>
> The grammar G_0 has the following production rules:
> S -> e # empty string
> S -> S S
> S -> ( S1 )
> S1 -> a | a S1
>
> The grammar G_0 generates the language L \subset T*.
> Some elements of L are
> e, (a), (a)(aa), (aaa)(a), etc.
>
> We have LL = L, i.e. the concatenation of two elements of L belongs to L.
>
> It seems to me that L is a concatenative language, but it doesn't satisfy
> the Christopher's definition (there is no concatenative grammar which
> generates this language - the proof is a simple exercise).
>
> What do you think?
>
> Michael Nedzelsky
>
>
>

[Non-text portions of this message have been removed]

Michael Nedzelsky <MichaelNedzelsky@...> wrote:

> Hi!
> Consider the following example. Is this language concatenative?

Hmm. It seems that it could be. It's not very different from a flat
concatenative language. It kind of looks like you've placed the lexer
inside the parser (there's nothing wrong with that). Your language
would look more like a typical flat concatenative language (but would
still fail this definition) if it accepted any letter of the alphabet
inside the parens.

Another obvious example of a language disqualified under the strict
reading of this definition would be a normal concatenative language
with literals, or with shuffle notation.

We seem to be getting closer, though.

> Michael Nedzelsky

-Billy

Let T be a non-empty set of terminal symbols.
Let T* denotes the set of all strings over alphabet T, where e \in T* denotes
the empty string.
The word 'program' means lement of L.
Definition:
Simple program is a program which is not a concatenation of two non-empty
programs.

The language L \subset T* is concatenative if and only if it satisfies the
following requirements:
1) The empty string is a program:
e \in L
2) The concatenation of two programs is a program:
(\forall x, y \in T*) x,y \in L --> xy \in L
3) The unique factorization property holds for programs. That is, every
program other than empty program can be factored into simple programs in
exactly one way.

The third requirements can be difficult to verify, so the following definition
can be useful.

The language L \subset T* is strict left (right) concatenative if and only if
it satisfies the following requirements:
1) The empty string is a program:
e \in L
2) The concatenation of two programs is a program:
(\forall x, y \in T*) x,y \in L --> xy \in L
3) (P = XY) /\ (P, X are programs) --> Y is a program. # for left
(P = XY) /\ (P, Y are programs) --> X is a program. # for right

The unique factorization property holds for strict left (right) concatenative
languages.

It seems to me that the Cat and Joy are both strict left and right
concatenative languages according to this defintion.

Michael Nedzelsky

Michael Nedzelsky <MichaelNedzelsky@...> wrote:

> The language L \subset T* is concatenative if and only if it satisfies the
> following requirements:

Nice. Actually, I would point out that this is a definition for flat
concatenative languages; a concatenative language may not be
factorable in the manner you're describing.

I also need to point out that this describes only the syntax, whereas
the traditional usage also contrains the semantics.

> It seems to me that the Cat and Joy are both strict left and right
> concatenative languages according to this defintion.

Quotations make it impossible to textually factor a program into
primitives, don't they?

> Michael Nedzelsky

-Billy

On Mon, 30 Apr 2007 06:19 pm, William Tanksley, Jr wrote:

> Michael Nedzelsky <MichaelNedzelsky@...> wrote:
> > The language L \subset T* is concatenative if and only if it satisfies
> > the following requirements:
>
> Nice. Actually, I would point out that this is a definition for flat
> concatenative languages; a concatenative language may not be
> factorable in the manner you're describing.

I disagree. For example:
[1 2 +] 3
is a concatenation of two simple programs
[1 2 +]
3
The inner structure of the simple program can be complex.
Of course, this definition of concatenative language covers only the first
layer.

> I also need to point out that this describes only the syntax, whereas
> the traditional usage also contrains the semantics.

I am aware of it. At first I want to see what can be achieved with syntax
only.

> > It seems to me that the Cat and Joy are both strict left and right
> > concatenative languages according to this defintion.
>
> Quotations make it impossible to textually factor a program into
> primitives, don't they?

Quotation is a simple program - it is not a concatenation of non-empty
programs.

Michael Nedzelsky

Michael Nedzelsky <MichaelNedzelsky@...> wrote:

> > Quotations make it impossible to textually factor a program into
> > primitives, don't they?
> Quotation is a simple program - it is not a concatenation of non-empty
> programs.

But a quotation is not a member of T*, is it? Therefore it's not a
"simple program".

> Michael Nedzelsky

-Billy

On Sun, 29 Apr 2007, Michael Nedzelsky wrote:

> Consider the following example. Is this language concatenative?

A couple of comments...

* There is the lexical level and the grammatical level, and care must
be taken not to confuse the two. eg. The number token
"-2.1e-10" is not postfix concatenative.

* In Joy [ and ] seem to have a special status. Namely the
concatenative property only seems to work if you treat [ and ] and
everything between a matching pair as a single entity.

* Chris says, "The semantics of a concatenative language are that
each term is a state function (e.g. from a set of stacks to a set
of stacks), and the concatenation of two terms implies the
composition of those functions."

I'm not convinced the word "stack" needs to appear in the
definition at all.

Any fragment of a concatenative language is a function mapping a
set X onto itself. In Joy X is the set of stacks. I'm not convinced
that is the only valid domain for concatenative languages.

I would also hypothesize a family of languages where "string
concatenation represents XXX" where XXX is some other useful
mathematical operation apart from functional composition.

Conversely another family exists where "functional composition is
represented by SOME TRIVIAL OPERATION ON FAMILIAR PRIMITIVE
DATATYPE". eg. List append, tree join...

Of course the whole thing can be generalized in both directions to
create families of languages where "BASIC MATHEMATICAL OPERATIONS
are represented by TRIVIAL OPERATIONS on FAMILIAR PRIMITIVE
DATATYPES".

So one could compile..
* a long list of all basic and interesting mathematical operations
* and there algebraic properties
and then
* another list of primitive datatypes and
* basic operations on those datatypes and
the algebraic properties
and overlay the two wherever the algebras match and create a new language.

And then select from that vast family those languages (that language?) which
affords the greatest simplifications.

The answer may well be "represent functional composition by string
concatenation", ie. Joy.

Ok, so let me distill this wild lateral thinking down a bit....

Question 1:

Is "string concatenation" or "functional composition" or the
representation by one of the other the fundamental thing about
concatenative languages? Or is the broader concept "isomorphisms
between algebras of higher mathematical constructs and algebras of
primitive computer datatypes lead to
(deceptive?|interesting?|useful?) simplifications"

Question 1b :

Can you propose an interesting "non-von thunnian" language (like a
non-euclidean space) where we either....

* Use other primitive datatype / operation instead of strings and
concatenation..

* Or represent some other basic mathematical operation by concatenation.

Candidates I can think of off the top of my head is
* multiplication "ab === a * b"
* convolution

and / or

* operates non-trivially and interestingly on something other than stacks.

Question 2:

Are stacks the only meaningful domain and range for the functions we
are composing?

John Carter Phone : (64)(3) 358 6639
Tait Electronics Fax : (64)(3) 359 4632
PO Box 1645 Christchurch Email : john.carter@...
New Zealand

> But a quotation is not a member of T*, is it? Therefore it's not a
> "simple program".

It is a member if T*!
T* means the set of all sequences of terminal symbols, and '[', ']' are
terminal symbols.

According to the description of the Joy syntax in
http://www.latrobe.edu.au/philosophy/phimvt/joy/j09imp.html
we have

token ::=
reserved-character | reserved-word
| integer-constant | float-constant
| character-constant | string-constant
| atomic-symbol

Consider the grammatical level: the tokens are the terminal symbols, right?

Also the article contains the following text:
-----------------------------------------------------------------------------------------------------------------------------------
A factor can be a single token such as an atom or one of the three constants:
integer constant, character constant or string constant. A factor can also
consists of a (possibly empty) sequence of integer or character constants
enclosed in the curly brace tokens { and }. Finally, a factor can also be a
(possibly empty) term enclosed in the square bracket tokens [ and }.

factor ::=
atomic-symbol
| integer-constant | float-constant
| character-constant | string-constant
| "{" { character-constant | integer-constant } "}"
| "[" term "]"

A term consists of zero or more factors.
term ::=
{ factor }
------------------------------------------------------------------------------------------------------------------------------------
It seems the factor is exactly the simple program in my defintion. The factor
sounds better, though.

Michael Nedzelsky

Michael Nedzelsky <MichaelNedzelsky@...> wrote:

> > But a quotation is not a member of T*, is it? Therefore it's not a
> > "simple program".
> It is a member if T*!
> T* means the set of all sequences of terminal symbols, and '[', ']' are
> terminal symbols.

Oh. Then no, I don't think your definition is adequate; it admits all
ill-formed strings, such as "]]]]a]]]". Your grammar can't be parsed.

> According to the description of the Joy syntax in
> http://www.latrobe.edu.au/philosophy/phimvt/joy/j09imp.html
> we have

We're defining concatenative languages in general, not Joy in specific.

> Michael Nedzelsky

-Billy

On Tue, 1 May 2007 07:00 pm, William Tanksley, Jr wrote:

> Michael Nedzelsky <MichaelNedzelsky@...> wrote:
> > > But a quotation is not a member of T*, is it? Therefore it's not a
> > > "simple program".
> >
> > It is a member if T*!
> > T* means the set of all sequences of terminal symbols, and '[', ']' are
> > terminal symbols.
>
> Oh. Then no, I don't think your definition is adequate; it admits all
> ill-formed strings, such as "]]]]a]]]". Your grammar can't be parsed.

Which grammar??
My definition is about languages, i.e. subsets of T*.

I'll try to explain by example.

Example 1.

Terminal symbols: a, b, [, ], +, (, )
Production rules:
S --> e | a | b | S S | [ S ] | (S + S)

Let l be the language which generated by this grammar.
Then we have "]]]]a]]]" \in T*, but "]]]]a]]]" \notin L.
The set of all simple programs for language L is generated by the following
grammar
S1 --> a | b| [S] (S+S)
S --> e | a | b | S S | [ S ] | (S + S)
Examples of simple programs: a, b, [], (a+b), [a b],(+), (a+(+)), (+b)
The last three programs looks strange, but they are syntactically valid.
The language L is (syntactically) concatenative by my defintion, since all the
three requirements are satisfied.

Example 2.

Terminal symbols: a, b, [, ], +
Production rules:
S --> e | ab | ba | a | S S | [ S ]

Let l be the language which generated by this grammar.
The set of all simple programs for language L is generated by the following
grammar
S1 --> a | ab| ba | [S]
S --> e | ab | ba | a | S S | [ S ]
Examples of simple programs: a,ab, ba, [], [ab], [a]
The language L is not (syntactically) concatenative by my defintion, since the
third requirement is not satisifed: the program
aba \in L can be factored in simple two different ways: "a" "ab" and "a"
"ba".

Michael Nedzelsky

Michael Nedzelsky <MichaelNedzelsky@...> wrote:

> William Tanksley, Jr wrote:
> > Michael Nedzelsky <MichaelNedzelsky@...> wrote:
> > > > But a quotation is not a member of T*, is it? Therefore it's not a
> > > > "simple program".

> > > It is a member if T*!

> > Oh. Then no, I don't think your definition is adequate; it admits all
> > ill-formed strings, such as "]]]]a]]]". Your grammar can't be parsed.
> Which grammar??
> My definition is about languages, i.e. subsets of T*.

Oh. Okay, now I see what I've been missing. Yes, your definition is
strictly about languages, not (per se) about grammars. That's a good
thing.

> S1 --> a | b| [S] (S+S)
> S --> e | a | b | S S | [ S ] | (S + S)
> Examples of simple programs: a, b, [], (a+b), [a b],(+), (a+(+)), (+b)
> The last three programs looks strange, but they are syntactically valid.
> The language L is (syntactically) concatenative by my defintion, since all the
> three requirements are satisfied.

Yes, I see your point.

Your definition seems to have the same problem our previous
definitions had: they allow things which are unmistakably not
concatenative (your infix (S+S) sublanguage isn't concatenative).

But your definition is better than most, because you include the
"unique factorization" property, and because it is expressed in terms
of concatenation and valid splitting, both fundamental to
concatenative languages.

Perhaps I'm wrong. Should our definition allow for languages which are
perhaps not _entirely_ concatenative, but whose "top layer" is
concatenative?

> Michael Nedzelsky

-Billy

On Wed, 2 May 2007 12:34 am, William Tanksley, Jr wrote:

> Should our definition allow for languages which are
> perhaps not _entirely_ concatenative, but whose "top layer" is
> concatenative?

It seems to me (I can be wrong here), that already Joy is not _entirely_
concatenative because of quotation. I think the quotation as basic operation
has the same status as concatenation.

Michael Nedzelsky

Michael Nedzelsky scripsit:

> It seems to me (I can be wrong here), that already Joy is not _entirely_
> concatenative because of quotation. I think the quotation as basic operation
> has the same status as concatenation.

Joy does not have quotation as such. It does have literal lists, but whether
a list is executed or not depends on the surrounding program. Furthermore,
whatever can be done with literal lists can be done without them:
[a b c] is equivalent to this flat program:

"a" intern "b" intern "c" intern cons cons

--
John Cowan cowan@... http://ccil.org/~cowan
[R]eversing the apostolic precept to be all things to all men, I usually [before
Darwin] defended the tenability of the received doctrines, when I had to do
with the [evolution]ists; and stood up for the possibility of [evolution] among
the orthodox -- thereby, no doubt, increasing an already current, but quite
undeserved, reputation for needless combativeness. --T. H. Huxley

Michael Nedzelsky <MichaelNedzelsky@...> wrote:

> William Tanksley, Jr wrote:
> > Should our definition allow for languages which are
> > perhaps not _entirely_ concatenative, but whose "top layer" is
> > concatenative?

> It seems to me (I can be wrong here), that already Joy is not _entirely_
> concatenative because of quotation. I think the quotation as basic operation
> has the same status as concatenation.

Very interesting. I've felt that Joy's quotation (which is, of course,
actually a combination of literal lists and function quotation) didn't
fit in quite correctly.

I was able to design a concept I called flatness, which helped me somewhat.

But I think the ability to include program literals is obviously
useful. And there's some use in insuring that the program literals are
in the same (concatenative) language as you're coding it.

So it seems to me that a non-flat concatenative language is still
useful. And I don't see why it should be impossible to define
formally... Especially since we haven't even defined a flat
concatenative language formally. (We seem to have exceptions to every
proposed definition, although IMO several have come very close.)

> Michael Nedzelsky

-Billy

On Wed, 2 May 2007 02:12 am, John Cowan wrote:

> Michael Nedzelsky scripsit:
> > It seems to me (I can be wrong here), that already Joy is not _entirely_
> > concatenative because of quotation. I think the quotation as basic
> > operation has the same status as concatenation.
>
> Joy does not have quotation as such. It does have literal lists, but
> whether a list is executed or not depends on the surrounding program.
> Furthermore, whatever can be done with literal lists can be done without
> them:
> [a b c] is equivalent to this flat program:
>
> "a" intern "b" intern "c" intern cons cons

I think you mean the following:
"a" intern "b" intern "c" intern [] cons cons cons

It is interesting. The article
www.latrobe.edu.au/philosophy/phimvt/joy/jp-flatjoy.html
contains the definition of Floy which is subset of Joy in which quotations
are not allowed to contain more than one atom.

But if any literal list can be constructed using only string literals, intern,
[] and cons, then the subset of Floy in which the only quotation is empty
list is already sufficient. Am I missing anything here?

Michael Nedzelsky

Michael Nedzelsky scripsit:

> I think you mean the following:
> "a" intern "b" intern "c" intern [] cons cons cons

Yes, of course. That comes of posting code late at night....

> But if any literal list can be constructed using only string literals,
> intern, [] and cons, then the subset of Floy in which the only quotation
> is empty list is already sufficient. Am I missing anything here?

Only a historical fact: that it was I who added intern to Joy's
primitive words when modifying the Joy interpreter to add
general gc, floating-point support, and files, and therefore
Manfred's papers don't reflect it.

--
John Cowan http://ccil.org/~cowan cowan@...
We want more school houses and less jails; more books and less arsenals;
more learning and less vice; more constant work and less crime; more
leisure and less greed; more justice and less revenge; in fact, more of
the opportunities to cultivate our better natures. --Samuel Gompers

On Wed, 2 May 2007 02:31 am, William Tanksley, Jr wrote:

> But I think the ability to include program literals is obviously
> useful. And there's some use in insuring that the program literals are
> in the same (concatenative) language as you're coding it.
> So it seems to me that a non-flat concatenative language is still
> useful.

I completely agree.

> And I don't see why it should be impossible to define
> formally... Especially since we haven't even defined a flat
> concatenative language formally. (We seem to have exceptions to every
> proposed definition, although IMO several have come very close.)

Below is my attempt to define a flat concatenative language. It is only an
attempt, so any critical remarks are welcome.

Let T be a non-empty set of terminal symbols.
T* denotes the set of all strings over alphabet T, where e \in T* denotes
the empty string. Let L be a language over alphabet T, i.e. L \subseteq T*.

Definition:
u \in L is simple <=> (\forall x, y \in L) u = xy --> x=e \/ y=e
Simple(L) = { u \in L | u is simple } - set of all simple elements of L.

The language L \subset T* is a (syntactically) flat concatenative language if
and only if it satisfies the following requirements:
1) e \in L
2) (\forall x, y \in T*) x,y \in L --> xy \in L
3) The unique factorization property holds for elements of L. That is, every
element other than empty can be factored into simple elements in exactly one
way.
4) The set of all simple elements of L forms a regular language, i.e.
the language Simple(L) can be defined by some regular expression.

This definition is only about syntax. What can be said about semantics?

Definition.
Let L be a a (syntactically) flat concatenative language.
Let s be a mapping from L into some semantic domain M.
The mapping s defines semantics of L, so for program x \in L s(x) is a meaning
of x.
(In real situation M can be some algebraic system with some operations and
relations on base set).

The language L is a flat concatenative language with respect to semantics s if
and only if there is a mapping b: M x M -> M such that
I) for every x, y \in L s(xy) = b(s(x), s(y))
II) b is associative

Notes:
I) means that for every x, y \in L the meaning of program xy is uniquely
determined by the meaning of programs x and y
II) means that for x, y, z \in L the two ways to determine the meaning of xyz
(by meaning of xy and meaning of z, by meaning of x and meaning of yz) yield
the same result.

Michael Nedzelsky

On 1/5/07 2:41 PM, "John Carter" <john.carter@...> wrote:

> [..]
> * Chris says, "The semantics of a concatenative language are that
> each term is a state function (e.g. from a set of stacks to a set
> of stacks), and the concatenation of two terms implies the
> composition of those functions."
>
> I'm not convinced the word "stack" needs to appear in the
> definition at all.
>
> Any fragment of a concatenative language is a function mapping a
> set X onto itself. In Joy X is the set of stacks. I'm not convinced
> that is the only valid domain for concatenative languages.

Entirely agree. Stacks seem to be particularly good for implementing
unary, binary, n-ary operators of fixed arity. I did fool around once with
a queue machine instead, but did not pursue it further. Also, as John
Cowan first pointed out explicitly, in the presence of ³get² and ³put²
IO, Joy really uses a stack and a file system. As the Lisps have shown,
sequences containing sequences containing sequences ... are a truly
remarkable data structure. This is true irrespective of whether sequences
are implemented as linked lists (of one kind or another), or as arrays
in consecutive memory locations. The difference in the implementation
implies a difference in the efficiency of various operations. The Lisps
and Joy use simply linked lists, Stevan Apter¹s X and XY languages use
arrays borrowed from K (Stevan?). Anyhow, the Joy stack is of course
just a sequence.

> I would also hypothesize a family of languages where "string
> concatenation represents XXX" where XXX is some other useful
> mathematical operation apart from functional composition.
>

I have tried this, but got nowhere. Do you have anything more explicit?

>
> [..]
> So one could compile..
> * a long list of all basic and interesting mathematical operations
> * and there algebraic properties
> and then
> * another list of primitive datatypes and
> * basic operations on those datatypes and
> the algebraic properties
> and overlay the two wherever the algebras match and create a new language.
>

And importantly, ³overlay² means ³homomorphism² - the semantic meaning
function maps a syntactic algebra onto a semantic algebra. In very
non-technical language, a conjunction ³p and q² is true in the intersection
of the set of worlds in which p is true with the set of worlds in which q is
true. This maps ³and² onto ³intersection². This is possible because the
sentential operators and the set operators form Boolean algebras. In Joy
the meaning function maps the syntactic operation of concatenation onto
the semantic operation of function composition. This possible because
syntax and semantics form algebras called monoids.

I did once toy around with the far more general notion of function
composition
which is used in recursive function theory: Given a bunch of m functions
f1 f2 ...fm which all take n parameters, and one other function g which
takes
m parameters, define another function h of n parameters: Take the n
parameters
of h and feed them (concurrently, in any order) to the various f1 f2..fm.
This gives m intermediate results. Feed these into g to get the final value.
Useful? for theoretical purposes, yes, tremendously. For practical
programming,
probably not. The familiar composition of unary functions is just m=n=1.
(This general composition is definable in Joy, of course. But I have never
seen any use for it.)

>
> [..]
> Is "string concatenation" or "functional composition" or the
> representation by one of the other the fundamental thing about
> concatenative languages? Or is the broader concept "isomorphisms
> between algebras of higher mathematical constructs and algebras of
> primitive computer datatypes lead to
> (deceptive?|interesting?|useful?) simplifications"
>

I think you mean ³homomorphism². Certainly all meaning functions that I know
of seem to map (abstract) syntax homomorphically onto semantics.

>
> Question 1b :
>
> Can you propose an interesting "non-von thunnian" language (like a
> non-euclidean space) where we either....

Blush. von Neumann ...

>
> [..]
> Are stacks the only meaningful domain and range for the functions we
> are composing?

The plain old ³store² (indexed by addresses) is another possibility.

- Manfred

>

[Non-text portions of this message have been removed]

On 3/5/07 1:22 AM, "John Cowan" <cowan@...> wrote:

> Michael Nedzelsky scripsit:
>
>> > I think you mean the following:
>> > "a" intern "b" intern "c" intern [] cons cons cons
>
> Yes, of course. That comes of posting code late at night....
>
>> > But if any literal list can be constructed using only string literals,
>> > intern, [] and cons, then the subset of Floy in which the only quotation
>> > is empty list is already sufficient. Am I missing anything here?
>
> Only a historical fact: that it was I who added intern to Joy's
> primitive words when modifying the Joy interpreter to add
> general gc, floating-point support, and files, and therefore
> Manfred's papers don't reflect it.

Actually I discussed Floy after your modifications of Joy. In all modesty,
at that time I did not really grasp the possible uses of the intern
operator. After your recent post I did try out intern, with some surprises:

³1² intern ³2² intern ³3² intern ³4² intern [] cons cons cons cons
produces a nice looking list of what seem to be integers.
But now try: [dup*] map

³Anna² intern ³Bob Charles² intern ³Donald² intern [] cons cons cons
produces a nice looking list of what seem to be four names.
But now try: reverse

- Manfred

[Non-text portions of this message have been removed]

Manfred Von Thun scripsit:

> ³1² intern ³2² intern ³3² intern ³4² intern [] cons cons cons cons
> produces a nice looking list of what seem to be integers.
> But now try: [dup*] map
>
> ³Anna² intern ³Bob Charles² intern ³Donald² intern [] cons cons cons
> produces a nice looking list of what seem to be four names.
> But now try: reverse

Common Lisp solves this problem by printing such weirdly-spelled
symbols either as \1 and Bob\ Charles or as |1| and |Bob Charles|.

--
That you can cover for the plentiful John Cowan
and often gaping errors, misconstruals, http://www.ccil.org/~cowan
and disinformation in your posts cowan@...
through sheer volume -- that is another
misconception. --Mike to Peter

On 1/5/07 2:41 PM, "John Carter" <john.carter@...> wrote:

> Conversely another family exists where "functional composition is
> represented by SOME TRIVIAL OPERATION ON FAMILIAR PRIMITIVE
> DATATYPE". eg. List append, tree join...
>
> Of course the whole thing can be generalized in both directions to
> create families of languages where "BASIC MATHEMATICAL OPERATIONS
> are represented by TRIVIAL OPERATIONS on FAMILIAR PRIMITIVE
> DATATYPES".

Your wide-ranging posting was so suggestive that I have another reply to
make. This was in part prompted by a discussion, mainly between Chris, Billy
and Michael about regular expressions. My methodology for the day: start
with regular expressions, define some sort of a language, what do you get?

Most programmers probably know regular expressions from various
unix utilities in which regular expressions serve to describe searches
and matches. The impression one might get is that they are particularly
concerned with strings of characters. But this is not so. Here is my expo:

A string or word or list or vector or n-tuple or sequence consists on zero
or more discrete entities, with repetitions allowed. For example a line of
boy and girl ducklings following their mother is a sequence of boy/girl
bits. (Here: fantasy about a biological computer using 2^32 lines of 32
ducklings..) More commonly the entities are characters, numbers, words of a
language. I shall use < and > to enclose sequences, to avoid any suggestion
that we are dealing with sequences of linguistic things. Inside the < and >
I write the names or other referring expressions which denote the items, so
that <London Paris Rome> is a sequence of cities and not a sequence of
names. Sequences can be concatenated, and I shall use an infix underscore
for that. So, <a b> _ <c d e> = <a b c d e >. There is also the empty
sequence <> which is the left and right unit element of concatenation.

A language is a set of sequences (and the word ³language² is particularly
misleading here, but everybody uses it). Since languages are sets, one can
form their union. Use an infix vertical bar | for union. Thus if L and M are
languages, the L|M is their union. Since the members of languages are
sequences and hence concatenable, one can define an operation of
concatenation for languages, again using infix underscore _. Thus if L and M
are languages, then L_M is the set of all those sequences that can be formed
by taking a member from L and one from M and concatenating them in that
order. There are two very special languages. One is {}, the empty language
that hhas no members at all. The other is {<>}, the language whose sole
member is the empty sequence. {} is the unit element for language union, and
the (left and right) zero element for language concatenation. {<>} is the
(left and right) unit element for language concatenation.

A language can be concatenated with itself any number of times. Write the
number as an exponent: so L^2 = L_L, L^3 = L_L_L. Call these the powers of
the language. Normal exponent arithmetic applies: (L^m)_(L^n) = L^(m+n). L^1
= L, L^0 = {<>}. (Q: negative exponents? fractional exponents?). Three
postfix operators: L+ = the union of all positive (1,2..) powers, L* = the
union of all (0,1,2..) powers, L? = the union of {<>} with L.

The two binary infix operators _ and | and the three unary postfix operators
can be used to build arbitrarily complex expressions, using ordinary
parentheses
( and ) to disambiguate for the infix operators. But the operators need some
basic operands, which have to be languages simple languages from which
all else is to be constructed.

In regular expressions a basic operand is a language that is a set of
just one sequence which in turn contains just one basic element. So, using
this overly explicit notation, we should write

({<a>} | {<b>}) _ ({<c>} | {<d>})

But for good reasons the concatenation operator is generally not written at
all, and the operand languages are also given a useful abbreviation which
looks obvious but is in fact quite involved:

(a | b) (c | d)

The two notations denote the same language: {<ac> <bc> <ad> <bd> },
or if the letters are actually characters, the language {³ac² ³ad² ³bc²
³bd²}.

The three postfix operators do not need any special treatment.

A language which is the extension of a regular expression is called a
regular language. Definitions of regular expressions can be allowed, and if
they do not use recursion then they do not extend the class of regular
languages.

What we have here is that concatenation of regular expressions denote
concatenations of sets of sequences, of languages. But that is not all that
interesting. However, the sequences themselves could be endowed with
meaning, which could be commands as in a procedural language, or they could
be unary functions as in a concatenative language. And what are we to make
of the union operator | that does not occur explicitly or implicitly in
procedural or concatenative languages?

The answer must be that we should not assume that a program is a sequence as
described above, but that it is a set of sequences. To use a really awful
and confusing pun: program must be a language as described above. And the
programming language must be a set of languages.

That is not as silly as it sounds. In procedural and in concatenative
languages the atomic programs (factors in my terminology) denote changes to
complex entities, a von Neumann store or a stack. Changes are just unary
functions from such thing to another. Unary functions are many-one relations
binary relations which can hold between several things and one other, what
we call the value of the function. Concatenations of programs map onto
composition of many-one relations. What else can we do with relations? Given
any two relations of the same arity we can form their disjunction, or
technically their union. So given two binary many-one relations we can form
their disjunction, which will generally be a many-many binary relation.

What then might a programming language look like if its basic programs
denote not
unary functions from stacks to stacks but (many-many) binary relations
between
stack? Or instead of stacks some other entity? It is really quite obvious:
the program

(2 | 3) *

denotes the relation that holds between a stack whose top element is an
integer, and another stack whose top element is instead twice or thrice that
integer. An implementation would first put 2 on top of the stack and then
continue, and when told to backtrack would put 3 on top of the stack and
then continue. There may be various ways in which the backtracking could be
controlled. Prolog does this sort of thing very well. It should be easy to
see what

(2 | 3) (+ | *)

does it gives four results. Even the three unary postfix operators from
regular expressions find a place here, and note the potential ambiguity of
the star * symbol:

1 (2 * )*

produces all the powers (0,1,2,3..) of 2, which are 1,2,4 8,16...
Of course, many choices have to be made between in the implementation,
and one of these concerns backtracking: depth first is easy to do but
can be tricky for the user.

What does all this amount to: First there is the syntax, the expressions
which denote languages. In turn the languages denote binary relations.
A double homomorphism, I very tentatively propose.

A programming language in which binary alternation or union (and the three
unary operators) are available can be grafted onto a concatenative language
by using various exotic combinators. I have done some experiments with
my Joy-in-Prolog. But this is only an awkward grafting.

A programming language in which concatenation and alternation are really on
a par would be very different from what I am used to. I have to say this
even though I am quite comfortable now with Prolog, in which the two
operations are indeed on a par. However Prolog of course uses (so-called
logical) variables that serve as parameters, which are precisely what the
stack languages aim to eliminate. So, I would find it easy to implement a
relational stack language in Prolog, but I cannot yet see any uses. Possibly
this is just lack of familiarity.

Any ideas?

- Manfred

[Non-text portions of this message have been removed]

Is this language concatenative?

Michael Nedzelsky — 2007-04-29 15:45:56

Christopher Diggins — 2007-04-29 16:37:09

William Tanksley, Jr — 2007-04-29 18:17:33

Michael Nedzelsky — 2007-04-30 10:11:37

William Tanksley, Jr — 2007-04-30 14:19:37

Michael Nedzelsky — 2007-04-30 17:28:30

William Tanksley, Jr — 2007-04-30 20:32:30

John Carter — 2007-05-01 04:41:43

Michael Nedzelsky — 2007-05-01 07:19:42

William Tanksley, Jr — 2007-05-01 15:00:42

Michael Nedzelsky — 2007-05-01 16:25:20

William Tanksley, Jr — 2007-05-01 20:34:55

Michael Nedzelsky — 2007-05-01 21:43:31

John Cowan — 2007-05-01 22:12:28

William Tanksley, Jr — 2007-05-01 22:31:41

Michael Nedzelsky — 2007-05-02 10:53:02

John Cowan — 2007-05-02 15:22:46

Michael Nedzelsky — 2007-05-02 15:56:04

Manfred Von Thun — 2007-05-04 07:45:14

Manfred Von Thun — 2007-05-07 03:52:48

John Cowan — 2007-05-07 05:14:54

Manfred Von Thun — 2007-05-07 10:27:32