Joy, Genetic Programming, and Handwriting Recognition
cpcogan — 2007-01-08 02:15:55
[This is rather long. You may want to just skim it first time through,
to see if it's even something you want to bother with.]
Imagine a version of Joy in which all the terms used would never cause
an executing program to crash. Instead, they would leave an error code
somewhere, so subsequently-executed code could reference it if it
wanted to. Now, imagine that we generate strings of valid Joy code,
and try executing them with given input and with a place to put their
output. Then imagine that we build a mechanism that generates strings
of Joy code, executes them, and checks the actual output against
test-output. If there is a match, the string of joy is retained. If
not, not.
The strings of Joy are built in batches of, say fifty at a time. They
are all tried on (let us suppose) a hundred examples of input and
hoped-for output. For each generation, the top few are saved and used
as the "baseline" for the next generation of fifty strings of Joy
code, each of which is a variation on one or more of the saved strings
(at some point we can do recombination of bits of Joy, and thus have
"The Sex of Joy" !!! ;-) )
This process is repeated until we get at least one string of Joy code
that does a very good job of producing the right outputs for all of
the inputs. Then, we give it a test on input data it hasn't seen, so
as to exclude cases where the evolver evolves code that is effectively
hard-coded for the particular input we have. When we get something
that really seems to work, we increase the difficulty of the task by
providing more-difficult input-output pairs (making the inputs more
complicated, generally). Each time we get a very good success rate on
what has been required so far; we increase the difficulty.
The reason for wanting errors not to crash the whole program is now
obvious: We don't know what code will be generated, by the evolver,
and we don't want to needlessly constrain the code to non-crashing
code (even if we knew how to do this mechanically while still doing
anything interesting).
My own desired application would be handwriting recognition. We start
with input-output pairs that consist of bitmaps of single-stroke
symbols (backslashes, slashes, hyphens, and maybe parentheses, etc.).
For each bitmap we have an output that the generated code is to
produce if it works.
We set up the system as described above, with a bunch of input-output
pairs, and let it run (overnight, a week -- it doesn't much matter, if
we are letting the computer do all the work).
Eventually, we would, if this method works, produce a program that
could take any amount of handwritten text that's sufficiently similar
to that used during development and process it with about as much
accuracy as we can expect from software that doesn't know a lot about
the meaning or context in which the material was written.
Why am I interested in handwriting recognition? Well, I'm not, really,
which is why I would use this approach instead of a more nearly
conventional approach (the conventional approaches don't seem too
successful anyway). By using an evolver to evolve handwriting
recognition, I don't have to know what I'm doing. More accurately, I
don't have to know what the code the evolver produces is doing. I
don't have to know about edge-detection, curve recognition, the shapes
of letters, whether they are connected to each other (one recognizer I
tried only works if you feed it handwriting in which each letter is
disconnected from the previous and the next letters -- but my
handwriting is more or less normal script, with each letter in a word
connected to at least one other letter, so the program was a failure).
With an evolver, I can even let it evolve its own general approach. I
don't have to decide whether to try some sort of Bayesian analysis or
God knows what. I just give it what it has to do and let it find its
own way of doing things. This may mean that re-starting from scratch
and using the current time for the randomization seed could cause it
to ultimately produce wildly different solutions at the end, and that
would be very interesting but, if my goal is handwriting recognition,
I don't really care about that. As a programmer, and a student of
computer science, I might find it fascinating if one "from-scratch"
run produced a solution that was truly different in fundamental
approach from another "from-scratch" run, but the immediate goal is to
produce a good handwriting recognizer (and only for my own handwriting
at that).
But, I have about 200 POUNDS of handwritten material (and many
thousands of pages more already in a computer-editable form) that I've
written over the past forty years, and I'd like to convert some
significant portion of it to editable text, so that at least a small
portion of it could possibly be published. (I joke that I have
"Asimov's Disease" but without all that annoying fame and monetary
success.).
Worse, there don't seem to be any commercial or freeware or even
shareware programs that do handwriting recognition on text that is
already written (i.e., scanned, not written directly into a device
such as a PDA). Much of the existing software depends on processing
the strokes as they are entered, so it knows whether one stroke or
part of a symbol was done before another part. I came across a paper
describing an experiment in recognizing more or less unconstrained
scanned handwritten text, but it was an old paper and I got no
response to my one e-mail, and it is likely that the program would not
be available for others to use anyway.
Further, I have come across experiments in very limited genetic
programming, where genetic programming was used in conjunction with
pre-determined methods based on a pre-determined theory about how to
do handwriting recognition. I suspect that this is not the best way to
do it. Instead, I'm hoping, it would be better to let the evolver
itself eventually produce its own methods that might or might not end
up being roughly equivalent to some method that people have tried
already. What if the evolver would develop, for example, some
"Gestalt" method of recognizing handwriting, similar to the way people
do it? Should we prevent it from finding such a method by imposing on
the evolver a prior fixed idea about the type of method we want it to
develop?
Why Joy? Why not a conventional language? Why not Prolog?
Conventional languages are too complex, though generating pre-defined
bits of code is easier than recognizing arbitrary code correctly, it
still makes for needless complexity, because we have to make sure that
all the language constructs are well-formed (even if they extend over
pages of code). Also, we don't normally have the ability to make sure
that no errors in coding will crash the program, as we do if we have
our own Joy processor. We can write a collection of functions for a
conventional language that automatically ignore errors, of course, but
that's almost what we do when we write a Joy processor anyway.
Prolog is another prospect, because it has a simple syntax, and
because the Prolog "engine" automatically searches the solution space
for whatever code it has, and we can fairly easily make it pretty much
immune to crash-causing errors. For years, whenever I considered doing
genetic programming, I thought I would do it in Prolog, for just these
reasons. Also, full Prologs allow the executing code itself to
generate new rules (the inability to do this cleanly was a problem for
some early compiled Prologs, like the one Borland sold), so a generate
and test "engine" (which is what a genetic programming evolver is,
basically, with the refinement of producing each new trial as a
variation on an existing one, and with the ability to do things in
batches per cycle (or "generation")).
But I still think Joy is a better choice, because it is even easier
than Prolog to generate, and has most of Prolog's other advantages
(including, in my implementation, the ability to define new atoms at
run-time). Prolog's one main "trick" is its automatic ability to
search the entire "tree" of the solution space if you let it (more on
this below). However, I see no reason why this cannot be implemented
in Joy. Even better, if it is implemented in Joy, the code that it
executes can contain normal Joy code. Let me explain, for those who
may happen not to know about Prolog.
Prolog works from a set of "rules" and "facts." A rule has the form:
name(parameters) conditions;
The conditions are specified as a series of Boolean expressions
separated by commas, like this:
name(parameters) condition1, condition2, condition3, ... conditionN;
In normal execution, the conditions are processed sequentially until
they are all used up or one of them returns false. This means that the
string of conditions is essentially a lazy-execution string of
conditions ANDed together. Thus, we might have a rule like this:
dog(animalname) hasfourlegs(animalname), barks(animalname)
(Somebody correct me if I've gotten this wrong; it's been a while, and
as with Joy, I never actually wrote more than the most trivial of
programs in Prolog).
This rule would normally mean, roughly, "A thing is a dog if it has
four legs and if it barks."
Each rule is thus a Boolean expression in its own right, so when it is
used as a condition in another rule, it applies its own defining
conditions as criteria, and returns whether all of the conditions were
met or not. Believe it or not, this (and a little bit more) allows
Prolog to do almost any kind of logic programming (such as theorem
proving) you might want.
Now, for the "magical" part of Prolog. If a rule fails partway through
the sub-conditions, execution backs up to the last successful
sub-condition, and then, if any more rules for the failed predicate
are available that match the parameters for this reference, the first
of them will automatically be tried (and, if it fails, the next one
will be tried). Prolog will keep on working this way until it finds a
completely successful series of rules or it runs out of rules to try.
That is, given a database with a whole bunch of rules for each
predicate, and given an initial question (entered in the format of a
rule without the conditions, like "dog(fido)?" -- meaning "is fido a
dog?"), it will keep trying possibilities until the hardware gives out
or a crash occurs or it simply runs out of possibilities to try.
Of course, real Prologs have ways to constrain searches, and to
prematurely terminate a search for a solution if some condition occurs
(such as something that indicates that no solution is possible), and
they also have ways to do things like user-input and user-output, and
ways to make predicates that will only be executed one time when they
are encountered so the user isn't inflicted with millions of outputs
where one will do, etc.).
But the real key to Prolog is the automatic tree-searching. The
programmer only needs to define the predicates that define the domain
he is working with and what he wants done. If all the needed rules
have been specified, the Prolog engine will do the rest (though not
necessarily in the most efficient way, of course, because it is
searching mechanically, not intelligently).
The thing is, this mechanism is not (in its bare forms) terribly
complicated, so it's reasonable to implement it in a language like
Joy, even as a primitive. Doing so would allow Prolog processing to be
freely intermingled with regular programming, thus giving us the power
of the Prolog engine without the limitations of having to make
everything fit into the Prolog way of doing things. The predicates
would be in the form of tables formed by lists of lists, one such
table for each predicate. Each predicate would go into the dictionary
as a Joy atom, with internal code for the appropriate processing of
its rules.
The entire Prolog-like engine would just be a combinatory that would
be passed a list of lists and that would return a Boolean value. Code
that doesn't return a Boolean to be tested could be inserted at any
point, as long as executing it at that point would be appropriate and
as long as it didn't leave disruptive values on the stack.
We wouldn't need to add all the tricks real Prolog uses, because we
wouldn't have to fit everything into the Prolog way of doing things.
Instead, we would be intersperse chunks of Prolog-like activity into
otherwise normal Joy code.
Back to the Evolution Question
The evolver would only have to be made smart enough to make sure that
all lists were properly terminated at some point (so we don't reach
the end of a program with lists still un-terminated and with no code
actually to execute following the placement of the list on the stack.
Unterminated lists (and unterminated strings and function definitions,
as well) might not prevent our evolver from evolving solutions to our
problems, but it would tend to slow it down.
Back to handwriting recognition. The reason for giving the evolver
simple cases initially is that there is a better chance of it being
able to find a solution quickly for those cases (such as recognizing a
forward slash). If we give it large blocks of handwriting to
recognize, it would be very difficult to know if it has found a
partial solution because the partial solution might not look very much
like the desired output (for example, it might recognize all the
letters in the input but output them all in the wrong order, thus
producing garbage that doesn't look at all like what we are after).
Using a bunch of simple cases makes it possible to do meaningful
fitness testing automatically (if the output is not the desired
character, we can be fairly sure the code is still doing something
fairly far off the mark (though we MIGHT want to test for approximate
matches, such as dashes for hyphens, brackets for parentheses, etc.)).
A Visit to "Handwriting World"
A variant possibility is this: We would construct a "handwriting
world" in which there would be "ecological niches" for a range of
possible species of Joy functions, all of them related (somehow) to
handwriting recognition. The range that I have in mind is not just
many of the possible simple-case solutions, but also a range of
complexity, so that, over time, functions would evolve that would be
able to "eat" larger and larger chunks of handwriting and produce the
correct output. I say "ideally" for a reason. Constructing such a
world would not be particularly easy, although the more I think about
it, the more nearly feasible it seems. Disk, memory, and processing
constraints could all be imposed as well as fitness for handwriting
recognition (large functions that took a long time and that occupied a
lot of disk space would be killed off before smaller ones that were
fast and that used very little disk space, if they both produced the
same results. These constraints would only become part of the fitness
criteria when necessary, so as to allow as much opportunity as
possible for handwriting success to be the main criterion.
Multiple bits of successful code could join together in a symbiotic
way to produce longer chunks. Thus, if a bit of code was good at
recognizing the word "the" in the "handwriting world," and another was
good at recognizing "cat," they might join together to recognize "the
cat," when it was encountered (obviously, if "the cat" did not occur
often in the "world," and if memory or disk was precious, this
combination might be killed off in favor of bits of code that were
more general in their abilities). Helper functions could evolve, as
well. These would be functions that might do such things as marking
the parts of the "world" that were just white-space, so as to reduce
the amount of work for other functions.
Various things could be tweaked, such as the degree of variation
introduced in each generation, the specific Joy constructs used, the
rigor of the external constraints, and different methods of judging
partial success at producing correct output.
This approach would have the main advantages of the earlier,
more-incremental approach, without some of its disadvantages. One
disadvantage of that approach is that the fact that the input-output
pairs are simple might bias the evolver to overly-specific solutions
that would later be a liability (a simple recognizer for a single
straight stroke of handwriting might evolve well into a
more-nearly-global recognition method (i.e., the "Gestalt" approach
mentioned earlier), for example). Having a "world" filled with
handwriting would allow all sorts of things to be tried that might
work in such a world but not in an incremental approach.
Of course, the evolution process itself would still be incremental,
but, because there would be this "world" of handwriting for the
"organisms" to live in, they could do their own incrementing as far as
what they do is concerned. Since fitness would be judge largely on the
basis of the amount of handwriting correctly recognized, there would
be constant evolutionary "pressure" to be able to "eat" larger and
larger amounts of handwriting compatible with correct output. The ones
that only do a little, or that don't actually "eat" or produce any
output text would be given fitness ratings based partly on whether
they were used by ones that do produce correct output successfully
(after all, we wouldn't want to cripple a major chunk of successful
by removing the separate functions that it depends on, would we?).
I THINK all of this could be handled mechanically, once a large enough
"world" of handwriting was scanned in, and once a mechanism was
devised for being able to compare the input to a function with its
output (this could get tricky, because we'd have to know where in the
"world" the input came from in order to know whether or not the output
was correct for whatever was at that location, which could involve a
lot of tedious work by actual humans (such as, ugh, myself).
Maybe the incremental approach would be better after all. It would
certainly simplify things for the developer of the evolver.
At any rate, I don't see any insurmountable barriers to making such an
evolver for evolving a handwriting recognizer in Joy.
Or, quite possibly, for other things as well, such as image
recognition, electronic circuit development (as long as we have a way
of adequately testing each proposed design, either in actual hardware
or by very good circuit simulation in software), process design
(again, in cases where we have some easy means of testing design
fitness). [Note: I read of a case where an evolver was to evolve an
oscillator, and it did so by devising circuits that picked up an
oscillation from some other device and feeding it to the output,
showing that evolvers may solve the problem you give them without
solving the problem you actually have in mind.]
I hope I live long enough to try some of the things I've talked about.
Especially some of the handwriting recognizer ideas. I'd like to see
comments, if you have any.
Also, if anyone knows of available handwriting software that will
really work on scanned handwriting and that doesn't cost more than a
mansion in New York City, please let me know. If good handwriting
software actually exists and doesn't cost too much, I'd still like to
do some of the things I talked about, but more as research, just
because it would be so wonderful to evolve something like that from
the ground up (and then try to figure out how it does what it does).
Now, if I could only evolve a good artificial intelligence that would
automatically convert and edit my bazillions of drafts into
publishable documents . . . .