Factor's Parser
Factor's parser is a recursive descent parser that users can extend by defining parsing words. Parsing words are words that the parser executes the instant it reads them. This is the core mechanism behind Factor's extensibility.
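As an illustration of a user-defined parsing word, here is a minimal sketch. The name SUM{ is hypothetical and not part of Factor; the sketch assumes the parse-until word from the parser vocabulary and reuses the existing } delimiter as its end token.

```factor
USING: parser sequences ;
IN: scratchpad

! SUM{ is a hypothetical parsing word, not part of Factor.
! When the parser reads SUM{, this body runs immediately: it parses
! everything up to the closing } into a vector, adds the elements up,
! and appends the total to the accumulator as a literal.
SYNTAX: SUM{ \ } parse-until sum suffix! ;
```

A parsing word has to be compiled before it can be used, so try it in a later listener input or in another source file. There, SUM{ 1 2 3 } is replaced at parse time by the literal 6.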
This page traces how Factor source code is parsed.
The parsing step begins at parse-string, which eventually calls parse-until. This word keeps reading until it reaches the end token supplied to it. In this case, the end token is f, which the parser returns as the EOF value, so parsing continues until the input is exhausted.
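To watch this entry point in action, parse-string can be called directly from the listener. This is a small sketch that assumes parse-string is loaded from the eval vocabulary and that the listener's vocabulary search path is in effect.

```factor
USING: eval prettyprint ;
IN: scratchpad

! Parse a string of source code without running it.
! The result is a quotation holding the parsed literals and words,
! which is then printed.
"1 2 +" parse-string .
```

Nothing is evaluated here: the addition is only parsed, and the resulting quotation could later be run with call.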
parse-until invokes the main parsing loop, (parse-until), as follows:
100 <vector> swap (parse-until)
The core of (parse-until) is parse-until-step. This word is responsible for reading datums from the input stream and validating them. Its first step is to call ?scan-datum.
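The loop that (parse-until) builds around parse-until-step can be sketched as a simple recursion. The name loop-until is hypothetical, and the actual definition in the parser vocabulary may differ between Factor versions. The sketch assumes parse-until-step has the stack effect ( accum end -- accum ? ), where the flag says whether to keep going.

```factor
USING: kernel parser ;
IN: scratchpad

! loop-until is a hypothetical name used for illustration only.
! Run one parsing step, keeping the end token around for the next round.
! If the step returns t, recurse; if it returns f, drop the end token
! and leave the accumulator on the stack.
: loop-until ( accum end -- accum )
    [ parse-until-step ] keep swap
    [ loop-until ] [ drop ] if ;
```

Like parse-until itself, a word of this shape is only meaningful at parse time, for example when called from the body of a parsing word.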
?scan-datum handles conditional word lookup and number parsing. It can leave one of three values on the stack: a word, a number, or f if the token stream is empty. A token is turned into a datum using parse-datum. parse-datum first attempts to look up the token in the current manifest/dictionary and returns the corresponding word. If that fails, it attempts to parse the token as a number. Finally, if that also fails, a no-word error is raised.
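The lookup order described above can be sketched as follows. The name my-parse-datum is hypothetical, and the real parse-datum may be written differently. The sketch assumes search and no-word from the vocabs.parser vocabulary and string>number from math.parser.

```factor
USING: kernel math.parser vocabs.parser ;
IN: scratchpad

! my-parse-datum is a hypothetical name used for illustration only.
! 1. Try to find the token as a word in the current manifest.
! 2. Otherwise, try to read the token as a number.
! 3. Otherwise, raise a no-word error for the token.
: my-parse-datum ( string -- word/number )
    dup search [ nip ] [
        dup string>number [ nip ] [ no-word ] if*
    ] if* ;
```

The word needs a manifest in scope. In the listener, "dup" my-parse-datum should leave the word dup on the stack, and "42" my-parse-datum should leave the number 42.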
Once ?scan-datum completes, control returns to parse-until-step, which checks the following conditions:
: parse-until-step ( accum end -- accum ? )
    ?scan-datum {
        { [ 2dup eq? ] [ 2drop f ] }
        { [ dup not ] [ drop throw-unexpected-eof t ] }
        { [ dup delimiter? ] [ unexpected t ] }
        { [ dup parsing-word? ] [ nip execute-parsing t ] }
        [ pick push drop t ]
    } cond ;
- We've reached the end delimiter
- We've run out of tokens before reaching the end delimiter
- We've hit a delimiter, but it's the wrong one
- We've encountered another parsing word
- We've parsed a datum that can be pushed to the compiled word accumulator
Condition 1 is the base case, which ends the recursive loop. Conditions 2 and 3 are error cases that reject a malformed program. Condition 4 immediately executes the nested parsing word, which allows parsing words to be embedded in literals, definitions, and other parsing words. Finally, condition 5 continues the loop, adding the datum we've found to the compiled code accumulator.
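The effect of condition 4 is visible when one literal is nested inside another. The following sketch again assumes parse-string from the eval vocabulary. While { is parsing up to its closing }, it meets [, which is itself a parsing word and is executed on the spot.

```factor
USING: eval prettyprint sequences ;
IN: scratchpad

! The outer parse sees only two datums: the array literal built by {
! (with the nested quotation already inside it) and the word drop.
"{ 1 [ 2 + ] } drop" parse-string length .
```

The printed length is 2, because the tokens between { and } were consumed by the nested parse and never reached the outer accumulator individually.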