Factor's Parser

Factor's parser is a recursive descent parser that the user can extend through parsing words. A parsing word is a word that the parser executes as soon as it reads it, at parse time rather than at run time. This is the core mechanism behind Factor's extensibility.
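A minimal sketch of what a parsing word looks like, assuming the standard SYNTAX: defining word, scan-object from the parser vocabulary, and suffix! from sequences. The names HELLO and DOUBLE: and the vocabulary name scratch are made up for illustration.

```factor
USING: io math parser sequences ;
IN: scratch

! Runs while the source is being parsed, not when the code runs.
SYNTAX: HELLO "Hello world" print ;

! Hypothetical word: reads the next object at parse time, doubles it,
! and appends the result to the accumulator on top of the stack.
SYNTAX: DOUBLE: scan-object 2 * suffix! ;
```

Every parsing word has the stack effect ( accum -- accum ), so anything it produces has to be appended to the accumulator. A parsing word cannot be used in the same source file that defines it, so try these from a later listener input or from another file, where DOUBLE: 21 would parse as the literal 42.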

This page traces how Factor source code is parsed.

Parsing begins at parse-string, which eventually calls parse-until. This word keeps reading until it reaches the end token supplied to it. In this case, the end token is f, which the parser returns as its EOF value, so parsing continues until the input is exhausted.
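As a rough illustration of the entry point, the listener input below hands a string to parse-string. It assumes parse-string lives in the eval vocabulary with stack effect ( str -- quot ); if the word is not found, check where your Factor version defines it.

```factor
USING: eval math prettyprint ;

! Parse the string without running it. The result is a quotation
! holding the parsed words and literals, which . then prints.
"1 2 +" parse-string .
```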

parse-until invokes the main parsing loop, (parse-until), as follows:
100 <vector> swap (parse-until)
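The loop itself can be sketched as a small recursive word. This is an assumption-laden sketch, not necessarily the definition in Factor's parser vocabulary: loop-until is a hypothetical name, and it assumes parse-until-step is exported from parser with the stack effect ( accum end -- accum ? ).

```factor
USING: kernel parser ;
IN: scratch

! Hypothetical stand-in for (parse-until).
! Run one step while keeping the end token, then recurse if the
! step returned t, or drop the end token and stop if it returned f.
: loop-until ( accum end -- accum )
    [ parse-until-step ] keep swap
    [ loop-until ] [ drop ] if ;
```

The boolean left by each step is the only thing that decides whether the loop continues, which is why every branch of parse-until-step has to leave either t or f on the stack.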

The core of (parse-until) is parse-until-step. This word is responsible for reading datums from the input stream and validating them. Its first step is to call ?scan-datum.

?scan-datum reads the next token and, if there is one, resolves it to a word or a number. It leaves one of three values on the stack: a word, a number, or f if the token stream is empty. A token is turned into a datum using parse-datum. parse-datum first tries to look up the token in the current manifest (the dictionary of words in scope) and returns the corresponding word. If that fails, it tries to parse the token as a number. If that also fails, it raises a no-word error.
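The look-up order described above can be sketched as follows. datum-sketch is a hypothetical name and this is not the library's own definition; the sketch assumes search is available from vocabs.parser, string>number from math.parser, and no-word from parser.

```factor
USING: kernel math.parser parser vocabs.parser ;
IN: scratch

! Hypothetical re-statement of the parse-datum logic.
! 1. Try the token as a word in the current manifest.
! 2. Otherwise try it as a number.
! 3. Otherwise raise a no-word error.
: datum-sketch ( string -- word/number )
    dup search [ nip ] [
        dup string>number [ nip ] [ no-word ] if*
    ] if* ;
```

Because words are tried before numbers in this ordering, a token is only treated as a number when no word of that name is in scope.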

Once ?scan-datum is completed, control is returned to parse-until-step. parse-until-step checks the following conditions:

: parse-until-step ( accum end -- accum ? )
    ?scan-datum {
        { [ 2dup eq? ] [ 2drop f ] }
        { [ dup not ] [ drop throw-unexpected-eof t ] }
        { [ dup delimiter? ] [ unexpected t ] }
        { [ dup parsing-word? ] [ nip execute-parsing t ] }
        [ pick push drop t ]
    } cond ;
    
  1. We've reached the end delimiter
  2. We've run out of tokens before reaching the end delimiter
  3. We've hit a delimiter, but it's the wrong one
  4. We've encountered another parsing word
  5. We've parsed a datum that can be pushed to the compiled word accumulator.

Condition 1 is the base case, which ends the recursive loop. Conditions 2 and 3 are error cases that reject a malformed program. Condition 4 immediately executes the nested parsing word, which allows parsing words to be embedded in literals, definitions, and other parsing words. Finally, condition 5 continues the loop, adding the datum we've found to the accumulator.
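To see these conditions at work, here is a small sketch of a literal syntax built directly on parse-until. REV{ is a hypothetical word invented for this example; it assumes parse-until has the stack effect ( end -- vec ) and that } is the usual delimiter word.

```factor
USING: arrays kernel parser sequences ;
IN: scratch

! Hypothetical syntax: collect everything up to the closing },
! reverse it, and append the resulting array to the accumulator.
SYNTAX: REV{ \ } parse-until reverse >array suffix! ;
```

Used from another file or a later listener input, REV{ 1 2 { 3 4 } } would parse to { { 3 4 } 2 1 }. The numbers 1 and 2 take the condition 5 path, the inner { is a parsing word and is executed immediately (condition 4), and the final } matches the end token (condition 1). Leaving out the closing } would hit condition 2 instead.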